[Docs] Add language identifiers to fenced code blocks (#28955)

Add language identifiers to code blocks
Klaus Hipp
2024-02-12 19:48:31 +01:00
committed by GitHub
parent c617f988f8
commit fe3df9d5b3
66 changed files with 137 additions and 137 deletions

View File

@@ -228,7 +228,7 @@ Contributions that implement this command for other distributed hardware setups
When using `run_eval.py`, the following features can be useful:
* if you are running the script multiple times and want to make it easier to track which arguments produced a given output, use `--dump-args`. Along with the results, it will also dump any custom params that were passed to the script. For example, if you used `--num_beams 8 --early_stopping true`, the output will be:
```
```json
{'bleu': 26.887, 'n_obs': 10, 'runtime': 1, 'seconds_per_sample': 0.1, 'num_beams': 8, 'early_stopping': True}
```
@@ -236,13 +236,13 @@ When using `run_eval.py`, the following features can be useful:
If using `--dump-args --info`, the output will be:
```
```json
{'bleu': 26.887, 'n_obs': 10, 'runtime': 1, 'seconds_per_sample': 0.1, 'num_beams': 8, 'early_stopping': True, 'info': '2020-09-13 18:44:43'}
```
If using `--dump-args --info "pair=en-ru chkpt=best"`, the output will be:
```
```json
{'bleu': 26.887, 'n_obs': 10, 'runtime': 1, 'seconds_per_sample': 0.1, 'num_beams': 8, 'early_stopping': True, 'info': 'pair=en-ru chkpt=best'}
```
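The dumped output shown in this hunk uses Python dict syntax (single quotes, `True`) rather than strict JSON, so `json.loads` would reject it. A minimal sketch of reading such a line back, assuming the line has been captured as a string; `parse_dumped_args` is a hypothetical helper and not part of `run_eval.py`:

```python
import ast


def parse_dumped_args(line: str) -> dict:
    """Parse one line of Python-dict-style output into a dict."""
    # ast.literal_eval only accepts literals (dicts, strings, numbers, booleans),
    # so it does not execute arbitrary code the way eval() would.
    result = ast.literal_eval(line.strip())
    if not isinstance(result, dict):
        raise ValueError("expected a dict literal")
    return result


# Sample line copied from the output shown above.
line = (
    "{'bleu': 26.887, 'n_obs': 10, 'runtime': 1, 'seconds_per_sample': 0.1, "
    "'num_beams': 8, 'early_stopping': True}"
)
parsed = parse_dumped_args(line)
print(parsed["bleu"], parsed["num_beams"], parsed["early_stopping"])
```

Once parsed, the custom params (`num_beams`, `early_stopping`) sit next to the metrics in one dict, which makes it straightforward to compare runs.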

View File

@@ -53,7 +53,7 @@ Coming soon!
Most examples are equipped with a mechanism to truncate the number of dataset samples to the desired length. This is useful for debugging purposes, for example to quickly check that all stages of the programs can complete, before running the same setup on the full dataset which may take hours to complete.
For example here is how to truncate all three splits to just 50 samples each:
```
```bash
examples/pytorch/token-classification/run_ner.py \
--max_train_samples 50 \
--max_eval_samples 50 \
@@ -62,7 +62,7 @@ examples/pytorch/token-classification/run_ner.py \
```
Most example scripts should have the first two command line arguments and some have the third one. You can quickly check if a given example supports any of these by passing a `-h` option, e.g.:
```
```bash
examples/pytorch/token-classification/run_ner.py -h
```
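The truncation behaviour described in this hunk can be pictured with a short sketch. This is only an illustration under stated assumptions: it uses plain Python sequences instead of the dataset objects the example scripts work with, and `truncate_samples` is a hypothetical name, not a function from the scripts.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def truncate_samples(samples: Sequence[T], max_samples: Optional[int]) -> Sequence[T]:
    """Keep at most max_samples items; None means no truncation."""
    if max_samples is None:
        return samples
    # Never ask for more items than the split actually has.
    return samples[: min(len(samples), max_samples)]


train = list(range(1000))  # stand-in for a training split
small_train = truncate_samples(train, 50)
tiny_split = truncate_samples(train[:20], 50)  # split smaller than the cap
full_train = truncate_samples(train, None)
print(len(small_train), len(tiny_split), len(full_train))
```

Capping with `min` means a split that is already smaller than the limit is passed through unchanged, so the same flag values can be reused across datasets of different sizes.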

View File

@@ -277,7 +277,7 @@ language or concept the adapter layers shall be trained. The adapter weights wil
accordingly be called `adapter.{<target_language>}.safetensors`.
Let's run an example script. Make sure to be logged in so that your model can be directly uploaded to the Hub.
```
```bash
huggingface-cli login
```

View File

@@ -20,7 +20,7 @@ This folder contains various research projects using 🤗 Transformers. They are
version of 🤗 Transformers that is indicated in the requirements file of each folder. Updating them to the most recent version of the library will require some work.
To use any of them, just run the command
```
```bash
pip install -r requirements.txt
```
inside the folder of your choice.

View File

@@ -8,7 +8,7 @@ The model is loaded with the pre-trained weights for the abstractive summarizati
## Setup
```
```bash
git clone https://github.com/huggingface/transformers && cd transformers
pip install .
pip install nltk py-rouge

View File

@@ -34,7 +34,7 @@ This is for evaluating fine-tuned DeeBERT models, given a number of different ea
## Citation
Please cite our paper if you find the resource useful:
```
```bibtex
@inproceedings{xin-etal-2020-deebert,
title = "{D}ee{BERT}: Dynamic Early Exiting for Accelerating {BERT} Inference",
author = "Xin, Ji and

View File

@@ -183,7 +183,7 @@ Happy distillation!
If you find the resource useful, you should cite the following paper:
```
```bibtex
@inproceedings{sanh2019distilbert,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},

View File

@@ -84,7 +84,7 @@ python run_clm_igf.py\
If you find the resource useful, please cite the following paper
```
```bibtex
@inproceedings{antonello-etal-2021-selecting,
title = "Selecting Informative Contexts Improves Language Model Fine-tuning",
author = "Antonello, Richard and Beckage, Nicole and Turek, Javier and Huth, Alexander",

View File

@@ -311,7 +311,7 @@ library from source to profit from the most current additions during the communi
Simply run the following steps:
```
```bash
$ cd ~/
$ git clone https://github.com/huggingface/datasets.git
$ cd datasets
@@ -389,13 +389,13 @@ source ~/<your-venv-name>/bin/activate
Next you should install JAX's TPU version on TPU by running the following command:
```
```bash
$ pip install requests
```
and then:
```
```bash
$ pip install "jax[tpu]>=0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
```
@@ -468,7 +468,7 @@ library from source to profit from the most current additions during the communi
Simply run the following steps:
```
```bash
$ cd ~/
$ git clone https://github.com/huggingface/datasets.git
$ cd datasets
@@ -568,7 +568,7 @@ class ModelPyTorch:
Instantiating an object `model_pytorch` of the class `ModelPyTorch` would actually allocate memory for the model weights and attach them to the attributes `self.key_proj`, `self.value_proj`, `self.query_proj`, and `self.logits.proj`. We could access the weights via:
```
```python
key_projection_matrix = model_pytorch.key_proj.weight.data
```
@@ -1224,25 +1224,25 @@ Sometimes you might be using different libraries or a very specific application
A common use case is how to load files you have in your model repository in the Hub from the Streamlit demo. The `huggingface_hub` library is here to help you!
```
```bash
pip install huggingface_hub
```
Here is an example downloading (and caching!) a specific file directly from the Hub
```
```python
from huggingface_hub import hf_hub_download
filepath = hf_hub_download("flax-community/roberta-base-als", "flax_model.msgpack")
```
In many cases you will want to download the full repository. Here is an example downloading all the files from a repo. You can even specify specific revisions!
```
```python
from huggingface_hub import snapshot_download
local_path = snapshot_download("flax-community/roberta-base-als")
```
Note that if you're using the 🤗 Transformers library, you can quickly load the model and tokenizer as follows:
```
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("REPO_ID")

View File

@@ -42,20 +42,20 @@ Here we call the model `"english-roberta-base-dummy"`, but you can change the mo
You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
you are logged in) or via the command line:
```
```bash
huggingface-cli repo create english-roberta-base-dummy
```
Next we clone the model repository to add the tokenizer and model files.
```
```bash
git clone https://huggingface.co/<your-username>/english-roberta-base-dummy
```
To ensure that all tensorboard traces will be uploaded correctly, we need to
track them. You can run the following command inside your model repo to do so.
```
```bash
cd english-roberta-base-dummy
git lfs track "*tfevents*"
```
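To get a feel for which files the `"*tfevents*"` pattern is meant to catch, here is a rough sketch using Python's `fnmatch`. It is an approximation only: Git applies its own pattern rules for `.gitattributes`, which differ from `fnmatch` in some details (for example around path separators). The file names are made up for illustration, and `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

PATTERN = "*tfevents*"  # same glob as passed to `git lfs track` above


def is_lfs_tracked(filename: str, pattern: str = PATTERN) -> bool:
    """Approximate check: does the file name match the tracked glob?"""
    # fnmatchcase is case-sensitive on every platform.
    return fnmatchcase(filename, pattern)


# Hypothetical file names, chosen only for illustration.
candidates = [
    "events.out.tfevents.1631020000.my-host.1234.0",
    "flax_model.msgpack",
    "config.json",
]
tracked = [name for name in candidates if is_lfs_tracked(name)]
print(tracked)
```

Only the TensorBoard event file matches, which is the point of the pattern: trace files go through Git LFS while the other files are left to whatever rules the repository already has.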

View File

@@ -43,17 +43,17 @@ Here we call the model `"clip-roberta-base"`, but you can change the model name
You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
you are logged in) or via the command line:
```
```bash
huggingface-cli repo create clip-roberta-base
```
Next we clone the model repository to add the tokenizer and model files.
```
```bash
git clone https://huggingface.co/<your-username>/clip-roberta-base
```
To ensure that all tensorboard traces will be uploaded correctly, we need to
track them. You can run the following command inside your model repo to do so.
```
```bash
cd clip-roberta-base
git lfs track "*tfevents*"
```

View File

@@ -18,20 +18,20 @@ Here we call the model `"wav2vec2-base-robust"`, but you can change the model na
You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
you are logged in) or via the command line:
```
```bash
huggingface-cli repo create wav2vec2-base-robust
```
Next we clone the model repository to add the tokenizer and model files.
```
```bash
git clone https://huggingface.co/<your-username>/wav2vec2-base-robust
```
To ensure that all tensorboard traces will be uploaded correctly, we need to
track them. You can run the following command inside your model repo to do so.
```
```bash
cd wav2vec2-base-robust
git lfs track "*tfevents*"
```

View File

@@ -6,7 +6,7 @@ Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformer
### Training on MM-IMDb
```
```bash
python run_mmimdb.py \
--data_dir /path/to/mmimdb/dataset/ \
--model_type bert \

View File

@@ -173,7 +173,7 @@ In particular, hardware manufacturers are announcing devices that will speedup i
If you find this resource useful, please consider citing the following paper:
```
```bibtex
@article{sanh2020movement,
title={Movement Pruning: Adaptive Sparsity by Fine-Tuning},
author={Victor Sanh and Thomas Wolf and Alexander M. Rush},

View File

@@ -30,17 +30,17 @@ Required:
## Setup the environment with Dockerfile
Under the directory of `transformers/`, build the docker image:
```
```bash
docker build . -f examples/research_projects/quantization-qdqbert/Dockerfile -t bert_quantization:latest
```
Run the docker:
```
```bash
docker run --gpus all --privileged --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 bert_quantization:latest
```
In the container:
```
```bash
cd transformers/examples/research_projects/quantization-qdqbert/
```
@@ -48,7 +48,7 @@ cd transformers/examples/research_projects/quantization-qdqbert/
Calibrate the pretrained model and finetune with quantization-aware training:
```
```bash
python3 run_quant_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
@@ -60,7 +60,7 @@ python3 run_quant_qa.py \
--percentile 99.99
```
```
```bash
python3 run_quant_qa.py \
--model_name_or_path calib/bert-base-uncased \
--dataset_name squad \
@@ -80,7 +80,7 @@ python3 run_quant_qa.py \
To export the QAT model finetuned above:
```
```bash
python3 run_quant_qa.py \
--model_name_or_path finetuned_int8/bert-base-uncased \
--output_dir ./ \
@@ -97,19 +97,19 @@ Recalibrating will affect the accuracy of the model, but the change should be mi
### Benchmark the INT8 QAT ONNX model inference with TensorRT using dummy input
```
```bash
trtexec --onnx=model.onnx --explicitBatch --workspace=16384 --int8 --shapes=input_ids:64x128,attention_mask:64x128,token_type_ids:64x128 --verbose
```
### Benchmark the INT8 QAT ONNX model inference with [ONNX Runtime-TRT](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html) using dummy input
```
```bash
python3 ort-infer-benchmark.py
```
### Evaluate the INT8 QAT ONNX model inference with TensorRT
```
```bash
python3 evaluate-hf-trt-qa.py \
--onnx_model_path=./model.onnx \
--output_dir ./ \
@@ -126,7 +126,7 @@ python3 evaluate-hf-trt-qa.py \
Finetune a fp32 precision model with [transformers/examples/pytorch/question-answering/](../../pytorch/question-answering/):
```
```bash
python3 ../../pytorch/question-answering/run_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
@@ -145,7 +145,7 @@ python3 ../../pytorch/question-answering/run_qa.py \
### PTQ by calibrating and evaluating the finetuned FP32 model above:
```
```bash
python3 run_quant_qa.py \
--model_name_or_path ./finetuned_fp32/bert-base-uncased \
--dataset_name squad \
@@ -161,7 +161,7 @@ python3 run_quant_qa.py \
### Export the INT8 PTQ model to ONNX
```
```bash
python3 run_quant_qa.py \
--model_name_or_path ./calib/bert-base-uncased \
--output_dir ./ \
@@ -175,7 +175,7 @@ python3 run_quant_qa.py \
### Evaluate the INT8 PTQ ONNX model inference with TensorRT
```
```bash
python3 evaluate-hf-trt-qa.py \
--onnx_model_path=./model.onnx \
--output_dir ./ \

View File

@@ -45,7 +45,7 @@ We publish two `base` models which can serve as a starting point for finetuning
The `base` models initialize the question encoder with [`facebook/dpr-question_encoder-single-nq-base`](https://huggingface.co/facebook/dpr-question_encoder-single-nq-base) and the generator with [`facebook/bart-large`](https://huggingface.co/facebook/bart-large).
If you would like to initialize finetuning with a base model using different question encoder and generator architectures, you can build it with a consolidation script, e.g.:
```
```bash
python examples/research_projects/rag/consolidate_rag_checkpoint.py \
--model_type rag_sequence \
--generator_name_or_path facebook/bart-large-cnn \

View File

@@ -216,7 +216,7 @@ library from source to profit from the most current additions during the communi
Simply run the following steps:
```
```bash
$ cd ~/
$ git clone https://github.com/huggingface/datasets.git
$ cd datasets

View File

@@ -21,7 +21,7 @@ To install locally:
In the root of the repo run:
```
```bash
conda create -n vqganclip python=3.8
conda activate vqganclip
git-lfs install
@@ -30,7 +30,7 @@ pip install -r requirements.txt
```
### Generate new images
```
```python
from VQGAN_CLIP import VQGAN_CLIP
vqgan_clip = VQGAN_CLIP()
vqgan_clip.generate("a picture of a smiling woman")
@@ -41,7 +41,7 @@ To get a test image, run
`git clone https://huggingface.co/datasets/erwann/vqgan-clip-pic test_images`
To edit:
```
```python
from VQGAN_CLIP import VQGAN_CLIP
vqgan_clip = VQGAN_CLIP()

View File

@@ -138,20 +138,20 @@ For bigger datasets, we recommend to train Wav2Vec2 locally instead of in a goog
First, you need to clone the `transformers` repo with:
```
```bash
$ git clone https://github.com/huggingface/transformers.git
```
Second, head over to the `examples/research_projects/wav2vec2` directory, where the `run_common_voice.py` script is located.
```
```bash
$ cd transformers/examples/research_projects/wav2vec2
```
Third, install the required packages. The
packages are listed in the `requirements.txt` file and can be installed with
```
```bash
$ pip install -r requirements.txt
```
@@ -259,7 +259,7 @@ Then and add the following files that fully define a XLSR-Wav2Vec2 checkpoint in
- `pytorch_model.bin`
Having added the above files, you should run the following to push files to your model repository.
```
```bash
git add . && git commit -m "Add model files" && git push
```

View File

@@ -134,7 +134,7 @@ which helps with capping GPU memory usage.
To learn how to deploy Deepspeed Integration please refer to [this guide](https://huggingface.co/transformers/main/main_classes/deepspeed.html#deepspeed-trainer-integration).
But to get started quickly all you need is to install:
```
```bash
pip install deepspeed
```
and then use the default configuration files in this directory:
@@ -148,7 +148,7 @@ Here are examples of how you can use DeepSpeed:
ZeRO-2:
```
```bash
PYTHONPATH=../../../src deepspeed --num_gpus 2 \
run_asr.py \
--output_dir=output_dir --num_train_epochs=2 --per_device_train_batch_size=2 \
@@ -162,7 +162,7 @@ run_asr.py \
```
For ZeRO-2 with more than 1 GPU you need to use (which is already in the example configuration file):
```
```json
"zero_optimization": {
...
"find_unused_parameters": true,
@@ -172,7 +172,7 @@ For ZeRO-2 with more than 1 gpu you need to use (which is already in the example
ZeRO-3:
```
```bash
PYTHONPATH=../../../src deepspeed --num_gpus 2 \
run_asr.py \
--output_dir=output_dir --num_train_epochs=2 --per_device_train_batch_size=2 \
@@ -192,7 +192,7 @@ It is recommended to pre-train Wav2Vec2 with Trainer + Deepspeed (please refer t
Here is an example of how you can use DeepSpeed ZeRO-2 to pretrain a small Wav2Vec2 model:
```
```bash
PYTHONPATH=../../../src deepspeed --num_gpus 4 run_pretrain.py \
--output_dir="./wav2vec2-base-libri-100h" \
--num_train_epochs="3" \
@@ -238,7 +238,7 @@ Output directory will contain 0000.txt and 0001.txt. Each file will have format
#### Run command
```
```bash
python alignment.py \
--model_name="arijitx/wav2vec2-xls-r-300m-bengali" \
--wav_dir="./wavs"

View File

@@ -21,7 +21,7 @@ classification performance to the original zero-shot model
A teacher NLI model can be distilled to a more efficient student model by running [`distill_classifier.py`](https://github.com/huggingface/transformers/blob/main/examples/research_projects/zero-shot-distillation/distill_classifier.py):
```
```bash
python distill_classifier.py \
--data_file <unlabeled_data.txt> \
--class_names_file <class_names.txt> \

View File

@@ -41,7 +41,7 @@ can also be used by passing the name of the TPU resource with the `--tpu` argume
This script trains a masked language model.
### Example command
```
```bash
python run_mlm.py \
--model_name_or_path distilbert-base-cased \
--output_dir output \
@@ -50,7 +50,7 @@ python run_mlm.py \
```
When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise some split (customizable) of training data is used as validation.
```
```bash
python run_mlm.py \
--model_name_or_path distilbert-base-cased \
--output_dir output \
@@ -62,7 +62,7 @@ python run_mlm.py \
This script trains a causal language model.
### Example command
```
```bash
python run_clm.py \
--model_name_or_path distilgpt2 \
--output_dir output \
@@ -72,7 +72,7 @@ python run_clm.py \
When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise some split (customizable) of training data is used as validation.
```
```bash
python run_clm.py \
--model_name_or_path distilgpt2 \
--output_dir output \

View File

@@ -45,7 +45,7 @@ README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
### Example command
```
```bash
python run_qa.py \
--model_name_or_path distilbert-base-cased \
--output_dir output \

View File

@@ -36,7 +36,7 @@ may not always be what you want, especially if you have more than two fields!
Here is a snippet of a valid input JSON file, though note that your texts can be much longer than these, and are not constrained
(despite the field name) to being single grammatical sentences:
```
```json
{"sentence1": "COVID-19 vaccine updates: How is the rollout proceeding?", "label": "news"}
{"sentence1": "Manchester United celebrates Europa League success", "label": "sports"}
```
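The snippet above holds one JSON object per line, so the file as a whole is not a single JSON document. A minimal sketch of reading that layout, assuming the two sample lines shown above; `read_json_lines` is a hypothetical helper and not part of `run_text_classification.py`:

```python
import json


def read_json_lines(text: str) -> list:
    """Parse text that holds one JSON object per line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# The two sample records from the snippet above.
sample = (
    '{"sentence1": "COVID-19 vaccine updates: How is the rollout proceeding?", "label": "news"}\n'
    '{"sentence1": "Manchester United celebrates Europa League success", "label": "sports"}\n'
)
records = read_json_lines(sample)
labels = sorted({record["label"] for record in records})
print(len(records), labels)
```

Collecting the distinct `label` values like this is a quick sanity check on a custom dataset before starting a training run.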
@@ -69,7 +69,7 @@ README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
### Example command
```
```bash
python run_text_classification.py \
--model_name_or_path distilbert-base-cased \
--train_file training_data.json \
@@ -101,7 +101,7 @@ README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
### Example command
```
```bash
python run_glue.py \
--model_name_or_path distilbert-base-cased \
--task_name mnli \