[Docs] Add language identifiers to fenced code blocks (#28955)

Add language identifiers to code blocks
2024-02-12 19:48:31 +01:00
parent c617f988f8
commit fe3df9d5b3
66 changed files with 137 additions and 137 deletions
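The commit message does not say how the unlabeled fences were located. To help reviewers check for stragglers, here is a minimal Python sketch. It is not the tool used for this change, and the names `find_bare_fences` and `scan` are hypothetical. It lists opening backtick fences that have an empty info string (no language identifier) in Markdown files under a directory. It assumes backtick fences only (no `~~~`). It also accepts fences at any indentation, so fences nested in list items (like the indented ones in the seq2seq README) are caught. This is looser than CommonMark.

```python
import re
from pathlib import Path

# Hypothetical helper: matches a backtick fence line with optional
# indentation, 3+ backticks, and an optional info string after them.
FENCE = re.compile(r"^(\s*)(`{3,})(.*)$")


def find_bare_fences(text: str) -> list[int]:
    """Return 1-based line numbers of opening fences with no language identifier."""
    bare = []
    open_len = 0  # backtick count of the currently open fence; 0 when outside a block
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = FENCE.match(line)
        if not m:
            continue
        ticks, info = m.group(2), m.group(3).strip()
        if open_len == 0:
            # Opening fence: flag it when the info string is empty.
            if not info:
                bare.append(lineno)
            open_len = len(ticks)
        elif len(ticks) >= open_len and not info:
            # Closing fence: at least as many backticks and nothing after them.
            open_len = 0
    return bare


def scan(root: str) -> dict[str, list[int]]:
    """Map each Markdown file under root to the line numbers of its bare opening fences."""
    results = {}
    for path in Path(root).rglob("*.md"):
        hits = find_bare_fences(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

For example, `scan("examples")` would return a dict that maps each README with an unlabeled fence to the line numbers of those fences. Run after this commit, it should report nothing for the blocks touched here.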
--- a/examples/legacy/seq2seq/README.md
+++ b/examples/legacy/seq2seq/README.md
@@ -228,7 +228,7 @@ Contributions that implement this command for other distributed hardware setups
 When using `run_eval.py`, the following features can be useful:

 * if you running the script multiple times and want to make it easier to track what arguments produced that output, use `--dump-args`. Along with the results it will also dump any custom params that were passed to the script. For example if you used: `--num_beams 8 --early_stopping true`, the output will be:
-   ```
+   ```python
   {'bleu': 26.887, 'n_obs': 10, 'runtime': 1, 'seconds_per_sample': 0.1, 'num_beams': 8, 'early_stopping': True}
   ```

@@ -236,13 +236,13 @@ When using `run_eval.py`, the following features can be useful:

   If using `--dump-args --info`, the output will be:

-   ```
+   ```python
   {'bleu': 26.887, 'n_obs': 10, 'runtime': 1, 'seconds_per_sample': 0.1, 'num_beams': 8, 'early_stopping': True, 'info': '2020-09-13 18:44:43'}
   ```

   If using `--dump-args --info "pair:en-ru chkpt=best`, the output will be:

-   ```
+   ```python
   {'bleu': 26.887, 'n_obs': 10, 'runtime': 1, 'seconds_per_sample': 0.1, 'num_beams': 8, 'early_stopping': True, 'info': 'pair=en-ru chkpt=best'}
   ```

--- a/examples/pytorch/README.md
+++ b/examples/pytorch/README.md
@@ -53,7 +53,7 @@ Coming soon!
 Most examples are equipped with a mechanism to truncate the number of dataset samples to the desired length. This is useful for debugging purposes, for example to quickly check that all stages of the programs can complete, before running the same setup on the full dataset which may take hours to complete.

 For example here is how to truncate all three splits to just 50 samples each:
-```
+```bash
 examples/pytorch/token-classification/run_ner.py \
 --max_train_samples 50 \
 --max_eval_samples 50 \
@@ -62,7 +62,7 @@ examples/pytorch/token-classification/run_ner.py \
 ```

 Most example scripts should have the first two command line arguments and some have the third one. You can quickly check if a given example supports any of these by passing a `-h` option, e.g.:
-```
+```bash
 examples/pytorch/token-classification/run_ner.py -h
 ```

--- a/examples/pytorch/speech-recognition/README.md
+++ b/examples/pytorch/speech-recognition/README.md
@@ -277,7 +277,7 @@ language or concept the adapter layers shall be trained. The adapter weights wil
 accordingly be called `adapter.{<target_language}.safetensors`.

 Let's run an example script. Make sure to be logged in so that your model can be directly uploaded to the Hub.
-```
+```bash
 huggingface-cli login
 ```

--- a/examples/research_projects/README.md
+++ b/examples/research_projects/README.md
@@ -20,7 +20,7 @@ This folder contains various research projects using 🤗 Transformers. They are
 version of 🤗 Transformers that is indicated in the requirements file of each folder. Updating them to the most recent version of the library will require some work.

 To use any of them, just run the command
-```
+```bash
 pip install -r requirements.txt
 ```
 inside the folder of your choice.
--- a/examples/research_projects/bertabs/README.md
+++ b/examples/research_projects/bertabs/README.md
@@ -8,7 +8,7 @@ The model is loaded with the pre-trained weights for the abstractive summarizati

 ## Setup

-```
+```bash
 git clone https://github.com/huggingface/transformers && cd transformers
 pip install .
 pip install nltk py-rouge
--- a/examples/research_projects/deebert/README.md
+++ b/examples/research_projects/deebert/README.md
@@ -34,7 +34,7 @@ This is for evaluating fine-tuned DeeBERT models, given a number of different ea
 ## Citation

 Please cite our paper if you find the resource useful:
-```
+```bibtex
 @inproceedings{xin-etal-2020-deebert,
    title = "{D}ee{BERT}: Dynamic Early Exiting for Accelerating {BERT} Inference",
    author = "Xin, Ji  and
--- a/examples/research_projects/distillation/README.md
+++ b/examples/research_projects/distillation/README.md
@@ -183,7 +183,7 @@ Happy distillation!

 If you find the resource useful, you should cite the following paper:

-```
+```bibtex
 @inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
--- a/examples/research_projects/information-gain-filtration/README.md
+++ b/examples/research_projects/information-gain-filtration/README.md
@@ -84,7 +84,7 @@ python run_clm_igf.py\

 If you find the resource useful, please cite the following paper

-```
+```bibtex
 @inproceedings{antonello-etal-2021-selecting,
    title = "Selecting Informative Contexts Improves Language Model Fine-tuning",
    author = "Antonello, Richard and Beckage, Nicole and Turek, Javier and Huth, Alexander",
--- a/examples/research_projects/jax-projects/README.md
+++ b/examples/research_projects/jax-projects/README.md
@@ -311,7 +311,7 @@ library from source to profit from the most current additions during the communi

 Simply run the following steps:

-```
+```bash
 $ cd ~/
 $ git clone https://github.com/huggingface/datasets.git
 $ cd datasets
@@ -389,13 +389,13 @@ source ~/<your-venv-name>/bin/activate

 Next you should install JAX's TPU version on TPU by running the following command: 

-```
+```bash
 $ pip install requests
 ```

 and then:

-```
+```bash
 $ pip install "jax[tpu]>=0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
 ```

@@ -468,7 +468,7 @@ library from source to profit from the most current additions during the communi

 Simply run the following steps:

-```
+```bash
 $ cd ~/
 $ git clone https://github.com/huggingface/datasets.git
 $ cd datasets
@@ -568,7 +568,7 @@ class ModelPyTorch:

 Instantiating an object `model_pytorch` of the class `ModelPyTorch` would actually allocate memory for the model weights and attach them to the attributes `self.key_proj`, `self.value_proj`, `self.query_proj`, and `self.logits.proj`. We could access the weights via:

-```
+```python
 key_projection_matrix = model_pytorch.key_proj.weight.data
 ```

@@ -1224,25 +1224,25 @@ Sometimes you might be using different libraries or a very specific application

 A common use case is how to load files you have in your model repository in the Hub from the Streamlit demo. The `huggingface_hub` library is here to help you!

-```
+```bash
 pip install huggingface_hub
 ```

 Here is an example downloading (and caching!) a specific file directly from the Hub
-```
+```python
 from huggingface_hub import hf_hub_download
 filepath = hf_hub_download("flax-community/roberta-base-als", "flax_model.msgpack");
 ```

 In many cases you will want to download the full repository. Here is an example downloading all the files from a repo. You can even specify specific revisions!

-```
+```python
 from huggingface_hub import snapshot_download
 local_path = snapshot_download("flax-community/roberta-base-als");
 ```

 Note that if you're using 🤗 Transformers library, you can quickly load the model and tokenizer as follows
-```
+```python
 from transformers import AutoTokenizer, AutoModelForMaskedLM
  
 tokenizer = AutoTokenizer.from_pretrained("REPO_ID")
--- a/examples/research_projects/jax-projects/dataset-streaming/README.md
+++ b/examples/research_projects/jax-projects/dataset-streaming/README.md
@@ -42,20 +42,20 @@ Here we call the model `"english-roberta-base-dummy"`, but you can change the mo
 You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
 you are logged in) or via the command line:

-```
+```bash
 huggingface-cli repo create english-roberta-base-dummy
 ```

 Next we clone the model repository to add the tokenizer and model files.

-```
+```bash
 git clone https://huggingface.co/<your-username>/english-roberta-base-dummy
 ```

 To ensure that all tensorboard traces will be uploaded correctly, we need to 
 track them. You can run the following command inside your model repo to do so.

-```
+```bash
 cd english-roberta-base-dummy
 git lfs track "*tfevents*"
 ```
--- a/examples/research_projects/jax-projects/hybrid_clip/README.md
+++ b/examples/research_projects/jax-projects/hybrid_clip/README.md
@@ -43,17 +43,17 @@ Here we call the model `"clip-roberta-base"`, but you can change the model name
 You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
 you are logged in) or via the command line:

-```
+```bash
 huggingface-cli repo create clip-roberta-base
 ```
 Next we clone the model repository to add the tokenizer and model files.
-```
+```bash
 git clone https://huggingface.co/<your-username>/clip-roberta-base
 ```
 To ensure that all tensorboard traces will be uploaded correctly, we need to 
 track them. You can run the following command inside your model repo to do so.

-```
+```bash
 cd clip-roberta-base
 git lfs track "*tfevents*"
 ```
--- a/examples/research_projects/jax-projects/wav2vec2/README.md
+++ b/examples/research_projects/jax-projects/wav2vec2/README.md
@@ -18,20 +18,20 @@ Here we call the model `"wav2vec2-base-robust"`, but you can change the model na
 You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
 you are logged in) or via the command line:

-```
+```bash
 huggingface-cli repo create wav2vec2-base-robust
 ```

 Next we clone the model repository to add the tokenizer and model files.

-```
+```bash
 git clone https://huggingface.co/<your-username>/wav2vec2-base-robust
 ```

 To ensure that all tensorboard traces will be uploaded correctly, we need to 
 track them. You can run the following command inside your model repo to do so.

-```
+```bash
 cd wav2vec2-base-robust
 git lfs track "*tfevents*"
 ```
--- a/examples/research_projects/mm-imdb/README.md
+++ b/examples/research_projects/mm-imdb/README.md
@@ -6,7 +6,7 @@ Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformer

 ### Training on MM-IMDb

-```
+```bash
 python run_mmimdb.py \
    --data_dir /path/to/mmimdb/dataset/ \
    --model_type bert \
--- a/examples/research_projects/movement-pruning/README.md
+++ b/examples/research_projects/movement-pruning/README.md
@@ -173,7 +173,7 @@ In particular, hardware manufacturers are announcing devices that will speedup i

 If you find this resource useful, please consider citing the following paper:

-```
+```bibtex
 @article{sanh2020movement,
    title={Movement Pruning: Adaptive Sparsity by Fine-Tuning},
    author={Victor Sanh and Thomas Wolf and Alexander M. Rush},
--- a/examples/research_projects/quantization-qdqbert/README.md
+++ b/examples/research_projects/quantization-qdqbert/README.md
@@ -30,17 +30,17 @@ Required:
 ## Setup the environment with Dockerfile

 Under the directory of `transformers/`, build the docker image:
-```
+```bash
 docker build . -f examples/research_projects/quantization-qdqbert/Dockerfile -t bert_quantization:latest
 ```

 Run the docker:
-```
+```bash
 docker run --gpus all --privileged --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 bert_quantization:latest
 ```

 In the container:
-```
+```bash
 cd transformers/examples/research_projects/quantization-qdqbert/
 ```

@@ -48,7 +48,7 @@ cd transformers/examples/research_projects/quantization-qdqbert/

 Calibrate the pretrained model and finetune with quantization awared:

-```
+```bash
 python3 run_quant_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
@@ -60,7 +60,7 @@ python3 run_quant_qa.py \
  --percentile 99.99
 ```

-```
+```bash
 python3 run_quant_qa.py \
  --model_name_or_path calib/bert-base-uncased \
  --dataset_name squad \
@@ -80,7 +80,7 @@ python3 run_quant_qa.py \

 To export the QAT model finetuned above:

-```
+```bash
 python3 run_quant_qa.py \
  --model_name_or_path finetuned_int8/bert-base-uncased \
  --output_dir ./ \
@@ -97,19 +97,19 @@ Recalibrating will affect the accuracy of the model, but the change should be mi

 ### Benchmark the INT8 QAT ONNX model inference with TensorRT using dummy input

-```
+```bash
 trtexec --onnx=model.onnx --explicitBatch --workspace=16384 --int8 --shapes=input_ids:64x128,attention_mask:64x128,token_type_ids:64x128 --verbose
 ```

 ### Benchmark the INT8 QAT ONNX model inference with [ONNX Runtime-TRT](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html) using dummy input

-```
+```bash
 python3 ort-infer-benchmark.py
 ```

 ### Evaluate the INT8 QAT ONNX model inference with TensorRT

-```
+```bash
 python3 evaluate-hf-trt-qa.py \
  --onnx_model_path=./model.onnx \
  --output_dir ./ \
@@ -126,7 +126,7 @@ python3 evaluate-hf-trt-qa.py \

 Finetune a fp32 precision model with [transformers/examples/pytorch/question-answering/](../../pytorch/question-answering/):

-```
+```bash
 python3 ../../pytorch/question-answering/run_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
@@ -145,7 +145,7 @@ python3 ../../pytorch/question-answering/run_qa.py \

 ### PTQ by calibrating and evaluating the finetuned FP32 model above:

-```
+```bash
 python3 run_quant_qa.py \
  --model_name_or_path ./finetuned_fp32/bert-base-uncased \
  --dataset_name squad \
@@ -161,7 +161,7 @@ python3 run_quant_qa.py \

 ### Export the INT8 PTQ model to ONNX

-```
+```bash
 python3 run_quant_qa.py \
  --model_name_or_path ./calib/bert-base-uncased \
  --output_dir ./ \
@@ -175,7 +175,7 @@ python3 run_quant_qa.py \

 ### Evaluate the INT8 PTQ ONNX model inference with TensorRT

-```
+```bash
 python3 evaluate-hf-trt-qa.py \
  --onnx_model_path=./model.onnx \
  --output_dir ./ \
--- a/examples/research_projects/rag/README.md
+++ b/examples/research_projects/rag/README.md
@@ -45,7 +45,7 @@ We publish two `base` models which can serve as a starting point for finetuning
 The `base` models initialize the question encoder with [`facebook/dpr-question_encoder-single-nq-base`](https://huggingface.co/facebook/dpr-question_encoder-single-nq-base) and the generator with [`facebook/bart-large`](https://huggingface.co/facebook/bart-large).

 If you would like to initialize finetuning with a base model using different question encoder and generator architectures, you can build it with a consolidation script, e.g.:
-```
+```bash
 python examples/research_projects/rag/consolidate_rag_checkpoint.py \
    --model_type rag_sequence \
    --generator_name_or_path facebook/bart-large-cnn \
--- a/examples/research_projects/robust-speech-event/README.md
+++ b/examples/research_projects/robust-speech-event/README.md
@@ -216,7 +216,7 @@ library from source to profit from the most current additions during the communi

 Simply run the following steps:

-```
+```bash
 $ cd ~/
 $ git clone https://github.com/huggingface/datasets.git
 $ cd datasets
--- a/examples/research_projects/vqgan-clip/README.md
+++ b/examples/research_projects/vqgan-clip/README.md
@@ -21,7 +21,7 @@ To install locally:

 In the root of the repo run:

-```
+```bash
 conda create -n vqganclip python=3.8
 conda activate vqganclip
 git-lfs install
@@ -30,7 +30,7 @@ pip install -r requirements.txt
 ```

 ### Generate new images
-```
+```python
 from VQGAN_CLIP import VQGAN_CLIP
 vqgan_clip = VQGAN_CLIP()
 vqgan_clip.generate("a picture of a smiling woman")
@@ -41,7 +41,7 @@ To get a test image, run
 `git clone https://huggingface.co/datasets/erwann/vqgan-clip-pic test_images`

 To edit:
-```
+```python
 from VQGAN_CLIP import VQGAN_CLIP
 vqgan_clip = VQGAN_CLIP()

--- a/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md
+++ b/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md
@@ -138,20 +138,20 @@ For bigger datasets, we recommend to train Wav2Vec2 locally instead of in a goog

 First, you need to clone the `transformers` repo with:

-```
+```bash
 $ git clone https://github.com/huggingface/transformers.git
 ```

 Second, head over to the `examples/research_projects/wav2vec2` directory, where the `run_common_voice.py` script is located.

-```
+```bash
 $ cd transformers/examples/research_projects/wav2vec2
 ```

 Third, install the required packages. The
 packages are listed in the `requirements.txt` file and can be installed with

-```
+```bash
 $ pip install -r requirements.txt
 ```

@@ -259,7 +259,7 @@ Then and add the following files that fully define a XLSR-Wav2Vec2 checkpoint in
 - `pytorch_model.bin`

 Having added the above files, you should run the following to push files to your model repository.  
-```
+```bash
 git add . && git commit -m "Add model files" && git push
 ```

--- a/examples/research_projects/wav2vec2/README.md
+++ b/examples/research_projects/wav2vec2/README.md
@@ -134,7 +134,7 @@ which helps with capping GPU memory usage.
 To learn how to deploy Deepspeed Integration please refer to [this guide](https://huggingface.co/transformers/main/main_classes/deepspeed.html#deepspeed-trainer-integration).

 But to get started quickly all you need is to install:
-```
+```bash
 pip install deepspeed
 ```
 and then use the default configuration files in this directory:
@@ -148,7 +148,7 @@ Here are examples of how you can use DeepSpeed:

 ZeRO-2:

-```
+```bash
 PYTHONPATH=../../../src deepspeed --num_gpus 2 \
 run_asr.py \
 --output_dir=output_dir --num_train_epochs=2 --per_device_train_batch_size=2 \
@@ -162,7 +162,7 @@ run_asr.py \
 ```

 For ZeRO-2 with more than 1 gpu you need to use (which is already in the example configuration file):
-```
+```json
    "zero_optimization": {
        ...
        "find_unused_parameters": true,
@@ -172,7 +172,7 @@ For ZeRO-2 with more than 1 gpu you need to use (which is already in the example

 ZeRO-3:

-```
+```bash
 PYTHONPATH=../../../src deepspeed --num_gpus 2 \
 run_asr.py \
 --output_dir=output_dir --num_train_epochs=2 --per_device_train_batch_size=2 \
@@ -192,7 +192,7 @@ It is recommended to pre-train Wav2Vec2 with Trainer + Deepspeed (please refer t

 Here is an example of how you can use DeepSpeed ZeRO-2 to pretrain a small Wav2Vec2 model:

-```
+```bash
 PYTHONPATH=../../../src deepspeed --num_gpus 4 run_pretrain.py \
 --output_dir="./wav2vec2-base-libri-100h" \
 --num_train_epochs="3" \
@@ -238,7 +238,7 @@ Output directory will contain 0000.txt and 0001.txt. Each file will have format
    
 #### Run command

-```
+```bash
 python alignment.py  \
 --model_name="arijitx/wav2vec2-xls-r-300m-bengali" \
 --wav_dir="./wavs"
--- a/examples/research_projects/zero-shot-distillation/README.md
+++ b/examples/research_projects/zero-shot-distillation/README.md
@@ -21,7 +21,7 @@ classification performance to the original zero-shot model

 A teacher NLI model can be distilled to a more efficient student model by running [`distill_classifier.py`](https://github.com/huggingface/transformers/blob/main/examples/research_projects/zero-shot-distillation/distill_classifier.py):

-```
+```bash
 python distill_classifier.py \
 --data_file <unlabeled_data.txt> \
 --class_names_file <class_names.txt> \
--- a/examples/tensorflow/language-modeling/README.md
+++ b/examples/tensorflow/language-modeling/README.md
@@ -41,7 +41,7 @@ can also be used by passing the name of the TPU resource with the `--tpu` argume
 This script trains a masked language model.

 ### Example command
-```
+```bash
 python run_mlm.py \
 --model_name_or_path distilbert-base-cased \
 --output_dir output \
@@ -50,7 +50,7 @@ python run_mlm.py \
 ```

 When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise some split (customizable) of training data is used as validation.
-```
+```bash
 python run_mlm.py \
 --model_name_or_path distilbert-base-cased \
 --output_dir output \
@@ -62,7 +62,7 @@ python run_mlm.py \
 This script trains a causal language model.

 ### Example command
-```
+```bash
 python run_clm.py \
 --model_name_or_path distilgpt2 \
 --output_dir output \
@@ -72,7 +72,7 @@ python run_clm.py \

 When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise some split (customizable) of training data is used as validation.

-```
+```bash
 python run_clm.py \
 --model_name_or_path distilgpt2 \
 --output_dir output \
--- a/examples/tensorflow/question-answering/README.md
+++ b/examples/tensorflow/question-answering/README.md
@@ -45,7 +45,7 @@ README, but for more information you can see the 'Input Datasets' section of
 [this document](https://www.tensorflow.org/guide/tpu).

 ### Example command
-```
+```bash
 python run_qa.py \
 --model_name_or_path distilbert-base-cased \
 --output_dir output \
--- a/examples/tensorflow/text-classification/README.md
+++ b/examples/tensorflow/text-classification/README.md
@@ -36,7 +36,7 @@ may not always be what you want, especially if you have more than two fields!

 Here is a snippet of a valid input JSON file, though note that your texts can be much longer than these, and are not constrained
 (despite the field name) to being single grammatical sentences:
-```
+```json
 {"sentence1": "COVID-19 vaccine updates: How is the rollout proceeding?", "label": "news"}
 {"sentence1": "Manchester United celebrates Europa League success", "label": "sports"}
 ```
@@ -69,7 +69,7 @@ README, but for more information you can see the 'Input Datasets' section of
 [this document](https://www.tensorflow.org/guide/tpu).

 ### Example command
-```
+```bash
 python run_text_classification.py \
 --model_name_or_path distilbert-base-cased \
 --train_file training_data.json \
@@ -101,7 +101,7 @@ README, but for more information you can see the 'Input Datasets' section of
 [this document](https://www.tensorflow.org/guide/tpu).

 ### Example command
-```
+```bash
 python run_glue.py \
 --model_name_or_path distilbert-base-cased \
 --task_name mnli \