Fix doc errors and typos across the board (#8139)

* Fix doc errors and typos across the board * Fix a typo * Fix the CI * Fix more typos * Fix CI * More fixes * Fix CI * More fixes * More fixes
2020-10-29 10:33:33 -04:00
parent 4731a00c3e
commit 969859d5f6
160 changed files with 342 additions and 364 deletions
--- a/model_cards/aubmindlab/bert-base-arabert/README.md
+++ b/model_cards/aubmindlab/bert-base-arabert/README.md
@@ -7,7 +7,7 @@ language: ar

 **AraBERT** is an Arabic pretrained lanaguage model based on [Google's BERT architechture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT PAPER](https://arxiv.org/abs/2003.00104v2) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup)

-There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were splitted using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).
+There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).

 The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words. The training corpora are a collection of publically available large scale raw arabic text ([Arabic Wikidumps](https://archive.org/details/arwiki-20190201), [The 1.5B words Arabic Corpus](https://www.semanticscholar.org/paper/1.5-billion-words-Arabic-Corpus-El-Khair/f3eeef4afb81223df96575adadf808fe7fe440b4), [The OSIAN Corpus](https://www.aclweb.org/anthology/W19-4619), Assafir news articles, and 4 other manually crawled news websites (Al-Akhbar, Annahar, AL-Ahram, AL-Wafd) from [the Wayback Machine](http://web.archive.org/))

--- a/model_cards/aubmindlab/bert-base-arabertv01/README.md
+++ b/model_cards/aubmindlab/bert-base-arabertv01/README.md
@@ -7,7 +7,7 @@ language: ar

 **AraBERT** is an Arabic pretrained lanaguage model based on [Google's BERT architechture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT PAPER](https://arxiv.org/abs/2003.00104v2) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup)

-There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were splitted using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).
+There are two version off the model AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).

 The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words. The training corpora are a collection of publically available large scale raw arabic text ([Arabic Wikidumps](https://archive.org/details/arwiki-20190201), [The 1.5B words Arabic Corpus](https://www.semanticscholar.org/paper/1.5-billion-words-Arabic-Corpus-El-Khair/f3eeef4afb81223df96575adadf808fe7fe440b4), [The OSIAN Corpus](https://www.aclweb.org/anthology/W19-4619), Assafir news articles, and 4 other manually crawled news websites (Al-Akhbar, Annahar, AL-Ahram, AL-Wafd) from [the Wayback Machine](http://web.archive.org/))

--- a/model_cards/elgeish/cs224n-squad2.0-albert-large-v2/README.md
+++ b/model_cards/elgeish/cs224n-squad2.0-albert-large-v2/README.md
@@ -4,7 +4,7 @@ tags:
 ---

 ## CS224n SQuAD2.0 Project Dataset
-The goal of this model is to save CS224n students GPU time when establising
+The goal of this model is to save CS224n students GPU time when establishing
 baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
 The training set used to fine-tune this model is the same as
 the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
--- a/model_cards/jannesg/takalane_afr_roberta/README.md
+++ b/model_cards/jannesg/takalane_afr_roberta/README.md
@@ -34,7 +34,7 @@ model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_afr_roberta")

 #### Limitations and bias

-Updates will be added continously to improve performance. 
+Updates will be added continuously to improve performance. 

 ## Training data

--- a/model_cards/mrm8488/CodeBERTaPy/README.md
+++ b/model_cards/mrm8488/CodeBERTaPy/README.md
@@ -94,7 +94,7 @@ fill_mask(PYTHON_CODE3)

 > Great! 🎉

-## This work is heavely inspired on [CodeBERTa](https://github.com/huggingface/transformers/blob/master/model_cards/huggingface/CodeBERTa-small-v1/README.md) by huggingface team
+## This work is heavily inspired on [CodeBERTa](https://github.com/huggingface/transformers/blob/master/model_cards/huggingface/CodeBERTa-small-v1/README.md) by huggingface team

 <br>

--- a/model_cards/mrm8488/TinyBERT-spanish-uncased-finetuned-ner/README.md
+++ b/model_cards/mrm8488/TinyBERT-spanish-uncased-finetuned-ner/README.md
@@ -11,7 +11,7 @@ This model is a fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corp

 - [Dataset:  CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 

-I preprocessed the dataset and splitted it as train / dev (80/20)
+I preprocessed the dataset and split it as train / dev (80/20)

 | Dataset                | # Examples |
 | ---------------------- | ----- |
--- a/model_cards/mrm8488/bert-multi-cased-finetuned-xquadv1/README.md
+++ b/model_cards/mrm8488/bert-multi-cased-finetuned-xquadv1/README.md
@@ -65,7 +65,7 @@ Citation:

 </details>

-As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and splited the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
+As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and split the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:

 | Dataset     | # samples |
 | ----------- | --------- |
--- a/model_cards/mrm8488/bert-multi-uncased-finetuned-xquadv1/README.md
+++ b/model_cards/mrm8488/bert-multi-uncased-finetuned-xquadv1/README.md
@@ -65,7 +65,7 @@ Citation:

 </details>

-As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and splited the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
+As **XQuAD** is just an evaluation dataset, I used `Data augmentation techniques` (scraping, neural machine translation, etc) to obtain more samples and split the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:

 | Dataset     | # samples |
 | ----------- | --------- |
--- a/model_cards/mrm8488/bert-spanish-cased-finetuned-ner/README.md
+++ b/model_cards/mrm8488/bert-spanish-cased-finetuned-ner/README.md
@@ -11,7 +11,7 @@ This model is a fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corp

 - [Dataset:  CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 

-I preprocessed the dataset and splitted it as train / dev (80/20)
+I preprocessed the dataset and split it as train / dev (80/20)

 | Dataset                | # Examples |
 | ---------------------- | ----- |
--- a/model_cards/mrm8488/bert-spanish-cased-finetuned-pos/README.md
+++ b/model_cards/mrm8488/bert-spanish-cased-finetuned-pos/README.md
@@ -11,7 +11,7 @@ This model is a fine-tuned on Spanish [CONLL CORPORA](https://www.kaggle.com/nlt

 - [Dataset:  CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) with data augmentation techniques

-I preprocessed the dataset and splitted it as train / dev (80/20)
+I preprocessed the dataset and split it as train / dev (80/20)

 | Dataset                | # Examples |
 | ---------------------- | ----- |
--- a/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv1/README.md
+++ b/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv1/README.md
@@ -44,7 +44,7 @@ python transformers/examples/question-answering/run_squad.py \
  --save_steps 1000
 ```

-It is importatnt to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.
+It is important to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.

 ## Test set Results 🧾

--- a/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv2/README.md
+++ b/model_cards/mrm8488/mobilebert-uncased-finetuned-squadv2/README.md
@@ -44,7 +44,7 @@ python transformers/examples/question-answering/run_squad.py \
  --version_2_with_negative
 ```

-It is importatnt to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.
+It is important to say that this models converges much faster than other ones. So, it is also cheap to fine-tune.

 ## Test set Results 🧾

--- a/model_cards/mrm8488/spanbert-base-finetuned-squadv1/README.md
+++ b/model_cards/mrm8488/spanbert-base-finetuned-squadv1/README.md
@@ -48,7 +48,7 @@ python code/run_squad.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)         | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.

 ## Model in action

--- a/model_cards/mrm8488/spanbert-base-finetuned-squadv2/README.md
+++ b/model_cards/mrm8488/spanbert-base-finetuned-squadv2/README.md
@@ -54,7 +54,7 @@ python code/run_squad.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)          | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.

 ## Model in action

--- a/model_cards/mrm8488/spanbert-base-finetuned-tacred/README.md
+++ b/model_cards/mrm8488/spanbert-base-finetuned-tacred/README.md
@@ -45,7 +45,7 @@ python code/run_tacred.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)        | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-base-finetuned-tacred)   |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.


 > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
--- a/model_cards/mrm8488/spanbert-large-finetuned-squadv1/README.md
+++ b/model_cards/mrm8488/spanbert-large-finetuned-squadv1/README.md
@@ -48,7 +48,7 @@ python code/run_squad.py \
 | SpanBERT (large)        | **94.6** (this)         | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.

 ## Model in action

--- a/model_cards/mrm8488/spanbert-large-finetuned-squadv2/README.md
+++ b/model_cards/mrm8488/spanbert-large-finetuned-squadv2/README.md
@@ -54,7 +54,7 @@ python code/run_squad.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)          | **88.7** (this)     | 79.6    |  [70.8](https://huggingface.co/mrm8488/spanbert-large-finetuned-tacred)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.

 ## Model in action

--- a/model_cards/mrm8488/spanbert-large-finetuned-tacred/README.md
+++ b/model_cards/mrm8488/spanbert-large-finetuned-tacred/README.md
@@ -45,7 +45,7 @@ python code/run_tacred.py \
 | SpanBERT (large)        | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1)        | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2)     | 79.6    |  **70.8** (this one)  |


-Note: The numbers marked as * are evaluated on the development sets becaus those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
+Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.


 > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
--- a/model_cards/mrm8488/t5-base-finetuned-wikiSQL-sql-to-en/README.md
+++ b/model_cards/mrm8488/t5-base-finetuned-wikiSQL-sql-to-en/README.md
@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql
 model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql-to-en")

 def get_explanation(query):
-  input_text = "translante Sql to English: %s </s>" % query
+  input_text = "translate Sql to English: %s </s>" % query
  features = tokenizer([input_text], return_tensors='pt')

  output = model.generate(input_ids=features['input_ids'], 
--- a/model_cards/mrm8488/t5-base-finetuned-wikiSQL/README.md
+++ b/model_cards/mrm8488/t5-base-finetuned-wikiSQL/README.md
@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
 model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")

 def get_sql(query):
-  input_text = "translante English to SQL: %s </s>" % query
+  input_text = "translate English to SQL: %s </s>" % query
  features = tokenizer([input_text], return_tensors='pt')

  output = model.generate(input_ids=features['input_ids'], 
--- a/model_cards/mrm8488/t5-small-finetuned-wikiSQL/README.md
+++ b/model_cards/mrm8488/t5-small-finetuned-wikiSQL/README.md
@@ -50,7 +50,7 @@ tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")
 model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")

 def get_sql(query):
-  input_text = "translante English to SQL: %s </s>" % query
+  input_text = "translate English to SQL: %s </s>" % query
  features = tokenizer([input_text], return_tensors='pt')

  output = model.generate(input_ids=features['input_ids'], 
--- a/model_cards/mrm8488/xlm-multi-finetuned-xquadv1/README.md
+++ b/model_cards/mrm8488/xlm-multi-finetuned-xquadv1/README.md
@@ -71,7 +71,7 @@ Citation:

 </details>

-As XQuAD is just an evaluation dataset, I used Data augmentation techniques (scraping, neural machine translation, etc) to obtain more samples and splited the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:
+As XQuAD is just an evaluation dataset, I used Data augmentation techniques (scraping, neural machine translation, etc) to obtain more samples and split the dataset in order to have a train and test set. The test set was created in a way that contains the same number of samples for each language. Finally, I got:

 | Dataset     | # samples |
 | ----------- | --------- |