From 5d8eb93eeec0476b9f0fddc96f2960be0ce782b6 Mon Sep 17 00:00:00 2001
From: hugo-syn <61210734+hugo-syn@users.noreply.github.com>
Date: Thu, 18 Jan 2024 14:35:09 +0100
Subject: [PATCH] chore: Fix multiple typos (#28574)

---
 examples/research_projects/codeparrot/README.md              | 2 +-
 examples/research_projects/jax-projects/README.md            | 2 +-
 examples/research_projects/jax-projects/big_bird/README.md   | 2 +-
 .../research_projects/jax-projects/model_parallel/README.md  | 2 +-
 examples/research_projects/mlm_wwm/README.md                 | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/examples/research_projects/codeparrot/README.md b/examples/research_projects/codeparrot/README.md
index 6c57c4350f..3259041ba5 100644
--- a/examples/research_projects/codeparrot/README.md
+++ b/examples/research_projects/codeparrot/README.md
@@ -50,7 +50,7 @@ The raw dataset contains many duplicates. We deduplicated and filtered the datas
 - fraction of alphanumeric characters < 0.25
 - containing the word "auto-generated" or similar in the first 5 lines
 - filtering with a probability of 0.7 of files with a mention of "test file" or "configuration file" or similar in the first 5 lines
-- filtering with a probability of 0.7 of files with high occurence of the keywords "test " or "config"
+- filtering with a probability of 0.7 of files with high occurrence of the keywords "test " or "config"
 - filtering with a probability of 0.7 of files without a mention of the keywords `def` , `for`, `while` and `class`
 - filtering files that use the assignment operator `=` less than 5 times
 - filtering files with ratio between number of characters and number of tokens after tokenization < 1.5 (the average ratio is 3.6)
diff --git a/examples/research_projects/jax-projects/README.md b/examples/research_projects/jax-projects/README.md
index 420a97f768..71f9a7a4e0 100644
--- a/examples/research_projects/jax-projects/README.md
+++ b/examples/research_projects/jax-projects/README.md
@@ -1153,7 +1153,7 @@ In the following, we will describe how to do so using a standard console, but yo
 2. Once you've installed the google cloud sdk, you should set your account by running the following command. Make sure that `` corresponds to the gmail address you used to sign up for this event.
 
 ```bash
-$ gcloud config set account
+$ gcloud config set account
 ```
 
 3. Let's also make sure the correct project is set in case your email is used for multiple gcloud projects:
diff --git a/examples/research_projects/jax-projects/big_bird/README.md b/examples/research_projects/jax-projects/big_bird/README.md
index e8ef274bbe..42586e4958 100644
--- a/examples/research_projects/jax-projects/big_bird/README.md
+++ b/examples/research_projects/jax-projects/big_bird/README.md
@@ -57,4 +57,4 @@ wget https://huggingface.co/datasets/vasudevgupta/natural-questions-validation/r
 python3 evaluate.py
 ```
 
-You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repositary](https://github.com/thevasudevgupta/bigbird).
+You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repository](https://github.com/thevasudevgupta/bigbird).
diff --git a/examples/research_projects/jax-projects/model_parallel/README.md b/examples/research_projects/jax-projects/model_parallel/README.md
index b63b93862d..97f3cdb047 100644
--- a/examples/research_projects/jax-projects/model_parallel/README.md
+++ b/examples/research_projects/jax-projects/model_parallel/README.md
@@ -27,7 +27,7 @@ To adapt the script for other models, we need to also change the `ParitionSpec`
 
 TODO: Add more explantion.
 
-Before training, let's prepare our model first. To be able to shard the model, the sharded dimention needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.
+Before training, let's prepare our model first. To be able to shard the model, the sharded dimension needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.
 
 ```python
 from transformers import FlaxGPTNeoForCausalLM, GPTNeoConfig
diff --git a/examples/research_projects/mlm_wwm/README.md b/examples/research_projects/mlm_wwm/README.md
index 9426be7c27..0144b1ad30 100644
--- a/examples/research_projects/mlm_wwm/README.md
+++ b/examples/research_projects/mlm_wwm/README.md
@@ -95,4 +95,4 @@ python run_mlm_wwm.py \
 
 **Note1:** On TPU, you should the flag `--pad_to_max_length` to make sure all your batches have the same length.
 
-**Note2:** And if you have any questions or something goes wrong when runing this code, don't hesitate to pin @wlhgtc.
+**Note2:** And if you have any questions or something goes wrong when running this code, don't hesitate to pin @wlhgtc.