chore: Fix multiple typos (#28574)
@@ -50,7 +50,7 @@ The raw dataset contains many duplicates. We deduplicated and filtered the datas
- fraction of alphanumeric characters < 0.25
- containing the word "auto-generated" or similar in the first 5 lines
- filtering with a probability of 0.7 of files with a mention of "test file" or "configuration file" or similar in the first 5 lines
- filtering with a probability of 0.7 of files with high occurrence of the keywords "test " or "config"
- filtering with a probability of 0.7 of files without a mention of the keywords `def`, `for`, `while` and `class`
- filtering files that use the assignment operator `=` fewer than 5 times
- filtering files whose ratio of the number of characters to the number of tokens after tokenization is < 1.5 (the average ratio is 3.6)
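
The filters above can be sketched as a single predicate over a file's text. This is a minimal sketch under stated assumptions, not the script that was used to build the dataset: the helper names (`keep_file`, `alphanumeric_fraction`, and so on), the extra "auto-generated" variants, the line-based proxy for "high occurrence" with its 0.05 threshold, and the `count_tokens` callable standing in for a tokenizer are all hypothetical.

```python
import random
import re
from typing import Callable, Optional

# Hypothetical values: the list above only says "or similar" and "high occurrence".
AUTOGEN_PHRASES = ["auto-generated", "autogenerated", "automatically generated"]
TEST_CONFIG_PHRASES = ["test file", "configuration file"]
KEYWORD_LINE_FRACTION = 0.05  # assumed threshold for "high occurrence"


def alphanumeric_fraction(text: str) -> float:
    """Fraction of characters that are letters or digits."""
    return sum(ch.isalnum() for ch in text) / len(text) if text else 0.0


def header_mentions(text: str, phrases: list, n_lines: int = 5) -> bool:
    """True if any phrase occurs (case-insensitively) in the first n_lines lines."""
    header = "\n".join(text.splitlines()[:n_lines]).lower()
    return any(phrase in header for phrase in phrases)


def keyword_line_fraction(text: str, keywords: list) -> float:
    """Share of lines containing any keyword: an assumed proxy for "high occurrence"."""
    lines = text.splitlines()
    if not lines:
        return 0.0
    return sum(any(k in line.lower() for k in keywords) for line in lines) / len(lines)


def count_assignments(text: str) -> int:
    """Count '=' signs that are not part of '==', '!=', '<=' or '>='."""
    return len(re.findall(r"(?<![=!<>])=(?!=)", text))


def keep_file(text: str,
              count_tokens: Optional[Callable[[str], int]] = None,
              rng: Optional[random.Random] = None,
              p_drop: float = 0.7) -> bool:
    """Return True if the file survives every filter (hypothetical helper)."""
    rng = rng or random.Random()

    # Deterministic filters.
    if alphanumeric_fraction(text) < 0.25:
        return False
    if header_mentions(text, AUTOGEN_PHRASES):
        return False

    # Probabilistic filters: when a heuristic fires, drop the file with probability p_drop.
    if header_mentions(text, TEST_CONFIG_PHRASES) and rng.random() < p_drop:
        return False
    if keyword_line_fraction(text, ["test ", "config"]) > KEYWORD_LINE_FRACTION and rng.random() < p_drop:
        return False
    if not re.search(r"\b(def|for|while|class)\b", text) and rng.random() < p_drop:
        return False

    # More deterministic filters.
    if count_assignments(text) < 5:
        return False
    if count_tokens is not None:
        n_tokens = count_tokens(text)
        if n_tokens > 0 and len(text) / n_tokens < 1.5:
            return False
    return True
```

Passing a seeded `random.Random` makes the probabilistic filters reproducible across runs.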
@@ -1153,7 +1153,7 @@ In the following, we will describe how to do so using a standard console, but yo
2. Once you've installed the Google Cloud SDK, set your account by running the following command. Make sure that `<your-email-address>` corresponds to the Gmail address you used to sign up for this event.

```bash
$ gcloud config set account <your-email-address>
```

3. Let's also make sure the correct project is set in case your email is used for multiple gcloud projects:
@@ -57,4 +57,4 @@ wget https://huggingface.co/datasets/vasudevgupta/natural-questions-validation/r
python3 evaluate.py
```
You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repository](https://github.com/thevasudevgupta/bigbird).
@@ -27,7 +27,7 @@ To adapt the script for other models, we need to also change the `PartitionSpec`
TODO: Add more explanation.

Before training, let's prepare our model first. To be able to shard the model, the sharded dimension needs to be a multiple of the number of devices it will be sharded across. GPTNeo's vocab size is 50257, which is odd and therefore not a multiple of any even device count, so we need to resize the embeddings accordingly.
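
Concretely, the target vocabulary size is 50257 rounded up to the next multiple of the device count. The sketch below assumes 8 devices (in practice you would query `jax.device_count()`) and appends zero rows to a NumPy embedding matrix; the helper names are hypothetical, and this is not necessarily how the script itself resizes the embeddings.

```python
import numpy as np


def padded_vocab_size(vocab_size: int, num_devices: int) -> int:
    """Round vocab_size up to the nearest multiple of num_devices."""
    return -(-vocab_size // num_devices) * num_devices  # ceiling division


def pad_embedding(embedding: np.ndarray, num_devices: int) -> np.ndarray:
    """Append zero rows so the vocabulary axis divides evenly across devices."""
    vocab_size, _ = embedding.shape
    extra = padded_vocab_size(vocab_size, num_devices) - vocab_size
    return np.pad(embedding, ((0, extra), (0, 0)))  # constant (zero) padding


num_devices = 8  # assumption: e.g. a single host with 8 accelerator cores
print(padded_vocab_size(50257, num_devices))  # 50264
```

Rounding up rather than truncating keeps the embedding of every real token; the tokenizer never emits the 7 extra ids.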
```python
from transformers import FlaxGPTNeoForCausalLM, GPTNeoConfig
@@ -95,4 +95,4 @@ python run_mlm_wwm.py \
**Note1:** On TPU, you should use the flag `--pad_to_max_length` to make sure all your batches have the same length.

**Note2:** If you have any questions or something goes wrong when running this code, don't hesitate to ping @wlhgtc.
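
Fixed-length batches matter on TPU because XLA compiles a graph for each distinct input shape, so batches of varying length would trigger repeated recompilation. The NumPy sketch below shows what padding a batch to one fixed length produces. The function name, `pad_id=0`, and the returned attention mask are illustrative assumptions, not how `run_mlm_wwm.py` implements `--pad_to_max_length`.

```python
import numpy as np


def pad_to_max_length(batch, max_length, pad_id=0):
    """Pad (or truncate) every token-id list to exactly max_length (hypothetical helper).

    Returns input_ids and an attention mask, both of shape (len(batch), max_length).
    """
    input_ids = np.full((len(batch), max_length), pad_id, dtype=np.int32)
    attention_mask = np.zeros((len(batch), max_length), dtype=np.int32)
    for i, ids in enumerate(batch):
        ids = ids[:max_length]           # truncate sequences that are too long
        input_ids[i, :len(ids)] = ids     # real tokens first, padding after
        attention_mask[i, :len(ids)] = 1  # 1 marks real tokens, 0 marks padding
    return input_ids, attention_mask


ids, mask = pad_to_max_length([[5, 6, 7], [8]], max_length=4)
print(ids.tolist())   # [[5, 6, 7, 0], [8, 0, 0, 0]]
print(mask.tolist())  # [[1, 1, 1, 0], [1, 0, 0, 0]]
```

Every batch then has the same shape regardless of the sentence lengths it contains, at the cost of some wasted compute on padding tokens.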