Fix some typos. (#17560)

* Fix some typos.

Signed-off-by: Yulv-git <yulvchi@qq.com>

* Fix typo.

Signed-off-by: Yulv-git <yulvchi@qq.com>

* make fixup.
This commit is contained in:
Yulv-git
2022-07-11 17:00:13 +08:00
committed by GitHub
parent ad28ca291b
commit 95113d1365
54 changed files with 80 additions and 68 deletions

View File

@@ -140,7 +140,7 @@ class TokenClassificationTask:
# it easier for the model to learn the concept of sequences.
#
# For classification tasks, the first vector (corresponding to [CLS]) is
# used as as the "sentence vector". Note that this only makes sense because
# used as the "sentence vector". Note that this only makes sense because
# the entire model is fine-tuned.
tokens += [sep_token]
label_ids += [pad_token_label_id]

View File

@@ -43,7 +43,7 @@ A good metric to observe during training is the gradient norm which should ideal
When training a model on large datasets it is recommended to run the data preprocessing
in a first run in a **non-distributed** mode via `--preprocessing_only` so that
when running the model in **distributed** mode in a second step the preprocessed data
when running the model in **distributed** mode in a second step the preprocessed data
can easily be loaded on each distributed device.
---

View File

@@ -91,7 +91,7 @@ python scripts/initialize_model.py \
--model_name codeparrot \
--push_to_hub True
```
This will initialize a new model with the architecture and configuration of `gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the the hub.
This will initialize a new model with the architecture and configuration of `gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the hub.
We can either pass the name of a text dataset or a pretokenized dataset which speeds up training a bit.
Now that the tokenizer and model are also ready we can start training the model. The main training script is built with `accelerate` to scale across a wide range of platforms and infrastructure scales. We train two models with [110M](https://huggingface.co/lvwerra/codeparrot-small/) and [1.5B](https://huggingface.co/lvwerra/codeparrot/) parameters for 25-30B tokens on a 16xA100 (40GB) machine which takes 1 day and 1 week, respectively.

View File

@@ -43,7 +43,7 @@ if __name__ == "__main__":
with open(args.data_file, "rb") as fp:
data = pickle.load(fp)
logger.info("Counting occurences for MLM.")
logger.info("Counting occurrences for MLM.")
counter = Counter()
for tk_ids in data:
counter.update(tk_ids)

View File

@@ -49,7 +49,7 @@ At the end of the community week, each team should submit a demo of their projec
- **23.06.** Official announcement of the community week. Make sure to sign-up in [this google form](https://forms.gle/tVGPhjKXyEsSgUcs8).
- **23.06. - 30.06.** Participants will be added to an internal Slack channel. Project ideas can be proposed here and groups of 3-5 are formed. Read this document for more information.
- **30.06.** Release of all relevant training scripts in JAX/Flax as well as other documents on how to set up a TPU, how to use the training scripts, how to submit a demo, tips & tricks for JAX/Flax, tips & tricks for efficient use of the hub.
- **30.06.** Release of all relevant training scripts in JAX/Flax as well as other documents on how to set up a TPU, how to use the training scripts, how to submit a demo, tips & tricks for JAX/Flax, tips & tricks for efficient use of the hub.
- **30.06. - 2.07.** Talks about JAX/Flax, TPU, Transformers, Computer Vision & NLP will be held.
- **7.07.** Start of the community week! Access to TPUv3-8 will be given to each team.
- **7.07. - 14.07.** The Hugging Face & JAX/Flax & Cloud team will be available for any questions, problems the teams might run into.

View File

@@ -106,7 +106,7 @@ def main():
return start_logits, end_logits, jnp.argmax(pooled_logits, axis=-1)
def evaluate(example):
# encode question and context so that they are seperated by a tokenizer.sep_token and cut at max_length
# encode question and context so that they are separated by a tokenizer.sep_token and cut at max_length
inputs = tokenizer(
example["question"],
example["context"],

View File

@@ -22,7 +22,7 @@ the JAX/Flax backend and the [`pjit`](https://jax.readthedocs.io/en/latest/jax.e
> Note: The example is experimental and might have bugs. Also currently it only supports single V3-8.
The `partition.py` file defines the `PyTree` of `ParitionSpec` for the GPTNeo model which describes how the model will be sharded.
The actual sharding is auto-matically handled by `pjit`. The weights are sharded accross all local devices.
The actual sharding is auto-matically handled by `pjit`. The weights are sharded across all local devices.
To adapt the script for other models, we need to also change the `ParitionSpec` accordingly.
TODO: Add more explantion.