Fix some typos. (#17560)
* Fix some typos.

Signed-off-by: Yulv-git <yulvchi@qq.com>

* Fix typo.

Signed-off-by: Yulv-git <yulvchi@qq.com>

* make fixup.
@@ -140,7 +140,7 @@ class TokenClassificationTask:
 # it easier for the model to learn the concept of sequences.
 #
 # For classification tasks, the first vector (corresponding to [CLS]) is
-# used as as the "sentence vector". Note that this only makes sense because
+# used as the "sentence vector". Note that this only makes sense because
 # the entire model is fine-tuned.
 tokens += [sep_token]
 label_ids += [pad_token_label_id]
@@ -43,7 +43,7 @@ A good metric to observe during training is the gradient norm which should ideal

 When training a model on large datasets it is recommended to run the data preprocessing
 in a first run in a **non-distributed** mode via `--preprocessing_only` so that
-when running the model in **distributed** mode in a second step the preprocessed data
+when running the model in **distributed** mode in a second step the preprocessed data
 can easily be loaded on each distributed device.

 ---
@@ -91,7 +91,7 @@ python scripts/initialize_model.py \
 --model_name codeparrot \
 --push_to_hub True
 ```
-This will initialize a new model with the architecture and configuration of `gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the the hub.
+This will initialize a new model with the architecture and configuration of `gpt2-large` and use the tokenizer to appropriately size the input embeddings. Finally, the initilaized model is pushed the hub.

 We can either pass the name of a text dataset or a pretokenized dataset which speeds up training a bit.
 Now that the tokenizer and model are also ready we can start training the model. The main training script is built with `accelerate` to scale across a wide range of platforms and infrastructure scales. We train two models with [110M](https://huggingface.co/lvwerra/codeparrot-small/) and [1.5B](https://huggingface.co/lvwerra/codeparrot/) parameters for 25-30B tokens on a 16xA100 (40GB) machine which takes 1 day and 1 week, respectively.
@@ -43,7 +43,7 @@ if __name__ == "__main__":
     with open(args.data_file, "rb") as fp:
         data = pickle.load(fp)

-    logger.info("Counting occurences for MLM.")
+    logger.info("Counting occurrences for MLM.")
     counter = Counter()
     for tk_ids in data:
         counter.update(tk_ids)
@@ -49,7 +49,7 @@ At the end of the community week, each team should submit a demo of their projec

 - **23.06.** Official announcement of the community week. Make sure to sign-up in [this google form](https://forms.gle/tVGPhjKXyEsSgUcs8).
 - **23.06. - 30.06.** Participants will be added to an internal Slack channel. Project ideas can be proposed here and groups of 3-5 are formed. Read this document for more information.
-- **30.06.** Release of all relevant training scripts in JAX/Flax as well as other documents on how to set up a TPU, how to use the training scripts, how to submit a demo, tips & tricks for JAX/Flax, tips & tricks for efficient use of the hub.
+- **30.06.** Release of all relevant training scripts in JAX/Flax as well as other documents on how to set up a TPU, how to use the training scripts, how to submit a demo, tips & tricks for JAX/Flax, tips & tricks for efficient use of the hub.
 - **30.06. - 2.07.** Talks about JAX/Flax, TPU, Transformers, Computer Vision & NLP will be held.
 - **7.07.** Start of the community week! Access to TPUv3-8 will be given to each team.
 - **7.07. - 14.07.** The Hugging Face & JAX/Flax & Cloud team will be available for any questions, problems the teams might run into.
@@ -106,7 +106,7 @@ def main():
         return start_logits, end_logits, jnp.argmax(pooled_logits, axis=-1)

     def evaluate(example):
-        # encode question and context so that they are seperated by a tokenizer.sep_token and cut at max_length
+        # encode question and context so that they are separated by a tokenizer.sep_token and cut at max_length
         inputs = tokenizer(
             example["question"],
             example["context"],
@@ -22,7 +22,7 @@ the JAX/Flax backend and the [`pjit`](https://jax.readthedocs.io/en/latest/jax.e
 > Note: The example is experimental and might have bugs. Also currently it only supports single V3-8.

 The `partition.py` file defines the `PyTree` of `ParitionSpec` for the GPTNeo model which describes how the model will be sharded.
-The actual sharding is auto-matically handled by `pjit`. The weights are sharded accross all local devices.
+The actual sharding is auto-matically handled by `pjit`. The weights are sharded across all local devices.
 To adapt the script for other models, we need to also change the `ParitionSpec` accordingly.

 TODO: Add more explantion.