chore: Fix multiple typos (#28574)
@@ -27,7 +27,7 @@ To adapt the script for other models, we need to also change the `ParitionSpec`
 TODO: Add more explantion.

-Before training, let's prepare our model first. To be able to shard the model, the sharded dimention needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.
+Before training, let's prepare our model first. To be able to shard the model, the sharded dimension needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.

 ```python
 from transformers import FlaxGPTNeoForCausalLM, GPTNeoConfig