[docstring] Fix docstring for LlamaConfig (#26685)
* Your commit message here
* fix LlamaConfig docstring
* run make fixup
* fix formatting after review reformat of the file to prevent script issues
* rerun make fixup after reformat
@@ -58,11 +58,6 @@ class LlamaConfig(PretrainedConfig):
 by meanpooling all the original heads within that group. For more details checkout [this
 paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
 `num_attention_heads`.
-pretraining_tp (`int`, *optional*, defaults to `1`):
-Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
-document](https://huggingface.co/docs/transformers/parallelism) to understand more about it. This value is
-necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
-issue](https://github.com/pytorch/pytorch/issues/76232).
 hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
 The non-linear activation function (function or string) in the decoder.
 max_position_embeddings (`int`, *optional*, defaults to 2048):
@@ -70,12 +65,23 @@ class LlamaConfig(PretrainedConfig):
 Llama 2 up to 4096, CodeLlama up to 16384.
 initializer_range (`float`, *optional*, defaults to 0.02):
 The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-rms_norm_eps (`float`, *optional*, defaults to 1e-12):
+rms_norm_eps (`float`, *optional*, defaults to 1e-06):
 The epsilon used by the rms normalization layers.
 use_cache (`bool`, *optional*, defaults to `True`):
 Whether or not the model should return the last key/values attentions (not used by all models). Only
 relevant if `config.is_decoder=True`.
-tie_word_embeddings(`bool`, *optional*, defaults to `False`):
+pad_token_id (`int`, *optional*):
+Padding token id.
+bos_token_id (`int`, *optional*, defaults to 1):
+Beginning of stream token id.
+eos_token_id (`int`, *optional*, defaults to 2):
+End of stream token id.
+pretraining_tp (`int`, *optional*, defaults to 1):
+Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
+document](https://huggingface.co/docs/transformers/parallelism) to understand more about it. This value is
+necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
+issue](https://github.com/pytorch/pytorch/issues/76232).
+tie_word_embeddings (`bool`, *optional*, defaults to `False`):
 Whether to tie weight embeddings
 rope_theta (`float`, *optional*, defaults to 10000.0):
 The base period of the RoPE embeddings.
@@ -87,10 +93,9 @@ class LlamaConfig(PretrainedConfig):
 these scaling strategies behave:
 https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
 experimental feature, subject to breaking API changes in future versions.
-attention_bias (`bool`, defaults to `False`):
+attention_bias (`bool`, defaults to `False`, *optional*, defaults to `False`):
 Whether to use a bias in the query, key, value and output projection layers during self-attention.
 
-Example:
 
 ```python
 >>> from transformers import LlamaModel, LlamaConfig
@@ -361,7 +361,6 @@ OBJECTS_TO_IGNORE = [
 "LevitConfig",
 "LiltConfig",
 "LiltModel",
-"LlamaConfig",
 "LlamaTokenizer",
 "LlamaTokenizerFast",
 "LongT5Config",
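
Removing `"LlamaConfig"` from `OBJECTS_TO_IGNORE` presumably means the class is now subject to the repository's automated docstring check, which is why the documented `rms_norm_eps` default had to change from 1e-12 to 1e-06. The sketch below is a rough illustration of that kind of check, not the repository's actual script: `ToyConfig`, `ARG_RE`, `documented_defaults`, and `mismatched_defaults` are hypothetical names made up for this example, and the regex only handles the simple `name (..., defaults to X):` form seen in the diff.

```python
import ast
import inspect
import re


class ToyConfig:
    """
    Hypothetical stand-in for a config class (not the real LlamaConfig).

    Args:
        rms_norm_eps (`float`, *optional*, defaults to 1e-12):
            The epsilon used by the rms normalization layers.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether to return the key/value cache.
    """

    def __init__(self, rms_norm_eps=1e-6, use_cache=True):
        self.rms_norm_eps = rms_norm_eps
        self.use_cache = use_cache


# Matches argument headers such as:  name (`type`, *optional*, defaults to `value`):
ARG_RE = re.compile(
    r"^[ \t]*(\w+)[ \t]*\(.*defaults to `?([^`)]+)`?\):[ \t]*$",
    re.MULTILINE,
)


def documented_defaults(cls):
    """Return {argument name: default value stated in the class docstring}."""
    doc = inspect.getdoc(cls) or ""
    found = {}
    for name, raw in ARG_RE.findall(doc):
        try:
            # Turn "1e-12" or "True" into a Python value where possible.
            found[name] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Keep anything that is not a Python literal as plain text.
            found[name] = raw
    return found


def mismatched_defaults(cls):
    """Return {name: (documented, actual)} for defaults that disagree."""
    params = inspect.signature(cls.__init__).parameters
    mismatches = {}
    for name, doc_value in documented_defaults(cls).items():
        if name not in params:
            continue
        actual = params[name].default
        if actual is inspect.Parameter.empty:
            continue
        if actual != doc_value:
            mismatches[name] = (doc_value, actual)
    return mismatches


print(mismatched_defaults(ToyConfig))
```

On the toy class, only `rms_norm_eps` is reported, because the docstring states 1e-12 while the signature uses 1e-6; `use_cache` matches and is not flagged.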