Example of pad_to_multiple_of for padding and truncation guide & docstring update (#22278)
* added an example of pad_to_multiple_of * make style * addressed feedback
This commit is contained in:
@@ -50,6 +50,7 @@ The following table summarizes the recommended way to setup padding and truncati
|
||||
| | | `tokenizer(batch_sentences, padding='longest')` |
|
||||
| | padding to max model input length | `tokenizer(batch_sentences, padding='max_length')` |
|
||||
| | padding to specific length | `tokenizer(batch_sentences, padding='max_length', max_length=42)` |
|
||||
| | padding to a multiple of a value | `tokenizer(batch_sentences, padding=True, pad_to_multiple_of=8) |
|
||||
| truncation to max model input length | no padding | `tokenizer(batch_sentences, truncation=True)` or |
|
||||
| | | `tokenizer(batch_sentences, truncation=STRATEGY)` |
|
||||
| | padding to max sequence in batch | `tokenizer(batch_sentences, padding=True, truncation=True)` or |
|
||||
|
||||
@@ -1342,8 +1342,9 @@ ENCODE_KWARGS_DOCSTRING = r"""
|
||||
tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace)
|
||||
which it will tokenize. This is useful for NER or token classification.
|
||||
pad_to_multiple_of (`int`, *optional*):
|
||||
If set will pad the sequence to a multiple of the provided value. This is especially useful to enable
|
||||
the use of Tensor Cores on NVIDIA hardware with compute capability `>= 7.5` (Volta).
|
||||
If set will pad the sequence to a multiple of the provided value. Requires `padding` to be activated.
|
||||
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
|
||||
`>= 7.5` (Volta).
|
||||
return_tensors (`str` or [`~utils.TensorType`], *optional*):
|
||||
If set, will return tensors instead of list of python integers. Acceptable values are:
|
||||
|
||||
|
||||
Reference in New Issue
Block a user