Sylvain Gugger
2021-04-26 11:45:04 -04:00
parent ce11318e7e
commit b03b2a653d
3 changed files with 9 additions and 9 deletions


@@ -1286,9 +1286,9 @@ ENCODE_KWARGS_DOCSTRING = r"""
returned to provide some overlap between truncated and overflowing sequences. The value of this
argument defines the number of overlapping tokens.
is_split_into_words (:obj:`bool`, `optional`, defaults to :obj:`False`):
-Whether or not the input is already pre-tokenized (e.g., split into words). If set to :obj:`True`,
-the tokenizer assumes the input is already split into words (for instance, by splitting it on
-whitespace) which it will tokenize. This is useful for NER or token classification.
+Whether or not the input is already pre-tokenized (e.g., split into words). If set to :obj:`True`, the
+tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace)
+which it will tokenize. This is useful for NER or token classification.
pad_to_multiple_of (:obj:`int`, `optional`):
If set will pad the sequence to a multiple of the provided value. This is especially useful to enable
the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).
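The `pad_to_multiple_of` description above comes down to rounding a sequence length up to the next multiple of a given value. The following is a minimal pure-Python sketch of that arithmetic. It is not the tokenizer's actual implementation. The helper name `pad_to_multiple` is hypothetical, and the sketch assumes right-side padding with a placeholder pad id of `0`. A real tokenizer uses its own `pad_token_id` and padding side.

```python
def pad_to_multiple(ids, multiple, pad_id=0):
    """Hypothetical helper: right-pad `ids` with `pad_id` until len(ids) is a multiple of `multiple`.

    Assumptions (not the library's API): padding goes on the right, and pad_id=0
    stands in for whatever pad token a real tokenizer would use.
    """
    ids = list(ids)
    if not multiple or multiple <= 0:
        # No multiple requested: return the sequence unchanged.
        return ids
    remainder = len(ids) % multiple
    if remainder == 0:
        # Length is already aligned; nothing to add.
        return ids
    # Add just enough pad ids to reach the next multiple.
    return ids + [pad_id] * (multiple - remainder)


padded = pad_to_multiple([101, 7592, 2088, 102], 8)
print(len(padded))  # 8
```

Aligning lengths this way keeps tensor dimensions at sizes such as multiples of 8, which the docstring notes is helpful for Tensor Core use.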