Correct order of overflowing tokens for LayoutLmV2 tokenizer (#13495)
* correct order of overflowing tokens for LayoutLmV2 tokenizer
* test to check order of overflowing_tokens for a seq of input_ids
* fix up quality
* added suggested changes
* check that tests the bbox sequence
* pair_input test added
* pass quality test
* check bbox sequence added
* unittest method
* comments added
* add overflowing bbox test
* improved "seq_1"

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* improve code quality

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
@@ -3015,7 +3015,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
 
         Returns:
             :obj:`Tuple[List[int], List[int], List[int]]`: The truncated ``ids``, the truncated ``pair_ids`` and the
-            list of overflowing tokens. Note: The `longest_first` strategy returns empty list of overflowing_tokens if
+            list of overflowing tokens. Note: The `longest_first` strategy returns empty list of overflowing tokens if
             a pair of sequences (or a batch of pairs) is provided.
         """
         if num_tokens_to_remove <= 0:
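The commit title refers to the order of overflowing tokens. As a rough illustration only, and not the library's implementation, the sketch below shows what "correct order" can mean for a single sequence truncated on the right: the overflow should list the removed tokens in their original left-to-right order. Both function names are hypothetical, and the one-by-one variant is just one way the order could come out reversed, not necessarily the bug fixed here.

```python
from typing import List, Tuple


def truncate_right(ids: List[int], num_tokens_to_remove: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: remove tokens from the end in a single slice.

    Returns (kept_ids, overflowing_ids); the overflow keeps the original order.
    """
    if num_tokens_to_remove <= 0:
        return list(ids), []
    n = min(num_tokens_to_remove, len(ids))
    cut = len(ids) - n
    return ids[:cut], ids[cut:]


def truncate_one_by_one(ids: List[int], num_tokens_to_remove: int) -> Tuple[List[int], List[int]]:
    """Hypothetical counter-example: pop one token at a time and append it.

    The overflow ends up in reverse order relative to the input.
    """
    kept = list(ids)
    overflow: List[int] = []
    for _ in range(min(max(num_tokens_to_remove, 0), len(kept))):
        overflow.append(kept.pop())
    return kept, overflow


sample_ids = [10, 11, 12, 13, 14]
print(truncate_right(sample_ids, 2))       # ([10, 11, 12], [13, 14])
print(truncate_one_by_one(sample_ids, 2))  # ([10, 11, 12], [14, 13])
```

A test in the spirit of the ones listed in the commit message would compare the overflow against the tail of the untruncated input, which only the slice-based variant satisfies.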