Correct order of overflowing tokens for LayoutLmV2 tokenizer (#13495)

* correct order of overflowing tokens for LayoutLmV2 tokenizer

* test to check order of overflowing_tokens for a seq of input_ids

* fix up quality

* added suggested changes

* check that tests the bbox sequence

* pair_input test added

* pass quality test

* check bbox sequence added

* unittest method

* comments added

* add overflowing bbox test

* improved "seq_1"

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* improve code quality

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Apoorv Garg
2021-11-09 18:19:53 +05:30
committed by GitHub
parent 95b3ec3bc9
commit 6326aa4bf0
3 changed files with 551 additions and 35 deletions
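
For readers unfamiliar with what "order of overflowing tokens" refers to, here is a minimal sketch for a single sequence truncated on the right. The helper name `truncate_only_first` and its signature are hypothetical and are not taken from this commit or from the library. It only illustrates the property the new tests appear to check: the overflowing tokens come back in their original left-to-right order, with `stride` tokens of overlap with the kept part.

```python
from typing import List, Tuple


def truncate_only_first(
    ids: List[int], num_tokens_to_remove: int, stride: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: drop tokens from the right end of ``ids``.

    Returns the kept ids and the overflowing ids. The overflow keeps the
    original token order and starts ``stride`` tokens before the cut.
    """
    if num_tokens_to_remove <= 0:
        # Nothing to remove, so nothing overflows.
        return list(ids), []

    # The overflow window covers the removed tokens plus the stride overlap,
    # capped at the length of the sequence.
    window_len = min(len(ids), stride + num_tokens_to_remove)
    overflowing = list(ids[-window_len:])
    kept = list(ids[:-num_tokens_to_remove])
    return kept, overflowing


kept, overflowing = truncate_only_first([1, 2, 3, 4, 5, 6], 2, stride=1)
print(kept)         # [1, 2, 3, 4]
print(overflowing)  # [4, 5, 6]
```

For LayoutLMv2 the same slicing would presumably have to be applied to the bounding-box list in parallel, so that each overflowing token keeps its own box. That is an assumption based on the bbox checks named in the commit message, not something shown in the visible part of the diff.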

@@ -3015,7 +3015,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
Returns:
:obj:`Tuple[List[int], List[int], List[int]]`: The truncated ``ids``, the truncated ``pair_ids`` and the
-list of overflowing tokens. Note: The `longest_first` strategy returns empty list of overflowing_tokens if
+list of overflowing tokens. Note: The `longest_first` strategy returns empty list of overflowing tokens if
a pair of sequences (or a batch of pairs) is provided.
"""
if num_tokens_to_remove <= 0: