HuggingFace_transformer

Files

Ben Eyal 9f9ddcc2de 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string (#15775 )

* Add test for SentencePiece not adding special tokens to strings

* Add SentencePieceStringConversionMixin to fix issue 15003

* Fix conversion from tokens to string for most SentencePiece tokenizers

Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer

* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab

* Fix DebertaV2Tokenizer

* Ignore LayoutXLMTokenizer in SentencePiece string conversion test

* Run 'make style' and 'make quality'

* Clean convert_tokens_to_string test

Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.

* Remove commented out code

* Improve robustness of convert_tokens_to_string test

Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.

* Inline and remove SentencePieceStringConversionMixin

The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.

* Run 'make style' and 'make quality'

* Revert removal of space in convert_tokens_to_string

* Remove redundant import

* Revert test text to original

* Uncomment the lowercasing of the reverse_text variable

* Mimic Rust tokenizer behavior for tokenizers

- Albert
- Barthez
- Camembert
- MBart50
- T5

* Fix accidentally skipping test in wrong tokenizer

* Add test for equivalent Rust and slow tokenizer behavior

* Override _decode in BigBirdTokenizer to mimic Rust behavior

* Override _decode in FNetTokenizer to mimic Rust behavior

* Override _decode in XLNetTokenizer to mimic Rust behavior

* Remove unused 're' import

* Update DebertaV2Tokenizer to mimic Rust tokenizer

* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.

* Ignore problematic tests in Deberta V2

* Add comment on why the Deberta V2 tests are skipped

2022-11-02 15:45:38 -04:00
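The behavior described in the commit message above can be illustrated with a minimal sketch. This is not the code from the PR: `toy_decode_pieces` is a hypothetical stand-in for a SentencePiece model's piece decoder, and the standalone `convert_tokens_to_string` function, its parameters, and the way it spaces special tokens are assumptions made for illustration. The idea is that special tokens are copied into the output unchanged, while runs of ordinary pieces between them are decoded as a group.

```python
from typing import Callable, Iterable, List


def toy_decode_pieces(pieces: List[str]) -> str:
    """Hypothetical stand-in for a SentencePiece decoder.

    Joins the pieces and turns the word-boundary marker (U+2581) into a space.
    """
    return "".join(pieces).replace("\u2581", " ").strip()


def convert_tokens_to_string(
    tokens: Iterable[str],
    special_tokens: Iterable[str],
    decode_pieces: Callable[[List[str]], str] = toy_decode_pieces,
) -> str:
    """Decode ordinary pieces in runs, but keep special tokens verbatim."""
    special = set(special_tokens)
    parts: List[str] = []   # decoded text chunks and special tokens, in order
    buffer: List[str] = []  # ordinary pieces waiting to be decoded together

    for token in tokens:
        if token in special:
            # Flush pending pieces first so the special token is never
            # passed through the piece decoder.
            if buffer:
                parts.append(decode_pieces(buffer))
                buffer = []
            parts.append(token)
        else:
            buffer.append(token)

    if buffer:
        parts.append(decode_pieces(buffer))

    # Assumption: separate chunks with single spaces; real tokenizers differ.
    return " ".join(part for part in parts if part)


specials = ["[CLS]", "[SEP]", "<pad>"]
text = convert_tokens_to_string(
    ["[CLS]", "\u2581hello", "\u2581wor", "ld", "[SEP]"], specials
)
print(text)  # [CLS] hello world [SEP]

# The check described in the commit message: converting all special tokens
# to a string should yield a string that contains every one of them.
roundtrip = convert_tokens_to_string(specials, specials)
print(all(token in roundtrip for token in specials))  # True
```

The final check mirrors the more robust test described in the message: it does not compare lengths of re-tokenized text, only that no special token is lost during conversion.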

albert

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

auto

Update auto processor to check image processor created (#20021 )

2022-11-02 15:19:33 +00:00

bart

Generate: contrastive search with full optional outputs (#19963 )

2022-11-01 18:15:36 +00:00

barthez

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

bartpho

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

beit

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

bert

Allow usage of TF Text BertTokenizer on TFBertTokenizer to make it servable on TF Serving (#19590 )

2022-10-14 15:18:02 +01:00

bert_generation

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

bert_japanese

Add sentencepiece to BertJapaneseTokenizer (#19769 )

2022-10-21 10:04:49 -04:00

bertweet

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

big_bird

wrap forward passes with torch.no_grad() (#19273 )

2022-10-04 16:13:22 +02:00

bigbird_pegasus

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

blenderbot

🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) (#19263 )

2022-10-11 16:48:03 +01:00

blenderbot_small

🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) (#19263 )

2022-10-11 16:48:03 +01:00

bloom

Skip BloomEmbeddingTest.test_embeddings for PyTorch < 1.10 (#19261 )

2022-10-10 10:05:30 +02:00

bort

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

byt5

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

camembert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

canine

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

clip

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

codegen

Update expected values in CodeGen tests (#17888 )

2022-07-01 15:33:36 +02:00

conditional_detr

[Conditional, Deformable DETR] Add postprocessing methods (#19709 )

2022-10-31 08:28:44 +01:00

convbert

wrap forward passes with torch.no_grad() (#19274 )

2022-10-04 16:12:03 +02:00

convnext

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

cpm

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

ctrl

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

cvt

[CvT] Tensorflow implementation (#18597 )

2022-10-11 18:16:52 +01:00

data2vec

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

deberta

fix train_new_from_iterator in the case of byte-level tokenizers (#17549 )

2022-06-08 15:30:41 +02:00

deberta_v2

🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string (#15775 )

2022-11-02 15:45:38 -04:00

decision_transformer

Update expected values in DecisionTransformerModelIntegrationTest (#18016 )

2022-07-05 14:53:43 +02:00

deformable_detr

[Conditional, Deformable DETR] Add postprocessing methods (#19709 )

2022-10-31 08:28:44 +01:00

deit

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

detr

[fix] Add DeformableDetrFeatureExtractor (#19140 )

2022-09-22 09:45:24 +02:00

distilbert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

dit

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

donut

remove _create_and_check_torch_fx_tracing in specific test files (#18667 )

2022-09-07 16:22:09 +02:00

dpr

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

dpt

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

electra

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

encoder_decoder

Fix gradient checkpoint test in encoder-decoder (#20017 )

2022-11-02 14:15:09 +01:00

ernie

add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686 )

2022-09-09 07:36:46 -04:00

esm

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

flaubert

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

flava

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

fnet

wrap forward passes with torch.no_grad() (#19413 )

2022-10-10 15:03:46 -04:00

fsmt

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

funnel

Update serving code to enable saved_model=True (#18153 )

2022-07-22 18:05:38 +01:00

glpn

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

gpt2

Generate: contrastive search with full optional outputs (#19963 )

2022-11-01 18:15:36 +00:00

gpt_neo

fix train_new_from_iterator in the case of byte-level tokenizers (#17549 )

2022-06-08 15:30:41 +02:00

gpt_neox

skip some gpt_neox tests that require 80G RAM (#17923 )

2022-07-01 09:04:38 -04:00

gpt_neox_japanese

Add support for Japanese GPT-NeoX-based model by ABEJA, Inc. (#18814 )

2022-09-14 10:17:40 -04:00

gptj

Generate: contrastive search with full optional outputs (#19963 )

2022-11-01 18:15:36 +00:00

groupvit

Fix TFGroupViT CI (#19461 )

2022-10-11 14:29:15 +02:00

herbert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

hubert

Fix train_step, test_step and tests for CLIP (#18684 )

2022-09-09 20:01:02 +01:00

ibert

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

imagegpt

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

layoutlm

TF: TF 2.10 unpin + related onnx test skips (#18995 )

2022-09-12 19:30:27 +01:00

layoutlmv2

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

layoutlmv3

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

layoutxlm

🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string (#15775 )

2022-11-02 15:45:38 -04:00

led

Update LEDModelIntegrationTests expected values (#19841 )

2022-10-24 16:05:26 +02:00

levit

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

lilt

Add LiLT (#19450 )

2022-10-12 10:11:20 +02:00

longformer

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

longt5

Skip test_export_to_onnx for LongT5 if torch < 1.11 (#19122 )

2022-09-20 21:52:18 +02:00

luke

Adding fine-tuning models to LUKE (#18353 )

2022-08-01 11:09:47 -04:00

lxmert

Update serving code to enable saved_model=True (#18153 )

2022-07-22 18:05:38 +01:00

m2m_100

Add missing lang tokens in M2M100Tokenizer.get_vocab (#18416 )

2022-10-25 09:18:24 -04:00

marian

🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) (#19263 )

2022-10-11 16:48:03 +01:00

markuplm

Add MarkupLM (#19198 )

2022-09-30 08:25:43 +02:00

maskformer

Fix image segmentation pipeline errors, resolve backward compatibility issues (#19768 )

2022-10-21 18:09:58 +03:00

mbart

🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) (#19263 )

2022-10-11 16:48:03 +01:00

mbart50

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

mctct

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

megatron_bert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

megatron_gpt2

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

mluke

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

mobilebert

Fix train_step, test_step and tests for CLIP (#18684 )

2022-09-09 20:01:02 +01:00

mobilevit

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

mpnet

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

mt5

Fix expected loss values in some (m)T5 tests (#18177 )

2022-07-18 15:26:21 +02:00

mvp

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

nezha

speed up test (#18106 )

2022-07-12 04:28:28 -04:00

nllb

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

nystromformer

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

openai

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

opt

Generate: contrastive search with full optional outputs (#19963 )

2022-11-01 18:15:36 +00:00

owlvit

fix owlvit tests, update docstring examples (#18586 )

2022-08-11 19:10:25 +03:00

pegasus

🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) (#19263 )

2022-10-11 16:48:03 +01:00

pegasus_x

Fix CI for PegasusX (#19025 )

2022-09-14 14:45:00 +02:00

perceiver

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

phobert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

plbart

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

poolformer

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

prophetnet

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

qdqbert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

rag

Avoid GPU OOM for a TF Rag test (#17638 )

2022-06-10 18:50:29 +02:00

realm

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

reformer

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

regnet

Run tests if skip condition not met (#18764 )

2022-08-30 14:03:28 +02:00

rembert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

resnet

FX support for ConvNext, Wav2Vec2 and ResNet (#19053 )

2022-09-16 10:57:41 +02:00

retribert

fix retribert's test_torch_encode_plus_sent_to_model (#17231 )

2022-05-17 14:33:13 +02:00

roberta

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

roformer

wrap forward passes with torch.no_grad() (#19438 )

2022-10-10 14:54:54 -04:00

segformer

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

sew

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

sew_d

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

speech_encoder_decoder

send model to the correct device (#18800 )

2022-08-29 18:46:30 +02:00

speech_to_text

remove _create_and_check_torch_fx_tracing in specific test files (#18667 )

2022-09-07 16:22:09 +02:00

speech_to_text_2

Fx support for multiple model architectures (#17393 )

2022-05-31 10:02:55 +02:00

splinter

Fix Splinter test (#17854 )

2022-06-24 16:26:14 +02:00

squeezebert

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

swin

remove _create_and_check_torch_fx_tracing in specific test files (#18667 )

2022-09-07 16:22:09 +02:00

swinv2

Add swin transformer v2 (#17469 )

2022-07-27 11:14:47 -04:00

Generate: contrastive search with full optional outputs (#19963 )

2022-11-01 18:15:36 +00:00

table_transformer

Add table transformer [v2] (#19614 )

2022-10-18 15:20:09 +02:00

tapas

Fix test_tf_encode_plus_sent_to_model for TAPAS (#19559 )

2022-10-14 16:10:36 +02:00

tapex

Replace as_target context managers by direct calls (#18325 )

2022-07-29 08:09:09 -04:00

time_series_transformer

Add a decorator for flaky tests (#19498 )

2022-10-12 14:00:17 -04:00

trajectory_transformer

Add trajectory transformer (#17141 )

2022-05-17 19:07:43 -04:00

transfo_xl

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

trocr

Fx support for multiple model architectures (#17393 )

2022-05-31 10:02:55 +02:00

unispeech

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

unispeech_sat

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

van

has_attentions - consistent test skipping logic and tf tests (#17495 )

2022-06-09 09:50:03 +02:00

videomae

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

vilt

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

vision_encoder_decoder

PT <-> TF for composite models (#19732 )

2022-10-21 12:40:39 +02:00

vision_text_dual_encoder

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

visual_bert

wrap forward passes with torch.no_grad() (#19439 )

2022-10-10 14:54:36 -04:00

vit

Add Image Processors (#19796 )

2022-11-02 11:57:36 +00:00

vit_mae

TF: tests for (de)serializable models with resized tokens (#19013 )

2022-09-16 16:38:08 +01:00

vit_msn

Some fixes regarding auto mappings and test class names (#19923 )

2022-10-27 14:38:59 +02:00

wav2vec2

Fix bug in Wav2Vec2's GPU tests (#19803 )

2022-10-27 09:00:03 -04:00

wav2vec2_conformer

[Test] Fix W2V-Conformer integration test (#17303 )

2022-05-17 18:20:36 +02:00

wav2vec2_phoneme

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

wav2vec2_with_lm

Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (#18351 )

2022-10-18 08:48:03 -04:00

wavlm

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00

whisper

Run some TF Whisper tests in subprocesses to avoid GPU OOM (#19772 )

2022-10-21 21:59:18 +02:00

x_clip

[X-CLIP] Fix doc tests (#19523 )

2022-10-12 17:05:12 +02:00

xglm

XGLM - Fix Softmax NaNs when using FP16 (#18057 )

2022-09-29 10:42:07 +02:00

xlm

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

xlm_prophetnet

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

xlm_roberta

Black preview (#17217 )

2022-05-12 16:25:55 -04:00

xlm_roberta_xl

Improve model tester (#19984 )

2022-11-02 17:38:44 +01:00

xlnet

Return scalar losses instead of per-sample means (#18013 )

2022-07-04 17:26:19 +01:00

yolos

[fix] Add DeformableDetrFeatureExtractor (#19140 )

2022-09-22 09:45:24 +02:00

yoso

fix train_new_from_iterator in the case of byte-level tokenizers (#17549 )

2022-06-08 15:30:41 +02:00

__init__.py

Move test model folders (#17034 )

2022-05-03 14:42:02 +02:00