Sylvain Gugger
2295d783d5
Copy tokenizer files in each of their repo ( #10624 )
...
* Move tokenizer files in each repo
* Fix mBART50 tests
* Fix mBART tests
* Fix Marian tests
* Update templates
2021-03-10 11:26:23 -05:00
Suraj Patil
d26b37e744
Speech2TextTransformer ( #10175 )
...
* s2t
* fix config
* conversion script
* fix import
* add tokenizer
* fix tok init
* fix tokenizer
* first version working
* fix embeds
* fix lm head
* remove extra heads
* fix convert script
* handle encoder attn mask
* style
* better enc attn mask
* override _prepare_attention_mask_for_generation
* handle attn_mask in encoder and decoder
* input_ids => input_features
* enable use_cache
* remove old code
* expand embeddings if needed
* remove logits bias
* masked_lm_loss => loss
* hack tokenizer to support feature processing
* fix model_input_names
* style
* fix error message
* doc
* remove inputs_embeds
* remove input_embeds
* remove unnecessary docstring
* quality
* SpeechToText => Speech2Text
* style
* remove shared_embeds
* subsample => conv
* remove Speech2TextTransformerDecoderWrapper
* update output_lengths formula
* fix table
* remove max_position_embeddings
* update conversion scripts
* add possibility to do upper case for now
* add FeatureExtractor and Processor
* add tests for extractor
* require_torch_audio => require_torchaudio
* add processor test
* update import
* remove classification head
* attention mask is now 1D
* update docstrings
* attention mask should be of type long
* handle attention mask from generate
* always return attention_mask
* fix test
* style
* doc
* Speech2TextTransformer => Speech2Text
* Speech2TextTransformerConfig => Speech2TextConfig
* remove dummy_inputs
* nit
* style
* multilingual tok
* fix tokenizer
* add tgt_lang setter
* save lang_codes
* fix tokenizer
* add forced_bos_token_id to tokenizer
* apply review suggestions
* add torchaudio to extra deps
* add speech deps to CI
* fix dep
* add libsndfile to ci
* libsndfile1
* add speech to extras all
* libsndfile1 -> libsndfile1
* libsndfile
* libsndfile1-dev
* apt update
* add sudo to install
* update deps table
* install libsndfile1-dev on CI
* tuple to list
* init conv layer
* add model tests
* quality
* add integration tests
* skip_special_tokens
* add speech_to_text_transformer in toctree
* fix tokenizer
* fix fp16 tests
* add tokenizer tests
* fix copyright
* input_values => input_features
* doc
* add model in readme
* doc
* change checkpoint names
* fix copyright
* fix code example
* add max_model_input_sizes in tokenizer
* fix integration tests
* add do_lower_case to tokenizer
* remove clamp trick
* fix "Add modeling imports here"
* fix copyrights
* fix tests
* SpeechToTextTransformer => SpeechToText
* fix naming
* fix table formatting
* fix typo
* style
* fix typos
* remove speech dep from extras[testing]
* fix copies
* rename doc file,
* put imports under is_torch_available
* run feat extract tests when torch is available
* dummy objects for processor and extractor
* fix imports in tests
* fix import in modeling test
* fix imports
* fix torch import
* fix imports again
* fix positional embeddings
* fix typo in import
* adapt new extractor refactor
* style
* fix torchscript test
* doc
* doc
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
* fix docs, copied from, style
* fix docstring
* handle imports
* remove speech from all extra deps
* remove s2t from seq2seq lm mapping
* better names
* skip training tests
* add install instructions
* List => Tuple
* doc
* fix conversion script
* fix urls
* add instruction for libsndfile
* fix fp16 test
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2021-03-10 21:42:04 +05:30
Sylvain Gugger
72d9e039f9
Fix tests of TrainerCallback ( #10615 )
...
* Fix tests of TrainerCallback
* Update tests/test_trainer_callback.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2021-03-09 16:25:32 -05:00
Suraj Patil
20c10258a4
layerdrop 0 ( #10604 )
2021-03-09 17:35:07 +03:00
Patrick von Platen
9a06b6b11b
[FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor ( #10594 )
...
* save first version
* finish refactor
* finish refactor
* correct naming
* correct naming
* shorter names
* Update src/transformers/feature_extraction_common_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
* change name
* finish
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2021-03-09 12:16:59 +03:00
Lysandre Debut
546cbe7e9e
Speedup tf tests ( #10601 )
...
* Pipeline tests should be slow
* Temporarily mark some tests as slow
* Temporarily mark Barthez tests as slow
2021-03-08 21:44:07 -05:00
Ratthachat (Jung)
696e8a4365
Add TFRag ( #9002 )
...
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
The last commit accidentally deleted these 4 lines, so I restored them
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version already passes all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They should be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have an official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and PyTorch give approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
* create modeling_tf_rag
* add tests for tf
* add tf tests
* revert wrong pt commit
* further refactor
* further refactor
* refactor
* Update modeling_tf_rag.py
- input_processing
- fix prepare_input_for_generation (mostly fix generate bug)
- bring back from_pretrained hack in order to test generate
* delete colab pieces of code
* Show case of greedy "generate"
Temporarily change from beam_search test to greedy_search test to show case that TF and PT do get equivalent output.
* cosmetic update
* correct typos
* update
* push some progress
* make easy check
* fix rag save from pretrained
* Update src/transformers/modeling_tf_utils.py
* remove commented out lines
* delete unnecessary lines
* add simple test case for nq_checkpoint
Add nq_checkpoint test to show that current version without hack still fails
* temporarily put ugly hack back again
* Add TFRagSequenceForGeneration!!
* __init__.py , import TFRagSequenceForGeneration
* Add TFRagSequence tests!
* rag init.py - add TFRagSequenceForGeneration
* fix from_pretrained
* fix prepare_inputs_for_generation
* Beam search for RagToken!
* minor clean up
* add tf.cast in TFRagModel
* More tf.cast
* Add all remaining tests (still have issues)
* delete all T5 related
* make style
* fix load weight prefix
* fix bart
* fix return_dict for tf_rag
make all tests pass .. Hooray
* fix some tests
* fix code quality
* fix quality check
* finish tests tf rag
* add tf rag to docs
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
* Delete outdated comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
* improve doc strings
* add generative model classes
* fix adjust token logic
* refactor generate for TFRag
* using shape_list, not _get_shape
Co-authored-by: Julien Plu <plu.julien@gmail.com >
* axis=[1]->axis=1
* delete NEED_HELP comment
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Indicating model is in a developing state in docstrings
As suggested by Julien
* small last changes
* apply sylvains suggestions
* finish tf rag
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: patrickvonplaten <patrick@huggingface.co >
Co-authored-by: Julien Plu <plu.julien@gmail.com >
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2021-03-09 00:49:51 +03:00
Sylvain Gugger
3ced9b3eb9
Check layer types for Optimizer construction ( #10598 )
...
* Check layer types for Optimizer construction
* Duplicate class
2021-03-08 16:40:11 -05:00
Sylvain Gugger
821d518e03
Revert "Tests"
...
This reverts commit b35e7b68ca .
2021-03-08 16:05:55 -05:00
Sylvain Gugger
4196bfeda0
Revert "Style"
...
This reverts commit a8ec52efc2 .
2021-03-08 16:05:52 -05:00
Sylvain Gugger
a8ec52efc2
Style
2021-03-08 16:04:46 -05:00
Sylvain Gugger
b35e7b68ca
Tests
2021-03-08 16:04:30 -05:00
Stas Bekman
6f84531e61
offline mode for firewalled envs (part 2) ( #10569 )
...
* more readable test
* add all the missing places
* one more nltk
* better exception check
* revert
2021-03-08 08:52:20 -08:00
Stas Bekman
f882966004
fix double wrapping + test ( #10583 )
2021-03-08 10:15:55 -05:00
Suraj Patil
2a737bffef
[M2M100] fix positional embeddings ( #10590 )
...
* fix tests
* emb should be a parameter
* fix positional embeddings
* fix make_weights
* don't save pos embeds
* add comment to describe the clamping
2021-03-08 16:06:19 +05:30
Suraj Patil
f6e74a63ca
Add m2m100 ( #10236 )
...
* m2m_100
* no layernorm_embedding
* sinusoidal positional embeddings
* update pos embeddings
* add default config values
* tokenizer
* add conversion script
* fix config
* fix pos embed
* remove _float_tensor
* update tokenizer
* update lang codes
* handle lang codes
* fix pos embeds
* fix spm key
* put embedding weights on device
* remove qa and seq classification heads
* fix convert script
* lang codes on one line
* fix embeds
* fix tokenizer
* fix tokenizer
* add fast tokenizer
* style
* M2M100MT => M2M100
* fix copyright, style
* tokenizer converter
* vocab file
* remove fast tokenizer
* fix embeds
* fix tokenizer
* fix tests
* add tokenizer tests
* add integration test
* quality
* fix model name
* fix test
* doc
* doc
* fix doc
* add copied from statements
* fix tokenizer tests
* apply review suggestions
* fix urls
* fix shift_tokens_right
* apply review suggestions
* fix
* fix doc
* add lang code to id
* remove unused function
* update checkpoint names
* fix copy
* fix tokenizer
* fix checkpoint names
* fix merge issue
* style
2021-03-06 22:14:16 +05:30
Stas Bekman
88a951e3cc
offline mode for firewalled envs ( #10407 )
...
* offline mode start
* add specific values
* fix fallback
* add test
* better values check and range
* test that actually works
* document the offline mode
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* more strict check
* cleaner test
* pt-only test
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2021-03-05 17:27:48 -08:00
Lysandre Debut
6b58e15507
Fix torch 1.8.0 segmentation fault ( #10546 )
...
* Only run one test
* Patch segfault
* Fix summarization pipeline
* Ready for merge
2021-03-05 12:10:19 -05:00
Nicolas Patry
54e55b52d4
Fixing conversation test for torch 1.8 ( #10545 )
2021-03-05 09:24:14 -05:00
Patrick von Platen
c503a1c15e
[ProphetNet] Bart-like Refactor ( #10501 )
...
* first step to refactor
* make all fast tests pass
* make all slow tests pass
* save intermediate
* correct cache
* finish PR
* make fp16 work
2021-03-04 23:27:12 +03:00
Sylvain Gugger
6290169eb3
Rework TPU checkpointing in Trainer ( #10504 )
...
* Rework TPU checkpointing in Trainer
* Wraps the barrier in a dist test
* Address review comments
* Remove line
2021-03-04 11:46:11 -05:00
Mehrad Moradshahi
1750e62900
Generate can return cross-attention weights too ( #10493 )
2021-03-03 13:57:02 +05:30
Patrick von Platen
0234de8418
Add Fine-Tuning for Wav2Vec2 ( #10145 )
...
* add encode labels function to tokenizer
* start adding finetuning
* init dropout
* upload
* correct convert script
* apply changes
* fix second typo
* make first dummy training run
* adapt convert script
* push config for comparison
* remove conf
* finish training
* adapt data collator
* add research folder
* update according to fairseq feedback
* some minor corrections
* refactor masking indices a bit
* some minor changes
* clean tokenizer
* finish clean-up
* remove previous logic
* update run script
* correct training
* finish changes
* finish model
* correct bug
* fix training a bit more
* add some tests
* finish gradient checkpointing
* finish example
* correct gradient checkpointing
* improve tokenization method
* revert changes in tokenizer
* revert general change
* adapt fine-tuning
* update
* save intermediate test
* Update README.md
* finish finetuning
* delete conversion script
* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py
* Update src/transformers/models/wav2vec2/processing_wav2vec2.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
* finish wav2vec2 script
* finish wav2vec2 fine-tuning
* finalize test
* correct test
* adapt tests
* finish
* remove test file
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2021-03-01 12:13:17 +03:00
Tanmay Garg
256482ac92
Introduce save_strategy training argument ( #10286 )
...
* Introduce save_strategy training argument
* deprecate EvaluationStrategy
* collapse EvaluationStrategy and LoggingStrategy into a single
IntervalStrategy enum
* modify tests to use modified enum
2021-02-27 19:34:22 -05:00
Kai Fricke
98569d4ba2
Add Ray Tune hyperparameter search integration test ( #10414 )
2021-02-26 10:18:33 -05:00
Julien Chaumond
83d2d55c94
[ci, flax] non-existing models are unlikely to pass tests ( #10409 )
...
😂
2021-02-26 12:35:36 +03:00
Sylvain Gugger
26f8b2cb10
Make Barthez tokenizer tests a bit faster ( #10399 )
...
* Make Barthez tokenizer tests a bit faster
* Quality
2021-02-25 11:42:25 -05:00
Sehoon Kim
63645b3b11
I-BERT model support ( #10153 )
...
* IBertConfig, IBertTokenizer added
* IBert Model names modified
* tokenizer bugfix
* embedding -> QuantEmbedding
* quant utils added
* quant_mode added to configuration
* QuantAct added, Embedding layer + QuantAct addition
* QuantAct added
* unused path removed, QKV quantized
* self attention layer all quantized, except softmax
* temporary commit
* all linear layers quantized
* quant_utils bugfix
* bugfix: requantization missing
* IntGELU added
* IntSoftmax added
* LayerNorm implemented
* LayerNorm implemented all
* names changed: roberta->ibert
* config does not inherit from RoBERTa
* No support for CausalLM
* static quantization added, quantize_model.py removed
* import modules uncommented
* copyrights fixed
* minor bugfix
* quant_modules, quant_utils merged as one file
* import * fixed
* unused runfile removed
* make style run
* configuration.py docstring fixed
* refactoring: comments removed, function name fixed
* unused dependency removed
* typo fixed
* comments(Copied from), assertion string added
* refactoring: super(..) -> super(), etc.
* refactoring
* refactoring
* make style
* refactoring
* cuda -> to(x.device)
* weight initialization removed
* QuantLinear set_param removed
* QuantEmbedding set_param removed
* IntLayerNorm set_param removed
* assert string added
* assertion error message fixed
* is_decoder removed
* enc-dec arguments/functions removed
* Converter removed
* quant_modules docstring fixed
* convert_slow_tokenizer rolled back
* quant_utils docstring fixed
* unused arguments e.g. use_cache removed from config
* weight initialization condition fixed
* x_min, x_max initialized with small values to avoid div-zero exceptions
* testing code for ibert
* test emb, linear, gelu, softmax added
* test ln and act added
* style reformatted
* force_dequant added
* error tests overridden
* make style
* Style + Docs
* force dequant tests added
* Fix fast tokenizer in init
* Fix doc
* Remove space
* docstring, IBertConfig, chunk_size
* test_modeling_ibert refactoring
* quant_modules.py refactoring
* e2e integration test added
* tokenizers removed
* IBertConfig added to tokenizer_auto.py
* bugfix
* fix docs & test
* fix style num 2
* final fixes
Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu >
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr >
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2021-02-25 10:06:42 -05:00
Patrick von Platen
cb38ffcc5e
[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer ( #10324 )
...
* push to show
* small improvement
* small improvement
* Update src/transformers/feature_extraction_utils.py
* Update src/transformers/feature_extraction_utils.py
* implement base
* add common tests
* make all tests pass for wav2vec2
* make padding work & add more tests
* finalize feature extractor utils
* add call method to feature extraction
* finalize feature processor
* finish tokenizer
* finish general processor design
* finish tests
* typo
* remove bogus file
* finish docstring
* add docs
* finish docs
* small fix
* correct docs
* save intermediate
* load changes
* apply changes
* apply changes to doc
* change tests
* apply surajs recommend
* final changes
* Apply suggestions from code review
* fix typo
* fix import
* correct docstring
2021-02-25 17:42:46 +03:00
abhishek thakur
2d458b2c7d
ConvBERT fix torch <> tf weights conversion ( #10314 )
...
* convbert conversion test
* fin
* fin
* fin
* clean up tf<->pt conversion
* remove from_pt
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com >
2021-02-24 14:55:34 +03:00
Sylvain Gugger
9e147d31f6
Deprecate prepare_seq2seq_batch ( #10287 )
...
* Deprecate prepare_seq2seq_batch
* Fix last tests
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
Co-authored-by: Suraj Patil <surajp815@gmail.com >
* More review comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
Co-authored-by: Suraj Patil <surajp815@gmail.com >
2021-02-22 12:36:16 -05:00
Julien Plu
19e737b93e
Making TF Longformer-like models compliant with AMP ( #10233 )
...
* AMP
* Add LED
* Apply style
* Fix longformer
2021-02-22 15:41:56 +01:00
Pengcheng He
9a7e63729f
Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… ( #10018 )
...
* Integrate DeBERTa v2(the 1.5B model surpassed human performance on SuperGLUE); Add DeBERTa v2 900M,1.5B models;
* DeBERTa-v2
* Fix v2 model loading issue (#10129 )
* Doc members
* Update src/transformers/models/deberta/modeling_deberta.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Address Sylvain's comments
* Address Patrick's comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr >
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2021-02-19 18:34:44 -05:00
Julien Plu
34df26ec3a
Making TF OpenAI GPT model compliant with AMP and XLA ( #10261 )
...
* Fix AMP and XLA
* Remove useless var
2021-02-19 09:33:25 -05:00
Julien Plu
3e116ed331
Making TF TransfoXL model compliant with AMP ( #10264 )
...
* Fix AMP
* Apply style
* Remove unused import
2021-02-19 06:58:07 -05:00
Julien Plu
86caeb7636
Fix XLA and AMP ( #10262 )
2021-02-19 06:57:16 -05:00
Julien Plu
3d72d47f09
Making TF MPNet model compliant with XLA ( #10260 )
...
* Fix XLA
* Rework cast
* Apply style
2021-02-19 06:56:41 -05:00
Julien Plu
fb56bf2584
Making TF MobileBert model compliant with AMP ( #10259 )
...
* Fix AMP
* Trigger CI
* Rework cast
2021-02-19 06:55:25 -05:00
Julien Plu
2fc6284f04
Making TF Lxmert model compliant with AMP ( #10257 )
...
* Fix AMP
* Rework cast
* Apply style
2021-02-19 06:54:14 -05:00
Stas Bekman
4eddc459a9
[trainer] implement support for full fp16 in evaluation/predict ( #10268 )
...
* implement --fp16_full_eval
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* style
* add test
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2021-02-18 17:02:35 -08:00
Stas Bekman
d9a81fc0c5
fix func signature ( #10271 )
2021-02-18 16:44:42 -08:00
Stas Bekman
97e688bc22
[Trainer] memory tracker metrics ( #10225 )
...
* memory tracker metrics
* go back to eval for somewhat consistency
* handle no-gpu case
* deal with stackable eval calls
* restore callback order
* style
* simplify the API
* add test
* docs
* consistently use eval_ prefix
* improve docs
* Update src/transformers/trainer_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* rename method
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2021-02-18 09:27:32 -08:00
Julien Plu
2acae50a0c
Reduce the time spent for the TF slow tests ( #10152 )
...
* rework savedmodel slow test
* Improve savedmodel tests
* Remove useless content
2021-02-18 15:52:57 +01:00
Julien Plu
14ed3b978e
Fix AMP ( #10216 )
2021-02-18 06:29:43 -05:00
Julien Plu
bdf1669e3f
Making TF GPT2 compliant with XLA and AMP ( #10230 )
...
* Fix XLA and AMP
* Fix AMP and XLA
* Apply style
* Apply Patrick's comment
2021-02-18 09:36:01 +01:00
Julien Plu
7246785a67
Make TF CTRL compliant with XLA and AMP ( #10209 )
...
* Fix XLA and AMP
* Apply style
* Remove useless cast
2021-02-17 18:54:15 +01:00
Julien Plu
fdb2351ebb
Making TF XLM-like models XLA and AMP compliant ( #10211 )
...
* Fix Flaubert and XLM
* Remove useless cast
* Tiny fix
* Tiny fix
2021-02-17 18:02:48 +01:00
Julien Plu
83d803ba02
Making TF BART-like models XLA and AMP compliant ( #10191 )
...
* Update BART
* Update Blenderbot
* Update BlenderbotSmall
* Update Marian
* Update MBart
* Update MBart
* Update Pegasus
* Update template
* Fix Marian and Pegasus
* Apply style
* Default initializer
* Default initializer
* Default initializer
* Remove int32 casts
* Fix template
* Remove more cast
2021-02-17 17:48:56 +01:00
Daniel Stancl
8d79e5ca49
Fix head masking for TFT5 ( #9877 )
...
* Fix head_mask and decoder_head_mask in TFT5 models
* Enable test_headmasking for both the TFT5 tester
and the TFT5EncoderOnly tester
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com >
2021-02-17 19:00:09 +03:00
Sylvain Gugger
7169d1ea7b
Store FLOS as floats to avoid overflow. ( #10213 )
2021-02-16 11:15:15 -05:00