* Make forward pass work
* More improvements
* Remove unused imports
* Remove timm dependency
* Improve loss calculation of token classifier
* Fix most tests
* Add docs
* Add model integration test
* Make all tests pass
* Add LayoutLMv3FeatureExtractor
* Improve integration test + make fixup
* Add example script
* Fix style
* Add LayoutLMv3Processor
* Fix style
* Add option to add visual labels
* Make more tokenizer tests pass
* Fix more tests
* Make more tests pass
* Fix bug and improve docs
* Fix import of processors
* Improve docstrings
* Fix toctree and improve docs
* Fix auto tokenizer
* Move tests to model folder
* change default behavior add_prefix_space
* add prefix space for fast
* add_prefix_space set to True for Fast
* no space before `unique_no_split` token
* add test to highlight special treatment of added tokens
* fix `test_batch_encode_dynamic_overflowing` by building a long enough example
* fix `test_full_tokenizer` with add_prefix_token
* Fix tokenizer integration test
* Make the code more readable
* Add tests for LayoutLMv3Processor
* Fix style
* Add model to README and update init
* Apply suggestions from code review
* Replace asserts by value errors
* Add suggestion by @ducviet00
* Add model to doc tests
* Simplify script
* Improve README
* a step ahead to fix
* Update pair_input_test
* Make all tokenizer tests pass - phew
* Make style
* Add LayoutLMv3 to CI job
* Fix auto mapping
* Fix CI job name
* Make all processor tests pass
* Make tests of LayoutLMv2 and LayoutXLM consistent
* Add copied from statements to fast tokenizer
* Add copied from statements to slow tokenizer
* Remove add_visual_labels attribute
* Fix tests
* Add link to notebooks
* Improve docs of LayoutLMv3Processor
* Fix reference to section
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* average loss over batches and accumulated steps for tracking
* fix layernorm weight decay
* use AdamW from Pytorch instead of Transformers
* add shuffling of sequences inside the batches
* add logging dir and reformat code
* fix lr tracking
* remove Mistral scaling
* keep Mistral scaling
* reformat code
* fix error
* use shuffling function from Pytorch
* remove argument for shuffling batch sequences as it isn't optional
* update package versions and install accelerate from source
* remove unused package
* Update loss average over accumulated steps
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* use one shuffle buffer argument
* compute avg_loss in one line
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
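The loss-tracking commits above (averaging over batches and accumulated steps) can be illustrated with a minimal sketch in plain Python. The helper name `average_tracked_loss` and the list-of-floats input are hypothetical and chosen only for illustration; a real training script would operate on tensors, and this is not the script's actual code.

```python
def average_tracked_loss(micro_batch_losses, gradient_accumulation_steps):
    """Return one averaged loss per optimizer step (hypothetical helper)."""
    if gradient_accumulation_steps < 1:
        raise ValueError("gradient_accumulation_steps must be >= 1")
    averaged = []
    for start in range(0, len(micro_batch_losses), gradient_accumulation_steps):
        window = micro_batch_losses[start:start + gradient_accumulation_steps]
        # Divide by the actual window length so a short final window
        # is not under-weighted in the tracked value.
        averaged.append(sum(window) / len(window))
    return averaged


losses = [4.0, 2.0, 3.0, 1.0, 5.0]
print(average_tracked_loss(losses, 2))  # [3.0, 2.0, 5.0]
```

Dividing by the window length rather than by a fixed step count is one way to keep the last, possibly incomplete, accumulation window on the same scale as the others.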
* Fix length in no_trainer examples
* Add setup and teardown
* Use new accelerator config generator to automatically make tests able to run based on environment
* Add information gain filtration algorithm
* Complying with black requirements
* Added author
* Fixed import order
* flake8 corrections
Co-authored-by: Javier Turek <javier.turek@intel.com>
- Add --ignore_mismatched_sizes argument to classification examples
- Expand the error message when loading a model whose head dimensions are different from expected dimensions
* fixed bug in run_mlm_flax_stream.py
Fixed a bug caused by a change in recent transformers versions (between `4.6.2` and `4.18.0`) that added extra keys to the tokenizer output.
* Update run_mlm_flax_stream.py
* adding missing parenthesis
* formatted to black
* remove cols from dataset instead
* reformat to black
* moved rem. columns to map
* formatted to black
Co-authored-by: KennethEnevoldsen <kennethcenevolsen@gmail.com>
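The run_mlm_flax_stream.py fix above concerns extra keys appearing in the tokenizer output. As a rough sketch of the general idea, the snippet below uses plain dicts as stand-ins for tokenized samples and drops every key the downstream code does not expect. `EXPECTED_KEYS` and `keep_expected_keys` are hypothetical names, and the choice of keys is an assumption for illustration, not taken from the script.

```python
# Assumption for illustration: downstream code only understands these keys.
EXPECTED_KEYS = ("input_ids", "attention_mask")


def keep_expected_keys(sample, expected_keys=EXPECTED_KEYS):
    """Return a copy of `sample` restricted to the expected keys (hypothetical helper)."""
    return {key: sample[key] for key in expected_keys if key in sample}


tokenized = {
    "input_ids": [101, 2023, 102],
    "attention_mask": [1, 1, 1],
    "token_type_ids": [0, 0, 0],  # an extra key the pipeline was not written for
}
print(keep_expected_keys(tokenized))
```

Filtering on an allow-list of keys, rather than deleting known extras, keeps the pipeline stable if further keys are added in later versions.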
1. Fixes evaluation errors that occur when you train/eval on SQuAD v2 (one newly encountered, and one previously reported in "Running SQuAD 1.0 sample command raises IndexError" #15401 but not completely fixed).
2. Removes boolean arguments that don't use store_true. Please don't use these: ANY non-empty string is converted to True in this case, which is clearly not the desired behavior (and it creates a LOT of confusion).
3. All no-trainer test scripts now save metric values in the same way (with the right prefix, eval_), which is consistent with the trainer-based versions.
4. Adds the forgotten model.eval() call in the no-trainer versions. This improved some results, but not all of them. Please see the F1 scores and the discussion at the end.
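The problem with boolean arguments in point 2 can be reproduced with the standard `argparse` module alone. The flag names `--legacy_flag` and `--flag` below are hypothetical and do not come from the example scripts; the sketch only shows why `type=bool` misbehaves and how `action="store_true"` avoids it.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Pitfall: argparse calls bool("False"), and any non-empty string is truthy.
    parser.add_argument("--legacy_flag", type=bool, default=False)
    # Preferred: the value is False unless the flag is present on the command line.
    parser.add_argument("--flag", action="store_true")
    return parser


parser = build_parser()
surprising = parser.parse_args(["--legacy_flag", "False"])
absent = parser.parse_args([])
present = parser.parse_args(["--flag"])

print(surprising.legacy_flag)  # True, although the user typed "False"
print(absent.flag)             # False
print(present.flag)            # True
```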
* Add first draft
* Improve script and README
* Improve README
* Apply suggestions from code review
* Improve script, add link to resulting model
* Add corresponding test
* Adjust learning rate
* add tflops logging and fix grad accumulation
* add accelerate tracking and checkpointing
* scale loss of last batch correctly
* fix typo
* compress loss computation
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add resume from checkpoint argument
* add load_state accelerate from checkpoint, register lr scheduler and add tflops function
* reformat code
* add condition on path for resume checkpoint
* combine if conditions
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add source for tflops formula
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Correct Logging of Eval metric to Tensorboard
An empty dictionary ``eval_metrics`` was being logged; it is replaced by ``eval_metric``, which is the output dictionary of ``metric.compute()``.
* Remove unused variable
* Add first draft
* Improve README and run fixup
* Make script aligned with other scripts, improve README
* Improve script and add test
* Remove print statement
* Apply suggestions from code review
* Add num_labels to make test pass
* Improve README
* begin do_init
* add params_shape_tree
* raise error if params are accessed when do_init is False
* don't allow do_init=False when keys are missing
* make shape tree a property
* assign self._params at the end
* add test for do_init
* add do_init arg to all flax models
* fix param setting
* disable do_init for composite models
* update test
* add do_init in FlaxBigBirdForMultipleChoice
* better names and errors
* improve test
* style
* add a warning when do_init=False
* remove extra if
* set params after _required_params
* add test for from_pretrained
* do_init => _do_init
* change warning to info
* fix typo
* add params in init_weights
* add params to gpt neo init
* add params to init_weights
* update do_init test
* Trigger CI
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update template
* trigger CI
* style
* fix template
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
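The do_init commits above describe a model that can be created without initializing its parameters and that raises an error if the parameters are then accessed. A minimal sketch of that pattern in plain Python follows. `LazyParamsModel` and its toy "dense" layer are hypothetical stand-ins, not the library's Flax classes, and the error message is invented for illustration.

```python
class LazyParamsModel:
    """Hypothetical stand-in that mimics the do_init pattern with plain lists."""

    def __init__(self, do_init=True):
        self._is_initialized = do_init
        # Only materialize parameters when initialization was requested.
        self._params = {"dense": {"kernel": [0.0, 0.0]}} if do_init else None

    @property
    def params(self):
        if not self._is_initialized:
            raise ValueError(
                "params cannot be accessed when the model was created with "
                "do_init=False; pass the parameters explicitly instead."
            )
        return self._params

    def __call__(self, inputs, params=None):
        # Fall back to the stored parameters only when none are passed in.
        weights = params if params is not None else self.params
        kernel = weights["dense"]["kernel"]
        return [x * k for x, k in zip(inputs, kernel)]


model = LazyParamsModel(do_init=False)
external_params = {"dense": {"kernel": [2.0, 3.0]}}
print(model([1.0, 1.0], params=external_params))  # [2.0, 3.0]
```

Keeping the parameters outside the model object means the caller decides where and when they are materialized, which is the point of skipping initialization.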
* Add first draft from previous PR
* First draft
* Improve README and remove num_labels
* Make script more aligned with other scripts
* Improve README and apply suggestion from code review
* Change tracking to store_true
* Remove step param and use it in the log dictionary directly
* use vars(args) when passing args to init_trackers
* Include tracking tests since tensorboard is already a dep
* Fix t5 shard on TPU Pods
The current script doesn't work properly on a TPU pod because the global batch is not divided correctly per host.
This pull request fixes the issue by dividing the global batch among the hosts before it is sharded on each host.
* fix style
Co-authored-by: ahmed-elnaggar <ahmed.elnaggar@allianz.com>
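The TPU pod fix above divides the global batch per host before any per-device sharding. The sketch below shows that first step in plain Python, under the assumption that every host sees the same global batch and takes a contiguous slice. `host_batch_slice` is a hypothetical helper; in a JAX script the host index and host count would typically come from `jax.process_index()` and `jax.process_count()`, which is an assumption about the setup rather than a quote from the script.

```python
def host_batch_slice(global_batch, host_index, host_count):
    """Return the contiguous part of the global batch owned by one host (hypothetical helper)."""
    if not 0 <= host_index < host_count:
        raise ValueError("host_index must be in [0, host_count)")
    if len(global_batch) % host_count != 0:
        raise ValueError("global batch size must be divisible by the number of hosts")
    per_host = len(global_batch) // host_count
    start = host_index * per_host
    return global_batch[start:start + per_host]


global_batch = list(range(8))
for host in range(2):
    # Each host keeps only its own slice; device sharding would happen afterwards.
    print(host, host_batch_slice(global_batch, host, 2))
```

With 8 examples and 2 hosts, host 0 receives examples 0 to 3 and host 1 receives examples 4 to 7, so no example is processed twice.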