Younes Belkada
163ac3d3ee
Add Switch transformers (#19323)
* first commit
* add more comments
* add router v1
* clean up
- remove `tf` modeling files
* clean up
* v0 routers
* added more routers
- Implemented `ExpertsChooseMaskedRouter`
- added tests
- 2 more routers to implement
* last router
* improved docstring
- completed the docstring in `router.py`
- added more args in the config
* v0 sparse mlp
* replace wrong naming
* forward pass run
* update MOE layer
* small router update
* fixup
* consistency
* remove scatter router
* remove abstract layer
* update test and model for integration testing
* v1 conversion
* update
* hardcode hack
* all keys match
* add gin conversion, without additional libraries
* update conversion script
* delete router file
* update tests wrt router deletion
* fix router issues
* update expert code
* update, logits match, code needs refactoring
* Refactor code
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* add generate tests
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* add support for router loss
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix forward error
* refactor a bit
* remove `FlaxSwitchTransformers` modules
* more tests pass
* Update code
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fixup
* fix tests
* fix doc
* fix doc + tokenization
* fix tokenizer test
* fix test
* fix loss output
* update code for backward pass
* add loss support
* update documentation
* fix documentation, clean tokenizer
* more doc fix, cleanup example_switch
* fix failing test
* fix test
* fix test
* fix loss issue
* move layer
* update doc and fix router capacity usage
* fixup
* add sparse mlp index for documentation on hub
* fixup
* test sparse mix architecture
* Apply suggestions from code review
* Update docs/source/en/model_doc/switch_transformers.mdx
* fixup on update
* fix tests
* fix another test
* attempt fix
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/convert_switch_transformers_original_flax_checkpoint_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* try
* all tests pass
* fix jitter noise
* Apply suggestions from code review
* doc tests pass
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove assert
* change config order
* fix readme japanese
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove parallelizable tests + add one liners
* remove ONNX config
* fix nits
- add `T5Tokenizer` in auto mapping
- remove `Switch Transformers` from ONNX supported models
* remove `_get_router`
* remove asserts
* add check in test for `router_dtype`
* add `SwitchTransformersConfig` in `run_pipeline_test`
* Update tests/pipelines/test_pipelines_summarization.py
* add huge model conversion script
* fix slow tests
- add better casting for `Linear8bitLt`
- remove `torchscript` tests
* add make dir
* style on new script
* fix nits
- doctest
- remove `_keys_to_ignore_on_load_unexpected`
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
* add google as authors
* fix year
* remove last `assert` statements
* standardize vertical spaces
* fix failing import
* fix another failing test
* Remove strange `authorized_keys`
* removing todo and padding that is never used
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: ybelkada <younes@huggingface.co>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur@huggingface.co>
2022-11-15 13:06:45 +01:00