Cyril Vallez
6630c5b714
Add xlstm model (#39665)
* Add xLSTM cleanly with optimizations.
* Fix style.
* Fix modeling test.
* Make xLSTM package optional.
* Fix: Update torch version check.
* Fix: Bad variable naming in test.
* Fix: Import structure cleaning with Ruff.
* Fix: Update docstrings.
* Fix: Mitigate unused config attr tests by explicit usage.
* Fix: Skip tests, if xlstm library is not installed.
* Feat: Enable longer context window for inference by chunking.
* Fix: Make training test pass by lowering target accuracy.
* Chore: Increase test verbosity for failing generation test.
* Update docs/source/en/model_doc/xlstm.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix: Make xlstm available even without CUDA.
* Chore: Remove unnecessary import.
* Fix: Remove BOS insertion.
* Chore: Improve xLSTMCache documentation.
* Integrate basic xLSTM fallback code.
* Chore: Remove unnecessary import.
* Chore: Remove duplicate LayerNorm.
* chore: update copyright, minor reformatting
* fix: refactor mLSTMStateType due to missing torch import
* fix: add missing import
* Chore: Replace einops.
* fix: apply ruff formatting
* fix: run `make fix-copies` to re-generate dummy_pt_objects.py
* fix: make type hints Python 3.9 compatible
* fix: remove obsolete import
* fix: remove obsolete method from docs
* chore: remove obsolete `force_bos_token_insert` from config
* Chore: Remove duplicated xLSTMCache class.
* Fix: Formatting of modeling_xlstm.py
* Chore: Remove xlstm package requirement from test. Re-add update_rnn_state.
* Fix: Update xLSTMCache docstring.
* Feat: Add proper initialization of xLSTM.
* Chore: Re-format files.
* Chore: Adapt format.
* Fix: xLSTMCache import restructuring.
* Fix: Add __all__ lists to modeling and configuration files.
* Chore: Reformat.
* Fix: Remove unnecessary update_rnn_state function.
* Fix: Undo test accuracy quickfix.
* Fix: Update copyright year, remove config copy.
* Chore: Flatten all internal configs to xLSTMConfig.
* Fix: Unused config variables check.
* Chore: Remove unnecessary imports.
* Fix: Unify xlstm cache argument from batch_size to max_batch_size.
* Chore: Remove bad default arg value for xLSTMCache.
* Chore: Rename core configuration arguments to HF default in xLSTM.
* Chore: Fix formatting.
* Fix: xLSTM Cache config access.
* Fix: Update xlstm tests for config update.
* Feat: Re-add embedding_dim, num_blocks config options for compat with xLSTM-7B.
* Fix: Configuration xLSTM python3.9 syntax.
* Fix: Difference to main in test_utils.py assertion.
* Fix: Bad syntax in xlstm config for python3.9.
* Fix: xLSTMConfig docstring.
* Fix: xLSTMConfig docstring.
* Fix typing issues in xLSTM and BeiT, Paligemma.
* Fix: Exclude xLSTM from test cache utils.
* Chore: Fix style.
* Chore: Fix format.
* Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions.
* Chore: Remove asserts and replace with ValueErrors.
* Chore: Update __init__.py structure of xLSTM.
* Chore: Clean xLSTM initialization of weights.
* Fix index names in modeling_xlstm.py
* Update xlstm model test typing annotations.
* Fix: Remove all asserts.
* Revert changes to the main __init__.py
* Fix: Move xLSTMCache to modeling_xlstm.py
* Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py
* Remove xLSTMCache from dummy_pt_objects.py
* Fix: Remove extended torchdynamo compilation check integrating cuda graph captures.
* Revert test_cache_utils.py xLSTM change.
* Fix: Move xLSTM init functions before init call.
* Remove xLSTMCache from generation utils.
* Fix: Clean xLSTM init functionality for recursive calls.
* Fix: Move xLSTMCache before its first call.
* Fix formatting.
* Add partial docstring for xLSTMModel forward.
* Fix xLSTMCache docstring in xLSTMModel.
* Remove xLSTMCache from public documentation. Update auto_docstring.
* Remove all aggressive shape comments
* style
* Fix names
* simplify
* remove output_hidden_states
* Update modeling_xlstm.py
* Update modeling_xlstm.py
* Update test_modeling_xlstm.py
* Update modeling_xlstm.py
* Update modeling_xlstm.py
* fix
* fix
* style
* style
---------
Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com>
Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com>
Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>
2025-07-25 19:39:17 +02:00