Add RWKV-4 (#22797)

* First draft of RWKV-4

* Add support for generate

* Style post-rebase

* Properly use state

* Write doc

* Fix doc

* More math

* Add model to README, dummies and clean config

* Fix init

* multiple fixes:

- fix common tests
- fix configuraion default values
- add CI test for checking state computation
- fix some CI tests

* correct tokenizer

* some tweaks

- fix config docstring
- fix failing tests

* fix CI tests

- add output_attention / output_hidden_states
- override test_initialization
- fix failing CIs

* fix conversion script

- fix sharded case
- add new arguments

* add slow tests + more fixes on conversion script

* add another test

* final fixes

* change single name variable

* add mock attention mask for pipeline to work

* correct eos token id

* fix nits

* add checkpoints

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add `tie_word_embeddings` in docstring

* change tensor name

* fix final nits

* Trigger CI

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
This commit is contained in:
Sylvain Gugger
2023-05-09 13:04:10 -04:00
committed by GitHub
parent 9a50cb6195
commit b4d4d6fe87
28 changed files with 2284 additions and 3 deletions

View File

@@ -93,15 +93,20 @@ config_common_kwargs = {
class ConfigTester(object):
def __init__(self, parent, config_class=None, has_text_modality=True, **kwargs):
def __init__(self, parent, config_class=None, has_text_modality=True, common_properties=None, **kwargs):
self.parent = parent
self.config_class = config_class
self.has_text_modality = has_text_modality
self.inputs_dict = kwargs
self.common_properties = common_properties
def create_and_test_config_common_properties(self):
config = self.config_class(**self.inputs_dict)
common_properties = ["hidden_size", "num_attention_heads", "num_hidden_layers"]
common_properties = (
["hidden_size", "num_attention_heads", "num_hidden_layers"]
if self.common_properties is None
else self.common_properties
)
# Add common fields for text models
if self.has_text_modality: