[Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)

* Added logic to return attention from the Flax BERT model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* Fixing code style
* Fixing the RoBERTa and ELECTRA models too, since they are copied from BERT
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly
* Last fixes
* Bump flax dependency

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-28 16:16:56 +05:30
parent e1205e478a
commit af1a10bff4
7 changed files with 89 additions and 53 deletions
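
To illustrate what "returning attention" means in the commit message above, here is a minimal standard-library sketch of single-head scaled dot-product attention that optionally returns its attention weights alongside the output. It is not the code from this commit: the function names `softmax` and `attention` are hypothetical, and the `output_attentions` flag is only assumed here as the switch name, modeled on the `test_attention_outputs` test mentioned above.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, key, value, output_attentions=False):
    """Hypothetical single-head attention over lists of equal-length vectors.

    Returns a 1-tuple (output,) by default, or (output, weights) when
    output_attentions is True. The flag name is an assumption for
    illustration, not taken from the diff.
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    # One row of weights per query position; each row sums to 1.
    weights = [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in key])
        for q in query
    ]
    # Weighted sum of the value vectors for each query position.
    output = [
        [sum(w * v[d] for w, v in zip(w_row, value)) for d in range(len(value[0]))]
        for w_row in weights
    ]
    return (output, weights) if output_attentions else (output,)
```

Returning a tuple in both cases keeps the first element stable for callers that never ask for the weights, which is one plausible reason to make the extra output opt-in.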
--- a/setup.py
+++ b/setup.py
@@ -97,7 +97,7 @@ _deps = [
    "fastapi",
    "filelock",
    "flake8>=3.8.3",
-    "flax>=0.3.2",
+    "flax>=0.3.4",
    "fugashi>=1.0",
    "huggingface-hub==0.0.8",
    "importlib_metadata",