XLNet outputs (#5883)

Slightly breaking change: this changes how `use_cache` behaves in XLNet. If `use_cache` is True and `mem_len` is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time, `use_cache` is overridden and is always True.
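
The memory rule described above can be sketched as follows. This is a minimal illustration only, not the code from this commit: the function name `sketch_cache_mem` is hypothetical, plain Python lists stand in for hidden-state tensors, and the behaviour when `mem_len` is set (keeping only the last `mem_len` states) is an assumption about how a bounded memory would work.

```python
from typing import List, Optional


def sketch_cache_mem(
    curr_out: List[int],
    prev_mem: Optional[List[int]],
    mem_len: Optional[int],
) -> List[int]:
    """Hypothetical sketch of how mems could be built when use_cache is True."""
    if not mem_len:
        # mem_len is 0 or None: GPT-2-like behaviour, keep every past and
        # current state so they can be reused as "past" during generation.
        cutoff = 0
    else:
        # Assumption: with mem_len set, only the last mem_len states are kept.
        cutoff = -mem_len

    # Concatenate the previous memory (if any) with the current states.
    combined = list(curr_out) if prev_mem is None else list(prev_mem) + list(curr_out)
    return combined[cutoff:]


# mem_len unset: the full history is returned.
full_history = sketch_cache_mem([3, 4], [1, 2], None)
# mem_len = 3: only the three most recent states survive.
bounded_history = sketch_cache_mem([3, 4], [1, 2], 3)
```

Under these assumptions, `full_history` holds all four states while `bounded_history` holds only the last three.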
2020-07-18 17:33:13 +02:00
parent a55809241f
commit 4b506a37e3
4 changed files with 131 additions and 47 deletions
--- a/src/transformers/configuration_utils.py
+++ b/src/transformers/configuration_utils.py
@@ -47,7 +47,7 @@ class PretrainedConfig(object):
                Whether or not the model should return all hidden-states.
            output_attentions (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not the model should returns all attentions.
-            use_cache (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Whether or not the model should return the last key/values attentions (not used by all models).
            return_tuple (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not the model should return tuples instead of :obj:`ModelOutput` objects.