only init encoder_attention_mask if stack is decoder

We currently initialize `encoder_attention_mask` when it is `None`, regardless of whether the stack is an encoder or a decoder. Since this may lead to bugs that are difficult to track down, I added a condition that checks whether the current stack is a decoder.
2019-11-08 11:22:19 +01:00
parent 1c542df7e5
commit 28d0ba35d7
1 changed file with 1 addition and 1 deletion
--- a/transformers/modeling_bert.py
+++ b/transformers/modeling_bert.py
@@ -656,7 +656,7 @@ class BertModel(BertPreTrainedModel):
        if attention_mask is None:
            attention_mask = torch.ones(input_shape, device=device)
-        if encoder_attention_mask is None:
+        if self.config.is_decoder and encoder_attention_mask is None:
            encoder_attention_mask = torch.ones(input_shape, device=device)
        if token_type_ids is None:
            token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
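
The guarded default in this change can be illustrated in isolation. The following is only a minimal sketch, not the code in `modeling_bert.py`: the helper names `ones` and `default_masks` are hypothetical, and nested Python lists stand in for the tensors that `torch.ones(input_shape, device=device)` would produce, so that the example runs without PyTorch.

```python
from typing import List, Optional, Tuple

Mask = List[List[float]]


def ones(shape: Tuple[int, int]) -> Mask:
    """Hypothetical plain-Python stand-in for torch.ones(shape)."""
    rows, cols = shape
    return [[1.0] * cols for _ in range(rows)]


def default_masks(
    input_shape: Tuple[int, int],
    is_decoder: bool,
    attention_mask: Optional[Mask] = None,
    encoder_attention_mask: Optional[Mask] = None,
) -> Tuple[Mask, Optional[Mask]]:
    """Fill in missing masks the way the patched condition does (sketch)."""
    # The self-attention mask is always defaulted to all ones.
    if attention_mask is None:
        attention_mask = ones(input_shape)
    # The encoder (cross-attention) mask is defaulted only for a decoder
    # stack; for an encoder stack it stays None.
    if is_decoder and encoder_attention_mask is None:
        encoder_attention_mask = ones(input_shape)
    return attention_mask, encoder_attention_mask


# Encoder stack: encoder_attention_mask is left untouched.
enc_att, enc_cross = default_masks((2, 3), is_decoder=False)

# Decoder stack: a missing encoder_attention_mask is filled with ones.
dec_att, dec_cross = default_masks((2, 3), is_decoder=True)
```

With this guard, an encoder stack never receives an implicit `encoder_attention_mask`, so any code path that later depends on one is handed `None` rather than a silently created all-ones mask.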