use functional interface for softmax in attention (#14198)

* use functional interface instead of instantiating module and immediately calling it

* fix torch.nn.functional to nn.functional. Thank you Stas!
This commit is contained in:
Thomas Viehmann
2021-11-30 17:47:33 +01:00
committed by GitHub
parent 4176bc161c
commit 6ed9882ddb
28 changed files with 29 additions and 29 deletions

View File

@@ -152,7 +152,7 @@ class BertSelfAttention(nn.Module):
attention_scores = attention_scores + attention_mask
# Normalize the attention scores to probabilities.
attention_probs = nn.Softmax(dim=-1)(attention_scores)
attention_probs = nn.functional.softmax(attention_scores, dim=-1)
# This is actually dropping out entire tokens to attend to, which might
# seem a bit unusual, but is taken from the original Transformer paper.