HuggingFace_transformer/pytorch_pretrained_bert
Abhi Sharma 9e666aaa29 Fix gradient overflow issue during attention mask
This fix addresses issue #382. GPT-2 can now be trained in mixed precision, which I've confirmed with testing. I also tested unconditional generation on multiple seeds before and after changing 1e10 to 1e4, and the output did not differ. Please let me know if there is anything else I can do to improve this pull request. Thanks for all your work!
2019-04-16 11:42:34 -07:00
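The commit message does not spell out why 1e10 breaks mixed precision, so here is a minimal sketch of one likely mechanism, using only the Python standard library. It assumes the attention mask is applied with an expression of the form `w * b - c * (1 - b)`, where `b` is 1 for visible positions and 0 for masked ones; that expression and the helper names `to_fp16` and `masked_score` are illustrative assumptions, not code taken from the repository. In half precision the largest finite value is 65504, so 1e10 rounds to infinity, and `inf * 0` is NaN.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision, overflowing to +/-inf.

    struct's "e" format raises OverflowError for values too large for
    half precision; hardware would produce infinity, so we mimic that.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def masked_score(w: float, b: float, mask_constant: float) -> float:
    """Hypothetical masking step: w * b - c * (1 - b), emulated in fp16.

    w is an attention score, b is 1.0 (visible) or 0.0 (masked),
    and c is the mask constant after rounding to half precision.
    """
    c = to_fp16(mask_constant)
    keep = to_fp16(w * b)
    penalty = to_fp16(c * (1.0 - b))  # inf * 0.0 is NaN when c overflowed
    return to_fp16(keep - penalty)


if __name__ == "__main__":
    for constant in (1e10, 1e4):
        visible = masked_score(0.5, 1.0, constant)
        hidden = masked_score(0.5, 0.0, constant)
        print(f"c={constant:g}: visible={visible}, masked={hidden}")
```

Under these assumptions, a constant of 1e10 turns every visible score into NaN, while 1e4 leaves visible scores untouched and pushes masked ones to -10000. Since `math.exp(-1e4)` already underflows to 0.0, the smaller constant still removes masked positions from the softmax, which is consistent with the report that generation was unchanged.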