HuggingFace_transformer/tests
Sidd Karamcheti 3a8de58c51 Add Mistral GPT-2 Stability Tweaks (#13573)
* Add layer-wise scaling

* Add reorder & upcasting argument

* Add OpenAI GPT-2 weight initialization scheme

* start `layer_idx` count at zero for consistency

* disentangle attn and reordered and upscaled attn function

* rename `scale_attn_by_layer` to `scale_attn_by_layer_id`

* make autocast from amp compatible with pytorch<1.6

* fix docstring

* style fixes

* Add fixes from PR feedback, style tweaks

* Fix doc whitespace

* Reformat

* First pass scale_attn_by_layer_idx and reorder_and_upcast_attn tests

* Rename scale_attn_by_layer_idx, add tip

* Remove extra newline

* add test for weight initialization

* update code format

* add assert check weights are fp32

* remove assert

* Fix incorrect merge

* Fix shape mismatch in baddbmm

* Add generation test for Mistral flags

Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: Keshav Santhanam <keshav2@stanford.edu>
Co-authored-by: J38 <jebolton@stanford.edu>
2021-10-04 07:37:09 -04:00
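
As an illustration of the "layer-wise scaling" item above, here is a minimal sketch. It assumes the tweak divides the attention scores by `layer_idx + 1` on top of the usual `1/sqrt(head_dim)` factor, with `layer_idx` counted from zero as the commit message states. The helper name and its parameters are hypothetical and are not taken from the library.

```python
import math


def attention_score_scale(head_dim, layer_idx, scale_by_inverse_layer_idx=False):
    """Return the factor applied to raw query-key dot products.

    Hypothetical helper for illustration only; not a function from the library.
    """
    # Standard scaled dot-product attention factor.
    scale = 1.0 / math.sqrt(head_dim)
    if scale_by_inverse_layer_idx:
        # Assumption: deeper layers are scaled down further.
        # layer_idx starts at zero, so add 1 to avoid dividing by zero.
        scale /= float(layer_idx + 1)
    return scale


if __name__ == "__main__":
    # Print the factor for the first few layers with the tweak enabled.
    for layer_idx in range(3):
        print(layer_idx, attention_score_scale(64, layer_idx, True))
```

With the flag off, every layer uses the same factor; with it on, the factor shrinks as the layer index grows, which is the sense in which the scaling is layer-wise.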