HuggingFace_transformer/docs/source/model_doc
Sidd Karamcheti 3a8de58c51 Add Mistral GPT-2 Stability Tweaks (#13573)
* Add layer-wise scaling

* Add reorder & upcasting argument

* Add OpenAI GPT-2 weight initialization scheme

* start `layer_idx` count at zero for consistency

* disentangle the standard attn function from the reordered and upcast attn function

* rename `scale_attn_by_layer` to `scale_attn_by_layer_id`

* make autocast from amp compatible with pytorch<1.6

* fix docstring

* style fixes

* Add fixes from PR feedback, style tweaks

* Fix doc whitespace

* Reformat

* First pass scale_attn_by_layer_idx and reorder_and_upcast_attn tests

* Rename scale_attn_by_layer_idx, add tip

* Remove extra newline

* add test for weight initialization

* update code format

* add assert to check that weights are fp32

* remove assert

* Fix incorrect merge

* Fix shape mismatch in baddbmm

* Add generation test for Mistral flags

Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: Keshav Santhanam <keshav2@stanford.edu>
Co-authored-by: J38 <jebolton@stanford.edu>
2021-10-04 07:37:09 -04:00
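
The commit message names layer-wise attention scaling and the OpenAI GPT-2 weight initialization scheme but does not give formulas. As a rough illustration only, the sketch below assumes that layer-wise scaling divides the attention scores by `layer_idx + 1` (with `layer_idx` starting at zero, as the message states) on top of the usual `1/sqrt(head_dim)` factor, and that the initialization scheme shrinks the standard deviation of residual projection weights by `1/sqrt(2 * n_layer)`. The function and parameter names are hypothetical and are not taken from the commit.

```python
import math


def attn_scale(head_dim: int, layer_idx: int, scale_by_layer: bool = True) -> float:
    """Factor applied to raw attention scores (Q @ K^T) before the softmax.

    Hypothetical helper: assumes layer-wise scaling divides by (layer_idx + 1).
    """
    scale = 1.0 / math.sqrt(head_dim)  # standard scaled dot-product factor
    if scale_by_layer:
        scale /= float(layer_idx + 1)  # layer_idx is zero-based
    return scale


def residual_proj_init_std(initializer_range: float, n_layer: int) -> float:
    """Std for residual projection weights.

    Hypothetical helper: assumes the base std is scaled by 1 / sqrt(2 * n_layer).
    """
    return initializer_range / math.sqrt(2 * n_layer)


if __name__ == "__main__":
    for idx in (0, 3, 11):
        print(f"layer {idx}: attention scale = {attn_scale(64, idx):.6f}")
    print(f"residual projection std = {residual_proj_init_std(0.02, 12):.6f}")
```

Under these assumptions, deeper layers get smaller pre-softmax scores, which keeps their magnitude in a range that is less likely to overflow in half precision. The reorder and upcast tweak mentioned in the message targets the same overflow problem and is not modeled here.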