Sidd Karamcheti
3a8de58c51
Add Mistral GPT-2 Stability Tweaks (#13573)
* Add layer-wise scaling
* Add reorder & upcasting argument
* Add OpenAI GPT-2 weight initialization scheme
* start `layer_idx` count at zero for consistency
* disentangle attn and reordered and upscaled attn function
* rename `scale_attn_by_layer` to `scale_attn_by_layer_id`
* make autocast from amp compatible with pytorch<1.6
* fix docstring
* style fixes
* Add fixes from PR feedback, style tweaks
* Fix doc whitespace
* Reformat
* First pass scale_attn_by_layer_idx and reorder_and_upcast_attn tests
* Rename scale_attn_by_layer_idx, add tip
* Remove extra newline
* add test for weight initialization
* update code format
* add assert check weights are fp32
* remove assert
* Fix incorrect merge
* Fix shape mismatch in baddbmm
* Add generation test for Mistral flags
Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: Keshav Santhanam <keshav2@stanford.edu>
Co-authored-by: J38 <jebolton@stanford.edu>
2021-10-04 07:37:09 -04:00
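
For readers unfamiliar with the first and third items in the list above (layer-wise scaling and the OpenAI GPT-2 weight initialization scheme), here is a minimal sketch of the arithmetic they usually refer to. It is not taken from this commit. The function names are hypothetical, the `layer_idx + 1` divisor assumes the zero-based `layer_idx` mentioned in the message, and the `2 * n_layer` factor assumes two residual projections per block (attention and MLP).

```python
import math


def attn_logit_scale(head_dim: int, layer_idx: int, scale_by_layer: bool = True) -> float:
    """Multiplier applied to raw query-key scores before softmax (hypothetical helper)."""
    # Standard scaled dot-product attention factor.
    scale = 1.0 / math.sqrt(head_dim)
    if scale_by_layer:
        # Assumed layer-wise scaling: deeper layers get smaller logits.
        # layer_idx is zero-based, so divide by layer_idx + 1.
        scale /= float(layer_idx + 1)
    return scale


def residual_proj_init_std(initializer_range: float, n_layer: int) -> float:
    """Std for residual-projection weights under the assumed GPT-2 style scheme."""
    # Assumes two residual additions per block, hence 2 * n_layer.
    return initializer_range / math.sqrt(2 * n_layer)


if __name__ == "__main__":
    for idx in (0, 3, 11):
        print(idx, attn_logit_scale(head_dim=64, layer_idx=idx))
    print(residual_proj_init_std(initializer_range=0.02, n_layer=12))
```

Shrinking the logits in deeper layers keeps pre-softmax values in a range that is less likely to overflow in half precision. The upcasting argument in the second item targets the same stability concern.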