Sam Shleifer
9a687ebb77
[Marian Fixes] prevent predicting pad_token_id before softmax, support language codes, name multilingual models ( #4290 )
2020-05-13 17:29:41 -04:00
Julien Chaumond
4bf5042240
Fix BART tests on GPU ( #4298 )
2020-05-12 09:11:50 -04:00
Julien Chaumond
455c639093
CDN urls ( #4030 )
...
* [file_utils] use_cdn + documentation
* Move to cdn. urls for weights
* [urls] Hotfix for bert-base-japanese
2020-04-28 20:27:14 -04:00
Sam Shleifer
847e7f3379
MarianMTModel.from_pretrained('Helsinki-NLP/opus-marian-en-de') ( #3908 )
...
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
2020-04-28 18:22:37 -04:00
Julien Chaumond
a946b6b51b
[housekeeping] Upgrade # type Python 2 syntax
...
cc @sshleifer
2020-04-23 10:39:24 -04:00
Patrick von Platen
01c37dcdb5
[Config, Caching] Remove output_past everywhere and replace by use_cache argument ( #3734 )
...
* remove output_past from pt
* make style
* add optional input length for gpt2
* add use cache to prepare input
* save memory in gpt2
* correct gpt2 test inputs
* make past input optional for gpt2
* finish use_cache for all models
* make style
* delete modeling_gpt2 change in test file
* correct docstring
* correct is true statements for gpt2
2020-04-14 14:40:28 -04:00
Sam Shleifer
7a7fdf71f8
Multilingual BART - ( #3602 )
...
- support mbart-en-ro weights
- add MBartTokenizer
2020-04-10 11:25:39 -04:00
Sam Shleifer
715aa5b135
[Bart] Replace config.output_past with use_cache kwarg ( #3632 )
2020-04-07 19:08:26 -04:00
Patrick von Platen
390c128592
[Encoder-Decoder] Force models outputs to always have batch_size as their first dim ( #3536 )
...
* solve conflicts
* improve comments
2020-04-02 15:18:33 +02:00
dougian
1f72865726
[BART] Update encoder and decoder on set_input_embedding ( #3501 )
...
Co-authored-by: Ioannis Douratsos <ioannisd@amazon.com>
2020-03-30 12:20:37 -04:00
Patrick von Platen
5b44e0a31b
[T5] Add training documentation ( #3507 )
...
* Add clear description of how to train T5
* correct docstring in T5
* correct typo
* correct docstring format
* update t5 model docs
* implement collins feedback
* fix typo and add more explanation for sentinel tokens
* delete unnecessary todos
2020-03-30 13:35:53 +02:00
Sam Shleifer
f6a23d1911
[BART] add bart-large-xsum weights ( #3422 )
2020-03-29 10:51:13 -04:00
Patrick von Platen
fa9af2468a
Add T5 to docs ( #3461 )
...
* add t5 docs basis
* improve docs
* add t5 docs
* improve t5 docstring
* add t5 tokenizer docstring
* finish docstring
* make style
* add pretrained models
* correct typo
* make examples work
* finalize docs
2020-03-27 10:57:16 -04:00
Sam Shleifer
3ee431dd4c
[Bart/Memory] Two separate, smaller decoder attention masks ( #3371 )
2020-03-26 21:34:15 -04:00
Sam Shleifer
63f4d8cad0
[Bart/Memory] SelfAttention only returns weights if config.outp… ( #3369 )
2020-03-26 18:42:39 -04:00
Sam Shleifer
2b2a2f8df2
[Bart] Fix: put dummy_inputs on correct device ( #3398 )
...
* Dummy inputs to model.device
* Move self.device to ModuleUtilsMixin
2020-03-26 18:42:09 -04:00
Sam Shleifer
1a5aefc95c
[Seq2Seq Generation] Call encoder before expanding input_ids ( #3370 )
2020-03-26 18:41:19 -04:00
Sam Shleifer
39371ee454
[Bart/Memory] don't create lm_head ( #3323 )
...
* delete lm_head, skips weight tying
* Fixed s3
2020-03-26 18:40:39 -04:00
Patrick von Platen
95e00d0808
Clean special token init in modeling_....py ( #3264 )
...
* make style
* fix conflicts
2020-03-20 21:41:04 +01:00
Patrick von Platen
bbf26c4e61
Support T5 Generation ( #3228 )
...
* fix conflicts
* update bart max length test
* correct spelling mistakes
* implemented model specific encode function
* fix merge conflicts
* better naming
* save intermediate state -> need to rethink structure a bit
* leave tf problem as it is for now
* current version
* add layers.pop
* remove ipdb
* make style
* clean return cut decoding
* remove ipdbs
* Fix restoring layers in the decoders that don't exist.
* push good intermediate solution for now
* fix conflicts
* always good to refuse to merge conflicts when rebasing
* fix small bug
* improve function calls
* remove unused file
* add correct scope behavior for t5_generate
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-03-19 23:18:23 +01:00
Sam Shleifer
4e4403c9b4
[BART] torch 1.0 compatibility ( #3322 )
...
* config.activation_function
2020-03-19 11:56:54 -04:00
Sam Shleifer
ad7233fc01
[BART] cleanup: remove redundant kwargs, improve docstrings ( #3319 )
2020-03-19 11:16:51 -04:00
Sam Shleifer
11573231c6
[BART] generation_mode as a kwarg not a class attribute ( #3278 )
2020-03-16 12:47:53 -04:00
Sam Shleifer
5ea8ba67b4
[BART] Remove unused kwargs ( #3279 )
...
* Remove unused kwargs
* don't call forward in tests
2020-03-15 23:00:44 -04:00
Sam Shleifer
2bd79e23de
[BART] FP16 testing fixes ( #3266 )
2020-03-13 19:48:26 -04:00
Sam Shleifer
2e81b9d8d7
Bart: update example for #3140 compatibility ( #3233 )
...
* Update bart example docs
2020-03-12 10:36:37 -04:00
Patrick von Platen
a332cc9f7f
finalize generation merge
2020-03-11 11:53:36 +01:00
patrickvonplaten
2acfe63964
best current version and make style
2020-03-11 11:06:56 +01:00
patrickvonplaten
c62444da39
fix conflicts
2020-03-11 11:06:56 +01:00
Patrick von Platen
7a11e925cf
work in progress
2020-03-11 11:06:56 +01:00
Patrick von Platen
7cba11fb9b
better naming
2020-03-11 11:06:56 +01:00
Patrick von Platen
ff648221bd
fix conflicts
2020-03-11 11:06:56 +01:00
Patrick von Platen
d8e2b3c547
fix conflicts
2020-03-11 11:06:56 +01:00
Sam Shleifer
ed37f9fa4f
[Bart] _prepare_decoder_inputs should use large negative ( #3158 )
2020-03-06 16:06:36 -05:00
Sam Shleifer
e58b3ec5df
add imports to examples ( #3160 )
2020-03-06 11:15:33 -05:00
Thomas Wolf
6ffe03a0a1
Merge pull request #3137 from tomhosking/bart-refactor
...
Refactor BartModel so that input checks are handled within enc/dec
2020-03-06 13:06:34 +01:00
Sam Shleifer
857e0a0d3b
Rename BartForMaskedLM -> BartForConditionalGeneration ( #3114 )
...
* improved documentation
2020-03-05 17:41:18 -05:00
sshleifer
14d40584b2
remove newline
2020-03-05 13:06:35 -05:00
sshleifer
1360dacaa3
cleanup deltas
2020-03-05 12:57:42 -05:00
sshleifer
810079de1f
no ipdb
2020-03-05 12:48:14 -05:00
sshleifer
c36fdc88d4
tests pass
2020-03-05 12:33:08 -05:00
Tom Hosking
06a6cb6f36
Refactor BartModel so that input checks are handled within BartEncoder and BartDecoder
2020-03-05 13:45:41 +00:00
Sam Shleifer
e9e6efdc45
BartForSequenceClassification: fix num_labels, add test ( #3110 )
2020-03-03 15:54:29 -05:00
Sam Shleifer
5c5af879b6
[Bart] don't call .forward ( #3094 )
2020-03-03 15:14:12 -05:00
Sam Shleifer
b54ef78d0c
Bart-CNN ( #3059 )
...
`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
2020-03-02 10:35:53 -05:00
Julien Chaumond
9cda3620b6
Fix (non-slow) tests on GPU (torch) ( #3024 )
...
* Fix tests on GPU (torch)
* Fix bart slow tests
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-02-26 11:59:25 -05:00
Sam Shleifer
92487a1dc0
Bart: fix layerdrop and cached decoder_input_ids for generation ( #2969 )
2020-02-22 16:25:04 -05:00
Sam Shleifer
53ce3854a1
New BartModel ( #2745 )
...
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
2020-02-20 18:11:13 -05:00