HuggingFace_transformer

Author	SHA1	Message	Date
Rémi Louf	fa218e648a	fix syntax errors	2019-10-10 15:16:07 +02:00
Rémi Louf	3e1cd8241e	fix stupid (re)naming issue	2019-10-10 14:18:20 +02:00
Rémi Louf	81ee29ee8d	remove the staticmethod used to load the config	2019-10-10 14:13:37 +02:00
Rémi Louf	d7092d592c	rename the attributes in the Bert Layer. Since the preloading of weights relies on the names of the class's attributes, changing the namespace breaks loading pretrained weights on Bert and all related models. I reverted `self_attention` to `attention` and use `crossattention` for the decoder instead.	2019-10-10 12:51:14 +02:00
Rémi Louf	51261167b4	prune both attention and self-attention heads	2019-10-10 12:17:22 +02:00
Rémi Louf	17177e7379	add is_decoder as an attribute to Config class	2019-10-10 12:03:58 +02:00
Rémi Louf	df85a0ff0b	replace double quotes with single quotes	2019-10-10 11:38:26 +02:00
Rémi Louf	9ca788b2e8	merge the two Bert layers classes	2019-10-10 11:33:28 +02:00
Rémi Louf	edfc8f8225	Remove and do the branching in	2019-10-10 10:17:27 +02:00
Rémi Louf	09cfd12235	remove and do the branching in	2019-10-10 10:15:27 +02:00
Rémi Louf	877ef2c6ca	override `from_pretrained` in Bert2Rnd. In the seq2seq model we need both to load pretrained weights in the encoder and to initialize the decoder randomly. Because the `from_pretrained` method defined in the base class relies on module names to assign weights, it would also initialize the decoder with pretrained weights. To avoid this, we override the method so that only the encoder is initialized with pretrained weights.	2019-10-10 10:02:18 +02:00
Rémi Louf	770b15b58c	rename class in __init__	2019-10-08 17:32:28 +02:00
Rémi Louf	8abfee9ec3	rename Bert2Bert -> Bert2Rnd	2019-10-08 16:30:58 +02:00
Rémi Louf	0700983090	Add BertDecoderModel and Bert2Bert classes. I am not sure what happens when the class is initialized with the pretrained weights.	2019-10-08 16:30:58 +02:00
Rémi Louf	75feacf172	add general structure for Bert2Bert class	2019-10-08 16:30:58 +02:00
Rémi Louf	15a2fc88a6	add general attention classes. The modifications that I introduced in a previous commit broke Bert's internal API. I reverted these changes and added more general classes to handle the encoder-decoder attention case. There may be a more elegant way to deal with backward compatibility (I am not comfortable with the current state of the code), but I cannot see it right now.	2019-10-08 16:30:58 +02:00
Rémi Louf	cd6a59d5c1	add a decoder layer for Bert	2019-10-08 16:30:58 +02:00
Rémi Louf	a0dcefa382	generalize BertSelfAttention to take separate query, key, value. There is currently no way to specify the query, key and value separately in the Attention module. However, the decoder's "encoder-decoder attention" layers take the decoder's last output as the query and the encoder's states as key and value. We thus modify the existing code so that query, key and value can be passed separately. This obviously raises some naming issues: `BertSelfAttention` is not a self-attention module anymore, the way the residual is forwarded is now awkward, etc. We will need to do some refactoring once the decoder is fully implemented.	2019-10-07 17:53:58 +02:00
Rémi Louf	31adbb247c	add class wireframes for Bert decoder	2019-10-07 16:43:21 +02:00
Rémi Louf	dda1adad6d	rename BertLayer to BertEncoderLayer	2019-10-07 16:31:46 +02:00
Rémi Louf	0053c0e052	do some (light) housekeeping. Several packages were imported but never used; indentation and line spacing did not follow PEP8.	2019-10-07 16:29:15 +02:00
Santiago Castro	63ed224b7c	initialy -> initially	2019-10-02 15:04:18 +00:00
thomwolf	80bf868a26	Merge branch 'master' into tf2	2019-09-26 12:04:47 +02:00
thomwolf	31c23bd5ee	[BIG] pytorch-transformers => transformers	2019-09-26 10:15:53 +02:00

24 Commits
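
Note on d7092d592c and 877ef2c6ca: both messages hinge on weights being assigned by attribute/module name. The sketch below is a toy illustration of that idea only, not the code from these commits. Parameters are plain floats in a dict, and every name in it (`load_by_name`, `fresh_model`, `CHECKPOINT`, the dotted parameter names) is hypothetical and chosen for the example.

```python
# Toy illustration only: "parameters" are floats keyed by dotted names.
# load_by_name, fresh_model and CHECKPOINT are hypothetical names, not
# part of the library's API.

CHECKPOINT = {"layer.0.attention.query": 1.0}


def fresh_model():
    """Return a toy encoder-decoder 'model' with every parameter at 0.0."""
    return {
        "encoder.layer.0.attention.query": 0.0,
        "decoder.layer.0.attention.query": 0.0,
        "decoder.layer.0.crossattention.query": 0.0,
    }


def load_by_name(params, checkpoint, only_prefix=None):
    """Copy checkpoint values into params wherever the local name matches.

    The first dotted component of each parameter name is treated as the
    submodule ("encoder" or "decoder"); the rest is looked up in the
    checkpoint. Returns (loaded, missing) lists of parameter names.
    """
    loaded, missing = [], []
    for name in params:
        prefix, _, local_name = name.partition(".")
        if only_prefix is not None and prefix != only_prefix:
            continue  # leave this submodule at its initial values
        if local_name in checkpoint:
            params[name] = checkpoint[local_name]
            loaded.append(name)
        else:
            missing.append(name)
    return loaded, missing


# Loading everywhere also overwrites the decoder's self-attention weights.
everywhere = fresh_model()
loaded_all, missing_all = load_by_name(everywhere, CHECKPOINT)

# Restricting the load to the encoder leaves the decoder untouched.
encoder_only = fresh_model()
loaded_enc, missing_enc = load_by_name(
    encoder_only, CHECKPOINT, only_prefix="encoder"
)

# Renaming `attention` to `self_attention` means nothing matches anymore.
renamed = {"encoder.layer.0.self_attention.query": 0.0}
loaded_renamed, missing_renamed = load_by_name(renamed, CHECKPOINT)

print(loaded_all, missing_all)
print(loaded_enc, missing_enc)
print(loaded_renamed, missing_renamed)
```

In this toy, the unrestricted load fills the decoder's `attention` entry from the checkpoint, the encoder-only load leaves it at its initial value, and the renamed attribute ends up in the `missing` list.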