HuggingFace_transformer

Author	SHA1	Message	Date
Julien Chaumond	d5319793c4	Fix BERT	2019-11-06 14:03:47 -05:00
Julien Chaumond	00337e9687	[inputs_embeds] All PyTorch models	2019-11-05 00:39:18 +00:00
thomwolf	1724cee8c4	switch from properties to methods	2019-11-04 15:34:10 +01:00
thomwolf	9b45d0f878	Add common properties input_embeddings and output_embeddings	2019-11-04 12:28:56 +01:00
Thomas Wolf	228cdd6a6e	Merge branch 'master' into conditional-generation	2019-10-30 16:40:35 +01:00
Rémi Louf	9c1bdb5b61	revert renaming of lm_labels to ltr_lm_labels	2019-10-30 10:43:13 +01:00
Rémi Louf	098a89f312	update docstrings; rename lm_labels to more explicit ltr_lm_labels	2019-10-29 20:08:03 +01:00
Rémi Louf	dfce409691	resolve PR comments	2019-10-29 17:10:20 +01:00
Rémi Louf	4c3ac4a7d8	here's one big commit	2019-10-28 10:49:50 +01:00
Rémi Louf	dc580dd4c7	add lm_labels for the LM cross-entropy	2019-10-28 10:49:49 +01:00
Rémi Louf	f873a3edb2	the decoder attends to the output of the encoder stack (last layer)	2019-10-28 10:49:00 +01:00
Rémi Louf	87d60b6e19	reword explanation of encoder_attention_mask	2019-10-17 10:18:19 +02:00
Rémi Louf	638fe7f5a4	correct composition of padding and causal masks	2019-10-17 10:13:07 +02:00
Rémi Louf	4e0f24348f	document the MLM modification + raise exception on MLM training with encoder-decoder	2019-10-17 09:41:53 +02:00
Rémi Louf	a424892fab	correct syntax error: dim() and not dims()	2019-10-16 18:24:32 +02:00
Rémi Louf	0752069617	adapt attention masks for the decoder case. The introduction of a decoder introduces 2 changes: (1) We need to be able to specify a separate mask in the cross attention to mask the positions corresponding to padding tokens in the encoder state. (2) The self-attention in the decoder needs to be causal on top of not attending to padding tokens.	2019-10-16 16:12:22 +02:00
thomwolf	0ef9bc923a	Cleaning up seq2seq [WIP]	2019-10-14 11:58:13 +02:00
jeffxtang	e76d71521c	the working example code to use BertForQuestionAnswering and get an answer from a text and a question	2019-10-11 17:04:02 -07:00
Rémi Louf	f8e98d6779	load pretrained embeddings in Bert decoder. In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks", Bert2Bert is initialized with pre-trained weights for the encoder, and only pre-trained embeddings for the decoder. The current version of the code completely randomizes the weights of the decoder. We write a custom function to initialize the weights of the decoder; we first initialize the decoder with the pre-trained weights and then randomize everything but the embeddings.	2019-10-11 16:48:11 +02:00
Stefan Schweter	5f25a5f367	model: add support for new German BERT models (cased and uncased) from @dbmdz	2019-10-11 10:20:33 +02:00
Rémi Louf	fa218e648a	fix syntax errors	2019-10-10 15:16:07 +02:00
Rémi Louf	3e1cd8241e	fix stupid (re)naming issue	2019-10-10 14:18:20 +02:00
Rémi Louf	81ee29ee8d	remove the staticmethod used to load the config	2019-10-10 14:13:37 +02:00
Rémi Louf	d7092d592c	rename the attributes in the Bert Layer. Since the preloading of weights relies on the name of the class's attributes, changing the namespace breaks loading pretrained weights on Bert and all related models. I reverted `self_attention` to `attention` and use `crossattention` for the decoder instead.	2019-10-10 12:51:14 +02:00
Rémi Louf	51261167b4	prune both attention and self-attention heads	2019-10-10 12:17:22 +02:00
Rémi Louf	17177e7379	add is_decoder as an attribute to Config class	2019-10-10 12:03:58 +02:00
Rémi Louf	df85a0ff0b	replace double quotes with simple quotes	2019-10-10 11:38:26 +02:00
Rémi Louf	9ca788b2e8	merge the two Bert layers classes	2019-10-10 11:33:28 +02:00
Rémi Louf	edfc8f8225	Remove and do the branching in	2019-10-10 10:17:27 +02:00
Rémi Louf	09cfd12235	remove and do the branching in	2019-10-10 10:15:27 +02:00
Rémi Louf	877ef2c6ca	override `from_pretrained` in Bert2Rnd. In the seq2seq model we need to both load pretrained weights in the encoder and initialize the decoder randomly. Because the `from_pretrained` method defined in the base class relies on module names to assign weights, it would also initialize the decoder with pretrained weights. To avoid this we override the method to only initialize the encoder with pretrained weights.	2019-10-10 10:02:18 +02:00
Rémi Louf	770b15b58c	rename class in __init__	2019-10-08 17:32:28 +02:00
Rémi Louf	8abfee9ec3	rename Bert2Bert -> Bert2Rnd	2019-10-08 16:30:58 +02:00
Rémi Louf	0700983090	Add BertDecoderModel and Bert2Bert classes. I am not sure what happens when the class is initialized with the pretrained weights.	2019-10-08 16:30:58 +02:00
Rémi Louf	75feacf172	add general structure for Bert2Bert class	2019-10-08 16:30:58 +02:00
Rémi Louf	15a2fc88a6	add General attention classes. The modifications that I introduced in a previous commit did break Bert's internal API. I reverted these changes and added more general classes to handle the encoder-decoder attention case. There may be a more elegant way to deal with retro-compatibility (I am not comfortable with the current state of the code), but I cannot see it right now.	2019-10-08 16:30:58 +02:00
Rémi Louf	cd6a59d5c1	add a decoder layer for Bert	2019-10-08 16:30:58 +02:00
Rémi Louf	a0dcefa382	generalize BertSelfAttention to take separate query, key, value. There is currently no way to specify the query, key and value separately in the Attention module. However, the decoder's "encoder-decoder attention" layers take the decoder's last output as a query, the encoder's states as key and value. We thus modify the existing code so query, key and value can be added separately. This obviously raises some naming issues; `BertSelfAttention` is not a self-attention module anymore. The way the residual is forwarded is now awkward, etc. We will need to do some refactoring once the decoder is fully implemented.	2019-10-07 17:53:58 +02:00
Rémi Louf	31adbb247c	add class wireframes for Bert decoder	2019-10-07 16:43:21 +02:00
Rémi Louf	dda1adad6d	rename BertLayer to BertEncoderLayer	2019-10-07 16:31:46 +02:00
Rémi Louf	0053c0e052	do some (light) housekeeping. Several packages were imported but never used; indentation and line spaces did not follow PEP8.	2019-10-07 16:29:15 +02:00
Santiago Castro	63ed224b7c	initialy -> initially	2019-10-02 15:04:18 +00:00
thomwolf	80bf868a26	Merge branch 'master' into tf2	2019-09-26 12:04:47 +02:00
thomwolf	31c23bd5ee	[BIG] pytorch-transformers => transformers	2019-09-26 10:15:53 +02:00

44 Commits