HuggingFace_transformer/transformers
Rostislav Nedelchev 76c0bc06d5 [XLNet] Changed post-processing of attention w.r.t. target_mapping
Whenever target_mapping is provided as an input, XLNet outputs two different attention streams.
Depending on whether target_mapping is given, the attention output is one of the following:
- a list of tensors (the usual case for most transformers)
- a list of 2-tuples of tensors, one tensor for each of the attention streams
Docs and unit-tests have been updated
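
Because the attention output can take either of the two shapes listed above, calling code may want to normalize it before use. The following is a minimal sketch of one way to do that, not the code from this commit. The helper name `split_attention_streams` is hypothetical, and it assumes that each per-layer entry is either a single tensor or a plain Python tuple of two tensors. It makes no assumption about which stream comes first in the tuple.

```python
def split_attention_streams(attentions):
    """Hypothetical helper: separate per-layer attentions into two lists.

    Assumes `attentions` holds one entry per layer, and that each entry is
    either a single tensor or a 2-tuple of tensors (one per attention stream).
    Returns (first_stream, second_stream); second_stream is None when the
    model produced only one stream.
    """
    first_stream, second_stream = [], []
    for layer_attention in attentions:
        if isinstance(layer_attention, tuple):
            # Two-stream case: one tensor per attention stream.
            stream_a, stream_b = layer_attention
            first_stream.append(stream_a)
            second_stream.append(stream_b)
        else:
            # Usual case: a single attention tensor for this layer.
            first_stream.append(layer_attention)
    return first_stream, (second_stream if second_stream else None)
```

With this shape check in one place, downstream code can test `second_stream is None` instead of inspecting every layer's entry.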
2019-11-30 21:01:04 +01:00