Rostislav Nedelchev
76c0bc06d5
[XLNet] Changed post-processing of attention w.r.t. target_mapping
Whenever target_mapping is provided as an input, XLNet outputs two different attention streams.
Depending on that, the attention output is one of the two:
- a list of tensors (usual case for most transformers)
- a list of 2-tuples of tensors, one tensor for each of the attention streams
Docs and unit-tests have been updated.
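
For callers that need to handle both output shapes, a minimal sketch of a normalizing helper is shown below. The function name `split_attention_streams` is hypothetical and not part of the library, and the sketch assumes the two-stream case uses plain Python tuples as described above. The commit message does not say which stream comes first in each tuple, so the two results are simply called first and second.

```python
from typing import Any, List, Optional, Sequence, Tuple


def split_attention_streams(
    attentions: Sequence[Any],
) -> Tuple[List[Any], Optional[List[Any]]]:
    """Hypothetical helper: separate per-layer attentions into streams.

    Returns (first_stream, second_stream). second_stream is None when the
    input is a plain list of tensors (the single-stream case).
    """
    # Two-stream case: every per-layer entry is a 2-tuple of tensors.
    if attentions and all(
        isinstance(layer, tuple) and len(layer) == 2 for layer in attentions
    ):
        first = [layer[0] for layer in attentions]
        second = [layer[1] for layer in attentions]
        return first, second
    # Single-stream case: one tensor per layer, returned unchanged.
    return list(attentions), None


# Strings stand in for tensors so the sketch runs without any dependencies.
single = split_attention_streams(["layer0", "layer1"])
double = split_attention_streams([("a0", "b0"), ("a1", "b1")])
```

Here `single` has `None` as its second element, while `double` holds two lists with one entry per layer.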
2019-11-30 21:01:04 +01:00