* Attention mask is important in the case of batching...
* Improve the fix.
* Make the sentences different enough that they exhibit different predictions.
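
The batching point can be illustrated with a toy, framework-free sketch. It is not the code touched by this change: `softmax` and `attend` are hypothetical helpers, and the scores and values are made-up numbers. It only shows that when a short sequence is padded to fit a batch, the padding positions leak into the result unless a mask removes them.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values, mask=None):
    """Softmax-weighted average of `values` (hypothetical helper).

    mask[i] == 0 marks a padding position that must receive zero weight.
    """
    if mask is not None:
        # Masked positions get -inf so their softmax weight is exactly 0.
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values))

# The sequence on its own, with no padding.
alone = attend([1.0, 2.0], [10.0, 20.0])

# The same sequence padded to length 4 so it fits in a batch.
# The padding positions hold arbitrary values.
padded_scores = [1.0, 2.0, 0.0, 0.0]
padded_values = [10.0, 20.0, 99.0, 99.0]
attention_mask = [1, 1, 0, 0]

unmasked = attend(padded_scores, padded_values)
masked = attend(padded_scores, padded_values, attention_mask)

print(alone, unmasked, masked)
```

With the mask, the padded result matches the unpadded one; without it, the padding values shift the output. This is presumably why a batched test needs sentences of different lengths and an explicit mask to be meaningful.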
ForInstanceSegmentation
image-segmentation
torch.diag