Add a link to the PyTorch CrossEntropyLoss documentation so that readers understand why '-100' is ignored by the loss function.
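
For context, the behavior the link documents can be illustrated with a small pure-Python sketch. It is not PyTorch's implementation: the helper `cross_entropy` and the sample values are hypothetical, and the code only assumes the documented default `ignore_index=-100` together with mean reduction over the non-ignored targets.

```python
import math

# Assumption: mirrors the documented default of ignore_index in torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def cross_entropy(logits, targets, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose target is not ignore_index (illustrative helper)."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # masked position: contributes nothing to the loss
        # Numerically stable log-sum-exp over the class scores.
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_sum_exp - row[target])
    # In this sketch, a batch with no valid targets yields nan.
    return sum(losses) / len(losses) if losses else float("nan")


# Hypothetical scores for three positions and three classes.
logits = [[2.0, 0.5, 0.1], [0.3, 1.2, 0.4], [0.1, 0.2, 3.0]]
labels = [0, -100, 2]  # the middle position is masked with -100

masked = cross_entropy(logits, labels)
unmasked = cross_entropy([logits[0], logits[2]], [0, 2])
assert math.isclose(masked, unmasked)
```

The masked position affects neither the sum nor the denominator, which is why padding or special tokens labelled -100 do not influence training.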
evaluate