XLA train step fixes (#17973)

* Copy inputs to train and test step before modifying them, as this breaks things
* Add XLA tests, fix our loss functions to be XLA-compatible
* make fixup
* Update loss computation test to expect vector of per-sample losses
* Patch loss for TFLED
* Patch loss for TFAlbert
* Add a tf_legacy_loss config flag that enables old loss functions
* Stop using config.get() because it's not a dict
* Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
* make fixup
* Add XLA-compatible RAG loss
* Fix dtype of loss mask for TFAlbert
* Fix test for XLNet too because it overrides the default one
* make fixup
* Fix config test
* No more depending on GPU NaN behaviour
* Add test, avoid potential zero division
* Fix test item assignment
* Fix loss computation masking test
* make fixup
* Fix dtype bugs
2022-07-01 19:11:14 +01:00
parent 485bbe79d5
commit d6cec45801
10 changed files with 278 additions and 83 deletions
--- a/tests/test_configuration_common.py
+++ b/tests/test_configuration_common.py
@@ -42,6 +42,7 @@ config_common_kwargs = {
    "torchscript": True,
    "torch_dtype": "float16",
    "use_bfloat16": True,
+    "tf_legacy_loss": True,
    "pruned_heads": {"a": 1},
    "tie_word_embeddings": False,
    "is_decoder": True,