Fix many HPU failures in the CI (#39066)

* more torch.hpu patches
* increase top_k because it results in flaky behavior when Temperature, TopP and TopK are used together, which ends up killing beams early.
* remove temporary fix
* fix scatter operation when input and src are the same
* trigger
* fix and reduce
* skip finding batch size as it makes the hpu go loco
* fix fsdp (yay all are passing)
* fix checking equal nan values
* style
* remove models list
* order
* rename to cuda_extensions
* Update src/transformers/trainer.py
2025-07-03 11:17:27 +02:00
parent bff964c429
commit 18e0cae207
5 changed files with 71 additions and 54 deletions
--- a/utils/split_model_tests.py
+++ b/utils/split_model_tests.py
@@ -62,4 +62,5 @@ if __name__ == "__main__":
        start = end
        end = start + num_jobs_per_splits + (1 if idx < num_jobs % args.num_splits else 0)
        model_splits.append(d[start:end])
+
    print(model_splits)
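
The loop in the hunk above distributes jobs over splits so that chunk sizes differ by at most one: the first `num_jobs % num_splits` splits each receive one extra item. The sketch below restates that logic as a standalone function. The name `split_evenly` is hypothetical, and it assumes that `num_jobs_per_splits` is the integer quotient `num_jobs // num_splits`, which the visible context does not show.

```python
def split_evenly(items, num_splits):
    """Split items into num_splits contiguous chunks whose sizes differ by at most one.

    Hypothetical helper; assumes the per-split base size is len(items) // num_splits.
    """
    num_jobs = len(items)
    base = num_jobs // num_splits       # assumed meaning of num_jobs_per_splits
    remainder = num_jobs % num_splits   # this many leading splits get one extra item
    splits = []
    end = 0
    for idx in range(num_splits):
        start = end
        end = start + base + (1 if idx < remainder else 0)
        splits.append(items[start:end])
    return splits
```

With 7 items and 3 splits, the first split gets 3 items and the remaining two get 2 each, so no item is dropped or duplicated.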