Fix many HPU failures in the CI (#39066)
* more torch.hpu patches
* increase top_k because it results in flaky behavior when temperature, TopP and TopK are used together, which ends up killing beams early.
* remove temporal fix
* fix scatter operation when input and src are the same
* trigger
* fix and reduce
* skip finding batch size as it makes the hpu go loco
* fix fsdp (yay all are passing)
* fix checking equal nan values
* style
* remove models list
* order
* rename to cuda_extensions
* Update src/transformers/trainer.py
committed by GitHub
parent bff964c429
commit 18e0cae207
@@ -62,4 +62,5 @@ if __name__ == "__main__":
         start = end
         end = start + num_jobs_per_splits + (1 if idx < num_jobs % args.num_splits else 0)
         model_splits.append(d[start:end])

     print(model_splits)
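
The hunk above shows only the body of the splitting loop, so the following is a minimal, self-contained sketch of how that arithmetic appears to distribute jobs. It assumes, without confirmation from the diff, that `d` is a list, that `num_jobs` is `len(d)`, that `num_jobs_per_splits` is `num_jobs // args.num_splits`, and that `end` starts at 0. The function name `split_evenly` and the sample item names are hypothetical and used only for illustration.

```python
def split_evenly(items, num_splits):
    """Split `items` into `num_splits` contiguous chunks of near-equal size.

    Hypothetical helper mirroring the loop body in the hunk; the setup of
    num_jobs, num_jobs_per_split and the initial `end` is assumed.
    """
    num_jobs = len(items)
    num_jobs_per_split = num_jobs // num_splits  # assumed definition
    splits = []
    end = 0  # assumed initial value
    for idx in range(num_splits):
        start = end
        # The first (num_jobs % num_splits) chunks each take one extra item,
        # so the remainder is spread over the leading chunks.
        end = start + num_jobs_per_split + (1 if idx < num_jobs % num_splits else 0)
        splits.append(items[start:end])
    return splits


if __name__ == "__main__":
    # Placeholder names, not the real model list.
    print(split_evenly(["a", "b", "c", "d", "e", "f", "g"], 3))
```

Under these assumptions every item lands in exactly one chunk, and chunk sizes differ by at most one, which keeps the CI jobs roughly balanced.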