Fix pad across processes dim in trainer and not being able to set the timeout (#24775)
* dim, and rm copy
* Don't rm copy for now
* Oops
* pad index
* Should be a working test
* Tickle down ddp timeout
* Put fix back in now that testing locally is done
* Better comment specifying timeout

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@@ -1714,7 +1714,9 @@ class TrainingArguments:
             del os.environ["ACCELERATE_USE_DEEPSPEED"]
             self._n_gpu = 1
         else:
-            self.distributed_state = PartialState(backend=self.ddp_backend)
+            self.distributed_state = PartialState(
+                backend=self.ddp_backend, timeout=timedelta(seconds=self.ddp_timeout)
+            )
             self._n_gpu = 1
         if not is_sagemaker_mp_enabled():
             device = self.distributed_state.device
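
The hunk passes `ddp_timeout` (an integer number of seconds) to `PartialState` as a `timedelta`. A minimal, stdlib-only sketch of that conversion follows. The helper name `partial_state_kwargs` and the positivity check are illustrative assumptions, not part of the commit. The sketch only builds the keyword arguments and does not start a process group.

```python
from datetime import timedelta
from typing import Any, Dict, Optional


def partial_state_kwargs(ddp_backend: Optional[str], ddp_timeout: int) -> Dict[str, Any]:
    """Hypothetical helper: build the keyword arguments used in the hunk.

    `ddp_timeout` is in seconds, and the hunk wraps it in a `timedelta`
    before handing it to `PartialState`.
    """
    # Assumption for this sketch only: reject non-positive timeouts early.
    if ddp_timeout <= 0:
        raise ValueError(f"ddp_timeout must be positive, got {ddp_timeout}")
    return {
        "backend": ddp_backend,
        "timeout": timedelta(seconds=ddp_timeout),
    }


# Example: a two-hour timeout for a long preprocessing step on rank 0.
kwargs = partial_state_kwargs("nccl", ddp_timeout=7200)
print(kwargs["timeout"])  # 2:00:00
```

With `accelerate` installed, these arguments would presumably be passed as `PartialState(**kwargs)`, matching the call shape in the diff.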