align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284)

* switch to device agnostic autocast in nemotron to align xpu behavior w/
cuda

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix issue

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* use torch.autocast, as in other modeling code, for decision_transformer, gpt2 and imagegpt

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* refine

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* update get_autocast_gpu_dtype to device agnostic one

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
This commit is contained in:
Yao Matrix
2025-06-19 19:48:23 +08:00
committed by GitHub
parent 0a53df1a77
commit a9ce8c69c9
26 changed files with 138 additions and 37 deletions

@@ -462,7 +462,7 @@ class Mamba2IntegrationTest(unittest.TestCase):
         config = Mamba2Config(num_heads=24, head_dim=64, hidden_size=768, expand=2, n_groups=1)
         torch.manual_seed(42)
-        with torch.amp.autocast(device_type=torch_device, dtype=dtype):
+        with torch.autocast(device_type=torch_device, dtype=dtype):
             with torch.no_grad():
                 mixer = Mamba2Mixer(config, layer_idx=0).to(torch_device)
                 hidden_states = torch.rand(size=(B, T, D), dtype=dtype, device=torch_device)