align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284)

* switch to device agnostic autocast in nemotron to align xpu behavior w/ cuda
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix issue
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* use torch.autocast, as other modeling code does, for decision_transformer, gpt2 and imagegpt
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* refine
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update get_autocast_gpu_dtype to its device agnostic counterpart
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-19 19:48:23 +08:00
parent 0a53df1a77
commit a9ce8c69c9
26 changed files with 138 additions and 37 deletions
--- a/tests/models/mamba2/test_modeling_mamba2.py
+++ b/tests/models/mamba2/test_modeling_mamba2.py
@@ -462,7 +462,7 @@ class Mamba2IntegrationTest(unittest.TestCase):
        config = Mamba2Config(num_heads=24, head_dim=64, hidden_size=768, expand=2, n_groups=1)

        torch.manual_seed(42)
-        with torch.amp.autocast(device_type=torch_device, dtype=dtype):
+        with torch.autocast(device_type=torch_device, dtype=dtype):
            with torch.no_grad():
                mixer = Mamba2Mixer(config, layer_idx=0).to(torch_device)
                hidden_states = torch.rand(size=(B, T, D), dtype=dtype, device=torch_device)