diff --git a/README.md b/README.md
index e51a2c51bb..e3a42595dd 100644
--- a/README.md
+++ b/README.md
@@ -70,7 +70,7 @@ Explore the [Hub](https://huggingface.com/) today to find a model and use Transf
 
 ## Installation
 
-Transformers works with Python 3.9+ [PyTorch](https://pytorch.org/get-started/locally/) 2.0+, [TensorFlow](https://www.tensorflow.org/install/pip) 2.6+, and [Flax](https://flax.readthedocs.io/en/latest/) 0.4.1+.
+Transformers works with Python 3.9+, [PyTorch](https://pytorch.org/get-started/locally/) 2.1+, [TensorFlow](https://www.tensorflow.org/install/pip) 2.6+, and [Flax](https://flax.readthedocs.io/en/latest/) 0.4.1+.
 
 Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
 
diff --git a/docs/source/en/installation.md b/docs/source/en/installation.md
index 45a043ac8d..911c84858f 100644
--- a/docs/source/en/installation.md
+++ b/docs/source/en/installation.md
@@ -20,7 +20,7 @@ rendered properly in your Markdown viewer.
 
 # Installation
 
-Transformers works with [PyTorch](https://pytorch.org/get-started/locally/), [TensorFlow 2.0](https://www.tensorflow.org/install/pip), and [Flax](https://flax.readthedocs.io/en/latest/). It has been tested on Python 3.9+, PyTorch 2.0+, TensorFlow 2.6+, and Flax 0.4.1+.
+Transformers works with [PyTorch](https://pytorch.org/get-started/locally/), [TensorFlow 2.0](https://www.tensorflow.org/install/pip), and [Flax](https://flax.readthedocs.io/en/latest/). It has been tested on Python 3.9+, PyTorch 2.1+, TensorFlow 2.6+, and Flax 0.4.1+.
 
 ## Virtual environment
 
diff --git a/i18n/README_ar.md b/i18n/README_ar.md
index c7249ac23d..cdf813445d 100644
--- a/i18n/README_ar.md
+++ b/i18n/README_ar.md
@@ -245,7 +245,7 @@ limitations under the License.
 
 ### باستخدام pip
 
-تم اختبار هذا المستودع على Python 3.9+، Flax 0.4.1+، PyTorch 2.0+، و TensorFlow 2.6+.
+تم اختبار هذا المستودع على Python 3.9+، Flax 0.4.1+، PyTorch 2.1+، و TensorFlow 2.6+.
 
 يجب تثبيت 🤗 Transformers في [بيئة افتراضية](https://docs.python.org/3/library/venv.html). إذا كنت غير معتاد على البيئات الافتراضية Python، فراجع [دليل المستخدم](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
 
diff --git a/i18n/README_de.md b/i18n/README_de.md
index 78447af41a..b913df894d 100644
--- a/i18n/README_de.md
+++ b/i18n/README_de.md
@@ -246,7 +246,7 @@ Das Modell selbst ist ein reguläres [PyTorch `nn.Module`](https://pytorch.org/d
 
 ### Mit pip
 
-Dieses Repository wurde mit Python 3.9+, Flax 0.4.1+, PyTorch 2.0+ und TensorFlow 2.6+ getestet.
+Dieses Repository wurde mit Python 3.9+, Flax 0.4.1+, PyTorch 2.1+ und TensorFlow 2.6+ getestet.
 
 Sie sollten 🤗 Transformers in einer [virtuellen Umgebung](https://docs.python.org/3/library/venv.html) installieren. Wenn Sie mit virtuellen Python-Umgebungen nicht vertraut sind, schauen Sie sich den [Benutzerleitfaden](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) an.
 
diff --git a/i18n/README_es.md b/i18n/README_es.md
index 57eb8117fc..36bb3e71ef 100644
--- a/i18n/README_es.md
+++ b/i18n/README_es.md
@@ -222,7 +222,7 @@ El modelo en si es un [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.h
 
 ### Con pip
 
-Este repositorio está probado en Python 3.9+, Flax 0.4.1+, PyTorch 2.0+ y TensorFlow 2.6+.
+Este repositorio está probado en Python 3.9+, Flax 0.4.1+, PyTorch 2.1+ y TensorFlow 2.6+.
 
 Deberías instalar 🤗 Transformers en un [entorno virtual](https://docs.python.org/3/library/venv.html). Si no estas familiarizado con los entornos virtuales de Python, consulta la [guía de usuario](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
 
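The README hunks above move the documented PyTorch floor from 2.0+ to 2.1+. Below is a minimal standard-library sketch of checking an installed version string against that floor. `parse_release` and `meets_floor` are hypothetical helpers, not Transformers APIs. They keep only the leading numeric release segment, so local and dev suffixes such as `+cu121` or `.dev20231001` are ignored, much like comparing the `base_version` values used in the version checks this diff removes.

```python
import re
from typing import Tuple

# Floor documented by this change (assumption: only the numeric release segment matters).
MIN_TORCH = "2.1"


def parse_release(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: "2.1.0+cu121" -> (2, 1, 0).

    Everything after the leading digits-and-dots run (local, dev, pre-release
    tags) is dropped, so "2.1.0.dev20231001" compares equal to "2.1.0".
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_floor(installed: str, minimum: str = MIN_TORCH) -> bool:
    """Hypothetical helper: True if `installed` >= `minimum`, padding short versions with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, `meets_floor(importlib.metadata.version("torch"))` checks the installed distribution without importing torch; `importlib.metadata.version` raises `PackageNotFoundError` when torch is not installed.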
diff --git a/i18n/README_fr.md b/i18n/README_fr.md
index 5925978c44..6512b4af07 100644
--- a/i18n/README_fr.md
+++ b/i18n/README_fr.md
@@ -243,7 +243,7 @@ Le modèle lui-même est un module [`nn.Module` PyTorch](https://pytorch.org/doc
 
 ### Avec pip
 
-Ce référentiel est testé sur Python 3.9+, Flax 0.4.1+, PyTorch 2.0+ et TensorFlow 2.6+.
+Ce référentiel est testé sur Python 3.9+, Flax 0.4.1+, PyTorch 2.1+ et TensorFlow 2.6+.
 
 Vous devriez installer 🤗 Transformers dans un [environnement virtuel](https://docs.python.org/3/library/venv.html). Si vous n'êtes pas familier avec les environnements virtuels Python, consultez le [guide utilisateur](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
 
diff --git a/i18n/README_hd.md b/i18n/README_hd.md
index 1541e4df66..76ee4355bd 100644
--- a/i18n/README_hd.md
+++ b/i18n/README_hd.md
@@ -198,7 +198,7 @@ checkpoint: जाँच बिंदु
 
 ### पिप का उपयोग करना
 
-इस रिपॉजिटरी का परीक्षण Python 3.9+, Flax 0.4.1+, PyTorch 2.0+ और TensorFlow 2.6+ के तहत किया गया है।
+इस रिपॉजिटरी का परीक्षण Python 3.9+, Flax 0.4.1+, PyTorch 2.1+ और TensorFlow 2.6+ के तहत किया गया है।
 
 आप [वर्चुअल एनवायरनमेंट](https://docs.python.org/3/library/venv.html) में 🤗 ट्रांसफॉर्मर इंस्टॉल कर सकते हैं। यदि आप अभी तक पायथन के वर्चुअल एनवायरनमेंट से परिचित नहीं हैं, तो कृपया इसे [उपयोगकर्ता निर्देश](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) पढ़ें।
 
diff --git a/i18n/README_ja.md b/i18n/README_ja.md
index fc3d4ae945..c57a07b56b 100644
--- a/i18n/README_ja.md
+++ b/i18n/README_ja.md
@@ -256,7 +256,7 @@ Hugging Faceチームによって作られた **[トランスフォーマーを
 
 ### pipにて
 
-このリポジトリは、Python 3.9+, Flax 0.4.1+, PyTorch 2.0+, TensorFlow 2.6+ でテストされています。
+このリポジトリは、Python 3.9+, Flax 0.4.1+, PyTorch 2.1+, TensorFlow 2.6+ でテストされています。
 
 🤗Transformersは[仮想環境](https://docs.python.org/3/library/venv.html)にインストールする必要があります。Pythonの仮想環境に慣れていない場合は、[ユーザーガイド](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)を確認してください。
 
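Each of these README hunks keeps the advice to install 🤗 Transformers in a virtual environment created with `venv`. Below is a minimal sketch using the standard-library `venv.EnvBuilder`. `create_env` is a hypothetical helper, and `with_pip=False` is its default here only so the sketch runs offline without `ensurepip`.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(target: Path, with_pip: bool = False) -> Path:
    """Hypothetical helper: create a virtual environment at `target` and return its interpreter."""
    # clear=True wipes an existing directory at `target` before creating the environment.
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True)
    builder.create(target)
    # venv puts executables in Scripts\ on Windows and in bin/ elsewhere.
    if sys.platform == "win32":
        return target / "Scripts" / "python.exe"
    return target / "bin" / "python"
```

For a usable environment, pass `with_pip=True`, then run `-m pip install transformers` with the returned interpreter path.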
diff --git a/i18n/README_ko.md b/i18n/README_ko.md
index 6d6559398e..fded56a37c 100644
--- a/i18n/README_ko.md
+++ b/i18n/README_ko.md
@@ -242,7 +242,7 @@ Transformers에 달린 100,000개의 별을 축하하기 위해, 우리는 커
 
 ### pip로 설치하기
 
-이 저장소는 Python 3.9+, Flax 0.4.1+, PyTorch 2.0+, TensorFlow 2.6+에서 테스트 되었습니다.
+이 저장소는 Python 3.9+, Flax 0.4.1+, PyTorch 2.1+, TensorFlow 2.6+에서 테스트 되었습니다.
 
 [가상 환경](https://docs.python.org/3/library/venv.html)에 🤗 Transformers를 설치하세요. Python 가상 환경에 익숙하지 않다면, [사용자 가이드](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)를 확인하세요.
 
diff --git a/i18n/README_pt-br.md b/i18n/README_pt-br.md
index f865f1b6ed..e3c71c6a3f 100644
--- a/i18n/README_pt-br.md
+++ b/i18n/README_pt-br.md
@@ -253,7 +253,7 @@ O modelo em si é um [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.ht
 
 ### Com pip
 
-Este repositório é testado no Python 3.9+, Flax 0.4.1+, PyTorch 2.0+ e TensorFlow 2.6+.
+Este repositório é testado no Python 3.9+, Flax 0.4.1+, PyTorch 2.1+ e TensorFlow 2.6+.
 
 Você deve instalar o 🤗 Transformers em um [ambiente virtual](https://docs.python.org/3/library/venv.html). Se você não está familiarizado com ambientes virtuais em Python, confira o [guia do usuário](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
 
diff --git a/i18n/README_ru.md b/i18n/README_ru.md
index c153474f33..c30237fef8 100644
--- a/i18n/README_ru.md
+++ b/i18n/README_ru.md
@@ -244,7 +244,7 @@ Hugging Face Hub. Мы хотим, чтобы Transformers позволил ра
 
 ### С помощью pip
 
-Данный репозиторий протестирован на Python 3.9+, Flax 0.4.1+, PyTorch 2.0+ и TensorFlow 2.6+.
+Данный репозиторий протестирован на Python 3.9+, Flax 0.4.1+, PyTorch 2.1+ и TensorFlow 2.6+.
 
 Устанавливать 🤗 Transformers следует в [виртуальной среде](https://docs.python.org/3/library/venv.html). Если вы не знакомы с виртуальными средами Python, ознакомьтесь с [руководством пользователя](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
 
diff --git a/i18n/README_te.md b/i18n/README_te.md
index 791ed6414f..aee579b52a 100644
--- a/i18n/README_te.md
+++ b/i18n/README_te.md
@@ -246,7 +246,7 @@ limitations under the License.
 
 ### పిప్ తో
 
-ఈ రిపోజిటరీ పైథాన్ 3.9+, ఫ్లాక్స్ 0.4.1+, PyTorch 2.0+ మరియు TensorFlow 2.6+లో పరీక్షించబడింది.
+ఈ రిపోజిటరీ పైథాన్ 3.9+, ఫ్లాక్స్ 0.4.1+, PyTorch 2.1+ మరియు TensorFlow 2.6+లో పరీక్షించబడింది.
 
 మీరు [వర్చువల్ వాతావరణం](https://docs.python.org/3/library/venv.html)లో 🤗 ట్రాన్స్‌ఫార్మర్‌లను ఇన్‌స్టాల్ చేయాలి. మీకు పైథాన్ వర్చువల్ పరిసరాల గురించి తెలియకుంటే, [యూజర్ గైడ్](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) చూడండి.
 
diff --git a/i18n/README_ur.md b/i18n/README_ur.md
index 2d4d7745f6..bba5988e77 100644
--- a/i18n/README_ur.md
+++ b/i18n/README_ur.md
@@ -259,7 +259,7 @@ limitations under the License.
 
 #### ‏ pip کے ساتھ
 
-یہ ریپوزٹری Python 3.9+، Flax 0.4.1+، PyTorch 2.0+، اور TensorFlow 2.6+ پر ٹیسٹ کی گئی ہے۔
+یہ ریپوزٹری Python 3.9+، Flax 0.4.1+، PyTorch 2.1+، اور TensorFlow 2.6+ پر ٹیسٹ کی گئی ہے۔
 
 آپ کو 🤗 Transformers کو ایک [ورچوئل ماحول](https://docs.python.org/3/library/venv.html) میں انسٹال کرنا چاہیے۔ اگر آپ Python ورچوئل ماحول سے واقف نہیں ہیں، تو [یوزر گائیڈ](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) دیکھیں۔
 
diff --git a/i18n/README_vi.md b/i18n/README_vi.md
index 4f7f67bfce..f78e3b6d4e 100644
--- a/i18n/README_vi.md
+++ b/i18n/README_vi.md
@@ -245,7 +245,7 @@ Chính mô hình là một [Pytorch `nn.Module`](https://pytorch.org/docs/stable
 
 ### Sử dụng pip
 
-Thư viện này được kiểm tra trên Python 3.9+, Flax 0.4.1+, PyTorch 2.0+ và TensorFlow 2.6+.
+Thư viện này được kiểm tra trên Python 3.9+, Flax 0.4.1+, PyTorch 2.1+ và TensorFlow 2.6+.
 
 Bạn nên cài đặt 🤗 Transformers trong một [môi trường ảo Python](https://docs.python.org/3/library/venv.html).
 Nếu bạn chưa quen với môi trường ảo Python, hãy xem [hướng dẫn sử dụng](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
diff --git a/i18n/README_zh-hans.md b/i18n/README_zh-hans.md
index 637aba3174..22e7db3918 100644
--- a/i18n/README_zh-hans.md
+++ b/i18n/README_zh-hans.md
@@ -198,7 +198,7 @@ checkpoint: 检查点
 
 ### 使用 pip
 
-这个仓库已在 Python 3.9+、Flax 0.4.1+、PyTorch 2.0+ 和 TensorFlow 2.6+ 下经过测试。
+这个仓库已在 Python 3.9+、Flax 0.4.1+、PyTorch 2.1+ 和 TensorFlow 2.6+ 下经过测试。
 
 你可以在[虚拟环境](https://docs.python.org/3/library/venv.html)中安装 🤗 Transformers。如果你还不熟悉 Python 的虚拟环境,请阅此[用户说明](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)。
 
diff --git a/i18n/README_zh-hant.md b/i18n/README_zh-hant.md
index dcafd4958e..9bd494552c 100644
--- a/i18n/README_zh-hant.md
+++ b/i18n/README_zh-hant.md
@@ -210,7 +210,7 @@ Tokenizer 為所有的預訓練模型提供了預處理,並可以直接轉換
 
 ### 使用 pip
 
-這個 Repository 已在 Python 3.9+、Flax 0.4.1+、PyTorch 2.0+ 和 TensorFlow 2.6+ 下經過測試。
+這個 Repository 已在 Python 3.9+、Flax 0.4.1+、PyTorch 2.1+ 和 TensorFlow 2.6+ 下經過測試。
 
 你可以在[虛擬環境](https://docs.python.org/3/library/venv.html)中安裝 🤗 Transformers。如果你還不熟悉 Python 的虛擬環境,請閱此[使用者指引](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)。
 
diff --git a/setup.py b/setup.py
index bd21d56979..10dd08651d 100644
--- a/setup.py
+++ b/setup.py
@@ -187,7 +187,7 @@ _deps = [
     "tiktoken",
     "timm<=1.0.11",
     "tokenizers>=0.21,<0.22",
-    "torch>=2.0",
+    "torch>=2.1",
     "torchaudio",
     "torchvision",
     "pyctcdecode>=0.4.0",
diff --git a/src/transformers/dependency_versions_table.py b/src/transformers/dependency_versions_table.py
index 86cc5debec..aac2d9da3d 100644
--- a/src/transformers/dependency_versions_table.py
+++ b/src/transformers/dependency_versions_table.py
@@ -92,7 +92,7 @@ deps = {
     "tiktoken": "tiktoken",
     "timm": "timm<=1.0.11",
     "tokenizers": "tokenizers>=0.21,<0.22",
-    "torch": "torch>=2.0",
+    "torch": "torch>=2.1",
     "torchaudio": "torchaudio",
     "torchvision": "torchvision",
     "pyctcdecode": "pyctcdecode>=0.4.0",
diff --git a/src/transformers/modeling_utils.py b/src/transformers/modeling_utils.py
index a462d1e348..470bbe4ad9 100644
--- a/src/transformers/modeling_utils.py
+++ b/src/transformers/modeling_utils.py
@@ -485,20 +485,15 @@ str_to_torch_dtype = {
     "F64": torch.float64,
     "I64": torch.int64,
     "F8_E4M3": torch.float8_e4m3fn,
+    "F8_E5M2": torch.float8_e5m2,
 }
 
-if is_torch_greater_or_equal("2.1.0"):
-    str_to_torch_dtype["F8_E4M3"] = torch.float8_e4m3fn
 
 if is_torch_greater_or_equal("2.3.0"):
     str_to_torch_dtype["U16"] = torch.uint16
     str_to_torch_dtype["U32"] = torch.uint32
     str_to_torch_dtype["U64"] = torch.uint64
 
-if is_torch_greater_or_equal("2.1.0"):
-    str_to_torch_dtype["F8_E4M3"] = torch.float8_e4m3fn
-    str_to_torch_dtype["F8_E5M2"] = torch.float8_e5m2
-
 
 def load_state_dict(
     checkpoint_file: Union[str, os.PathLike],
@@ -546,12 +541,7 @@ def load_state_dict(
                 map_location = "cpu"
         extra_args = {}
         # mmap can only be used with files serialized with zipfile-based format.
-        if (
-            isinstance(checkpoint_file, str)
-            and map_location != "meta"
-            and version.parse(torch.__version__) >= version.parse("2.1.0")
-            and is_zipfile(checkpoint_file)
-        ):
+        if isinstance(checkpoint_file, str) and map_location != "meta" and is_zipfile(checkpoint_file):
             extra_args = {"mmap": True}
         return torch.load(
             checkpoint_file,
diff --git a/src/transformers/models/mask2former/modeling_mask2former.py b/src/transformers/models/mask2former/modeling_mask2former.py
index e4fba109a0..60d37ff35b 100644
--- a/src/transformers/models/mask2former/modeling_mask2former.py
+++ b/src/transformers/models/mask2former/modeling_mask2former.py
@@ -34,10 +34,8 @@ from ...file_utils import (
 )
 from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithCrossAttentions
 from ...modeling_utils import PreTrainedModel
-from ...pytorch_utils import is_torch_greater_or_equal_than_2_1
 from ...utils import is_accelerate_available, logging
 from ...utils.backbone_utils import load_backbone
-from ...utils.import_utils import is_torchdynamo_compiling
 from .configuration_mask2former import Mask2FormerConfig
 
 
@@ -2018,18 +2016,8 @@ class Mask2FormerMaskPredictor(nn.Module):
     ):
         mask_embeddings = self.mask_embedder(outputs.transpose(0, 1))
 
-        is_tracing = torch.jit.is_tracing() or isinstance(outputs, torch.fx.Proxy) or is_torchdynamo_compiling()
         # Sum up over the channels
-        if is_tracing and not is_torch_greater_or_equal_than_2_1:
-            # Equivalent to einsum('bqc, bchw -> bqhw') but jit friendly
-            batch_size, num_queries, num_channels = mask_embeddings.shape
-            _, _, height, width = pixel_embeddings.shape
-            outputs_mask = torch.zeros((batch_size, num_queries, height, width), device=mask_embeddings.device)
-            for c in range(num_channels):
-                outputs_mask += mask_embeddings[..., c][..., None, None] * pixel_embeddings[:, None, c]
-
-        else:
-            outputs_mask = torch.einsum("bqc, bchw -> bqhw", mask_embeddings, pixel_embeddings)
+        outputs_mask = torch.einsum("bqc, bchw -> bqhw", mask_embeddings, pixel_embeddings)
 
         attention_mask = nn.functional.interpolate(
             outputs_mask, size=attention_mask_target_size, mode="bilinear", align_corners=False
diff --git a/src/transformers/models/maskformer/modeling_maskformer.py b/src/transformers/models/maskformer/modeling_maskformer.py
index 5c1873b4d6..4f83cc59da 100644
--- a/src/transformers/models/maskformer/modeling_maskformer.py
+++ b/src/transformers/models/maskformer/modeling_maskformer.py
@@ -27,7 +27,6 @@ from ...activations import ACT2FN
 from ...modeling_attn_mask_utils import _prepare_4d_attention_mask
 from ...modeling_outputs import BaseModelOutputWithCrossAttentions
 from ...modeling_utils import PreTrainedModel
-from ...pytorch_utils import is_torch_greater_or_equal_than_2_1
 from ...utils import (
     ModelOutput,
     add_start_docstrings,
@@ -39,7 +38,6 @@ from ...utils import (
     requires_backends,
 )
 from ...utils.backbone_utils import load_backbone
-from ...utils.import_utils import is_torchdynamo_compiling
 from ..detr import DetrConfig
 from .configuration_maskformer import MaskFormerConfig
 from .configuration_maskformer_swin import MaskFormerSwinConfig
@@ -1685,7 +1683,6 @@ class MaskFormerForInstanceSegmentation(MaskFormerPreTrainedModel):
         # get the auxiliary predictions (one for each decoder's layer)
         auxiliary_logits: List[str, Tensor] = []
 
-        is_tracing = torch.jit.is_tracing() or isinstance(outputs, torch.fx.Proxy) or is_torchdynamo_compiling()
         # This code is a little bit cumbersome, an improvement can be to return a list of predictions. If we have auxiliary loss then we are going to return more than one element in the list
         if self.config.use_auxiliary_loss:
             stacked_transformer_decoder_outputs = torch.stack(outputs.transformer_decoder_hidden_states)
@@ -1693,18 +1690,7 @@ class MaskFormerForInstanceSegmentation(MaskFormerPreTrainedModel):
             class_queries_logits = classes[-1]
             # get the masks
             mask_embeddings = self.mask_embedder(stacked_transformer_decoder_outputs)
-
-            if is_tracing and not is_torch_greater_or_equal_than_2_1:
-                # Equivalent to einsum('lbqc, bchw -> lbqhw') but jit friendly
-                num_embeddings, batch_size, num_queries, num_channels = mask_embeddings.shape
-                _, _, height, width = pixel_embeddings.shape
-                binaries_masks = torch.zeros(
-                    (num_embeddings, batch_size, num_queries, height, width), device=mask_embeddings.device
-                )
-                for c in range(num_channels):
-                    binaries_masks += mask_embeddings[..., c][..., None, None] * pixel_embeddings[None, :, None, c]
-            else:
-                binaries_masks = torch.einsum("lbqc, bchw -> lbqhw", mask_embeddings, pixel_embeddings)
+            binaries_masks = torch.einsum("lbqc, bchw -> lbqhw", mask_embeddings, pixel_embeddings)
 
             masks_queries_logits = binaries_masks[-1]
             # go til [:-1] because the last one is always used
@@ -1720,18 +1706,7 @@ class MaskFormerForInstanceSegmentation(MaskFormerPreTrainedModel):
             # get the masks
             mask_embeddings = self.mask_embedder(transformer_decoder_hidden_states)
             # sum up over the channels
-
-            if is_tracing and not is_torch_greater_or_equal_than_2_1:
-                # Equivalent to einsum('bqc, bchw -> bqhw') but jit friendly
-                batch_size, num_queries, num_channels = mask_embeddings.shape
-                _, _, height, width = pixel_embeddings.shape
-                masks_queries_logits = torch.zeros(
-                    (batch_size, num_queries, height, width), device=mask_embeddings.device
-                )
-                for c in range(num_channels):
-                    masks_queries_logits += mask_embeddings[..., c][..., None, None] * pixel_embeddings[:, None, c]
-            else:
-                masks_queries_logits = torch.einsum("bqc, bchw -> bqhw", mask_embeddings, pixel_embeddings)
+            masks_queries_logits = torch.einsum("bqc, bchw -> bqhw", mask_embeddings, pixel_embeddings)
 
         return class_queries_logits, masks_queries_logits, auxiliary_logits
 
diff --git a/src/transformers/pytorch_utils.py b/src/transformers/pytorch_utils.py
index ff3856c9d4..b2fe912535 100644
--- a/src/transformers/pytorch_utils.py
+++ b/src/transformers/pytorch_utils.py
@@ -32,9 +32,9 @@ is_torch_greater_or_equal_than_2_6 = is_torch_greater_or_equal("2.6", accept_dev
 is_torch_greater_or_equal_than_2_4 = is_torch_greater_or_equal("2.4", accept_dev=True)
 is_torch_greater_or_equal_than_2_3 = is_torch_greater_or_equal("2.3", accept_dev=True)
 is_torch_greater_or_equal_than_2_2 = is_torch_greater_or_equal("2.2", accept_dev=True)
-is_torch_greater_or_equal_than_2_1 = is_torch_greater_or_equal("2.1", accept_dev=True)
 
 # For backwards compatibility (e.g. some remote codes on Hub using those variables).
+is_torch_greater_or_equal_than_2_1 = is_torch_greater_or_equal("2.1", accept_dev=True)
 is_torch_greater_or_equal_than_2_0 = is_torch_greater_or_equal("2.0", accept_dev=True)
 is_torch_greater_or_equal_than_1_13 = is_torch_greater_or_equal("1.13", accept_dev=True)
 is_torch_greater_or_equal_than_1_12 = is_torch_greater_or_equal("1.12", accept_dev=True)
diff --git a/src/transformers/quantizers/quantizer_fbgemm_fp8.py b/src/transformers/quantizers/quantizer_fbgemm_fp8.py
index f0fa8f063b..7b84c4685c 100644
--- a/src/transformers/quantizers/quantizer_fbgemm_fp8.py
+++ b/src/transformers/quantizers/quantizer_fbgemm_fp8.py
@@ -11,11 +11,8 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import importlib
 from typing import TYPE_CHECKING, Any, Dict, List, Optional
 
-from packaging import version
-
 from .base import HfQuantizer
 
 
@@ -48,9 +45,9 @@ class FbgemmFp8HfQuantizer(HfQuantizer):
         self.quantization_config = quantization_config
 
     def validate_environment(self, *args, **kwargs):
-        if not is_torch_available() or version.parse(importlib.metadata.version("torch")) < version.parse("2.1.0"):
+        if not is_torch_available():
             raise ImportError(
-                "Using fbgemm fp8 quantization requires torch > 2.1.0"
+                "Using fbgemm fp8 quantization requires torch >= 2.1.0. "
                 "Please install the latest version of torch ( pip install --upgrade torch )"
             )
         if not is_fbgemm_gpu_available():
diff --git a/src/transformers/quantizers/quantizer_finegrained_fp8.py b/src/transformers/quantizers/quantizer_finegrained_fp8.py
index 16ce7f6a9e..cbfc2167b3 100644
--- a/src/transformers/quantizers/quantizer_finegrained_fp8.py
+++ b/src/transformers/quantizers/quantizer_finegrained_fp8.py
@@ -1,8 +1,5 @@
-import importlib
 from typing import TYPE_CHECKING, Any, Dict, List, Optional
 
-from packaging import version
-
 from ..utils import is_accelerate_available, is_torch_available, logging
 from .base import HfQuantizer
 from .quantizers_utils import get_module_from_name
@@ -32,7 +29,7 @@ class FineGrainedFP8HfQuantizer(HfQuantizer):
         self.quantization_config = quantization_config
 
     def validate_environment(self, *args, **kwargs):
-        if not is_torch_available() or version.parse(importlib.metadata.version("torch")) < version.parse("2.1.0"):
+        if not is_torch_available():
             raise ImportError(
                 "Using fp8 quantization requires torch >= 2.1.0"
                 "Please install the latest version of torch ( pip install --upgrade torch )"
diff --git a/src/transformers/trainer.py b/src/transformers/trainer.py
index 727c2ca7fe..d6915345fa 100755
--- a/src/transformers/trainer.py
+++ b/src/transformers/trainer.py
@@ -1902,22 +1902,14 @@ class Trainer:
                     jit_model.forward = original_forward
                 autocast_handler = AutocastKwargs(cache_enabled=False)
                 with self.accelerator.autocast(autocast_handler=autocast_handler), torch.no_grad():
-                    if version.parse(version.parse(torch.__version__).base_version) >= version.parse("2.0.0"):
-                        if isinstance(example_batch, dict):
-                            jit_model = torch.jit.trace(jit_model, example_kwarg_inputs=example_batch, strict=False)
-                        else:
-                            jit_model = torch.jit.trace(
-                                jit_model,
-                                example_kwarg_inputs={key: example_batch[key] for key in example_batch},
-                                strict=False,
-                            )
+                    if isinstance(example_batch, dict):
+                        jit_model = torch.jit.trace(jit_model, example_kwarg_inputs=example_batch, strict=False)
                     else:
-                        jit_inputs = []
-                        for key in example_batch:
-                            example_tensor = torch.ones_like(example_batch[key])
-                            jit_inputs.append(example_tensor)
-                        jit_inputs = tuple(jit_inputs)
-                        jit_model = torch.jit.trace(jit_model, jit_inputs, strict=False)
+                        jit_model = torch.jit.trace(
+                            jit_model,
+                            example_kwarg_inputs={key: example_batch[key] for key in example_batch},
+                            strict=False,
+                        )
                 jit_model = torch.jit.freeze(jit_model)
                 with torch.no_grad():
                     jit_model(**example_batch)
diff --git a/src/transformers/training_args.py b/src/transformers/training_args.py
index 0e153df47a..4fc3a76329 100644
--- a/src/transformers/training_args.py
+++ b/src/transformers/training_args.py
@@ -24,7 +24,6 @@ from pathlib import Path
 from typing import Any, Optional, Union
 
 from huggingface_hub import get_full_repo_name
-from packaging import version
 
 from .debug_utils import DebugOption
 from .trainer_utils import (
@@ -1290,7 +1289,7 @@ class TrainingArguments:
 
     default_optim = "adamw_torch"
     # XXX: enable when pytorch==2.0.1 comes out - we want to give it time to get all the bugs sorted out
-    # if is_torch_available() and version.parse(version.parse(torch.__version__).base_version) >= version.parse("2.1.0"):
+    # if is_torch_available():
     #     default_optim = "adamw_torch_fused"
     # and update the doc above to:
     # optim (`str` or [`training_args.OptimizerNames`], *optional*, defaults to `"adamw_torch_fused"` (for torch<2.1.0 `"adamw_torch"`):
@@ -1732,12 +1731,6 @@ class TrainingArguments:
                 FutureWarning,
             )
             self.optim = OptimizerNames.ADAFACTOR
-        if self.optim == OptimizerNames.ADAMW_TORCH_FUSED and is_torch_available():
-            if version.parse(version.parse(torch.__version__).base_version) < version.parse("2.0.0"):
-                raise ValueError("--optim adamw_torch_fused requires PyTorch 2.0 or higher")
-            # there is a bug in fp16/AMP in pt-2.0.0
-            if version.parse(version.parse(torch.__version__).base_version) == version.parse("2.0.0") and self.fp16:
-                raise ValueError("--optim adamw_torch_fused with --fp16 requires PyTorch>2.0")
 
         # We need to setup the accelerator config here *before* the first call to `self.device`
         if is_accelerate_available():
diff --git a/src/transformers/utils/import_utils.py b/src/transformers/utils/import_utils.py
index 61fb91e5a8..184b618e7e 100644
--- a/src/transformers/utils/import_utils.py
+++ b/src/transformers/utils/import_utils.py
@@ -379,15 +379,12 @@ def is_torch_sdpa_available():
     elif _torch_version == "N/A":
         return False
 
-    # NOTE: We require torch>=2.1 (and not torch>=2.0) to use SDPA in Transformers for two reasons:
-    # - Allow the global use of the `scale` argument introduced in https://github.com/pytorch/pytorch/pull/95259
-    # - Memory-efficient attention supports arbitrary attention_mask: https://github.com/pytorch/pytorch/pull/104310
     # NOTE: MLU is OK with non-contiguous inputs.
     if is_torch_mlu_available():
-        return version.parse(_torch_version) >= version.parse("2.1.0")
+        return True
     # NOTE: NPU can use SDPA in Transformers with torch>=2.1.0.
     if is_torch_npu_available():
-        return version.parse(_torch_version) >= version.parse("2.1.0")
+        return True
     # NOTE: We require torch>=2.1.1 to avoid a numerical issue in SDPA with non-contiguous inputs: https://github.com/pytorch/pytorch/issues/112577
     return version.parse(_torch_version) >= version.parse("2.1.1")
 
@@ -833,7 +830,7 @@ def is_torchdynamo_available():
     if not is_torch_available():
         return False
 
-    return version.parse(_torch_version) >= version.parse("2.0.0")
+    return True
 
 
 def is_torch_compile_available():
diff --git a/tests/fsdp/test_fsdp.py b/tests/fsdp/test_fsdp.py
index 48024298ed..68309d94ea 100644
--- a/tests/fsdp/test_fsdp.py
+++ b/tests/fsdp/test_fsdp.py
@@ -47,10 +47,7 @@ from transformers.utils import (
 
 
 if is_torch_available():
-    from transformers.pytorch_utils import is_torch_greater_or_equal_than_2_1
     from transformers.trainer import FSDP_MODEL_NAME
-else:
-    is_torch_greater_or_equal_than_2_1 = False
 
 # default torch.distributed port
 DEFAULT_MASTER_PORT = "10999"
@@ -260,7 +257,6 @@ class TrainerIntegrationFSDP(TestCasePlus, TrainerIntegrationCommon):
     @require_torch_multi_accelerator
     @run_first
     @slow
-    @unittest.skipIf(not is_torch_greater_or_equal_than_2_1, reason="This test on pytorch 2.0 takes 4 hours.")
     def test_basic_run_with_cpu_offload(self, dtype):
         launcher = get_launcher(distributed=True, use_accelerate=False)
         output_dir = self.get_auto_remove_tmp_dir()
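The mask2former and maskformer hunks delete a per-channel loop that only ran when tracing on torch<2.1 (`is_tracing and not is_torch_greater_or_equal_than_2_1`). With `torch>=2.1` as the floor, that branch is unreachable, and the `torch.einsum` call is the only remaining path. The minimal pure-Python sketch below shows that the two formulations compute the same contraction, out[b][q][h][w] = Σ_c mask[b][q][c] · pixel[b][c][h][w]. Nested lists stand in for tensors, so no torch is needed, and both function names are hypothetical.

```python
from typing import List

Tensor3 = List[List[List[float]]]        # shape (B, Q, C)
Tensor4 = List[List[List[List[float]]]]  # shape (B, C, H, W) or (B, Q, H, W)


def einsum_bqc_bchw(mask: Tensor3, pixel: Tensor4) -> Tensor4:
    """Direct definition of einsum("bqc, bchw -> bqhw"): sum over the shared channel axis c."""
    B, Q, C = len(mask), len(mask[0]), len(mask[0][0])
    H, W = len(pixel[0][0]), len(pixel[0][0][0])
    return [
        [
            [[sum(mask[b][q][c] * pixel[b][c][h][w] for c in range(C)) for w in range(W)] for h in range(H)]
            for q in range(Q)
        ]
        for b in range(B)
    ]


def channel_loop(mask: Tensor3, pixel: Tensor4) -> Tensor4:
    """Mirror of the removed fallback: start from zeros and add one channel's contribution at a time."""
    B, Q, C = len(mask), len(mask[0]), len(mask[0][0])
    H, W = len(pixel[0][0]), len(pixel[0][0][0])
    out = [[[[0.0] * W for _ in range(H)] for _ in range(Q)] for _ in range(B)]
    for c in range(C):
        for b in range(B):
            for q in range(Q):
                for h in range(H):
                    for w in range(W):
                        # Same as mask[..., c][..., None, None] * pixel[:, None, c], broadcast over q, h, w.
                        out[b][q][h][w] += mask[b][q][c] * pixel[b][c][h][w]
    return out
```

Both functions add channels in the same ascending order, so on identical inputs they produce identical floats rather than merely close ones.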