update doc (#34478)
* update doc

* Update docs/source/en/perf_train_cpu.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* delete closing tip

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@@ -30,46 +30,32 @@ Check more detailed information for [oneccl_bind_pt](https://github.com/intel/to
 
 Wheel files are available for the following Python versions:
 
-| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
-| :---------------: | :--------: | :--------: | :--------: | :--------: | :---------: |
-| 2.1.0 | | √ | √ | √ | √ |
-| 2.0.0 | | √ | √ | √ | √ |
-| 1.13.0 | | √ | √ | √ | √ |
-| 1.12.100 | | √ | √ | √ | √ |
-| 1.12.0 | | √ | √ | √ | √ |
+| Extension Version | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 | Python 3.11 |
+| :---------------: | :--------: | :--------: | :--------: | :---------: | :---------: |
+| 2.5.0 | | √ | √ | √ | √ |
+| 2.4.0 | | √ | √ | √ | √ |
+| 2.3.0 | | √ | √ | √ | √ |
+| 2.2.0 | | √ | √ | √ | √ |
 
 Please run `pip list | grep torch` to get your `pytorch_version`.
 ```bash
 pip install oneccl_bind_pt=={pytorch_version} -f https://developer.intel.com/ipex-whl-stable-cpu
 ```
-where `{pytorch_version}` should be your PyTorch version, for instance 2.1.0.
+where `{pytorch_version}` should be your PyTorch version, for instance 2.4.0.
 Check more approaches for [oneccl_bind_pt installation](https://github.com/intel/torch-ccl).
 Versions of oneCCL and PyTorch must match.
-
-<Tip warning={true}>
-
-oneccl_bindings_for_pytorch 1.12.0 prebuilt wheel does not work with PyTorch 1.12.1 (it is for PyTorch 1.12.0)
-PyTorch 1.12.1 should work with oneccl_bindings_for_pytorch 1.12.100
-
-</Tip>
 
 ## Intel® MPI library
 Use this standards-based MPI implementation to deliver flexible, efficient, scalable cluster messaging on Intel® architecture. This component is part of the Intel® oneAPI HPC Toolkit.
 
 oneccl_bindings_for_pytorch is installed along with the MPI tool set. Need to source the environment before using it.
 
 for Intel® oneCCL >= 1.12.0
 ```bash
 oneccl_bindings_for_pytorch_path=$(python -c "from oneccl_bindings_for_pytorch import cwd; print(cwd)")
 source $oneccl_bindings_for_pytorch_path/env/setvars.sh
 ```
-
-for Intel® oneCCL whose version < 1.12.0
-```bash
-torch_ccl_path=$(python -c "import torch; import torch_ccl; import os; print(os.path.abspath(os.path.dirname(torch_ccl.__file__)))")
-source $torch_ccl_path/env/setvars.sh
-```
 
 #### Intel® Extension for PyTorch installation
 
 Intel Extension for PyTorch (IPEX) provides performance optimizations for CPU training with both Float32 and BFloat16 (refer to the [single CPU section](./perf_train_cpu) to learn more).
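The install step in the hunk above takes `{pytorch_version}` from `pip list | grep torch`. A minimal Python sketch of that substitution follows, assuming the installed version string may carry a local build suffix such as `+cpu` that has to be dropped before it is used in the `oneccl_bind_pt==` pin. The helper names `base_version` and `oneccl_install_command` are hypothetical and not part of any Intel or PyTorch package; only the package name and the wheel index URL come from the diff.

```python
import re
from typing import List

# Wheel index URL copied from the install command in the diff.
WHEEL_INDEX = "https://developer.intel.com/ipex-whl-stable-cpu"


def base_version(version: str) -> str:
    """Return the leading dotted-number part of a version string.

    Hypothetical helper: it only drops whatever follows the numeric
    release (for example a "+cpu" suffix); it does not validate the rest.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(0)


def oneccl_install_command(torch_version: str) -> List[str]:
    """Build the pip command from the diff for a given PyTorch version.

    Assumption: the wheel is pinned to the same number as PyTorch,
    which is what the documented command does.
    """
    requirement = f"oneccl_bind_pt=={base_version(torch_version)}"
    return ["pip", "install", requirement, "-f", WHEEL_INDEX]


if __name__ == "__main__":
    # In practice the argument would be torch.__version__.
    print(" ".join(oneccl_install_command("2.4.0+cpu")))
```

The sketch does not check that a wheel with that exact number exists. The version table in the diff lists the extension releases, and a PyTorch patch release may need a differently numbered wheel, as the removed tip about PyTorch 1.12.1 and oneccl_bindings_for_pytorch 1.12.100 shows.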
@@ -155,7 +141,7 @@ This example assumes that you have:
 The snippet below is an example of a Dockerfile that uses a base image that supports distributed CPU training and then
 extracts a Transformers release to the `/workspace` directory, so that the example scripts are included in the image:
 ```dockerfile
-FROM intel/intel-optimized-pytorch:2.3.0-pip-multinode
+FROM intel/intel-optimized-pytorch:2.4.0-pip-multinode
 
 RUN apt-get update -y && \
 apt-get install -y --no-install-recommends --fix-missing \
@@ -165,7 +151,7 @@ RUN apt-get update -y && \
 WORKDIR /workspace
 
 # Download and extract the transformers code
-ARG HF_TRANSFORMERS_VER="4.44.0"
+ARG HF_TRANSFORMERS_VER="4.46.0"
 RUN pip install --no-cache-dir \
 transformers==${HF_TRANSFORMERS_VER} && \
 mkdir transformers && \
@@ -319,4 +305,4 @@ with the job, the PyTorchJob resource can be deleted from the cluster using `kub
 
 This guide covered running distributed PyTorch training jobs using multiple CPUs on bare metal and on a Kubernetes
 cluster. Both cases utilize Intel Extension for PyTorch and Intel oneCCL Bindings for PyTorch for optimal training
-performance, and can be used as a template to run your own workload on multiple nodes.
+performance, and can be used as a template to run your own workload on multiple nodes.