update cpu related doc (#20444)
@@ -25,22 +25,15 @@ Check more detailed information for [Auto Mixed Precision](https://intel.github.
|
|||||||
|
|
||||||
IPEX releases follow PyTorch releases. To install via pip:
|
IPEX releases follow PyTorch releases. To install via pip:
|
||||||
|
|
||||||
For PyTorch-1.10:
|
| PyTorch Version | IPEX version |
|
||||||
|
| :---------------: | :----------: |
|
||||||
|
| 1.13 | 1.13.0+cpu |
|
||||||
|
| 1.12 | 1.12.300+cpu |
|
||||||
|
| 1.11 | 1.11.200+cpu |
|
||||||
|
| 1.10 | 1.10.100+cpu |
|
||||||
|
|
||||||
```
|
```
|
||||||
pip install intel_extension_for_pytorch==1.10.100+cpu -f https://software.intel.com/ipex-whl-stable
|
pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu
|
||||||
```
|
|
||||||
|
|
||||||
For PyTorch-1.11:
|
|
||||||
|
|
||||||
```
|
|
||||||
pip install intel_extension_for_pytorch==1.11.200+cpu -f https://software.intel.com/ipex-whl-stable
|
|
||||||
```
|
|
||||||
|
|
||||||
For PyTorch-1.12:
|
|
||||||
|
|
||||||
```
|
|
||||||
pip install intel_extension_for_pytorch==1.12.300+cpu -f https://software.intel.com/ipex-whl-stable
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Check more approaches for [IPEX installation](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).
|
Check more approaches for [IPEX installation](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).
|
||||||
|
|||||||
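The new table replaces the per-version install snippets with a single `<version_name>` placeholder, so the reader has to look up the matching IPEX version. As a minimal sketch of that lookup, assuming the table rows and the wheel index URL shown in this hunk (the names `IPEX_VERSIONS`, `WHEEL_INDEX`, and `ipex_pip_command` are hypothetical helpers, not part of IPEX or Transformers):

```python
# Mapping copied from the version table in this hunk (PyTorch -> IPEX version).
IPEX_VERSIONS = {
    "1.13": "1.13.0+cpu",
    "1.12": "1.12.300+cpu",
    "1.11": "1.11.200+cpu",
    "1.10": "1.10.100+cpu",
}

# Wheel index URL taken from the updated pip command in this hunk.
WHEEL_INDEX = "https://developer.intel.com/ipex-whl-stable-cpu"


def ipex_pip_command(torch_version: str) -> str:
    """Build the pip command for the IPEX version matching a PyTorch version."""
    # Drop any local suffix and keep major.minor only: "1.13.1+cpu" -> "1.13".
    major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    if major_minor not in IPEX_VERSIONS:
        raise ValueError(f"no IPEX version listed for PyTorch {torch_version}")
    ipex_version = IPEX_VERSIONS[major_minor]
    return f"pip install intel_extension_for_pytorch=={ipex_version} -f {WHEEL_INDEX}"


if __name__ == "__main__":
    # In practice the argument could come from torch.__version__.
    print(ipex_pip_command("1.13.0"))
```

Versions not listed in the table raise a `ValueError` instead of guessing, since the table in this commit only covers PyTorch 1.10 through 1.13.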
@@ -27,15 +27,16 @@ Wheel files are available for the following Python versions:
|
|||||||
|
|
||||||
| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
|
| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
|
||||||
| :---------------: | :--------: | :--------: | :--------: | :--------: | :---------: |
|
| :---------------: | :--------: | :--------: | :--------: | :--------: | :---------: |
|
||||||
|
| 1.13.0 | | √ | √ | √ | √ |
|
||||||
| 1.12.100 | | √ | √ | √ | √ |
|
| 1.12.100 | | √ | √ | √ | √ |
|
||||||
| 1.12.0 | | √ | √ | √ | √ |
|
| 1.12.0 | | √ | √ | √ | √ |
|
||||||
| 1.11.0 | | √ | √ | √ | √ |
|
| 1.11.0 | | √ | √ | √ | √ |
|
||||||
| 1.10.0 | √ | √ | √ | √ | |
|
| 1.10.0 | √ | √ | √ | √ | |
|
||||||
|
|
||||||
```
|
```
|
||||||
pip install oneccl_bind_pt=={pytorch_version} -f https://software.intel.com/ipex-whl-stable
|
pip install oneccl_bind_pt=={pytorch_version} -f https://developer.intel.com/ipex-whl-stable-cpu
|
||||||
```
|
```
|
||||||
where `{pytorch_version}` should be your PyTorch version, for instance 1.12.0.
|
where `{pytorch_version}` should be your PyTorch version, for instance 1.13.0.
|
||||||
Check more approaches for [oneccl_bind_pt installation](https://github.com/intel/torch-ccl).
|
Check more approaches for [oneccl_bind_pt installation](https://github.com/intel/torch-ccl).
|
||||||
Versions of oneCCL and PyTorch must match.
|
Versions of oneCCL and PyTorch must match.
|
||||||
|
|
||||||
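The wheel table in this hunk can be checked programmatically before running the `pip install oneccl_bind_pt` command. The following is a rough sketch that encodes only the rows shown above; `WHEEL_PYTHONS` and `has_wheel` are hypothetical names introduced for illustration, not part of `oneccl_bind_pt`:

```python
import sys

# Python (major, minor) versions with a wheel, per the table in this hunk.
_PY37_TO_310 = {(3, 7), (3, 8), (3, 9), (3, 10)}
WHEEL_PYTHONS = {
    "1.13.0": _PY37_TO_310,
    "1.12.100": _PY37_TO_310,
    "1.12.0": _PY37_TO_310,
    "1.11.0": _PY37_TO_310,
    "1.10.0": {(3, 6), (3, 7), (3, 8), (3, 9)},
}


def has_wheel(extension_version, python_version=None):
    """Return True if the table lists a wheel for this extension/Python pair."""
    if python_version is None:
        # Default to the interpreter running this script.
        python_version = sys.version_info[:2]
    return tuple(python_version) in WHEEL_PYTHONS.get(extension_version, set())


if __name__ == "__main__":
    print(has_wheel("1.13.0"))
```

Unknown extension versions simply return `False`; the table is a snapshot of this commit and would need updating for later releases.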
@@ -63,6 +64,10 @@ torch_ccl_path=$(python -c "import torch; import torch_ccl; import os; print(os
|
|||||||
source $torch_ccl_path/env/setvars.sh
|
source $torch_ccl_path/env/setvars.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
|
#### IPEX installation:
|
||||||
|
|
||||||
|
IPEX provides performance optimizations for CPU training with both Float32 and BFloat16; refer to the [single CPU section](./perf_train_cpu) for details.
|
||||||
|
|
||||||
|
|
||||||
The following "Usage in Trainer" takes mpirun in Intel® MPI library as an example.
|
The following "Usage in Trainer" takes mpirun in Intel® MPI library as an example.
|
||||||
|
|
||||||
@@ -90,7 +95,8 @@ The following command enables training with 2 processes on one Xeon node, with o
|
|||||||
--doc_stride 128 \
|
--doc_stride 128 \
|
||||||
--output_dir /tmp/debug_squad/ \
|
--output_dir /tmp/debug_squad/ \
|
||||||
--no_cuda \
|
--no_cuda \
|
||||||
--xpu_backend ccl
|
--xpu_backend ccl \
|
||||||
|
--use_ipex
|
||||||
```
|
```
|
||||||
The following command enables training with a total of four processes on two Xeons (node0 and node1, taking node0 as the main process). ppn (processes per node) is set to 2, with one process running per socket. The variables OMP_NUM_THREADS/CCL_WORKER_COUNT can be tuned for optimal performance.
|
The following command enables training with a total of four processes on two Xeons (node0 and node1, taking node0 as the main process). ppn (processes per node) is set to 2, with one process running per socket. The variables OMP_NUM_THREADS/CCL_WORKER_COUNT can be tuned for optimal performance.
|
||||||
|
|
||||||
@@ -100,7 +106,7 @@ In node0, you need to create a configuration file which contains the IP addresse
|
|||||||
xxx.xxx.xxx.xxx #node0 ip
|
xxx.xxx.xxx.xxx #node0 ip
|
||||||
xxx.xxx.xxx.xxx #node1 ip
|
xxx.xxx.xxx.xxx #node1 ip
|
||||||
```
|
```
|
||||||
Now, run the following command in node0 and **4DDP** will be enabled in node0 and node1:
|
Now, run the following command in node0 and **4DDP** will be enabled in node0 and node1 with BF16 auto mixed precision:
|
||||||
```shell script
|
```shell script
|
||||||
export CCL_WORKER_COUNT=1
|
export CCL_WORKER_COUNT=1
|
||||||
export MASTER_ADDR=xxx.xxx.xxx.xxx #node0 ip
|
export MASTER_ADDR=xxx.xxx.xxx.xxx #node0 ip
|
||||||
@@ -118,5 +124,7 @@ Now, run the following command in node0 and **4DDP** will be enabled in node0 an
|
|||||||
--doc_stride 128 \
|
--doc_stride 128 \
|
||||||
--output_dir /tmp/debug_squad/ \
|
--output_dir /tmp/debug_squad/ \
|
||||||
--no_cuda \
|
--no_cuda \
|
||||||
--xpu_backend ccl
|
--xpu_backend ccl \
|
||||||
```
|
--use_ipex \
|
||||||
|
--bf16
|
||||||
|
```
|
||||||
|
|||||||