[bnb] Minor modifications (#18631)

* bnb minor modifications - refactor documentation - add troubleshooting README - add PyPi library on DockerFile * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * put in one block - put bash instructions in one block * update readme - refactor a bit hardware requirements * change text a bit * Apply suggestions from code review Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * apply suggestions Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * add link to paper * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Update tests/mixed_int8/README.md * Apply suggestions from code review * refactor a bit * add instructions Turing & Amperer Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * add A6000 * clarify a bit * remove small part * Update tests/mixed_int8/README.md Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2022-08-17 00:48:10 +02:00
parent 25e651a2de
commit 6d175c1129
4 changed files with 154 additions and 58 deletions
--- a/tests/mixed_int8/README.md
+++ b/tests/mixed_int8/README.md
@@ -1,37 +1,120 @@
 # Testing mixed int8 quantization

+![HFxbitsandbytes.png](https://s3.amazonaws.com/moonup/production/uploads/1660567705337-62441d1d9fdefb55a0b7d12c.png)
+
+The following is the recipe on how to effectively debug `bitsandbytes` integration on Hugging Face `transformers`.
+
+## Library requirements
+
+ `transformers>=4.22.0`
+ `accelerate>=0.12.0` 
+ `bitsandbytes>=0.31.5`.
 ## Hardware requirements

-I am using a setup of 2 GPUs that are NVIDIA-Tesla T4 15GB
+The following instructions are tested with 2 NVIDIA-Tesla T4 GPUs. To run successfully `bitsandbytes` you would need a 8-bit core tensor supported GPU. Note that Turing, Ampere or newer architectures - e.g. T4, RTX20s RTX30s, A40-A100, A6000 should be supported. 

 ## Virutal envs

-```conda create --name int8-testing python==3.8```
-```git clone https://github.com/younesbelkada/transformers.git && git checkout integration-8bit```
-```pip install -e ".[dev]"```
-```pip install -i https://test.pypi.org/simple/ bitsandbytes```
-```pip install git+https://github.com/huggingface/accelerate.git@e0212893ea6098cc0a7a3c7a6eb286a9104214c1```
+```bash
+conda create --name int8-testing python==3.8
+pip install bitsandbytes>=0.31.5
+pip install accelerate>=0.12.0
+pip install transformers>=4.23.0
+```
+if `transformers>=4.23.0` is not released yet, then use:
+```
+pip install git+https://github.com/huggingface/transformers.git
+```

+## Troubleshooting

-## Trobleshooting
+A list of common errors:

-```conda create --name int8-testing python==3.8```
-```pip install -i https://test.pypi.org/simple/ bitsandbytes```
-```conda install pytorch torchvision torchaudio -c pytorch```
-```git clone https://github.com/younesbelkada/transformers.git && git checkout integration-8bit```
-```pip install -e ".[dev]"```
-```pip install git+https://github.com/huggingface/accelerate.git@b52b793ea8bac108ba61192eead3cf11ca02433d```
+### Torch does not correctly do the operations on GPU

-### Check driver settings:
+First check that:

+```py
+import torch
+
+vec = torch.randn(1, 2, 3).to(0)
+```
+
+Works without any error. If not, install torch using `conda` like:
+
+```bash
+conda create --name int8-testing python==3.8
+conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
+pip install bitsandbytes>=0.31.5
+pip install accelerate>=0.12.0
+pip install transformers>=4.23.0
+```
+For the latest pytorch instructions please see [this](https://pytorch.org/get-started/locally/)
+
+and the snippet above should work.
+
+### ` bitsandbytes operations are not supported under CPU!`
+
+This happens when some Linear weights are set to the CPU when using `accelerate`. Please check carefully `model.hf_device_map` and make sure that there is no `Linear` module that is assigned to CPU. It is fine to have the last module (usually the Lm_head) set on CPU.
+
+### `To use the type as a Parameter, please correct the detach() semantics defined by __torch_dispatch__() implementation.`
+
+Use the latest version of `accelerate` with a command such as: `pip install -U accelerate` and the problem should be solved.
+
+### `Parameter has no attribue .CB` 
+
+Same solution as above.
+
+### `RuntimeError: CUDA error: an illegal memory access was encountered ... consider passing CUDA_LAUNCH_BLOCKING=1`
+
+Run your script by pre-pending `CUDA_LAUNCH_BLOCKING=1` and you should observe an error as described in the next section.
+
+### `CUDA illegal memory error: an illegal memory access at line...`:
+
+Check the CUDA verisons with:
 ```
 nvcc --version
 ```
-
+and confirm it is the same version as the one detected by `bitsandbytes`. If not, run:
 ```
 ls -l $CONDA_PREFIX/lib/libcudart.so
 ```
+or 
+```
+ls -l $LD_LIBRARY_PATH
+```
+Check if `libcudart.so` has a correct symlink that is set. Sometimes `nvcc` detects the correct CUDA version but `bitsandbytes` doesn't. You have to make sure that the symlink that is set for the file `libcudart.so` is redirected to the correct CUDA file. 

-### Recurrent bugs
+Here is an example of a badly configured CUDA installation:

-Sometimes you have to run a "dummy" inference pass when dealing with a multi-GPU setup. Checkout the ```test_multi_gpu_loading``` and the ```test_pipeline``` functions.
+`nvcc --version` gives:
+
+![Screenshot 2022-08-15 at 15.12.23.png](https://s3.amazonaws.com/moonup/production/uploads/1660569220888-62441d1d9fdefb55a0b7d12c.png)
+
+which means that the detected CUDA version is 11.3 but `bitsandbytes` outputs:
+
+![image.png](https://s3.amazonaws.com/moonup/production/uploads/1660569284243-62441d1d9fdefb55a0b7d12c.png)
+
+First check:
+
+```bash
+echo $LD_LIBRARY_PATH
+```
+
+If this contains multiple paths separated by `:`. Then you have to make sure that the correct CUDA version is set. By doing:
+
+```bash
+ls -l $path/libcudart.so
+```
+
+On each path (`$path`) separated by `:`.
+If not, simply run
+```bash
+ls -l $LD_LIBRARY_PATH/libcudart.so
+```
+
+and you can see
+
+![Screenshot 2022-08-15 at 15.12.33.png](https://s3.amazonaws.com/moonup/production/uploads/1660569176504-62441d1d9fdefb55a0b7d12c.png)
+
+If you see that the file is linked to the wrong CUDA version (here 10.2), find the correct location for `libcudart.so` (`find --name libcudart.so`) and replace the environment variable `LD_LIBRARY_PATH` with the one containing the correct `libcudart.so` file.