From df311a5ccf50be3031474e289b43b1be43111144 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Fri, 4 Dec 2020 15:43:35 -0800
Subject: [PATCH] [seq2seq] document the caveat of leaky native amp (#8930)

* document the caveat of leaky native amp

* Update examples/seq2seq/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---
 examples/seq2seq/README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index d025d46c97..6ac3cf8d7d 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -79,6 +79,11 @@ test.target
 ```
 The `.source` files are the input, the `.target` files are the desired output.
 
+### Potential issues
+
+- native AMP (`--fp16` and no apex) may lead to a huge memory leak and require 10x gpu memory. This has been fixed in pytorch-nightly and the minimal official version to have this fix will be pytorch-1.8. Until then if you have to use mixed precision please use AMP only with pytorch-nightly or NVIDIA's apex. Reference: https://github.com/huggingface/transformers/issues/8403
+
+
 ### Tips and Tricks
 
 General Tips:
@@ -592,4 +597,3 @@ The feature is still experimental, because:
 + we can make it much more robust if we have memory mapped/preprocessed datasets.
 + The speedup over sortish sampler is not that large at the moment.
 
-