From 2dcc5a16291dc959c06ed0fce8d3ddf93a99c98e Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Mon, 2 Sep 2019 12:27:11 -0400
Subject: [PATCH] [doc] Add blurb about large-scale model downloads

cc @n1t0 @lysandrejik @thomwolf
---
 docs/source/installation.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index 79d1d74a6a..6512a0cef3 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -52,6 +52,12 @@ If you want to reproduce the original tokenization process of the ``OpenAI GPT``
 
 If you don't install ``ftfy`` and ``SpaCy``\ , the ``OpenAI GPT`` tokenizer will default to tokenize using BERT's ``BasicTokenizer`` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
 
+Note on model downloads (Continuous Integration or large-scale deployments)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
+
+
 Do you want to run a Transformer model on a mobile device?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
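
The patch asks heavy users to cache model files on their end but does not say how. One possible approach, shown here as a minimal sketch and not as the library's own mechanism, is a download-once helper that a CI job or deployment script calls before loading a model. The name ``cached_download``, the ``fetch`` parameter, and the hash-based file naming are all hypothetical choices made for illustration; only the Python standard library is used.

```python
import hashlib
import urllib.request
from pathlib import Path


def cached_download(url, cache_dir, fetch=None):
    """Return a local path for `url`, downloading only on a cache miss.

    Hypothetical helper for illustration; not part of any library API.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Key the cache entry on a hash of the URL so any URL maps to a safe filename.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if target.exists():
        # Cache hit: no network traffic at all.
        return target

    if fetch is None:
        # Default fetcher: a plain HTTP GET via the standard library.
        def fetch(u):
            with urllib.request.urlopen(u) as response:
                return response.read()

    data = fetch(url)

    # Write to a temporary name, then rename, so an interrupted download
    # does not leave a partial file under the final cache key.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(target)
    return target
```

In a CI setup, pointing ``cache_dir`` at a directory that the CI system persists between runs would mean each model file is fetched from the hosted bucket once rather than on every build. The ``fetch`` argument exists only so the helper can be exercised without network access.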