From 0d81fc853edac730067c0a2b3120dcc87ca6d15e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?R=C3=A9mi=20Louf?=
Date: Tue, 15 Oct 2019 15:26:33 +0200
Subject: [PATCH] specify in readme that both datasets are required

---
 examples/README.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/examples/README.md b/examples/README.md
index ba58a61012..e0fe1fc704 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -395,13 +395,17 @@ This fine-tuned model is available as a checkpoint under the reference
 
 Based on the script [`run_seq2seq_finetuning.py`](https://github.com/huggingface/transformers/blob/master/examples/run_seq2seq_finetuning.py).
 
-Before running this script you should download **both** CNN and Daily Mail datasets (the links next to "Stories") from [Kyunghyun Cho's website](https://cs.nyu.edu/~kcho/DMQA/) in the same folder. Then uncompress the archives by running:
+Before running this script you should download **both** CNN and Daily Mail
+datasets from [Kyunghyun Cho's website](https://cs.nyu.edu/~kcho/DMQA/) (the
+links next to "Stories") in the same folder. Then uncompress the archives by running:
 
 ```bash
 tar -xvf cnn_stories.tgz && tar -xvf dailymail_stories.tgz
 ```
 
-We will refer as `$DATA_PATH` the path to where you uncompressed both archive.
+note that the finetuning script **will not work** if you do not download both
+datasets. We will refer as `$DATA_PATH` the path to where you uncompressed both
+archive.
 
 ## Bert2Bert and abstractive summarization
 
@@ -414,4 +418,4 @@ python run_seq2seq_finetuning.py \
     --model_name_or_path=bert2bert \
     --do_train \
     --data_path=$DATA_PATH \
-```
\ No newline at end of file
+```