diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index e5a6f9da79..d0384bf274 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -14,7 +14,7 @@ wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz
 tar -xzvf xsum.tar.gz
 export XSUM_DIR=${PWD}/xsum
 ```
-this should make a directory called cnn_dm/ with files like `test.source`.
+this should make a directory called `xsum/` with files like `test.source`.
 To use your own data, copy that files format. Each article to be summarized is on its own line.
 
 CNN/DailyMail data
@@ -22,8 +22,8 @@ CNN/DailyMail data
 cd examples/seq2seq
 wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/cnn_dm.tgz
 tar -xzvf cnn_dm.tgz
-
 export CNN_DIR=${PWD}/cnn_dm
+this should make a directory called `cnn_dm/` with files like `test.source`.
 ```
 
 WMT16 English-Romanian Translation Data:
@@ -32,6 +32,7 @@ cd examples/seq2seq
 wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
 tar -xzvf wmt_en_ro.tar.gz
 export ENRO_DIR=${PWD}/wmt_en_ro
+this should make a directory called `wmt_en_ro/` with files like `test.source`.
 ```
 
 If you are using your own data, it must be formatted as one directory with 6 files: train.source, train.target, val.source, val.target, test.source, test.target.
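
The six-file layout named in the final context line can be checked before a training run. The following is a minimal sketch, not part of the repository: `missing_split_files` and `EXPECTED_FILES` are hypothetical names, and the only assumption taken from the README is the list of file names.

```python
from pathlib import Path

# File names come from the README text; everything else here is illustrative.
EXPECTED_FILES = [
    f"{split}.{side}"
    for split in ("train", "val", "test")
    for side in ("source", "target")
]


def missing_split_files(data_dir):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_split_files(os.environ["XSUM_DIR"])` (after `import os`) returns an empty list when all six files are present, and otherwise lists the names that still need to be created.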