typos in seq2seq/readme (#5937)

Aditya Soni
2020-07-21 19:14:59 +05:30
committed by GitHub
parent d32279438a
commit ccbf74a685


@@ -14,7 +14,7 @@ wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz
 tar -xzvf xsum.tar.gz
 export XSUM_DIR=${PWD}/xsum
 ```
-this should make a directory called cnn_dm/ with files like `test.source`.
+this should make a directory called `xsum/` with files like `test.source`.
 To use your own data, copy that files format. Each article to be summarized is on its own line.
 CNN/DailyMail data
@@ -22,8 +22,8 @@ CNN/DailyMail data
 cd examples/seq2seq
 wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/cnn_dm.tgz
 tar -xzvf cnn_dm.tgz
 export CNN_DIR=${PWD}/cnn_dm
+this should make a directory called `cnn_dm/` with files like `test.source`.
 ```
 WMT16 English-Romanian Translation Data:
@@ -32,6 +32,7 @@ cd examples/seq2seq
 wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
 tar -xzvf wmt_en_ro.tar.gz
 export ENRO_DIR=${PWD}/wmt_en_ro
+this should make a directory called `wmt_en_ro/` with files like `test.source`.
 ```
 If you are using your own data, it must be formatted as one directory with 6 files: train.source, train.target, val.source, val.target, test.source, test.target.
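
The six-file layout described in the README text can be checked before training. The following is a minimal sketch, not part of the commit or of the repository: `check_data_dir` and `count_lines` are hypothetical helper names, and the rule that each `.source` file must have the same number of lines as its `.target` file is an assumption inferred from the one-example-per-line format.

```python
from pathlib import Path

# The six file names listed in the README text.
REQUIRED_FILES = [
    f"{split}.{side}"
    for split in ("train", "val", "test")
    for side in ("source", "target")
]


def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_data_dir(data_dir):
    """Return a list of problems found in data_dir (hypothetical helper).

    An empty list means all six files exist and each source/target pair
    has the same number of lines.
    """
    data_dir = Path(data_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (data_dir / name).is_file()
    ]
    for split in ("train", "val", "test"):
        source = data_dir / f"{split}.source"
        target = data_dir / f"{split}.target"
        if source.is_file() and target.is_file():
            n_source, n_target = count_lines(source), count_lines(target)
            # Assumption: line i of the target pairs with line i of the source.
            if n_source != n_target:
                problems.append(
                    f"{split}: {n_source} source lines vs {n_target} target lines"
                )
    return problems
```

Calling `check_data_dir` on the directory that `XSUM_DIR`, `CNN_DIR`, or `ENRO_DIR` points to should return an empty list if the archive unpacked as described.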