typos in seq2seq/readme (#5937)
@@ -14,7 +14,7 @@ wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz
tar -xzvf xsum.tar.gz
export XSUM_DIR=${PWD}/xsum
```
-this should make a directory called cnn_dm/ with files like `test.source`.
+this should make a directory called `xsum/` with files like `test.source`.
To use your own data, copy that files format. Each article to be summarized is on its own line.
CNN/DailyMail data
@@ -22,8 +22,8 @@ CNN/DailyMail data
cd examples/seq2seq
wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/cnn_dm.tgz
tar -xzvf cnn_dm.tgz
export CNN_DIR=${PWD}/cnn_dm
this should make a directory called `cnn_dm/` with files like `test.source`.
```
WMT16 English-Romanian Translation Data:
@@ -32,6 +32,7 @@ cd examples/seq2seq
wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
tar -xzvf wmt_en_ro.tar.gz
export ENRO_DIR=${PWD}/wmt_en_ro
this should make a directory called `wmt_en_ro/` with files like `test.source`.
```
If you are using your own data, it must be formatted as one directory with 6 files: train.source, train.target, val.source, val.target, test.source, test.target.
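
As a rough illustration of the six-file layout described above, the following sketch checks a data directory for the expected files. The function name `check_seq2seq_dir` is hypothetical and is not part of the repository. It only tests that each file exists and does not inspect the contents.

```bash
# Hypothetical helper (an assumption for illustration, not part of the repo):
# print every expected file that is missing from the given data directory and
# return non-zero if any of the six files is absent.
check_seq2seq_dir() {
  local data_dir="$1"
  local missing=0
  local split side
  for split in train val test; do
    for side in source target; do
      if [ ! -f "${data_dir}/${split}.${side}" ]; then
        echo "missing: ${data_dir}/${split}.${side}"
        missing=1
      fi
    done
  done
  return "${missing}"
}
```

It could be called on any of the directories exported earlier, for example `check_seq2seq_dir "${XSUM_DIR}"`, before pointing a training script at the data.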