From a1ecc90d6b056e768f22535eead7963c6394163c Mon Sep 17 00:00:00 2001
From: Sam Shleifer
Date: Thu, 8 Oct 2020 14:12:39 -0400
Subject: [PATCH] [pseudo] Switch URLS to CDN (#7661)

---
 examples/seq2seq/precomputed_pseudo_labels.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/examples/seq2seq/precomputed_pseudo_labels.md b/examples/seq2seq/precomputed_pseudo_labels.md
index d6788f38aa..3c179825fd 100644
--- a/examples/seq2seq/precomputed_pseudo_labels.md
+++ b/examples/seq2seq/precomputed_pseudo_labels.md
@@ -4,24 +4,24 @@ These are the generations of various large models on various large **training**
 ### Available Pseudo-labels
 | Dataset | Model | Link | Rouge Scores | Notes
 |---------|-----------------------------|----------------------------------------------------------------------------------------|--------------------|-------------------------------------------------------------------------------------------------------------
-| XSUM | `facebook/bart-large-xsum` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/xsum/bart_xsum_pl.tgz) | 49.8/28.0/42.5 |
-| XSUM | `google/pegasus-xsum` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/xsum/pegasus_xsum.tgz) | 53.3/32.7/46.5 |
-| XSUM | `facebook/bart-large-xsum` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/xsum/xsum_pl2_bart.tgz) | | Bart pseudolabels filtered to those with Rouge2 > 10.0 w GT.
-| CNN/DM | `sshleifer/pegasus-cnn-ft-v2` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/cnn_dm/pegasus_cnn_cnn_pls.tgz) | 47.316/26.65/44.56 | do not worry about the fact that train.source is one line shorter.
-| CNN/DM | `facebook/bart-large-cnn` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/cnn_dm/cnn_bart_pl.tgz) | | 5K (2%) are missing, there should be 282173
-| CNN/DM | `google/pegasus-xsum` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/cnn_dm/pegasus_xsum_on_cnn.tgz) | 21.5/6.76/25 | extra labels for xsum distillation Used max_source_length=512, (and all other pegasus-xsum configuration).
-| EN-RO | `Helsinki-NLP/opus-mt-en-ro` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/wmt_en_ro/opus_mt_en_ro.tgz) | |
-| EN-RO | `facebook/mbart-large-en-ro` | [download](https://s3.amazonaws.com/datasets.huggingface.co/pseudo/wmt_en_ro/mbart_large_en_ro.tgz) | |
+| XSUM | `facebook/bart-large-xsum` | [download](https://cdn-datasets.huggingface.co/pseudo/xsum/bart_xsum_pl.tgz) | 49.8/28.0/42.5 |
+| XSUM | `google/pegasus-xsum` | [download](https://cdn-datasets.huggingface.co/pseudo/xsum/pegasus_xsum.tgz) | 53.3/32.7/46.5 |
+| XSUM | `facebook/bart-large-xsum` | [download](https://cdn-datasets.huggingface.co/pseudo/xsum/xsum_pl2_bart.tgz) | | Bart pseudolabels filtered to those with Rouge2 > 10.0 w GT.
+| CNN/DM | `sshleifer/pegasus-cnn-ft-v2` | [download](https://cdn-datasets.huggingface.co/pseudo/cnn_dm/pegasus_cnn_cnn_pls.tgz) | 47.316/26.65/44.56 | do not worry about the fact that train.source is one line shorter.
+| CNN/DM | `facebook/bart-large-cnn` | [download](https://cdn-datasets.huggingface.co/pseudo/cnn_dm/cnn_bart_pl.tgz) | | 5K (2%) are missing, there should be 282173
+| CNN/DM | `google/pegasus-xsum` | [download](https://cdn-datasets.huggingface.co/pseudo/cnn_dm/pegasus_xsum_on_cnn.tgz) | 21.5/6.76/25 | extra labels for xsum distillation Used max_source_length=512, (and all other pegasus-xsum configuration).
+| EN-RO | `Helsinki-NLP/opus-mt-en-ro` | [download](https://cdn-datasets.huggingface.co/pseudo/wmt_en_ro/opus_mt_en_ro.tgz) | |
+| EN-RO | `facebook/mbart-large-en-ro` | [download](https://cdn-datasets.huggingface.co/pseudo/wmt_en_ro/mbart_large_en_ro.tgz) | |
 (EN_RO = WMT 2016 English-Romanian).
 Example Download Command:
 ```bash
-curl -S https://s3.amazonaws.com/datasets.huggingface.co/pseudo/xsum/bart_xsum_pl.tgz | tar -xvz -C .
+curl -S https://cdn-datasets.huggingface.co/pseudo/xsum/bart_xsum_pl.tgz | tar -xvz -C .
 ```
 ### Generating New Pseudolabels
-Here is the command I used to generate the pseudolabels in the second row of the table, after downloading XSUM from [here](https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz).
+Here is the command I used to generate the pseudolabels in the second row of the table, after downloading XSUM from [here](https://cdn-datasets.huggingface.co/summarization/xsum.tar.gz).
 
 ```bash
 python -m torch.distributed.launch --nproc_per_node=8 run_distributed_eval.py \