[s2s] distill t5-large -> t5-small (#8376)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Sumithra Bhakthavatsalam
2020-11-11 14:58:45 -08:00
committed by GitHub
parent a5b682329c
commit 81ebd70671
4 changed files with 108 additions and 67 deletions


@@ -380,7 +380,7 @@ cp xsum/test* all_pl
then use `all_pl` as DATA in the command above.
#### Direct Knowledge Distillation (KD)
-+ In this method, we use try to enforce that the student and teacher produce similar encoder_outputs, logits, and hidden_states using `BartSummarizationDistiller`.
++ In this method, we use try to enforce that the student and teacher produce similar encoder_outputs, logits, and hidden_states using `SummarizationDistiller`.
+ This method was used for `sshleifer/distilbart-xsum-12-6`, `6-6`, and `9-6` checkpoints were produced.
+ You must use [`distillation.py`](./distillation.py). Note that this command initializes the student for you.
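The KD objective described in this hunk has the student match the teacher's encoder outputs, logits, and hidden states. It can be sketched in plain Python as a weighted sum of three terms: a temperature-softened KL divergence on the logits, plus mean-squared error on encoder outputs and on selected hidden-state layers. This is a minimal sketch under stated assumptions. The function names, the loss weights (`alpha_*`), the `temperature` default, and the hypothetical `layer_map` argument (which teacher layer supervises each student layer) are illustrative only. They are not the API of `distillation.py` or `SummarizationDistiller`, and real training code would operate on batched tensors rather than Python lists.

```python
import math
from typing import List, Optional, Sequence


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Soften the logits by the temperature, then subtract the max for numerical stability.
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); entries where p == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    # Mean-squared error between two equal-length vectors.
    assert len(a) == len(b), "vectors must have the same length"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    student_enc: Sequence[float],
    teacher_enc: Sequence[float],
    student_hid: Sequence[Sequence[float]],
    teacher_hid: Sequence[Sequence[float]],
    layer_map: Optional[Sequence[int]] = None,  # hypothetical: teacher layer index per student layer
    temperature: float = 2.0,  # hypothetical default
    alpha_logits: float = 1.0,  # hypothetical weights
    alpha_enc: float = 1.0,
    alpha_hid: float = 1.0,
) -> float:
    # Logit term: KL between softened teacher and student distributions,
    # scaled by T^2 so its magnitude stays comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_term = kl_div(p_teacher, p_student) * temperature ** 2

    # Encoder-output term: the student's encoder states should match the teacher's.
    enc_term = mse(student_enc, teacher_enc)

    # Hidden-state term: each student layer is matched to one (mapped) teacher layer,
    # since the student usually has fewer layers than the teacher.
    if layer_map is None:
        layer_map = list(range(len(student_hid)))
    assert len(layer_map) == len(student_hid), "need one teacher index per student layer"
    hid_term = sum(
        mse(student_hid[i], teacher_hid[j]) for i, j in enumerate(layer_map)
    ) / len(layer_map)

    return alpha_logits * logit_term + alpha_enc * enc_term + alpha_hid * hid_term
```

When the student reproduces the teacher exactly, every term is zero. Any disagreement in logits, encoder outputs, or mapped hidden layers makes the loss positive, which is what pushes the student toward the teacher during training.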