[s2s] distill t5-large -> t5-small (#8376)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
committed by GitHub
parent a5b682329c
commit 81ebd70671
@@ -380,7 +380,7 @@ cp xsum/test* all_pl
then use `all_pl` as DATA in the command above.
#### Direct Knowledge Distillation (KD)
- In this method, we use try to enforce that the student and teacher produce similar encoder_outputs, logits, and hidden_states using `BartSummarizationDistiller`.
+ In this method, we use try to enforce that the student and teacher produce similar encoder_outputs, logits, and hidden_states using `SummarizationDistiller`.
This method was used for `sshleifer/distilbart-xsum-12-6`, `6-6`, and `9-6` checkpoints were produced.
You must use [`distillation.py`](./distillation.py). Note that this command initializes the student for you.
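
To make the idea of matching teacher and student outputs concrete, here is a minimal pure-Python sketch of a distillation loss that combines a temperature-scaled KL term on logits with a mean-squared-error term on hidden states. It is an illustration only and is not taken from `distillation.py`: the function names, the `temperature` parameter, and the `alpha_logits` / `alpha_hidden` weights are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      temperature=2.0, alpha_logits=0.8, alpha_hidden=1.0):
    """Hypothetical combined loss: the weights and temperature are assumptions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps the soft-target term comparable
    # in magnitude across different temperatures.
    logit_loss = kl_divergence(p_teacher, p_student) * temperature ** 2
    hidden_loss = mse(student_hidden, teacher_hidden)
    return alpha_logits * logit_loss + alpha_hidden * hidden_loss
```

The loss is zero when the student reproduces the teacher exactly and grows as the logits or hidden states diverge. A real implementation would compute these terms on batched tensors and typically add the usual cross-entropy loss against the reference summaries.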