Fix W605 flake8 warning (x5).
@@ -22,8 +22,8 @@
 --model_name openai-gpt \
 --do_train \
 --do_eval \
---train_dataset $ROC_STORIES_DIR/cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv \
---eval_dataset $ROC_STORIES_DIR/cloze_test_test__spring2016\ -\ cloze_test_ALL_test.csv \
+--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
+--eval_dataset "$ROC_STORIES_DIR/cloze_test_test__spring2016 - cloze_test_ALL_test.csv" \
 --output_dir ../log \
 --train_batch_size 16 \
 """
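The first hunk swaps backslash-escaped spaces for double quotes in the example command line. The sketch below is a quick check, not part of the commit, that both spellings split into the same argument list. It uses Python's `shlex.split()`, which applies POSIX-style quoting rules but does not expand `$ROC_STORIES_DIR`; in a real shell the variable is expanded in both forms, since double quotes still allow parameter expansion.

```python
import shlex

# Old (backslash-escaped spaces) and new (double-quoted) spellings from the first hunk.
PAIRS = [
    (r'--train_dataset $ROC_STORIES_DIR/cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv',
     '--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv"'),
    (r'--eval_dataset $ROC_STORIES_DIR/cloze_test_test__spring2016\ -\ cloze_test_ALL_test.csv',
     '--eval_dataset "$ROC_STORIES_DIR/cloze_test_test__spring2016 - cloze_test_ALL_test.csv"'),
]

for old, new in PAIRS:
    # shlex.split() tokenizes like a POSIX shell but leaves $VARS unexpanded.
    old_argv, new_argv = shlex.split(old), shlex.split(new)
    assert old_argv == new_argv, (old_argv, new_argv)
    print(new_argv)
```

Each iteration prints a two-element list such as `['--train_dataset', '$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv']`, so the quoting change leaves the command's behaviour as it was while removing the backslashes from the Python docstring.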
@@ -725,10 +725,10 @@ class XLMTokenizer(PreTrainedTokenizer):
 make && make install
 pip install kytea
 ```
-- [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer *
+- [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer (*)
 - Install with `pip install jieba`
 
-\* The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
+(*) The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
 However, the wrapper (`nltk.tokenize.stanford_segmenter`) is slow due to JVM overhead, and it will be deprecated.
 Jieba is a lot faster and pip-installable. Note there is some mismatch with the Stanford Segmenter. It should be fine
 if you fine-tune the model with Chinese supervisionself. If you want the same exact behaviour, use the original XLM
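W605 is pycodestyle's "invalid escape sequence" check, reported through flake8. In a non-raw string literal, `\ ` and `\*` are not recognized escapes: Python keeps the backslash but emits a warning. The five hits appear to be the four `\ ` sequences in the two dataset paths plus the `\*` footnote marker; the `\` at the end of each command line is a backslash-newline continuation, which is a valid escape. The sketch below reproduces the interpreter-side warning with `compile()`. The helper `invalid_escape_warnings` is hypothetical, the snippets are abridged from the diff, and this is not how flake8 itself detects W605 (it inspects tokens statically). It only checks whether any such warning is raised, not how many.

```python
import warnings

# Abridged docstring snippets from the diff, before and after the commit.
OLD_SNIPPETS = [
    r'"""--train_dataset $ROC_STORIES_DIR/cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv"""',
    r'"""\* The original XLM used Stanford Segmenter."""',
]
NEW_SNIPPETS = [
    r'"""--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" """',
    r'"""(*) The original XLM used Stanford Segmenter."""',
]
# Alternative fix (not the one this commit uses): a raw docstring keeps "\*" verbatim.
RAW_SNIPPET = r'r"""\* The original XLM used Stanford Segmenter."""'


def invalid_escape_warnings(source):
    """Hypothetical helper: compile `source` and collect invalid-escape warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        compile(source, "<snippet>", "exec")
    # DeprecationWarning on Python 3.6-3.11, SyntaxWarning on 3.12+.
    return [
        w for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        and "invalid escape sequence" in str(w.message)
    ]


for src in OLD_SNIPPETS:
    print(bool(invalid_escape_warnings(src)))  # True
for src in NEW_SNIPPETS + [RAW_SNIPPET]:
    print(bool(invalid_escape_warnings(src)))  # False
```

Rewriting the text (quotes instead of `\ `, `(*)` instead of `\*`) leaves the rendered docstring free of stray backslashes, whereas a raw docstring would silence the warning but keep the backslashes in the help text.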