Fix W605 flake8 warning (x5).

This commit is contained in:
Aymeric Augustin
2019-12-21 18:03:57 +01:00
parent 7dce8dc7ac
commit 5eab3cf6bc
2 changed files with 4 additions and 4 deletions

View File

@@ -22,8 +22,8 @@
--model_name openai-gpt \
--do_train \
--do_eval \
--train_dataset $ROC_STORIES_DIR/cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv \
--eval_dataset $ROC_STORIES_DIR/cloze_test_test__spring2016\ -\ cloze_test_ALL_test.csv \
--train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
--eval_dataset "$ROC_STORIES_DIR/cloze_test_test__spring2016 - cloze_test_ALL_test.csv" \
--output_dir ../log \
--train_batch_size 16 \
"""

View File

@@ -725,10 +725,10 @@ class XLMTokenizer(PreTrainedTokenizer):
make && make install
pip install kytea
```
- [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer *
- [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer (*)
- Install with `pip install jieba`
\* The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
(*) The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
However, the wrapper (`nltk.tokenize.stanford_segmenter`) is slow due to JVM overhead, and it will be deprecated.
Jieba is a lot faster and pip-installable. Note there is some mismatch with the Stanford Segmenter. It should be fine
if you fine-tune the model with Chinese supervisionself. If you want the same exact behaviour, use the original XLM