Add sudachi and jumanpp tokenizers for bert_japanese (#19043)
* add sudachipy and jumanpp tokenizers for bert_japanese
* use ImportError instead of ModuleNotFoundError in SudachiTokenizer and JumanppTokenizer
* put test cases of test_tokenization_bert_japanese in one line
* add require_sudachi and require_jumanpp decorator for testing
* add sudachi and pyknp(jumanpp) to dependencies
* remove sudachi_dict_small and sudachi_dict_full from dependencies
* empty commit for ci
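The require_sudachi and require_jumanpp decorators mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper names is_sudachi_available and is_jumanpp_available, the skip messages, and the availability checks (an importable sudachipy or pyknp module, plus a jumanpp executable on PATH) are assumptions. The intent is that tests are skipped, rather than failing, on machines where the optional tokenizer backends are not installed.

```python
import importlib.util
import shutil
import unittest


def is_sudachi_available() -> bool:
    # Assumption: Sudachi support only needs the `sudachipy` package to be importable.
    return importlib.util.find_spec("sudachipy") is not None


def is_jumanpp_available() -> bool:
    # Assumption: Juman++ support needs both the `pyknp` wrapper package
    # and a `jumanpp` executable somewhere on PATH.
    has_wrapper = importlib.util.find_spec("pyknp") is not None
    has_binary = shutil.which("jumanpp") is not None
    return has_wrapper and has_binary


def require_sudachi(test_case):
    # Skip the decorated test (function or class) when sudachipy is missing.
    return unittest.skipUnless(is_sudachi_available(), "test requires sudachi")(test_case)


def require_jumanpp(test_case):
    # Skip the decorated test (function or class) when Juman++ is missing.
    return unittest.skipUnless(is_jumanpp_available(), "test requires jumanpp")(test_case)
```

On the ImportError change: ModuleNotFoundError is a subclass of ImportError, so raising or catching ImportError also covers the case where the module is simply absent.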
@@ -409,6 +409,16 @@ jobs:
       keys:
         - v0.5-custom_tokenizers-{{ checksum "setup.py" }}
         - v0.5-custom_tokenizers-
+ - run: sudo apt-get -y update && sudo apt-get install -y cmake
+ - run:
+     name: install jumanpp
+     command: |
+       wget https://github.com/ku-nlp/jumanpp/releases/download/v2.0.0-rc3/jumanpp-2.0.0-rc3.tar.xz
+       tar xvf jumanpp-2.0.0-rc3.tar.xz
+       mkdir jumanpp-2.0.0-rc3/bld
+       cd jumanpp-2.0.0-rc3/bld
+       sudo cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local
+       sudo make install
  - run: pip install --upgrade pip
  - run: pip install .[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]
  - run: python -m unidic download