HuggingFace_transformer

Files

Joshua Lochner 6e2d04e429 Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)

* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI
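As a rough illustration of the "Bias merge priority with length if score is the same" item above, here is a minimal sketch of how merges could be derived from a SentencePiece-style vocabulary and ordered. The function name `rank_merges`, its parameters, and the tie-break direction (longer parts first) are assumptions made for this example; this is not the converter code touched by the commit.

```python
from typing import Dict, List, Tuple


def rank_merges(vocab: Dict[str, int], scores: Dict[str, float]) -> List[Tuple[str, str]]:
    """Hypothetical helper: derive BPE-style merges from scored pieces."""
    candidates = []
    for piece, score in scores.items():
        # Every split of a piece into two known pieces is a candidate merge.
        for i in range(1, len(piece)):
            left, right = piece[:i], piece[i:]
            if left in vocab and right in vocab:
                candidates.append((left, right, score))

    # Higher score first; when scores are equal, fall back to the length of
    # the left part, then the right part (assumed direction: longer first).
    candidates.sort(key=lambda m: (m[2], len(m[0]), len(m[1])), reverse=True)
    return [(left, right) for left, right, _ in candidates]


# Toy vocabulary and scores, invented for illustration only.
vocab = {"a": 0, "b": 1, "c": 2, "ab": 3, "bc": 4, "abc": 5}
scores = {"ab": -1.0, "bc": -1.0, "abc": -2.0}
merges = rank_merges(vocab, scores)
```

In this toy case the two ways of building `abc` share a score, so the length tie-break decides their order: `("ab", "c")` is ranked ahead of `("a", "bc")`.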

2024-07-30 23:36:38 +02:00

__init__.py

[gemma] Adds support for Gemma 💎 (#29167)

2024-02-21 14:21:28 +01:00

test_modeling_flax_gemma.py

FIX [Gemma / CI] Make sure our runners have access to the model (#29242)

2024-02-28 06:25:23 +01:00

test_modeling_gemma.py

Tests: remove cuda versions when the result is the same 🧹🧹 (#31955)

2024-07-16 16:49:54 +01:00

test_tokenization_gemma.py

Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)

2024-07-30 23:36:38 +02:00