Support reading tiktoken tokenizer.model file (#31656)
* use existing TikTokenConverter to read tiktoken tokenizer.model file
* del test file
* create tiktoken integration file
* adding tiktoken llama test
* ALTERNATIVE IMPLEMENTATION: supports llama 405B
* fix one char
* remove redundant line
* small fix
* rm unused import
* flag for converting from tiktoken
* remove unneeded file
* ruff
* remove llamatiktokenconverter, stick to general converter
* tiktoken support v2
* update test
* remove stale changes
* update doc
* protect import
* use is_protobuf_available
* add templateprocessor in tiktokenconverter
* reverting templateprocessor from tiktoken support
* update test
* add require_tiktoken
* dev-ci
* trigger build
* trigger build again
* dev-ci
* [build-ci-image] tiktoken
* dev-ci
* dev-ci
* dev-ci
* dev-ci
* change tiktoken file name
* feedback review
* feedback rev
* applying feedback, removing tiktoken converters
* conform test
* adding docs for review
* add doc file for review
* add doc file for review
* add doc file for review
* support loading model without config.json file
* Revert "support loading model without config.json file" This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb.
* remove dev var
* updating docs
* safely import protobuf
* fix protobuf import error
* fix protobuf import error
* trying isort to fix ruff error
* fix ruff error
* try to fix ruff again
* try to fix ruff again
* try to fix ruff again
* doc table of contents
* add fix for consistency.dockerfile torchaudio
* ruff
* applying feedback
* minor typo
* merging with push-ci-image
* clean up imports
* revert dockerfile consistency
3
setup.py
@@ -99,6 +99,7 @@ _deps = [
     "accelerate>=0.26.0",
     "av==9.2.0",  # Latest version of PyAV (10.0.0) has issues with audio stream.
     "beautifulsoup4",
+    "blobfile",
     "codecarbon==1.2.0",
     "cookiecutter==1.7.3",
     "dataclasses",
@@ -177,6 +178,7 @@ _deps = [
     "tensorflow-probability<0.24",
     "tf2onnx",
     "timeout-decorator",
+    "tiktoken",
     "timm<=0.9.16",
     "tokenizers>=0.19,<0.20",
     "torch",
@@ -311,6 +313,7 @@ extras["codecarbon"] = deps_list("codecarbon")
 extras["video"] = deps_list("decord", "av")

 extras["sentencepiece"] = deps_list("sentencepiece", "protobuf")
+extras["tiktoken"] = deps_list("tiktoken", "blobfile")
 extras["testing"] = (
     deps_list(
         "pytest",
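The new `extras["tiktoken"]` entry only works because both names passed to `deps_list` are also present in `_deps`, which is why the diff adds `"blobfile"` and `"tiktoken"` there as well. The following is a minimal sketch of that lookup, not the actual setup.py code: the name-extraction logic is an assumption written for illustration, and `_deps` is trimmed to a few entries taken from the diff above.

```python
import re

# Trimmed-down requirement list; every entry appears in the diff above.
_deps = [
    "blobfile",
    "tiktoken",
    "timm<=0.9.16",
    "tokenizers>=0.19,<0.20",
]

# Assumed helper logic: map the bare package name to its full specifier by
# cutting the string at the first version-operator character (or space).
deps = {re.split(r"[!=<>~ ]", spec, maxsplit=1)[0]: spec for spec in _deps}


def deps_list(*pkgs):
    """Return the full requirement specifier for each package name given."""
    # A name missing from _deps raises KeyError, so an extra cannot
    # reference a package that was never registered.
    return [deps[pkg] for pkg in pkgs]


extras = {}
extras["tiktoken"] = deps_list("tiktoken", "blobfile")

print(extras["tiktoken"])
print(deps_list("tokenizers"))
```

With an extra defined this way, the optional dependencies would typically be installed through the standard extras syntax, for example `pip install "transformers[tiktoken]"`.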