New HIGGS quantization interfaces, JIT kernel compilation support. (#36148)

* new flute

* new higgs working

* small adjustments

* progress and quality

* small updates

* style

---------

Co-authored-by: Andrey Panferov <panferov.andrey3@wb.ru>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Andrei Panferov
2025-02-14 12:26:45 +01:00
committed by GitHub
parent 15ec971b8e
commit 5f726f8b8e
5 changed files with 54 additions and 80 deletions

@@ -65,12 +65,12 @@ class HiggsConfigTest(unittest.TestCase):
 @require_accelerate
 # @require_read_token
 class HiggsTest(unittest.TestCase):
-    model_name = "meta-llama/Meta-Llama-3.1-8B"
+    model_name = "unsloth/Llama-3.2-1B"
-    input_text = "A quick brown fox jumps over the"
+    input_text = "Font test: A quick brown fox jumps over the"
     max_new_tokens = 2
-    EXPECTED_OUTPUT = "A quick brown fox jumps over the lazy dog"
+    EXPECTED_OUTPUT = "Font test: A quick brown fox jumps over the lazy dog"
     device_map = "cuda"