Benchmarks

To add a new benchmark, define a Python function named run_benchmark in a Python file located in this benchmark/ directory.

The expected function signature is the following:

def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
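As a minimal sketch of what such a file might contain, the example below uses the signature above and times a stand-in workload. The helper timed, the list comprehension standing in for a model call, and the argument values in the final call are all hypothetical and only illustrate the shape of a benchmark file.

```python
import logging
import time
from logging import Logger


def timed(fn):
    """Hypothetical helper: run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
    logger.info(f"Running benchmark for branch={branch} commit={commit_id}")
    # Stand-in workload (assumption): a real benchmark would load a model
    # and generate num_tokens_to_generate tokens here.
    tokens, elapsed_secs = timed(lambda: [i for i in range(num_tokens_to_generate)])
    logger.info(f"Produced {len(tokens)} tokens in {elapsed_secs:.6f}s")


# Example invocation with made-up arguments.
run_benchmark(logging.getLogger(__name__), "main", "abc123", "example commit")
```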

Writing metrics to the database

MetricsRecorder is thread-safe with respect to Python threads (threading.Thread). You can therefore start a background thread that records device measurements while the main thread runs the model measurements without being blocked.

See llama.py for an example of this in practice.

from logging import Logger

import psycopg2

from benchmarks_entrypoint import MetricsRecorder


def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
    # gpu_name, model_id and the measurement variables used below are placeholders
    # that your benchmark is expected to compute before recording them.
    metrics_recorder = MetricsRecorder(psycopg2.connect("dbname=metrics"), logger, branch, commit_id, commit_msg)
    benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
    # To collect device measurements
    metrics_recorder.collect_device_measurements(
        benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
    )
    # To collect your model measurements
    metrics_recorder.collect_model_measurements(
        benchmark_id,
        {
            "model_load_time": model_load_time,
            "first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
            "second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
            "first_eager_generate_time_secs": first_eager_generate_time,
            "second_eager_generate_time_secs": second_eager_generate_time,
            "time_to_first_token_secs": time_to_first_token,
            "time_to_second_token_secs": time_to_second_token,
            "time_to_third_token_secs": time_to_third_token,
            "time_to_next_token_mean_secs": mean_time_to_next_token,
            "first_compile_generate_time_secs": first_compile_generate_time,
            "second_compile_generate_time_secs": second_compile_generate_time,
            "third_compile_generate_time_secs": third_compile_generate_time,
            "fourth_compile_generate_time_secs": fourth_compile_generate_time,
        },
    )
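
The background-thread pattern mentioned above could look roughly like the sketch below. So that it runs without a database, it uses FakeMetricsRecorder, a hypothetical stand-in that only mirrors the collect_device_measurements call shown in this README. read_device, poll_device, the polling interval and the zero-valued readings are likewise assumptions made for illustration, not part of the benchmark code.

```python
import threading
import time


class FakeMetricsRecorder:
    """Hypothetical stand-in for MetricsRecorder; stores rows in memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self.device_rows = []

    def collect_device_measurements(self, benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
        with self._lock:
            self.device_rows.append((benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes))


def read_device():
    """Placeholder readings; a real benchmark would query the CPU and GPU here."""
    return 0.0, 0.0, 0.0, 0.0


def poll_device(recorder, benchmark_id, stop_event, interval_secs=0.01):
    # Record a sample, then wait for the interval or the stop signal,
    # whichever comes first. At least one sample is always recorded.
    while True:
        recorder.collect_device_measurements(benchmark_id, *read_device())
        if stop_event.wait(interval_secs):
            break


recorder = FakeMetricsRecorder()
stop_event = threading.Event()
poller = threading.Thread(target=poll_device, args=(recorder, 1, stop_event))
poller.start()

time.sleep(0.05)  # stand-in for the model measurements on the main thread

stop_event.set()  # tell the background thread to finish
poller.join()
```

Using a threading.Event both as the stop flag and as the sleep timer lets the poller exit promptly once the main thread is done, instead of finishing a full sleep interval first.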