Hi everyone!<p>We built an open-source tool to benchmark GGUF models with a single line of code. GitHub link: <a href="https://github.com/NexaAI/nexa-sdk/tree/main/nexa/eval">https://github.com/NexaAI/nexa-sdk/tree/main/nexa/eval</a><p>Motivation:<p>GGUF quantization is crucial for running models locally on-device, but quantization can dramatically affect a model's performance, so it's essential to benchmark models after quantization. In doing so, we noticed two challenges:<p>1. There is no easy, fast way to benchmark quantized GGUF models locally or on self-hosted servers.<p>2. Existing benchmark results for quantized GGUF models are inconsistent, often showing lower scores than the official results published by model developers.<p>Our solution: we built a tool that:<p>1. Benchmarks GGUF models with one line of code.<p>2. Supports multiprocessing and 8 evaluation tasks.<p>3. In our testing, is the fastest GGUF benchmarking tool available.<p>Example:<p>Run the command below in a terminal to benchmark the Llama3.2-1B-Instruct Q4_K_M quant on the "ifeval" dataset for instruction following. It took about 80 minutes on an RTX 4090 with 4 workers for multiprocessing.<p>nexa eval Llama3.2-1B-Instruct:q4_K_M --tasks ifeval --num_workers 4<p>We started with text models and plan to expand to more on-device models and modalities. Your feedback is welcome! If you find this useful, feel free to let us know on GitHub: <a href="https://github.com/NexaAI/nexa-sdk/tree/main/nexa/eval">https://github.com/NexaAI/nexa-sdk/tree/main/nexa/eval</a>