科技回声 (Tech Echo)
A tech news platform built with Next.js, providing global tech news and discussion.
© 2025 科技回声 (Tech Echo). All rights reserved.

Benchmark GGUF models with ONE line of code

6 points | by alanzhuly | 7 months ago

1 comment

alanzhuly | 7 months ago
Hi everyone!

We built an open-source tool to benchmark GGUF models with a single line of code. GitHub: https://github.com/NexaAI/nexa-sdk/tree/main/nexa/eval

Motivation:

GGUF quantization is crucial for running models locally on devices, but quantization can dramatically affect a model's performance, so it's essential to benchmark models after quantization. We noticed a couple of challenges:

1. There is no easy, fast way to benchmark quantized GGUF models locally or on self-hosted servers.

2. GGUF quantization results in existing benchmarks are inconsistent, often showing lower scores than the official results from model developers.

Our solution is a tool that:

1. Benchmarks GGUF models with one line of code.

2. Supports multiprocessing and 8 evaluation tasks.

3. In our testing, is the fastest GGUF benchmark available.

Example:

Run the command below in a terminal to benchmark the Q4_K_M quant of Llama3.2-1B-Instruct on the "ifeval" dataset (instruction following). It took 80 minutes on a 4090 with 4 workers for multiprocessing.

    nexa eval Llama3.2-1B-Instruct:q4_K_M --tasks ifeval --num_workers 4

We started with text models and plan to expand to more on-device models and modalities. Your feedback is welcome! If you find this useful, feel free to let us know on GitHub: https://github.com/NexaAI/nexa-sdk/tree/main/nexa/eval
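To see why post-quantization benchmarking matters, here is a toy sketch of 4-bit absmax block quantization. It is only loosely analogous to GGUF's Q4 schemes; the block size, scaling rule, and error metric are simplifying assumptions for illustration, not the actual GGUF format:

```python
import numpy as np

def quantize_q4_blocks(x, block_size=32):
    """Toy 4-bit absmax block quantization: each block of values is
    scaled so its max magnitude maps into the 4-bit range [-8, 7],
    then rounded to integer codes and dequantized back to floats."""
    x = x.reshape(-1, block_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                    # avoid dividing by zero
    q = np.clip(np.round(x / scale), -8, 7)    # 4-bit integer codes
    return (q * scale).reshape(-1)             # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)   # stand-in for a weight tensor
w_hat = quantize_q4_blocks(w)

# The relative error shows how much precision a Q4-style quant gives up;
# whether that error hurts a real model's scores is exactly what a
# benchmark like the one above measures.
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative error: {rel_err:.3f}")
```

The error is small but nonzero per tensor, and its downstream effect on task accuracy is hard to predict from the weights alone, which is why running an actual evaluation on the quantized model is the reliable check.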