
Gemma 3 QAT Models

4 points by mdp2021 23 days ago

2 comments

mdp2021 23 days ago
Also see:

# Smarter Local LLMs, Lower VRAM Costs – All Without Sacrificing Quality, Thanks to Google's New [Quantization-Aware Training] "QAT" Optimization

https://www.hardware-corner.net/smarter-local-llm-lower-vram-20250419/

> According to Google, they've «reduced the perplexity drop by 54% (using llama.cpp perplexity evaluation) when quantizing down to Q4_0.»
philipkglass 23 days ago
Are there comparisons between the int4 QAT versions of these models and the more common GGUF Q4_K_M quantizations generated post-training? The QAT models appear to be slightly larger:

https://ollama.com/library/gemma3/tags

I presume the QAT versions are better, but I don't see by how much.
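For contrast, post-training quantization in the style of llama.cpp's Q4_0 simply rounds the already-trained weights in small blocks that share a scale (Q4_K_M uses a more elaborate k-quant layout). A simplified sketch, with block size and scaling chosen here as assumptions, shows the round-trip error that QAT is trained to absorb:

```python
# Rough sketch of post-training 4-bit block quantization: blocks of 32
# weights share one scale, each weight is rounded to a 4-bit integer,
# and the dequantized result differs from the original by rounding error.
import numpy as np

def quantize_q4_block(w, block_size=32):
    w = w.reshape(-1, block_size)
    # One scale per block, chosen so the largest-magnitude weight maps near the int4 range.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q * scale

weights = np.random.randn(4096).astype(np.float32)
q, scale = quantize_q4_block(weights)
restored = dequantize(q, scale).reshape(-1)
rmse = np.sqrt(np.mean((weights - restored) ** 2))
print(f"4-bit round-trip RMSE: {rmse:.4f}")
```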