I have tried a lot of local models. I have 656GB of them on my computer, so I have experience with a diverse array of LLMs. Gemma has been nothing to write home about and has been disappointing every single time I have used it.

Models that are worth writing home about:

EXAONE-3.5-7.8B-Instruct - Excellent at taking podcast transcriptions and generating show notes and summaries.

Rocinante-12B-v2i - Fun for stories and D&D

Qwen2.5-Coder-14B-Instruct - Good for simple coding tasks

OpenThinker-7B - Good and fast reasoning

The DeepSeek distills - Able to handle more complex tasks while still being fast

DeepHermes-3-Llama-3-8B - A really good LLM

Medical-Llama3-v2 - Very interesting, but be careful

Plus more, but not Gemma.
I wrote a mini guide on running Gemma 3 at https://docs.unsloth.ai/basics/tutorial-how-to-run-gemma-3-effectively

The recommended settings according to the Gemma team are:

temperature = 0.95

top_p = 0.95

top_k = 64

Also beware of double BOS tokens! You can run my uploaded GGUFs with the recommended chat template and settings via:

    ollama run hf.co/unsloth/gemma-3-27b-it-GGUF:Q4_K_M
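If you're scripting this rather than using the CLI, here's a minimal sketch of hitting a locally running Ollama server with those sampling settings over its HTTP API (assumes Ollama is on its default port and the GGUF has already been pulled; the prompt is just a placeholder):

    import requests

    # Query a local Ollama server with the Gemma team's recommended sampling settings.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "hf.co/unsloth/gemma-3-27b-it-GGUF:Q4_K_M",
            "prompt": "Summarize the Gemma 3 technical report in three bullet points.",
            "stream": False,
            "options": {
                "temperature": 0.95,
                "top_p": 0.95,
                "top_k": 64,
            },
        },
    )
    print(resp.json()["response"])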
See the other HN submission (for the Gemma 3 technical report doc) for a more active discussion thread - 50 comments at time of writing this.

https://news.ycombinator.com/item?id=43340491
Small models should be trained on specific problems in specific languages, and should be built one upon another, the way containers work. I see a future where a factory or home has a local AI server hosting many highly specific models, continuously trained by a super-large LLM on the web and connected via the network to all instruments and computers to basically control the whole factory. I also see a future where all machinery comes with an AI-readable language for its own functioning: an HTTP-like protocol for two-way communication between a machine and an AI. Lots of possibilities.
After reading the technical report, take the time to download the model and run it against a few prompts. In five minutes you will understand how broken LLM benchmarking is.
No mention of how well it's claimed to perform with tool calling?

The Gemma series of models has historically been pretty poor when it comes to coding and tool calling - two things that are very important to agentic systems - so it will be interesting to see how 3 does in this regard.
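If you want to poke at this yourself, a rough sketch for probing tool calling through Ollama's chat API is below. The get_weather tool is a made-up example schema; whether the model actually returns a tool_calls entry is exactly what you'd be testing.

    import requests

    # Hypothetical single-tool schema to see whether the model emits a tool call.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma3:27b",
            "messages": [{"role": "user", "content": "What's the weather in Oslo right now?"}],
            "tools": tools,
            "stream": False,
        },
    )
    # A model that handles tool calling should return a message with a tool_calls list.
    print(resp.json()["message"].get("tool_calls"))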
Not sure if anyone else experiences this, but Ollama downloads start off strong and then the last few MBs take forever.

Finally just finished downloading (gemma3:27b). It requires the latest version of Ollama, but it's working now, getting about 21 tok/s on my local 2x A4000.

From my few test prompts it looks like a quality model; going to run more tests against mistral-small:24b to see if it's going to become my new local model.
The claim of “strongest” (what does that even mean?) seems moot. I don’t think a multimodal model is the way to go for use on single, home GPUs.

I would much rather have specific tailored models to use in different scenarios, which could be loaded into the GPU when needed. It’s a waste of parameters to have half of the VRAM loaded with the parts of the model targeting image generation when all I want to do is write code.
How does it compare to OlympicCoder 7B [0], which allegedly beats Claude Sonnet 3.7 in the International Olympiad in Informatics [1]?

[0] https://huggingface.co/open-r1/OlympicCoder-7B?local-app=vllm

[1] https://pbs.twimg.com/media/GlyjSTtXYAAR188?format=jpg&name=4096x4096
My usual non-scientific benchmark is asking the model to implement the game Tetris in Python, and then iterating with the LLM to fix/tweak it.

My prompt to Gemma 27b (q4) on Open WebUI + Ollama: "Can you create the game tetris in python?"

It immediately starts writing code. After the code is finished, I noticed something very strange: it starts a paragraph like this:

"Key improvements and explanations:

    Clearer Code Structure: The code is now organized into a Tetris class, making it much more maintainable and readable. This is essential for any non-trivial game.
"

Followed by a bunch of fixes/improvements, as if this was not the first iteration of the script.

I also notice a very obvious error: in the `if __name__ == '__main__':` block it tries to instantiate a `Tetris` class, when the name of the class it created was "TetrisGame".

Nevertheless, I try to run it and paste the `NameError: name 'Tetris' is not defined` error along with the stack trace specifying the line. Gemma then gives me this response:

"The error message "NameError: name 'Tetris' is not defined" means that the Python interpreter cannot find a class or function named Tetris. This usually happens when:"

Then it continues with a generic explanation of how to fix this error in arbitrary programs. It seems like it completely ignored the code it just wrote.
These bar charts are getting more disingenuous every day. This one makes it seem like Gemma3 ranks as nr. 2 on the arena just behind the full DeepSeek R1. But they just cut out everything that ranks higher. In reality, R1 currently ranks as nr. 6 in terms of Elo. It's still impressive for such a small model to compete with much bigger models, but at this point you can't trust any publication by anyone who has any skin in model development.
Discrete GPUs are finished for AI.

They've had years to provide the needed memory but can't/won't.

The future of local LLMs is APUs such as the Apple M series and AMD Strix Halo.

Within 12 months everyone will have relegated discrete GPUs to the AI dustbin and will be running 128GB to 512GB of delicious local RAM - vastly more than any discrete GPU could dream of.
PSA: DO NOT USE OLLAMA FOR TESTING.

Ollama silently (!!!) drops messages if the context window is exceeded (instead of, you know, just erroring - who in the world made this decision?).

The workaround until now was to (not use Ollama, or) make sure to only send a single message. But now they seem to silently truncate single messages as well, instead of erroring! (This explains the sibling comment where a user could not reproduce the results locally.)

Use LM Studio, llama.cpp, OpenRouter, or anything else, but stay away from Ollama!
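For what it's worth, if you are stuck with Ollama, you can at least raise the limit explicitly instead of relying on its small default context window - a sketch via the HTTP API (this doesn't change the silent-truncation behaviour described above, it only moves the point at which it kicks in):

    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma3:27b",
            "messages": [{"role": "user", "content": "long prompt goes here"}],
            "stream": False,
            # num_ctx raises the context window from Ollama's default.
            "options": {"num_ctx": 32768},
        },
    )
    print(resp.json()["message"]["content"])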