I'm excited about this one - they seem to be directly targeting the "best model to run on a decent laptop" category, hence the comparison with Llama 3.3 70B and Qwen 2.5 32B.<p>I'm running it on an M2 64GB MacBook Pro now via Ollama and it's fast and appears to be very capable. This downloads 14GB of model weights:<p><pre><code> ollama run mistral-small:24b
</code></pre>
Then using my <a href="https://llm.datasette.io/" rel="nofollow">https://llm.datasette.io/</a> tool (so I can log my prompts to SQLite):<p><pre><code> llm install llm-ollama
llm -m mistral-small:24b "say hi"
</code></pre>
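If you'd rather script it from Python, Ollama also exposes an OpenAI-compatible endpoint on its default port; here's a minimal sketch (assumes the openai package is installed and Ollama is serving on localhost:11434):<p><pre><code> # Minimal sketch: chat with the local model via Ollama's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored locally

response = client.chat.completions.create(
    model="mistral-small:24b",
    messages=[{"role": "user", "content": "say hi"}],
)
print(response.choices[0].message.content)
</code></pre>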
More notes here: <a href="https://simonwillison.net/2025/Jan/30/mistral-small-3/" rel="nofollow">https://simonwillison.net/2025/Jan/30/mistral-small-3/</a>
Note the announcement at the end that they're moving away from the non-commercial-only license used in some of their models in favour of Apache 2.0:<p><i>We’re renewing our commitment to using Apache 2.0 license for our general purpose models, as we progressively move away from MRL-licensed models</i>
Hi! I'm Tom, a machine learning engineer at the nonprofit research institute Epoch AI [0]. I've been working on building infrastructure to:<p>* run LLM evaluations systematically and at scale<p>* share the data with the public in a rigorous and transparent way<p>We use the UK government's Inspect [1] library to run the evaluations.<p>As soon as I saw this news on HN, I evaluated Mistral Small 3 on MATH [2] level 5 (hardest subset, 1,324 questions). I get an accuracy of 0.45 (± 0.011). We sample the LLM 8 times for each question, which lets us obtain less noisy estimates of mean accuracy, and measure the consistency of the LLM's answers. The 1,324*8=10,584 samples represent 8.5M tokens (2M in, 6.5M out).<p>You can see the full transcripts here in Inspect’s interactive interface: <a href="https://epoch.ai/inspect-viewer/484131e0/viewer?log_file=https%3A%2F%2Fepoch-benchmarks-production-public.s3.us-east-2.amazonaws.com%2Finspect_ai_logs%2FNbsnvBsMoMizozbPZY8LLb.eval" rel="nofollow">https://epoch.ai/inspect-viewer/484131e0/viewer?log_file=htt...</a><p>Note that MATH is a different benchmark from the MathInstruct [3] mentioned in the OP.<p>It's still early days for Epoch AI's benchmarking work. I'm developing a systematic database of evaluations run directly by us (so we can share the full details transparently), which we hope to release very soon.<p>[0]: <a href="https://epoch.ai/" rel="nofollow">https://epoch.ai/</a><p>[1]: <a href="https://github.com/UKGovernmentBEIS/inspect_ai">https://github.com/UKGovernmentBEIS/inspect_ai</a><p>[2]: <a href="https://arxiv.org/abs/2103.03874" rel="nofollow">https://arxiv.org/abs/2103.03874</a><p>[3]: <a href="https://huggingface.co/datasets/TIGER-Lab/MathInstruct" rel="nofollow">https://huggingface.co/datasets/TIGER-Lab/MathInstruct</a>
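A rough sketch of how an error bar like that might be computed from 1,324 questions × 8 samples, using a clustered standard error (this is simulated placeholder data and an illustrative estimator, not necessarily the exact one used here):<p><pre><code> # Rough sketch of a clustered standard error for "accuracy 0.45 ± 0.011".
# The 8 samples per question are correlated, so uncertainty is measured
# across the 1,324 questions, not across all 10,584 individual samples.
import random
import statistics

N_QUESTIONS = 1_324
SAMPLES_PER_QUESTION = 8

random.seed(0)
results = []
for _ in range(N_QUESTIONS):
    # Latent per-question success rate: a model tends to be consistently
    # right or wrong on a given question, which correlates its 8 samples.
    p_q = random.choice([0.05, 0.4, 0.9])
    results.append([1 if random.random() < p_q else 0 for _ in range(SAMPLES_PER_QUESTION)])

per_question = [sum(r) / len(r) for r in results]          # mean accuracy per question
mean_acc = statistics.mean(per_question)                   # overall accuracy
se = statistics.stdev(per_question) / N_QUESTIONS ** 0.5   # clustered standard error
print(f"accuracy = {mean_acc:.3f} ± {se:.3f}")
</code></pre>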
A not-so-subtle dig in the function-calling example[1]:<p><pre><code> "role": "assistant",
"content": "---\n\nOpenAI is a FOR-profit company.",
</code></pre>
[1] <a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501#function-calling" rel="nofollow">https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-...</a>
So the main points of this release are<p>1) code + weights Apache 2.0 licensed (enough to run locally, enough to train, not enough to reproduce this version)<p>2) Low latency, meaning 11ms per token (so ~90 tokens/sec on 4xH100; quick conversion sketch below)<p>3) Performance, according to Mistral, somewhere between Qwen 2.5 32B and Llama 3.3 70B, roughly equal to GPT-4o mini<p>4) ollama run mistral-small (14GB download) gives 9 tokens/sec on the question "who is the president of the US?" (also to enjoy that the answer isn't the orange idiot)
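For point 2, a quick conversion sketch from per-token latency to throughput (the 500-token answer length is just an illustrative assumption, not a quoted figure):<p><pre><code> # Convert per-token latency to throughput and estimate response time.
def tokens_per_second(ms_per_token: float) -> float:
    return 1000.0 / ms_per_token

RESPONSE_TOKENS = 500  # illustrative answer length
for label, ms in [("4xH100 (claimed 11 ms/token)", 11.0), ("local Ollama (~9 tok/s)", 1000.0 / 9)]:
    tps = tokens_per_second(ms)
    print(f"{label}: {tps:.0f} tok/s, ~{RESPONSE_TOKENS / tps:.0f} s for a {RESPONSE_TOKENS}-token answer")
</code></pre>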
I tried just a few of the code-generation prompts I've used over the last few days, and it looks quite good and promising. It seems at least on par with qwen2.5-coder-32b, which was the first local model I would actually use for code. I'm also surprised by how much more polished the output of small models has become over the last year.<p>On another note, I also wish they would follow up with a new version of the 8x7B Mixtral. It was one of my favourite models, but at the time it could barely fit in my RAM, and now that I have more RAM it is rather outdated. But I'm not complaining; this model is great, and it's great that they are one of the companies that actually publish models targeted at edge computing.
Finally, all the recent MoE model releases make me depressed with my mere 24GB VRAM.<p>> Note that Mistral Small 3 is neither trained with RL nor synthetic data<p>Not using synthetic data at all is a little strange
Interested to see what folks do with putting DeepSeek-style RL methods on top of this. The smaller Mistral models have always punched above their weight and been the best for fine-tuning.
Until today, no language model I've run locally on a 32GB M1 has been able to answer this question correctly: "What was Mary J Blige's first album?"<p>Today, a 4-bit quantized version of Mistral Small (14GB model size) answered correctly :)<p><a href="https://ollama.com/library/mistral-small:24b-instruct-2501-q4_K_M">https://ollama.com/library/mistral-small:24b-instruct-2501-q...</a>
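The 14GB figure lines up with napkin math on quantization; here's a rough estimate of weight memory for a 24B-parameter dense model (KV cache and runtime overhead are extra, and the numbers are ballpark, not exact):<p><pre><code> # Rough weight-memory estimate for a 24B-parameter dense model.
PARAMS = 24e9

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{label:>5}: ~{gb:.0f} GB of weights")
# FP16 ~48 GB (too big for a 32GB M1), 8-bit ~24 GB, 4-bit ~12 GB;
# add a couple of GB of overhead and you land near the 14GB download.
</code></pre>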
This is really exciting: the 12-32B range is my favorite model size for my home computer, and the Mistrals have historically been great and widely embraced for fine-tuning.<p>At 24B, I think this has a good chance of fitting on my more memory-constrained work computer.
There's also a 22B model that I appreciate, since it _almost_ fits into my 12GB 3060. But, alas, I might need to get a new GPU if this trend of fatter "small" models continues.
Is there a good benchmark one can look at that shows the best-performing LLM in terms of instruction following or overall score?<p>The only ones I'm aware of are benchmarks on Twitter, Chatbot Arena [1] and the Aider benchmark [2]<p>1. <a href="https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard" rel="nofollow">https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leade...</a><p>2. <a href="https://aider.chat/docs/leaderboards" rel="nofollow">https://aider.chat/docs/leaderboards</a>
What's this stuff about the model catering to ‘80%’ of generative AI tasks? What model do they expect me to use for the other 20% of the time, when my question needs reasoning smarts?
Used it a bit today on coding tasks and overall it was very pleasant. The combination of being fast and fitting into 24GB is also appreciated.<p>Wouldn’t be surprised if this gets used a fair bit given the open license.
The AI race to zero continues to accelerate and Mistral has shown one card to just stay in the race. (And released for free)<p>OpenAI's reaction to DeepSeek looked more like cope and panic after they realized they're getting squeezed at their own game.<p>Notice how Google hasn't said anything with these announcements and didn't rush out a model nor did they do any price cuts? They are not in panic and have something up their sleeve.<p>I'd expect Google to release a new reasoning model that is competitive with DeepSeek and o1 (or matches o3). Would be even more interesting if they release it for free.
Curious how it actually compares to LLaMa.<p>Last year Mistral was garbage compared to LLaMa. I needed a permissive license, so I was forced to use Mistral, but I had LLaMa that I could compare it to. I was always extremely jealous of LLaMa since the Berkeley Starling finetune was so amazing.<p>I ended up giving up on the project because Mistral was so unusable.<p>My conspiracy theory was that there was some European patriotism that gave Mistral a bit more hype than was merited.