From the paper, I was intrigued by how they handled the RL step for code data: they trained against hard-but-solvable code-generation tasks by running unit tests. Is that training step done by the other models?

> Code Data: For coding problems, we curate a high-quality training set comprising open-source datasets and our newly collected problem set. We remove problems without test cases. For problems with golden solutions, we exclude those where the golden solution failed to pass all test cases. For problems without golden solution, we discard problems where no test case can be solved in 16 rollouts of advanced reasoning models. Similar to math data, we utilize an SFT version of MiMo-7B to filter out easy problems that are perfectly solved in all 16 rollouts. This rigorous cleaning process yields 30K code problems.

> During each RL iteration, we evaluate thousands of problems to compute the rewards, with each problem potentially containing hundreds of test cases. To improve reward computing efficiency and eliminate GPU idle time, we developed an online judge environment that enables parallel execution of extremely high-volume unit tests.
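For anyone curious what that reward step looks like mechanically, here is a minimal sketch of computing a rule-based reward by running a candidate solution against its test cases in parallel. This is my own illustration, not Xiaomi's judge: the stdin/stdout test format, the helper names, and the pass-fraction reward are all assumptions on my part.

```python
# Sketch of a unit-test-based reward for code RL (illustrative only, not MiMo's judge).
# Assumes each problem ships test cases as (stdin, expected_stdout) pairs and the
# model's answer has been written out as a standalone Python script.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one_test(solution_path: str, stdin_data: str, expected: str, timeout: float = 2.0) -> bool:
    """Run the candidate solution on one test case and compare its stdout."""
    try:
        result = subprocess.run(
            ["python", solution_path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout.strip() == expected.strip()
    except (subprocess.TimeoutExpired, OSError):
        return False


def reward(solution_path: str, test_cases: list[tuple[str, str]], workers: int = 32) -> float:
    """Fraction of test cases passed; many setups only give credit when all pass."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda tc: run_one_test(solution_path, *tc), test_cases))
    return sum(results) / len(results) if results else 0.0
```

The interesting engineering is presumably in the sandboxing and in batching thousands of these runs so the GPUs doing rollouts never sit idle, which sounds like what their "online judge environment" is about.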
Why are there so many English-first AI models from China? Are they not interested in serving their own population? Or is it that if they publish Chinese-first models, they won't get publicity in the West?
This is incredibly strong coding performance for a 7B. I use Gemini 2.5 Pro, which got 67.8, and this got 57.8, very close to Gemini 2.5 Flash at 60.6.

I've become pretty skeptical about eval results given what we've heard about Llama 4, so we'll see where this lands on the closed evals, but it's very impressive to see.
When you guys use GGUF files in Ollama, do you normally create a Modelfile to go with them, or just hope that whatever defaults Ollama applies work with the new model?

https://github.com/ollama/ollama/blob/main/docs/modelfile.md
It's funny to see benchmarks that omit the top-performing models like o3 (currently the best model on many benchmarks) and Gemini Pro/Claude 3.7.
MiMo-7B claims to outperform larger models like Qwen-32B and match OpenAI o1-mini on math/code benchmarks — all with a 7B model trained from scratch. Is this a sign that pretraining + RLHF optimization is finally outpacing scale? Or are we just getting better at benchmarking narrow capabilities?
The README says "RL" without specifying what kind of RL is used. Researchers: I know you are busy, and I know good writing takes time, but please don't skip this kind of detail.
I wonder if they will use this model for their AI assistant on their Xiaomi 15 series phones. They most likely will. I'm not really sure what to expect from it.
Umm, wow. Great benchmarks. I'm looking forward to chatting with this one.

A couple of things stand out to me. First, the 7B model is trained on 25T tokens(!). This is Meta-scale training; Llama 4 Maverick was trained on roughly 22T (and Scout, the smaller model, on about 40T).

Second, this is an interesting path to take: not a distilled model or an RL layer to get reasoning out of another model, but a from-scratch RL model with reasoning baked in, and the claims seem to indicate you get a lot of extra efficiency per parameter doing this.

I don't have experience with Xiaomi models, so I'm cautious about this one until I play with it, but it looks like a super viable local reasoning model from the stats.
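If you want to poke at it locally, something like the following should work with Hugging Face transformers. This is only a sketch: the repo id "XiaomiMiMo/MiMo-7B-RL" and the generation settings are my assumptions, so check the model card for the exact name and the recommended sampling parameters.

```python
# Quick local smoke test with transformers (a sketch; repo id and settings are assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"  # assumed Hugging Face repo id, verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # spread across available GPUs / fall back to CPU
    trust_remote_code=True,  # in case the repo ships custom modeling code
)

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At 7B parameters the bf16 weights should fit comfortably on a single 24 GB GPU, and quantized GGUF builds should go considerably lower, which is what makes a reasoning model at this size interesting for local use.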
Been testing it a bit and overall it's pretty solid. The lengthy think times mean one waits quite a while, though; longer than much larger models like, say, the recent Qwen MoE.

That MoE strikes me as the better overall tradeoff.
Xiaomi in Chinese translates to "little rice."

The meaning of the name is described here: https://finance.sina.cn/tech/2020-11-26/detail-iiznctke3397952.d.html

在后来的讨论中,我突然想到了我最喜欢的一句话——“佛观一粒米,大如须弥山”。

Translated into English: "In the later discussions, I suddenly thought of one of my favorite sayings: 'A Buddha sees a single grain of rice as vast as Mount Sumeru.'"

The expression emphasizes that even something seemingly small (like a grain of rice) can hold immense significance or value when viewed from a different perspective.

Thanks to ChatGPT for translating this.