Over the holidays, we published a post [1] on using high-precision few-shot examples to get `gpt-4o-mini` to perform similarly to `gpt-4o`. I just re-ran that same experiment, but swapped out `gpt-4o-mini` for `phi-4`.

`phi-4` really blew me away in terms of learning from few-shots. It measured as 97% consistent with `gpt-4o` when using high-precision few-shots! Without the few-shots, it was only 37% consistent. That's a huge improvement!

For comparison, with few-shots it performs as well as `gpt-4o-mini` does (though `gpt-4o-mini`'s baseline without few-shots was 59%, quite a bit higher than `phi-4`'s).

[1] https://bits.logic.inc/p/getting-gpt-4o-mini-to-perform-like
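If it helps, here's a minimal sketch of the shape of the setup, assuming `phi-4` is exposed behind an OpenAI-compatible endpoint (the base URL, model name, and example pairs below are placeholders for illustration, not the ones from the post):

```python
from openai import OpenAI

# Assumes phi-4 is served via an OpenAI-compatible API (e.g. a local server).
# Base URL, model name, and the few-shot pairs are illustrative placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# High-precision few-shots: small set of hand-verified input/label pairs.
few_shots = [
    {"role": "user", "content": "Classify the sentiment: 'The battery died after two days.'"},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Classify the sentiment: 'Setup took thirty seconds and it just worked.'"},
    {"role": "assistant", "content": "positive"},
]

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "system", "content": "You are a precise classifier. Answer with a single label."}]
    + few_shots
    + [{"role": "user", "content": "Classify the sentiment: 'It does the job, nothing more.'"}],
    temperature=0,
)
print(response.choices[0].message.content)
```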
Is anyone blown away by how fast we got to running something this powerful locally? I know it's easy to get burnt out on LLMs, but this is pretty incredible.

I genuinely think we're only two years away from fully custom, local, voice-to-voice LLM assistants that grow with you, like JOI in BR2049, and it's going to change how we think about being human and being social, and how we grow up.
It's odd that MS is releasing models that are competitors to OpenAI's. This reinforces the idea that there is no real strategic advantage in owning a model. I think the strategy now is to offer cheap, performant infrastructure to run the models.
I was disappointed in all the Phi models before this, whose benchmark results looked much better than they performed in practice, but I've been really impressed with how good Phi-4 is at just 14B. We've run it against the top 1,000 most popular StackOverflow questions and it came 3rd, beating GPT-4 and Sonnet 3.5 in our benchmarks and trailing only DeepSeek v3 and WizardLM 8x22B [1]. We're using Mixtral 8x7B to grade the quality of the answers, which could explain how WizardLM (based on Mixtral 8x22B) took 2nd place.

Unfortunately I'm only getting 6 tok/s on an NVIDIA A4000, so it's still not great for real-time queries, but luckily, now that it's MIT licensed, it's available on OpenRouter [2] at a great price of $0.07/$0.14 per million tokens (input/output) and a fast 78 tok/s.

Because it yields better results and we're able to self-host Phi-4 for free, we've replaced Mistral NeMo with it in our default models for answering new questions [3].

[1] https://pvq.app/leaderboard

[2] https://openrouter.ai/microsoft/phi-4

[3] https://pvq.app/questions/ask
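For reference, a rough sketch of querying the OpenRouter-hosted model from [2] over its OpenAI-compatible HTTP API; the request shape follows OpenRouter's documented endpoint and should be treated as an assumption, not something from our benchmark harness:

```python
import os
import requests

# Rough sketch: calling microsoft/phi-4 via OpenRouter's chat completions
# endpoint. Assumes an API key in OPENROUTER_API_KEY.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "microsoft/phi-4",
        "messages": [
            {"role": "user", "content": "How do I reverse a list in Python?"},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```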
FWIW, Phi-4 was converted to Ollama by the community last month:

https://ollama.com/vanilj/Phi-4
I was going to ask if this or other Ollama models support structured output (like JSON).

Then a quick search revealed that you can, as of a few weeks ago:

https://ollama.com/blog/structured-outputs
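Here's a small sketch in the style of that blog post, pointed at the community Phi-4 tag mentioned above; the schema is just an example, and it assumes the model has already been pulled (e.g. via `ollama pull vanilj/Phi-4`):

```python
from ollama import chat
from pydantic import BaseModel

# Example schema to constrain the model's output to valid JSON.
class CityInfo(BaseModel):
    name: str
    country: str
    population: int

response = chat(
    model="vanilj/Phi-4",
    messages=[{"role": "user", "content": "Tell me about Tokyo."}],
    format=CityInfo.model_json_schema(),  # structured outputs: JSON schema constraint
)
print(CityInfo.model_validate_json(response.message.content))
```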
I've seen on the LocalLLaMA subreddit that some of the GGUFs have bugs in them; the recommended one was Unsloth's. However, I don't know how the Ollama GGUF holds up.
Related: *Phi-4: Microsoft's Newest Small Language Model Specializing in Complex Reasoning* (439 points, 24 days ago, 144 comments) https://news.ycombinator.com/item?id=42405323

Also on Hugging Face: https://huggingface.co/microsoft/phi-4
How come models can be so small now? I don't know a lot about AI, but is there an ELI5 for a software engineer that knows a *bit* about AI?

For context: I've made some simple neural nets with backprop. I read [1].

[1] http://neuralnetworksanddeeplearning.com/
<i>"built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets"</i><p>Does this mean the model was trained without copyright infringements?
I have unfortunately been disappointed with the llama.cpp/Ollama ecosystem of late, and I'm thinking about moving my things to vLLM instead.

llama.cpp basically dropped support for multimodal vision models. Ollama still does support them, but only a handful. Also, Ollama still does not support Vulkan, even though llama.cpp has had Vulkan support for a long, long time now.

This has been very sad to watch. I'm more and more convinced that vLLM is the way to go, not Ollama.
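For anyone curious what the switch might look like, a minimal sketch with vLLM's offline Python API; the model id and sampling settings are placeholders, not a tested recommendation:

```python
from vllm import LLM, SamplingParams

# Minimal vLLM sketch: load the Hugging Face weights directly and generate offline.
llm = LLM(model="microsoft/phi-4")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Explain what a GGUF file is in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```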
I've just tried to make it run something, and I could not force it to wrap the Python code in bare ``` ``` quotation marks. It always wants to put the word "python" after the three backticks, like this:
```python
.. code..
```
I wonder if that's the result of training.
(I take the LLM output and then run the resulting code.)
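One workaround (my own sketch, not something from the thread): strip the fence and any language tag before executing, so it doesn't matter whether the model opens with bare backticks or a "python" label:

```python
import re

# Workaround sketch: pull the code out of a fenced block regardless of whether
# the model opens it with bare backticks or with a language tag like "python".
FENCE = "`" * 3  # avoid writing a literal fence inside this example
FENCE_RE = re.compile(FENCE + r"[ \t]*[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)

def extract_code(llm_output: str) -> str:
    match = FENCE_RE.search(llm_output)
    return match.group(1) if match else llm_output  # no fence: assume raw code

example = FENCE + "python\nprint('hello')\n" + FENCE
print(extract_code(example))  # -> print('hello')
```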