
Phi-4: Microsoft's Newest Small Language Model Specializing in Complex Reasoning

439 points | by lappa | 5 months ago

12 comments

simonw, 5 months ago
The most interesting thing about this is the way it was trained using synthetic data, which is described in quite a bit of detail in the technical report: https://arxiv.org/abs/2412.08905

Microsoft haven't officially released the weights yet but there are unofficial GGUFs up on Hugging Face already. I tried this one: https://huggingface.co/matteogeniaccio/phi-4/tree/main

I got it working with my LLM tool like this:

    llm install llm-gguf
    llm gguf download-model https://huggingface.co/matteogeniaccio/phi-4/resolve/main/phi-4-Q4_K_M.gguf
    llm chat -m gguf/phi-4-Q4_K_M

Here are some initial transcripts: https://gist.github.com/simonw/0235fd9f8c7809d0ae078495dd630b67

More of my notes on Phi-4 here: https://simonwillison.net/2024/Dec/15/phi-4-technical-report/
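For anyone who wants to script against the same quantized file rather than use the llm CLI, here is a minimal sketch with the llama-cpp-python bindings (an assumption, not something from the comment; the model path mirrors the file downloaded above):

    # Minimal sketch (assumption): loading the same Q4_K_M GGUF through the
    # llama-cpp-python bindings instead of the `llm` CLI shown above.
    from llama_cpp import Llama

    # Path assumes the file downloaded by the commands above.
    llm = Llama(model_path="phi-4-Q4_K_M.gguf", n_ctx=4096)

    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What distinguishes Phi-4 from Phi-3?"}],
        max_tokens=256,
    )
    print(response["choices"][0]["message"]["content"])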
thot_experiment, 5 months ago
For prompt adherence it still fails on tasks that Gemma2 27b nails every time. I haven't been impressed with any of the Phi family of models. The large context is very nice, though Gemma2 plays very well with self-extend.
xeckr, 5 months ago
Looks like it punches way above its weight(s).

How far are we from running a GPT-3/GPT-4 level LLM on regular consumer hardware, like a MacBook Pro?
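As a rough sanity check on the laptop question, the sketch below estimates the memory needed just for the weights of a 14B model at a few common quantization levels (weights only; the KV cache and runtime overhead are ignored, and the bits-per-weight figures are approximations):

    # Back-of-the-envelope estimate (assumption): weight memory for a
    # 14B-parameter model at a few common quantization levels.
    PARAMS = 14e9  # Phi-4 parameter count

    for name, bits_per_weight in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
        gib = PARAMS * bits_per_weight / 8 / 2**30
        print(f"{name:7s} ~{gib:4.1f} GiB")

    # Prints roughly: FP16 ~26 GiB, Q8_0 ~14 GiB, Q4_K_M ~8 GiB, so a 4-bit
    # quant of a 14B model fits on a 16 GB MacBook Pro with room for context.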
excerionsforte, 5 months ago
Looks like someone converted it for Ollama use already: https://ollama.com/vanilj/Phi-4
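If you would rather call that Ollama build from code than from the CLI, here is a minimal sketch with the ollama Python client (assuming a local Ollama server is running and the model tag matches the link above):

    # Minimal sketch (assumption): chatting with the community Phi-4 build
    # linked above via the Ollama Python client; requires `ollama serve` locally.
    import ollama

    response = ollama.chat(
        model="vanilj/Phi-4",
        messages=[{"role": "user", "content": "Give a two-sentence summary of Phi-4."}],
    )
    print(response["message"]["content"])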
jsight, 5 months ago
I really like the ~3B param version of phi-3. It wasn't very powerful and overused memory, but was surprisingly strong for such a small model.

I'm not sure how I can be impressed by a 14B Phi-4. That isn't really small any more, and I doubt it will be significantly better than llama 3 or Mistral at this point. Maybe that will be wrong, but I don't have high hopes.
travisgriggs, 5 months ago
Where have I been? What is a “small” language model? Wikipedia just talks about LLMs. Is this a sort of spectrum? Are there medium language models? Or is it a more nuanced classifier?
mupuff1234, 5 months ago
So we moved from "reasoning" to "complex reasoning".

I wonder what will be next month's buzzphrase.
zurfer, 5 months ago
Model releases without comprehensive coverage of benchmarks make me deeply skeptical.

The worst was the gpt-4o update in November: basically a two-liner on what it is better at, and in reality it regressed on multiple benchmarks.

Here we just get MMLU, which is widely known to be saturated, and knowing they trained on synthetic data, we have no idea how much "weight" was given to MMLU-like training data.

Benchmarks are not perfect, but they give me context to build upon.

edit: the benchmarks are covered in the paper: https://arxiv.org/pdf/2412.08905
PoignardAzur, 5 months ago
Saying that a 14B model is "small" feels a little silly at this point. I guess it doesn't require a high-end graphics card?
ai_biden, 5 months ago
I'm not too excited by the Phi-4 benchmark results; it is #BenchmarkInflation.

Microsoft Research just dropped Phi-4 14B, an open-source model that's turning heads. It claims to rival Llama 3.3 70B with a fraction of the parameters: 5x fewer, to be exact.

What's the secret? Synthetic data: higher quality, less misinformation, more diversity.

But the Phi models always have great benchmark scores, and they always disappoint me in real-world use cases. The Phi series is famous for being trained on benchmarks.

I tried again with #phi4 through Ollama, but it's not satisfactory.

To me, at the moment, IFEval is the most important LLM benchmark.

But look at the smart business strategy of Microsoft: get unlimited access to GPT-4, prompt it to generate 30B tokens, train a 1B parameter model, call it phi-1, show benchmarks beating models 10x the size, never release the data, never detail how to generate the data (this time they explained it only at a very high level), and claim victory over small models.
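The recipe sketched above, prompt a frontier model for tokens and train a small model on them, is easy to illustrate, even though Microsoft's actual pipeline is far more elaborate; the loop below is a purely hypothetical toy version (teacher model choice, seed topics, and output file are all illustrative):

    # Toy illustration only (assumption): a synthetic-data loop of the kind
    # described above, not Microsoft's actual Phi training pipeline.
    import json
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    topics = ["modular arithmetic", "binary search", "Bayes' theorem"]  # hypothetical seeds

    with open("synthetic_train.jsonl", "w") as f:
        for topic in topics:
            resp = client.chat.completions.create(
                model="gpt-4",  # the teacher model named in the comment
                messages=[{"role": "user",
                           "content": f"Write a worked exercise and solution about {topic}."}],
            )
            f.write(json.dumps({"topic": topic,
                                "text": resp.choices[0].message.content}) + "\n")
    # The resulting JSONL would then feed an ordinary training run for the
    # small "student" model.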
liminal, 5 months ago
Is 14B parameters still considered small?
parmesean, 5 months ago
13.8 epochs of the benchmarks?