Phi-2: Self-Extend Boosts Performance, Extends Context to 8k Without Training

89 points by georgehill over 1 year ago

4 comments

valine over 1 year ago
The method they use is surprisingly simple. They claim GPTs can’t effectively generate beyond the context window because our models overfit on positional encodings. The fix is literally to cap the positional encodings at inference time.

It makes sense intuitively that the exact position of tokens only really matters for adjacent or near-adjacent tokens. For far-away tokens a rough position is fine.
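
A rough sketch of that capping idea in NumPy (the clamp below is the simplest version of the idea; the actual Self-Extend paper keeps exact positions for nearby tokens and floor-divides distant ones into groups, so treat this as an illustration rather than the paper's method):

    import numpy as np

    def capped_relative_positions(seq_len: int, cap: int) -> np.ndarray:
        """Relative (query - key) distances, clamped at `cap` so tokens
        farther away than the trained range all share one coarse offset."""
        q = np.arange(seq_len)[:, None]   # query index per row
        k = np.arange(seq_len)[None, :]   # key index per column
        rel = q - k                       # exact relative distances
        return np.minimum(rel, cap)       # far-away tokens get a rough position

    print(capped_relative_positions(6, cap=3))
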
scosman over 1 year ago
Phi-2 and TinyLlama are so, so impressive for being < 3B parameter models. They can run on a phone, and are pretty snappy.

Benchmarks: https://github.com/ggerganov/llama.cpp/discussions/4508

I don't see them taking over general-purpose chat/query use cases, but fine-tuned to a specific use case and embedded into mobile apps might be how we see LLMs jump from cool tech demos to something that's present in most products.
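
For a sense of how lightweight these are to run, a minimal sketch with llama-cpp-python; the GGUF filename and settings below are assumptions, not a tested recipe:

    # Hypothetical path to a quantized Phi-2 build from the llama.cpp ecosystem.
    from llama_cpp import Llama

    llm = Llama(model_path="phi-2.Q4_K_M.gguf", n_ctx=2048)
    out = llm("Q: Why can sub-3B models run on phones? A:", max_tokens=64)
    print(out["choices"][0]["text"])
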
te_chris over 1 year ago
Has anyone successfully fine-tuned it for function calling? Thinking you could use a lightweight model like this to interpret args in a pipeline, then format them for passing down the chain. One hypothetical shape of that step is sketched below.
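
A hypothetical sketch of that arg-interpreting step; `small_llm` stands in for whatever fine-tuned model gets deployed, and the get_weather signature plus the JSON-only contract are assumptions for illustration:

    import json

    # Hypothetical prompt contract: the small model replies with JSON only,
    # which the next stage of the pipeline consumes as a dict.
    PROMPT = (
        "Extract the arguments for get_weather(city, unit) from the request "
        "and reply with JSON only.\n"
        "Request: {req}\n"
        "JSON:"
    )

    def extract_args(small_llm, request: str) -> dict:
        raw = small_llm(PROMPT.format(req=request))  # e.g. '{"city": "Oslo", "unit": "C"}'
        return json.loads(raw)                       # structured args for the chain
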
behnamoh over 1 year ago
Despite what people say about ϕ-2, I never liked its responses. It clearly lacks depth and consistency.