Phi-2 and TinyLlama are seriously impressive for < 3B parameter models. They can run on a phone, and are pretty snappy.<p>Benchmarks: <a href="https://github.com/ggerganov/llama.cpp/discussions/4508">https://github.com/ggerganov/llama.cpp/discussions/4508</a><p>I don't see them taking over general-purpose chat/query use cases, but fine-tuned for a specific use case and embedded into mobile apps, they might be how we see LLMs jump from cool tech demos to something that's present in most products.