We really need better long-context benchmarks than needle-in-a-haystack. There is LV-Eval (<a href="https://arxiv.org/abs/2402.05136" rel="nofollow">https://arxiv.org/abs/2402.05136</a>) with multi-hop QA, which is better but still pretty basic.
TL;DR:
1. InternLM2 is an open-source large language model that improves on prior open models, particularly in long-context modeling.
2. It combines pretraining with Supervised Fine-Tuning (SFT) and Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF).
3. The release offers multiple model sizes and intermediate training-stage checkpoints to the community.
Does anyone know how the free commercial license works? Do they usually grant it? There appears to be an application form at <a href="https://wj.qq.com/s2/12727483/5dba/" rel="nofollow">https://wj.qq.com/s2/12727483/5dba/</a>.<p>So: Apache 2.0 for the code, and a free commercial license for the weights via that application form.
I experimented with this model and vLLM about a month ago. The long context window is attractive, but inference was incredibly slow on a g5.12xlarge (4× NVIDIA A10G GPUs), and I could not get a response at all for single prompts longer than 50K tokens.
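Part of why long prompts are so heavy: the KV cache grows linearly with sequence length and has to fit in GPU memory alongside the weights. A rough back-of-the-envelope sketch (the layer/head counts below are illustrative assumptions for a 20B-class model with grouped-query attention, not InternLM2's published config):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Memory for one sequence's KV cache: K and V tensors per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative config (assumed values, not taken from the paper)
gib = kv_cache_bytes(seq_len=50_000, n_layers=48, n_kv_heads=8, head_dim=128) / 2**30
print(f"~{gib:.1f} GiB of KV cache for a single 50K-token sequence")
```

Even with GQA keeping the KV-head count small, a single 50K-token request needs several GiB of cache on top of the model weights, so on 4× 24 GB A10Gs there is little headroom and throughput collapses.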