
GPT-4: 8 x 220B experts trained with different data/task distributions

47 points by MasterScrat almost 2 years ago

4 comments

SheinhardtWigCo almost 2 years ago
As a heavy user of GPT-4 (I'm working on a plugin), reading this felt like a puzzle piece being dropped into place.

Maybe this is just confirmation bias, but yeah, trying to push the model's capabilities is like working with a committee of brilliant minds chaired by an idiot.

Also, I can see why they kept this secret. Competitors just shaved months off their R&D timelines.
euclaise almost 2 years ago
The only paper that I could find using an approach with fully separated experts like this is https://arxiv.org/pdf/2208.03306.pdf
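For context, here is a minimal sketch of the kind of top-k mixture-of-experts routing the title alludes to: a small router scores the experts for each token, the top few are consulted, and their outputs are mixed by the normalized gate weights. All sizes, names, and the routing rule are illustrative assumptions; this is not a description of OpenAI's actual GPT-4 implementation.

# Toy top-2 mixture-of-experts layer (illustrative sketch only; all names and
# sizes are assumptions, not GPT-4's real architecture).
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8    # expert count taken from the post title
D_MODEL = 16     # toy hidden size
TOP_K = 2        # experts consulted per token

# Each "expert" is just an independent linear map in this sketch.
expert_weights = [rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
                  for _ in range(N_EXPERTS)]
router_weights = rng.standard_normal((D_MODEL, N_EXPERTS)) / np.sqrt(D_MODEL)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens):
    """Route each token to its top-k experts and mix their outputs."""
    probs = softmax(tokens @ router_weights)           # (n_tokens, N_EXPERTS)
    top_k = np.argsort(-probs, axis=-1)[:, :TOP_K]     # chosen expert ids
    out = np.zeros_like(tokens)
    for i, token in enumerate(tokens):
        chosen = top_k[i]
        gate = probs[i, chosen] / probs[i, chosen].sum()  # renormalize gates
        for g, e in zip(gate, chosen):
            out[i] += g * (token @ expert_weights[e])
    return out, top_k

tokens = rng.standard_normal((4, D_MODEL))
mixed, routing = moe_layer(tokens)
print("expert assignments per token:", routing.tolist())

The fully separated experts in the paper linked above differ from this jointly trained setup: there, each expert model is trained independently on its own data slice and the outputs are combined only at inference time, rather than being trained together with a learned router.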
swyx almost 2 years ago
the source podcast that this came from: https://news.ycombinator.com/item?id=36407269
adeon almost 2 years ago
Is this an actually confirmed detail or just something George Hotz speculated? How credible is it?