
GPT-4: 8 x 220B experts trained with different data/task distributions

47 points by MasterScrat almost 2 years ago

4 comments
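For context on the architecture the title describes: a mixture-of-experts layer routes each token to a small subset of separately parameterized feed-forward "experts". Below is a minimal sketch of an 8-expert layer with top-2 routing; the module names, dimensions, and routing scheme are illustrative assumptions, not details of GPT-4's actual implementation.

  # Minimal mixture-of-experts sketch (illustrative only; not GPT-4's actual code).
  # Assumptions: 8 feed-forward experts, a learned gating network, top-2 routing.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class MoELayer(nn.Module):
      def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
          super().__init__()
          self.top_k = top_k
          self.gate = nn.Linear(d_model, n_experts)   # router: one score per expert
          self.experts = nn.ModuleList([
              nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
              for _ in range(n_experts)
          ])

      def forward(self, x):                            # x: (tokens, d_model)
          scores = F.softmax(self.gate(x), dim=-1)     # routing probabilities
          weights, idx = scores.topk(self.top_k, dim=-1)
          weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
          out = torch.zeros_like(x)
          for k in range(self.top_k):                  # send each token to its top-k experts
              for e, expert in enumerate(self.experts):
                  mask = idx[:, k] == e
                  if mask.any():
                      out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
          return out

  tokens = torch.randn(4, 512)
  print(MoELayer()(tokens).shape)   # torch.Size([4, 512])

The rumor itself only says the experts were trained on different data/task distributions and combined at inference time; a learned top-2 router as above is just one common way such a layer is wired.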

SheinhardtWigCo almost 2 years ago
As a heavy user of GPT-4 (I'm working on a plugin), reading this felt like a puzzle piece being dropped into place.

Maybe this is just confirmation bias, but yeah, trying to push the model's capabilities is like working with a committee of brilliant minds chaired by an idiot.

Also, I can see why they kept this secret. Competitors just shaved months off their R&D timelines.

euclaise almost 2 years ago
The only paper that I could find using an approach with fully separated experts like this is https://arxiv.org/pdf/2208.03306.pdf

swyx almost 2 years ago
the source podcast that this came from: https://news.ycombinator.com/item?id=36407269

adeon almost 2 years ago
Is this an actually confirmed detail or just something George Hotz speculated? How credible is it?
Comment #36419526 not loaded