
GPT-4 Architecture

25 points by abtinsetyani almost 2 years ago

3 comments

og_kalu almost 2 years ago
It's down on Twitter now, so: https://archive.is/Y72Gu

The reason companies/researchers haven't generally touched MoE for LLMs, despite how good it sounds on paper, is that sparse models have typically underperformed their dense counterparts.

Assuming this is all true, did OpenAI do anything differently here, or is it just scale?

I know this very recent paper shows MoE benefits far more from instruction tuning: https://arxiv.org/abs/2305.14705

FLAN-MoE-32B comfortably surpasses FLAN-PaLM-62B with a third of the compute. It goes from 25.5% to 65.4% on MMLU. In comparison, Flan-PaLM-62B goes from 55.1% to 59.6%. That kind of shows the underperformance you expect from sparse models.

But from OpenAI's technical report, it doesn't seem like they needed that.

The vision component seems to be just scale. Well, all of it seems to be just scale. And it seems like there's plenty of scale left as far as performance gains go.
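(For readers unfamiliar with the MoE idea being discussed: below is a minimal sketch of a sparse, top-k-routed mixture-of-experts feed-forward layer in PyTorch. The layer sizes, expert count, and top_k value are illustrative placeholders chosen for the example, not anything claimed about GPT-4.)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        """Top-k routed mixture-of-experts feed-forward block (illustrative sizes only)."""
        def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)  # gating network
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x):  # x: (num_tokens, d_model)
            gate_logits = self.router(x)
            weights, idx = gate_logits.topk(self.top_k, dim=-1)  # pick top-k experts per token
            weights = F.softmax(weights, dim=-1)                  # renormalise over chosen experts
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e                         # tokens routed to expert e in slot k
                    if mask.any():
                        out[mask] += weights[mask, k, None] * expert(x[mask])
            return out

Only top_k of the n_experts feed-forward blocks run for each token, which is why a sparse model can carry far more total parameters than a dense model at similar per-token compute; the usual catch, and the point og_kalu raises, is that such models have historically trained to worse quality than dense models of comparable compute.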
abtinsetyani almost 2 years ago
source: https://www.semianalysis.com/p/gpt-4-architecture-infrastructure
version_five almost 2 years ago
A long intro with no real content, just tech-bro "here's the thing" stuff trying to bait you into subscribing. It doesn't actually explain the architecture if you don't subscribe. Don't bother reading.
Comment #36674976 not loaded