A guess at how o1-preview works

3 points by edmack 9 months ago

2 comments

343rwerfd 9 months ago
The hidden chain-of-thought inside the process: from the official statement about it, I infer / suspect that it uses an unhobbled mode of the model, putting it in a special mode where it can use the whole training, avoiding the intrinsic bias towards aligned outcomes.

I think that, to put it in simple terms, "the sum of the good and the bad" is the secret sauce here, pumping the "IQ" of the model (every output in the hidden chain) to levels apparently a lot better than they could probably reach with just aligned hidden internal outputs.

Another way of looking at the "sum of good and bad" stuff is that the model would have a potentially way bigger set of choices (probability space?) to look into for any given prompt.
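The "bigger set of choices" intuition can be sketched as a toy (entirely hypothetical; this has no relation to how o1 actually works internally): if a filter removes candidate continuations before search, the best reachable candidate can only get worse, never better.

```python
import random

random.seed(0)

# Hypothetical toy: each candidate continuation gets a "usefulness" score.
candidates = {f"cand_{i}": random.random() for i in range(100)}

# A filter (standing in for alignment pruning) drops some candidates
# before the model searches over them. The 0.8 cutoff is arbitrary.
filtered = {k: v for k, v in candidates.items() if v < 0.8}

best_unrestricted = max(candidates.values())
best_filtered = max(filtered.values())

# Searching the full space can never find a worse best candidate
# than searching a subset of it.
assert best_filtered <= best_unrestricted
print(f"full space best: {best_unrestricted:.3f}, filtered best: {best_filtered:.3f}")
```

This only captures the subset-of-choices point in the comment, not anything about the model's real scoring or filtering.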
edmack 9 months ago
Corrections and discussion very welcome! Happy to improve it!