Efficient Reasoning with Hidden Thinking

172 points by fofoz 4 months ago

14 comments

scribu 4 months ago
Would be curious to know how this stacks up against Coconut [1] which also uses latent space for reasoning.

[1] https://arxiv.org/abs/2412.06769
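For context, the mechanism Coconut proposes (feeding the model's last hidden state back in as the next input embedding instead of sampling a text token) can be sketched roughly as below. The model, prompt, and step count are placeholders for illustration, not anything from either paper.

```python
# Rough sketch of Coconut-style latent reasoning (arXiv:2412.06769), not the
# paper's code: at each reasoning step the last hidden state is fed back in as
# the next input embedding instead of being decoded into a text token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model, purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Question: if x + 3 = 7, what is x?"
input_ids = tok(prompt, return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

num_latent_steps = 4  # how many "continuous thoughts" to unroll (an assumption)
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]  # final-layer state at the last position
        # Key difference from ordinary CoT: nothing is sampled from the vocabulary;
        # the continuous hidden state itself becomes the next input "token".
        inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)

# After the latent steps, switch back to normal token generation so the answer
# comes out as text (needs a recent transformers version for inputs_embeds here).
answer_ids = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=20)
print(tok.decode(answer_ids[0], skip_special_tokens=True))
```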
moolimon 4 months ago
I feel like this is the obvious next step for chain of thought reasoning. Excited to see work on models that try to transform the intermediate thinking-space tokens down to language, allowing us to still try to see what's happening inside the "mind" of the LLM, if that process is even possible to map to language anymore. I also wonder what the implications of this research are for chain of thought reasoning with reinforcement learning, since, from my understanding, many of the reward mechanisms set up during reinforcement learning are built around the structure of the thought process.
thom 4 months ago
Very importantly, here they provide a way of decoding the encoded thought tokens, so you're not really losing explanatory power or debuggability. As much as OpenAI want to present hidden chain of thought as some sort of long-term advantage or safety feature, it's horrible when you want to understand how a model came to some insane conclusion.
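For what it's worth, here is one hedged sketch of what decoding a latent thought token back into text could look like; this is not the paper's actual interface. The idea is to project the thinking token's hidden state into a small decoder LM's embedding space as a soft prompt and let that decoder verbalize it. The model name, projection layer, and prompt are assumptions, and a real system would train the bridge rather than leave it random.

```python
# Illustrative only, not the paper's interface: one way to "decode" a latent
# thought is to hand its hidden state to a small decoder LM as a soft prompt
# and let that decoder verbalize it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

decoder_name = "gpt2"  # stand-in for a small decoder LM
tok = AutoTokenizer.from_pretrained(decoder_name)
decoder = AutoModelForCausalLM.from_pretrained(decoder_name)

hidden_size = decoder.config.n_embd
thought_state = torch.randn(1, 1, hidden_size)        # hidden state of one thinking token (placeholder)
project = torch.nn.Linear(hidden_size, hidden_size)   # learned bridge into the decoder's embedding space

prompt_ids = tok("The hidden thought above says:", return_tensors="pt").input_ids
prompt_embeds = decoder.get_input_embeddings()(prompt_ids)

# Soft-prompt the decoder with the projected thought state, then generate text.
inputs_embeds = torch.cat([project(thought_state), prompt_embeds], dim=1)
explanation_ids = decoder.generate(inputs_embeds=inputs_embeds, max_new_tokens=30)
print(tok.decode(explanation_ids[0], skip_special_tokens=True))
```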
byschii 4 months ago
Isn't this dangerous? Isn't the efficiency gained at the expense of safety and interpretability?

https://arxiv.org/abs/2412.14093 (Alignment faking in large language models)

https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models

PS: I'm definitely not an expert.
Davidzheng 4 months ago
Probably not needed in the end to reason in latent space. Unless constrained by human preference/SFT data, RL should spontaneously create new additions to language to help with new reasoning methods/new concepts invented by the system.
another_poster 4 months ago
Is “multimodal reasoning” as big a deal as it sounds? Does this technique mean LLMs can generate chains of thought that map to other modalities, such as sound and images?
deoxykev 4 months ago
I don't think autoregressive models have a fundamental difference in reasoning capability between latent space and token space. Latent space enables abstract reasoning and pattern recognition, while token space acts both as the discrete interface for communication and as an interaction medium to extend, refine, and synthesize higher-order reasoning over latent space.

Intuitively speaking, most people think of writing as a communication tool. But actually it's also a thinking tool that helps create deeper connections over discrete thoughts, which can only occupy a fixed slice of our attention at any given time. Attentional capacity is the primary limitation, for humans and LLMs alike. So use the token space as extended working memory. Besides, even the Coconut paper got mediocre results. I don't think this is the way.
vessenes 4 months ago
I've been thinking a bit about this lately - reasoning in latent space - especially because it looks like that's what R1-Zero does: the researchers mention that its <think> sections switch back and forth between Chinese and English, but the <say> sections are coherent.

The paper raises a few more questions than it answers, though.

Do they hard-code a certain set of CoT token types upfront to train on? While the results are good, they are not 'great': other methods seem to provide better outcomes, based on their own charts.

The interpretability does not seem 'strong' to me either: they train decoders on latent space encodings by sort of guessing what must be going on based on text prompts.

That said, this is a fairly sweet 'hack' in my mind: training hidden layers to do the reasoning. I guess I'm skeptical that it's the way forward, though. It feels like until your CoT token can specify it needs more thinking time, you're stuck without extensibility / deep thinking when needed.

Overall, very cool. Probably not "the future". More research in latent space reasoning would be very welcome.
nullc 4 months ago
Keeping the thinking interpretable makes it easier to impose conditions on it, both at runtime and as part of reinforcement. It opens the door to manually injecting relevant thoughts triggered by supervision ("I must remember to say nothing that could offend the party.", search results, or access to APIs like calculators).

Those advantages are easily worth some efficiency.

I'm skeptical of the safety/security arguments some have made. Models RL-trained while seeing their own CoT may (and in fact almost certainly will) develop hidden context embedded in their word choices that carries through data we're not aware of; the fact that the CoT appears to be English (or some other human language) doesn't mean that we necessarily really understand it.

Consider how a game of Hanabi between long-time partners might look to an outsider.
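A small sketch of that injection idea, assuming the chain of thought is ordinary text with <think>-style tags; the tag format, the injected note, and the model are made up for illustration.

```python
# Sketch of runtime thought injection, assuming the chain of thought is plain
# text; the tag format, the injected note, and the model are assumptions here.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")  # placeholder model

partial_cot = (
    "Question: 23 * 17 = ?\n"
    "<think>\n"
    "First, 23 * 17 = 23 * 20 - 23 * 3.\n"
)
# Because the reasoning lives in readable text, a supervisor can splice in a
# reminder or a tool result mid-thought and let generation continue from there.
injected = partial_cot + "Note: verify the arithmetic with the calculator tool before answering.\n"
print(generate(injected, max_new_tokens=40)[0]["generated_text"])
```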
_KnighT_ 4 months ago
I'm new to this topic. Can someone help me understand this sentence?

"Meanwhile, through the next-token prediction constraint, the explicit textual symbols of the hidden representations for Heima Encoder are aligned to the text of the corresponding special tokens {<CoT>(k)} in vocabulary, while the hidden representations contained in hidden states of thinking tokens remain distinct and variable depending on the inputs"

I understand that they have fine-tuned the MLLM to produce, in response to each query and image input, the CoT "thinking tokens" in addition to the answer.

How does that establish an association between the thinking tokens and the original plain-English CoT statements?

The second clause seems to say that the thinking tokens encode information that is "distinct and variable depending on the inputs." Is my interpretation correct?
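One possible reading of the quoted sentence, sketched with a text-only stand-in model rather than the paper's MLLM: the {<CoT>(k)} symbols are ordinary entries added to the vocabulary, next-token prediction teaches the model to emit them, and the hidden states computed at those positions stay continuous and input-dependent, which is where the compressed reasoning would live.

```python
# My illustrative reading of the quoted sentence, not code from the paper: the
# CoT is compressed into a few special tokens added to the vocabulary; their
# textual symbols are fixed, but the hidden states computed at those positions
# vary with the input and carry the actual "thought".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # text-only stand-in for the MLLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Special thinking tokens {<CoT>(k)} added to the vocabulary.
cot_tokens = [f"<CoT_{k}>" for k in range(3)]
tok.add_special_tokens({"additional_special_tokens": cot_tokens})
model.resize_token_embeddings(len(tok))

ids = tok("Question: why is the sky blue? " + "".join(cot_tokens), return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=ids, output_hidden_states=True)

# In the paper's setup (as I read it), next-token prediction aligns the model's
# output to these fixed symbols, while the hidden states at their positions
# remain distinct and input-dependent.
cot_positions = torch.isin(ids[0], torch.tensor(tok.convert_tokens_to_ids(cot_tokens)))
thought_states = out.hidden_states[-1][0, cot_positions]  # shape: (3, hidden_size)
print(thought_states.shape)
```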
antirez 4 months ago
Cool, but isn't this encoding a potentially very long thinking process into a fixed embedding? Intuitively, that should not work as well.
gunalx 4 months ago
I would be interested in seeing how a combined latent-space and traditional GRPO CoT approach could perform vs. just one of either.

My intuition is still that latent space would be better at emulating larger models with fewer params, with CoT helping refine the output after the latent-space stage.

Combined, it would kind of be able to think about a problem: throw down a draft, then refine it.
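A rough sketch of that combination as described (not from the paper): run a few latent steps as a draft, reusing the same hidden-state loop as in the Coconut sketch above, then let ordinary token-level CoT decoding refine it, which is the part a GRPO-style reward would act on. The model and step count are placeholders.

```python
# Sketch of "latent draft, then token-level refinement"; an interpretation of
# the parent comment, not an implementation from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Question: what is 17 * 23?", return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids)

# Stage 1: latent "draft" -- recycle hidden states instead of emitting tokens.
with torch.no_grad():
    for _ in range(3):
        h = model(inputs_embeds=embeds, output_hidden_states=True).hidden_states[-1][:, -1:, :]
        embeds = torch.cat([embeds, h], dim=1)

# Stage 2: token-level refinement -- ordinary CoT decoding on top of the draft,
# which is the visible text a GRPO-style reward could score.
cot_ids = model.generate(inputs_embeds=embeds, max_new_tokens=40)
print(tok.decode(cot_ids[0], skip_special_tokens=True))
```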
aradox66 4 months ago
Could someone ELI5? It sounds like they generate a compressed token which represents a whole "thought" rather than elaborating the entire "thought" in actual language. Is that right?
jakobschwich 4 months ago
Seems like a promising next step.