
Scaling up test-time compute with latent reasoning: A recurrent depth approach

149 points | by timbilt | 3 months ago

9 comments

janalsncm | 3 months ago
One of the benefits of using thinking tokens compared to "thinking in a latent space" is that you can directly observe the quality of the CoT. In R1 they saw it was mixing languages and fixed it with cold-start data.

It would be hard to SFT this because you can only SFT the final result, not the latent space.

I also notice the authors only had compute for a single full training run. It's impressive they saw such good results from that, but I wonder if they could get better results by incorporating recent efficiency improvements.

I would personally not use this architecture because 1) it adds a lot of hyperparameters which don't have a strong theoretical grounding, and 2) it's not clearly better than simpler methods.
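A minimal sketch of the supervision gap this comment describes, in hypothetical PyTorch (not the paper's code, and the function and variable names are made up): with explicit thinking tokens, the intermediate chain is itself a token sequence, so a standard cross-entropy SFT loss can cover it and the chain can be inspected directly; with latent reasoning there is no ground-truth target for the intermediate iterates, so only the final answer tokens carry a loss.

```python
# Hypothetical sketch, not from the paper or the comment.
import torch
import torch.nn.functional as F

def sft_loss_with_cot(logits, cot_targets, answer_targets):
    # Explicit chain-of-thought: the reasoning tokens and the final answer
    # are both token sequences, so cross-entropy can supervise all of them,
    # and the chain can be read to spot issues (e.g. language mixing).
    targets = torch.cat([cot_targets, answer_targets], dim=-1)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

def sft_loss_latent(final_logits, answer_targets, latent_states):
    # Latent reasoning: the intermediate iterates are continuous vectors with
    # no ground-truth labels, so the SFT loss can only touch the decoded
    # answer; `latent_states` has nothing to be compared against.
    del latent_states  # unsupervised (and unreadable)
    return F.cross_entropy(final_logits.reshape(-1, final_logits.size(-1)),
                           answer_targets.reshape(-1))
```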
WhitneyLand | 3 months ago
One of the hoped-for benefits of this approach is described later in the paper. It's not fully fleshed out what this will mean, but the prospect is tantalizing.

"On a more philosophical note, we hope that latent reasoning captures facets of human reasoning that defy verbalization, such as spatial thinking, physical intuition or (motor) planning. Over many iterations of the recurrent process, reasoning in a high-dimensional vector space would enable the deep exploration of multiple directions simultaneously, instead of linear thinking, leading to a system capable of exhibiting novel and complex reasoning behavior."
ckrapu | 3 months ago
My opinion is that opaque reasoning is a prerequisite for many of the worst possible AI outcomes.

We should make reasoning fully visible in the output space.
nialv7 | 3 months ago
Slightly off topic: I rarely see papers talk about their failed training runs and why those runs failed. This paper is definitely a breath of fresh air. Their analyses of the failures, the changes they made to fix them, and the rationale behind those changes are all very insightful.
HarHarVeryFunny | 3 months ago
Latent / embedding-space reasoning seems a step in the right direction, but building recurrence into the model while still relying on gradient descent (i.e. BPTT) to train it seems to create more of a problem (training inefficiency) than it solves, especially since they still end up externally specifying the number of recurrent iterations (r=4, 8, etc.) for a given inference. Ideally, having recurrence internal to the model would allow the model itself to decide how long to iterate before outputting anything.
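A minimal sketch of the point about externally specified recurrence, with assumed shapes and a stand-in core block (not the paper's implementation): the caller picks r, the same weights are applied r times, and BPTT has to unroll through every one of those applications.

```python
# Hypothetical sketch; `RecurrentDepthBlock` and its interface are assumptions.
import torch
import torch.nn as nn

class RecurrentDepthBlock(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Stand-in for whatever the recurrent core actually is.
        self.core = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

    def forward(self, x: torch.Tensor, r: int) -> torch.Tensor:
        # r is chosen by the caller (e.g. r=4 or r=8 at inference),
        # not by the model itself.
        s = torch.zeros_like(x)
        for _ in range(r):
            s = self.core(s + x)   # the same weights are reused every step
        return s                   # gradients flow back through all r steps

block = RecurrentDepthBlock(d_model=512)
h = torch.randn(2, 16, 512)        # (batch, sequence, hidden)
out = block(h, r=8)                # more test-time compute, same parameter count
```

Letting the model halt on its own would mean replacing the fixed loop bound with a learned stopping rule, which a plain unrolled recurrence trained by gradient descent does not provide for free.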
timbilt | 3 months ago
Twitter thread about this by the author: https://x.com/jonasgeiping/status/1888985929727037514
tmnvdb | 3 months ago
Interesting stuff. As the authors note, using latent reasoning seems to be a way to sink more compute into the model and get better performance without increasing the model size. Good news for those on a steady diet of 'scale pills'.
EternalFury | 3 months ago
Isn’t this equivalent to maximizing latent space activation without corrective user input? How does it implement self correction or backtracking?
anentropic | 3 months ago
Is what they call "test-time" here the same as what is often called "inference time" elsewhere?