
Outcome-Based Reinforcement Learning to Predict the Future

99 points | by bturtel | 3 days ago

5 comments

ctoth | 3 days ago
Do you want paperclips? Because this is how you get paperclips!

Eliminate all agents, all sources of change, all complexity - anything that could introduce unpredictability, and it suddenly becomes far easier to predict the future, no?
valine | 3 days ago
So instead of next-token prediction it's next-event prediction. At some point this just loops around and we're back to teaching models to predict the next token in the sequence.
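A minimal sketch of the point above, under assumptions: the event space, shapes, and names here are illustrative, not anything from the paper. Structurally, next-event prediction is the same cross-entropy objective as next-token prediction, just over a different discrete vocabulary.

    # Illustrative only: both objectives are cross-entropy over a discrete
    # space; only the vocabulary attached to the output head changes.
    import torch
    import torch.nn.functional as F

    vocab_size = 50_000  # token vocabulary of a language model
    event_size = 3       # hypothetical event space: yes / no / unresolved

    def next_item_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """Cross-entropy over whatever discrete space the logits range over."""
        return F.cross_entropy(logits, target)

    # Same loss, different output head: the training objective is identical.
    token_loss = next_item_loss(torch.randn(8, vocab_size),
                                torch.randint(0, vocab_size, (8,)))
    event_loss = next_item_loss(torch.randn(8, event_size),
                                torch.randint(0, event_size, (8,)))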
jldugger | 3 days ago
From the abstract:

> A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037).

I'm lazy: is this hypothetical shooting fish in a barrel, or is it a real edge?
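The thread doesn't spell out the paper's actual trading rule, but a generic threshold rule on a prediction market shows how a calibration edge becomes hypothetical profit; the margin, stake, and payoff accounting below are assumptions, not the authors' setup.

    # Hedged sketch: buy YES when the model's probability exceeds the market
    # price by a margin, buy NO in the mirror case. Margin/stake are assumed.
    def trade_pnl(model_p: float, market_price: float, outcome: int,
                  margin: float = 0.05, stake: float = 1.0) -> float:
        """P&L of one $1-payout contract under a simple threshold rule."""
        if model_p > market_price + margin:        # buy YES at market_price
            return stake * ((1.0 if outcome else 0.0) - market_price)
        if model_p < market_price - margin:        # buy NO at 1 - market_price
            return stake * ((0.0 if outcome else 1.0) - (1.0 - market_price))
        return 0.0                                 # no edge, no trade

    # Model says 0.70, market prices it at 0.60, event resolves YES: +0.40
    print(trade_pnl(0.70, 0.60, outcome=1))

Under this framing, the question above is whether model_p systematically beats market_price out of sample, rather than on a lucky batch of resolved events.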
amelius | 3 days ago
Why would you use RL if you're not going to control the environment, but just predict it?
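One way RL fits pure prediction, sketched under assumptions (a generic outcome-based reward, not necessarily the paper's exact formulation): the stated probability is the "action", and a proper scoring rule applied once the event resolves is the reward, so the environment is only scored, never controlled.

    # Generic outcome-based reward for a forecast: negative Brier score.
    # A proper scoring rule is maximized in expectation by reporting the
    # true probability, so the policy is pushed toward calibration.
    def brier_reward(predicted_p: float, outcome: int) -> float:
        """Reward for stating predicted_p once the 0/1 outcome is known."""
        return -(predicted_p - outcome) ** 2

    print(brier_reward(0.7, 1))  # -0.09: near miss, small penalty
    print(brier_reward(0.7, 0))  # -0.49: confident miss, larger penalty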
garbagecoder | 1 day ago
"a couple of wavy lines"

bzzzzz "sorry, this isn't your lucky day"