TechEcho

We now know that RL can make models more capable on measurable tasks and is a new dimension of scaling laws, but is anyone putting these capabilities to more meaningful use than Olympiad math problems or 2D game playing?

So far, pretty much all of the computer use agent demos I've seen revolve around some kind of instruction following (book this flight, clean my desktop, etc.). I wonder, is anyone working on putting them into active trading in financial markets and using P&L as the reward / loss function? Or maybe title-agnostic video game playing that is optimized for Elo, rank, or win rate?

It feels like context length will eventually go from millions of tokens to days or months of an agent's "life span", and inference cost will eventually come down to the time cost of a GPU server, since hybrid models (Mamba + attention) with linear time complexity can perform like regular transformers (whose inference is quadratic). What are the other major technical challenges here?

I think a meaningful metric is crucial, and I took a lot of inspiration from this startup, Chai.ai, a competitor to Character.ai. I went to one of their events and got the sense that they are essentially optimizing chat LLMs for monthly user retention and subscriptions. Their small team hit 30M ARR with 1.8M DAU, and it happened over the last year or so. Combined with my own experience working at a startup, it seems like the right metric is the money shot.

Am I missing anything fundamental? Is anybody working on this (or interested)?
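To make the "P&L as a reward" idea concrete, here is a minimal sketch of what a per-step reward could look like. Everything in it is an assumption for illustration: the function name `pnl_rewards`, the convention that `positions[t]` is the number of units held from step `t` to `t + 1`, and the simple proportional `fee_rate` are hypothetical and not taken from any existing trading or RL library.

```python
from typing import Sequence


def pnl_rewards(
    prices: Sequence[float],
    positions: Sequence[int],
    fee_rate: float = 0.0,
) -> list[float]:
    """Per-step reward = mark-to-market P&L minus a proportional trading fee.

    Assumptions (hypothetical, for illustration only):
    - positions[t] is the signed number of units held from step t to t + 1
    - the agent starts flat (position 0)
    - fees are fee_rate * traded notional at the price of step t
    """
    if len(positions) != len(prices) - 1:
        raise ValueError("need exactly one position per price interval")

    rewards: list[float] = []
    prev_pos = 0
    for t, pos in enumerate(positions):
        # Cost of moving from the previous position to the new one.
        fee = fee_rate * abs(pos - prev_pos) * prices[t]
        # Gain or loss from holding `pos` units over the interval.
        pnl = pos * (prices[t + 1] - prices[t])
        rewards.append(pnl - fee)
        prev_pos = pos
    return rewards


if __name__ == "__main__":
    # Long one unit, then flat, then short one unit.
    print(pnl_rewards([100.0, 102.0, 101.0, 105.0], [1, 0, -1]))
```

Summing the per-step rewards gives the episode's total P&L, which is the quantity the RL objective would maximize under these assumptions. A real setup would also have to deal with slippage, position limits, and risk-adjusted variants of the reward, none of which are modeled here.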

Computer use agent with RL training for day trading?

1 comment
