
Show HN: ART – a new open-source RL framework for training agents

116 points by kcorbitt 18 days ago
Hey HN, I wanted to share a new project we've been working on for the last couple of months called ART (https://github.com/OpenPipe/ART).

ART is a new open-source framework for training agents using reinforcement learning (RL). RL allows you to train an agent to perform better at any task whose outcome can be measured and quantified.

There are many excellent projects focused on training LLMs with RL, such as GRPOTrainer (https://huggingface.co/docs/trl/main/en/grpo_trainer) and verl (https://github.com/volcengine/verl). We've used these frameworks extensively for customer-facing projects at OpenPipe, but grew frustrated with some key limitations:

- Multi-turn workflows, where the agent calls a tool, gets a response, and calls another, are not well supported. This makes them a non-starter for any task that requires an agent to perform a sequence of actions.

- Other frameworks typically have low GPU efficiency. They may require multiple H100 GPUs just to train a small 7B-parameter model, and aren't able to keep the GPUs busy consistently during both the "rollout" and "training" phases of the training loop.

- Existing frameworks are typically not a convenient shape for integrating with existing agentic codebases. Existing trainers expect you to call raw text completion endpoints, and don't automatically provide industry-standard chat completion APIs.

ART is designed to address these limitations and make it easy to train high-quality agents. We've also shared many details and practical lessons learned in this post, which walks through a demo of training an email research agent that outperforms o3 (https://openpipe.ai/blog/art-e-mail-agent). You can also find out more about ART's architecture in our announcement post (https://openpipe.ai/blog/art-trainer-a-new-rl-trainer-for-agents).

Happy to answer any questions you have!
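
As a rough illustration of the multi-turn workflow described above (not ART's actual API), here is what an agent rollout against a generic OpenAI-compatible chat completions endpoint could look like. The base URL, model name, and search_email tool are hypothetical placeholders:

    # Rough sketch only: the base URL, model name, and tool below are hypothetical,
    # not ART's actual API. Any OpenAI-compatible chat endpoint works the same way.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    tools = [{
        "type": "function",
        "function": {
            "name": "search_email",  # hypothetical tool for an email research agent
            "description": "Search the inbox and return matching snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    messages = [{"role": "user", "content": "Who sent the Q3 budget update?"}]
    for _ in range(5):  # cap the number of agent turns
        resp = client.chat.completions.create(
            model="local-7b-checkpoint",  # hypothetical model name
            messages=messages,
            tools=tools,
        )
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            break  # the agent produced a final answer instead of another tool call
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            # Stub tool result; a real agent would query the email index here.
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": f"(stub search result for: {args['query']})",
            })

Because the endpoint speaks the standard chat completions protocol, the same agent code can be pointed at a proprietary API or at a locally trained checkpoint without changes.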

7 comments

bradhilton 17 days ago

Contributor here, we developed the Agent Reinforcement Trainer (ART) library to make it easy to train LLMs for anything.

No callbacks or straitjacket flows. Instead we serve an OpenAI API-compatible endpoint that you can use as a drop-in replacement for any proprietary APIs you may be hitting.

After collecting responses from the inference API, you can tune the model with your own custom rewards and repeat the process as long as you like, until performance converges. We believe this level of flexibility will make it easier for you to train state-of-the-art models for your own use cases, much like Kyle's new email agent[1].

Also happy to answer any questions you have about the framework.

[1] https://openpipe.ai/blog/art-e-mail-agent
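
To make the collect, score, and train loop described above concrete, here is a minimal self-contained sketch. Trajectory, run_agent, reward, and train_step are hypothetical stand-ins for your own agent code and the trainer, not the library's real functions:

    # Sketch of the collect -> score -> train loop described above. run_agent and
    # train_step are hypothetical stubs standing in for your agent and the trainer.
    import random
    from dataclasses import dataclass

    @dataclass
    class Trajectory:
        messages: list          # the full multi-turn conversation, tool calls included
        final_answer: str
        expected_answer: str

    def run_agent(task: str) -> Trajectory:
        # Stand-in: in practice this calls the OpenAI-compatible endpoint served
        # by the trainer, exactly the way your existing agent code already does.
        return Trajectory(messages=[], final_answer=random.choice(["alice", "bob"]),
                          expected_answer=task)

    def reward(t: Trajectory) -> float:
        # Custom, task-specific score: here, 1.0 only for an exact answer match.
        return 1.0 if t.final_answer == t.expected_answer else 0.0

    def train_step(scored: list) -> None:
        # Stand-in for one RL update (e.g. GRPO) on the reward-scored trajectories;
        # afterwards the endpoint serves the updated weights for the next rollouts.
        pass

    tasks = ["alice", "bob", "alice", "bob"]
    for step in range(10):                                # repeat until converged
        trajectories = [run_agent(t) for t in tasks]      # 1. rollout / collect
        scored = [(t, reward(t)) for t in trajectories]   # 2. score with rewards
        train_step(scored)                                # 3. tune, then loop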
kcorbitt 17 days ago

Figured now was a good time to post this since we recently got surprisingly good results on training an email research agent. Link is above, but will put it here as well since I think it's a good example of RL's promise: https://openpipe.ai/blog/art-e-mail-agent
someguy101010 17 days ago

Thanks for sharing this! A couple of questions come to mind:

- How does training with RL differ from fine-tuning?

- When would it make sense to fine-tune instead of using RL?
[Comment #43849667 not loaded]
tcdent 17 days ago

I really like this concept.

Do you have documentation for the API response from the `/_train_model` endpoint?
[Comment #43849385 not loaded]
jeffchuber 17 days ago
the table with comparable models is a really great way to show off things here
pama 17 days ago
Was the name influenced by the ship in the Murderbot Diaries?
[Comment #43854401 not loaded]
gitroom 17 days ago

Perfect, I've always wanted an easier way to mess with RL frameworks. Gonna mess around with this asap.
[Comment #43856094 not loaded]

[Comment #43856370 not loaded]