A couple of comments. What's not that interesting here is that adding search to an LLM increases accuracy; that's well known, and is largely already done via RAG or other search pipelines that stuff the retrieved information into the context.

What might be interesting is that they're thinking taxonomically about tool use-cases, and exploring how training can optimize the model's use of them.

To me this is a proof of concept, an interesting one, but just a proof of concept. You can see from their example that the model over-relied on search: it didn't need to re-search three times to get the answer.

A next step I think *would* be useful is updating the reward function to penalize search, pressing the model to use search only when it *needs* to. That seems like a likely framework going forward, once MCP tool costs start to matter, and it would be really useful to have in the next generation of tool-calling LLMs.

In the case of search we'd hopefully get a really useful signal and outcome: when the model is unsure, it calls a friend and gets good info; when it's sure, we've taught it not to waste reward on calls it doesn't need.
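Concretely, the reward shaping I have in mind is just the usual outcome reward minus a small per-call cost for the search tool. Here's a minimal sketch; the function name, penalty value, and reward scale are all my own assumptions, not anything from the paper:

    # Sketch of reward shaping that charges for each search call.
    # The names and the 0.05 penalty are illustrative, not from the paper.
    def shaped_reward(answer_correct: bool, num_search_calls: int,
                      search_penalty: float = 0.05) -> float:
        """Outcome reward minus a small per-call cost for using search.

        The penalty stays well below the value of a correct answer, so the
        model is still rewarded for searching when it genuinely needs the
        information, but loses reward for redundant queries.
        """
        outcome = 1.0 if answer_correct else 0.0
        return outcome - search_penalty * num_search_calls

    # A correct answer found with one search beats the same answer found
    # with three searches, and a wrong answer still scores worst of all.
    print(shaped_reward(True, 1))   # 0.95
    print(shaped_reward(True, 3))   # 0.85
    print(shaped_reward(False, 0))  # 0.0

The only real design choice is keeping the penalty small relative to the outcome reward, so the model never learns to skip a search it actually needs just to save a few hundredths of a point.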