This reminds me of the Agent Workflow Memory (AWM) paper [1], which also tries to find optimal decision paths for LLM-based agents, but relies on in-context learning, whereas DeepRAG fine-tunes models to decide when to retrieve external knowledge.

I've been thinking about how AWM could be modified to use fine-tuning or an external knowledge system (RAG), capturing the 'good' workflows it discovers rather than relying purely on prompting.

[1] Wang et al. (2024), Agent Workflow Memory: https://arxiv.org/abs/2409.07429
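To make that concrete, here's a minimal sketch of what I mean by "capturing" workflows: store AWM-induced (task, workflow) pairs in an embedding index and retrieve the nearest ones at inference time, either to feed a RAG-style prompt or to build a fine-tuning set. The `WorkflowMemory` class and the embedding model are my own illustrative choices, not anything from either paper.

```python
# Hypothetical sketch: persist AWM-style induced workflows and retrieve
# them by task similarity, instead of keeping them purely in-context.
import numpy as np
from sentence_transformers import SentenceTransformer

class WorkflowMemory:
    """Stores (task description, workflow steps) pairs; retrieval is
    plain cosine similarity over task-description embeddings."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.tasks: list[str] = []
        self.workflows: list[list[str]] = []
        self.embeddings = None

    def add(self, task: str, steps: list[str]) -> None:
        self.tasks.append(task)
        self.workflows.append(steps)
        # Re-encode everything on each add: fine for a sketch, not for scale.
        vecs = self.encoder.encode(self.tasks, normalize_embeddings=True)
        self.embeddings = np.asarray(vecs)

    def retrieve(self, query: str, k: int = 3) -> list[list[str]]:
        q = self.encoder.encode([query], normalize_embeddings=True)[0]
        scores = self.embeddings @ q  # cosine similarity (vectors normalized)
        top = np.argsort(scores)[::-1][:k]
        return [self.workflows[i] for i in top]

memory = WorkflowMemory()
memory.add("book a flight on an airline site",
           ["search flights", "pick itinerary", "enter passenger info", "pay"])
print(memory.retrieve("reserve a plane ticket", k=1))
```

The same store could just as well be dumped as (task, workflow) training pairs for fine-tuning, which is the DeepRAG-flavored variant of the idea.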
Noice!

Does anyone have a good recommendation for a local dev setup that does something similar with available tools? I.e., one that incorporates a bunch of PDFs (~10,000 pages of datasheets) and other docs, plus a curl-style importer for web content?

I'm trying to wean myself off the big tech molochs, ideally with local functionality similar to OpenAI's Search + Reason, and I gave up on LangChain during my first attempt 6 months ago.
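For reference, this is the rough shape of pipeline I'm after, sketched with pypdf and sentence-transformers as stand-ins (both choices are just illustrative; any local parser, embedding model, and vector store would do):

```python
# Sketch of a local PDF ingestion + search loop: parse, chunk, embed, query.
from pathlib import Path
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size character chunking with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest_pdfs(folder: str):
    chunks, sources = [], []
    for pdf in Path(folder).glob("*.pdf"):
        text = "".join(page.extract_text() or "" for page in PdfReader(pdf).pages)
        for c in chunk(text):
            chunks.append(c)
            sources.append(pdf.name)
    embeddings = encoder.encode(chunks, normalize_embeddings=True)
    return np.asarray(embeddings), chunks, sources

def search(query: str, embeddings, chunks, sources, k: int = 5):
    q = encoder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(embeddings @ q)[::-1][:k]  # cosine similarity, top-k
    return [(sources[i], chunks[i]) for i in top]
```

In practice I'd want the in-memory numpy index swapped for something persistent (LanceDB, sqlite-vec, etc.) and an LLM step on top of `search`, but parse, chunk, embed, query is the core I keep rebuilding.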
The title reads awkwardly to a native English speaker. Also, searching the PDF for "latency" returns exactly one result, a passing mention that naive RAG can increase latency. What latency impact and other trade-offs come with the claimed "[improved] answer accuracy by 21.99%"? And is there any way I could replicate these results without writing my own implementation?