
Emerging architectures for LLM applications

255 points | by makaimc | almost 2 years ago

31 comments

ericjang · almost 2 years ago
I am an AI researcher. Most actual AI researchers and engineers use very few of these tools - the only ones being model providers like the OpenAI API and the public clouds (AWS, Azure, GCP). The rest of these are infra-centric tools whose importance a16z is highly incentivized to over-inflate.
bluecoconut · almost 2 years ago
> So, agents have the potential to become a central piece of the LLM app architecture (or even take over the whole stack, if you believe in recursive self-improvement). ... There's only one problem: agents don't really work yet.

I really appreciate that they called out and separated some hype vs. practice, specifically with regard to agents. This is something I keep hoping works better than it does, and in practice every attempt I've made in this direction has led to disappointment.
killdozer · almost 2 years ago
This blog post is way more complex than it needs to be. A lot of what most people are doing with LLMs right now boils down to using vector databases to provide the "best" info/examples for your prompt. This is a slick marketing page, but I'm not sure what they think they're providing beyond that.
fzliu · almost 2 years ago
Bit of self-promotion, but Milvus (https://milvus.io) is another open-source vector database option as well (I have a pretty good idea as to why it isn't listed in a16z's blog post). We also have milvus-lite, a pip-installable package that uses the same API, for folks who don't want to stand up a local service.

    pip install milvus

Other than that, it's great to see the shout-out for Vespa.
swyx · almost 2 years ago
Interesting to see that the word "generative" does not appear in this blog post (apart from the tags). Six months ago Generative AI was all the rage: https://a16z.com/2023/01/19/who-owns-the-generative-ai-platform/

I think this is a very well-articulated breakdown of the "LLM Core, Code Shell" (https://www.latent.space/p/function-agents#%C2%A7llm-core-code-shell) view of the world, but it undersells the potential by leaving the agents stuff to a three-paragraph "what about agents?" piece at the end. The emerging architecture of "Code Core, LLM Shell", decentralizing and specializing the role of the LLM, will hopefully get more airtime in the December a16z landscape chart!
stan_kirdey · almost 2 years ago
This and other end-to-end architectures are offered in deepset/haystack, one of the best and most mature frameworks for working with LLMs (pre-GPT craze), doing retrieval augmentation, etc.

I do feel the article presents old concepts as "emerging".

If you are curious about building something quickly, you can jump into one of the tutorials: https://haystack.deepset.ai/tutorials

Over a weekend I used deepset/haystack to build a Q/A engine over open-source communities' Slack and Discord threads that can potentially have an answer - it was a joy and a breeze to implement. If you have a question about Metaflow, K8s, Golang, Deepset, Deep Java Library, or some other tech, try asking your quick question on https://www.kwq.ai :-)
adamgordonbell · almost 2 years ago
Microsoft Guidance is legit and useful. It's a bunch of prompting features piled on top of Handlebars syntax. (And it has its own caching: set temp to 0 and it caches. No need for LLM-specific caching libs. :) )

https://github.com/microsoft/guidance
mikehollinger · almost 2 years ago
How prescient is the "Hidden Technical Debt" [1] paper from ~8 years ago compared to this? See the top of page 4 for a figure that I've personally found useful in explaining all the stuff necessary to put together a reasonable app using ML/DL (up until today, anyway).

I see all the same bits called out:

- Data collection
- Machine/resource management
- Serving
- Monitoring
- Analysis
- Process management
- Data verification

There are some new concepts that aren't quite captured in the original paper, like the "playground", though.

I've kind of been expecting a follow-up that shows an update to that original paper.

[1] https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
serjester · almost 2 years ago
> "For devs who see every database-shaped hole and try to insert Postgres"

I see sub-second performance on >1M vectors with pgvector. Vector databases have a place, but this statement seems disingenuous at best. Bringing on a vector database adds complexity, and a giant chunk of use cases simply don't need it. Not to mention the additional latency you'd be adding.
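The core operation at issue, nearest neighbors by cosine distance, can be sketched in plain Python (toy data, not pgvector's implementation); pgvector exposes the same semantics as `ORDER BY embedding <=> query LIMIT k` in SQL, and for modest collections even exact brute-force search is fast.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def top_k(query, vectors, k=3):
    """Exact nearest-neighbor search: score every vector, keep the
    k closest. Same semantics as a pgvector ORDER BY ... LIMIT k."""
    scored = sorted(vectors.items(), key=lambda kv: cosine_distance(query, kv[1]))
    return [doc_id for doc_id, _ in scored[:k]]

docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # -> ['doc_a', 'doc_b']
```

A dedicated vector database earns its keep when approximate indexes (HNSW, IVF) are needed at much larger scale; below that, this loop inside Postgres is often enough.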
Imnimo · almost 2 years ago
> In-context learning solves this problem with a clever trick: instead of sending all the documents with each LLM prompt, it sends only a handful of the most relevant documents.

I don't even think this is a correct definition of "in-context learning". In-context learning is a type of few-shot learning in which examples of input/output pairs are provided as part of the prompt. The idea is that the model is able to "learn" the pattern of the task from the examples. Quoting from the GPT-3 paper:

> what we call "in-context learning", using the text input of a pretrained language model as a form of task specification: the model is conditioned on a natural language instruction and/or a few demonstrations of the task and is then expected to complete further instances of the task simply by predicting what comes next.

I really don't think it's standard to refer to the process of embedding-based retrieval as "in-context learning".
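The distinction drawn above can be made concrete with two toy prompt builders (helper names are hypothetical): in-context learning packs input/output demonstrations into the prompt so the model infers the task from the pattern, while what the article describes is retrieval augmentation, which packs retrieved documents in.

```python
def few_shot_prompt(examples, query):
    """In-context / few-shot learning: demonstrations of the task
    (input/output pairs) go into the prompt itself."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

def retrieval_prompt(documents, question):
    """Retrieval augmentation: relevant *documents*, not task
    demonstrations, go into the prompt -- a different technique."""
    context = "\n".join(f"- {d}" for d in documents)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(few_shot_prompt([("cat", "chat"), ("dog", "chien")], "horse"))
print(retrieval_prompt(["Milvus is a vector DB."], "What is Milvus?"))
```

In the first prompt the model must generalize from the translation pairs; in the second it only has to read the supplied snippet.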
sourcelabs · almost 2 years ago
Is the emerging architecture made out to be more complicated than what most companies are currently building? Perhaps! But this is most likely the general direction things will trend toward as the auxiliary ecosystem matures.

Shameless plug: for fellow Rubyists, we're building an orchestration layer for building LLM applications, inspired by the original, Langchain.rb: https://github.com/andreibondarev/langchainrb
nico · almost 2 years ago
Reading the comments, it seems like we need better human-agent interaction tools.

Many are frustrated about not being able to better direct the agents.

It's like the agents have certain pre-learned things they can do, but they aren't really learning how to apply those things to the environments their human operators want them to work in.

Or at least it is not easy/straightforward to teach the model new tricks.
ggleason · almost 2 years ago
I was looking at various forms of indexing solutions to solve search and clustering problems with TerminusDB for clients. When I compared those solutions against embeddings from LLMs, the LLMs were just far easier to work with and got much better results. I believe traditional text indexing will die quickly, as will a lot of entity resolution and traditional clustering methods, replaced completely by LLMs. We found them so compelling we wrote our own open-source vector database sidecar: https://github.com/terminusdb-labs/terminusdb-semantic-indexer
ironfootnz · almost 2 years ago
It's a good article for people with no strong background in NLP or LLMs. It gives a comprehensive overview of how this could be applicable for startups.

For enterprise, not quite so: there is a lot of other stuff to consider, like ethical application, filtering, and security, points that are very important for enterprise customers.

Also, in-context learning is just one way to apply LLMs; there are more approaches, like few-shot learning and fine-tuning, depending on the cost and application involved, as I've highlighted here:

https://twitter.com/igorcosta/status/1671316499179667456
snowcrash123 · almost 2 years ago
I agree with the excerpt on agents. Reliability and reproducibility of task completion is the biggest problem for agents to cross the chasm to real-life use cases. When agents are given an objective, they reason about the next best action from first principles, or from scratch, and the agent trajectory ends up becoming more of a linguistic dance. But we are solving some of these agent-specific problems at SuperAGI (https://github.com/TransformerOptimus/SuperAGI; disclaimer: I'm the creator) by doing agent-trajectory fine-tuning using recursive instructions. Think of the objective as telling the agent to go from A to B, and the instructions as giving it directions about the route. These instructions can be self-created after every run and fed into subsequent runs to improve the trajectory.

Another problem with agents: most independent agents are capable of only a very thin slice of a use case, but for complex knowledge-work tasks, more often than not, one agent is not enough. You need a team of agents. We introduced the concept of agent clusters, which operate in a master-slave architecture, coordinating among themselves to complete nuanced tasks via shared memory and a shared task list.

Another big bottleneck, I think, is the lack of a notion of knowledge for agents. We have LTM and STM, but knowledge is a specialized understanding of a particular class of objectives (e-commerce customer support, account-based marketing, medical diagnostics for a particular condition, etc.) plugged into the agent. Currently agents rely on the knowledge available in the LLMs. LLMs are great for intelligence, but not necessarily the knowledge required for an objective. So we added the concept of knowledge: an embedding plugged into the agent apart from LTM/STM.

There are lots of other challenges to solve (agent performance monitoring, agent-specific models, agent-to-agent communication, etc.) to truly make agents viable in production. I'm not sure about the article's point that they might take over the entire stack, because autonomous agentic behavior is good for certain use cases, not for all kinds of apps.
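The agent-cluster idea described above can be sketched minimally (all class and method names are hypothetical, not SuperAGI's actual API): a master agent decomposes an objective onto a shared task list, and workers pull tasks from it and write results into shared memory.

```python
from collections import deque

class AgentCluster:
    """Toy master/worker cluster coordinating via a shared
    task list and shared memory (a design sketch, nothing more)."""

    def __init__(self):
        self.task_list = deque()   # shared task list
        self.memory = {}           # shared memory

    def master_plan(self, objective, subtasks):
        # The master decomposes the objective into subtasks.
        for t in subtasks:
            self.task_list.append((objective, t))

    def worker_step(self, worker_fn):
        # A worker pulls one task and records its result; returns
        # False when the task list is drained.
        if not self.task_list:
            return False
        objective, task = self.task_list.popleft()
        self.memory[task] = worker_fn(task)
        return True

cluster = AgentCluster()
cluster.master_plan("answer support ticket",
                    ["classify intent", "draft reply"])
while cluster.worker_step(lambda task: f"done: {task}"):
    pass
print(cluster.memory)  # results keyed by subtask
```

Real systems would add retries, agent specialization, and the knowledge embeddings the comment describes; the skeleton is just the coordination pattern.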
arguflow · almost 2 years ago
Should have Qdrant in the list of vector DBs.
RealQwertie · almost 2 years ago
I think sidecar vector databases that work with existing DBs will emerge as more prevalent than pure vector DBs. I also think the vector & graph combo on highly interconnected data will have additional benefits for those building a wide range of LLM applications. A good example is the VectorLink architecture with TerminusDB [1], which is based on Hierarchical Navigable Small World graphs written in Rust.

[1] https://github.com/terminusdb-labs/terminusdb-semantic-indexer
neilv · almost 2 years ago
With everyone writing about LLMs, and no time to read even 1% of it all, is the reason to read a16z its technical/analytical merit, or an investment-pumping angle?
Xen9 · almost 2 years ago
In the future we will see a ton of similar charts borrowing elements from graph and signal theory. There's no limit on the number of different LLM multi-agents.
pryelluw · almost 2 years ago
I covered the subject during a Python Atlanta talk last month. There isn't much that's new at the moment, mostly because an LLM can be considered a software agent. That may change soon as things become more complex, though. Things like AWS's Kendra show there are some new patterns in the pipeline.

I'll say this post is rather shallow to be considered technical, or even to fit the title.
phillipcarter · almost 2 years ago
Ugh. Of course Enterprise Architecture rears its ugly head here.

Just here to say that you can quickly build a robust feature with only OpenAI's APIs, Redis, a text file for the prompt you parameterize (versioned), and a little bit of glue code (no LangChain). You can add instrumentation for observability around that like you would for any other code.

I would wager that most enterprise use cases don't need most of the tools listed in this article, and using them is complete overkill.
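The minimal stack described above can be sketched in a few lines (the file name and template variables are hypothetical, and the actual API call is left out): a versioned prompt template lives in a plain text file and gets parameterized with `str.format`, no framework required.

```python
from pathlib import Path

# A versioned prompt template kept in a plain text file
# (file name and variables are made up for illustration).
path = Path("summarize_ticket.v1.txt")
path.write_text(
    "Summarize the following ticket for {audience}:\n{ticket_text}"
)

def build_prompt(template_path: Path, **params: str) -> str:
    """Load the versioned template and fill it in; the glue code
    around this would call the model API and cache the result."""
    return template_path.read_text().format(**params)

prompt = build_prompt(path, audience="support engineers",
                      ticket_text="App crashes on login.")
print(prompt)
```

Versioning the file name (`.v1`, `.v2`, ...) gives you prompt history for free through ordinary source control.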
m3kw9 · almost 2 years ago
Still relatively simple. With the stack being the LLM, hopefully most of the actual "stack" work will be transferred inside the LLM. For example, if context size becomes unlimited, you could do away with vector DBs.
akisej · almost 2 years ago
Great starting point! These diagrams notably miss an LLM firewall layer, which is critical in practice for safe LLM adoption. Source: we work with thousands of users at logicloop.com/ai
emmender · almost 2 years ago
Are VCs doing architecture now? Huh... architecture astronauts much.

Looks like they are playing catch-up and trying to stay relevant.

What happened to their web3 vision?
zitterbewegung · almost 2 years ago
This feels like exactly what we have done with full-stack engineering: recommending that everyone in the space needs all of this…
applgo443 · almost 2 years ago
They mention the contextual stack is relatively underdeveloped. Any ideas on what could be improved there?
DethNinja · almost 2 years ago
Does a16z invest in small-scale AI companies? Or are they only doing Series B+ investments?
lmeyerov · almost 2 years ago
Something not obvious to me with these VC diagrams is the memory tier being just vector DBs vs. also including knowledge graphs.

Good: We're (of course) doing a lot of these architectures behind the scenes for louie.ai and client projects around it. Vector embeddings are an easy way to do direct recall for data that's bigger than context. As long as the user has a simple question that just needs recalling a text snippet that fairly directly overlaps with the question, vector embeddings are magical. Conversational memory for sharing DB queries across teammates, simple discussion of decades of PDF archives and internal wikis... amazing.

Not so good: What happens when the text data to answer your question isn't a direct semantic-search match away? "Why does Team X have so many outages?" => "What projects is Team X on?" + "Outages for those projects" + "Analysis for each outage". AFAICT, this gets into:

A. Failure: Stick with query -> vector DB -> LLM summary and get the wrong answer over the wrong data.

B. AutoGPT: Get into an AutoGPT-style LangChain loop that iteratively queries the vector DB, iteratively reasons over results, and iteratively plans until it finds what it wants. But AutoGPT seems to be more excitement than production use. Many open questions on speed, cost, and quality...

C. Knowledge graphs: Use the LLM to generate a higher-quality knowledge graph of the data that is more receptive to LLM querying. The above question then becomes a simpler multi-hop query over the KG, so it's both fast and cost-effective... if you've indexed correctly and taught your LLM to generate the right queries.

(Related: If you're into this kind of topic, we're hiring to build out these systems and help use them for our customers in investigative areas like cyber, misinfo, and emergency response. See new openings at https://www.graphistry.com/careers !)
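The multi-hop case in option C can be sketched over a toy knowledge graph (all entity and relation names are made up): "Why does Team X have so many outages?" becomes two hops, team -> projects -> outages, rather than one semantic-search lookup.

```python
# Toy knowledge graph as adjacency lists keyed by (subject, relation).
kg = {
    ("team_x", "works_on"): ["proj_api", "proj_billing"],
    ("proj_api", "had_outage"): ["out_17", "out_23"],
    ("proj_billing", "had_outage"): ["out_31"],
}

def hop(subjects, relation):
    """Follow one relation from a set of subjects to their objects."""
    results = []
    for s in subjects:
        results.extend(kg.get((s, relation), []))
    return results

# Two hops answer the question no single embedding lookup could:
projects = hop(["team_x"], "works_on")
outages = hop(projects, "had_outage")
print(outages)  # -> ['out_17', 'out_23', 'out_31']
```

A vector search for the original question would only surface text that happens to mention Team X and outages together; the traversal composes facts that never co-occur in any one snippet.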
ukuina · almost 2 years ago
Anyone building vector DBs in-browser, possibly using WASM?
LeicaLatte · almost 2 years ago
Any companies making vector databases for iOS or Android?
fnordpiglet · almost 2 years ago
AKA one box and edge for every funded a16z startup.