It's crazy how fast this field moves! I basically live on HN, and I've never even heard of 30-40% of these terms and metrics (or maybe I just glossed over them in the past).<p>I love articles like these and how they bring me up to speed (at least to some degree) on the "new paradigm" that is AI/LLM.<p>As a coder I can't say what the future will look like (binary views), but I can easily believe that in the future we will have MORE AI/LLM, not LESS, so getting up to speed (at least on the acronyms and core theory and concepts) is well worthwhile.<p>Very good article!
> 65 min read<p>That's when you know it's going to be amazing. This is the single best narrative-form overview I've read so far of the current state of integrating LLMs into applications and the challenges encountered. It's fantastic and must have required an incredible amount of work. Massive kudos to the author.
I'm starting to see a lot of products in "beta" that seem to be little more than a very thin wrapper around ChatGPT. So thin that it is trivial to get it to give general responses.<p>I recently trialed an AI Therapy Assistant service. If I stayed on topic, then it stayed on topic. If I asked it to generate poems or code samples, it happily did that too.<p>It felt like they rushed it out without even considering that someone might ask it non-therapy related questions.
Evals are not suitable for evaluating LLM applications such as RAG, because you have to evaluate on your own data, where no golden test data exists, and the techniques used have poor correlation with human judgement.
We built the RAGAS framework for this: <a href="https://github.com/explodinggradients/ragas">https://github.com/explodinggradients/ragas</a>
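To make "evaluating without golden test data" concrete: reference-free metrics score the generated answer against the retrieved context rather than against a human-written label. RAGAS-style faithfulness does this with an LLM judge; the toy sketch below (my own illustration, not the RAGAS API) substitutes plain token overlap so it stays runnable.

```python
# Toy reference-free "faithfulness" check: score how well each answer
# sentence is supported by the retrieved context, with no golden answer
# needed. Real frameworks use an LLM judge instead of token overlap.

def faithfulness(answer, context, threshold=0.5):
    """Fraction of answer sentences whose words mostly appear in context."""
    ctx_words = set(context.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    supported = sum(
        1 for s in sentences
        if len(set(s.lower().split()) & ctx_words) / max(len(s.split()), 1) >= threshold
    )
    return supported / max(len(sentences), 1)

context = "Refunds are available within 30 days of purchase."
good = "Refunds are available within 30 days."
bad = "Refunds are available for a full year."
print(faithfulness(good, context), faithfulness(bad, context))
```

The important design point is that the metric only needs the (question, retrieved context, answer) triple your app already produces, so it can run over live traffic.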
For those who don't have 65 minutes: if you write software you are probably familiar with the concepts of evals, caching, guardrails, defensive UX, and collecting user feedback, none of which are really unique to LLMs. The other two items are "fine-tuning", which just means nudging the LLM to be better at responding a certain way, and "RAG", a new acronym that just means using the input to look things up in a database first and concatenating the results into the prompt so the LLM uses them as part of the context for token generation.
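The RAG summary above fits in a few lines of code. Here's a minimal sketch; the toy keyword matcher stands in for a real search index or vector store, and the assembled prompt is what you'd hand to your chat-completion API of choice.

```python
# Minimal RAG sketch: retrieve relevant documents, then stuff them
# into the prompt as context for the LLM.

DOCS = [
    "The refund window is 30 days from purchase.",
    "Support is available Monday through Friday.",
    "Premium plans include priority support.",
]

def retrieve(query, docs, k=2):
    """Rank docs by naive keyword overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, docs):
    """Concatenate retrieved context into the prompt, per the RAG pattern."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

print(build_prompt("What is the refund window?", DOCS))
```

Swap the toy retriever for BM25 or embedding search and that's the whole pattern: the model never needs fine-tuning to answer over your data, it just reads it from the prompt.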
Good notes on design patterns for an LLM-based product. The big question is whether we'll see frameworks evolve to tackle the hard parts here.<p>Evals, RAG, and guardrails often require recursive calls to LLMs, or to other fine-tuned systems that are themselves based on LLMs.<p>I'd like to see LLMs condensed and bundled into smaller single-task models - much more beneficial than doing system design around general-purpose LLMs in applications.<p>Essentially, we're applying traditional system design patterns to using LLMs in apps.
This is fantastic! I found myself nodding along in many places. I've definitely found in practice that evals are critical to shipping LLM-based apps with confidence. I'm actually working on an open-source tool in this space: <a href="https://github.com/openpipe/openpipe">https://github.com/openpipe/openpipe</a>. Would love any feedback on ways to make it more useful. :)
"hybrid retrieval (traditional search index + embedding-based search) works better than either alone."<p>- any references for how this hybrid retrieval is done?
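One common answer to this question: run both retrievers and merge their ranked lists with Reciprocal Rank Fusion (RRF), a standard score-fusion technique (used, e.g., by several search engines for exactly this keyword + vector combination). Since BM25 scores and cosine similarities aren't directly comparable, RRF uses only each document's rank in each list. A small sketch with made-up doc ids:

```python
# Hybrid retrieval via Reciprocal Rank Fusion: each document's fused
# score is the sum of 1/(k + rank + 1) over every ranked list it
# appears in, so documents ranked well by BOTH systems rise to the top.

def rrf_merge(ranked_lists, k=60):
    """Merge ranked lists of doc ids; returns ids sorted by fused score."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs from a BM25 index and a vector store:
keyword_hits = ["d3", "d1", "d7"]
embedding_hits = ["d1", "d5", "d3"]
print(rrf_merge([keyword_hits, embedding_hits]))  # d1 wins: top-2 in both lists
```

The constant k (60 is the conventional default) damps the influence of top ranks so a single first-place finish doesn't dominate documents that rank decently everywhere.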
I've been working on getting LLM-based features out in a production environment for the past few months. This article is absolute gold. Does a great job of capturing several learnings that I think a lot of us are dealing with in silos.
Most of these products are just trivial wrappers around the behemoths, wrappers whose creators either can't recognize or don't even use half the patterns rattled off here.<p>I'd be more interested in the sales and marketing patterns being employed to hawk the same rebranded wrappers over and over. Ultimately, that's what's really going to contribute most to the success of all these startups.
I'm sorry, but from a _practical_ standpoint it feels like mostly fluff. Someone was advertising today on an HN hiring post that they would create a basic chatbot for a specific set of documents for $15,000. This feels like the type of web page that person would use to confuse a client into thinking that was a fair price.<p>Practically speaking, the starting point should be things like the APIs (such as OpenAI's) or open-source frameworks and software - for example, llama_index
<a href="https://github.com/jerryjliu/llama_index">https://github.com/jerryjliu/llama_index</a>. You can use something like that or another GitHub repo built with it to create a customized chatbot application in a few minutes or a few days. (It should not take two weeks and $15,000).<p>It would be good to see something detailed that demonstrates an actual use case for fine tuning. Also, I don't believe that the academic tests are appropriate in that case. If you really were dead set on avoiding a leading edge closed LLM, and doing actual fine-tuning, you would want a person to look at the outputs and judge them in their specific context such as handling customer support requests for that system.
Oh God, the marketing of barely-understood tech-crafting recipes into new corporate jargon has turned into new acronyms to obfuscate the jargon, and is now accelerating even faster than AI.<p>Did you miss the NFT train? Have you ever asked yourself if this is what you should be doing with your life?<p>Just speaking as a guy who actually writes logic and code, rather than like, coming up with incantations and selling horseshit.