This is super helpful. I'm building a document question-answering service over a custom data corpus (related to Saivism, a sect of Hinduism). The first pass has been to manually chunk the text (based on headings, chapters, etc.), embed the chunks with OpenAI's embedding service, and store the embeddings in Pinecone, all stitched together using LangChain. To answer a question, the question is embedded the same way, searched against the vector store, and the matching documents are provided as context to the LLM along with the question.<p>So far it was really easy to set up the prototype, but the results weren't as great as I had hoped, so I'm excited to see how I could improve it.<p>Edit: wow, I didn't see this before. LangChain implements one of the featured article's suggestions (HyDE) - <a href="https://python.langchain.com/en/latest/modules/chains/index_examples/hyde.html" rel="nofollow">https://python.langchain.com/en/latest/modules/chains/index_...</a>
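For anyone curious, the core of the setup is only a few lines with the current (2023-era) LangChain API. A rough sketch — the index name, chunk texts, API keys, and sample question are all placeholders for my actual corpus:

  import pinecone
  from langchain.embeddings.openai import OpenAIEmbeddings
  from langchain.vectorstores import Pinecone
  from langchain.chat_models import ChatOpenAI
  from langchain.chains import RetrievalQA

  pinecone.init(api_key="YOUR_KEY", environment="YOUR_ENV")

  # Stand-ins for the manually split sections (headings, chapters, etc.)
  chunks = ["...chapter 1 text...", "...chapter 2 text..."]

  # Embed each chunk and upsert into a Pinecone index (placeholder name)
  vectorstore = Pinecone.from_texts(chunks, OpenAIEmbeddings(), index_name="saivism-corpus")

  # At query time: embed the question, fetch the nearest chunks,
  # and stuff them into the prompt as context
  qa = RetrievalQA.from_chain_type(
      llm=ChatOpenAI(temperature=0),
      chain_type="stuff",
      retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
  )
  print(qa.run("What is the significance of Nandi in Saiva worship?"))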
This is one of the areas of LLMs that I find most interesting. So far, I've found simple question-answering over vectorstores to be a lackluster experience. In particular, the more information you embed and stick into the vectorstore, the less useful the system becomes, as you are less likely to get the information you're looking for (especially if users don't understand that their queries need to look like the docs they want to ask about).<p>I haven't had a chance to try out hypothetical document embeddings yet, but I expect they only provide a marginal improvement (especially if QAing over proprietary data or information).<p>I'd love to see any other interesting, more up-to-date resources anyone has found on this topic. I found this recent paper interesting: <a href="https://arxiv.org/abs/2304.11062" rel="nofollow">https://arxiv.org/abs/2304.11062</a>
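I haven't run it myself, but per the LangChain docs linked in the parent, the HyDE setup is only a few lines. The gist: the LLM first writes a hypothetical answer document, and that answer (not the raw question) is what gets embedded, so the query vector looks more like the docs in the store:

  from langchain.llms import OpenAI
  from langchain.embeddings import OpenAIEmbeddings
  from langchain.chains import HypotheticalDocumentEmbedder

  base_embeddings = OpenAIEmbeddings()
  llm = OpenAI()

  # Wraps the base embedder: generate a hypothetical answer doc with the
  # "web_search" prompt, then embed that doc instead of the question
  embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")
  query_vector = embeddings.embed_query("Where is the Taj Mahal?")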
This document seems to have been written before the Toolformer paper[0], which fine-tunes the model to use tools (e.g. search) to retrieve information.<p>[0]: <a href="https://arxiv.org/abs/2302.04761" rel="nofollow">https://arxiv.org/abs/2302.04761</a>
A few other helpful options recently added to LangChain:<p>1. Extraction for query filters - <a href="https://twitter.com/hwchase17/status/1651617956881924096?s=46&t=gkyxL9FAhSE-DiMAkwTkcg" rel="nofollow">https://twitter.com/hwchase17/status/1651617956881924096?s=4...</a><p>2. Contextual compression to eke more out of prompt stuffing (rough sketch after this list) - <a href="https://twitter.com/hwchase17/status/1649428295467905025?s=46&t=gkyxL9FAhSE-DiMAkwTkcg" rel="nofollow">https://twitter.com/hwchase17/status/1649428295467905025?s=4...</a><p>And then there are existing utility chains for map-reduce, re-ranking, etc., for more ways to apply LLM completions over large documents and/or large sets of documents:
3. <a href="https://m.youtube.com/watch?v=f9_BWhCI4Zo">https://m.youtube.com/watch?v=f9_BWhCI4Zo</a>
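For (2), contextual compression amounts to wrapping whatever retriever you already have: an LLM extracts only the query-relevant parts of each retrieved doc before they get stuffed into the prompt. A minimal sketch, assuming `base_retriever` is your existing vectorstore retriever:

  from langchain.llms import OpenAI
  from langchain.retrievers import ContextualCompressionRetriever
  from langchain.retrievers.document_compressors import LLMChainExtractor

  # The compressor LLM pulls out only the passages relevant to the query
  compressor = LLMChainExtractor.from_llm(OpenAI(temperature=0))

  compression_retriever = ContextualCompressionRetriever(
      base_compressor=compressor,
      base_retriever=base_retriever,  # assumed: whatever retriever you already use
  )
  docs = compression_retriever.get_relevant_documents("your question here")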
Sentence embeddings have been great for improving semantic search, but I am still struggling to find relevant documents for numerical values, for questions like "which people were born in 1992" or "people with at least 4 children". One thing I can do is pre-process the data by transforming the date of birth into boomers/zoomers/millennials and the like, but this does not help on the question side if people don't know what to ask.
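One thing that might help on the question side (I haven't tried it on this problem yet): keep the raw numbers as structured metadata and let an LLM translate the question into a filter, e.g. LangChain's self-query retriever, which is what the query-filter extraction mentioned upthread does. It needs the `lark` package and a vectorstore with self-query support (Pinecone is one). The field names below are made up for illustration:

  from langchain.llms import OpenAI
  from langchain.retrievers.self_query.base import SelfQueryRetriever
  from langchain.chains.query_constructor.base import AttributeInfo

  # Hypothetical numeric metadata attached to each document at indexing time
  metadata_field_info = [
      AttributeInfo(name="birth_year", description="Year the person was born", type="integer"),
      AttributeInfo(name="num_children", description="Number of children the person has", type="integer"),
  ]

  # `vectorstore` is assumed to be an existing store with metadata-filter support
  retriever = SelfQueryRetriever.from_llm(
      OpenAI(temperature=0),
      vectorstore,
      "Short biographies of people",
      metadata_field_info,
  )
  # "born in 1992" becomes an exact filter on birth_year instead of a fuzzy semantic match
  docs = retriever.get_relevant_documents("people born in 1992")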