Hi HN, I am Jan, CTO and co-founder of Pathway.com.<p>We’ve built an LLM microservice that answers questions about a corpus of documents and automatically reacts when new docs are added. The single, self-contained service fully replaces a complex multi-system pipeline that scans in real time for new documents, indexes them into a specialized database, and queries it to generate answers. Everyone can have their own real-time vector index now.<p>GitHub: <a href="https://github.com/pathwaycom/llm-app">https://github.com/pathwaycom/llm-app</a>
Demo video: <a href="https://youtu.be/kcrJSk00duw" rel="nofollow noreferrer">https://youtu.be/kcrJSk00duw</a><p>I am eager to hear your thoughts and comments!
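To make the "reacts to new docs" part concrete, here is a minimal toy sketch of the idea in plain Python. It is not the Pathway API and not how llm-app is implemented: `ToyLiveIndex`, `embed`, and `cosine` are hypothetical names, and the bag-of-words "embedding" is only a stand-in for a real model. The first sample text is made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words term-count vector (not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyLiveIndex:
    """Hypothetical in-memory index: a document is queryable as soon as it is added."""

    def __init__(self):
        self._docs = []  # list of (text, vector) pairs

    def add(self, text):
        self._docs.append((text, embed(text)))

    def nearest(self, question, k=1):
        q = embed(question)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


question = "Where do I find the LLM functions?"

index = ToyLiveIndex()
index.add("Release notes are published every month.")  # made-up sample text
before = index.nearest(question)

# A new document arrives; no separate rebuild step runs before the next query.
index.add("Call the LLM functions from pathway.stdlib.ml.nlp.")
after = index.nearest(question)
```

In this sketch `before` can only return the unrelated document, while `after` picks up the newly added one on the very next query.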
To quickly get to the application sources, please go to:<p>- <a href="https://github.com/pathwaycom/llm-app/blob/main/llm_app/pathway_pipelines/contextless/app.py">https://github.com/pathwaycom/llm-app/blob/main/llm_app/path...</a> for the simplest contextless app<p>- <a href="https://github.com/pathwaycom/llm-app/blob/main/llm_app/pathway_pipelines/contextful/app.py">https://github.com/pathwaycom/llm-app/blob/main/llm_app/path...</a> for the default app that builds a reactive index of context documents<p>- <a href="https://github.com/pathwaycom/llm-app/blob/main/llm_app/pathway_pipelines/contextful_s3/app.py">https://github.com/pathwaycom/llm-app/blob/main/llm_app/path...</a> for the contextful app reading data from S3<p>- <a href="https://github.com/pathwaycom/llm-app/blob/main/llm_app/pathway_pipelines/local/app.py">https://github.com/pathwaycom/llm-app/blob/main/llm_app/path...</a> for the app using locally available models
I see the ingested documents in the data folder don't have an id field, only a doc field.<p>{"doc": "Using Large Language Models in Pathway is simple: just call the functions from `pathway.stdlib.ml.nlp`!"}<p>What if I pass two contradictory statements? Is there a way to remove (or better, update) a document with a new version?<p>For example, if I am ingesting some public docs and I update a doc page, how do I make sure that it only takes the answer from the latest document version?
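A toy sketch of the update semantics being asked about, in plain Python rather than Pathway. It assumes a hypothetical `id` field next to `doc` (the sample files only have `doc`), and the record contents are made up; whether and how llm-app supports this is exactly the open question.

```python
import json

# Hypothetical input: each record carries an "id" alongside "doc".
# The "id" field and these sample records are assumptions for illustration.
updates = [
    '{"id": "faq/llm", "doc": "LLM helpers live in module A."}',
    '{"id": "faq/install", "doc": "Install with pip."}',
    '{"id": "faq/llm", "doc": "LLM helpers moved to module B."}',
]


def apply_updates(lines):
    """Keep only the latest version of each document, keyed by id."""
    latest = {}
    for line in lines:
        record = json.loads(line)
        latest[record["id"]] = record["doc"]  # a later version overwrites the earlier one
    return latest


current = apply_updates(updates)
```

With a stable key, the outdated "module A" statement is no longer in `current`, so it could not be retrieved as context for an answer.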
Hi, interesting!<p>> Then it processes and organizes these documents by building a 'vector index' using the Pathway package.<p>What is the Pathway package?