I have this idea to use an LLM daily: train it on my emails, notes, and chats, have it draft replies that I edit as needed, and have it learn from those edits.<p>Is anyone doing anything like that? I have all of the open source stuff downloaded (models, lollms-webui, promptfoo, etc.) and have been experimenting with the interactive chat tools, plus txtai for semantic search.<p>That all seems pretty mature and progressing nicely; I expect a clear reference stack to emerge within a few months.<p>What about the assistant stack? I'm investing all these resources to self-host and feed in all my data, and I want to maximize the ROI.
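For the txtai piece, here's a minimal sketch of indexing a folder of plaintext notes for semantic search (the folder path and embedding model are illustrative choices, not prescriptions):

```python
# Minimal semantic search over a folder of plaintext notes with txtai.
from pathlib import Path
from txtai.embeddings import Embeddings

# (id, text, tags) tuples; paths here are an assumed layout
notes = [(str(p), p.read_text(), None) for p in Path("notes").glob("**/*.md")]

# content=True stores the text so search results include it
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2",
                         "content": True})
embeddings.index(notes)

# With content=True, search returns [{"id": ..., "text": ..., "score": ...}]
for hit in embeddings.search("what did I decide about self-hosting?", 3):
    print(hit["id"], round(hit["score"], 3))
```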
The most limiting factor I’ve come across is hitting the context window. Eventually your eager new employee starts to forget what you’ve taught them, but they’re too confident to admit it.
I'm pretty interested in this as well. I have moved from Notion to Obsidian for my personal notes, to-do lists, and miscellanea in preparation for this, since Obsidian uses local plaintext files.<p>What I would <i>love</i> to get working at some point is giving an LLM access to my schedule, notes, and goals and then having it prompt me at appropriate times. "Hey TJ, I noticed you haven't worked out this week; it's sunny today, so this might be a good time." That sort of thing.<p>There seems to be good tooling around agents, prompt engineering, RAG, etc. The glue for getting the LLM to figure out the appropriate time(s) to check in with me is the bit I'm missing, but that's probably mostly down to me being an artist and only a very junior hobbyist programmer.
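A rough sketch of that missing glue, assuming an Obsidian vault of markdown files and some model behind a hypothetical ask_llm() callable, run on a schedule via cron or launchd:

```python
# Hypothetical "check-in" glue: on a schedule, gather recent context from an
# Obsidian vault and ask a model whether a nudge is warranted right now.
# VAULT and ask_llm() are stand-ins, not a real integration.
import datetime
from pathlib import Path

VAULT = Path.home() / "Obsidian" / "vault"  # assumed vault location

def gather_context(days: int = 7) -> str:
    cutoff = datetime.datetime.now() - datetime.timedelta(days=days)
    recent = [p for p in VAULT.rglob("*.md")
              if datetime.datetime.fromtimestamp(p.stat().st_mtime) > cutoff]
    return "\n\n".join(p.read_text()[:1000] for p in recent)

def maybe_nudge(ask_llm) -> str | None:
    prompt = (
        "Here are my notes from the last week:\n"
        f"{gather_context()}\n\n"
        "Based on these, is there one gentle, timely reminder worth sending "
        "right now? Reply NONE if not, otherwise a one-sentence nudge."
    )
    reply = ask_llm(prompt)
    return None if reply.strip() == "NONE" else reply
```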
Training a local LLM on individual facts is tricky. Typically you can’t train on a limited quantity of data and expect the model to generalize well from it. In-context learning generalizes well, but it’s a bad fit for an “employee” model that’s supposed to learn over a long stretch of time.<p>If your goal is to bake new concepts into the model weights, your only real option is a dataset with that concept used in a wide variety of contexts.<p>A more feasible approach, I think, would be retrieval-augmented generation. You’d essentially store your conversations in a database and calculate embeddings as you go. This would let you later do a natural-language search of the database and insert the most relevant portion of the conversation into your context window.
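A minimal sketch of that retrieval idea, with an in-memory list standing in for the database and sentence-transformers for the embeddings (both illustrative choices):

```python
# Embed each conversation turn as it happens, then pull the closest turns
# back into the prompt later via cosine similarity on normalized vectors.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
store: list[tuple[str, np.ndarray]] = []  # (turn text, embedding) "database"

def remember(turn: str) -> None:
    store.append((turn, model.encode(turn, normalize_embeddings=True)))

def recall(query: str, k: int = 3) -> list[str]:
    q = model.encode(query, normalize_embeddings=True)
    scored = sorted(store, key=lambda t: float(np.dot(t[1], q)), reverse=True)
    return [text for text, _ in scored[:k]]

remember("We agreed to ship the invoicing feature by Friday.")
context = "\n".join(recall("what's the invoicing deadline?"))
```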
I would like to extend the question: is anyone building a homelab for the specific purpose of training an LLM on their personal info? The choice of hardware (for speed, cost, and noise concerns) seems important.
I was thinking of doing something along the same lines, but with code:<p>An LLM that is specifically trained for software development, to which I feed the code of all my company's repositories, and which I keep feeding commits/pull requests.<p>The idea is that I can query it about architectural issues, code improvements, and other technical aspects at different levels of abstraction (code, architecture, business, etc.).<p>So far, I've played a bit with CodeRabbit and it's "just OK" — more a small window onto what could be than actually useful.
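A hedged sketch of the ingestion half: indexing commit messages for retrieval, so relevant history can be pulled into the prompt when asking architectural questions (repo path and model are illustrative; source files could be chunked and indexed the same way):

```python
# Turn git commit history into retrievable documents with txtai.
import subprocess
from txtai.embeddings import Embeddings

# %x00 / %x01 are NUL / SOH delimiters between fields and commits
log = subprocess.run(
    ["git", "-C", "myrepo", "log", "--pretty=format:%H%x00%s%n%b%x01"],
    capture_output=True, text=True, check=True,
).stdout

docs = []
for entry in log.split("\x01"):
    if "\x00" in entry:
        sha, message = entry.strip().split("\x00", 1)
        docs.append((sha, message, None))

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2",
                         "content": True})
embeddings.index(docs)
hits = embeddings.search("why did we split the billing service?", 5)
```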
As a solo developer answering emails that basically point people to various guides and FAQs I’ve published … I need this. Zendesk claims to have an AI component but forces you to input all training data into their own wiki knowledge base. I can see why they don’t want to use prior responses as training data (PII concerns), but at least give me some boilerplate responses I can use as a head start and to further train the model(s).
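A sketch of that head-start drafter, where search_faqs() and complete() are stand-ins for whatever retrieval index and model you end up with:

```python
# Retrieve the closest published FAQ sections, then have a model draft a
# reply for human editing. Both callables are hypothetical placeholders.
def draft_reply(email_body: str, search_faqs, complete) -> str:
    sections = search_faqs(email_body, k=3)
    prompt = (
        "You answer support email for a solo developer. Using ONLY the FAQ "
        "excerpts below, draft a short, friendly reply that links the "
        "relevant guide. Say so if the FAQs don't cover the question.\n\n"
        + "\n---\n".join(sections)
        + f"\n\nCustomer email:\n{email_body}\n\nDraft reply:"
    )
    return complete(prompt)
```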
I've been building workflow assistants that make existing employees more productive or enable entirely new business models. Some of these assistants use selected local models (due to cost or privacy factors).<p>Currently the stack gravitates around:<p>- GPT-4, either to drive the entire workflow OR to generate prompts, plans, and guidelines for the local models to execute.<p>- Structured knowledge bases (either derived from existing sources OR curated manually by companies to drive AI assistants).<p>- Embedding search indexes, augmented by full-text search. Usually the LLM has access to the search engine and can drive the search as needed, refining queries if the results aren't good enough (see the sketch below).<p>All of this is instrumented with logic to capture user feedback at every single step, which is crucial for continuous improvement of the model.<p>The bigger model can use this feedback periodically to improve the plans and workflow guidelines, making the overall process more efficient.<p>AMA, if needed!
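A rough sketch of that "LLM drives the search" loop, with search() and llm() as placeholders for the hybrid index and model:

```python
# The model sees results, judges them, and may rewrite the query.
# search() should combine embedding and full-text retrieval; llm() is any
# completion callable. Both are assumptions for illustration.
def search_loop(question: str, search, llm, max_rounds: int = 3) -> list[str]:
    query = question
    for _ in range(max_rounds):
        results = search(query, k=5)
        verdict = llm(
            "Question: " + question + "\n"
            "Results:\n" + "\n".join(results) + "\n"
            "If these answer the question, reply GOOD. Otherwise reply with "
            "a single improved search query."
        ).strip()
        if verdict == "GOOD":
            return results
        query = verdict  # refined query, try again
    return results
```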
Everyone I know just uses the hosted ones, because of the sheer performance gap.<p>For now, you can do all the custom/manual training you want, but GPT-4 will almost always outperform it given the right context.<p>Hopefully that will change in the future. Even then, I don't expect people to want to self-host as in on their own machines; more like custom training, then hosting on SaaS, PaaS, or their own on-prem if they have it. Dedicating a personal laptop's compute to an LLM isn't worth the performance hit on everything else. Again, maybe that will change.
Cool use case, glad to see txtai [1] is helping (I'm the main dev for txtai).<p>Since you're using txtai, this article I just wrote yesterday might be helpful: <a href="https://neuml.hashnode.dev/build-rag-pipelines-with-txtai" rel="nofollow noreferrer">https://neuml.hashnode.dev/build-rag-pipelines-with-txtai</a><p>Looks like you've received a lot of great ideas here already though!<p>1 - <a href="https://github.com/neuml/txtai">https://github.com/neuml/txtai</a>
Local models have taken a mind-boggling leap over the past few months, so I'm sure we'll soon be able to add our own layers ourselves, even on a laptop?<p>Seriously, this is not far from ChatGPT 3.5 at only 6.7 GB, and it runs on a MacBook Air:<p><a href="https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF" rel="nofollow noreferrer">https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF</a><p>But yeah, current context windows are limiting.
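A minimal llama-cpp-python sketch for running a GGUF file like that one locally (the filename and settings are illustrative; the quantization you pick changes the size):

```python
# Load a local GGUF quantization of Mistral-7B-OpenOrca and complete a
# prompt. The model card specifies ChatML-style prompting.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-openorca.Q4_K_M.gguf",  # assumed local filename
    n_ctx=4096,  # the context-window ceiling discussed in this thread
)

prompt = ("<|im_start|>user\nSummarize today's notes in three bullets."
          "<|im_end|>\n<|im_start|>assistant\n")
out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```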
I've been using & contributing to Lightrail (<a href="https://github.com/lightrail-ai/lightrail">https://github.com/lightrail-ai/lightrail</a>). Each instance comes with a local vector DB and integrates with apps like Chrome & VSCode, so I can read in content like my notes, emails, etc. Unfortunately, it doesn't support self-hosted LLMs yet!
There's Rewind.ai for macOS, which captures all audio, video, and text it can see as you work, then lets you query it via its local LLM chat. It works pretty well, can also summarize meetings, and integrates with your calendar in certain ways. Note that it only uses documents you have viewed on-screen; it doesn't index your file directories.
I have a similar goal/desire as you. My current project is ingesting this type of data into Elasticsearch with vector embeddings, then using standard search plus kNN to pull context into the prompt.<p>This works reasonably well with GPT-4, but so far the resulting context is almost always too large for self-hosted models.
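A sketch of that retrieval step, assuming an Elasticsearch 8.x index with a dense_vector field named "embedding" (index name, field names, and model are illustrative):

```python
# kNN retrieval against Elasticsearch to assemble prompt context.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")  # assumed local cluster
model = SentenceTransformer("all-MiniLM-L6-v2")

def context_for(question: str, k: int = 5) -> list[str]:
    vector = model.encode(question).tolist()
    resp = es.search(
        index="personal-data",
        knn={"field": "embedding", "query_vector": vector,
             "k": k, "num_candidates": 50},
        source=["text"],
    )
    return [hit["_source"]["text"] for hit in resp["hits"]["hits"]]
```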
This is something I imagine coming out of AutoGen or OpenAI Assistants in a few months. You really need multiple agents (as of now) most of the time. IMO multiple GPT-4 agents <i>are</i> smart enough to accomplish a lot; the issue is getting them set up and working together.
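For reference, a minimal two-agent AutoGen (pyautogen) sketch of that wiring, with the config values as placeholders:

```python
# Two cooperating agents: an assistant backed by GPT-4 and a user proxy
# that relays messages. API key and task are placeholders.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",     # fully automated for this demo
    code_execution_config=False,  # disable local code execution
)

user_proxy.initiate_chat(
    assistant,
    message="Plan the steps to index my notes for semantic search.",
)
```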