I am part of a larger community that organizes itself through loads of emails, PDFs, etc. Many questions one has about the current state of affairs could, in my opinion, be answered through a ChatGPT-like interface.<p>How would one go about training a model on local files? Is it possible? What would I have to do?
For noncommercial use? To answer your question: finetune a LLaMA-based instruction model, maybe using the lit-llama repo. For this you will need to rent a pretty beefy cloud instance, and you will need to resume the finetuning (or use a LoRA) whenever you want to put new data in. Then host it on a cheaper server with a llama.cpp frontend.<p>But what you <i>really</i> might want is a vector search: index the documents as vectors, retrieve the passages closest to a question, and hand those to the model as context. This seems like a better fit, since new files only need to be indexed rather than trained on.
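To make the vector search idea concrete, here is a rough sketch in plain Python. It is only an illustration: it uses TF-IDF word counts as a stand-in for a learned embedding model, the three documents are made up, and the function names (`build_index`, `search`) are hypothetical rather than taken from any library. A real setup would probably swap in an embedding model and a vector database, but the retrieval step has the same shape.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (idf, vectors): IDF weights plus one sparse TF-IDF vector per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())  # count each term once per document
    n = len(docs)
    # Smoothed IDF: terms that occur in every document still keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in counts.items()} for counts in counts_per_doc]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, idf, vectors, top_k=2):
    """Return the top_k (document, score) pairs most similar to the query."""
    counts = Counter(tokenize(query))
    # Query terms that never appear in the corpus are ignored.
    query_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = sorted(((cosine(query_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(docs[i], score) for score, i in scored[:top_k]]


# Made-up stand-ins for text extracted from the community's emails and PDFs.
docs = [
    "The annual general meeting is moved to 14 May in the town hall.",
    "Membership fees are due by the end of March.",
    "Volunteers needed for the summer festival cleanup crew.",
]
idf, vectors = build_index(docs)
results = search("When is the general meeting?", docs, idf, vectors)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
```

The retrieved passages would then be pasted into the prompt of whatever chat model you use, so the model answers from your files without being retrained on them.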
There are some "drag and drop" type solutions, like <a href="https://www.chatbase.co/" rel="nofollow">https://www.chatbase.co/</a>. There are various others: search for "custom ChatGPT" on Product Hunt and you'll find a lot.