I'm fairly new to the LLM landscape, so I'm asking for help in finding a suitable solution for my requirements.

For my work (software engineer) I usually take a fairly large amount of markdown notes as a braindump (and not particularly well organised).

I wanted to ask what my options are for an LLM that could be trained on my personal notes and provide answers to prompts in order to organise the ideas and information. Some examples would be:

- What is the deadline for X
- Give me all the tasks that mention Y and have a due date of Z
- Create an agenda for a meeting on A

etc.

Powerful reasoning is not essential (if that's the correct terminology); the focus would be on correctly identifying topics (as in the examples above).

Some additional requirements are:

- This would run on a commodity Linux machine, but with relatively modern specs (64 GB RAM, i7 Raptor Lake, but a fairly average/low-end GPU)
- I would need this to run in an "air-gapped" way, i.e. no outside transfer of information, since I might use data I don't want to share (even unknowingly)

Thank you
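P.S. To make it concrete, here is a rough sketch of the kind of workflow I'm imagining, assuming a llama.cpp-style local runtime (the llama-cpp-python package and the model file name are placeholders I picked for illustration, not recommendations):

```python
# Rough sketch: ask a local model questions about a folder of markdown notes.
# The package (llama-cpp-python) and the model file are illustrative assumptions.
from pathlib import Path

from llama_cpp import Llama

# A locally stored, quantized model; nothing leaves the machine.
llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)

# Naively concatenate all notes; a real setup would first select only the
# relevant files, since everything has to fit in the context window.
notes = "\n\n".join(p.read_text() for p in Path("notes").glob("**/*.md"))

question = "What is the deadline for X?"
prompt = f"Notes:\n{notes}\n\nQuestion: {question}\nAnswer:"

out = llm(prompt, max_tokens=256, stop=["Question:"])
print(out["choices"][0]["text"].strip())
```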
This leaderboard of code-focused models changes somewhat often:
<a href="https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard" rel="nofollow noreferrer">https://huggingface.co/spaces/bigcode/bigcode-models-leaderb...</a><p>You will probably have to quantize the model though. Codellama->
<a href="https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf" rel="nofollow noreferrer">https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf</a><p>——————-
——————-

This is the main leaderboard; it changes almost daily and sometimes breaks.
<a href="https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard" rel="nofollow noreferrer">https://huggingface.co/spaces/bigcode/bigcode-models-leaderb...</a><p>Some models are trained specifically for something and coding then some are not. Gotta read the model card / dataset.<p>According to some other thread on Samantha, I think this one may be trained on coding also but you may have to test it or find the thread on Samantha ver 1.2.
<a href="https://huggingface.co/ehartford/samantha-1.2-mistral-7b" rel="nofollow noreferrer">https://huggingface.co/ehartford/samantha-1.2-mistral-7b</a><p>Without gpu memory info, idk which would be best for tokens/sec. You want to offload as much as you can to the gpu. 10-15 tokens/sec is okay. Less than that gets annoying.
I would start with a 7B model, then move to an 11B or 13B once that is up and running.
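As a rough way to check where you land, something like this times generation at a given offload setting with llama-cpp-python (the model file and layer count are assumptions; raise n_gpu_layers until you run out of VRAM):

```python
# Rough tokens/sec check for a local GGUF model with partial GPU offload.
# Model path and n_gpu_layers are assumptions -- tune them to your hardware.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.Q4_K_M.gguf",  # assumed file name
    n_gpu_layers=20,  # raise until you hit your VRAM limit
)

start = time.time()
out = llm("Summarise why markdown is handy for notes.", max_tokens=128)
elapsed = time.time() - start

print(f'{out["usage"]["completion_tokens"] / elapsed:.1f} tokens/sec')
```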
Vicuna v1.5 13B Q8 runs locally on a 3060 Ti (8 GB VRAM) with Win 10, dual Xeon (16 cores) and 128 GB RAM. I use LM Studio (Mac, Windows & Linux), which is super easy to install, and it has a local inference server that you can connect clients to using an OpenAI-style API. A very fun project so far...
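For example, a client can talk to the local server with the standard OpenAI Python package (the port and model name below are assumptions; use whatever your local server actually reports):

```python
# Sketch: query a local OpenAI-compatible inference server (e.g. LM Studio's).
# Base URL, port, and model name are assumptions -- match your server's settings.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # assumed local server address
    api_key="not-needed-locally",         # placeholder; nothing leaves the machine
)

resp = client.chat.completions.create(
    model="local-model",  # placeholder; many local servers ignore this field
    messages=[{"role": "user", "content": "Give me all tasks that mention Y."}],
)
print(resp.choices[0].message.content)
```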