> Before I begin I would like to credit the thousands or millions of unknown artists, coders and writers upon whose work the Large Language Models (LLMs) are trained, often without due credit or compensation<p>I like this. If we insist on pushing forward with GenAI, we should probably at least make some digital or physical monument, like "The Tomb of the Unknown Creator".<p>'Cause they sure as sh*t ain't gettin' paid. RIP.
I’m surprised to see no mention of AnythingLLM (<a href="https://github.com/Mintplex-Labs/anything-llm">https://github.com/Mintplex-Labs/anything-llm</a>). I use it with an Anthropic API key, but am giving thought to extending it with a local LLM.
It’s a great app: good file management for RAG, agents with web search, a cross-platform desktop client, and it can also easily be run as a server using Docker Compose.<p>Nb: if you’re still paying $20/mo for a feature-poor chat experience that’s locked to a single provider, consider using any of the many wonderful chat clients that take a variety of API keys instead. You might find that your LLM utilization doesn’t quite fit a flat-rate model, and that the feature set of the third-party client is comparable to (or surpasses) that of the LLM provider’s.<p>edit: included repo link; note on API keys as alternative to subscription
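The flat-rate point above is easy to sanity-check with quick arithmetic. A minimal sketch, where the per-token prices are illustrative assumptions (not any provider's actual rates):

```python
# Does a $20/mo flat-rate plan beat pay-per-token for your actual usage?
# The prices below are assumed for illustration, not real quotes.
PRICE_PER_M_INPUT = 3.00    # $ per 1M input tokens (assumed)
PRICE_PER_M_OUTPUT = 15.00  # $ per 1M output tokens (assumed)

def monthly_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly cost of pay-per-token API usage."""
    return (input_tokens / 1e6) * PRICE_PER_M_INPUT + \
           (output_tokens / 1e6) * PRICE_PER_M_OUTPUT

# A light user: ~1M input / 0.2M output tokens per month.
light = monthly_api_cost(1_000_000, 200_000)
print(f"light user: ${light:.2f}/mo vs $20 flat")  # well under $20
```

At this assumed usage the API-key route comes out around $6/mo, which is why heavy-vs-light utilization matters more than the sticker price.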
If anyone is looking for a one-click solution without having to run Docker, try Msty, something I have been working on for almost a year. It has RAG and web search built in, among other features, and can connect to your Obsidian vaults as well.<p><a href="https://msty.app" rel="nofollow">https://msty.app</a>
I run a pretty similar setup on an M2 Max with 96 GB.<p>For AI image generation, though, I would recommend Krita with the <a href="https://github.com/Acly/krita-ai-diffusion">https://github.com/Acly/krita-ai-diffusion</a> plugin.
Open WebUI sure does pull in a lot of dependencies... Do I really need all of LangChain, PyTorch, and plenty of others for what is advertised as a _frontend_?<p>Does anyone know of a lighter/more minimalist version?
Super basic intro, but perhaps useful. It doesn't mention quant sizes, which are important when you're GPU poor. There are lots of other client-side things you can do too, like KoboldAI, TavernAI, Jan, LangFuse for observability, and CogVLM2 for a vision model.<p>One of the best places to get the latest info on what people are doing with local models is /lmg/ on 4chan's /g/
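To see why quant sizes matter when you're GPU poor, here's a rough back-of-the-envelope VRAM estimate. The overhead allowance is an assumption (it varies a lot with context length and runtime), so treat this as a sketch, not a precise rule:

```python
def approx_vram_gb(params_b: float, bits: int, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate for model weights at a given quantization.

    params_b: parameter count in billions; bits: bits per weight
    (16 = fp16, 8 ~= Q8, 4 ~= Q4). overhead_gb is a crude assumed
    allowance for KV cache and runtime buffers.
    """
    weights_gb = params_b * bits / 8  # 1B params at 8 bits ~= 1 GB
    return weights_gb + overhead_gb

for bits in (16, 8, 4):
    print(f"8B model @ {bits}-bit: ~{approx_vram_gb(8, bits):.1f} GB")
```

On a 16 GB card, an 8B model won't fit at fp16 but fits comfortably at 4-bit, which is the whole point of picking the right quant.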
Anyone got a guide on setting up and running the business-class stuff (70B models over multiple A100s, etc.)? I'd be willing to spend the money, but only if I could get a good guide on how to set everything up: what hardware goes with what motherboard/RAM/CPU, and so on.
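For the sizing side of that question, a quick lower-bound estimate of how many A100s a 70B model needs at different quantizations. The KV-cache budget here is an assumption, and real deployments add tensor-parallel overhead on top:

```python
import math

A100_VRAM_GB = 80  # per card (80GB SXM/PCIe variant)

def cards_needed(params_b: float, bits: int, kv_cache_gb: float = 20) -> int:
    """Minimum A100 count for weights plus an assumed KV-cache budget.

    Ignores tensor-parallel duplication and fragmentation, so treat
    the result as a lower bound, not a provisioning target.
    """
    total_gb = params_b * bits / 8 + kv_cache_gb
    return math.ceil(total_gb / A100_VRAM_GB)

print(cards_needed(70, 16))  # fp16: 140 GB weights + 20 GB cache -> 2 cards
print(cards_needed(70, 4))   # ~Q4:   35 GB weights + 20 GB cache -> 1 card
```

In other words, fp16 70B is a two-card-minimum proposition, while a 4-bit quant can in principle run on a single 80GB card.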
There is a lot I want to do with LLMs locally, but it seems like we're still not quite there hardware-wise (well, within reasonable cost). For example, Llama's smaller models take upwards of 20 seconds to generate a brief response on a 4090; at that point I'd rather just use an API to a service that can generate it in a couple seconds.
There was a post a few weeks back (or a reply to a post) showing an app entirely made using an LLM. It was like a 3D globe made with three.js, and I believe the poster had created it locally on his M4 MacBook with 96 GB of RAM? I can't recall which model it was or what else the app did, but maybe someone knows what I'm talking about?
What GPU offers a good balance between cost and performance for running LLMs locally? I'd like to do more experimenting, and am due for a GPU upgrade from my 1080 anyway, but would like to spend less than $1600...
Still nothing better than oobabooga (<a href="https://github.com/oobabooga/text-generation-webui">https://github.com/oobabooga/text-generation-webui</a>) in terms of a maximalist/"Pro"/"Prosumer" LLM UI/UX, à la Blender, Photoshop, Final Cut Pro, etc.<p>It's embarrassing, and any VCs reading this can contact me to talk about how to fix that. LM Studio is today the closest competition (but not close enough), and Adobe or Microsoft could do it if they fired the current folks who prevent it from happening.<p>If you're not using oobabooga, you're likely not playing with the settings on models, and if you're not playing with your models' settings, you're hardly even scratching the surface of their total capabilities.
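To make the "playing with settings" point concrete: most sampler knobs (temperature, top-p, etc.) build on temperature-scaled softmax over the model's logits. A minimal sketch with made-up logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: the knob most sampler settings build on.

    Lower temperature sharpens the distribution toward the top token
    (near-greedy); higher temperature flattens it (more variety).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy logits for three candidate tokens
print([round(p, 2) for p in softmax(logits, 1.0)])  # moderate spread
print([round(p, 2) for p in softmax(logits, 0.2)])  # near-greedy
```

At temperature 0.2 the top token gets essentially all the probability mass, which is why low-temperature output feels deterministic and repetitive.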
You can try out <a href="https://wiz.chat" rel="nofollow">https://wiz.chat</a> (my project) if you want to run Llama in your web browser. It still needs a GPU and the latest version of Chrome, but it's fast enough for my usage.
At some point we will have a JS API to run a preliminary LLM locally to make local decisions, with the server as final arbiter. For example, a comment rage moderator could help an end user revise their proposed post while they write it, helping them avoid turning the comment into rage bait. This is best done locally in the user's browser. Then, when they are ready to post, the server would do one final check. This would be like today's React front ends doing all the state and UI computation, relieving servers from having to render HTML.
I have a similar PC, and I use text-generation-webui and mostly ExLlama-quantized models.<p>I also deploy text-generation-webui for clients on k8s with GPUs, for similar reasons.<p>Last I checked, llamafile/ollama are not as optimised for GPU use.<p>For image generation I moved from automatic webui to ComfyUI a few months ago. They're different beasts: for some workflows automatic is easier to use, but for most tasks you can create a better workflow with enough Comfy extensions.<p>FaceFusion warrants a mention for faceswapping.
As a piece of writing feedback, I would convert your citation links into normal links. Clicking on the citation doesn't jump to the link or the citation entry, and you are basically using hyperlinks anyway.
I just use MLC with WebGPU: <a href="https://codepen.io/mikestaub/pen/WNqpNGg" rel="nofollow">https://codepen.io/mikestaub/pen/WNqpNGg</a>
> I have a laptop running Linux with core i9 (32threads) CPU, 4090 GPU (16GB VRAM) and 96 GB of RAM.<p>Is there somewhere I can find a computer like this pre-built?
David Bombal interviews a mysterious man who shows how he uses AI/LLMs for his automated LinkedIn posts and other tasks. <a href="https://www.youtube.com/watch?v=vF-MQmVxnCs" rel="nofollow">https://www.youtube.com/watch?v=vF-MQmVxnCs</a>
My understanding is that local LLMs are mostly just toys that output basic responses, and simply can’t compete with full LLMs trained with $60 million+ worth of compute. And since larger companies will always have better hardware and resources no matter how good consumer hardware gets, running locally is basically pointless for anything competitive or serious. Is this accurate?