For my experiments with new self-hostable models on Linux, I've been using a script to download GGUF models from TheBloke on HuggingFace (currently, TheBloke's repository has 657 models in the GGUF format), which I then feed to a simple program I wrote that invokes llama.cpp compiled with GPU support. The GGUF format and TheBloke are a blessing, because I can check out new models basically on the day of their release (TheBloke is very fast) and without any hassle. However, the only frontend I have is the console. Judging by their site, their setup is exactly the same as mine (which I implemented over a weekend), except that they've also added a React-based UI on top. I wonder how they're planning to commercialize it, because it's pretty trivial to replicate, and there are already open-source UIs like oobabooga.
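
For reference, here's a minimal sketch of that kind of pipeline in Python, assuming llama.cpp was built with GPU support and its `main` binary sits at ./llama.cpp/main; the repo, filename, and layer count are just illustrative placeholders, not my actual script:

    import subprocess
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    # Download one GGUF quantization from one of TheBloke's repos
    # (hypothetical example repo/file; most of his GGUF repos follow this pattern).
    model_path = hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    )

    # Hand the model to a GPU-enabled llama.cpp build;
    # -ngl offloads that many layers to the GPU.
    subprocess.run([
        "./llama.cpp/main",
        "-m", model_path,
        "-ngl", "35",
        "-p", "Hello, how are you?",
    ])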