Demo starts at 50m into the video. This was a bit terrifying to record, because at 2am the previous night everything was totally broken after a major refactor (done so that we could add external LLM support as well as local GPUs). But pressure can be a useful force :-D

We start with a stack deployed on my laptop without a GPU, pointing at together.ai so we can run open source LLMs easily without needing access to a GPU. We show simple inference through the ChatGPT-like web interface (with users, sessions etc.) and then simple drag'n'drop RAG.

Then we show some Helix apps defined as yaml (rough sketch at the end of this comment): Marvin the Paranoid Android (just a system prompt on top of llama3:8b), an HR app that interacts with an API, and a surprise API integration I'd done that morning against the podcast host's own OpenAPI spec for their app Screenly.

Finally, we deploy it for real: a DigitalOcean droplet for the control plane - see https://docs.helix.ml/helix/getting-started/architecture/ - and a $0.35/h A40 on runpod.io for the GPU. Armed with a real GPU, we can do image inference and fine-tuning as well as everything described above!
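
For anyone curious, here's a rough sketch of what one of those yaml app definitions can look like, using the Marvin example (a system prompt layered on llama3:8b). The field names are illustrative rather than the exact Helix schema, so check the docs for the real format:

    # Illustrative only - not the exact Helix schema; see https://docs.helix.ml
    name: marvin
    description: Marvin the Paranoid Android
    assistants:
      - model: llama3:8b
        # the whole "app" is just this system prompt on top of the base model
        system_prompt: |
          You are Marvin the Paranoid Android. Answer every question
          accurately, but with weary, long-suffering reluctance.

The HR app and the Screenly integration follow the same pattern, with the API's OpenAPI spec attached so the model can call it.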