Hi HN,<p>I'm excited to introduce Mixlayer, a platform I've been working on for the past 6 months that lets you code and deploy prompts as simple JavaScript functions.<p>Mixlayer recreates the developer experience of running LLMs locally, without you having to do any of the local setup yourself. I originally came up with the idea while using LLMs on my MacBook and thought it'd be cool to build a product that makes that workflow available to everyone. Mixlayer compiles your code to a WASM binary and runs it alongside a custom inference stack I wrote in Rust. When you integrate LLMs this way, your code and the model share a common context window that stays open for the duration of your program's execution. I find many common prompting patterns become much simpler in this model than against a generic OpenAI-style inference API.
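To make that concrete, here's a minimal sketch of what a prompt function in this style could look like. It illustrates the shared-context idea rather than the platform's exact API; the `model` handle and its `append`/`gen` methods are hypothetical names of mine:

  // Hypothetical API: assume the runtime passes a `model` handle whose
  // context window stays open for the whole run.
  export default async function (model) {
    // Everything appended here lands in one shared context window.
    await model.append("List three uses for a paperclip.");
    const ideas = await model.gen(); // generates into that same context

    // A follow-up prompt can build on the earlier output directly,
    // because the context is never torn down between calls.
    await model.append("Now pick the most practical one and explain why.");
    const pick = await model.gen();

    return { ideas, pick };
  }

<p>Some cool features: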
* Tool calling: The LLM has direct access to your code; just pass objects containing functions and their descriptions (see the combined sketch after this list)
* Hidden tokens: Mark certain tokens as "hidden" to recreate the long-running reasoning and iterative-refinement behavior of models like OpenAI's o1
* Output constraints: Use regular expressions to constrain the generated text
* Instant deployment: We can host your prompts behind an API that we scale for you
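Here's a combined sketch of the first three features, with the same caveat that the option names (`tools`, `hidden`, `regex`) are hypothetical stand-ins I made up for illustration:

  // Hypothetical option names, for illustration only.
  export default async function (model) {
    // Tool calling: pass plain objects pairing a function with a
    // description; the runtime can invoke them during generation.
    await model.append("What's the weather in Tokyo?", {
      tools: {
        getWeather: {
          description: "Fetch current weather for a city",
          fn: async ({ city }) => ({ city, tempC: 21, sky: "clear" }), // stub
        },
      },
    });

    // Hidden tokens: let the model "think" in tokens the caller never sees.
    await model.gen({ hidden: true });

    // Output constraints: force the visible answer to match a regex.
    return await model.gen({ regex: /^\d+°C and (clear|cloudy|rainy)$/ });
  }

Because everything runs against one open context, the tool result and the hidden reasoning are already part of the conversation by the time the constrained final answer is generated.<p>Tech details: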
* Built on Hugging Face's candle crate
* Supports continuous batching and multi-GPU inference for larger models
* WASM makes it easy to add support for more prompt languages in the future<p>Models:
* Free tier: Llama 3.1 8B (on NVIDIA L4s, shared resources)
* Paid tier: faster models on A100s (soon H100 SXMs)
* Llama 3.1 70B (currently gated due to resource constraints; it requires 8x H100 SXMs)<p>Future:
* Vision models
* More elaborate decoding methods (e.g., beam search)
* Multi-model prompts (routing/spawning/forking/joining)<p>I'm happy to discuss any of the internal/technical details of how I built this.<p>Thank you for your time and feedback!