Show HN: Mixlayer – code and deploy LLM prompts using JavaScript

5 points | by zackangelo | 7 months ago
Hi HN,

I'm excited to introduce Mixlayer, a platform I've been working on over the past 6 months that lets you code and deploy prompts using simple JavaScript functions.

Mixlayer recreates the developer experience of using LLMs locally without having to do all of the local setup yourself. I originally came up with this idea when using LLMs on my MacBook and thought it'd be cool to build a product that makes that workflow easy for everyone. It compiles your code to a WASM binary and runs it alongside a custom inference stack I wrote in Rust. When you integrate LLMs this way, your code and the model share a common context window that stays open for the duration of your program's execution. I find many common prompting patterns become much simpler in this model than with a generic OpenAI-style inference API.

Some cool features (rough code sketches below):

* Tool calling: the LLM has direct access to your code; just pass objects containing functions and their descriptions
* Hidden tokens: mark certain tokens as "hidden" to recreate long-running reasoning and iterative refinement operations, as in gpt-4o
* Output constraints: use regular expressions to constrain the generated text
* Instant deployment: we can host your prompts behind an API that we scale for you

Tech details:

* Built on Huggingface's candle crate
* Supports continuous batching and multi-GPU for larger models
* WASM lets me easily support more prompt languages in the future

Models:

* Free tier: Llama 3.1 8b (on NVIDIA L4s, shared resources)
* Paid tier: faster models on A100s (soon H100 SXMs)
* Llama 3.1 70b (currently gated due to resource constraints; requires 8x H100 SXMs)

Future:

* Vision models
* More elaborate decoding methods (e.g. beam search)
* Multiple model prompts (routing/spawning/forking/joining)

I'm happy to discuss any of the internal/technical details of how I built this.

Thank you for your time and feedback!
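
To make the shared-context model concrete, here is a minimal sketch of what a prompt function could look like. The post does not show Mixlayer's actual API, so every name below (run, ctx.user, ctx.gen, and the tools/hidden/regex options) is a hypothetical stand-in for the tool calling, hidden tokens, and output constraints described above.

    // Hypothetical sketch only: Mixlayer's real API is not shown in the post.
    // All identifiers here (run, ctx.user, ctx.gen, tools, hidden, regex)
    // are assumptions illustrating the features described above.

    // A tool the model can call directly: an object holding the function
    // and a description that tells the model when to use it.
    const tools = {
      getWeather: {
        description: "Look up the current temperature (°C) for a city",
        fn: async ({ city }) => ({ city, tempC: 21 }),
      },
    };

    export default async function run(ctx) {
      // The context window stays open for the whole function, so each
      // call below appends to the same shared conversation state.
      await ctx.user("What's the weather in Berlin, in one sentence?");

      // Hidden tokens: a scratchpad the caller never sees, recreating
      // long-running reasoning before the final answer.
      await ctx.gen({ hidden: true, maxTokens: 256, tools });

      // Output constraint: the visible answer must match this regex.
      return await ctx.gen({
        regex: /^It is currently -?\d+°C in [A-Za-z ]+\.$/,
        maxTokens: 64,
      });
    }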
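
Calling a deployed prompt would then be an ordinary HTTP request. The endpoint URL, route, and bearer-token auth below are assumptions; the post only says deployed prompts are hosted behind an API that Mixlayer scales for you.

    // Hypothetical usage sketch: the endpoint, route, and auth scheme are
    // assumptions. Uses the standard fetch API (Node 18+ and browsers).
    const res = await fetch("https://api.mixlayer.example/v1/prompts/weather", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.MIXLAYER_API_KEY}`,
      },
      body: JSON.stringify({ input: "What's the weather in Berlin?" }),
    });
    console.log(await res.json());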

no comments