I wanted to share a project I've been working on for the past few weeks: llgtrt. It's a Rust implementation of an HTTP REST server for hosting Large Language Models with NVIDIA TensorRT-LLM, using the llguidance library for constrained output.<p>The server is compatible with the OpenAI REST API and supports structured JSON schema enforcement as well as full context-free grammars (via Guidance). It's similar in spirit to the Python-based TensorRT-LLM OpenAI server example, but written entirely in Rust and built with constraints in mind. No Triton Inference Server involved.<p>This also serves as a demo for the llguidance library, which lets you apply sampling constraints via Rust, C, or Python interfaces, with minimal generation overhead and no startup cost.<p>Any feedback or questions are welcome!
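As a rough sketch of what OpenAI-compatible schema enforcement looks like from a client's side (the port, model name, and schema here are illustrative assumptions, not llgtrt's actual defaults; the `response_format` shape follows the OpenAI chat-completions convention):

```python
import json

# Assumed local llgtrt endpoint -- adjust host/port for your deployment.
URL = "http://localhost:3000/v1/chat/completions"

# OpenAI-style request body. With a json_schema response_format, the
# server constrains token sampling so the output always matches the schema.
payload = {
    "model": "llama",  # placeholder model name
    "messages": [{"role": "user", "content": "Give me a user record."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "user",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
}

# When the server is running, send this with e.g.
# requests.post(URL, json=payload) and parse the JSON reply.
print(json.dumps(payload, indent=2))
```

Because the constraint is applied during sampling rather than by retrying or post-filtering, a schema-conforming reply doesn't cost extra generation passes.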