TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.


© 2025 TechEcho. All rights reserved.

Show HN: LLGTRT: TensorRT-LLM+Rust server w/ OpenAI-compat and Structured Output

6 points | by mmoskal | 7 months ago
I wanted to share a project I've been working on for the past few weeks: llgtrt. It's a Rust implementation of an HTTP REST server for hosting Large Language Models, using the llguidance library for constrained output with NVIDIA TensorRT-LLM.

The server is compatible with the OpenAI REST API and supports structured JSON schema enforcement as well as full context-free grammars (via Guidance). It's similar in spirit to the Python-based TensorRT-LLM OpenAI server example, but written entirely in Rust and built with constraints in mind. No Triton Inference Server involved.

This also serves as a demo for the llguidance library, which lets you apply sampling constraints via a Rust, C, or Python interface, with minimal generation overhead and no startup cost.

Any feedback or questions are welcome!
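To make the OpenAI-compatibility and JSON schema claims concrete, here is a minimal client-side sketch of the kind of request such a server would accept. The endpoint URL and model name are placeholders (not taken from the llgtrt docs); the payload follows the OpenAI chat-completions `response_format`/`json_schema` shape that the post says is supported:

```python
import json

# Hypothetical local endpoint for a running llgtrt instance
# (host/port are illustrative assumptions, not project defaults).
BASE_URL = "http://localhost:3000/v1/chat/completions"

# JSON schema the server would be asked to enforce on the model output.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name", "year"],
}

# OpenAI-style chat completion payload; "response_format" with
# "json_schema" is the structured-output shape of the OpenAI REST API.
payload = {
    "model": "llama",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": "Name one programming language and the year it appeared.",
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "answer", "schema": schema},
    },
}

# Serialize the request body; sending it (e.g. with urllib or requests)
# is omitted since it needs a live server.
body = json.dumps(payload)
print(body[:30])
```

With the schema in place, a compliant server constrains sampling so the completion is guaranteed to parse as an object with a string `name` and integer `year`, rather than relying on prompt-only instructions.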

no comments
