
Show HN: LLGTRT: TensorRT-LLM+Rust server w/ OpenAI-compat and Structured Output

6 points | by mmoskal | 7 months ago
I wanted to share a project I've been working on for the past few weeks: llgtrt. It's an HTTP REST server, written in Rust, for hosting Large Language Models with NVIDIA TensorRT-LLM, using the llguidance library for constrained output.

The server is compatible with the OpenAI REST API and supports structured JSON schema enforcement as well as full context-free grammars (via Guidance). It's similar in spirit to the Python-based TensorRT-LLM OpenAI server example, but written entirely in Rust and built with constraints in mind. No Triton Inference Server involved.

This also serves as a demo for the llguidance library, which lets you apply sampling constraints via a Rust, C, or Python interface, with minimal generation overhead and no startup cost.

Any feedback or questions are welcome!
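Since the server speaks the OpenAI REST API with JSON schema enforcement, a client request might look like the following sketch. This only builds the request body in the OpenAI `response_format`/`json_schema` style; the model name, port, and endpoint path are illustrative assumptions, not values taken from the project.

```python
import json

# A minimal sketch of an OpenAI-compatible chat request with a JSON-schema
# response_format, the kind of structured-output request llgtrt advertises.
# The schema constrains the completion to a fixed object shape.
schema = {
    "type": "object",
    "properties": {
        "library": {"type": "string"},
        "vendor": {"type": "string"},
    },
    "required": ["library", "vendor"],
    "additionalProperties": False,
}

payload = {
    "model": "llama-3.1-8b-instruct",  # placeholder model name (assumption)
    "messages": [
        {"role": "user", "content": "Which inference library does llgtrt build on, and who makes it?"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "answer", "schema": schema, "strict": True},
    },
}

# POST this body to the server's chat-completions endpoint, e.g.
# http://localhost:3000/v1/chat/completions (host and port are assumptions);
# the grammar engine then guarantees the output parses against `schema`.
body = json.dumps(payload)
```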

No comments yet.