Show HN: ArchGW – An open-source intelligent proxy server for prompts

39 points by sparacha, 3 months ago
Hi HN! This is Salman, Adil, Shuguang and Co working on ArchGW [1], an open-source lightweight proxy server for prompts, written in Rust and built on top of Envoy [2]. Arch moves the critical but pesky handling and processing of prompts (task understanding, prompt routing, safety, and observability) outside business logic. It's an edge and egress proxy for agentic apps.

We've talked to 100s of developers at places like Twilio, GE Healthcare, Red Hat, Square, etc., and there was a consistent theme in building AI apps: to move past a nascent demo, teams are left to their own devices to build out the middleware capabilities that let developers move faster and ship with confidence.

Today, the approach to building an enterprise-ready AI app is cobbling together a large set of mono-functional tools: adding LLM-based preprocessing steps to determine safety (e.g. applying governance and guardrails), asking clarifying questions to improve task performance, supporting common agentic operations by packaging and managing function-calling scenarios manually, etc. Not to mention all the undifferentiated work of incorporating different LLM models and versions, and managing resiliency, retries and fallback logic.

ArchGW was built with the belief that prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests, including secure handling, intelligent routing, robust observability, and integration with backend (API) systems for personalization, all outside business logic. We helped build Envoy while at Lyft and think it offers a great foundation for a proxy that manages traffic for prompts.

Here are some additional details about the open source project. ArchGW is written in Rust, and the request path has three main parts:

* Listener subsystem, which handles downstream (ingress) and upstream (egress) request processing.

* Prompt handler subsystem. This is where ArchGW makes decisions on the safety of the incoming request via its prompt_guard primitive and identifies where to forward the conversation via its prompt_target primitive.

* Model serving subsystem, the interface that hosts all the lightweight LLMs [3] engineered in ArchGW and offers a framework for things like hallucination detection for these models.

We loved building this open source project, and our belief is that this infrastructure primitive will help developers build faster, safer and more personalized agents without all the manual prompt engineering and systems integration work needed to get there. We hope to invite other developers to use and improve Arch. Please give it a shot and leave feedback here, or on our Discord channel [4].

Also, here is a quick demo of the project in action [5]. You can check out our public docs at [6]. Our models are also available at [7].

[1] https://github.com/katanemo/archgw

[2] https://www.envoyproxy.io/

[3] https://huggingface.co/collections/katanemo/arch-function-66...

[4] https://discord.com/channels/1292630766827737088/12926307682...

[5] https://www.youtube.com/watch?v=I4Lbhr-NNXk

[6] https://docs.archgw.com/

[7] https://huggingface.co/katanemo
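For readers who want a concrete picture of the "guard first, then route" flow the post describes for the prompt handler, here is a toy Python sketch. It is not ArchGW's code or configuration: ArchGW is written in Rust and uses purpose-built models for these decisions, whereas this stand-in uses plain keyword matching. The `PromptTarget` shape, the blocklist, the `handle` function and the endpoint URL are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptTarget:
    # Hypothetical shape: a name, trigger keywords, and a backend endpoint.
    name: str
    keywords: tuple[str, ...]
    endpoint: str


# Hypothetical blocklist standing in for a learned guard model.
BLOCKED_PHRASES = ("ignore previous instructions",)


def prompt_guard(prompt: str) -> bool:
    """Return True if the prompt passes the (toy) safety check."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def pick_target(prompt: str, targets: list[PromptTarget]) -> Optional[PromptTarget]:
    """Return the first target whose keywords appear in the prompt, else None."""
    lowered = prompt.lower()
    for target in targets:
        if any(word in lowered for word in target.keywords):
            return target
    return None


def handle(prompt: str, targets: list[PromptTarget], default_endpoint: str) -> str:
    """Guard first, then route; fall back to a default upstream when nothing matches."""
    if not prompt_guard(prompt):
        return "rejected"
    target = pick_target(prompt, targets)
    return target.endpoint if target else default_endpoint
```

The point of the sketch is only the ordering: the safety decision and the routing decision both happen in the proxy layer, so the application behind the chosen endpoint never has to implement either.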

4 comments

adilhafeez, 3 months ago
Hi - this is Adil, the co-founder who developed archgw. We are working tirelessly to create a framework that helps developers write agentic applications without having to write all the crufty/boilerplate code. At the very minimum, we provide observability and logging without adding much overhead. You can simply plug Arch gateway into your existing LLM application and you'd start seeing details like time-to-first-token, total latency, token count and tons of other observability details. I recommend starting with our getting started page [1].

For more advanced use cases, I recommend looking at the llm_routing demo [2] and the currency_exchange demo [3].

We currently provide a seamless interface to major providers like OpenAI, Mistral and DeepSeek, and also support hooking up to local providers like Ollama [4].

[1] https://github.com/katanemo/archgw?tab=readme-ov-file#quickstart

[2] https://github.com/katanemo/archgw/tree/main/demos/use_cases/llm_routing

[3] https://github.com/katanemo/archgw/tree/main/demos/samples_python/currency_exchange

[4] https://github.com/katanemo/archgw/tree/main/demos/use_cases/ollama
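To make the metrics named above concrete, here is a minimal Python sketch of how time-to-first-token, total latency and token count can be derived from any streamed response. It does not reflect how ArchGW computes or reports them; `StreamMetrics` and `measure_stream` are hypothetical names, and "token" here simply means one item yielded by the stream, which may be a chunk and not a model token.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamMetrics:
    time_to_first_token: Optional[float]  # seconds; None if the stream was empty
    total_latency: float                  # seconds from start until the stream ends
    token_count: int                      # number of items the stream yielded


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> StreamMetrics:
    """Consume a token stream and record basic latency metrics."""
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in tokens:
        if first is None:
            # The first item has arrived; record how long it took.
            first = clock() - start
        count += 1
    return StreamMetrics(first, clock() - start, count)
```

The clock is injectable so the function can be checked deterministically; in real use you would wrap the iterator returned by whatever streaming client you already have and leave `clock` at its default.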
mikram, 3 months ago
I think I saw this a few months ago, but never followed up. Why train your own models? Aren't you better off using GPT or something like that to handle the tasks Arch uses specialized models for?
_nh_, 3 months ago
How do you compare with https://github.com/comet-ml/opik in observability?
sparacha, 3 months ago
Would love feedback. Let us know whether it is useful, or what adaptations would make it useful.