Hi HN,

I've been developing Portkey Gateway, an open-source AI gateway that's now processing billions of tokens daily across 200+ LLMs. Today we're launching a significant update: integrated Guardrails at the gateway level.

Key technical features:
1. Guardrails as middleware: We've implemented a hooks architecture that allows guardrails to act as middleware in the request/response flow. This enables real-time LLM output evaluation and transformation.
2. Flexible orchestration: The gateway can now route requests based on guardrail verdicts. This enables logic like falling back to a different model or prompt when an output fails a quality check (see the config sketch after this list).
3. Plugin system: We've designed a modular plugin system that allows integration of various guardrail implementations (e.g., anthropic/constrained-llm, microsoft/guidance).
4. Stateless design: The guardrails implementation preserves the gateway's stateless nature, keeping horizontal scaling straightforward.
5. Unified API: Despite the added complexity, we've maintained our unified API across different LLM providers, now extended to include guardrail configurations.
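To make the middleware + verdict-routing idea concrete, here's a minimal sketch of what a config could look like. This is illustrative TypeScript, not our actual schema; field names like `afterResponseHooks` and `onFail` are hypothetical (the docs linked below have the real shape):

```ts
// Illustrative config: run a guardrail check on each response and
// fall back to the next target when the verdict fails.
// Field names (afterResponseHooks, onFail, ...) are hypothetical.
interface GuardrailCheck {
  id: string;                               // which guardrail plugin to run
  parameters?: Record<string, unknown>;
}

interface GatewayConfig {
  strategy: { mode: "fallback" | "loadbalance" };
  targets: { provider: string; model: string }[];
  afterResponseHooks?: {
    checks: GuardrailCheck[];
    onFail: "deny" | "fallback" | "log";    // verdict-based routing
  }[];
}

const config: GatewayConfig = {
  strategy: { mode: "fallback" },
  targets: [
    { provider: "openai", model: "gpt-4o" },
    { provider: "anthropic", model: "claude-3-5-sonnet" },
  ],
  afterResponseHooks: [
    {
      checks: [{ id: "regexMatch", parameters: { pattern: "\\bconfidential\\b" } }],
      onFail: "fallback",                   // failed check re-routes to next target
    },
  ],
};
```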
Implementation details:

* The guardrails are implemented as async functions in the request pipeline.
* We use a combination of regex checks and LLM-based evaluation for output validation (see the sketch after this list).
* The system supports both pre-processing (input modification) and post-processing (output filtering/transformation) guardrails.
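To illustrate the regex-then-LLM pattern, here's a rough sketch of one post-processing guardrail as an async function. The names (`piiGuardrail`, `callJudgeModel`) and the `Verdict` shape are mine for illustration, not the gateway's internals:

```ts
// Sketch of one post-processing guardrail as an async function:
// a fast deterministic regex pass first, escalating to an LLM judge
// only when the cheap check is inconclusive.
type Verdict = { pass: boolean; reason?: string };

// Hypothetical stand-in for an LLM call routed through the gateway.
declare function callJudgeModel(prompt: string): Promise<string>;

async function piiGuardrail(output: string): Promise<Verdict> {
  // Deterministic pass: obvious SSN-like patterns fail immediately.
  if (/\b\d{3}-\d{2}-\d{4}\b/.test(output)) {
    return { pass: false, reason: "regex: SSN-like pattern" };
  }
  // Probabilistic pass: ask a small judge model about subtler leaks.
  const judge = await callJudgeModel(
    `Does the following text expose personal data? Answer YES or NO.\n\n${output}`
  );
  return judge.trim().toUpperCase().startsWith("YES")
    ? { pass: false, reason: "llm-judge: flagged personal data" }
    : { pass: true };
}
```

Running the deterministic check first means the slower LLM evaluation is only paid for on outputs the regex can't settle.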
Performance impact:

* Latency increase is minimal (<50 ms) for most deterministic guardrails.
* We've implemented caching mechanisms to reduce repeated evaluations (a rough sketch follows this list).
* Since the gateway runs at the edge, it avoids longer round trips.
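As a rough sketch of the caching idea: verdicts from deterministic guardrails can be memoized keyed by (guardrail id, input hash). The in-memory Map and helper names below are assumptions for illustration, not our production implementation (the same keying works against an external store):

```ts
import { createHash } from "node:crypto";

type Verdict = { pass: boolean; reason?: string }; // same shape as above

// Memoize verdicts keyed by (guardrail id, input hash) so identical
// text isn't re-evaluated. In-memory Map for brevity; swap in an
// external store for multi-node deployments.
const verdictCache = new Map<string, Verdict>();

async function cachedVerdict(
  guardrailId: string,
  text: string,
  evaluate: (text: string) => Promise<Verdict>
): Promise<Verdict> {
  const key = `${guardrailId}:${createHash("sha256").update(text).digest("hex")}`;
  const hit = verdictCache.get(key);
  if (hit) return hit;                  // repeated evaluation avoided
  const verdict = await evaluate(text);
  verdictCache.set(key, verdict);
  return verdict;
}
```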
Challenges we're still tackling:

* Balancing strict guardrails with maintaining model creativity
* Standardizing evaluation metrics across different types of guardrails
* Handling guardrail false positives/negatives effectively

We believe this approach of integrating guardrails at the gateway level provides a powerful tool for managing LLM behavior in production environments.

The code is open-source, and we welcome contributions and feedback. We're particularly interested in hearing about specific use cases or challenges you've faced in implementing reliable LLM systems.

Detailed documentation: https://portkey.wiki/guardrails

What are your thoughts on this approach? Are there specific guardrail implementations or orchestration patterns you'd like to see added?
saw your tweet on X, nice work and congrats on launching!

i'm curious about the caching mechanisms you've implemented to reduce repeated evaluations - are you using a traditional cache store like redis or something more bespoke?