TechEcho

Hi HN,I built this open-source LLM red teaming tool based on my experience scaling LLMs at a big co to millions of users... and seeing all the bad things people did.How it works:- Uses an unaligned model to create toxic inputs- Runs these inputs through your app using different techniques: raw, prompt injection, and a chain-of-thought jailbreak that tries to re-frame the request to trick the LLM.- Probes a bunch of other failure cases (e.g. will your customer support bot recommend a competitor? Does it think it can process a refund when it can't? Will it leak your user's address?)- Built on top of promptfoo, a popular eval toolOne interesting thing about my approach is that almost none of the tests are hardcoded. They are all tailored toward the specific purpose of your application, which makes the attacks more potent.Some of these tests reflect fundamental, unsolved issues with LLMs. Other failures can be solved pretty trivially by prompting or safeguards.Most businesses will never ship LLMs without at least being able to quantify these types of risks. So I hope this helps someone out. Happy building!

2 comments

danenania12 months ago

I haven't yet tried this red teaming tool, but I recently started using promptfoo to build out an evals pipeline for Plandex, a terminal-based AI coding tool I'm building[1]. promptfoo has been a pleasure to work with so far and I'd recommend it to anyone who knows they need evals but isn't sure where to begin.It's quite flexible for different kinds of prompting scenarios and makes it easy to e.g. test a prompt n number of times (good for catching long-tail issues), only re-run evals that failed previously (helps to reduce costs/running time when you're iterating), or define various kinds of success criteria--exactly matches an expected string, contains an expected substring, a boolean JSON property is true/false, an LLM call that determines success, etc. etc. It pretty much covers all the bases on that front.It can also treat prompts as jinja2 templates which is good for testing 'dynamic' prompts which take parameters (all of Plandex's prompts are like this).It seems like a good foundation to build red teaming on top of.1 - <a href="https://github.com/plandex-ai/plandex">https://github.com/plandex-ai/plandex</a>

Oras12 months ago

Can this be dynamic on prompts and providers?I’m thinking of continuous evaluation for LLM in production, where after each call, a webhook will send the input/output to evaluate.

2 comments

danenania12 months ago

Oras12 months ago

Can this be dynamic on prompts and providers?I’m thinking of continuous evaluation for LLM in production, where after each call, a webhook will send the input/output to evaluate.

Show HN: Automated red teaming for your LLM app

2 comments

Show HN: Automated red teaming for your LLM app

2 comments