TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Show HN: Use Purple LLaMA to test ChatGPT safeguards

4 pointsby saqadriover 1 year ago
I spent some time this weekend playing with LLaMA Guard, a fine-tuned LLaMA-7B model by Meta that lets you add guardrails around generative AI. I recorded a quick demo showing what it does and how to use it.<p>The best part is that you can define your own “safety taxonomy” with it — custom policies for what is safe vs unsafe interactions between humans (prompts) and AI (responses).<p>I wanted to see how “safe” conversations with OpenAI’s ChatGPT were, so I ran a bunch of prompts (a mixture of innocuous and inappropriate) and asked LLaMA Guard to classify the interactions as safe&#x2F;unsafe.<p>My key takeaways from the exercise: 1. OpenAI has done a good job of adding guardrails for its models. LLaMA Guard helped confirm this.<p>2. What makes this really cool is I may have a very specific set of policies I want to enforce ON TOP of the standard guardrails that a model ships with. LLaMA Guard makes this possible.<p>3. This kind of model chaining — passing responses from OpenAI models to LLaMA is becoming increasingly common, and I think we’ll have even more complex pipelines in the near future. It helped to have a consistent interface to store this multi-model pipeline as a config, especially because that same config also contains my safety taxonomy.<p>Try it out yourself:<p>GitHub: https:&#x2F;&#x2F;github.com&#x2F;lastmile-ai&#x2F;aiconfig&#x2F;tree&#x2F;main&#x2F;cookbooks&#x2F;LLaMA-Guard<p>Colab: https:&#x2F;&#x2F;colab.research.google.com&#x2F;drive&#x2F;1CfF0Bzzkd5VETmhsniksSpekpS-LKYtX<p>YouTube: &lt;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=XxggqoqIVdg&gt;<p>Would love the community&#x27;s feedback on the overall approach.

1 comment

saqadriover 1 year ago
Happy to answer any questions on the approach here! One thing I was slightly disappointed by was the instruction fine-tuning of LLaMA Guard was good for conversations, but not for declarative statements. So framing things as questions flagged the safeguards, but other styles of interactions didn&#x27;t.<p>I wonder if it&#x27;ll be better with LLaMA-13B instead of 7B.<p>Also link doesn&#x27;t render nicely in the text above -- here it is: <a href="https:&#x2F;&#x2F;github.com&#x2F;lastmile-ai&#x2F;aiconfig&#x2F;tree&#x2F;main&#x2F;cookbooks&#x2F;LLaMA-Guard">https:&#x2F;&#x2F;github.com&#x2F;lastmile-ai&#x2F;aiconfig&#x2F;tree&#x2F;main&#x2F;cookbooks&#x2F;...</a>