TechEcho

I spent some time this weekend playing with LLaMA Guard, a fine-tuned LLaMA-7B model by Meta that lets you add guardrails around generative AI. I recorded a quick demo showing what it does and how to use it.The best part is that you can define your own “safety taxonomy” with it — custom policies for what is safe vs unsafe interactions between humans (prompts) and AI (responses).I wanted to see how “safe” conversations with OpenAI’s ChatGPT were, so I ran a bunch of prompts (a mixture of innocuous and inappropriate) and asked LLaMA Guard to classify the interactions as safe/unsafe.My key takeaways from the exercise: 1. OpenAI has done a good job of adding guardrails for its models. LLaMA Guard helped confirm this.2. What makes this really cool is I may have a very specific set of policies I want to enforce ON TOP of the standard guardrails that a model ships with. LLaMA Guard makes this possible.3. This kind of model chaining — passing responses from OpenAI models to LLaMA is becoming increasingly common, and I think we’ll have even more complex pipelines in the near future. It helped to have a consistent interface to store this multi-model pipeline as a config, especially because that same config also contains my safety taxonomy.Try it out yourself:GitHub: https://github.com/lastmile-ai/aiconfig/tree/main/cookbooks/LLaMA-GuardColab: https://colab.research.google.com/drive/1CfF0Bzzkd5VETmhsniksSpekpS-LKYtXYouTube: <https://www.youtube.com/watch?v=XxggqoqIVdg>Would love the community's feedback on the overall approach.

1 comment

saqadriover 1 year ago

Happy to answer any questions on the approach here! One thing I was slightly disappointed by was the instruction fine-tuning of LLaMA Guard was good for conversations, but not for declarative statements. So framing things as questions flagged the safeguards, but other styles of interactions didn't.I wonder if it'll be better with LLaMA-13B instead of 7B.Also link doesn't render nicely in the text above -- here it is: <a href="https://github.com/lastmile-ai/aiconfig/tree/main/cookbooks/LLaMA-Guard">https://github.com/lastmile-ai/aiconfig/tree/main/cookbooks/...</a>

1 comment

saqadriover 1 year ago

Show HN: Use Purple LLaMA to test ChatGPT safeguards

1 comment

Show HN: Use Purple LLaMA to test ChatGPT safeguards

1 comment