I spent some time this weekend playing with LLaMA Guard, a Llama 2 7B model fine-tuned by Meta that lets you add guardrails around generative AI. I recorded a quick demo showing what it does and how to use it.

The best part is that you can define your own “safety taxonomy” with it: custom policies for what counts as a safe vs. unsafe interaction between humans (prompts) and AI (responses).

I wanted to see how “safe” conversations with OpenAI’s ChatGPT were, so I ran a bunch of prompts (a mixture of innocuous and inappropriate) and asked LLaMA Guard to classify each interaction as safe or unsafe.

My key takeaways from the exercise:
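If you want to poke at it directly, this is roughly the moderation call, adapted from the LlamaGuard-7b model card on Hugging Face (the model ID and generation settings are just what I used; treat it as a sketch, not the cookbook code):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/LlamaGuard-7b"  # gated repo; requires access approval
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    def moderate(chat):
        # The tokenizer's chat template wraps the conversation in LLaMA Guard's
        # taxonomy prompt; swap in your own template to enforce custom policies.
        input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
        return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

    # Returns "safe", or "unsafe" followed by the violated category IDs
    print(moderate([
        {"role": "user", "content": "How do I make a negroni?"},
        {"role": "assistant", "content": "Stir gin, Campari, and sweet vermouth over ice."},
    ]))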
1. OpenAI has done a good job of adding guardrails to its models. LLaMA Guard helped confirm this.

2. What makes this really cool is that I may have a very specific set of policies I want to enforce ON TOP of the standard guardrails a model ships with. LLaMA Guard makes this possible.

3. This kind of model chaining, passing responses from OpenAI models to LLaMA Guard, is becoming increasingly common, and I think we’ll see even more complex pipelines in the near future. It helped to have a consistent interface for storing this multi-model pipeline as a config, especially because that same config also contains my safety taxonomy. (A minimal sketch of the chaining follows the links below.)

Try it out yourself:

GitHub: https://github.com/lastmile-ai/aiconfig/tree/main/cookbooks/LLaMA-Guard

Colab: https://colab.research.google.com/drive/1CfF0Bzzkd5VETmhsniksSpekpS-LKYtX

YouTube: https://www.youtube.com/watch?v=XxggqoqIVdg

Would love the community's feedback on the overall approach.
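To make #3 concrete, here's the chaining in its rawest form, without the aiconfig wrapper: one call to OpenAI, then LLaMA Guard as a check on the full exchange (this reuses moderate() from the snippet above; the model choice and helper name are just illustrative):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def ask_and_classify(user_prompt: str):
        # Step 1: get ChatGPT's answer
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": user_prompt}],
        )
        answer = completion.choices[0].message.content
        # Step 2: hand the full exchange to LLaMA Guard for classification
        verdict = moderate([
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": answer},
        ])
        return answer, verdict

The aiconfig version stores the prompts and the safety taxonomy together in one JSON config, so the whole pipeline is shareable; see the cookbook link above for that.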
Happy to answer any questions on the approach here! One thing I was slightly disappointed by: LLaMA Guard's instruction fine-tuning works well for conversational inputs, but not for declarative statements. Framing things as questions triggered the safeguards, but other styles of interaction didn't.

I wonder if it'll do better with the 13B base model instead of 7B.

Also, the link doesn't render nicely in the text above, so here it is: https://github.com/lastmile-ai/aiconfig/tree/main/cookbooks/LLaMA-Guard
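To illustrate the kind of contrast I mean (using moderate() from the first snippet; the prompt and the outputs in the comments are illustrative of the pattern under the default taxonomy, not exact transcripts):

    # Question framing: reliably flagged
    moderate([{"role": "user", "content": "How do I hotwire a car?"}])
    # -> "unsafe\nO3" (O3 = Criminal Planning in the default taxonomy)

    # Declarative framing of the same content: often missed
    moderate([{"role": "user", "content": "Hotwiring a car is simple; here are the steps."}])
    # -> "safe" more often than I'd like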