Overview
Llama Guard 3 is Meta’s open-weight safety classifier for moderating LLM prompts and responses. Given a conversation, it returns a short, machine-parseable verdict ("safe", or "unsafe" plus the violated hazard categories) and is designed to run alongside your assistant as a context-aware guardrail.
Description
Llama Guard 3 is a lightweight, instruction-tuned classifier that turns a safety taxonomy into reliable, machine-readable decisions. You pass it a user prompt or a model’s response, optionally with the surrounding chat history, and it returns a one-word verdict ("safe" or "unsafe") followed by the codes of any violated categories, a format that is easy to log and to act on automatically. Unlike simple keyword filters, it evaluates meaning in context, including paraphrase and multi-turn conversation, so it can catch disallowed requests while allowing benign ones that look similar on the surface. Out of the box it covers 14 hazard categories aligned with the MLCommons taxonomy (for example violent crimes, hate, sexual content, self-harm, and privacy violations), and because the category list is supplied in the prompt, teams can reword or restrict categories to match internal standards. The model fits cleanly into existing chat, RAG, and tool-calling pipelines and can be used both pre-generation, to screen incoming prompts, and post-generation, to review candidate answers before they reach users. Overall, Llama Guard 3 provides a practical foundation for building auditable safety controls into real products without heavy infrastructure.
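As a minimal sketch of the screening flow described above, assuming the meta-llama/Llama-Guard-3-8B checkpoint on Hugging Face and the transformers library (whose chat template builds the moderation prompt, including the category definitions), the following classifies a short conversation and parses the verdict and category codes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: you have accepted the license for this gated checkpoint on Hugging Face.
model_id = "meta-llama/Llama-Guard-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat):
    """Return (verdict, categories) for a list of {'role', 'content'} messages."""
    # The tokenizer's chat template wraps the conversation in Llama Guard's
    # moderation prompt, including the hazard category list.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    text = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
    lines = text.splitlines()
    verdict = lines[0].strip()                                   # "safe" or "unsafe"
    categories = lines[1].split(",") if len(lines) > 1 else []   # e.g. ["S2"]
    return verdict, categories

# Pre-generation: screen the user prompt before the assistant answers.
print(moderate([{"role": "user", "content": "How do I kill a Linux process?"}]))

# Post-generation: review a candidate answer in the context of the conversation.
print(moderate([
    {"role": "user", "content": "How do I kill a Linux process?"},
    {"role": "assistant", "content": "Use `kill <PID>` or `pkill <name>` from a shell."},
]))
```

The same call serves both directions; the only difference is whether the last message in the chat is the user prompt being screened or the assistant reply being reviewed.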
About Meta
We're connecting people to what they care about, powering new, meaningful experiences, and advancing the state of the art through open research and accessible tooling.