# Safety

Source: https://ibl.ai/safety

Safety isn't a feature — it's the product.

## The Problem: Single-Layer Safety Fails

Traditional approaches create exploitable gaps that users learn to work around.

### Input filtering only

Blocks obvious requests but misses reframed prompts — academic, hypothetical, or therapeutic framing bypasses the filter.

### Vendor promises

Relying on the LLM provider's built-in guardrails offers no institutional control and no visibility into what was caught or missed.

### Policy statements

Written policies without enforcement infrastructure are not protection — they are documentation of intent.

The failure cycle: boundary discovery → request reframing → system compliance. A single checkpoint is not enough. ibl.ai replaces single-checkpoint safety with enforcement infrastructure.

## Dual-Layer Moderation: Two Independent Safety Checkpoints

Every interaction passes through both layers — input and output are evaluated independently:

User sends message → Layer 1 — Input Moderation → LLM generates response → Layer 2 — Output Safety

### Layer 1 — Input Moderation

- Evaluates user messages before the LLM processes them
- Flags direct harmful requests and evasion attempts
- Detects academic, hypothetical, and therapeutic reframing
- Blocks problematic prompts before they reach the model

### Layer 2 — Output Safety

- Evaluates model responses before delivery to the user
- Catches manipulative, authority-framed, or harmful content
- Independent verification — does not rely on the LLM's own judgment
- Blocks unsafe responses even if input moderation passed

Harmful content is blocked — not rephrased into a "safer" version. The interaction is stopped and flagged for administrative review.
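The two-checkpoint flow above can be sketched as a simple pipeline. This is a minimal illustration, not ibl.ai's actual implementation: the classifier logic, function names, and `admin_queue` structure are all assumptions standing in for real moderation models and governance infrastructure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

# Hypothetical stand-in for a real input-moderation classifier.
def moderate_input(prompt: str) -> Verdict:
    # Layer 1: evaluate the user message before the LLM sees it,
    # including reframed ("hypothetical") variants of a harmful request.
    banned = ("how to build a weapon", "how would one build a weapon")
    if any(phrase in prompt.lower() for phrase in banned):
        return Verdict(False, "harmful or reframed harmful request")
    return Verdict(True)

# Hypothetical stand-in for a real output-safety classifier.
def moderate_output(response: str) -> Verdict:
    # Layer 2: independently evaluate the model's response before delivery.
    if "step 1: acquire" in response.lower():
        return Verdict(False, "unsafe instructions in response")
    return Verdict(True)

admin_queue: list[dict] = []  # flagged interactions, preserved with full context

def handle(user_id: str, prompt: str, llm: Callable[[str], str]) -> str:
    v1 = moderate_input(prompt)
    if not v1.allowed:
        # Blocked, not rephrased — and flagged for administrative review.
        admin_queue.append({"user": user_id, "prompt": prompt,
                            "layer": 1, "reason": v1.reason})
        return "This request was blocked."
    response = llm(prompt)
    v2 = moderate_output(response)
    if not v2.allowed:
        # Blocked even though input moderation passed — independent checkpoint.
        admin_queue.append({"user": user_id, "prompt": prompt,
                            "layer": 2, "reason": v2.reason})
        return "This response was blocked."
    return response
```

Note that a blocked interaction short-circuits the pipeline: nothing is softened into a "safer" answer, and the full prompt plus the user identity lands in the queue for the governance workflow described below.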
Flagged interactions flow to institutional governance.

## Governance: Visibility & Institutional Control

Flagged prompts are not silently discarded — they land in admin queues with full context.

### What was requested

Full prompt content preserved.

### Who submitted it

User identity and context.

### Frequency & patterns

Repeat behavior detection.

### Human intervention

Escalation triggers and workflows.

### Administrative Workflow

Identify → Document → Escalate → Support

## Policy Control: Customizable Safety Policies

Different audiences have different needs — your institution controls the moderation logic.

### What You Control

- Moderation logic and rules
- Sensitivity thresholds per category
- Category focus — which topics to monitor
- Audience-specific policies (minors vs. adults)
- Context-specific rules (counseling vs. coursework)

### Why It Matters

- A K-12 environment requires different guardrails than a corporate setting
- Counseling-adjacent interactions need different handling than coursework
- Regulatory requirements vary by industry and jurisdiction
- One-size-fits-all moderation leaves gaps everywhere

## Why This Matters: The Incentive Problem

Commercial AI systems optimize for engagement — institutions need the opposite.

### Commercial AI Optimizes For

- Engagement and session length
- Reduced friction — fewer blocks, more compliance
- Broad applicability across consumer use cases
- Speed over caution

### Institutions Require

- Protection of users — especially vulnerable populations
- Liability reduction with auditable enforcement
- Governance and institutional oversight
- Safety as a non-negotiable, not a tradeoff

## Validation: Stress-Tested Against Real Threats

Internal testing covers both direct and evasion-based attack vectors.

### Direct attack scenarios

Explicit requests for harmful content — weapons, explosives, self-harm — are blocked at the input layer.

### Evasion-based scenarios

Academically framed, hypothetical, and therapeutic reframing attempts are detected and blocked.

### Administrative visibility

Every blocked interaction is logged with full context and made visible to administrators.

### Latency tradeoff

Output evaluation adds processing time — acceptable in contexts where safety matters more than speed.

## Summary: Enforcement Infrastructure — Not Promises

What the ibl.ai safety system delivers:

1. **Dual-layer moderation** — Independent input and output evaluation — two checkpoints, not one.
2. **Flagged visibility** — Every blocked interaction is preserved with full context for administrative review.
3. **Human workflows** — Identify, document, escalate, support — institutional processes connected to the safety system.
4. **Institutional policy control** — Your organization defines moderation logic, sensitivity thresholds, and audience-specific rules.
5. **Blocking, not rephrasing** — Harmful content is stopped — not softened into a compliant-sounding version.

---

*[View on ibl.ai](https://ibl.ai/safety)*