Safety
Safety isn't a feature; it's the product
Single-Layer Safety Fails
Traditional approaches create exploitable gaps that users learn to work around
Input filtering only
Blocks obvious requests but misses reframed prompts: academic, hypothetical, or therapeutic framing bypasses the filter
Vendor promises
Relying on the LLM provider's built-in guardrails offers no institutional control and no visibility into what was caught or missed
Policy statements
Written policies without enforcement infrastructure are not protection; they are documentation of intent
Two Independent Safety Checkpoints
Every interaction passes through both layers: input and output are evaluated independently
Layer 1: Input Moderation
- Evaluates user messages before the LLM processes them
- Flags direct harmful requests and evasion attempts
- Detects academic, hypothetical, and therapeutic reframing
- Blocks problematic prompts before they reach the model
Layer 2: Output Safety
- Evaluates model responses before delivery to the user
- Catches manipulative, authority-framed, or harmful content
- Independent verification: does not rely on the LLM's own judgment
- Blocks unsafe responses even if input moderation passed
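A minimal sketch of how two independent checkpoints can wrap a model call is shown below. The function names (check_input, check_output, handle_message) and the injected call_llm and flag callbacks are illustrative assumptions, not the actual ibl.ai interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ModerationResult:
    allowed: bool
    category: Optional[str] = None   # e.g. "weapons", "self-harm", "evasion"
    reason: Optional[str] = None

def check_input(prompt: str) -> ModerationResult:
    """Layer 1: evaluate the user message before the LLM processes it.
    A real classifier would score direct requests and reframed variants;
    this stub allows everything."""
    return ModerationResult(allowed=True)

def check_output(response: str) -> ModerationResult:
    """Layer 2: evaluate the model response independently of Layer 1."""
    return ModerationResult(allowed=True)

def handle_message(prompt: str,
                   call_llm: Callable[[str], str],
                   flag: Callable[[str, str, ModerationResult], None]) -> str:
    # Checkpoint 1: input moderation, before any model call.
    verdict = check_input(prompt)
    if not verdict.allowed:
        flag("input", prompt, verdict)   # preserved for the admin queue
        return "This request was blocked by institutional policy."

    response = call_llm(prompt)

    # Checkpoint 2: output safety, even when the input passed.
    verdict = check_output(response)
    if not verdict.allowed:
        flag("output", prompt, verdict)
        return "This response was withheld for review."

    return response
```

Because the output check never consults the model's own judgment, a response can still be blocked even when the prompt cleared Layer 1.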
Visibility & Institutional Control
Flagged prompts are not silently discarded; they land in admin queues with full context
What was requested
Full prompt content preserved
Who submitted it
User identity and context
Frequency & patterns
Repeat behavior detection
Human intervention
Escalation triggers and workflows
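A sketch of what a flagged-queue entry might preserve, assuming one record per event; the field names and the escalation threshold below are illustrative, not the platform's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class FlaggedEvent:
    """One admin-queue entry; field names are illustrative assumptions."""
    user_id: str       # who submitted it
    prompt: str        # what was requested, preserved verbatim
    layer: str         # "input" or "output"
    category: str      # policy category that triggered the flag
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def needs_escalation(queue: List[FlaggedEvent], user_id: str, threshold: int = 3) -> bool:
    """Repeat-behavior detection: trigger a human workflow once a user
    accumulates `threshold` flags. The threshold value is an assumption."""
    return sum(1 for e in queue if e.user_id == user_id) >= threshold
```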
Administrative Workflow
Customizable Safety Policies
Different audiences have different needs; your institution controls the moderation logic (see the configuration sketch after the lists below)
What You Control
- Moderation logic and rules
- Sensitivity thresholds per category
- Category focus: which topics to monitor
- Audience-specific policies (minors vs. adults)
- Context-specific rules (counseling vs. coursework)
Why It Matters
- A K-12 environment requires different guardrails than a corporate setting
- Counseling-adjacent interactions need different handling than coursework
- Regulatory requirements vary by industry and jurisdiction
- One-size-fits-all moderation leaves gaps everywhere
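One way these controls could be expressed is a per-institution policy object. The sketch below uses assumed keys, categories, and values to illustrate thresholds, category focus, and audience- and context-specific rules; it is not a documented ibl.ai configuration format.

```python
# Hypothetical policy configuration; keys, categories, and values are
# assumptions used to illustrate the controls listed above.
SAFETY_POLICY = {
    "categories": ["weapons", "self-harm", "harassment", "evasion"],
    "thresholds": {              # per-category sensitivity, 0.0 lenient .. 1.0 strict
        "weapons": 0.90,
        "self-harm": 0.95,
        "harassment": 0.80,
        "evasion": 0.85,
    },
    "audiences": {
        "minors": {"threshold_overrides": {"self-harm": 0.99}, "always_escalate": True},
        "adults": {"always_escalate": False},
    },
    "contexts": {
        "counseling": {"route_flags_to": "support_staff"},  # handled differently than coursework
        "coursework": {"route_flags_to": "admin_queue"},
    },
}
```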
The Incentive Problem
Commercial AI systems optimize for engagement; institutions need the opposite
Commercial AI Optimizes For
- Engagement and session length
- Reduced friction: fewer blocks, more compliance
- Broad applicability across consumer use cases
- Speed over caution
Institutions Require
- Protection of users, especially vulnerable populations
- Liability reduction with auditable enforcement
- Governance and institutional oversight
- Safety as a non-negotiable, not a tradeoff
Stress-Tested Against Real Threats
Internal testing covers both direct and evasion-based attack vectors
Direct attack scenarios
Explicit requests for harmful content (weapons, explosives, self-harm) are blocked at the input layer
Evasion-based scenarios
Academic, hypothetical, and therapeutic reframing attempts are detected and blocked
Administrative visibility
Every blocked interaction is logged with full context and made visible to administrators
Latency tradeoff
Output evaluation adds processing time, which is acceptable in contexts where safety matters more than speed
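As a rough illustration of what such testing can look like, direct and reframed prompts can be replayed against the input checkpoint. The scenario placeholders and helper below are assumptions, not the internal test suite.

```python
# Illustrative attack-scenario replay; `moderate` stands in for whatever
# classifier backs the input layer (e.g. the check_input sketch above).
DIRECT_ATTACKS = [
    "<explicit request for weapon or explosive instructions>",
    "<explicit request for self-harm methods>",
]
EVASION_ATTEMPTS = [
    "<the same request framed as an academic question>",
    "<the same request framed as hypothetical or therapeutic>",
]

def run_attack_suite(moderate) -> list:
    """Return the prompts that slipped through; an empty list means every
    scenario was blocked."""
    return [p for p in DIRECT_ATTACKS + EVASION_ATTEMPTS if moderate(p).allowed]
```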
Enforcement Infrastructure, Not Promises
What the ibl.ai safety system delivers
Dual-layer moderation
Independent input and output evaluation: two checkpoints, not one.
Flagged visibility
Every blocked interaction is preserved with full context for administrative review.
Human workflows
Identify, document, escalate, support: institutional processes connected to the safety system.
Institutional policy control
Your organization defines moderation logic, sensitivity thresholds, and audience-specific rules.
Blocking, not rephrasing
Harmful content is stopped, not softened into a compliant-sounding version.