---
title: "ibl.ai's Custom Safety & Moderation Layers in mentorAI"
slug: "iblais-custom-safety-moderation-layers-in-mentorai"
author: "Jeremy Weaver"
date: "2025-09-02 18:10:19.438879"
category: "Premium"
topics: "domain-scoped AI assistants

higher ed AI safety

AI moderation layer

base-model alignment

institutional AI governance

LTI 1.3 integration

Canvas AI sidebar

retrieval-augmented generation (RAG)

scoped retrieval

policy-based guardrails

AI refusal behavior

out-of-scope question handling

AI in LMS

privacy-aware AI

multi-tenant AI architecture

AI audit logging

compliant AI deployment

AI for universities

instructor-controlled AI

citation-backed responses"
summary: "An explainer of mentorAI’s custom safety & moderation layer for higher ed: how domain-scoped assistants sit on top of base-model alignment to enforce campus policies, cite approved sources, and politely refuse out-of-scope requests—consistent behavior across Canvas (LTI 1.3), web, and mobile without over-permitting access."
banner: ""
thumbnail: ""
---

Most large language models arrive with built-in alignment. Helpful—but not nearly specific enough for a university’s norms, risk posture, and use cases. What campuses tell us they need is **an extra layer of governance that’s theirs**: assistants that stay in their lane, cite approved materials, and **politely refuse** anything out of scope.

That’s exactly how we designed mentorAI’s safety & moderation layer: an additive guardrail system that sits above the base model to **enforce institutional policy and domain boundaries**, wherever the assistant is deployed (LMS via LTI, web, or mobile).

---

# What “Domain-Scoped” Actually Means

When you scope an assistant to a domain (e.g., Admissions, Intro to Epidemiology, Academic Integrity Policy), you’re setting **hard boundaries**:

- **Topics**: What the assistant may discuss (allowlist) and what it must decline (denylist).

- **Sources**: Which documents it can cite (e.g., syllabus, slides, policy PDFs)—and what’s off-limits.

- **Audience & role**: How it should respond to students vs. instructors vs. prospective students.

- **Refusal behavior**: The exact language and escalation path when a request is out of scope.

Because this runs **on top** of the model’s native alignment, you get two layers of protection: the model’s baseline safety **plus** institution-specific rules.
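
To make the four boundaries concrete, here is a minimal sketch of how a scope could be represented as data. The `DomainPolicy` class, its field names, and the sample values are hypothetical illustrations, not mentorAI's actual configuration schema.

```python
from dataclasses import dataclass, field


@dataclass
class DomainPolicy:
    """Hypothetical policy record; field names are illustrative only."""
    name: str
    allowed_topics: frozenset   # topics the assistant may discuss (allowlist)
    denied_topics: frozenset    # topics it must always decline (denylist)
    approved_sources: frozenset # documents it may cite
    refusal_by_role: dict = field(default_factory=dict)
    default_refusal: str = "Sorry, that is outside what this assistant can help with."

    def permits(self, topic: str) -> bool:
        # The denylist wins over the allowlist, and anything not
        # explicitly allowed is treated as out of scope.
        t = topic.lower()
        return t in self.allowed_topics and t not in self.denied_topics

    def refusal_for(self, role: str) -> str:
        # Role-specific refusal language, falling back to a default.
        return self.refusal_by_role.get(role, self.default_refusal)


# Assumed example values for an admissions-style assistant.
admissions = DomainPolicy(
    name="admissions",
    allowed_topics=frozenset({"admissions", "it-faq"}),
    denied_topics=frozenset({"grades"}),
    approved_sources=frozenset({"admissions-faq.pdf"}),
    refusal_by_role={
        "prospective_student": "I can only help with admissions and IT questions.",
    },
)
```

In this sketch, a topic that appears on neither list is declined by default, which is one reasonable reading of "hard boundaries"; an institution could choose a different default.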

# A Concrete Example From the Field

In a Syracuse University deployment, a prospective-student assistant on the public site is scoped to admissions and IT FAQs. Ask it “What’s the best pizza in New York?” and it **declines**. The base model could answer, but our moderation layer instructs it to **answer only within the approved domain** and to redirect the user helpfully.

That same pattern applies to course assistants: keep Q&A inside the course’s approved materials, decline unrelated or prohibited topics, and cite sources so students know where the answer came from.

# How the Layer Works (Without Over-Permitting Anything)

- **Additive policy prompts**: We bind policy to each assistant: allowed topics, tone, refusal templates, and escalation guidelines.

- **Server-side checks**: We validate intent against the policy before retrieval or tool use; off-domain requests are intercepted and declined.

- **Scoped retrieval (optional)**: If you enable RAG, we only retrieve from the assistant’s **approved corpus**. If you don’t enable RAG, the assistant still respects topic boundaries.

- **Privacy-aware identity**: Institutions choose what identity, if any, to pass via LTI (anonymous, pseudonymous ID, or email), and the assistant adapts its behavior accordingly.

Importantly, **no extra LMS permissions** are required just to enforce domain scope.
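
The ordering described above (policy check first, retrieval second, and retrieval limited to the approved corpus) can be sketched as follows. This is an illustration under stated assumptions, not mentorAI's implementation: the keyword-based `classify_topic` stands in for whatever intent check the real server uses, and `handle`, `Document`, and `TOPIC_KEYWORDS` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    source: str
    text: str


# Hypothetical keyword map standing in for a real intent classifier.
TOPIC_KEYWORDS = {
    "admissions": ("apply", "application", "deadline", "admission"),
    "it-faq": ("password", "wifi", "netid"),
}


def tokens(text: str) -> set:
    # Lowercased alphanumeric words, with punctuation stripped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify_topic(question: str) -> Optional[str]:
    q = question.lower()
    for topic, words in TOPIC_KEYWORDS.items():
        if any(w in q for w in words):
            return topic
    return None  # no known topic matched


def handle(question, allowed_topics, approved_sources, corpus, rag_enabled=True):
    topic = classify_topic(question)
    # Policy check runs first: off-domain requests never reach retrieval.
    if topic is None or topic not in allowed_topics:
        return {"status": "refused", "topic": topic, "sources": []}
    sources = []
    if rag_enabled:
        q = tokens(question)
        # Scoped retrieval: only documents from the approved corpus qualify,
        # even if an unapproved document would match the question.
        sources = [d.source for d in corpus
                   if d.source in approved_sources and q & tokens(d.text)]
    return {"status": "answered", "topic": topic, "sources": sources}


corpus = [
    Document("admissions-faq.pdf", "The application deadline is January 1."),
    Document("internal-memo.docx", "Draft deadline notes"),
]
in_scope = handle("When is the application deadline?",
                  {"admissions"}, {"admissions-faq.pdf"}, corpus)
off_topic = handle("What's the best pizza in New York?",
                   {"admissions"}, {"admissions-faq.pdf"}, corpus)
```

Here `in_scope` cites only the approved FAQ even though the internal memo also mentions a deadline, and `off_topic` is refused before any retrieval happens. With `rag_enabled=False`, the topic check still applies and no sources are returned.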

# Designed for Governance and Faculty Trust

- **Configurable, not opaque**: Instructors and admins can review and adjust policy text (scope, tone, refusal language) so the assistant reflects local pedagogy.

- **Transparent refusals**: When questions fall outside scope, the assistant explains why and suggests approved channels or resources.

- **Auditable (with options)**: Activity logs can be enabled to surface patterns (e.g., repeated questions the syllabus should clarify) while respecting your chosen privacy mode.

- **Multi-tenant by design**: Each school, program, or course can run **its own policies and corpora**, isolated from others.
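
As one way to picture audit logging that respects a chosen privacy mode, the sketch below builds a log record whose user field depends on the mode. The function names, record fields, and the use of a per-tenant HMAC key for pseudonymous IDs are assumptions made for illustration; the source does not describe how mentorAI derives or stores identifiers.

```python
import hashlib
import hmac
from typing import Optional

PRIVACY_MODES = ("anonymous", "pseudonymous", "email")


def identity_for_log(email: str, mode: str, tenant_secret: bytes) -> Optional[str]:
    """Return the user identifier to store, given the tenant's privacy mode."""
    if mode not in PRIVACY_MODES:
        raise ValueError(f"unknown privacy mode: {mode}")
    if mode == "anonymous":
        return None  # nothing about the user is recorded
    if mode == "pseudonymous":
        # Keyed hash: stable within a tenant, different across tenants,
        # and not reversible to the email without the tenant's secret.
        digest = hmac.new(tenant_secret, email.lower().encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]
    return email  # mode == "email"


def audit_record(tenant: str, assistant: str, email: str, mode: str,
                 tenant_secret: bytes, outcome: str) -> dict:
    # One log entry per interaction; "outcome" might be "answered" or "refused".
    return {
        "tenant": tenant,
        "assistant": assistant,
        "user": identity_for_log(email, mode, tenant_secret),
        "outcome": outcome,
    }


record = audit_record("school-a", "admissions", "student@example.edu",
                      "pseudonymous", b"school-a-secret", "refused")
```

Keying the pseudonym per tenant is a design choice in this sketch: it lets one school count repeated questions from the same person without making that identifier linkable to another school's logs.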

# Why Higher Ed Needs an Extra Layer (Beyond Base Alignment)

- **Institutional values, enforced**: Base models don’t know your policies. Your assistants should.

- **Reduced off-topic drift**: Domain boundaries keep conversations relevant and help reduce hallucination risk.

- **Consistent student experience**: The same policy-aligned behavior across course sites, advising pages, and public-facing assistants.

- **Lower operational risk**: Clear refusals and scoped sources make compliance reviews and stakeholder sign-off simpler.

---

# The Takeaway

Base-model alignment is a starting point. Campuses need **their own** safety & moderation layer to keep assistants on mission: scoped topics, approved sources, consistent refusals, and governance that faculty can understand and adjust.

If you want to see a domain-scoped assistant politely refuse out-of-bounds questions, and help students faster inside your policies, **[schedule a consultation](https://ibl.ai/contact)**.