---
title: "ibl.ai's Custom Safety & Moderation Layers in mentorAI"
slug: "iblais-custom-safety-moderation-layers-in-mentorai"
author: "Jeremy Weaver"
date: "2025-09-02 18:10:19.438879"
category: "Premium"
topics: "domain-scoped AI assistants higher ed AI safety AI moderation layer base-model alignment institutional AI governance LTI 1.3 integration Canvas AI sidebar retrieval-augmented generation (RAG) scoped retrieval policy-based guardrails AI refusal behavior out-of-scope question handling AI in LMS privacy-aware AI multi-tenant AI architecture AI audit logging compliant AI deployment AI for universities instructor-controlled AI citation-backed responses"
summary: "An explainer of mentorAI’s custom safety & moderation layer for higher ed: how domain-scoped assistants sit on top of base-model alignment to enforce campus policies, cite approved sources, and politely refuse out-of-scope requests, with consistent behavior across Canvas (LTI 1.3), web, and mobile without over-permitting access."
banner: ""
thumbnail: ""
---

Most large language models arrive with built-in alignment. Helpful, but not nearly specific enough for a university’s norms, risk posture, and use cases. What campuses tell us they need is **an extra layer of governance that’s theirs**: assistants that stay in their lane, cite approved materials, and **politely refuse** anything out of scope.

That’s exactly how we designed mentorAI’s safety & moderation layer: an additive guardrail system that sits above the base model to **enforce institutional policy and domain boundaries**, wherever the assistant is deployed (LMS via LTI, web, or mobile).

---

# What “Domain-Scoped” Actually Means

When you scope an assistant to a domain (e.g., Admissions, Intro to Epidemiology, Academic Integrity Policy), you’re setting **hard boundaries**:

- **Topics**: What the assistant may discuss (allowlist) and what it must decline (denylist).
- **Sources**: Which documents it can cite (e.g., syllabus, slides, policy PDFs), and what’s off-limits.
- **Audience & role**: How it should respond to students vs. instructors vs. prospective students.
- **Refusal behavior**: The exact language and escalation path when a request is out of scope.

Because this runs **on top** of the model’s native alignment, you get two layers of protection: the model’s baseline safety **plus** institution-specific rules.

# A Concrete Example From The Field

On a Syracuse University deployment, a prospective-student assistant is scoped to admissions and IT FAQs on the public site. Ask it “What’s the best pizza in New York?” and it **declines**, not because the base model can’t answer, but because our moderation layer instructs it to **answer only within the approved domain** (and to redirect helpfully).

That same pattern applies to course assistants: keep Q&A inside the course’s approved materials, decline unrelated or prohibited topics, and cite sources so students know where the answer came from.

# How The Layer Works (Without Over-Permitting Anything)

- **Additive policy prompts**: We bind policy to each assistant: allowed topics, tone, refusal templates, and escalation guidelines.
- **Server-side checks**: We validate intent against the policy before retrieval or tool use; off-domain requests are intercepted and declined.
- **Scoped retrieval (optional)**: If you enable RAG, we only retrieve from the assistant’s **approved corpus**. If you don’t enable RAG, the assistant still respects topic boundaries.
- **Privacy-aware identity**: Institutions choose what identity, if any, to pass via LTI (anonymous, pseudonymous ID, or email) and the assistant adapts behavior accordingly.

Importantly, **no extra LMS permissions** are required just to enforce domain scope.
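
To make the ordering of the server-side check and scoped retrieval concrete, here is a minimal Python sketch. It is not mentorAI's actual implementation: `DomainPolicy`, `Document`, `match_topic`, `scoped_retrieve`, `handle`, and the sample policy and corpus are all hypothetical names chosen for illustration, and simple keyword matching stands in for whatever intent classification and retrieval a production system would use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainPolicy:
    """Hypothetical per-assistant policy; field names are illustrative only."""
    name: str
    allowed_keywords: dict[str, set[str]]  # allowlist: topic -> trigger words
    denied_keywords: set[str]              # denylist: always decline
    approved_sources: set[str]             # IDs of documents the assistant may cite
    refusal_template: str


@dataclass
class Document:
    source_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_topic(question: str, policy: DomainPolicy) -> Optional[str]:
    """Toy intent check: keyword overlap stands in for a real classifier."""
    words = tokenize(question)
    if words & policy.denied_keywords:
        return None
    for topic, keywords in policy.allowed_keywords.items():
        if words & keywords:
            return topic
    return None


def scoped_retrieve(question: str, corpus: list[Document],
                    policy: DomainPolicy) -> list[Document]:
    """Toy retrieval: documents outside the approved corpus are never returned."""
    words = tokenize(question)
    return [
        doc for doc in corpus
        if doc.source_id in policy.approved_sources
        and bool(words & tokenize(doc.text))
    ]


def handle(question: str, corpus: list[Document], policy: DomainPolicy) -> dict:
    """Validate scope first; retrieve only when the request is in-domain."""
    topic = match_topic(question, policy)
    if topic is None:
        message = policy.refusal_template.format(domain=policy.name)
        return {"status": "refused", "message": message, "sources": []}
    docs = scoped_retrieve(question, corpus, policy)
    return {"status": "in_scope", "topic": topic,
            "sources": [d.source_id for d in docs]}


# Assumed sample data, loosely modeled on the admissions example above.
POLICY = DomainPolicy(
    name="Admissions and IT FAQs",
    allowed_keywords={
        "admissions": {"admissions", "application", "deadline", "apply"},
        "it": {"password", "wifi", "login"},
    },
    denied_keywords={"grades"},
    approved_sources={"admissions-faq", "it-faq"},
    refusal_template="I can only help with {domain}. "
                     "For anything else, please contact the relevant office.",
)

CORPUS = [
    Document("admissions-faq", "The application deadline is listed on the admissions page."),
    Document("internal-memo", "Draft notes on application deadline exceptions."),
]

print(handle("When is the application deadline?", CORPUS, POLICY))
print(handle("What's the best pizza in New York?", CORPUS, POLICY))
```

In this sketch the scope check runs before retrieval, so an off-domain question never touches the corpus, and the unapproved `internal-memo` document is filtered out even when its text matches the question.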

# Designed for Governance and Faculty Trust

- **Configurable, not opaque**: Instructors and admins can review and adjust policy text (scope, tone, refusal language) so the assistant reflects local pedagogy.
- **Transparent refusals**: When questions fall outside scope, the assistant explains why and suggests approved channels or resources.
- **Auditable (with options)**: Activity logs can be enabled to surface patterns (e.g., repeated questions the syllabus should clarify) while respecting your chosen privacy mode.
- **Multi-tenant by design**: Each school, program, or course can run **its own policies and corpora**, isolated from others.

# Why Higher Ed Needs an Extra Layer (Beyond Base Alignment)

- **Institutional values, enforced**: Base models don’t know your policies. Your assistants should.
- **Reduced off-topic drift**: Domain boundaries keep conversations relevant and reduce hallucination risk.
- **Consistent student experience**: The same policy-aligned behavior across course sites, advising pages, and public-facing assistants.
- **Lower operational risk**: Clear refusals and scoped sources make compliance reviews and stakeholder sign-off simpler.

---

# The Takeaway

Base-model alignment is a starting point. Campuses need **their own** safety & moderation layer to keep assistants on mission: scoped topics, approved sources, consistent refusals, and governance faculty can understand and adjust.

If you want to see a domain-scoped assistant politely refuse out-of-bounds questions, and help students faster inside your policies, **[schedule a consultation](https://ibl.ai/contact)**.