The Short Answer
Self-hosted AI for hospitals and health systems means the AI runtime executes inside your existing HIPAA-covered environment — your own VPC, on-premise data center, or dedicated air-gapped enclave. ibl.ai handles orchestration, the chat UI, model routing, and integrations from outside the boundary. Compute, model artifacts, and PHI stay inside. No managed-cloud BAA in the critical path.
Why Hospitals End Up Here
Most serious clinical AI programs follow the same three-stage arc:
1. Pilot on managed cloud SaaS. Fast, one workload, single BAA. Works for 6–18 months.
2. Expand to Managed VPC. Same vendor, hospital-controlled cloud environment. Still requires a BAA; PHI still leaves the hospital perimeter at request time.
3. Settle on self-hosted. Runtime executes inside the hospital's existing HIPAA-covered environment. PHI never crosses the trust boundary.
Most reach stage 3 because the highest-volume workloads (prior auth, clinical documentation, intake triage) generate enough compliance overhead at managed scale that the BAA model stops being efficient. Self-hosting folds the runtime into the hospital's existing HIPAA scope, so that overhead stops growing with each new vendor or workload.
What "Self-Hosted" Looks Like Operationally
The runtime sits inside the covered environment. There are three deployment options, all sharing the same platform:
- Managed VPC — the same AWS / Azure / GCP VPC that already hosts your EHR data lake, HL7 feeds, and patient-portal back end. Best for high-volume compliance workloads.
- On-premise — a dedicated GPU cluster inside your data center (or a colo'd one). Best for IDNs with significant on-prem infrastructure and IT teams that prefer to manage their own metal.
- Fully air-gapped — no internet egress; model artifacts pinned locally. Best for the most sensitive workloads: clinical research, prior-auth appeals, discharge-summary review, IRB-overseen agents.
Model artifacts live inside the boundary. Weights, prompt templates, agent configuration — all pinned, all versioned by your IT, all updated on your schedule. No CDN-pulled runtime configuration.
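One way to make "pinned" concrete is a hash manifest that IT checks before the runtime loads anything. The sketch below is illustrative only and does not describe ibl.ai's actual mechanism: the manifest format, the `verify_pinned` helper, and the file names are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash differs from the pinned value."""
    mismatches = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches


if __name__ == "__main__":
    # Hypothetical artifact directory, created on the fly for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "prompt_template.txt").write_bytes(b"You are a prior-auth assistant.")
        manifest = {"prompt_template.txt": sha256_of(root / "prompt_template.txt")}
        print(verify_pinned(root, manifest))  # empty list means every pin matched
```

A non-empty result would be a reason to refuse startup, which fits an environment where artifacts change only on the hospital's schedule.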
LLM provider APIs are either disabled or proxied through hospital-controlled routing. Frontier-lab models can be used (Claude via Bedrock, GPT-5 via Azure OpenAI) — but the proxy enforces data residency, logs every call to your SIEM, and the hospital decides which models are permitted for which workloads.
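The per-workload model policy described above can be pictured as a deny-by-default allow-list checked at the proxy. This is a minimal sketch under stated assumptions: the workload names, model-class labels, and log fields are hypothetical, and a real proxy would forward the record to the SIEM rather than return a string.

```python
import json
from datetime import datetime, timezone

# Hypothetical policy: workload name -> model classes the hospital permits for it.
ALLOWED_MODELS = {
    "prior_auth": {"local-llama", "local-deepseek"},
    "prior_auth_appeal": {"local-llama", "frontier-via-cloud-proxy"},
    "clinical_documentation": {"local-llama"},
}


def authorize(workload: str, model_class: str) -> bool:
    """Deny by default: allow only if the workload explicitly lists the model class."""
    return model_class in ALLOWED_MODELS.get(workload, set())


def audit_record(workload: str, model_class: str, allowed: bool) -> str:
    """Build one JSON log line with routing metadata only (no prompt or response text)."""
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "workload": workload,
            "model_class": model_class,
            "decision": "allow" if allowed else "deny",
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    decision = authorize("clinical_documentation", "frontier-via-cloud-proxy")
    print(audit_record("clinical_documentation", "frontier-via-cloud-proxy", decision))
```

Denying unknown workloads by default means a newly added workload cannot reach any model until someone writes its policy entry.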
ibl.ai's role is the orchestration layer: chat UI, mentor management, multi-agent coordination, model routing with fallbacks, audit logging, dashboards. The connection between the platform and the hospital-hosted runtime is a secure Ed25519-signed WebSocket; the platform sees orchestration metadata (which mentor, which skill, which model class), not the payloads.
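To illustrate the metadata-versus-payload split, the sketch below allow-lists the three fields named above before anything is serialized for the platform side. It is not ibl.ai's wire format: the field names and the `outbound_metadata` helper are assumptions, and Ed25519 signing is deliberately left out because it needs a third-party library.

```python
import json

# Only these keys may cross the boundary (taken from the description above;
# the exact key names are hypothetical).
OUTBOUND_FIELDS = ("mentor", "skill", "model_class")


def outbound_metadata(request: dict) -> str:
    """Serialize only allow-listed orchestration fields; every other key stays inside."""
    missing = [field for field in OUTBOUND_FIELDS if field not in request]
    if missing:
        raise ValueError(f"request is missing metadata fields: {missing}")
    return json.dumps({field: request[field] for field in OUTBOUND_FIELDS}, sort_keys=True)


if __name__ == "__main__":
    request = {
        "mentor": "prior-auth-assistant",
        "skill": "draft_letter",
        "model_class": "local-llama",
        "payload": "clinical text that must stay inside the boundary",
    }
    print(outbound_metadata(request))  # the payload key is not in the output
```

Building the outbound message from an allow-list, rather than deleting known-sensitive keys, means a new field added to the request later is withheld unless someone explicitly permits it.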
Workloads Self-Hosted Handles Best
High-volume, PHI-heavy, latency-tolerant workloads are where self-hosted's cost and compliance advantages compound most:
- Prior authorization — 10,000–30,000 letters per month at typical health-system scale. Highest-volume administrative AI workload in any hospital.
- Clinical documentation — ambient scribing, dictation cleanup, structured-note generation. PHI content is dense; the workload sits in the EHR's blast radius.
- Patient-intake triage — inbound message classification, severity flagging, clinical-urgency detection.
- Discharge-summary review — instructions, medication reconciliation, follow-up scheduling. Every discharge becomes audit-relevant evidence.
- Prior-auth appeals + peer-to-peer prep — high-complexity workloads requiring frontier reasoning (Opus, GPT-5).
- Clinical research Q&A — trial-protocol questions, drug-interaction lookup, evidence synthesis.
For the per-workload cost breakdown, see What AI Prior Authorization Actually Costs in 2026.
The Cost Math
A 5,000-clinician regional health system, ~10,000 prior-auth requests per month (representative workload):
| Approach | Monthly cost | PHI location |
|---|---|---|
| ChatGPT Enterprise ($60/clinician × 5K) | $300,000 | OpenAI cloud |
| Microsoft 365 Copilot ($30/clinician × 5K) | $150,000 | Microsoft cloud |
| Specialty PA AI vendor (~$75/clinician × 5K) | $375,000 | Vendor cloud |
| Direct Claude Sonnet API | ~$240 | Anthropic cloud |
| ibl.ai self-hosted (Llama 4 / DeepSeek-R1) | ~$3,000–5,000 | Inside the hospital's perimeter |
At the table's figures, ibl.ai self-hosted is roughly 60–100× cheaper than ChatGPT Enterprise ($300,000 vs. $3,000–5,000 per month) for the same workload, with PHI never leaving the hospital's environment.
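The per-seat rows and the savings multiple can be reproduced with a few lines of arithmetic. The prices, headcount, and self-hosted range come from the table; the helper names are illustrative.

```python
CLINICIANS = 5_000

# Per-clinician monthly list prices from the table above.
PER_SEAT_PRICES = {
    "ChatGPT Enterprise": 60,
    "Microsoft 365 Copilot": 30,
    "Specialty PA AI vendor": 75,
}

# Self-hosted monthly cost range from the table above (low, high).
SELF_HOSTED_RANGE = (3_000, 5_000)


def monthly_cost(price_per_clinician: int, clinicians: int = CLINICIANS) -> int:
    """Per-seat pricing scales with headcount, not with request volume."""
    return price_per_clinician * clinicians


def savings_multiple(per_seat_total: int, self_hosted_total: int) -> float:
    """How many times larger the per-seat bill is than the self-hosted bill."""
    return per_seat_total / self_hosted_total


if __name__ == "__main__":
    low, high = SELF_HOSTED_RANGE
    for vendor, price in PER_SEAT_PRICES.items():
        total = monthly_cost(price)
        print(f"{vendor}: ${total:,}/month, "
              f"{savings_multiple(total, high):.0f}x to {savings_multiple(total, low):.0f}x self-hosted")
```

The multiple is a range rather than a single number because the self-hosted estimate is itself a range.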
For the full segment cost-math context, see AI Cost Math for Hospitals: Per-Seat vs Usage-Based in 2026.
Why Self-Hosted Is the Default Endpoint
Three structural reasons hospitals trend toward self-hosted over time:
1. The BAA model breaks at scale. Multiple LLM providers running different models for different workloads → multiple BAAs renewed on different vendors' clocks → continuous compliance overhead. Self-hosted means the runtime is part of the hospital's existing HIPAA scope; the BAA conversation disappears for the runtime layer.
2. Examiner subpoenas reach the vendor. When the HHS Office for Civil Rights (OCR) audits, PHI that lived in a vendor's cloud, even briefly, adds a chain-of-custody question. Self-hosted means the audit trail lives in the hospital's SIEM, on infrastructure the hospital can produce.
3. Payer criteria change faster than vendor release cycles. Prior-auth medical-necessity criteria update weekly per payer. Managed vendors typically lag 2–6 weeks on criteria updates. Self-hosted means the criteria library is the hospital's — updated the same day the payer publishes the change.
Run the Numbers
- AI Cost Math for Hospitals: Per-Seat vs Usage-Based in 2026 — segment-wide cost math
- What AI Prior Authorization Actually Costs in 2026 — per-letter token math + vendor comparison
- Air-Gapped Clinical AI Platform — the air-gapped tier specifically
- Self-Hosted AI vs ChatGPT Enterprise for Healthcare — deployment comparison
- Healthcare AI Reference Architecture on ibl.ai — full HIPAA-by-design architecture
- Healthcare AI Blueprint: Managed VPC in 30/60/90 Days — staged deployment recipe
Why Being Family-Owned and New York-Based Matters Here
A health system's AI vendor relationship for workloads as central as prior auth and clinical documentation is a multi-year commitment. ibl.ai is family-owned and operated from New York, NY — a U.S.-headquartered, domestically-owned, long-term partner with a perpetual platform license and no investor exit pressure. The runtime is open source. The PHI stays inside the covered boundary. The math works at a 100-bed community hospital or a 30-hospital IDN.
Self-hosted AI for hospitals isn't an enterprise-tier upgrade. It's the architecture that survives the third HIPAA-compliance review.