Per-Seat Pricing Was Built for Software You Use Occasionally
A mid-size health system has 5,000 clinicians. ChatGPT Enterprise lists at around $60 per user per month. That's $300,000 per month — $3.6M per year — before a single prior-authorization letter is drafted.
The pricing model was built for collaboration software (Slack, Notion, Salesforce) — tools where most seats sit idle most of the day and the per-seat fee approximates "access." For AI that actually does work — drafting prior auths, summarizing visit notes, triaging messages — the seat model breaks. The cost scales with how many people could use it, not what they do.
The same workload, priced by the tokens it actually consumes, costs a small fraction of that. The math is the post.
What the Latest Models Actually Cost in 2026
Token pricing across the major providers, approximate as of mid-2026 (always check provider docs for current rates):
| Model | Provider | Input ($/MTok) | Output ($/MTok) | HIPAA-eligible? |
|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $15 | $75 | Yes (BAA) |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 | Yes (BAA) |
| Claude Haiku 4.5 | Anthropic | $1 | $5 | Yes (BAA) |
| GPT-5 | OpenAI | $10 | $30 | Yes (Enterprise BAA) |
| Gemini 3 Pro | Google | $3.50 | $10.50 | Yes (Vertex BAA) |
| Llama 4 (70B, self-hosted) | Meta (open weights) | ~$0 | ~$0 | Yes (you control PHI) |
| DeepSeek-R1 (self-hosted) | DeepSeek (open weights) | ~$0 | ~$0 | Yes (you control PHI) |
For self-hosted open-weight models, "~$0 per token" means there is no per-token fee. The marginal cost is GPU time, billed by the hour whether or not the GPU is busy. A single A100 or H100 instance ($1–3/hour reserved) handles thousands of clinical requests per day.
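To see what "GPU time" means in dollars, here is a minimal sketch that turns the $1 to $3/hour reserved rate into a monthly bill and an amortized per-request cost. It assumes one instance running around the clock (about 730 hours per month); the 2,000 requests/day volume and the function names are hypothetical, chosen only for illustration, not measured throughput.

```python
HOURS_PER_MONTH = 730  # ~24 h x 30.4 days; assumes the instance stays up 24/7


def monthly_gpu_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly bill for one reserved GPU instance, busy or idle."""
    return hourly_rate * hours


def cost_per_request(hourly_rate: float, requests_per_day: float) -> float:
    """Daily GPU cost amortized over the requests actually served that day."""
    return hourly_rate * 24 / requests_per_day


# $1-3/hour reserved range from the text; 2,000 requests/day is a hypothetical volume.
for rate in (1.0, 3.0):
    print(f"${rate:.0f}/hour -> ${monthly_gpu_cost(rate):,.0f}/month, "
          f"${cost_per_request(rate, 2_000):.3f} per request at 2,000 requests/day")
```

At $1 to $3 per hour, one always-on instance costs about $730 to $2,190 per month. Because that bill is fixed, the per-request figure falls as volume rises.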
A Real Workload: Prior Authorization at a 5,000-Clinician Health System
Prior authorization is the highest-volume, most painful administrative AI use case in most health systems. A mid-size system processes roughly 10,000 prior-auth requests per month. Each request is about 500 tokens in (patient context, clinical justification) and 1,500 tokens out (a drafted letter with citations to medical-necessity criteria). For a deeper per-letter cost breakdown, including per-transaction specialty vendors (Cohere Health / Olive / Notable) and three scale tiers (community / regional / IDN), see What AI Prior Authorization Actually Costs in 2026.
That's 5M input and 15M output tokens per month (10,000 × 500 and 10,000 × 1,500) for the entire prior-auth workload. Spread across 5,000 clinicians, that averages 2 requests per clinician per month, with the load heavily concentrated in a few high-volume specialties.
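The workload arithmetic, written out so the inputs are easy to swap for your own measurements. The constants are the figures from the prior-auth paragraph; the variable names are illustrative.

```python
REQUESTS_PER_MONTH = 10_000         # prior-auth requests at a mid-size system
INPUT_TOKENS_PER_REQUEST = 500      # patient context + clinical justification
OUTPUT_TOKENS_PER_REQUEST = 1_500   # drafted letter with medical-necessity citations
CLINICIANS = 5_000

monthly_input = REQUESTS_PER_MONTH * INPUT_TOKENS_PER_REQUEST    # 5,000,000 tokens
monthly_output = REQUESTS_PER_MONTH * OUTPUT_TOKENS_PER_REQUEST  # 15,000,000 tokens
requests_per_clinician = REQUESTS_PER_MONTH / CLINICIANS         # 2.0 on average

print(f"{monthly_input / 1e6:.0f}M input + {monthly_output / 1e6:.0f}M output tokens/month")
print(f"{requests_per_clinician:.1f} requests per clinician per month (average)")
```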
What it costs by deployment shape
| Deployment | Pricing shape | Monthly cost | Annual | PHI residency |
|---|---|---|---|---|
| ChatGPT Enterprise | Per-seat ($60/user) | $300,000 | $3,600,000 | OpenAI cloud (BAA) |
| Microsoft 365 Copilot | Per-seat ($30/user) | $150,000 | $1,800,000 | Microsoft cloud (BAA) |
| Direct API — Claude Sonnet 4.6 | Token-based | ~$240 | ~$2,880 | Anthropic cloud (BAA) |
| Direct API — GPT-5 | Token-based | ~$500 | ~$6,000 | OpenAI cloud (BAA) |
| ibl.ai self-hosted (Llama 4 / DeepSeek-R1) | Flat license + GPU | ~$3,000–5,000 | ~$36,000–60,000 | Inside your VPC / on-prem |
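The per-seat and token-based rows each reduce to a one-line formula. A minimal sketch, using the approximate list prices from the tables above as parameters (they will drift, so pass in current rates); the function names are illustrative.

```python
CLINICIANS = 5_000
INPUT_MTOK = 5.0    # millions of input tokens per month (prior-auth workload)
OUTPUT_MTOK = 15.0  # millions of output tokens per month


def seat_cost(price_per_user: float, users: int = CLINICIANS) -> float:
    """Monthly per-seat bill: scales with headcount, not with work done."""
    return price_per_user * users


def token_cost(input_per_mtok: float, output_per_mtok: float,
               input_mtok: float = INPUT_MTOK, output_mtok: float = OUTPUT_MTOK) -> float:
    """Monthly token bill: scales with tokens actually consumed."""
    return input_per_mtok * input_mtok + output_per_mtok * output_mtok


monthly = {
    "ChatGPT Enterprise ($60/seat)": seat_cost(60),
    "Microsoft 365 Copilot ($30/seat)": seat_cost(30),
    "Claude Sonnet 4.6 API ($3/$15 per MTok)": token_cost(3, 15),
    "GPT-5 API ($10/$30 per MTok)": token_cost(10, 30),
}
for name, cost in monthly.items():
    print(f"{name:42s} ${cost:>10,.0f}/month  ${cost * 12:>12,.0f}/year")
```

The self-hosted row is a flat license-plus-GPU figure rather than a per-token formula, so it is left out of this sketch.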
The ibl.ai row covers the GPU instance, the platform license, and ongoing support. There is no line for the BAA negotiation, the vendor risk review, or the re-architecture every time a model vendor updates its data-processing terms, because there is no third-party vendor in the data path. The model runs on infrastructure you control, inside your VPC or on-prem.
Why the Per-Seat Math Doesn't Work in Healthcare
Three reasons per-seat AI fails harder in healthcare than anywhere else:
1. Usage is concentrated. A handful of high-volume specialties (oncology, cardiology, GI) generate most of the prior-auth and documentation load. Buying a seat for every clinician means subsidizing the ones who barely touch it for the ones who hit it constantly. Token pricing aligns the bill to the actual work.
2. The clinical workforce is large and lower-paid than the cost model assumes. A 5,000-clinician system isn't 5,000 attending physicians — it's nurses, techs, residents, schedulers, coders, billers. The seat fee assumes a uniform "knowledge worker" who can absorb $60/month of overhead. For a coding clerk doing prior auth all day, $60/month is fine; for a triage nurse who touches AI twice a week, it's not.
3. PHI residency forces re-procurement, not a simple extension. When a managed AI vendor updates its data-processing terms, or when the FDA or HHS OCR publishes new guidance, the BAA and vendor review typically have to be redone. With self-hosted, the data never leaves; the model swap is a config change, not a procurement event.
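Points 1 and 2 are easier to see as a cost per use. A minimal sketch that spreads the $60 seat over actual monthly usage and compares it with the token cost of one prior-auth-sized request at the Claude Sonnet 4.6 rates from the pricing table. The usage profiles are hypothetical: the ~400 requests/month for a full-time prior-auth clerk is an assumption, and every use is assumed to be the size of one prior-auth request.

```python
SEAT_PRICE = 60.0  # $/user/month, ChatGPT Enterprise list price from the text
# One prior-auth request at Sonnet 4.6 rates: 500 in at $3/MTok + 1,500 out at $15/MTok.
TOKEN_COST_PER_REQUEST = (500 * 3 + 1_500 * 15) / 1_000_000  # $0.024


def seat_cost_per_use(uses_per_month: float, seat_price: float = SEAT_PRICE) -> float:
    """Effective cost of each use when the flat seat fee is spread over actual usage."""
    return seat_price / uses_per_month


# Hypothetical usage profiles, loosely based on the examples in the list above.
profiles = {
    "coding clerk (assumed ~400 requests/month)": 400,
    "triage nurse (twice a week, ~8.7/month)": 2 * 52 / 12,
    "average clinician (2/month)": 2,
}
for who, uses in profiles.items():
    print(f"{who:46s} seat: ${seat_cost_per_use(uses):6.2f}/use   "
          f"tokens: ${TOKEN_COST_PER_REQUEST:.3f}/use")
```

At 2 uses per month the seat works out to $30 per use against about $0.024 in tokens; even the heaviest assumed user pays more per use under the seat than under token pricing.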
What Stays the Same, What Changes
Self-hosting the runtime doesn't mean rebuilding the platform. The chat UI, the clinician dashboards, the audit logs, the model-routing-with-fallbacks, the multi-agent orchestration — all of that stays managed by ibl.ai. The compute, the model, and the PHI move inside the hospital's perimeter.
What most health systems miss: the per-seat SaaS line item is bigger than the entire all-in self-hosted infrastructure budget. A $3.6M/year ChatGPT Enterprise contract pays for an internal AI platform team, dedicated GPUs, and the model-choice flexibility that comes with owning the stack, with money left over.
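One way to check that claim is to compute the break-even headcount: the number of seats at which the per-seat bill passes the flat $3,000 to $5,000/month self-hosted figure from the deployment table. A minimal sketch, treating the self-hosted cost as flat (an assumption: in practice GPU count grows with workload); `breakeven_seats` is a hypothetical helper name.

```python
import math


def breakeven_seats(self_hosted_monthly: float, seat_price: float) -> int:
    """Smallest headcount at which seat_price * headcount exceeds the flat self-hosted budget."""
    return math.floor(self_hosted_monthly / seat_price) + 1


for seat_price in (60, 30):        # ChatGPT Enterprise and Microsoft 365 Copilot list prices
    for budget in (3_000, 5_000):  # self-hosted monthly range from the deployment table
        n = breakeven_seats(budget, seat_price)
        print(f"${seat_price}/seat vs ${budget:,}/month self-hosted: seats cost more from {n} users up")

# Annual headroom at 5,000 seats, against the high end of the self-hosted range.
headroom = 60 * 5_000 * 12 - 5_000 * 12
print(f"Left over from a $3.6M/year seat contract: ${headroom:,}")
```

With these inputs, seats overtake the self-hosted budget somewhere between 51 and 84 users at $60/seat, and between 101 and 167 users at $30/seat, depending on where in the self-hosted range you land.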
Run the Numbers for Your Health System
For workload-specific calculations — prior auth, clinical documentation, patient messaging triage — use the AI Help Desk Cost Savings Calculator as a starting point (the math generalizes to most high-volume clinical-administrative workloads).
For the deployment comparison side-by-side — including HIPAA posture, BAA reach, and air-gapped options — see Self-Hosted AI vs ChatGPT Enterprise for Healthcare.
For the full HIPAA-aligned architecture (Managed VPC → on-premise → air-gapped tiers, Epic / Cerner / athenahealth integrations, TCO at 10K clinicians), read Healthcare AI Reference Architecture on ibl.ai.
Why Being Family-Owned and New York-Based Matters Here
The sovereignty argument falls apart if the vendor on the other side of the BAA is on a five-year exit clock, foreign-owned, or acquired before the next OCR audit. ibl.ai is family-owned and operated from New York, NY — a long-term partner for U.S. health systems, defense, and regulated buyers, with a perpetual platform license and no investor exit pressure.
The runtime is open source. The data stays inside the covered boundary. The math works at 100 clinicians or 50,000.