Prior Authorization Is the Highest-Leverage AI Workload in Healthcare Admin
Every health system has the same pain. The AMA's surveys put prior-auth burden at the top of the list of administrative complaints — clinicians spend hours per week on it, denials and re-submissions slow patient care, and the back-office staff who manage the workflow are the highest-turnover roles in the revenue cycle.
It's also the workload AI can actually do well. The task is structured: pull patient context, match it to payer-specific medical-necessity criteria, draft a letter with cited reasoning, route for review. A current-generation model handles a first-pass draft in seconds at a cost measured in cents.
The vendors in this space — Cohere Health, Olive AI, Notable Health, and now every per-seat AI suite — know it. They've built pricing around the value of the workload (per-transaction fees of $2–5 per letter, or per-clinician seats at $50–100/month) rather than the cost of producing the output.
The cost of actually producing that output is far smaller. This post shows the math.
What a Prior-Auth Letter Actually Costs Per Token
A typical prior-authorization draft is about 2,000 input tokens (patient context, visit-note excerpts, payer-specific medical-necessity criteria) and 2,500 output tokens (the drafted letter with cited reasoning). Cost per letter on the major models, at those token counts:
| Model | Input ($/MTok) | Output ($/MTok) | $ per letter | When to use it |
|---|---|---|---|---|
| Claude Opus 4.7 | $15 | $75 | $0.22 | Complex appeals, peer-to-peer prep |
| GPT-5 | $10 | $30 | $0.095 | Mixed cases, ambiguous criteria |
| Claude Sonnet 4.6 | $3 | $15 | $0.044 | Standard PA workhorse |
| Gemini 3 Pro | $3.50 | $10.50 | $0.033 | Long-context (multi-document) cases |
| Claude Haiku 4.5 | $1 | $5 | $0.015 | High-volume routine cases |
| Gemini 3 Flash | $0.35 | $1.05 | $0.003 | Cheapest hosted option |
| Llama 4 / DeepSeek-R1 (self-hosted) | ~$0 | ~$0 | ~$0 | Inside the HIPAA boundary |
The most expensive frontier model produces a draft for 22 cents. The mid-tier workhorse: 4 cents. The cheap hosted option: fractions of a cent. Self-hosted on the hospital's own GPU: the marginal cost is the electricity.
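The per-letter figures follow directly from the token counts and the per-million-token prices. A minimal sketch of that arithmetic, assuming the 2,000-input / 2,500-output token profile stated above; the Sonnet output price of $15/MTok is inferred here from its $0.044 per-letter figure, and the function and variable names are illustrative only:

```python
# Token profile of one prior-auth draft, as assumed in the text.
INPUT_TOKENS = 2_000
OUTPUT_TOKENS = 2_500

# (input $/MTok, output $/MTok) for each hosted model in the table.
PRICES_PER_MTOK = {
    "Claude Opus 4.7": (15.00, 75.00),
    "GPT-5": (10.00, 30.00),
    "Claude Sonnet 4.6": (3.00, 15.00),  # output price inferred from $0.044/letter
    "Gemini 3 Pro": (3.50, 10.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3 Flash": (0.35, 1.05),
}


def cost_per_letter(input_price, output_price,
                    input_tokens=INPUT_TOKENS, output_tokens=OUTPUT_TOKENS):
    """Dollar cost of one draft, given $/MTok prices and token counts."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model, (input_price, output_price) in PRICES_PER_MTOK.items():
    print(f"{model:<20} ${cost_per_letter(input_price, output_price):.4f}")
```

Changing `INPUT_TOKENS` or `OUTPUT_TOKENS` shows how sensitive the per-letter cost is to longer chart excerpts or longer letters; output tokens dominate at these prices.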
Monthly Bills at Three Scale Tiers
Realistic prior-auth volume in 2026:
- Community hospital (200 beds): ~3,000 letters/month
- Regional health system (5,000 clinicians): ~12,000 letters/month
- IDN / academic medical center: ~30,000 letters/month
Monthly cost using Claude Sonnet 4.6 (the standard workhorse model for this workload) vs the per-transaction and per-seat alternatives:
| Approach | Pricing shape | Community (3K/mo) | Regional (12K/mo) | IDN (30K/mo) |
|---|---|---|---|---|
| Specialty PA AI vendor | Per-transaction (~$3/letter) | $9,000 | $36,000 | $90,000 |
| Specialty PA AI (per-clinician) | ~$75/clinician/mo | ~$15,000 | ~$375,000 | ~$1,500,000 |
| ChatGPT Enterprise | $60/seat × all clinicians | ~$12,000 | ~$300,000 | ~$1,200,000 |
| Direct API — Claude Sonnet 4.6 | Token-based | ~$131 | ~$522 | ~$1,305 |
| Direct API — GPT-5 | Token-based | ~$285 | ~$1,140 | ~$2,850 |
| Direct API — Gemini 3 Flash | Token-based (cheapest hosted) | ~$9 | ~$36 | ~$90 |
| ibl.ai self-hosted (Llama 4 / DeepSeek-R1) | Flat license + GPU | ~$2,000 | ~$3,000–5,000 | ~$5,000–8,000 |
The ibl.ai row covers the GPU instance (sized so a single H100 handles the IDN workload), the platform license, and ongoing support. The PHI never leaves the hospital's environment. At the IDN scale, the specialty per-clinician vendor (~$1,500,000/month) costs roughly 190× to 300× what the self-hosted row does (~$5,000–8,000/month) for the same letters drafted.
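The three pricing shapes in the table reduce to three one-line formulas. The sketch below reproduces the per-transaction, per-seat, and token-based rows under stated assumptions: the clinician counts for the community and IDN tiers (200 and 20,000) are not given in the text and are back-calculated from the per-seat figures, and the $0.0435 per-letter cost is the unrounded Sonnet figure.

```python
# Letters per month come from the text; clinician counts for Community and IDN
# are inferred from the per-seat row (e.g. $15,000 / $75 = 200 seats).
TIERS = {
    "Community": {"letters": 3_000, "clinicians": 200},
    "Regional": {"letters": 12_000, "clinicians": 5_000},
    "IDN": {"letters": 30_000, "clinicians": 20_000},
}


def per_transaction(letters, fee_per_letter=3.00):
    """Monthly bill when the vendor charges a flat fee per letter."""
    return letters * fee_per_letter


def per_seat(clinicians, seat_price=75.00):
    """Monthly bill when every clinician gets a seat, used or not."""
    return clinicians * seat_price


def token_based(letters, cost_per_letter=0.0435):
    """Monthly bill when paying the model provider per token."""
    return letters * cost_per_letter


for name, tier in TIERS.items():
    txn = per_transaction(tier["letters"])
    seats = per_seat(tier["clinicians"])
    api = token_based(tier["letters"])
    print(f"{name:<10} per-transaction ${txn:>12,.2f}  "
          f"per-seat ${seats:>14,.2f}  API ${api:>10,.2f}  "
          f"seat/API ratio {seats / api:,.0f}x")
```

Passing `seat_price=60.00` to `per_seat` gives the ChatGPT Enterprise row. Note that the per-seat bill depends only on headcount, which is why it diverges so sharply from the token-based bill as the organization grows.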
Why the Per-Transaction Vendor Math Is the Sneaky One
Per-seat is obviously wrong for prior auth: most of a hospital's clinicians don't draft prior auths personally, and a small revenue-cycle team does most of the volume on behalf of the medical staff. Buying a seat for every clinician, when 80% of them never touch the tool, is the same per-seat trap that shows up in every healthcare AI use case.
Per-transaction is the trickier one. $3 per letter sounds reasonable. It feels aligned to value — the hospital pays the vendor for each piece of work the AI does. But:
The unit price doesn't drop with volume. A 30,000-letter/month IDN pays the same $3/letter as a 3,000-letter/month community hospital. There's no scale benefit. Self-hosting the same workload costs less per letter as volume grows, because the GPU is amortized.
The vendor's marginal cost is the same fraction of a cent the hospital would pay running it directly. The $3 is value capture, not cost recovery. That's the vendor's business model, not a description of the underlying compute.
PHI residency is still vendor-side. Per-transaction or per-seat, the data still goes to the vendor's cloud, the BAA still needs annual review, and the model selection is still the vendor's choice — not the hospital's.
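The first point, that a flat per-letter fee never amortizes while a flat monthly cost does, can be checked with a few lines of arithmetic. This is a rough sketch using the self-hosted monthly ranges from the cost table; the helper names are hypothetical, and the break-even figure ignores setup and staffing costs that a real comparison would include.

```python
import math

PER_TRANSACTION_FEE = 3.00  # $/letter, the same at every volume

# Monthly self-hosted cost (low, high) at each volume, from the cost table.
SELF_HOSTED_MONTHLY = {
    3_000: (2_000, 2_000),
    12_000: (3_000, 5_000),
    30_000: (5_000, 8_000),
}


def per_letter_range(volume, monthly_low, monthly_high):
    """Effective $/letter when a flat monthly cost is spread over the volume."""
    return monthly_low / volume, monthly_high / volume


def break_even_volume(monthly_cost, fee=PER_TRANSACTION_FEE):
    """Smallest monthly volume at which the flat cost is no more than per-letter fees."""
    return math.ceil(monthly_cost / fee)


for volume, (low, high) in SELF_HOSTED_MONTHLY.items():
    lo_unit, hi_unit = per_letter_range(volume, low, high)
    print(f"{volume:>6,} letters/mo: self-hosted ${lo_unit:.2f} to ${hi_unit:.2f} "
          f"per letter vs ${PER_TRANSACTION_FEE:.2f} per-transaction")

print("Break-even at $2,000/month:", break_even_volume(2_000), "letters/month")
```

The per-letter cost of the self-hosted row falls as volume rises, while the per-transaction fee stays at $3 regardless.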
Why HIPAA Makes Self-Hosting Non-Negotiable at Scale
PHI cannot leave the HIPAA-covered boundary without a BAA, and even with a BAA, every change in the vendor's data-processing terms, every sub-processor change, and every model switch is a re-papering event. For a workload that touches 30,000 patient records per month, that's a continuous compliance overhead the hospital doesn't see until an HHS Office for Civil Rights (OCR) audit.
Self-hosting changes the geometry. The agent runtime runs inside the hospital's existing covered environment. Patient data, clinical notes, prior-auth narratives, and payer correspondence never traverse a third-party cloud. The model swap (Sonnet for routine, Opus for complex appeals, Haiku for high-volume sorting) is a config change, not a procurement event.
The other compliance angle that doesn't get discussed often: payer medical-necessity criteria change constantly. Hospitals using a managed AI vendor depend on the vendor to keep the criteria library current. Self-hosting means the hospital owns the criteria library — and can update it the same day the payer publishes the change, not whenever the vendor's release cycle ships it.
What Stays the Same, What Changes
Self-hosting prior-auth AI doesn't mean rebuilding the platform around it. The clinician-facing chat UI, the case-worker dashboards, the audit logs, the model-routing-with-fallbacks, the multi-agent orchestration, the Epic / Cerner / athenahealth integrations — all of that stays managed by ibl.ai. The compute, the model, and the PHI move inside the hospital's covered environment.
What disappears: the $90K/month per-transaction bill (or the $1.2M/month per-seat bill) at IDN scale.
What appears: a self-hosted prior-auth capability the hospital owns and controls, with the model-routing policy the medical-staff committee designed:
- Sonnet for standard prior auths (the bulk)
- Opus for complex appeals and peer-to-peer prep
- Haiku for triage sorting (which payer, which criteria set)
- Llama 4 self-hosted for the highest-volume routine cases when even pennies per letter add up
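A routing policy like the one listed above can live in a small lookup table, which is what makes a model swap a config change. The following is an illustrative sketch only: the `CaseType` names and the model-tier strings are hypothetical labels, not ibl.ai configuration keys or real provider model IDs.

```python
from enum import Enum


class CaseType(Enum):
    """Hypothetical case categories matching the policy bullets."""
    TRIAGE = "triage"
    STANDARD = "standard"
    COMPLEX_APPEAL = "complex_appeal"
    PEER_TO_PEER_PREP = "peer_to_peer_prep"
    ROUTINE_HIGH_VOLUME = "routine_high_volume"


# Placeholder tier labels; a real deployment would map these to whatever
# model identifiers its runtime expects.
ROUTING_POLICY = {
    CaseType.TRIAGE: "haiku",
    CaseType.STANDARD: "sonnet",
    CaseType.COMPLEX_APPEAL: "opus",
    CaseType.PEER_TO_PEER_PREP: "opus",
    CaseType.ROUTINE_HIGH_VOLUME: "llama-4-self-hosted",
}


def route(case_type, policy=ROUTING_POLICY):
    """Return the model tier for a case, failing loudly on unmapped types."""
    try:
        return policy[case_type]
    except KeyError:
        raise ValueError(f"No model tier configured for {case_type!r}") from None


print(route(CaseType.STANDARD))
print(route(CaseType.COMPLEX_APPEAL))
```

Because the policy is plain data, the committee's decision to move a case type to a different tier is a one-line edit to `ROUTING_POLICY` rather than a code change.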
Run the Numbers for Your Health System
For workload sizing and cost modeling for a prior-auth deployment, the AI Help Desk Cost Savings Calculator generalizes well — prior auth is one of the highest-volume, most-amenable-to-automation administrative workloads in healthcare.
For the segment-wide cost-math context (not just prior auth), see AI Cost Math for Hospitals: Per-Seat vs Usage-Based in 2026.
For the deployment comparison side-by-side — including HIPAA posture, BAA reach, and air-gapped options — see Self-Hosted AI vs ChatGPT Enterprise for Healthcare.
For the full HIPAA-aligned architecture (Epic / Cerner / athenahealth integrations, Managed VPC → on-prem → air-gapped tiers), read Healthcare AI Reference Architecture on ibl.ai.
For the staged deployment recipe — Managed VPC for low-sensitivity workloads + on-prem for production — see Healthcare AI Blueprint: Managed VPC in 30/60/90 Days.
For the broader pricing landscape (every major model, every major per-seat vendor), the hub: What Does AI Actually Cost in 2026?.
Why Family-Owned and New York Matter Here
A health system's AI vendor relationship for a workload as central as prior authorization is a multi-year commitment — the workflows, the criteria library, the EHR integration, the audit trail revenue-cycle compliance relies on. ibl.ai is family-owned and operated from New York, NY — a long-term partner with a perpetual platform license and no investor exit pressure. The runtime is open source. The PHI stays inside the hospital's covered boundary. The math works at a 100-bed community hospital or a 30-hospital IDN.