---
title: "What AI Prior Authorization Actually Costs in 2026"
slug: "what-ai-prior-authorization-actually-costs-2026"
author: "ibl.ai Engineering"
date: "2026-05-30 18:00:00"
category: "Premium"
topics: "AI prior authorization cost, prior auth automation, AI prior auth pricing, ChatGPT prior auth, Claude prior auth, HIPAA AI prior authorization, Cohere Health pricing, Olive AI cost, prior auth per-letter cost, self-hosted prior auth AI"
summary: "Per-letter token math for prior authorization across the latest models, monthly bills at community / regional / IDN scale, and why the per-transaction and per-clinician AI vendors are the wrong shape, even for the workload that started the AI-in-healthcare conversation."
banner: ""
thumbnail: ""
---

## Prior Authorization Is the Highest-Leverage AI Workload in Healthcare Admin

Every health system has the same pain. The AMA's surveys put prior-auth burden at the top of the list of administrative complaints: clinicians spend hours per week on it, denials and re-submissions slow patient care, and the back-office staff who manage the workflow hold the highest-turnover roles in the revenue cycle.

It's also the workload AI can actually do well. The task is structured: pull patient context, match it to payer-specific medical-necessity criteria, draft a letter with cited reasoning, route for review. A current-generation model handles a first-pass draft in seconds at a cost measured in cents.

The vendors in this space (Cohere Health, Olive AI, Notable Health, and now every per-seat AI suite) know it. They've built pricing around the value of the workload (per-transaction fees of $2–5 per letter, or per-clinician seats at $50–100/month) rather than the cost of producing the output.

The output cost is much smaller, and this post shows the math.

## What a Prior-Auth Letter Actually Costs Per Token

A typical prior-authorization draft is about **2,000 input tokens** (patient context, visit-note excerpts, payer-specific medical-necessity criteria) and **2,500 output tokens** (the drafted letter with cited reasoning). Cost per letter on the major models:

| Model | Input ($/MTok) | Output ($/MTok) | Cost per letter | When to use it |
| --- | --- | --- | --- | --- |
| Claude Opus 4.7 | $15 | $75 | $0.22 | Complex appeals, peer-to-peer prep |
| GPT-5 | $10 | $30 | $0.095 | Mixed cases, ambiguous criteria |
| Claude Sonnet 4.6 | $3 | $15 | $0.044 | Standard PA workhorse |
| Gemini 3 Pro | $3.50 | $10.50 | $0.033 | Long-context (multi-document) cases |
| Claude Haiku 4.5 | $1 | $5 | $0.015 | High-volume routine cases |
| Gemini 3 Flash | $0.35 | $1.05 | $0.003 | Cheapest hosted option |
| Llama 4 / DeepSeek-R1 (self-hosted) | ~$0 | ~$0 | ~$0 | Inside the HIPAA boundary |

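The per-letter figures in the table follow directly from the token counts and the list prices. Here is a minimal Python sketch of that arithmetic, assuming the 2,000 / 2,500 token split described above. The function name `cost_per_letter` is illustrative rather than part of any vendor SDK, and the Sonnet output price used here ($15/MTok) is the value consistent with the $0.044 per-letter figure.

```python
# Token counts for one prior-auth draft, taken from the estimate above.
INPUT_TOKENS = 2_000
OUTPUT_TOKENS = 2_500

# (input, output) list prices in USD per million tokens, as quoted in the table.
# ASSUMPTION: the Sonnet output price is inferred from its $0.044 per-letter cost.
PRICES_PER_MTOK = {
    "Claude Opus 4.7": (15.00, 75.00),
    "GPT-5": (10.00, 30.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3 Pro": (3.50, 10.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3 Flash": (0.35, 1.05),
}


def cost_per_letter(
    input_price: float,
    output_price: float,
    input_tokens: int = INPUT_TOKENS,
    output_tokens: int = OUTPUT_TOKENS,
) -> float:
    """Return the USD cost of one draft given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model, (input_price, output_price) in PRICES_PER_MTOK.items():
    print(f"{model:<20} ${cost_per_letter(input_price, output_price):.4f}")
```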
The most expensive frontier model produces a draft for **22 cents**. The mid-tier workhorse: **4 cents**. The cheap hosted option: **fractions of a cent**. Self-hosted on the hospital's own GPU: the marginal cost is the electricity.

## Monthly Bills at Three Scale Tiers

Realistic prior-auth volume in 2026:

- **Community hospital** (200 beds): ~3,000 letters/month
- **Regional health system** (5,000 clinicians): ~12,000 letters/month
- **IDN / academic medical center**: ~30,000 letters/month

Monthly cost using **Claude Sonnet 4.6** (the standard workhorse model for this workload) vs the per-transaction and per-seat alternatives:

| Approach | Pricing shape | Community (3K/mo) | Regional (12K/mo) | IDN (30K/mo) |
| --- | --- | --- | --- | --- |
| Specialty PA AI vendor | Per-transaction (~$3/letter) | $9,000 | $36,000 | $90,000 |
| Specialty PA AI (per-clinician) | ~$75/clinician/mo | ~$15,000 | ~$375,000 | ~$1,500,000 |
| ChatGPT Enterprise | $60/seat × all clinicians | ~$12,000 | ~$300,000 | ~$1,200,000 |
| Direct API, Claude Sonnet 4.6 | Token-based | ~$131 | ~$522 | ~$1,305 |
| Direct API, GPT-5 | Token-based | ~$285 | ~$1,140 | ~$2,850 |
| Direct API, Gemini 3 Flash | Token-based (cheapest hosted) | ~$9 | ~$36 | ~$90 |
| ibl.ai self-hosted (Llama 4 / DeepSeek-R1) | Flat license + GPU | ~$2,000 | ~$3,000–5,000 | ~$5,000–8,000 |

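The vendor and API rows in the table are simple products of volume (or seats) and unit price. The sketch below reproduces them under two stated assumptions: the seat counts are the ones implied by dividing the per-clinician row by $75 (200, 5,000, and 20,000), and the Sonnet per-letter cost is the unrounded $0.0435 rather than the rounded $0.044. All function and variable names are illustrative.

```python
# Monthly letter volumes for the three tiers described above.
LETTERS_PER_MONTH = {"community": 3_000, "regional": 12_000, "idn": 30_000}

# ASSUMPTION: seat counts are not stated directly; these are the values
# implied by dividing the per-clinician row by $75 per seat.
CLINICIAN_SEATS = {"community": 200, "regional": 5_000, "idn": 20_000}


def per_transaction_bill(letters: int, fee_per_letter: float = 3.00) -> float:
    """Vendor bill when every letter carries the same flat fee."""
    return letters * fee_per_letter


def per_seat_bill(seats: int, price_per_seat: float) -> float:
    """Vendor bill when every clinician gets a seat, used or not."""
    return seats * price_per_seat


def token_bill(letters: int, cost_per_letter: float) -> float:
    """Direct-API bill: volume times the token cost of one draft."""
    return letters * cost_per_letter


for tier, letters in LETTERS_PER_MONTH.items():
    seats = CLINICIAN_SEATS[tier]
    print(
        f"{tier:<10}"
        f" per-transaction ${per_transaction_bill(letters):>12,.0f}"
        f" per-clinician ${per_seat_bill(seats, 75.00):>12,.0f}"
        f" Sonnet API ${token_bill(letters, 0.0435):>10,.2f}"
    )
```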
The ibl.ai row covers the GPU instance (sized so a single H100 handles the IDN workload), the platform license, and ongoing support. The PHI never leaves the hospital's environment. At the IDN scale, the specialty per-clinician vendor is **~200× more expensive** than the self-hosted option for the same letters drafted.

## Why the Per-Transaction Vendor Math Is the Sneaky One

Per-seat is obviously wrong for prior auth: most of a hospital's clinicians don't draft prior auths personally; a small revenue-cycle team does most of the volume on behalf of the medical staff. Buying a seat for every clinician to access an AI tool 80% of them never touch is the same trap that breaks per-seat pricing for every healthcare AI use case.

**Per-transaction is the trickier one.** $3 per letter sounds reasonable. It feels aligned to value: the hospital pays the vendor for each piece of work the AI does. But:

1. **The unit price doesn't drop with volume.** A 30,000-letter/month IDN pays the same $3/letter as a 3,000-letter/month community hospital. There's no scale benefit. Self-hosting the same workload costs *less* per letter as volume grows, because the GPU is amortized.
2. **The vendor's marginal cost is the same few cents, or fraction of a cent, the hospital would pay running it directly.** The $3 is value capture, not cost recovery. That's the vendor's business model, not a description of the underlying compute.
3. **PHI residency is still vendor-side.** Per-transaction or per-seat, the data still goes to the vendor's cloud, the BAA still needs annual review, and the model selection is still the vendor's choice, not the hospital's.

## Why HIPAA Makes Self-Hosting Non-Negotiable at Scale

PHI cannot leave the HIPAA-covered boundary without a BAA, and even with a BAA, every change in the vendor's data-processing terms, every sub-processor change, and every model switch is a re-papering event. For a workload that touches 30,000 patient records per month, that's a continuous compliance overhead the hospital doesn't see until the HHS Office for Civil Rights (OCR) audit.

Self-hosting changes the geometry. The claw runs inside the hospital's existing covered environment. Patient data, clinical notes, prior-auth narratives, and payer correspondence never traverse a third-party cloud. The model swap (Sonnet for routine, Opus for complex appeals, Haiku for high-volume sorting) is a config change, not a procurement event.

The other compliance angle that doesn't get discussed often: **payer medical-necessity criteria change constantly.** Hospitals using a managed AI vendor depend on the vendor to keep the criteria library current. Self-hosting means the hospital owns the criteria library and can update it the same day the payer publishes the change, not whenever the vendor's release cycle ships it.

## What Stays the Same, What Changes

Self-hosting prior-auth AI doesn't mean rebuilding the platform around it. The clinician-facing chat UI, the case-worker dashboards, the audit logs, the model routing with fallbacks, the multi-agent orchestration, and the Epic / Cerner / athenahealth integrations all stay managed by ibl.ai. The compute, the model, and the PHI move inside the hospital's covered environment.

What disappears: the $90K/month per-transaction bill (or the $1.2M/month per-seat bill) at IDN scale.

What appears: a self-hosted prior-auth capability the hospital owns and controls, with the model-routing policy the medical-staff committee designed:

- **Sonnet** for standard prior auths (the bulk)
- **Opus** for complex appeals and peer-to-peer prep
- **Haiku** for triage sorting (which payer, which criteria set)
- **Llama 4 self-hosted** for the highest-volume routine cases when even pennies per letter add up

## Run the Numbers for Your Health System

For workload sizing and cost modeling for a prior-auth deployment, the **[AI Help Desk Cost Savings Calculator](/resources/calculators/ai-help-desk-savings-calculator)** generalizes well: prior auth is one of the highest-volume, most-amenable-to-automation administrative workloads in healthcare.

For the segment-wide cost-math context (not just prior auth), see **[AI Cost Math for Hospitals: Per-Seat vs Usage-Based in 2026](/blog/ai-cost-math-for-hospitals-per-seat-vs-usage)**.

For the deployment comparison side-by-side, including HIPAA posture, BAA reach, and air-gapped options, see **[Self-Hosted AI vs ChatGPT Enterprise for Healthcare](/resources/comparisons/self-hosted-ai-vs-chatgpt-enterprise-for-healthcare)**.

For the full HIPAA-aligned architecture (Epic / Cerner / athenahealth integrations, Managed VPC → on-prem → air-gapped tiers), read **[Healthcare AI Reference Architecture on ibl.ai](/blog/healthcare-ai-reference-architecture)**.

For the staged deployment recipe (Managed VPC for low-sensitivity workloads + on-prem for production), see **[Healthcare AI Blueprint: Managed VPC in 30/60/90 Days](/blog/healthcare-ai-blueprint-managed-vpc-30-60-90-days)**.

For the broader pricing landscape (every major model, every major per-seat vendor), the hub: **[What Does AI Actually Cost in 2026?](/blog/what-does-ai-actually-cost-in-2026)**.
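
For a quick check before reaching for a calculator, the amortization argument made earlier (a flat self-hosted cost spread over more letters, against a per-letter fee that never drops) can be run for any volume. This is a rough sketch using the ~$3 per-letter fee and the flat self-hosted figures from the monthly table. The function names are hypothetical, and real quotes will differ.

```python
import math


def effective_cost_per_letter(flat_monthly_cost: float, letters_per_month: int) -> float:
    """Spread a flat monthly cost (license + GPU) across the month's letters."""
    if letters_per_month <= 0:
        raise ValueError("letters_per_month must be positive")
    return flat_monthly_cost / letters_per_month


def break_even_letters(flat_monthly_cost: float, fee_per_letter: float) -> int:
    """Smallest monthly volume at which the flat cost is no higher than per-letter fees."""
    if fee_per_letter <= 0:
        raise ValueError("fee_per_letter must be positive")
    return math.ceil(flat_monthly_cost / fee_per_letter)


# Figures from the monthly table: ~$2,000 flat at community scale and the
# upper-bound ~$8,000 at IDN scale, against a ~$3 per-letter vendor fee.
for flat_cost, letters in [(2_000, 3_000), (8_000, 30_000)]:
    per_letter = effective_cost_per_letter(flat_cost, letters)
    threshold = break_even_letters(flat_cost, 3.00)
    print(
        f"flat ${flat_cost:,}/mo at {letters:,} letters: "
        f"${per_letter:.2f} per letter, break-even at {threshold:,} letters/mo"
    )
```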
## Why Family-Owned and New York Matters Here

A health system's AI vendor relationship for a workload as central as prior authorization is a multi-year commitment: the workflows, the criteria library, the EHR integration, and the audit trail that revenue-cycle compliance relies on.

ibl.ai is family-owned and operated from New York, NY, a long-term partner with a perpetual platform license and no investor exit pressure. The runtime is open source. The PHI stays inside the hospital's covered boundary. The math works at a 100-bed community hospital or a 30-hospital IDN.