The Regional Bank Math: $60 × 10,000 Employees Is Not the Right Number
A regional bank has 10,000 employees — relationship managers, compliance analysts, back-office operations, IT, branch staff. ChatGPT Enterprise at $60 per user per month is $600,000 per month — $7.2M per year. Microsoft 365 Copilot at $30 per user is $300,000 per month — $3.6M per year. Most of those seats touch AI a handful of times per week, if that.
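The per-seat arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the list prices and headcount quoted in this post; the function name `per_seat_cost` is illustrative.

```python
# Per-seat licensing: the bill depends only on headcount, not on usage.
HEADCOUNT = 10_000  # employees, from the example above


def per_seat_cost(price_per_user_month: int, seats: int) -> tuple[int, int]:
    """Return (monthly, annual) cost in dollars for a per-seat license."""
    monthly = price_per_user_month * seats
    return monthly, monthly * 12


chatgpt_monthly, chatgpt_annual = per_seat_cost(60, HEADCOUNT)
copilot_monthly, copilot_annual = per_seat_cost(30, HEADCOUNT)

print(f"ChatGPT Enterprise:    ${chatgpt_monthly:,}/month, ${chatgpt_annual:,}/year")
print(f"Microsoft 365 Copilot: ${copilot_monthly:,}/month, ${copilot_annual:,}/year")
```

Note that usage never appears as an input: the result is the same whether a seat is used daily or never.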
The per-seat model was built for productivity software where every desk needs occasional access. For AI doing real work — KYC document review, AML alert triage, advisor copilot, internal policy Q&A — the cost should scale with the work, not the org chart. And the data should stay in the bank's VPC, not a vendor's cloud where every quarter's DPA refresh is a compliance event.
The math is the post.
What the Latest Models Actually Cost in 2026
Token pricing across the major providers, approximate as of mid-2026 (always check provider docs for current rates):
| Model | Provider | Input ($/MTok) | Output ($/MTok) | Best for |
|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $15 | $75 | Complex KYC narratives, advisor copilot |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 | AML alert triage, document classification |
| Claude Haiku 4.5 | Anthropic | $1 | $5 | High-volume routing, transaction tagging |
| GPT-5 | OpenAI | $10 | $30 | Sanctions-screening narration, internal Q&A |
| Gemini 3 Pro | Google | $3.50 | $10.50 | Long-context filings & disclosures |
| Llama 4 (70B, self-hosted) | Meta (open weights) | ~$0 | ~$0 | In-VPC bulk workloads, sensitive desks |
| DeepSeek-R1 (self-hosted) | DeepSeek (open weights) | ~$0 | ~$0 | Cost-sensitive batch reasoning |
For self-hosted open-weight models, the marginal cost is GPU time rather than tokens, which is why the table lists them at roughly $0 per million tokens. A reserved H100 instance ($1.50–$3.00/hour) handles tens of thousands of bank workflows per day inside the bank's VPC.
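To put the hourly GPU rate on the same monthly footing as the other figures in this post, here is a rough sketch. It assumes a single instance running around the clock and 730 hours per month; both are assumptions made for illustration, not figures from a provider.

```python
HOURS_PER_MONTH = 730  # assumption: 24 h x 365 days / 12 months, always on


def gpu_monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Compute-only monthly cost in dollars for reserved GPU instances."""
    return hourly_rate * HOURS_PER_MONTH * instances


low = gpu_monthly_cost(1.50)
high = gpu_monthly_cost(3.00)
print(f"One reserved H100: ${low:,.0f} to ${high:,.0f} per month")
```

This covers compute only. Platform licensing, support, storage, and redundancy are separate line items.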
A Real Workload: AML Alert Triage at a Regional Bank
AML alert triage is the highest-volume, highest-pain compliance AI use case in retail and commercial banking. A regional bank generates roughly 40,000 alerts per month. A typical alert is 800 input tokens (transaction context, customer history, sanctions hits) and 1,200 output tokens (narrative explaining the disposition with cited reasoning). For a deeper per-alert cost breakdown — including a side-by-side against Quantexa, NICE Actimize, Hawk AI, ComplyAdvantage, and Feedzai at three scale tiers (community / regional / G-SIB) — see What AI AML Alert Triage Actually Costs in 2026.
That's 32M input + 48M output tokens per month for the entire alert workload — concentrated on a few hundred compliance analysts, not spread across the bank's 10K headcount.
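The token volumes follow directly from the alert count and the per-alert token sizes. A short sketch of that sizing step, using only the numbers given above:

```python
ALERTS_PER_MONTH = 40_000
INPUT_TOKENS_PER_ALERT = 800     # transaction context, customer history, sanctions hits
OUTPUT_TOKENS_PER_ALERT = 1_200  # disposition narrative with cited reasoning

input_tokens = ALERTS_PER_MONTH * INPUT_TOKENS_PER_ALERT
output_tokens = ALERTS_PER_MONTH * OUTPUT_TOKENS_PER_ALERT

print(f"input:  {input_tokens / 1e6:.0f}M tokens/month")
print(f"output: {output_tokens / 1e6:.0f}M tokens/month")
```

Swapping in a different alert count or token size per alert is enough to resize the workload for another bank.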
What it costs by deployment shape
| Deployment | Pricing shape | Monthly cost | Annual | Data residency |
|---|---|---|---|---|
| ChatGPT Enterprise | Per-seat ($60/user × 10K) | $600,000 | $7,200,000 | OpenAI cloud (DPA) |
| Microsoft 365 Copilot | Per-seat ($30/user × 10K) | $300,000 | $3,600,000 | Microsoft cloud (DPA) |
| Glean | Per-seat (~$40/user × 10K) | $400,000 | $4,800,000 | Glean cloud (DPA) |
| Direct API — Claude Sonnet 4.6 | Token-based | ~$816 | ~$9,792 | Anthropic cloud (bank DPA) |
| Direct API — GPT-5 | Token-based | ~$1,760 | ~$21,120 | OpenAI cloud (bank DPA) |
| ibl.ai self-hosted (Llama 4 / DeepSeek-R1) | Flat license + GPU | ~$5,000–15,000 | ~$60,000–180,000 | Inside the bank's VPC / on-prem |
The ibl.ai row covers the GPU instance, the platform license, and ongoing support. There is no third-party vendor in the data path, no managed-cloud DPA to renegotiate, and no question about whether a model provider could be subpoenaed by an examiner for transaction records.
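The two token-based rows in the table can be checked with the following sketch. The rates are the approximate per-million-token prices from the pricing table earlier in this post, and the volumes are the 32M input and 48M output tokens of the AML workload; the helper name `monthly_token_cost` is illustrative.

```python
# Monthly token volume for the AML triage workload, in millions of tokens (MTok).
INPUT_MTOK = 32
OUTPUT_MTOK = 48

# (input $/MTok, output $/MTok), copied from the pricing table; approximate rates.
RATES = {
    "Claude Sonnet 4.6": (3, 15),
    "GPT-5": (10, 30),
}


def monthly_token_cost(input_rate: int, output_rate: int) -> int:
    """Dollars per month: volume times rate, summed over input and output."""
    return INPUT_MTOK * input_rate + OUTPUT_MTOK * output_rate


costs = {model: monthly_token_cost(*rate) for model, rate in RATES.items()}
for model, monthly in costs.items():
    print(f"{model}: ${monthly:,}/month, ${monthly * 12:,}/year")
```

Output tokens dominate both totals because the narrative is longer than the prompt and priced higher per token. Other rows from the pricing table can be added to `RATES` the same way.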
Why Per-Seat Pricing Fails Harder in Financial Services
Three structural reasons:
1. Usage is concentrated in compliance, risk, and front-office advisory. A retail-bank teller doesn't generate AML narratives; a compliance analyst does. Buying a seat for every employee means paying for the 9,500 who barely use AI in order to cover the 500 who depend on it. Token pricing, or a flat-rate platform, aligns the bill to the work.
2. SR 11-7 model risk applies to the whole stack, not just the model. The Federal Reserve's SR 11-7 and the OCC's matching Bulletin 2011-12, guidance the FDIC later adopted, require validation, governance, and ongoing monitoring of any model affecting bank decisions. A managed AI vendor that controls the model selection, the training data, and the inference path is a sole-source dependency that risk committees have to underwrite as a single point of failure. A self-hosted, model-agnostic stack passes the test by being inspectable and swappable.
3. Examiner subpoenas don't stop at the bank's perimeter. When the OCC, FINRA, or a state regulator asks for the full reasoning behind a flagged transaction, the bank produces it. When that reasoning lives inside a third-party AI vendor's cloud, the bank introduces a chain-of-custody question that doesn't exist when the model runs inside the bank's VPC.
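The concentration argument in the first point can be made concrete. The sketch below takes the 500 heavy users and 10,000 seats from the list above and the $60 list price used earlier in this post, and works out what each heavy user effectively costs under per-seat licensing.

```python
SEATS = 10_000
HEAVY_USERS = 500  # the compliance and advisory staff who depend on AI
SEAT_PRICE = 60    # $/user/month, the ChatGPT Enterprise price used above

monthly_bill = SEATS * SEAT_PRICE
cost_per_heavy_user = monthly_bill // HEAVY_USERS
share_of_seats_in_heavy_use = HEAVY_USERS / SEATS

print(f"Monthly bill: ${monthly_bill:,}")
print(f"Effective cost per heavy user: ${cost_per_heavy_user:,}/month")
print(f"Seats in heavy use: {share_of_seats_in_heavy_use:.0%}")
```

Under these assumptions the effective price per person doing the work is 20 times the list price per seat.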
What Stays the Same, What Changes
Self-hosting the runtime doesn't mean rebuilding the bank's AI tooling. The chat UI, the agent dashboards, the audit logs, the model-routing-with-fallbacks, the multi-agent orchestration, the integration with Bloomberg / Refinitiv / FIS — all of that stays managed by ibl.ai. The compute, the model, and the transaction data move inside the bank's VPC.
What disappears: the $3.6–7.2M/year per-seat line item. What appears: an internal AI capability the bank owns and audits, with the model-choice flexibility that model risk committees require: Opus for the high-stakes advisor copilot, Sonnet for the AML triage queue, Llama 4 for the air-gapped trading-desk workload.
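Routing each workload to its own model, with a fallback when the first choice is unavailable, might look roughly like the sketch below. Everything in it is hypothetical: the route table, the model labels (plain strings, not real API identifiers), the fallback choices, and the `call_model` callable standing in for an inference client. It is not ibl.ai's implementation.

```python
from typing import Callable

# Hypothetical workload-to-model routing table, first choice listed first.
ROUTES: dict[str, list[str]] = {
    "advisor_copilot": ["claude-opus", "claude-sonnet"],
    "aml_triage": ["claude-sonnet", "llama-4-self-hosted"],
    "trading_desk": ["llama-4-self-hosted"],  # air-gapped: no external fallback
}


def route(workload: str, prompt: str,
          call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model configured for the workload in order; return (model, answer)."""
    errors = []
    for model in ROUTES[workload]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError(f"all models failed for {workload}: {errors}")


# Stand-in for a real inference client: simulates an outage of one model.
def fake_call(model: str, prompt: str) -> str:
    if model == "claude-sonnet":
        raise TimeoutError("simulated outage")
    return f"[{model}] disposition for: {prompt}"


model_used, answer = route("aml_triage", "alert 123", fake_call)
print(model_used)
```

Keeping the route table as plain data is what makes the stack swappable: changing a model for a workload is a configuration change that can be reviewed and versioned, not a code rewrite.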
Run the Numbers for Your Bank
For workload sizing and cost modeling for your AML, KYC, and advisory teams, the AI Help Desk Cost Savings Calculator generalizes to most high-volume bank-administrative workloads.
For the deployment comparison side-by-side — including FINRA / SR 11-7 / GLBA posture and air-gapped options for trading and private-client desks — see Self-Hosted AI vs ChatGPT Enterprise for Financial Services.
For the full SEC / FINRA / SOX / PCI / SR 11-7 aligned architecture (Bloomberg / Refinitiv / FIS integration, model-output versioning, air-gapped tier), read Financial Services AI Reference Architecture on ibl.ai.
For the staged deployment recipe — Managed VPC for low-sensitivity workloads + air-gapped for trading and private-client desks — see Financial Services Blueprint: Air-Gapped AI in 90 Days.
Why Family-Owned and New York Matters Here
The regulatory exposure of a bank's AI vendor relationship is non-trivial — every DPA refresh, every change in data-processing terms, every vendor acquisition is an event the bank's third-party-risk team has to underwrite. ibl.ai is family-owned and operated from New York, NY — a long-term partner with a perpetual platform license and no investor exit pressure. The runtime is open source. The transaction data stays inside the bank's network. The math works at a 500-employee community bank or a 50,000-employee regional.