---
title: "AI Cost Math for Financial Services: Per-Seat vs Usage-Based in 2026"
slug: "ai-cost-math-for-financial-services-per-seat-vs-usage"
author: "ibl.ai Engineering"
date: "2026-05-30 12:00:00"
category: "Premium"
topics: "AI cost financial services, bank AI pricing, KYC AML AI, FINRA AI compliance, SR 11-7 model risk, ChatGPT Enterprise financial services, GPT-5 banking, Claude Opus financial services, self-hosted bank AI, per-seat vs usage-based"
summary: "What AI actually costs a regional bank in 2026: token pricing for the latest models against the $300–600K/month ChatGPT Enterprise and Copilot bills, with KYC/AML workload math and SR 11-7 model risk on a stack you can audit."
banner: ""
thumbnail: ""
---

## The Regional Bank Math: $60 × 10,000 Employees Is Not the Right Number

A regional bank has 10,000 employees: relationship managers, compliance analysts, back-office operations, IT, branch staff. ChatGPT Enterprise at $60 per user per month is **$600,000 per month, or $7.2M per year**. Microsoft 365 Copilot at $30 per user is **$300,000 per month, or $3.6M per year**. Most of those seats touch AI a handful of times per week, if that.

The per-seat model was built for productivity software where every desk needs occasional access. For AI doing real work (KYC document review, AML alert triage, advisor copilot, internal policy Q&A), the cost should scale with the work, not the org chart. And the data should stay in the bank's VPC, not a vendor's cloud where every quarter's DPA refresh is a compliance event.

The math is the post.

## What the Latest Models Actually Cost in 2026

Token pricing across the major providers, approximate as of mid-2026 (always check provider docs for current rates):

| Model | Provider | Input ($/MTok) | Output ($/MTok) | Best for |
|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | $15 | $75 | Complex KYC narratives, advisor copilot |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 | AML alert triage, document classification |
| Claude Haiku 4.5 | Anthropic | $1 | $5 | High-volume routing, transaction tagging |
| GPT-5 | OpenAI | $10 | $30 | Sanctions-screening narration, internal Q&A |
| Gemini 3 Pro | Google | $3.50 | $10.50 | Long-context filings & disclosures |
| Llama 4 (70B, self-hosted) | Meta (open weights) | ~$0 | ~$0 | In-VPC bulk workloads, sensitive desks |
| DeepSeek-R1 (self-hosted) | DeepSeek (open weights) | ~$0 | ~$0 | Cost-sensitive batch reasoning |
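
To turn the per-million-token rates above into a per-request figure, multiply each token count by its rate and divide by one million. The following is a minimal Python sketch of that conversion, assuming the approximate rates from the table. The `PRICES` keys are illustrative labels (not provider API model IDs), `request_cost` is a hypothetical helper, and the 2,000-in / 500-out request size is an arbitrary example.

```python
# Approximate (input, output) rates in $ per million tokens, copied from the
# table above. Keys are illustrative labels, not official API model IDs.
PRICES = {
    "claude-opus-4.7": (15.00, 75.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
    "gpt-5": (10.00, 30.00),
    "gemini-3-pro": (3.50, 10.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the table's $/MTok rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size, chosen only to illustrate the formula.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 2_000, 500):.5f} per request")
```

Output tokens dominate the bill for most of these models, so workloads that generate long narratives are more sensitive to the output rate than to the input rate.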

For self-hosted open-weight models, there is no per-token fee; the marginal cost is GPU time. A reserved H100 instance ($1.50–3.00/hour) handles tens of thousands of bank workflows per day inside the bank's VPC.

## A Real Workload: AML Alert Triage at a Regional Bank

AML alert triage is the highest-volume, highest-pain compliance AI use case in retail and commercial banking. A regional bank generates roughly **40,000 alerts per month**. A typical alert is 800 input tokens (transaction context, customer history, sanctions hits) and 1,200 output tokens (narrative explaining the disposition with cited reasoning).

That's **32M input + 48M output tokens per month** (40,000 × 800 and 40,000 × 1,200) for the entire alert workload, concentrated on a few hundred compliance analysts, not spread across the bank's 10K headcount.

For a deeper per-alert cost breakdown, including a side-by-side against Quantexa, NICE Actimize, Hawk AI, ComplyAdvantage, and Feedzai at three scale tiers (community / regional / G-SIB), see **[What AI AML Alert Triage Actually Costs in 2026](/blog/what-ai-aml-alert-triage-actually-costs-2026)**.

### What it costs by deployment shape

| Deployment | Pricing shape | Monthly cost | Annual | Data residency |
|---|---|---|---|---|
| ChatGPT Enterprise | Per-seat ($60/user × 10K) | $600,000 | $7,200,000 | OpenAI cloud (DPA) |
| Microsoft 365 Copilot | Per-seat ($30/user × 10K) | $300,000 | $3,600,000 | Microsoft cloud (DPA) |
| Glean | Per-seat (~$40/user × 10K) | $400,000 | $4,800,000 | Glean cloud (DPA) |
| Direct API: Claude Sonnet 4.6 | Token-based | ~$816 | ~$9,792 | Anthropic cloud (bank DPA) |
| Direct API: GPT-5 | Token-based | ~$1,760 | ~$21,120 | OpenAI cloud (bank DPA) |
| ibl.ai self-hosted (Llama 4 / DeepSeek-R1) | Flat license + GPU | ~$5,000–15,000 | ~$60,000–180,000 | Inside the bank's VPC / on-prem |
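
The token-based rows follow directly from the workload: 40,000 alerts per month at 800 input and 1,200 output tokens each, priced at the approximate rates quoted earlier. Here is a short Python sketch of that arithmetic, with the per-seat and raw GPU figures for comparison. All function and constant names are illustrative, and the 730 hours per month used for the GPU estimate is an assumption (24 × 365 / 12), not a figure from the pricing above.

```python
ALERTS_PER_MONTH = 40_000
INPUT_TOKENS_PER_ALERT = 800
OUTPUT_TOKENS_PER_ALERT = 1_200


def monthly_token_cost(in_rate: float, out_rate: float) -> float:
    """Monthly API cost in dollars, given $/MTok input and output rates."""
    input_mtok = ALERTS_PER_MONTH * INPUT_TOKENS_PER_ALERT / 1_000_000    # 32 MTok
    output_mtok = ALERTS_PER_MONTH * OUTPUT_TOKENS_PER_ALERT / 1_000_000  # 48 MTok
    return input_mtok * in_rate + output_mtok * out_rate


def monthly_seat_cost(price_per_seat: float, seats: int = 10_000) -> float:
    """Monthly per-seat bill for the whole bank."""
    return price_per_seat * seats


def monthly_gpu_cost(hourly_rate: float, hours: float = 730) -> float:
    """Raw GPU time for one always-on instance; 730 h/month is an assumption."""
    return hourly_rate * hours


if __name__ == "__main__":
    sonnet = monthly_token_cost(3, 15)
    gpt5 = monthly_token_cost(10, 30)
    print(f"Claude Sonnet 4.6: ${sonnet:,.0f}/month, ${sonnet * 12:,.0f}/year")
    print(f"GPT-5:             ${gpt5:,.0f}/month, ${gpt5 * 12:,.0f}/year")
    print(f"Per alert (Sonnet): ${sonnet / ALERTS_PER_MONTH:.4f}")
    print(f"ChatGPT Enterprise: ${monthly_seat_cost(60):,.0f}/month")
    low, high = monthly_gpu_cost(1.50), monthly_gpu_cost(3.00)
    print(f"One H100, GPU time only: ${low:,.0f}-${high:,.0f}/month")
```

The GPU line covers compute only for a single instance; the self-hosted row in the table is higher because it also includes the platform license and support.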

The per-seat rows price access for all 10,000 employees; the token-based rows price only the AML triage workload described above. The ibl.ai row covers the GPU instance, the platform license, and ongoing support. There is no third-party vendor in the data path, no managed-cloud DPA to renegotiate, and no question about whether the model provider could be examiner-subpoenaed for transaction records.

## Why Per-Seat Pricing Fails Harder in Financial Services

Three structural reasons:

**1. Usage is concentrated in compliance, risk, and front-office advisory.** A retail-bank teller doesn't generate AML narratives; a compliance analyst does. Buying a seat for every employee means paying for the 9,500 who barely use AI in order to cover the 500 who depend on it. Token pricing, or a flat-rate platform, aligns the bill to the work.

**2. SR 11-7 model risk applies to the whole stack, not just the model.** The Federal Reserve's SR 11-7 and the matching OCC Bulletin 2011-12, the interagency model-risk guidance later adopted by the FDIC, require validation, governance, and ongoing monitoring of any model affecting bank decisions. A managed AI vendor that controls the model selection, the training data, and the inference path is a sole-source dependency that risk committees have to underwrite as a single point of failure. A self-hosted, model-agnostic stack addresses that concern by being inspectable and swappable.

**3. Examiner subpoenas don't stop at the bank's perimeter.** When the OCC, FINRA, or a state regulator asks for the full reasoning behind a flagged transaction, the bank produces it. When that reasoning lives inside a third-party AI vendor's cloud, the bank introduces a chain-of-custody question that doesn't exist when the model runs inside the bank's VPC.

## What Stays the Same, What Changes

Self-hosting the runtime doesn't mean rebuilding the bank's AI tooling. The chat UI, the agent dashboards, the audit logs, model routing with fallbacks, the multi-agent orchestration, and the integration with Bloomberg / Refinitiv / FIS all stay managed by ibl.ai. The compute, the model, and the transaction data move inside the bank's VPC.

What disappears: the $3.6–7.2M/year per-seat line item. What appears: an internal AI capability the bank owns and audits, with the model-choice flexibility that model risk committees require: Opus for the high-stakes advisor copilot, Sonnet for the AML triage queue, Llama 4 for the air-gapped trading-desk workload.

## Run the Numbers for Your Bank

For workload sizing and cost modeling for your AML, KYC, and advisory teams, the **[AI Help Desk Cost Savings Calculator](/resources/calculators/ai-help-desk-savings-calculator)** generalizes to most high-volume bank-administrative workloads.

For the side-by-side deployment comparison, including FINRA / SR 11-7 / GLBA posture and air-gapped options for trading and private-client desks, see **[Self-Hosted AI vs ChatGPT Enterprise for Financial Services](/resources/comparisons/self-hosted-ai-vs-chatgpt-enterprise-for-financial-services)**.

For the full SEC / FINRA / SOX / PCI / SR 11-7 aligned architecture (Bloomberg / Refinitiv / FIS integration, model-output versioning, air-gapped tier), read **[Financial Services AI Reference Architecture on ibl.ai](/blog/financial-services-ai-reference-architecture)**.

For the staged deployment recipe (Managed VPC for low-sensitivity workloads plus air-gapped for trading and private-client desks), see **[Financial Services Blueprint: Air-Gapped AI in 90 Days](/blog/financial-services-blueprint-air-gapped-ai-90-days)**.

## Why Family-Owned and New York Matter Here

The regulatory exposure of a bank's AI vendor relationship is non-trivial: every DPA refresh, every change in data-processing terms, and every vendor acquisition is an event the bank's third-party-risk team has to underwrite. ibl.ai is family-owned and operated from New York, NY, a long-term partner with a perpetual platform license and no investor exit pressure. The runtime is open source. The transaction data stays inside the bank's network. The math works at a 500-employee community bank or a 50,000-employee regional.