---
title: "AI Cost Math for Financial Services: Per-Seat vs Usage-Based in 2026"
slug: "ai-cost-math-for-financial-services-per-seat-vs-usage"
author: "ibl.ai Engineering"
date: "2026-05-30 12:00:00"
category: "Premium"
topics: "AI cost financial services, bank AI pricing, KYC AML AI, FINRA AI compliance, SR 11-7 model risk, ChatGPT Enterprise financial services, GPT-5 banking, Claude Opus financial services, self-hosted bank AI, per-seat vs usage-based"
summary: "What AI actually costs a regional bank in 2026 — token pricing for the latest models against the $300–600K/month ChatGPT Enterprise and Copilot bills, with KYC/AML workload math and SR 11-7 model risk on a stack you can audit."
banner: ""
thumbnail: ""
---

## The Regional Bank Math: $60 × 10,000 Employees Is Not the Right Number

A regional bank has 10,000 employees — relationship managers, compliance analysts, back-office operations, IT, branch staff. ChatGPT Enterprise at $60 per user per month is **$600,000 per month — $7.2M per year**. Microsoft 365 Copilot at $30 per user is **$300,000 per month — $3.6M per year**. Most of those seats touch AI a handful of times per week, if that.
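
The per-seat figures are headcount times list price. Here is a minimal sketch of that arithmetic, assuming the list prices quoted above and no negotiated discounts. The names `SEAT_PRICES` and `per_seat_cost` are illustrative.

```python
# Per-seat licensing: every employee gets a seat, whether or not they use it.
# Prices are the list figures quoted above; enterprise discounts are not modeled.
HEADCOUNT = 10_000

SEAT_PRICES = {
    "ChatGPT Enterprise": 60,     # $/user/month
    "Microsoft 365 Copilot": 30,  # $/user/month
}

def per_seat_cost(price_per_user: float, seats: int) -> tuple[float, float]:
    """Return (monthly, annual) license cost in dollars."""
    monthly = price_per_user * seats
    return monthly, monthly * 12

if __name__ == "__main__":
    for product, price in SEAT_PRICES.items():
        monthly, annual = per_seat_cost(price, HEADCOUNT)
        print(f"{product}: ${monthly:,.0f}/month, ${annual:,.0f}/year")
```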

The per-seat model was built for productivity software where every desk needs occasional access. For AI doing real work — KYC document review, AML alert triage, advisor copilot, internal policy Q&A — the cost should scale with the work, not the org chart. And the data should stay in the bank's VPC, not a vendor's cloud where every quarter's DPA refresh is a compliance event.

The math is the post.

## What the Latest Models Actually Cost in 2026

Token pricing across the major providers, approximate as of mid-2026 (always check provider docs for current rates):

<table style="width:100%; border-collapse:collapse; margin:1.5rem 0; font-size:0.95rem;">
  <thead>
    <tr style="background:#f5f5f0; border-bottom:2px solid #2175C5;">
      <th style="text-align:left; padding:0.75rem; color:#5f6368;">Model</th>
      <th style="text-align:left; padding:0.75rem; color:#5f6368;">Provider</th>
      <th style="text-align:right; padding:0.75rem; color:#5f6368;">Input ($/MTok)</th>
      <th style="text-align:right; padding:0.75rem; color:#5f6368;">Output ($/MTok)</th>
      <th style="text-align:left; padding:0.75rem; color:#5f6368;">Best for</th>
    </tr>
  </thead>
  <tbody>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>Claude Opus 4.7</strong></td>
      <td style="padding:0.75rem;">Anthropic</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$15</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$75</td>
      <td style="padding:0.75rem;">Complex KYC narratives, advisor copilot</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>Claude Sonnet 4.6</strong></td>
      <td style="padding:0.75rem;">Anthropic</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$3</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$15</td>
      <td style="padding:0.75rem;">AML alert triage, document classification</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>Claude Haiku 4.5</strong></td>
      <td style="padding:0.75rem;">Anthropic</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$1</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$5</td>
      <td style="padding:0.75rem;">High-volume routing, transaction tagging</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>GPT-5</strong></td>
      <td style="padding:0.75rem;">OpenAI</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$10</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$30</td>
      <td style="padding:0.75rem;">Sanctions-screening narration, internal Q&A</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>Gemini 3 Pro</strong></td>
      <td style="padding:0.75rem;">Google</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$3.50</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">$10.50</td>
      <td style="padding:0.75rem;">Long-context filings & disclosures</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>Llama 4 (self-hosted)</strong></td>
      <td style="padding:0.75rem;">Meta (open weights)</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$0</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$0</td>
      <td style="padding:0.75rem;">In-VPC bulk workloads, sensitive desks</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>DeepSeek-R1 (self-hosted)</strong></td>
      <td style="padding:0.75rem;">DeepSeek (open weights)</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$0</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$0</td>
      <td style="padding:0.75rem;">Cost-sensitive batch reasoning</td>
    </tr>
  </tbody>
</table>

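Self-hosted open-weight models carry no per-token fee, which is why the table shows ~$0. Their marginal cost is GPU time. For a model that fits on a single GPU, one reserved H100 instance (roughly $1.50–3/hour) handles tens of thousands of bank workflows per day inside the bank's VPC. Larger models, such as the full DeepSeek-R1, need a multi-GPU node.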
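
The GPU share of a self-hosted deployment can be sketched as follows. This assumes one instance reserved around the clock at the quoted hourly range and a 730-hour month (8,760 hours / 12). It leaves out the platform license, support, storage, and networking. A model that needs several GPUs multiplies the instance count.

```python
# Always-on reserved GPU cost. The $1.50-3/hour range is the figure quoted above;
# real reserved pricing varies by cloud, region, and commitment term.
HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours, averaged over a year

def gpu_monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Dollar cost of running `instances` GPUs 24/7 for an average month."""
    return hourly_rate * HOURS_PER_MONTH * instances

low, high = gpu_monthly_cost(1.50), gpu_monthly_cost(3.00)
print(f"One H100, 24/7: ${low:,.0f} - ${high:,.0f} per month")
```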

## A Real Workload: AML Alert Triage at a Regional Bank

AML alert triage is the highest-volume, highest-pain compliance AI use case in retail and commercial banking. Assume a regional bank generates roughly **40,000 alerts per month**. Each alert averages about 800 input tokens (transaction context, customer history, sanctions hits) and 1,200 output tokens (a narrative explaining the disposition with cited reasoning). For a deeper per-alert cost breakdown, including a side-by-side against Quantexa, NICE Actimize, Hawk AI, ComplyAdvantage, and Feedzai at three scale tiers (community / regional / G-SIB), see **[What AI AML Alert Triage Actually Costs in 2026](/blog/what-ai-aml-alert-triage-actually-costs-2026)**.

That's **32M input + 48M output tokens per month** for the entire alert workload — concentrated on a few hundred compliance analysts, not spread across the bank's 10K headcount.
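
The token totals come from multiplying alert volume by per-alert token counts, and so do the two direct-API rows in the table below. Here is a sketch of that arithmetic. It assumes the per-alert token counts above and the approximate $/MTok rates from the pricing table. It does not model caching, batch discounts, or retries.

```python
# Usage-based cost for the AML alert workload. Volumes and token counts are the
# working assumptions above; prices are approximate table rates in $ per million tokens.
ALERTS_PER_MONTH = 40_000
INPUT_TOKENS_PER_ALERT = 800     # transaction context, history, sanctions hits
OUTPUT_TOKENS_PER_ALERT = 1_200  # disposition narrative

PRICES_PER_MTOK = {              # (input, output)
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5": (10.00, 30.00),
}

def monthly_tokens(alerts: int) -> tuple[int, int]:
    """Total (input, output) tokens per month."""
    return alerts * INPUT_TOKENS_PER_ALERT, alerts * OUTPUT_TOKENS_PER_ALERT

def monthly_api_cost(alerts: int, input_price: float, output_price: float) -> float:
    """Dollar cost per month at the given $/MTok rates."""
    tokens_in, tokens_out = monthly_tokens(alerts)
    return (tokens_in * input_price + tokens_out * output_price) / 1_000_000

tokens_in, tokens_out = monthly_tokens(ALERTS_PER_MONTH)
print(f"{tokens_in:,} input + {tokens_out:,} output tokens/month")
for model, (p_in, p_out) in PRICES_PER_MTOK.items():
    cost = monthly_api_cost(ALERTS_PER_MONTH, p_in, p_out)
    print(f"{model}: ${cost:,.0f}/month, ${cost * 12:,.0f}/year")
```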

### What it costs by deployment shape

<table style="width:100%; border-collapse:collapse; margin:1.5rem 0; font-size:0.95rem;">
  <thead>
    <tr style="background:#f5f5f0; border-bottom:2px solid #2175C5;">
      <th style="text-align:left; padding:0.75rem; color:#5f6368;">Deployment</th>
      <th style="text-align:left; padding:0.75rem; color:#5f6368;">Pricing shape</th>
      <th style="text-align:right; padding:0.75rem; color:#5f6368;">Monthly cost</th>
      <th style="text-align:right; padding:0.75rem; color:#5f6368;">Annual</th>
      <th style="text-align:left; padding:0.75rem; color:#5f6368;">Data residency</th>
    </tr>
  </thead>
  <tbody>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>ChatGPT Enterprise</strong></td>
      <td style="padding:0.75rem;">Per-seat ($60/user × 10K)</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#b91c1c;"><strong>$600,000</strong></td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#b91c1c;">$7,200,000</td>
      <td style="padding:0.75rem;">OpenAI cloud (DPA)</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>Microsoft 365 Copilot</strong></td>
      <td style="padding:0.75rem;">Per-seat ($30/user × 10K)</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#b91c1c;"><strong>$300,000</strong></td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#b91c1c;">$3,600,000</td>
      <td style="padding:0.75rem;">Microsoft cloud (DPA)</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>Glean</strong></td>
      <td style="padding:0.75rem;">Per-seat (~$40/user × 10K)</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#b91c1c;"><strong>$400,000</strong></td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#b91c1c;">$4,800,000</td>
      <td style="padding:0.75rem;">Glean cloud (DPA)</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;">Direct API — Claude Sonnet 4.6</td>
      <td style="padding:0.75rem;">Token-based</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$816</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$9,792</td>
      <td style="padding:0.75rem;">Anthropic cloud (bank DPA)</td>
    </tr>
    <tr style="border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;">Direct API — GPT-5</td>
      <td style="padding:0.75rem;">Token-based</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$1,760</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums;">~$21,120</td>
      <td style="padding:0.75rem;">OpenAI cloud (bank DPA)</td>
    </tr>
    <tr style="background:#f0f9ff; border-bottom:1px solid #e5e7eb;">
      <td style="padding:0.75rem;"><strong>ibl.ai self-hosted (Llama 4 / DeepSeek-R1)</strong></td>
      <td style="padding:0.75rem;">Flat license + GPU</td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#15803d;"><strong>~$5,000–15,000</strong></td>
      <td style="text-align:right; padding:0.75rem; font-variant-numeric:tabular-nums; color:#15803d;">~$60,000–180,000</td>
      <td style="padding:0.75rem;"><strong>Inside the bank's VPC / on-prem</strong></td>
    </tr>
  </tbody>
</table>

The ibl.ai row covers the GPU instance, the platform license, and ongoing support. No third-party model vendor sits in the data path, and there is no managed-cloud DPA to renegotiate. Nor is there any open question about whether a subpoena for transaction records could reach the model provider.
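
On raw cost at 40,000 alerts, the direct-API rows come in below the self-hosted range. The case for self-hosting rests on data residency and on one flat cost covering several workloads. The sketch below estimates where API spend for this single workload would cross the flat range. It uses the Sonnet 4.6 rates and per-alert token counts above and ignores GPU capacity limits and volume discounts. The function names are illustrative.

```python
import math

# Break-even between token-based pricing and a flat monthly platform cost.
# Token counts and Sonnet 4.6 rates are the figures above; the flat range is
# the self-hosted row in the table.
INPUT_TOKENS_PER_ALERT = 800
OUTPUT_TOKENS_PER_ALERT = 1_200

def cost_per_alert(input_price: float, output_price: float) -> float:
    """Dollar API cost of one alert at the given $/MTok rates."""
    return (INPUT_TOKENS_PER_ALERT * input_price
            + OUTPUT_TOKENS_PER_ALERT * output_price) / 1_000_000

def break_even_alerts(flat_monthly: float, per_alert: float) -> int:
    """Smallest monthly alert volume at which API spend reaches the flat cost."""
    return math.ceil(flat_monthly / per_alert)

sonnet = cost_per_alert(3.00, 15.00)  # about $0.0204 per alert
for flat in (5_000, 15_000):
    n = break_even_alerts(flat, sonnet)
    print(f"${flat:,}/month flat: break-even at about {n:,} alerts/month")
```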

## Why Per-Seat Pricing Fails Harder in Financial Services

Three structural reasons:

**1. Usage is concentrated in compliance, risk, and front-office advisory.** A retail-bank teller doesn't generate AML narratives; a compliance analyst does. Buying a seat for every employee means paying for the 9,500 who barely use AI in order to cover the 500 who depend on it. Token pricing, or a flat-rate platform, aligns the bill with the work.

**2. SR 11-7 model risk applies to the whole stack, not just the model.** SR 11-7 is the Federal Reserve's model-risk guidance, issued alongside the OCC's identical Bulletin 2011-12 and later adopted by the FDIC. It expects validation, governance, and ongoing monitoring of models used in bank decision-making, including vendor and third-party models. A managed AI vendor that controls model selection, training data, and the inference path is a sole-source dependency. Risk committees have to underwrite it as a single point of failure. A self-hosted, model-agnostic stack is easier to defend: validators can inspect it, and the bank can swap out a model without changing vendors.

**3. Examiner subpoenas don't stop at the bank's perimeter.** When the OCC, FINRA, or a state regulator asks for the full reasoning behind a flagged transaction, the bank produces it. When that reasoning lives inside a third-party AI vendor's cloud, the bank introduces a chain-of-custody question that doesn't exist when the model runs inside the bank's VPC.
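
Point 1 is easiest to see as an effective price per active user. The sketch below spreads the full seat bill across only the users who rely on the tool. It assumes the illustrative 10,000 seats and roughly 500 heavy users from point 1, and the helper name is hypothetical.

```python
# Effective per-seat cost when only a fraction of licensed seats do real work.
# 10,000 seats and ~500 heavy users are the illustrative figures from point 1.
def cost_per_active_user(price_per_seat: float, seats: int, active_users: int) -> float:
    """Monthly license spend divided across the users who actually rely on the tool."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return price_per_seat * seats / active_users

for product, price in (("ChatGPT Enterprise", 60), ("Microsoft 365 Copilot", 30)):
    effective = cost_per_active_user(price, seats=10_000, active_users=500)
    print(f"{product}: ${effective:,.0f} per active user per month")
```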

## What Stays the Same, What Changes

Self-hosting the runtime doesn't mean rebuilding the bank's AI tooling. The chat UI, the agent dashboards, the audit logs, model routing with fallbacks, multi-agent orchestration, and the integrations with Bloomberg / Refinitiv / FIS all stay managed by ibl.ai. The compute, the models, and the transaction data move inside the bank's VPC.

What disappears: the $3.6–7.2M/year per-seat line item. What appears: an internal AI capability the bank owns and audits, with the model-choice flexibility that model risk committees require. That means Opus for the high-stakes advisor copilot, Sonnet for the AML triage queue, and Llama 4 for the air-gapped trading-desk workload.
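
Per-workload model choice with fallbacks can be pictured as a small routing table. The sketch below is hypothetical. The model labels are illustrative rather than provider API model IDs, and this is not ibl.ai's configuration format. It shows one design choice: a workload marked VPC-only never falls back to an external API, even when its primary model is down.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical workload-to-model routing table; labels are illustrative only.
@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str | None  # used if the primary is unavailable or withdrawn
    in_vpc_only: bool     # True: requests must never leave the bank's network

# Models assumed to run inside the bank's VPC.
SELF_HOSTED = {"llama-4-self-hosted", "deepseek-r1-self-hosted"}

ROUTES = {
    "advisor_copilot": Route("claude-opus-4.7", "claude-sonnet-4.6", in_vpc_only=False),
    "aml_triage": Route("claude-sonnet-4.6", "llama-4-self-hosted", in_vpc_only=False),
    "trading_desk": Route("llama-4-self-hosted", "deepseek-r1-self-hosted", in_vpc_only=True),
}

def select_model(workload: str, unavailable: frozenset[str] = frozenset()) -> str:
    """Return the first approved, available model for a workload."""
    route = ROUTES[workload]
    for model in (route.primary, route.fallback):
        if model is None or model in unavailable:
            continue
        if route.in_vpc_only and model not in SELF_HOSTED:
            continue  # an air-gapped workload never falls back to an external API
        return model
    raise RuntimeError(f"no approved model available for {workload!r}")

print(select_model("aml_triage"))                                    # primary route
print(select_model("aml_triage", frozenset({"claude-sonnet-4.6"})))  # fallback route
```

Keeping routes as data rather than code also gives a model risk committee a single artifact to review and version when a model is approved or withdrawn.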

## Run the Numbers for Your Bank

For workload sizing and cost modeling for your AML, KYC, and advisory teams, the **[AI Help Desk Cost Savings Calculator](/resources/calculators/ai-help-desk-savings-calculator)** generalizes to most high-volume bank-administrative workloads.

For a side-by-side deployment comparison, including FINRA / SR 11-7 / GLBA posture and air-gapped options for trading and private-client desks, see **[Self-Hosted AI vs ChatGPT Enterprise for Financial Services](/resources/comparisons/self-hosted-ai-vs-chatgpt-enterprise-for-financial-services)**.

For the full SEC / FINRA / SOX / PCI / SR 11-7 aligned architecture (Bloomberg / Refinitiv / FIS integration, model-output versioning, air-gapped tier), read **[Financial Services AI Reference Architecture on ibl.ai](/blog/financial-services-ai-reference-architecture)**.

For the staged deployment recipe — Managed VPC for low-sensitivity workloads + air-gapped for trading and private-client desks — see **[Financial Services Blueprint: Air-Gapped AI in 90 Days](/blog/financial-services-blueprint-air-gapped-ai-90-days)**.

## Why Being Family-Owned and New York-Based Matters Here

The regulatory exposure of a bank's AI vendor relationship is non-trivial — every DPA refresh, every change in data-processing terms, every vendor acquisition is an event the bank's third-party-risk team has to underwrite. ibl.ai is family-owned and operated from New York, NY — a long-term partner with a perpetual platform license and no investor exit pressure. The runtime is open source. The transaction data stays inside the bank's network. The math works at a 500-employee community bank or a 50,000-employee regional.
