# Air-Gapped AI - Local Models, Maximum Control for Government

> Source: https://ibl.ai/service/air-gapped-ai/government


Deploy ibl.ai's full Agentic OS on air-gapped infrastructure where no data ever leaves your agency enclave. Models run locally on Ubuntu servers with NVIDIA GPUs via NIM, Ollama, or vLLM.

ibl.ai's forward-deployed engineers install the entire stack on your hardware. You get the same AI agent capabilities as our cloud deployment—mission support, workforce training, citizen services—with zero external API calls, complete data sovereignty, and ATO-boundary preservation.

## What This Is


Air-Gapped AI is ibl.ai's on-premises deployment option. The entire Agentic OS—agent runtime, model serving, vector databases, orchestration layer—runs on Ubuntu servers inside your enclave with no internet connectivity required after initial setup.

Models are served locally through NVIDIA NIM, Ollama, or vLLM on your NVIDIA GPUs. You choose from models by NVIDIA, Meta (Llama), Google (Gemma), Microsoft (Phi), Mistral, and others. Every inference request stays within your security perimeter and ATO boundary.

ibl.ai's forward-deployed engineers configure the stack, optimize model performance for your hardware, integrate with your agency systems, and transfer full operational knowledge to your team.

Every configuration file, every model weight, every integration adapter belongs to your agency.

## Why Air-Gapped for Government

### Complete Data Sovereignty

No data leaves your enclave. No API calls to external AI providers. Mission data, personnel records, and classified information stay within your security perimeter at all times.

### ATO Boundary Preservation

Air-gapped deployment keeps AI within your existing Authorization to Operate boundary. No new external connections to authorize. No cloud dependencies to document. Simplifies security assessment.

### Classification-Ready

Suitable for IL4, IL5, and higher environments. Models run entirely within your enclave. No data exfiltration risk through external API calls. Compatible with cross-domain solutions.

### Model Choice and Flexibility

Run any open model that fits your GPUs. Switch between Llama, Gemma, Phi, Mistral, or NVIDIA NeMo models without changing agent configurations. No vendor lock-in to any single model provider.

### Same Capabilities as Cloud

Air-gapped deployment runs the full ibl.ai Agentic OS. AI agents for training, mission support, citizen services, analytics—every feature works identically to the cloud version.

## Supported Models and Inference Engines

### NVIDIA NIM

GPU-optimized inference microservices for maximum throughput on NVIDIA hardware. Supports Llama, Mistral, and NVIDIA NeMo models with TensorRT-LLM acceleration. Best for high-throughput production workloads.

### Ollama

Lightweight model serving for rapid deployment and testing. Supports a broad catalog of open models with simple configuration. Ideal for development environments and smaller-scale deployments.

### vLLM

High-performance inference engine with PagedAttention for efficient memory management. Supports continuous batching for maximum GPU utilization. Production-grade serving for large-scale deployments.
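All three engines can expose an OpenAI-compatible HTTP endpoint inside the enclave, which keeps agent code portable across them. A minimal client sketch using only the standard library; the base URL, port, and model name are placeholders for your deployment, not fixed product values:

```python
import json
import urllib.request

# Placeholder for the local inference endpoint inside your enclave.
# vLLM, NIM, and Ollama can all serve an OpenAI-compatible /v1 API.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a chat-completion request against a local inference server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(model: str, prompt: str) -> str:
    """Send the request; requires a running model server inside the enclave."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request shape is the same across engines, switching from Ollama in development to vLLM or NIM in production is a configuration change, not a code change.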

### Model Catalog

Meta Llama (8B, 70B, 405B), Google Gemma (2B, 7B, 27B), Microsoft Phi (3.5, 4), Mistral (7B, 8x7B, Large), NVIDIA NeMo models, and any Hugging Face-compatible model. New models are added as they are released.

## Infrastructure Requirements

### Operating System

Ubuntu 22.04 LTS or later. Standard server installation with NVIDIA drivers and CUDA toolkit. No specialized OS or kernel modifications required.

### GPU Requirements

NVIDIA GPUs with sufficient VRAM for your chosen models. A single A100 80GB serves Llama 70B with weight quantization; at full FP16 precision the 70B model needs roughly 140GB of weight memory spread across multiple GPUs. Smaller models like Phi-3.5 or Gemma 7B run on consumer-grade GPUs. We right-size recommendations to your workload.
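As a back-of-envelope sizing rule, weight memory is roughly parameter count times bytes per parameter, plus headroom for the KV cache and activations. A sketch of that arithmetic; the 20% overhead factor is an assumed ballpark, not a measured figure:

```python
def estimate_vram_gb(params_billion: float, bits_per_param: int, overhead: float = 0.2) -> float:
    """Estimate serving VRAM: weight memory plus a rough overhead factor
    for KV cache and activations (the factor is an assumption, not measured)."""
    weights_gb = params_billion * bits_per_param / 8  # 1B params at 8 bits = 1 GB
    return round(weights_gb * (1 + overhead), 1)

# Llama 70B at FP16: ~140 GB of weights alone -- needs multiple GPUs.
print(estimate_vram_gb(70, 16))  # → 168.0
# Quantized to 4 bits, the same model fits a single A100 80GB.
print(estimate_vram_gb(70, 4))   # → 42.0
# Gemma 7B at FP16 fits a 24 GB consumer-grade GPU.
print(estimate_vram_gb(7, 16))   # → 16.8
```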

### Network

No internet connectivity required after initial setup. Internal network access to agency systems (HRIS, LMS, IdP) for integrations. All model weights and dependencies are pre-loaded during installation.
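Because weights and dependencies enter the enclave on physical media rather than over a network, verifying file integrity against checksums computed on the outside is standard practice for air-gapped transfers. A minimal sketch using stdlib hashing; the manifest format here is illustrative, not ibl.ai's actual tooling:

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so large model
    weights never need to be loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths whose on-disk hash does not match the
    manifest (an empty list means the transfer is intact)."""
    return [rel for rel, expected in manifest.items()
            if sha256_file(root / rel) != expected]
```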

### Storage

SSD storage for model weights, vector databases, and agent state. Capacity depends on the number of models deployed. Typical installations require 500GB to 2TB of fast storage.

## Security and Compliance

### ITAR Compatible

Air-gapped deployment meets ITAR requirements for agencies handling export-controlled data. No data transmission to foreign servers. Complete physical and logical isolation.

### FedRAMP / NIST 800-53 Aligned

On-premises deployment within your ATO boundary. All NIST 800-53 controls are addressable locally. No shared infrastructure, no multi-tenant risks, no cloud provider dependencies.

### IL4/IL5 Ready

Designed for deployment in Impact Level 4 and 5 enclaves. All AI processing stays within the enclave boundary. Compatible with existing cross-domain and data guard architectures.

### Continuous Monitoring

Local audit logging integrates with your SIEM and continuous monitoring infrastructure. Every agent interaction, model inference, and tool invocation is captured for security review.
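Audit events are easiest for a SIEM to ingest when emitted as structured JSON lines, one per interaction. A sketch of such a record; the field names are a hypothetical schema for illustration, not ibl.ai's actual log format:

```python
import json
from datetime import datetime, timezone

def audit_record(agent: str, user: str, action: str, model: str, detail: str) -> str:
    """Serialize one agent interaction as a JSON line for SIEM ingestion.
    Field names are an illustrative schema, not a fixed product format."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "user": user,
        "action": action,   # e.g. "inference" or "tool_invocation"
        "model": model,
        "detail": detail,
    }
    return json.dumps(event)
```

Emitting one self-describing line per event lets existing continuous-monitoring pipelines index agent activity without a custom parser.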

## Deployment Options

### Single Server (Agency Enclave)

Entire stack on one Ubuntu server with NVIDIA GPUs in your enclave. Suitable for program offices, mission units, or pilot programs. Simple to operate within existing security controls.

### Multi-Node Cluster

Distributed deployment across multiple servers for higher throughput and redundancy. Kubernetes orchestration with Helm charts. Scales to agency-wide usage within your enclave.

### Hybrid (Enclave + GovCloud)

Classified workloads run in your enclave while unclassified agents run in GovCloud. Agent configurations, behavior, and security policies stay consistent across both sides of the cross-domain boundary.

## What You Own

- Complete Agentic OS installation on your hardware, with all agent configurations and model settings documented
- Local model weights for all deployed models, pre-downloaded and optimized for your GPU hardware
- Inference engine configurations (NIM, Ollama, or vLLM) tuned for your specific hardware and workload
- Agency system integration adapters (HRIS, training systems, IdP) with full source code
- Infrastructure as Code (Ansible/Helm) for repeatable deployments and disaster recovery
- Operational runbooks covering model updates, GPU monitoring, backup procedures, and troubleshooting
- ATO documentation support: architecture diagrams, data flow maps, NIST 800-53 control matrices

## Engagement Model

### Infrastructure Assessment (1 week)

Evaluate your server hardware, GPU inventory, enclave network topology, and integration requirements. Right-size model recommendations to your compute capacity and ATO boundary.

### Installation and Configuration (2-4 weeks)

Forward-deployed engineers install the Agentic OS, configure inference engines, load model weights, build agency integrations, and validate the full stack within your enclave.

### Agent Development (2-3 weeks)

Build your first set of AI agents—workforce trainers, mission assistants, citizen-service aids. Configure guardrails, knowledge bases, and tool integrations specific to your mission.

### Knowledge Transfer (1-2 weeks)

Train your operations team on model management, agent configuration, GPU monitoring, and security procedures. Your team operates independently after handoff.

## Get Started

### Hardware Assessment

Free 30-minute session to evaluate your existing enclave infrastructure, GPU capacity, and ATO requirements.

### Proof of Concept

Deploy the Agentic OS on a single server within your security boundary to validate the approach before committing to full-scale deployment.

### Agency Deployment

Complete air-gapped installation with agency integrations, agent library, ATO documentation support, operational procedures, and knowledge transfer.

---

*[View on ibl.ai](https://ibl.ai/service/air-gapped-ai/government)*