
Air-Gapped AI

Run ibl.ai's entire Agentic OS on air-gapped Ubuntu servers with NVIDIA GPUs. Models run locally via NIM, Ollama, or vLLM, with zero external API calls and complete data sovereignty for your organization. No need to choose build vs. buy — you get both.

Air-Gapped AI - Local Models, Maximum Control for Enterprise

Deploy ibl.ai's full Agentic OS on air-gapped infrastructure where no data ever leaves your corporate network. Models run locally on Ubuntu servers with NVIDIA GPUs via NIM, Ollama, or vLLM.

ibl.ai's forward-deployed engineers install the entire stack on your hardware. You get the same AI agent capabilities as our cloud deployment—workforce training, compliance automation, skills development—with zero external API calls and complete data sovereignty.

What This Is

Air-Gapped AI is ibl.ai's on-premise deployment option. The entire Agentic OS—agent runtime, model serving, vector databases, orchestration layer—runs on Ubuntu servers inside your network with no internet connectivity required after initial setup.

Models are served locally through NVIDIA NIM, Ollama, or vLLM on your NVIDIA GPUs. You choose from models by NVIDIA, Meta (Llama), Google (Gemma), Microsoft (Phi), Mistral, and others. Every inference request stays within your security perimeter.
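In practice, all three engines expose an OpenAI-compatible HTTP API on localhost, so agents call local models the same way regardless of which engine is serving them. A minimal sketch (the model tag and port are assumptions — Ollama defaults to 11434, vLLM to 8000; substitute your deployment's values):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "llama3.1:70b",  # assumed local model tag
                       base_url: str = "http://localhost:11434/v1"):
    """Build a chat-completion request against a local OpenAI-compatible
    endpoint. vLLM, Ollama, and NIM all serve this API shape, so the same
    payload works whichever engine hosts the model."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return url, payload

def local_chat(prompt: str, **kwargs) -> str:
    """Send the request. The call never leaves the host: no external API,
    no data crossing the security perimeter."""
    url, payload = build_chat_request(prompt, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the payload shape is engine-agnostic, swapping Llama for Gemma or moving from Ollama to vLLM means changing only the `model` tag and `base_url`, not the agent code.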

ibl.ai's forward-deployed engineers configure the stack, optimize model performance for your hardware, integrate with your enterprise systems, and transfer full operational knowledge to your team.

Every configuration file, every model weight, every integration adapter belongs to your organization.

Why Air-Gapped for Enterprise

Complete Data Sovereignty: No data leaves your network. No API calls to OpenAI, Anthropic, Google, or any external service. Trade secrets, employee data, and proprietary knowledge stay within your security perimeter at all times.
Regulatory Compliance by Architecture: Air-gapped deployment eliminates the compliance complexity of third-party data processing. SOC 2, SOX, HIPAA, and GDPR obligations are simplified when data never crosses a network boundary.
Intellectual Property Protection: Proprietary training materials, competitive intelligence, and internal processes never leave your infrastructure. Employees use AI agents without risking IP exposure to external providers.
Model Choice and Flexibility: Run any open model that fits your GPUs. Switch between Llama, Gemma, Phi, Mistral, or NVIDIA NeMo models without changing agent configurations. No vendor lock-in to any single model provider.
Same Capabilities as Cloud: Air-gapped deployment runs the full ibl.ai Agentic OS. AI mentors, course generation, compliance training, analytics, multi-channel deployment—every feature works identically to the cloud version.

Supported Models and Inference Engines

NVIDIA NIM: GPU-optimized inference microservices for maximum throughput on NVIDIA hardware. Supports Llama, Mistral, and NVIDIA NeMo models with TensorRT-LLM acceleration. Best for high-throughput production workloads.
Ollama: Lightweight model serving for rapid deployment and testing. Supports a broad catalog of open models with simple configuration. Ideal for development environments and smaller-scale deployments.
vLLM: High-performance inference engine with PagedAttention for efficient memory management. Supports continuous batching for maximum GPU utilization. Production-grade serving for large-scale deployments.
Model Catalog: Meta Llama (8B, 70B, 405B), Google Gemma (2B, 7B, 27B), Microsoft Phi (3.5, 4), Mistral (7B, 8x7B, Large), NVIDIA NeMo models, and any Hugging Face-compatible model. New models are added as they are released.

Infrastructure Requirements

Operating System: Ubuntu 22.04 LTS or later. Standard server installation with NVIDIA drivers and CUDA toolkit. No specialized OS or kernel modifications required.
GPU Requirements: NVIDIA GPUs with sufficient VRAM for your chosen models. A single A100 80GB can serve Llama 70B with 8-bit quantization. Smaller models like Phi-3.5 or Gemma 7B run on consumer-grade GPUs. We right-size recommendations to your workload.
Network: No internet connectivity required after initial setup. Internal network access to enterprise systems (HRIS, LMS, IdP) for integrations. All model weights and dependencies are pre-loaded during installation.
Storage: SSD storage for model weights, vector databases, and agent state. Capacity depends on the number of models deployed. Typical installations require 500GB to 2TB of fast storage.
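A rough rule of thumb behind these sizing recommendations: model weights need roughly one byte per parameter per 8 bits of precision, plus runtime overhead for the KV cache and activations. The helper below is a hypothetical back-of-envelope estimator (the ~10% overhead factor is an illustrative assumption, not a measured figure):

```python
def estimate_vram_gb(params_billion: float,
                     bits: int = 16,
                     overhead: float = 1.10) -> float:
    """Rough VRAM needed to serve a model: weight bytes plus ~10% for
    KV cache, activations, and runtime buffers (illustrative only).

    1B parameters at 8 bits is ~1 GB of weights; FP16 doubles that.
    """
    weight_gb = params_billion * bits / 8
    return weight_gb * overhead

# Llama 70B at FP16 (~154 GB) needs multiple GPUs or quantization;
# at 8-bit (~77 GB) it fits on a single A100 80GB.
# Gemma/Phi-class 7B models at FP16 (~15 GB) fit on consumer GPUs.
```

Real requirements vary with context length, batch size, and engine; the assessment phase replaces this estimate with measured numbers for your hardware.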

Security and Compliance

SOC 2 / SOX Ready: On-premise deployment with complete audit trails. All data processing happens within your security boundary. No third-party subprocessors for AI inference.
HIPAA Compliant: For organizations handling protected health information. PHI never leaves your facility. Local model serving eliminates BAA requirements with external AI providers.
GDPR Aligned: Data residency requirements are met by default when all processing happens on your hardware in your jurisdiction. No cross-border data transfers for AI operations.
ITAR Compatible: For defense contractors and organizations handling export-controlled data. No data transmission to external servers. Complete physical and logical isolation.

Deployment Options

Single Server: Entire stack on one Ubuntu server with NVIDIA GPUs. Suitable for departments, business units, or pilot programs. Simple to operate and maintain.
Multi-Node Cluster: Distributed deployment across multiple servers for higher throughput and redundancy. Kubernetes orchestration with Helm charts. Scales to organization-wide usage.
Hybrid (Air-Gapped + Cloud): Sensitive workloads on air-gapped servers, general-purpose agents on ibl.ai cloud. Consistent agent configurations across both environments. Migrate workloads as policies evolve.
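The hybrid option implies a routing decision at request time. A minimal sketch of that idea, assuming workloads carry a sensitivity label (the endpoint URLs and the `Workload` type are hypothetical placeholders, not part of the product's API):

```python
from dataclasses import dataclass

# Placeholder endpoints -- substitute your actual deployment URLs.
AIR_GAPPED_URL = "http://models.internal:8000/v1"   # on-prem vLLM/NIM endpoint
CLOUD_URL = "https://cloud.example.com/v1"           # hypothetical cloud endpoint

@dataclass
class Workload:
    name: str
    sensitivity: str  # "restricted" data must never leave the network

def route(workload: Workload) -> str:
    """Send restricted workloads to the air-gapped cluster and everything
    else to the cloud deployment. Agent configurations can stay identical
    because both ends serve the same OpenAI-compatible API."""
    if workload.sensitivity == "restricted":
        return AIR_GAPPED_URL
    return CLOUD_URL
```

Migrating a workload between environments then reduces to changing its label, which is what makes the "migrate as policies evolve" promise cheap to keep.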

What You Own

Complete Agentic OS installation on your hardware with all agent configurations and model settings documented
Local model weights for all deployed models—pre-downloaded and optimized for your GPU hardware
Inference engine configurations (NIM, Ollama, or vLLM) tuned for your specific hardware and workload
Enterprise system integration adapters (HRIS, LMS, IdP) with full source code
Infrastructure as Code (Ansible/Helm) for repeatable deployments and disaster recovery
Operational runbooks covering model updates, GPU monitoring, backup procedures, and troubleshooting
Security documentation for your compliance team—architecture diagrams, data flow maps, control matrices

Engagement Model

Infrastructure Assessment (1 week): Evaluate your server hardware, GPU inventory, network topology, and integration requirements. Right-size model recommendations to your compute capacity.
Installation and Configuration (2-4 weeks): Forward-deployed engineers install the Agentic OS, configure inference engines, load model weights, build enterprise integrations, and validate the full stack in your environment.
Agent Development (2-3 weeks): Build your first set of AI agents—compliance trainers, onboarding coaches, skills-gap analyzers. Configure guardrails, knowledge bases, and tool integrations specific to your use cases.
Knowledge Transfer (1-2 weeks): Train your IT team on model management, agent configuration, GPU monitoring, and operational procedures. Your team operates independently after handoff.

Get Started

Hardware Assessment: Free 30-minute session to evaluate your existing GPU infrastructure and recommend a deployment configuration.
Proof of Concept: Deploy the Agentic OS on a single server with one or two agents to validate the approach before committing to full-scale deployment.
Full Deployment: Complete air-gapped installation with enterprise integrations, agent library, operational documentation, and knowledge transfer.

What our partners say about us

Chris Gabriel | Google
Lorena Barba | George Washington University
Dr. Juana Mendenhall | Morehouse College
Julie Diop | MIT
Adam Tetelman | Nvidia
Jason Dom | American Public University System
Benjamin Breyer | Columbia University
Ken Fujiuchi | SUNY
Erika Digirolamo | Monroe College
David Flaten | SUNY
David Vise | Modern States Education Alliance
Linda Wood | ARM Institute (U.S. Department of Defense)


Frequently Asked Questions