# OpenClaw Server Setup Step-by-step guide for deploying an OpenClaw gateway on a VPS (e.g. Hetzner Cloud) and connecting it to ibl.ai as a chat runner. > Source: [github.com/iblai/iblai-claw-setup](https://github.com/iblai/iblai-claw-setup) --- ## Architecture ``` Student (browser) → ibl.ai Platform (Django Channels / ASGI) │ ClawLLMRunner │ OpenClawClient (WSS + Ed25519 device identity signing) │ Caddy (on host, TLS via Let's Encrypt) │ reverse proxy to localhost:18789 ▼ OpenClaw Gateway (systemd service, loopback only) │ LLM Provider (Anthropic, etc.) ``` **Why Caddy on the host (not Docker):** Caddy must run directly on the host so that TCP connections to OpenClaw arrive from `127.0.0.1`. This preserves loopback auto-approval for device identity. If Caddy ran in a Docker container, it would connect via Docker bridge (172.x.x.x) and OpenClaw would treat it as a remote connection. **Why device identity signing:** On vanilla OpenClaw, the gateway requires Ed25519 device identity in the WebSocket connect handshake. Without it, connections succeed but the gateway grants **zero scopes** -- effectively treating the client as unauthenticated. This is the root cause of "missing scope: operator.read" failures. The platform backend signs each connect with its own Ed25519 keypair. --- ## Prerequisites Before starting, you need: 1. **A VPS or dedicated server** -- Hetzner CX22 (2 vCPU, 4 GB RAM, ~$4/mo) is sufficient. OpenClaw is lightweight; the LLM API call is the bottleneck, not local compute. Use the Ashburn location for US East proximity. 2. **A domain or subdomain** pointing to the server's **actual IP** (not an elastic IP -- see [Snags Reference](#snags-reference)). 3. **Anthropic API key** (or another LLM provider key). 4. **Ports 80 and 443 open** on your cloud firewall **before** installing Caddy. ### Critical: DNS and firewall must be ready first Let's Encrypt ACME challenges will fail if: 1. DNS points to an elastic IP that isn't routing to the actual server 2. Port 443 is not open on the cloud firewall (only port 80 was initially opened) After 5 failed attempts, Let's Encrypt rate-limits the domain for 1 hour. **All three** of these must be correct before Caddy's first start: - DNS A record → server's real IP (verify with `dig your-domain.example.com +short`) - Port 80 open inbound from `0.0.0.0/0` (for `http-01` ACME challenge) - Port 443 open inbound (for `tls-alpn-01` fallback and actual HTTPS traffic) Also: don't toggle firewall rules while Caddy is retrying -- each failed attempt counts against the rate limit. --- ## Part 1: Install OpenClaw ### 1.1 -- SSH in and install Node.js 22 ```bash ssh root@ # Install Node.js 22 (skip if already installed -- check with node --version) curl -fsSL https://deb.nodesource.com/setup_22.x | bash - apt-get install -y nodejs node --version # should show v22.x.x ``` ### 1.2 -- Install OpenClaw ```bash npm install -g openclaw@latest openclaw --version ``` ### 1.3 -- Generate a gateway token ```bash export OPENCLAW_GATEWAY_TOKEN=$(openssl rand -hex 32) echo "$OPENCLAW_GATEWAY_TOKEN" ``` **Save this token** -- you need it when connecting to the ibl.ai platform. Immediately persist it to `~/.bashrc` so CLI commands work in future SSH sessions. Running `openclaw devices list` in a new SSH session will fail with `MissingEnvVarError: Missing env var "OPENCLAW_GATEWAY_TOKEN"` if the token was only exported in the original shell: ```bash echo "export OPENCLAW_GATEWAY_TOKEN=$OPENCLAW_GATEWAY_TOKEN" >> ~/.bashrc ``` ### 1.4 -- Write the full config Writing the full config upfront skips the interactive onboarding wizard entirely. ```bash mkdir -p ~/.openclaw cat > ~/.openclaw/openclaw.json << 'CONF' { "meta": { "lastTouchedVersion": "" }, "wizard": { "lastRunVersion": "", "lastRunCommand": "onboard", "lastRunMode": "local" }, "auth": { "profiles": { "anthropic:default": { "provider": "anthropic", "mode": "api_key" } } }, "agents": { "defaults": { "model": { "primary": "anthropic/claude-sonnet-4-6" }, "workspace": "/root/.openclaw/workspace" } }, "commands": { "native": "auto", "nativeSkills": "auto", "restart": true, "ownerDisplay": "raw" }, "session": { "dmScope": "per-channel-peer" }, "gateway": { "port": 18789, "mode": "local", "bind": "loopback", "controlUi": { "allowedOrigins": [ "https://YOUR-DOMAIN-HERE" ] }, "auth": { "mode": "token", "token": "${OPENCLAW_GATEWAY_TOKEN}" }, "tailscale": { "mode": "off", "resetOnExit": false } } } CONF ``` Replace `` with the output of `openclaw --version` (e.g. `2026.3.13`). Replace `YOUR-DOMAIN-HERE` with your actual domain. Change the model in `agents.defaults.model.primary` if needed (OpenClaw normalizes date-stamped IDs to short aliases, e.g. `claude-sonnet-4-20250514` → `claude-sonnet-4-6`). The `wizard` and `meta` fields tell OpenClaw that onboarding already ran, so `openclaw onboard` won't re-prompt. The `session.dmScope: "per-channel-peer"` is a security best practice for multi-user (each DM conversation gets its own session scope). **Optional: model fallbacks** -- to prevent hard failures when the primary LLM provider has an outage, add fallback models: ```json "model": { "primary": "anthropic/claude-sonnet-4-6", "fallbacks": ["anthropic/claude-haiku-4-5", "openai/gpt-5"] } ``` This is especially recommended for multi-agent setups where the probability of hitting an API error scales with the number of agents. ### 1.5 -- Set the Anthropic API key ```bash export ANTHROPIC_API_KEY= echo "export ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" >> ~/.bashrc ``` ### 1.6 -- Create systemd service and start ```bash # Create workspace directory mkdir -p /root/.openclaw/workspace # Enable lingering so user services survive SSH logout loginctl enable-linger root # Start the gateway openclaw gateway --port 18789 & # Or use the onboard wizard just for the systemd service: # openclaw onboard --install-daemon # (It will detect existing config and skip most prompts) ``` Verify the gateway is running: ```bash curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:18789/ # Expected: 200 ``` > **Note:** If you want the systemd service auto-created, you can still run `openclaw onboard --install-daemon` -- it will detect the existing config ("Use existing values"), skip most prompts, and just install the service at `~/.config/systemd/user/openclaw-gateway.service`. > **Why `loginctl enable-linger root`?** OpenClaw installs a **user-level** systemd service. Without lingering, the service dies when the last SSH session closes. The `openclaw onboard --install-daemon` wizard handles this automatically, but if you skip the wizard you must run it yourself. Verify with: `loginctl show-user root 2>/dev/null | grep Linger` -- should show `Linger=yes`. See [Snag #11](#snags-reference) for what happens when this is missed. --- ## Part 2: Install Caddy (Reverse Proxy + TLS) ### 2.1 -- Install Caddy ```bash apt install -y debian-keyring debian-archive-keyring apt-transport-https curl curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' \ | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \ | tee /etc/apt/sources.list.d/caddy-stable.list apt update && apt install caddy ``` ### 2.2 -- Configure Caddyfile ```bash cat > /etc/caddy/Caddyfile << 'EOF' your-domain.example.com { handle /api/status { rewrite * / reverse_proxy localhost:18789 } reverse_proxy localhost:18789 } EOF systemctl restart caddy systemctl status caddy ``` The `/api/status` rewrite shim maps the platform's health check path to `/` (the OpenClaw Control UI page), which returns 200 when the gateway is up. Vanilla OpenClaw has no `/api/status` endpoint -- this shim maintains compatibility with the ibl.ai platform's connectivity checks. After restart, Caddy will automatically obtain a Let's Encrypt TLS certificate. Check the logs if it doesn't work: ```bash journalctl -u caddy --no-pager -n 50 ``` ### 2.3 -- Control UI origin allowlist OpenClaw's Control UI only allows connections from the gateway's own host (localhost) by default. If the Control UI shows *"origin not allowed (open the Control UI from the gateway host or allow it in gateway.controlUi.allowedOrigins)"*, the config needs updating. The full config in step 1.4 already includes `controlUi.allowedOrigins` with your domain. If you wrote a minimal config instead, or need to add origins after the fact: ```bash openclaw config set gateway.controlUi.allowedOrigins '["https://your-domain.example.com"]' systemctl --user restart openclaw-gateway ``` --- ## Part 3: Firewall ### Cloud firewall (Hetzner, AWS, etc.) Set these rules in your cloud provider's firewall console: | Direction | Protocol | Port | Source | Purpose | |---|---|---|---|---| | Inbound | TCP | 22 | Management IPs | SSH | | Inbound | TCP | 80 | `0.0.0.0/0` | ACME challenge (Let's Encrypt) | | Inbound | TCP | 443 | `0.0.0.0/0` or allowlist | HTTPS (Caddy → OpenClaw) | **If restricting port 443 to specific IPs**, you must include: - The **ibl.ai platform server's outbound IP** -- find it with `curl -s ifconfig.me` from the platform server - **Your own IP** -- for Control UI browser access - Any **VPN egress IPs** used by your team If the cloud firewall restricts port 443 to specific IPs and a user's IP isn't in the allowlist, the browser will show `ERR_CONNECTION_TIMED_OUT`. Dev containers also won't reach the server unless connected to a VPN with an allowlisted IP. ### Host firewall (UFW) ```bash ufw allow 22/tcp ufw allow 80/tcp ufw allow 443/tcp ufw --force enable ``` Both the cloud firewall (network level, outside the server) and UFW (host level) must allow traffic for it to reach Caddy. --- ## Part 4: Validate ### 4.1 -- Health check ```bash curl -s -o /dev/null -w "%{http_code}" https://your-domain.example.com/api/status # Expected: 200 ``` ### 4.2 -- Control UI Open in browser: `https://your-domain.example.com/?token=` The first browser access through Caddy will show "pairing required". Browser devices connecting through the reverse proxy are not auto-approved -- only loopback connections are. Approve the browser device: ```bash # On the server: openclaw devices list openclaw devices approve ``` Each browser profile generates a unique device ID. This is a one-time step per browser. Do **not** use `dangerouslyDisableDeviceAuth` -- the docs call it a "severe security downgrade" and it only affects the Control UI, not programmatic WebSocket connections. ### 4.3 -- Chat test In the Control UI, send a test message. You should get a response from the configured LLM. **Full stack confirmed:** Browser → Caddy (TLS/Let's Encrypt) → OpenClaw Gateway → Anthropic API. --- ## Part 5: Connect to ibl.ai ### 5.1 -- Register claw instance Register through the ibl.ai platform API. See [Platform Integration](platform-integration.md#register-your-instance) for the full API reference. ### 5.2 -- Generate and store device keypair The ibl.ai platform backend needs an Ed25519 keypair for device identity signing. Without it, config push will fail with "missing scope: operator.read/write/admin". Generate one: ```python from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey from cryptography.hazmat.primitives.serialization import Encoding, NoEncryption, PrivateFormat key = Ed25519PrivateKey.generate() pem = key.private_bytes(Encoding.PEM, PrivateFormat.PKCS8, NoEncryption()).decode() print(pem) ``` Store the private key in the claw instance's `connection_params`: ```json { "device_identity": { "private_key_pem": "-----BEGIN PRIVATE KEY-----\n\n-----END PRIVATE KEY-----\n" } } ``` Set this via the ibl.ai platform API (PATCH the instance's `connection_params` field) or through your admin interface. **How it works:** `OpenClawClient.connect()` receives a `connect.challenge` from the gateway, signs it with the Ed25519 key using the `v2` payload format (`v2|deviceId|clientId|clientMode|role|scopes|signedAtMs|token|nonce`), and includes the `device` object in the connect params. Fresh keypairs are auto-approved on loopback -- no manual `openclaw devices approve` step needed for backend connections. Each connect signs fresh (no session token caching). ### 5.3 -- Push config Push configuration through the ibl.ai platform API. See [Platform Integration](platform-integration.md#push-configuration-to-the-instance). Verify in logs that the push completed without "missing scope" errors. A successful push sets `agents.files.set` (IDENTITY.md, SOUL.md), `config.get`, and `config.patch` -- the gateway restarts itself after config.patch. ### 5.4 -- Test chat through the platform 1. Open the mentor in any ibl.ai application (Mentor AI, Skills AI, etc.) 2. Select the claw-backed mentor 3. Send a test message (e.g. "Hello, say hi in 5 words") 4. Verify: - "Connected." acknowledgment appears - Response streams in token-by-token - Response completes (EOS received) - Message persists on page refresh (chat history saved) --- ## Multi-Agent Setup (Optional) The default config creates a single agent (`main`). To run multiple agents on the same gateway (e.g. tutor, course-creator, admissions), add them via the CLI: ```bash openclaw agents add tutor-agent openclaw agents add course-creator-agent ``` Each agent gets its own workspace (`~/.openclaw/workspace-`) and agent directory (`~/.openclaw/agents//agent`). The agents appear in `agents.list` in `openclaw.json`. You can also add them by editing the config directly. **Note:** More agents means more concurrent LLM API calls, which increases the chance of hitting provider rate limits or outages. Consider adding model fallbacks (see [Step 1.4](#14----write-the-full-config)) if running multiple agents. --- ## Keeping OpenClaw Updated Check for updates periodically: ```bash openclaw --version # current version openclaw update # update to latest ``` The gateway logs a notice on startup when an update is available. After updating, restart the service: ```bash systemctl --user restart openclaw-gateway ``` **Caution:** OpenClaw updates may wipe the paired devices list, requiring re-pairing. See [Device Re-Pairing](#device-re-pairing-after-gateway-restarts--updates). Back up `~/.openclaw/` before major version upgrades. --- ## Monitoring and Diagnostics ### Live log tailing Run in separate SSH sessions to watch both during operation: ```bash # Gateway logs (WebSocket connects, chat requests, Anthropic API errors) journalctl --user -u openclaw-gateway -f # Caddy logs (incoming HTTPS requests, TLS issues) journalctl -u caddy -f ``` ### What to look for in gateway logs | Log pattern | Meaning | |---|---| | `protocol 3` | WebSocket handshake succeeded | | `chat.send` | Chat request sent to LLM provider | | `error` / `ECONNREFUSED` | Anthropic API call failed (key issue, rate limit, outage) | | `close 4008` | WebSocket proxy issue | | `missing scope` | Device identity signing not working -- check keypair config | ### Quick health checks (no restart needed) ```bash # Gateway alive? curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:18789/ # Expected: 200 # Structured health status openclaw health --json # Connected devices openclaw devices list # Caddy + TLS working? curl -s -o /dev/null -w "%{http_code}" https://your-domain.example.com/api/status # Expected: 200 # Anthropic key still valid? curl -s -o /dev/null -w "%{http_code}" \ -H "x-api-key: $ANTHROPIC_API_KEY" \ -H "anthropic-version: 2023-06-01" \ https://api.anthropic.com/v1/models # Expected: 200 # Disk/memory df -h / && free -h ``` ### Verbose logging (not recommended during live use) ```bash openclaw config set OPENCLAW_LOG_LEVEL debug systemctl --user restart openclaw-gateway # Remember to set back to info after debugging: # openclaw config set OPENCLAW_LOG_LEVEL info ``` --- ## Device Re-Pairing After Gateway Restarts / Updates ### The problem OpenClaw gateway updates (`npm install -g openclaw@latest`) or gateway restarts can wipe the paired devices list. When this happens, the platform backend's device identity is no longer recognized by the gateway, and **all mentors** on that server fail with `PAIRING_REQUIRED` / `NOT_PAIRED` errors. Users see: _"The mentor is starting up, please wait..."_ → _"The mentor is currently unavailable. Please try again later."_ This affects **all mentors** linked to the server -- the device identity is per claw instance, not per mentor. One re-pairing fixes all mentors on that server. ### How to re-pair manually 1. **Trigger a connection attempt** -- send any message to any mentor linked to the affected server. This creates a pending pairing request on the gateway. 2. **SSH into the OpenClaw server** and approve: ```bash # List devices -- look for the "Pending" section openclaw devices list # Approve the pending request (use the requestId, NOT the device ID) openclaw devices approve ``` 3. **Retry the chat** -- the next message should connect successfully. All mentors on this server are now fixed. ### Why loopback auto-approval doesn't work The design intent was that Caddy (on the same host) proxies to OpenClaw at `localhost:18789`, so connections arrive from `127.0.0.1` and are auto-approved. However, Caddy adds `X-Forwarded-For` headers with the remote client's IP, and OpenClaw uses those to determine the "real" client IP. Since the platform backend connects from a remote server, OpenClaw sees a non-loopback IP and requires manual approval. ### Solution options (for review and automation phase) The goal is a **robust, non-fragile solution** so that gateway restarts and OpenClaw updates do not require manual re-pairing. The options below are proposed for evaluation -- no implementation is specified here. #### A. OpenClaw upstream (preferred if feasible) - **A1. Persist paired devices across restarts and version upgrades** -- OpenClaw stores paired devices in `~/.openclaw/devices.json` (or equivalent) and loads this file on startup. - **A2. Config-based trusted device registration** -- `gateway.trustedDevices` or `gateway.trustedDevicePublicKeys` in config, survives restarts because it lives in config. - **A3. Admin API for device approval** -- `POST /api/admin/devices/approve` protected by gateway token. #### B. Platform-side automation - **B1. Health check detects NOT_PAIRED and alerts** -- WebSocket connect attempt in health check, notify admins on failure. - **B2. Admin action: "Trigger re-pair"** -- triggers connect attempt and shows the requestId. - **B3. Automated re-pair via agent on OpenClaw host** -- a small service that auto-approves known devices. #### C. Infrastructure / deployment - **C1. Backup and restore `devices.json`** -- backup before update, restore after. - **C2. External persistence (e.g. R2 / S3)** -- persist device state externally. #### D. Reverse-proxy behavior (Caddy) - **D1. Strip forwarded headers so OpenClaw sees loopback** -- Caddy strips `X-Forwarded-For` and `X-Real-Ip`. OpenClaw sees `127.0.0.1` and auto-approves. Tradeoff: no real client IP in gateway logs. **Recommendation:** Pursue **A1** and/or **A2** with the OpenClaw project so device pairing survives restarts by design. Short term, manual re-pair plus **B1** (detect and alert) reduces silent outage. ### Device identity scope - Device identity is per **claw instance** (stored in `connection_params.device_identity.private_key_pem`) - All mentors linked to the same server share the same device - One re-pairing approval covers all mentors on that server - Each server with a different keypair needs its own pairing --- ## Snags Reference Issues encountered during initial deployments, collected here for quick reference. | # | Issue | Root cause | Fix | |---|---|---|---| | 1 | Let's Encrypt ACME challenges fail | DNS pointed to elastic IP not routing to server; port 443 not open | Point DNS to actual server IP; open ports 80+443 before Caddy starts | | 2 | Let's Encrypt rate limit (1 hour) | 5 failed ACME attempts from the above | Wait for cooldown; don't toggle firewall while Caddy retries | | 3 | Control UI "origin not allowed" | OpenClaw only allows localhost origins by default | `openclaw config set gateway.controlUi.allowedOrigins '["https://..."]'` | | 4 | Control UI "pairing required" | Browser device not auto-approved through reverse proxy | `openclaw devices approve ` (one-time per browser) | | 5 | Browser `ERR_CONNECTION_TIMED_OUT` | Cloud firewall restricting port 443; user IP not in allowlist | Add IP to cloud firewall allowlist | | 6 | `OPENCLAW_GATEWAY_TOKEN` not found in new SSH sessions | Token only exported in original shell | Add `export OPENCLAW_GATEWAY_TOKEN=...` to `~/.bashrc` | | 7 | Config push "missing scope: operator.read" | `OpenClawClient` was omitting device identity from connect handshake | Implement Ed25519 device signing (see [Part 5.2](#52----generate-and-store-device-keypair)) | | 8 | Dev container can't reach server | Cloud firewall restricts port 443; dev IP not allowlisted | Connect via VPN with allowlisted IP, or broaden firewall rule | | 9 | Model ID mismatch | OpenClaw normalizes `claude-sonnet-4-20250514` → `claude-sonnet-4-6` | Use short alias in agent config | | 10 | `NOT_PAIRED` after gateway update | Update/restart wiped paired devices; Caddy forwards `X-Forwarded-For` so auto-approval doesn't work | Manual re-pair (see [Device Re-Pairing](#device-re-pairing-after-gateway-restarts--updates)) | | 11 | Gateway dies when SSH session ends | `loginctl enable-linger root` was skipped during manual setup | Run `loginctl enable-linger root` (see [Step 1.6](#16----create-systemd-service-and-start)). Verify with `loginctl show-user root 2>/dev/null \| grep Linger` | --- Next: **[Connect to the ibl.ai Platform →](platform-integration.md)**