Operations

What Is Agent Recovery?

Automated detection and correction of agent failures — including container crashes, health check failures, and degraded performance.

In Plain English

Agent recovery is self-healing for your AI agent. When the agent crashes, becomes unresponsive, or fails health checks, the recovery system detects the failure and takes corrective action automatically.

Clawctl implements a three-stage recovery pipeline: first it restarts the container; if that fails, it redeploys from scratch; if that also fails, it alerts the operator. Recovery is rate-limited to 3 attempts per 24 hours so that a crash loop cannot keep consuming resources.
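
The escalation and rate limit described above can be sketched roughly as follows. This is an illustration only, not Clawctl's actual code: the class name `RecoveryPolicy`, the `restart`, `redeploy`, and `alert` callables, and the choice to count one full pass through the pipeline as a single attempt are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ATTEMPTS = 3                  # from the text: 3 attempts...
WINDOW_SECONDS = 24 * 60 * 60     # ...per 24 hours


@dataclass
class RecoveryPolicy:
    """Hypothetical restart -> redeploy -> alert escalation with a rate limit."""
    restart: Callable[[], bool]       # returns True if the restart fixed the agent
    redeploy: Callable[[], bool]      # returns True if the redeploy fixed the agent
    alert: Callable[[str], None]      # notifies the operator
    attempts: List[float] = field(default_factory=list)  # timestamps of past attempts

    def _recent_attempts(self, now: float) -> int:
        # Keep only attempts that fall inside the rolling 24-hour window.
        cutoff = now - WINDOW_SECONDS
        self.attempts = [t for t in self.attempts if t > cutoff]
        return len(self.attempts)

    def recover(self, now: float) -> str:
        # Rate limit: once the budget is spent, stop retrying and alert.
        if self._recent_attempts(now) >= MAX_ATTEMPTS:
            self.alert("auto-recovery exhausted")
            return "alerted"
        self.attempts.append(now)
        # Stage 1: restart the container.
        if self.restart():
            return "restarted"
        # Stage 2: redeploy from scratch.
        if self.redeploy():
            return "redeployed"
        # Stage 3: both automatic stages failed, so hand over to the operator.
        self.alert("restart and redeploy both failed")
        return "alerted"
```

Passing the current time in as `now` (in seconds) rather than reading the clock inside `recover` keeps the sketch easy to test; a real scheduler would supply something like `time.time()`.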

The recovery system runs every 5 minutes, checking container health, process status, and response times. Most failures are resolved automatically before you even notice.
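
As a rough sketch of what one health-check pass might evaluate, the function below combines the three signals mentioned above into a single verdict. The `HealthSnapshot` type, the `failure_reason` function, and the 2-second response threshold are hypothetical and chosen only for illustration; the text does not state Clawctl's real thresholds.

```python
from dataclasses import dataclass
from typing import Optional

CHECK_INTERVAL_SECONDS = 5 * 60   # from the text: checks run every 5 minutes
MAX_RESPONSE_SECONDS = 2.0        # hypothetical threshold, not a Clawctl value


@dataclass
class HealthSnapshot:
    """One observation of an agent, gathered by a (hypothetical) health checker."""
    container_running: bool
    process_alive: bool
    response_seconds: Optional[float]  # None means the agent did not answer


def failure_reason(snapshot: HealthSnapshot) -> Optional[str]:
    """Return why the agent is unhealthy, or None if every check passes."""
    if not snapshot.container_running:
        return "container not running"
    if not snapshot.process_alive:
        return "agent process not running"
    if snapshot.response_seconds is None:
        return "agent unresponsive"
    if snapshot.response_seconds > MAX_RESPONSE_SECONDS:
        return "response time degraded"
    return None
```

A scheduler would call `failure_reason` once per `CHECK_INTERVAL_SECONDS` and start recovery whenever the result is not `None`.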

Why It Matters for OpenClaw

Agents crash. Containers run out of memory. Processes hang. Without auto-recovery, these failures require manual intervention — which means downtime until someone notices and fixes it.

How Clawctl Helps

Clawctl provides automatic agent recovery with restart and redeploy escalation. Health checks run every 5 minutes, recovery is rate-limited to 3 attempts per 24 hours, and the operator is alerted once auto-recovery is exhausted.

Common Questions

How fast does recovery happen?

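Health checks run every 5 minutes, so a failure can go undetected for up to 5 minutes. Once it is detected, the recovery action (restart or redeploy) takes 15-60 seconds. In the worst case, if the first action succeeds, that works out to roughly 6 minutes from failure to recovery.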

What if auto-recovery fails?

After 3 attempts in 24 hours, Clawctl stops retrying and alerts the operator. From that point, manual intervention is required.

Can I disable auto-recovery?

Yes. Suspend the agent through the dashboard to pause recovery. This is useful during maintenance.