Automated detection and correction of agent failures — including container crashes, health check failures, and degraded performance.
Agent recovery is self-healing for your AI agent. When the agent crashes, becomes unresponsive, or fails health checks, the recovery system detects the failure and takes corrective action automatically.
Clawctl implements a three-stage recovery pipeline: first restart the container; if that fails, redeploy from scratch; if that also fails, alert the operator. Recovery is rate-limited to 3 attempts per 24 hours so that crash loops do not consume resources.
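The escalation order described above can be sketched in a few lines. This is an illustrative sketch only, not Clawctl's actual implementation: the `recover` function, the `Outcome` enum, and the three callables are hypothetical names, and each callable is assumed to return `True` when its action brings the agent back to a healthy state.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    """Which stage of the (hypothetical) pipeline ended the recovery."""
    RESTARTED = "restarted"
    REDEPLOYED = "redeployed"
    ESCALATED = "escalated"


def recover(
    restart: Callable[[], bool],
    redeploy: Callable[[], bool],
    alert: Callable[[str], None],
) -> Outcome:
    """Try the cheapest fix first and escalate only when it fails."""
    # Stage 1: restart the existing container.
    if restart():
        return Outcome.RESTARTED
    # Stage 2: redeploy from scratch.
    if redeploy():
        return Outcome.REDEPLOYED
    # Stage 3: hand the problem to a human.
    alert("automatic recovery failed; manual intervention required")
    return Outcome.ESCALATED


# Example: the restart fails, the redeploy succeeds, nobody is alerted.
alerts: list = []
result = recover(lambda: False, lambda: True, alerts.append)
```

Ordering the stages from cheapest to most disruptive means a simple restart is always tried before the agent is rebuilt, and the operator is contacted only when both automatic steps have failed.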
The recovery system runs every 5 minutes, checking container health, process status, and response times. Most failures are resolved automatically before you even notice.
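As a rough illustration of how one health check might combine the three signals named above, the sketch below classifies a single probe result. The `Probe` type, the `needs_recovery` function, and the 10-second response threshold are assumptions made for this example; only the 5-minute interval comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

CHECK_INTERVAL_SECONDS = 5 * 60   # interval stated in the text
MAX_RESPONSE_SECONDS = 10.0       # hypothetical threshold, not from Clawctl


@dataclass(frozen=True)
class Probe:
    """One health check result (hypothetical shape)."""
    container_running: bool
    process_alive: bool
    response_seconds: Optional[float]  # None means the agent never answered


def needs_recovery(probe: Probe,
                   max_response_seconds: float = MAX_RESPONSE_SECONDS) -> bool:
    """Return True when any of the three signals indicates a failure."""
    if not probe.container_running or not probe.process_alive:
        return True   # crashed container or dead process
    if probe.response_seconds is None:
        return True   # unresponsive agent
    return probe.response_seconds > max_response_seconds  # degraded


healthy = Probe(container_running=True, process_alive=True, response_seconds=0.4)
hung = Probe(container_running=True, process_alive=True, response_seconds=None)
```

Here `needs_recovery(healthy)` is `False` and `needs_recovery(hung)` is `True`; a scheduler would run such a check once every `CHECK_INTERVAL_SECONDS`.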
Agents crash. Containers run out of memory. Processes hang. Without auto-recovery, these failures require manual intervention — which means downtime until someone notices and fixes it.
Clawctl provides automatic agent recovery with restart-then-redeploy escalation, a 5-minute health check interval, a limit of 3 recovery attempts per 24 hours, and operator alerts when automatic recovery is exhausted.
Health checks run every 5 minutes. Recovery action (restart or redeploy) takes 15-60 seconds.
After 3 recovery attempts within 24 hours, automatic recovery is exhausted and Clawctl alerts the operator. Manual intervention is then required.
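One way to enforce a limit of 3 attempts per 24 hours is a sliding-window counter, sketched below. Whether Clawctl uses a sliding window or a fixed daily window is not stated, so the windowing choice is an assumption, and `RecoveryBudget` and `try_acquire` are hypothetical names.

```python
from collections import deque


class RecoveryBudget:
    """Allow at most max_attempts recoveries in any window_seconds span."""

    def __init__(self, max_attempts: int = 3,
                 window_seconds: float = 24 * 60 * 60) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = deque()  # timestamps of recent attempts, oldest first

    def try_acquire(self, now: float) -> bool:
        """Record an attempt at time `now` if the budget allows it."""
        # Forget attempts that have aged out of the window.
        while self._attempts and now - self._attempts[0] >= self.window_seconds:
            self._attempts.popleft()
        if len(self._attempts) >= self.max_attempts:
            return False  # budget exhausted: alert the operator instead
        self._attempts.append(now)
        return True


budget = RecoveryBudget()
# Three attempts an hour apart are allowed; a fourth is refused.
granted = [budget.try_acquire(t) for t in (0, 3600, 7200, 10800)]
```

When `try_acquire` returns `False`, the caller would skip the restart or redeploy and alert the operator. Passing the current time in as an argument, rather than reading the clock inside the method, keeps the sketch easy to test.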
Recovery can be paused: suspend the agent through the dashboard to stop automatic recovery, which is useful during maintenance.
Health Checks
Automated probes that verify an AI agent is running, responsive, and functioning correctly at regular intervals.
Agent Monitoring
Real-time observation of AI agent behavior, performance, and health — including conversation quality, error rates, and resource usage.
Agent Suspension
Temporarily disabling an AI agent so it stops processing messages and executing actions, without destroying its configuration or data.
Agent Deployment
The process of provisioning infrastructure, configuring security, and launching an AI agent into a production environment.