Kill Switches, Approval Gates, and Rate Limits: The Three Controls Every Production AI Agent Needs

Your AI agent will try to do something catastrophic. Eventually. Here are the three controls that stop it — and what each one looks like in OpenClaw.

Clawctl Team

Product & Engineering

Your AI agent is going to try to do something catastrophic.

Not maybe. Eventually. Given enough runs, it will attempt to delete production data, wire money to the wrong account, or send an email it absolutely should not send.

This is not pessimism. It's the math of probability applied to a system that makes hundreds of decisions per day with nondeterministic outputs and occasional prompt injection from the inputs it reads.
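To put rough numbers on that, here is a back-of-the-envelope sketch. The per-decision failure rate and the daily decision count are made-up illustrative values, not measurements, and the formula assumes decisions fail independently of each other.

```python
def p_at_least_one(p_per_decision: float, decisions: int) -> float:
    """Chance of at least one bad decision across `decisions` independent tries."""
    return 1.0 - (1.0 - p_per_decision) ** decisions


# Assumptions for illustration only: 1 bad decision in 10,000, 300 decisions per day.
P_BAD = 1e-4
PER_DAY = 300

one_day = p_at_least_one(P_BAD, PER_DAY)
one_year = p_at_least_one(P_BAD, PER_DAY * 365)

print(f"one day:  {one_day:.1%}")
print(f"one year: {one_year:.3%}")
```

Under these assumptions a single day looks safe and a full year does not. Change the inputs and the curve moves, but it only ever moves toward 1 as the number of runs grows.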

The question isn't whether this will happen. The question is what stops it when it does.

Three controls. Every production AI agent needs all three. This post walks through what each one is, how it works in OpenClaw, and what Clawctl gives you out of the box.

Control 1: The Kill Switch

What it is: A big red button that stops the agent immediately. Not pause. Not "finish current task." Stop.

Why you need it: Because when things go wrong, seconds matter. An agent that's sending spam to your customers shouldn't get to finish the batch. An agent that's making API calls in a loop shouldn't wait for the rate limiter to save you.

What it has to do:

  1. Halt the current tool call, mid-execution if possible
  2. Reject all new tool calls until explicitly resumed
  3. Keep the gateway alive so you can investigate what went wrong
  4. Preserve the audit log so you can figure out what the agent was doing
  5. Be triggerable by one click, one CLI command, or one webhook — not a 6-step workflow

What it does not do:

  • Delete state (you need to debug what happened)
  • Kill the container (you lose logs)
  • Require a password reset (too slow)
  • Depend on the agent being responsive (if the agent is hallucinating, it won't cooperate)
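As a minimal sketch of those requirements, assuming every tool call already passes through a single wrapper you control: the class and method names below are hypothetical and are not part of OpenClaw or Clawctl. It covers rejecting new calls, explicit resume, and audit logging. It does not interrupt a call that is already running, which would need cooperation from the tool itself.

```python
import threading
import time


class KillSwitchEngaged(Exception):
    """Raised when a tool call is attempted while the switch is engaged."""


class KillSwitch:
    def __init__(self):
        self._engaged = threading.Event()
        self.audit_log = []  # in memory here; a real system would persist this

    def kill(self, who, reason):
        self._engaged.set()
        self.audit_log.append(
            {"event": "kill", "who": who, "reason": reason, "at": time.time()}
        )

    def resume(self, who):
        # Resuming is always an explicit action; nothing auto-recovers.
        self._engaged.clear()
        self.audit_log.append({"event": "resume", "who": who, "at": time.time()})

    def guard(self, tool_name, tool_fn, *args, **kwargs):
        # The check lives outside the agent, so it does not depend on the
        # agent being responsive or cooperative.
        if self._engaged.is_set():
            self.audit_log.append(
                {"event": "rejected", "tool": tool_name, "at": time.time()}
            )
            raise KillSwitchEngaged(f"{tool_name} rejected: kill switch engaged")
        return tool_fn(*args, **kwargs)
```

The process stays up and the log survives, which is the point: you stop the agent without destroying the evidence.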

The OpenClaw Reality

OpenClaw doesn't ship with a built-in kill switch. You have a few options:

  • docker stop the gateway container. This kills logs in flight and may leave the sandbox in a weird state.
  • Send SIGUSR1 or SIGTERM via docker exec. Better but still not graceful.
  • Write a custom skill that refuses all tool calls when a flag file exists. Graceful but you have to build it.
  • Reach into the gateway RPC and cancel sessions. Works if you've set up the RPC.

None of these is one click.
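The flag-file option is the easiest of these to picture. Here is a generic sketch of the idea, not OpenClaw's skill API: the flag path and the call_tool wrapper are assumptions made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical flag location; pick a path your operators can reach quickly.
KILL_FLAG = Path(tempfile.gettempdir()) / "agent-kill.flag"


def call_tool(tool_fn, *args, flag_path=KILL_FLAG, **kwargs):
    """Run tool_fn unless the kill flag file exists."""
    if Path(flag_path).exists():
        raise RuntimeError(
            f"tool calls are disabled: remove {flag_path} to resume"
        )
    return tool_fn(*args, **kwargs)
```

Creating the file stops new tool calls and deleting it resumes them. You still have to build the part that makes creating the file a single click.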

What Clawctl Ships

Every Clawctl tenant has a kill switch in the dashboard. One click and your agent stops:

  • Gateway enters a rejecting state immediately
  • In-flight tool calls are cancelled where possible
  • Dashboard shows "Paused — user killed"
  • Audit log records who killed it, when, and the reason
  • Resume requires explicit action (not auto-recover)
  • CLI equivalent: clawctl pause
  • Webhook equivalent for integration with incident response tools

One Clawctl customer (a Bay Area educator running OpenClaw for his classroom work) hit the kill switch from the CLI the day he discovered the feature. The pause fired in under 2 seconds. His use case wasn't an emergency — he was just testing. It worked anyway.

That's what a kill switch is supposed to do. You should never doubt whether it works.

Try it on Clawctl Starter.

Control 2: Human-in-the-Loop Approval Gates

What it is: A list of actions the agent is allowed to try, but not allowed to finish, without a human approving them.

Why you need it: Because the agent's judgment is not your judgment. An agent that thinks sending an email to your entire customer list is a good idea is wrong. An agent that thinks rm -rf / is a valid response to disk space issues is very wrong. Some actions are catastrophic enough that the agent should not be allowed to do them alone, ever.

What it has to do:

  1. Intercept tool calls on a configured list of "dangerous" actions
  2. Queue the intercepted action and notify a human
  3. Let the human see exactly what the agent is about to do (not a summary — the actual payload)
  4. Accept an explicit approve/reject decision with a timeout
  5. Log the decision in the audit trail
  6. Not block the agent's other tasks — only the gated one

What it does not do:

  • Approve by default on timeout (fails open — bad)
  • Require a full human review for routine tool calls (approval fatigue kills the control)
  • Approve at the capability level ("allow email") when you need per-action approval ("allow THIS email")
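Here is a rough sketch of the core of such a gate, assuming an in-memory queue and a clock you can inject for testing. Every name in it is hypothetical. Notifications and the review UI are left out, and it is not how Clawctl implements its gates.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingAction:
    tool: str
    payload: dict              # the exact payload the reviewer sees, not a summary
    deadline: float
    decision: str = "pending"  # pending | approved | rejected


@dataclass
class ApprovalGate:
    gated_tools: set
    timeout_s: float = 300.0
    clock: Callable[[], float] = time.monotonic
    pending: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def submit(self, tool, payload):
        """Queue a gated call; ungated tools return None and are never queued."""
        if tool not in self.gated_tools:
            return None
        action = PendingAction(tool, payload, self.clock() + self.timeout_s)
        self.pending.append(action)
        return action

    def _expire(self):
        # A timeout is a rejection, never an approval.
        now = self.clock()
        for action in self.pending:
            if action.decision == "pending" and now >= action.deadline:
                action.decision = "rejected"
                self.audit_log.append((action.tool, "rejected", "timeout"))

    def decide(self, action, approved, reviewer):
        """Record a human decision; a decision made after expiry has no effect."""
        self._expire()
        if action.decision == "pending":
            action.decision = "approved" if approved else "rejected"
            self.audit_log.append((action.tool, action.decision, reviewer))
        return action.decision

    def may_execute(self, action):
        self._expire()
        return action is None or action.decision == "approved"
```

Approval attaches to one queued payload rather than to the tool, so approving one email does not approve the next one.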

The Lenny's Newsletter Case

This isn't theoretical. Last year, an engineer set up an AI agent to triage his email. The agent decided the best way to "clear" his inbox was to archive everything, including unprocessed work items. No approval gate. The damage was contained only because he noticed within an hour.

The full story is in another post — but the lesson is: "the agent can write emails" and "the agent can send emails" are different permissions. One of them needs a human in the loop.

The OpenClaw Reality

OpenClaw supports a tools.allow and tools.deny list in config, plus a plugin system you can use to intercept tool calls. You can build approval gates yourself. We did.

What you have to build:

  • A queue of pending tool calls
  • A notification channel (email, Slack, webhook) for approval requests
  • A UI for humans to review and respond
  • Timeout handling (and a default — which should be reject, not approve)
  • Audit logging of decisions
  • A way to approve patterns vs one-offs

This is 40-80 hours of work done badly. 200+ hours done well.

What Clawctl Ships

Every Clawctl tenant gets an approval gate system by default, pre-configured with 50+ high-risk actions that require human approval:

  • Sending email to external recipients
  • Deleting files in the workspace
  • Publishing to public channels (Twitter, public Discord, public Slack)
  • Executing shell commands that match dangerous patterns
  • Making payments via payment plugins
  • Rotating or revealing credentials
  • Modifying database rows beyond a threshold
  • Calling MCP tools marked "destructive"
  • And 40+ more

Approval requests land in your Clawctl dashboard and can also fire webhooks to Slack or email. Default timeout is reject, not approve. Every decision is logged.

You can tighten the defaults (add more gated actions) or loosen them (remove gates on actions you trust). The list ships hardened and you relax from there.

See our full writeup on 50+ high-risk actions Clawctl blocks.

See it in action on Clawctl Starter.

Control 3: Rate Limits

What it is: A cap on how many times the agent can do a thing per unit of time.

Why you need it: Because runaway loops are the most common AI agent failure mode. The agent decides to retry a failing API call. The retry also fails. It retries again. Five minutes later you've made 10,000 API calls, burned your budget, and been rate-limited by upstream providers — all of whom think you're attacking them.

What it has to do:

  1. Limit requests per tool per session
  2. Limit tool calls per minute globally
  3. Limit egress requests to any single domain
  4. Limit LLM token usage per session
  5. Short-circuit with a clear error when a limit is hit (so the agent learns, instead of hammering)
  6. Reset on session restart or time window rollover
  7. Alert when limits are hit unusually often (that's a signal)
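A per-tool sliding-window limiter covers the first few items on that list. This is a minimal sketch under assumptions: the class name, the limits dictionary, and the injectable clock are all hypothetical, and it keeps state in memory for a single process.

```python
import time
from collections import defaultdict, deque


class RateLimitExceeded(Exception):
    """Raised instead of running the tool, so the agent sees a clear error."""


class ToolRateLimiter:
    def __init__(self, limits, window_s=60.0, clock=time.monotonic):
        self.limits = limits              # e.g. {"send_email": 20}
        self.window_s = window_s
        self.clock = clock
        self._calls = defaultdict(deque)  # tool name -> timestamps in the window
        self.hits = defaultdict(int)      # how often each limit fired, for alerting

    def check(self, tool):
        limit = self.limits.get(tool)
        if limit is None:
            return  # no limit configured for this tool
        now = self.clock()
        calls = self._calls[tool]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= limit:
            self.hits[tool] += 1
            raise RateLimitExceeded(
                f"{tool}: limit of {limit} calls per {self.window_s:.0f}s reached"
            )
        calls.append(now)
```

Call check(tool) before each tool call. The hits counter is the raw material for the "limits are hit unusually often" alert.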

The Runaway Loop

Here's the classic runaway loop:

  1. Agent calls send_email(to, subject, body)
  2. Email provider returns a 429 rate limit error
  3. Agent decides to retry
  4. Agent reads the error, thinks "maybe I should try sending it again with a slight wording change"
  5. Agent calls send_email again
  6. GOTO 2
  7. Three hours later, the email provider has blacklisted your domain and you owe your vendor an explanation

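Rate limits break the loop at step 5: the repeated send_email call is rejected locally, with a clear error, before it ever reaches the provider.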
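To make the loop concrete, here is a toy simulation. The provider, the agent, and the cap are all stand-ins written for this example. The fake provider always answers 429 and the fake agent never gives up on its own.

```python
class LocalLimitHit(Exception):
    pass


def make_capped(tool_fn, max_calls):
    """Wrap tool_fn so it can run at most max_calls times."""
    calls = 0

    def capped(*args, **kwargs):
        nonlocal calls
        if calls >= max_calls:
            raise LocalLimitHit(f"call cap of {max_calls} reached")
        calls += 1
        return tool_fn(*args, **kwargs)

    return capped


provider = {"requests": 0}


def send_email(to, subject, body):
    # Stand-in for a provider that is already rate limiting us.
    provider["requests"] += 1
    return 429


def naive_agent(send, max_steps=10_000):
    """Retry with a new wording every time; stop only on success or a local cap."""
    attempt = 0
    while attempt < max_steps:
        attempt += 1
        try:
            status = send("a@example.com", "Hi", f"wording v{attempt}")
        except LocalLimitHit:
            return attempt
        if status == 200:
            return attempt
    return attempt


attempts = naive_agent(make_capped(send_email, max_calls=3))
print(attempts, provider["requests"])  # prints: 4 3
```

Without the cap the same agent would send all 10,000 requests to the provider. With it, the provider sees three.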

The OpenClaw Reality

OpenClaw doesn't have native per-tool rate limits. You can hack it with middleware. You can put the rate limits in your egress proxy. You can rely on the upstream providers to say no.

None of these catches the loop early enough.

What Clawctl Ships

Clawctl enforces rate limits at multiple layers:

  • Per-tool limits configured per tenant (e.g., 20 emails per hour)
  • Per-domain egress limits via the Squid egress proxy
  • Session LLM token budgets that stop the agent when burned through
  • Global tenant budgets that escalate to human approval on high usage
  • Alerts fire when any of the above trigger more than once per hour (that's your "the agent is stuck" signal)

Defaults are tuned to prevent runaway loops without blocking legitimate bursty work. You can adjust per tenant.
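The session token budget is the simplest of these layers to sketch. The version below is an illustration with hypothetical names, not Clawctl's implementation. It assumes you record token usage after each LLM call from whatever usage figures your provider returns.

```python
class BudgetExhausted(Exception):
    pass


class SessionTokenBudget:
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, prompt_tokens, completion_tokens):
        """Record usage from one LLM call and return the tokens remaining.

        Raises once the session has spent more than its budget, so the
        caller can stop the agent instead of making another call.
        """
        self.used += prompt_tokens + completion_tokens
        if self.used > self.max_tokens:
            raise BudgetExhausted(
                f"session used {self.used} of {self.max_tokens} tokens"
            )
        return self.max_tokens - self.used
```

Because usage is only known after a call returns, the call that crosses the line still gets paid for. The budget stops the next one.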

Human-in-the-Loop Is Not Optional

"Human in the loop AI" is an 880-searches-per-month keyword, growing 10-15% a month on trend data. That's because the conversation is shifting. Every serious AI agent post-CertiK-report has human-in-the-loop controls as a requirement, not a feature.

Your insurance carrier will eventually require them. Your security team already does. Your compliance team will ask you to prove them when it's time for SOC 2 or HIPAA.

Clawctl treats human-in-the-loop as the default posture, not a configuration option. Every dangerous action is gated from day one. You can relax the controls as you build trust in specific agent behaviors. But you start safe.

The Short Version

You need three controls to run an AI agent in production:

  1. Kill switch — one button that stops everything, instantly
  2. Approval gates — destructive actions require human review
  3. Rate limits — cap per-tool and per-domain usage to prevent runaway loops

If you're building these yourself in OpenClaw, you're looking at weeks of work to get any one of them production-ready. All three is a quarter of engineering time at best.

If you're paying Clawctl, you get all three by default on day one. For $49/month.

Deploy on Clawctl — all three controls, configured out of the box, for less than the cost of one incident.

Or build them yourself. Just do it before you need them. Not after.

FAQ

What's the difference between human-in-the-loop and human-on-the-loop?

Human-in-the-loop: a human is required to approve specific actions before they execute. Human-on-the-loop: a human monitors and can intervene, but the agent acts autonomously by default. For high-stakes actions (sending emails, moving money, deleting data), you want human-in-the-loop. For low-stakes (reading data, searching, summarizing), human-on-the-loop is enough.

How do I decide which actions need approval gates?

Start with anything that is (a) externally visible, (b) irreversible, or (c) touches credentials or money. Emails to external recipients. File deletions. Payments. Credential rotations. Public posts. That's your day-one list. Add more as you discover them. Never remove a gate because "it's annoying" — that's how the control dies.
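That three-part test is easy to write down. In the sketch below the trait names and the action catalogue are hypothetical examples; the rule itself is the one described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionTraits:
    externally_visible: bool = False
    irreversible: bool = False
    touches_credentials_or_money: bool = False


def needs_approval(traits: ActionTraits) -> bool:
    """Gate anything externally visible, irreversible, or touching credentials or money."""
    return (
        traits.externally_visible
        or traits.irreversible
        or traits.touches_credentials_or_money
    )


# Hypothetical catalogue; classify your own tools the same way.
ACTIONS = {
    "send_external_email": ActionTraits(externally_visible=True, irreversible=True),
    "delete_file": ActionTraits(irreversible=True),
    "make_payment": ActionTraits(irreversible=True, touches_credentials_or_money=True),
    "rotate_credential": ActionTraits(touches_credentials_or_money=True),
    "public_post": ActionTraits(externally_visible=True),
    "search_docs": ActionTraits(),
}

day_one_gates = sorted(name for name, t in ACTIONS.items() if needs_approval(t))
print(day_one_gates)
```

Writing the traits down per action also gives you something to review when someone asks to remove a gate.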

Do rate limits prevent prompt injection?

Not directly. But they contain the blast radius. With a cap of 50 requests per domain, a prompt injection that tries to exfiltrate data via 10,000 HTTP requests gets stopped at request 51. A prompt injection that tries to burn your budget via a loop of LLM calls gets stopped when the session budget is exhausted. Rate limits don't stop the attack; they stop the damage.
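A per-domain egress cap shows the arithmetic. This sketch assumes a cap of 50 requests per domain per session, a hypothetical EgressCap class, and a made-up attacker URL. It counts requests only and has no time window.

```python
from collections import Counter
from urllib.parse import urlsplit


class EgressCap:
    def __init__(self, max_per_domain=50):
        self.max_per_domain = max_per_domain
        self.counts = Counter()

    def allow(self, url):
        """Return True and count the request, or False once the domain is capped."""
        host = urlsplit(url).hostname or ""
        if self.counts[host] >= self.max_per_domain:
            return False
        self.counts[host] += 1
        return True


cap = EgressCap(max_per_domain=50)
results = [
    cap.allow(f"https://attacker.example/leak?chunk={i}") for i in range(10_000)
]
print(sum(results))               # prints: 50
print(results.index(False) + 1)   # prints: 51
```

Fifty requests still leave the building, so the cap limits how much leaks rather than preventing the leak.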

Can I use Clawctl's kill switch via API?

Yes. clawctl pause via CLI, webhook via integration URL, or API call. Kill switches need to be triggerable from whatever tool your incident response team already uses — Slack, PagerDuty, OpsGenie, whatever. We support them.

What happens to pending approval requests if nobody responds?

They time out and are rejected. This is intentional. A timeout is not an approval. If you want longer timeouts for slow-response channels, configure them per action. But we never fail open.


This content is for informational purposes only and does not constitute financial, legal, medical, tax, or other professional advice. Individual results vary. See our Terms of Service for important disclaimers.
