How to Cut Your AI Agent Costs by 60% (Without Sacrificing Security)
Your AI agent is burning $500/month on tasks a $0.02 model could handle.
I watched a startup's Anthropic bill for three months. $800/month. For one agent. Doing what? Background heartbeats. Session check-ins. Simple "are you still there?" pings.
They were running Claude Sonnet for everything. Including the tasks that needed zero intelligence.
That is not a technology problem. That is a configuration problem. And Clawctl solves it out of the box.
The Real Cost of AI Agents
Let's talk numbers. Current model pricing (February 2026):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Sonnet 4 | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4o Mini | $0.15 | $0.60 |
| Ollama (local) | $0.00 | $0.00 |
That is a 12x difference between Sonnet and Haiku, on both input and output.
And an infinite difference when you run locally with Ollama.
Most teams do not realize they are paying $15 per million output tokens for tasks that a $1.25 model handles fine. Or tasks that need no API at all.
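If you want to compare any two rows of the pricing table yourself, a minimal sketch follows. The prices are copied from the table above; the `price_ratio` helper is a name made up for this illustration.

```python
# Per-million-token prices in USD, copied from the pricing table above.
# Each value is (input price, output price).
PRICES = {
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "GPT-4o Mini": (0.15, 0.60),
}


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) between two models in PRICES."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


if __name__ == "__main__":
    pairs = [("Claude Sonnet 4", "Claude 3 Haiku"), ("GPT-4o", "GPT-4o Mini")]
    for expensive, cheap in pairs:
        in_ratio, out_ratio = price_ratio(expensive, cheap)
        print(f"{expensive} vs {cheap}: {in_ratio:.1f}x input, {out_ratio:.1f}x output")
```

Comparing input to input and output to output keeps the ratio honest; mixing the two columns inflates it.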
The 80/20 of Agent Costs
Here is what I have seen after reviewing dozens of OpenClaw deployments:
80% of agent activity does not need frontier models:
- Heartbeats (background check-ins)
- Session warming
- Simple routing decisions
- Health monitoring
- Basic text summarization
- Triage and classification
20% actually needs the expensive stuff:
- Complex multi-step reasoning
- Code generation and review
- Image understanding
- Nuanced writing
- Tool orchestration
The founders burning $800/month? They were sending every heartbeat—every 30 minutes, 24/7—to Claude Sonnet.
That is 1,440 heartbeats per month. At roughly $0.50 per heartbeat.
$720/month. On pings.
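Here is that arithmetic as a short sketch, plus one extra step: how many input tokens a $0.50 call would imply at Sonnet's input price. The helper names are invented for this example, and the token estimate ignores output tokens, so treat it as a rough upper bound rather than a measurement.

```python
# USD per 1M input tokens for Claude Sonnet 4, from the pricing table.
SONNET_INPUT_PER_MTOK = 3.00


def heartbeats_per_month(interval_minutes: int, days: int = 30) -> int:
    """Number of heartbeats fired in a month at a fixed interval."""
    return (24 * 60 // interval_minutes) * days


def implied_input_tokens(cost_per_call: float, price_per_mtok: float) -> int:
    """Input tokens that would produce cost_per_call, ignoring output tokens."""
    return round(cost_per_call / price_per_mtok * 1_000_000)


if __name__ == "__main__":
    calls = heartbeats_per_month(30)
    print(f"{calls} heartbeats/month, ${calls * 0.50:.2f} at $0.50 each")
    tokens = implied_input_tokens(0.50, SONNET_INPUT_PER_MTOK)
    print(f"$0.50 per call implies about {tokens:,} input tokens")
```

A $0.50 heartbeat works out to roughly 167,000 input tokens per call, which suggests each ping was dragging a large context along. That is an inference from the arithmetic, not a measured figure.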
Strategy 1: Heartbeat Optimization
Heartbeats are background check-ins. They keep sessions warm, verify the agent is responsive, and maintain connection state.
They need to work. They do not need to be smart.
The math:
| Frequency | Checks/Month | Claude Haiku | Ollama Local |
|---|---|---|---|
| Every 10 min | 4,320 | ~$1.30 | $0.00 |
| Every 30 min | 1,440 | ~$0.43 | $0.00 |
| Every 1 hour | 720 | ~$0.22 | $0.00 |
Compare that to Claude Sonnet at those frequencies:
| Frequency | Checks/Month | Claude Sonnet |
|---|---|---|
| Every 10 min | 4,320 | ~$15.60 |
| Every 30 min | 1,440 | ~$5.16 |
| Every 1 hour | 720 | ~$2.64 |
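The Haiku column can be approximated with a small estimator. The article does not state how many tokens a heartbeat uses, so the token counts below are an assumption, picked because they roughly reproduce the Haiku figures. Swap in your own numbers from your provider's usage logs.

```python
# USD per 1M tokens for Claude 3 Haiku, from the pricing table.
HAIKU_INPUT_PER_MTOK = 0.25
HAIKU_OUTPUT_PER_MTOK = 1.25

# ASSUMPTION: tokens per heartbeat are not given in the article.
# These values are illustrative only.
ASSUMED_INPUT_TOKENS = 1_000
ASSUMED_OUTPUT_TOKENS = 40


def checks_per_month(interval_minutes: int, days: int = 30) -> int:
    """Number of checks in a month at a fixed interval."""
    return (24 * 60 // interval_minutes) * days


def cost_per_check(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float) -> float:
    """USD cost of one call, given token counts and per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_haiku_cost(interval_minutes: int) -> float:
    """Monthly heartbeat cost on Haiku under the assumed token counts."""
    per_check = cost_per_check(ASSUMED_INPUT_TOKENS, ASSUMED_OUTPUT_TOKENS,
                               HAIKU_INPUT_PER_MTOK, HAIKU_OUTPUT_PER_MTOK)
    return checks_per_month(interval_minutes) * per_check


if __name__ == "__main__":
    for minutes in (10, 30, 60):
        print(f"every {minutes} min: ${monthly_haiku_cost(minutes):.2f}/month")
```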
Clawctl's default: Heartbeats go to Claude 3 Haiku automatically. You can switch to Ollama for zero cost.
The configuration is one field in the dashboard. Or a few lines in openclaw.json:
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "30m",
        "model": "anthropic/claude-3-haiku"
      }
    }
  }
}
Strategy 2: Local Models with Ollama
Ollama runs LLMs locally. No API calls. No per-token billing. The models are good enough for most operational tasks.
Model availability by plan:
| Plan | RAM | CPU | Local Models Available |
|---|---|---|---|
| Starter | 2GB | 1 | Cloud-only (not enough for local) |
| Team | 4GB | 2 | Small (up to 7B): glm-4.7-flash, llama3.2:3b, qwen2.5:1.5b |
| Business | 8GB | 4 | Medium (up to 13B): +llama3.1:8b, mistral:7b, deepseek-coder |
| Enterprise | 16GB | 8 | Large (30B+): +llama3.1:70b, mixtral:8x7b, qwen2.5:72b |
Models that work well for agent operations:
- glm-4.7-flash — Fast, Chinese/English, great for simple tasks (Team+)
- llama3.2 — Meta's latest, solid general performance (Team+)
- qwen2.5 — Alibaba's model, excellent for coding tasks (Team+)
- deepseek-coder — Specialized for code, surprisingly capable (Business+)
- mixtral:8x7b — Mixture of experts, strong reasoning (Enterprise)
These are not toy models. They are production-ready for:
- Heartbeats
- Log summarization
- Simple routing
- Health checks
- Notification formatting
Clawctl integration:
Ollama appears in your provider list automatically on Team+ plans. No API key needed.
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "30m",
        "model": "ollama/glm-4.7-flash"
      }
    }
  }
}
Your heartbeat cost: $0.00/month (Team+ plans).
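To confirm which model your heartbeats actually hit, you could read the config back. The sketch below assumes only the `agents.defaults.heartbeat` shape shown above. The helper names are hypothetical, and treating the `ollama/` prefix as "local" is an assumption based on the model ID in the example, not a documented rule.

```python
import json
from typing import Optional

# Same shape as the openclaw.json snippet above.
CONFIG_TEXT = """
{
  "agents": {
    "defaults": {
      "heartbeat": {"every": "30m", "model": "ollama/glm-4.7-flash"}
    }
  }
}
"""


def heartbeat_model(config: dict) -> Optional[str]:
    """Return agents.defaults.heartbeat.model, or None if it is not set."""
    heartbeat = config.get("agents", {}).get("defaults", {}).get("heartbeat", {})
    return heartbeat.get("model")


def is_local(model: str) -> bool:
    """ASSUMPTION: model IDs starting with 'ollama/' run locally."""
    return model.startswith("ollama/")


if __name__ == "__main__":
    model = heartbeat_model(json.loads(CONFIG_TEXT))
    if model is None:
        print("No heartbeat model set; the platform default applies.")
    elif is_local(model):
        print(f"Heartbeats run locally on {model}.")
    else:
        print(f"Heartbeats are billed per token on {model}.")
```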
Strategy 3: Task-Based Model Routing
Smart teams route different tasks to different models. Here is the playbook:
| Task | Best Quality | Best Value |
|---|---|---|
| Brain/Chat | Claude Sonnet 4 | kimi-k2.5 |
| Heartbeat | Claude 3 Haiku | ollama/glm-4.7-flash |
| Coding | Codex GPT-5.2 | minimax-m2.1 |
| Web Browsing | Claude Sonnet 4 | deepseek-v3 |
| Writing | Claude Sonnet 4 | kimi-k2.5 |
| Image Understanding | Claude Sonnet 4 | gemini-2.5-flash |
The "best value" column is not about being cheap. It is about matching capability to requirement.
Heartbeats do not need reasoning. Do not pay for reasoning.
Code review needs depth. Pay for depth.
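In code, the playbook is just a lookup table. The sketch below uses three rows from the table above. The task keys, the `Tier` enum, and the `pick_model` function are invented for this illustration and are not Clawctl's API. Falling back to the frontier model for unknown tasks is a design choice made here: an unclassified task is more likely to need reasoning than a known-cheap one.

```python
from enum import Enum


class Tier(Enum):
    QUALITY = "quality"
    VALUE = "value"


# Three rows from the playbook table; the task keys are hypothetical names.
ROUTES = {
    "heartbeat": {Tier.QUALITY: "Claude 3 Haiku", Tier.VALUE: "ollama/glm-4.7-flash"},
    "web_browsing": {Tier.QUALITY: "Claude Sonnet 4", Tier.VALUE: "deepseek-v3"},
    "image_understanding": {Tier.QUALITY: "Claude Sonnet 4", Tier.VALUE: "gemini-2.5-flash"},
}

# Unknown tasks fall back to the frontier model rather than a cheap one.
DEFAULT_MODEL = "Claude Sonnet 4"


def pick_model(task: str, tier: Tier = Tier.VALUE) -> str:
    """Return the model for a task at the requested tier."""
    route = ROUTES.get(task)
    if route is None:
        return DEFAULT_MODEL
    return route[tier]


if __name__ == "__main__":
    print(pick_model("heartbeat"))
    print(pick_model("web_browsing", Tier.QUALITY))
    print(pick_model("something_new"))
```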
The Security Trade-off (There Is Not One)
"But if I optimize for cost, will I not sacrifice security?"
No. Cost optimization and security are orthogonal.
Clawctl gives you both:
Cost optimization:
- Heartbeat routing to cheap models
- Ollama integration for zero-cost operations
- Per-task model configuration
Security (built-in, not optional):
- 70+ high-risk actions blocked by default
- Human-in-the-loop approvals
- Network egress control (Squid proxy, domain allowlists)
- Full audit trail with search and export
- Encrypted secrets vault
- Prompt injection defenses
The $800/month startup was not just overspending on models. They also had their agent exposed on Shodan with plaintext API keys.
Cost optimization did not cause that. Bad defaults did.
Clawctl's defaults are secure AND cost-optimized.
Real Numbers: Before and After
Let us walk through a real optimization:
Before (raw OpenClaw, default everything):
| Category | Model | Monthly Cost |
|---|---|---|
| Heartbeats (30m) | Claude Sonnet | $720 |
| Chat/reasoning | Claude Sonnet | $50 |
| Misc operations | Claude Sonnet | $30 |
| Total | | $800/mo |
After (Clawctl with optimized routing):
| Category | Model | Monthly Cost |
|---|---|---|
| Heartbeats (30m) | Ollama local | $0 |
| Chat/reasoning | Claude Sonnet | $50 |
| Misc operations | Claude Haiku | $2 |
| Total | | $52/mo |
Savings: $748/month (93.5%)
The agent does the same work. Same availability. Same functionality.
The difference is routing heartbeats and simple operations to appropriate models.
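The totals are easy to check. This sketch uses the monthly figures from the two tables above; the dictionary keys and the `savings` helper are names chosen for the example.

```python
# Monthly costs in USD, taken from the before/after tables above.
BEFORE = {"heartbeats": 720, "chat_reasoning": 50, "misc_operations": 30}
AFTER = {"heartbeats": 0, "chat_reasoning": 50, "misc_operations": 2}


def savings(before: dict, after: dict) -> tuple[float, float]:
    """Return (USD saved per month, percent saved) between two cost breakdowns."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    saved = total_before - total_after
    return saved, saved / total_before * 100


if __name__ == "__main__":
    saved, percent = savings(BEFORE, AFTER)
    print(f"Before: ${sum(BEFORE.values())}/mo, after: ${sum(AFTER.values())}/mo")
    print(f"Saved: ${saved}/mo ({percent:.1f}%)")
```

Nearly all of the saving comes from the heartbeat row. The chat/reasoning spend does not change at all.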
Why This Matters for Managed Deployments
If you are self-hosting OpenClaw, you can configure all of this manually.
But here is what happens in practice:
- You deploy with defaults (expensive models for everything)
- You do not notice until the bill arrives
- You spend a weekend configuring model routing
- You forget to update it when you add new agents
- The next agent inherits the expensive defaults
Clawctl ships with cost-optimized defaults:
- Heartbeats go to Claude Haiku (configurable to Ollama)
- Simple operations use cheap models
- Complex reasoning uses your choice of frontier model
Every tenant gets these defaults. Every new agent inherits them. You can override per-agent if needed.
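One common way to get "inherit the defaults, override per agent" is a recursive merge of the agent's settings over the defaults. The sketch below illustrates that general pattern only. How Clawctl actually resolves overrides is not described here, and the `deep_merge` helper and the override dictionary are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: base with override applied, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the override value replaces the default.
            merged[key] = deepcopy(value)
    return merged


# Defaults shaped like the heartbeat config shown earlier in the article.
DEFAULTS = {"heartbeat": {"every": "30m", "model": "anthropic/claude-3-haiku"}}

# Hypothetical per-agent override: change the model, keep the interval.
AGENT_OVERRIDE = {"heartbeat": {"model": "ollama/glm-4.7-flash"}}

if __name__ == "__main__":
    effective = deep_merge(DEFAULTS, AGENT_OVERRIDE)
    print(effective)
```

The agent only states what differs. Everything else, including the 30-minute interval, comes from the defaults, so a new agent with no overrides never lands on an expensive model by accident.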
And you get security at the same time:
- Network egress locked to approved domains
- Audit logs with 7-365 day retention
- Human approvals for high-risk actions
- Encrypted secrets vault
No trade-off. Cost optimization AND security. Built into the platform.
The Bottom Line
42,665 exposed OpenClaw instances were found in January 2026.
93.4% were vulnerable.
Many of those were also overpaying for API costs. Bad defaults compound.
Clawctl exists because:
- Security should not require expertise. Hardened openclaw.json is generated automatically.
- Cost optimization should not require constant attention. Heartbeats go to cheap models by default.
- Deploying should take 60 seconds, not a weekend. One command. Secure. Cost-optimized.
Your agent can do amazing things. Do not let it burn money on heartbeats while exposing your API keys.
Deploy OpenClaw with built-in cost optimization
$49/mo. Cheaper than one month of heartbeats on the wrong model.