# Save Tokens: Use Codex as the Muscle, Opus as the Brain
Claude Opus is the best model for OpenClaw. Everyone agrees.
It's also expensive. And even on the $200/mo Claude Max plan, you'll hit limits.
Here's the solution: don't use Opus for everything.
## The Problem
You're using your agent heavily. Morning briefs. Research. Coding. Analysis.
By mid-month, you're hitting rate limits. Or burning through API credits.
The issue: every task—simple or complex—uses the same expensive model.
Website summary? Opus. Simple calculation? Opus. Quick file read? Opus. Complex analysis? Opus.
That's wasteful.
## The Mental Model
Think of your agent as having a brain and muscles:
Brain (Claude Opus):
- Strategic thinking
- Complex analysis
- Decision making
- Planning and orchestration
Muscles (Codex, local models):
- Coding tasks
- Simple summaries
- File operations
- Repetitive work
The brain decides what to do. The muscles do the work.
## Setting Up Multi-Model Routing

### Step 1: Install Codex CLI
If you have a ChatGPT Plus subscription, you have access to Codex CLI.
Install it and connect it to your OpenClaw.
### Step 2: Tell your agent about the new muscle
I've installed Codex CLI. From now on, whenever you need to:
- Write code
- Edit files
- Run commands
- Do repetitive tasks
Use Codex instead of doing it yourself. You handle the planning and decisions.
Codex handles the execution.
### Step 3: Watch your token usage drop
Opus now only handles the thinking. Codex does the typing.
## What Gets Routed Where
| Task Type | Model | Why |
|---|---|---|
| Planning what to build | Opus | Requires judgment |
| Actually writing code | Codex | Execution, not thinking |
| Complex analysis | Opus | Requires understanding |
| Simple summaries | Sonnet/local | Just compression |
| File operations | Codex | Mechanical task |
| Decision making | Opus | Core capability |
| Research synthesis | Opus | Connecting ideas |
| Formatting output | Sonnet/local | Simple transform |
The rule: if it requires judgment, Opus. If it requires typing, something cheaper.
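The table above can be expressed as a simple lookup. Here is a minimal Python sketch — the task-type labels are illustrative, not an OpenClaw API:

```python
# Illustrative routing table: task type -> model tier.
# The labels mirror the table above; they are not OpenClaw identifiers.
ROUTING = {
    "planning": "opus",            # requires judgment
    "write_code": "codex",         # execution, not thinking
    "complex_analysis": "opus",    # requires understanding
    "summary": "sonnet",           # just compression
    "file_ops": "codex",           # mechanical task
    "decision": "opus",            # core capability
    "research_synthesis": "opus",  # connecting ideas
    "format_output": "sonnet",     # simple transform
}

def pick_model(task_type: str) -> str:
    """Judgment goes to Opus; typing goes to something cheaper.

    Unknown task types default to Opus so nothing important gets downgraded.
    """
    return ROUTING.get(task_type, "opus")
```

Note the failure mode is deliberate: when the router doesn't recognize a task, it errs on the expensive side rather than handing judgment work to a weak model.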
## Local Models for Simple Tasks
Take it further with local models:
Install Ollama. Your OpenClaw can install and manage local models automatically; just tell it: "Set up Ollama with a local model."
Use it for:
- Website summaries
- Simple text extraction
- Format conversion
- Quick lookups
Save API credits for complex tasks.
One user reported:
> "Just had OpenClaw set up Ollama with a local model. Now it handles website summaries and simple tasks locally instead of burning API credits. Blown away that an AI just installed another AI to save me money."
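Under the hood, a local summary is just a call to Ollama's `/api/generate` endpoint. Here is a minimal sketch, assuming a default Ollama install on `localhost:11434` and an already-pulled model (the model name is only an example, and the `opener` parameter is an invented hook so the function can be exercised without a live server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def summarize_local(text, model="llama3.2", url=OLLAMA_URL, opener=None):
    """Summarize text with a local Ollama model -- no API credits burned.

    `opener` is injectable purely so this sketch is testable offline;
    by default it uses urllib against the local Ollama server.
    """
    payload = json.dumps({
        "model": model,
        "prompt": f"Summarize the following in three bullet points:\n\n{text}",
        "stream": False,  # ask for one JSON object, not a token stream
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with (opener or urllib.request.urlopen)(req) as resp:
        return json.loads(resp.read())["response"]
```

Every call like this is a call that never touches your Anthropic or OpenAI bill.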
## The Cascade Pattern
Set up a cascade for task routing:
When you receive a task, evaluate complexity:
1. Simple/mechanical → Local model or Codex
2. Moderate complexity → Claude Sonnet
3. High complexity → Claude Opus
Default to the cheapest option that can handle the task.
Only escalate when necessary.
This is how you run 24/7 without burning $500/mo on API calls.
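The cascade can be sketched as a router that defaults cheap and escalates only when a complexity check says so. The scoring heuristic below is made up for illustration; a real agent would classify the task itself:

```python
def complexity_score(task: str) -> int:
    """Crude illustrative heuristic: count signals that a task needs judgment.

    A real router would use richer signals; this only shows the cascade shape.
    """
    judgment_words = ("decide", "plan", "analyze", "architect", "tradeoff")
    return sum(word in task.lower() for word in judgment_words)

def route(task: str) -> str:
    """Default to the cheapest tier; escalate only when necessary."""
    score = complexity_score(task)
    if score == 0:
        return "local-or-codex"   # simple / mechanical
    if score == 1:
        return "sonnet"           # moderate complexity
    return "opus"                 # high complexity
```

The important property is the default: anything that doesn't positively signal "judgment" stays on the cheap tier.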
## Coding Workflow Example
Here's how a proactive coding session should work:
Opus (brain):
- Reviews your project
- Identifies what to build
- Creates a plan
- Breaks into tasks
Codex (muscle):
- Writes the code
- Creates the files
- Runs the tests
- Makes the PR
Opus (brain):
- Reviews the result
- Suggests improvements
- Decides if done
Opus touches the task twice. Codex does all the heavy lifting.
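The loop above can be sketched with the model calls injected as plain functions. Nothing here is a real OpenClaw or Codex API — `plan_fn`, `execute_fn`, and `review_fn` stand in for an Opus call and a shell-out to the Codex CLI:

```python
def coding_session(project_state, plan_fn, execute_fn, review_fn):
    """Brain-muscle loop: Opus plans and reviews; Codex executes each task.

    The three callables are placeholders for real model calls.
    """
    tasks = plan_fn(project_state)              # Opus touch #1: make the plan
    results = [execute_fn(t) for t in tasks]    # Codex does the heavy lifting
    return review_fn(results)                   # Opus touch #2: review & decide

# Toy stand-ins so the shape is visible without any model calls:
verdict = coding_session(
    "todo app, missing tests",
    plan_fn=lambda state: ["write unit tests", "wire up CI"],
    execute_fn=lambda task: f"done: {task}",
    review_fn=lambda results: "ship it"
    if all(r.startswith("done") for r in results) else "revise",
)
```

However many tasks the plan contains, Opus is still invoked exactly twice.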
## Token Savings Math
Without multi-model routing:
- 10 coding sessions/day
- ~50k tokens each
- 500k tokens/day
- At Opus rates: expensive
With multi-model routing:
- Same 10 sessions
- Opus: ~5k tokens (planning)
- Codex: ~45k tokens (execution)
- 90% of tokens on cheaper model
Real-world reports: 60-80% cost reduction.
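Written out, the arithmetic looks like this. The token counts are the illustrative numbers above; the per-token prices are placeholders (substitute your real rates), with the cheap tier assumed ~10x cheaper:

```python
SESSIONS_PER_DAY = 10
TOKENS_PER_SESSION = 50_000

# Placeholder prices per 1k tokens -- NOT real rates, substitute your own.
PRICE_OPUS = 0.03
PRICE_CHEAP = 0.003  # assumes the cheap tier is ~10x cheaper

def daily_cost(opus_tokens, cheap_tokens):
    return (opus_tokens / 1000) * PRICE_OPUS + (cheap_tokens / 1000) * PRICE_CHEAP

total = SESSIONS_PER_DAY * TOKENS_PER_SESSION           # 500k tokens/day
all_opus = daily_cost(total, 0)                         # everything on Opus
routed = daily_cost(SESSIONS_PER_DAY * 5_000,           # Opus: planning only
                    SESSIONS_PER_DAY * 45_000)          # Codex: execution
savings = 1 - routed / all_opus
```

With a 10x price gap, shifting 90% of tokens to the cheap tier cuts the bill by roughly 80% — consistent with the real-world reports above. The exact figure depends entirely on the price ratio you plug in.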
## The Proactive Overnight Pattern
This matters most for overnight work.
Your agent builds features while you sleep. Without routing, that's:
- Hours of Opus usage
- Massive token consumption
- Hitting rate limits
With routing:
- Opus plans the features
- Codex builds them
- Opus reviews results
Same output. Fraction of the cost.
## Configuration Example
Add this to your agent's instructions:
Token optimization rules:
1. For any coding task, use Codex CLI
2. For website summaries, use local Ollama
3. For format conversions, use local Ollama
4. For complex analysis, use Claude Opus
5. For research synthesis, use Claude Opus
When uncertain, ask: "Does this require judgment or just execution?"
- Judgment → Opus
- Execution → Codex/local
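Those rules can be sketched as an ordered keyword check. The keywords and model names are illustrative, not OpenClaw configuration syntax; a real agent would classify the task itself rather than string-match:

```python
def choose_model(task: str, requires_judgment: bool = False) -> str:
    """Apply the five routing rules in order, then the judgment question.

    Keyword matching here is a stand-in for the agent's own classification.
    """
    t = task.lower()
    if any(w in t for w in ("code", "coding", "refactor", "bugfix")):
        return "codex"                        # rule 1: coding -> Codex CLI
    if "summar" in t and ("website" in t or "page" in t):
        return "ollama"                       # rule 2: website summaries -> local
    if "convert" in t or "reformat" in t:
        return "ollama"                       # rule 3: format conversion -> local
    if "analy" in t or "research" in t:
        return "opus"                         # rules 4-5: analysis & synthesis
    # Uncertain: judgment -> Opus, execution -> Codex/local.
    return "opus" if requires_judgment else "codex"
```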
## Monitoring Usage
Track your savings:
At the end of each day, report:
- Tasks completed
- Which model handled each
- Estimated tokens saved
Include in my morning brief.
This helps you tune the routing over time.
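A sketch of the end-of-day tally such a brief could include. The log format is invented for illustration — the point is just to count tokens per model and see how much stayed off the expensive tier:

```python
from collections import Counter

# Hypothetical day's log: (task, model_used, tokens_spent)
DAY_LOG = [
    ("morning brief", "sonnet", 3_000),
    ("fix failing tests", "codex", 42_000),
    ("plan refactor", "opus", 4_500),
    ("website summary", "ollama", 2_000),
]

def daily_report(log):
    """Aggregate tasks and token counts per model for the morning brief."""
    by_model = Counter()
    for _task, model, tokens in log:
        by_model[model] += tokens
    total = sum(by_model.values())
    saved = total - by_model["opus"]  # tokens kept off the expensive model
    return {
        "tasks": len(log),
        "by_model": dict(by_model),
        "tokens_off_opus": saved,
    }

report = daily_report(DAY_LOG)
```

A few days of these numbers tell you which tasks are still landing on Opus that shouldn't be.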
## Common Mistakes

### Mistake 1: Routing everything to local
Local models are noticeably weaker than frontier models. Don't use them for:
- Complex reasoning
- Nuanced decisions
- Creative work
You'll get bad output and waste time fixing it.
### Mistake 2: Not routing anything
The opposite problem. Using Opus for everything is wasteful.
Find the balance.
### Mistake 3: Forgetting about Sonnet
Claude Sonnet is in the middle. Good for moderate tasks. Cheaper than Opus.
## Why This Matters for Clawctl
Clawctl makes this easier:
| DIY Challenge | With Clawctl |
|---|---|
| Configure model routing manually | Built-in routing rules |
| Install and maintain Ollama | Managed local models |
| Track token usage yourself | Usage dashboard |
| No cost alerts | Budget notifications |
Your agent is cost-optimized out of the box.
## The Bottom Line
Claude Opus is brilliant. Use it for brilliant things.
Everything else? Route to cheaper options.
- Brain (Opus) = planning, decisions, analysis
- Muscles (Codex, local) = execution, typing, mechanical work
This is how you run proactive agents 24/7 without going broke.