# Save Tokens: Use Codex as the Muscle, Opus as the Brain
Claude Opus is the best model for OpenClaw. Everyone agrees.
It's also expensive. And even on the $200/mo Claude Max plan, you'll hit limits.
Here's the solution: don't use Opus for everything.
## The Problem
You're using your agent heavily. Morning briefs. Research. Coding. Analysis.
By mid-month, you're hitting rate limits. Or burning through API credits.
The issue: every task—simple or complex—uses the same expensive model.
Website summary? Opus. Simple calculation? Opus. Quick file read? Opus. Complex analysis? Opus.
That's wasteful.
## The Mental Model
Think of your agent as having a brain and muscles:
Brain (Claude Opus):
- Strategic thinking
- Complex analysis
- Decision making
- Planning and orchestration
Muscles (Codex, local models):
- Coding tasks
- Simple summaries
- File operations
- Repetitive work
The brain decides what to do. The muscles do the work.
## Setting Up Multi-Model Routing

### Step 1: Install Codex CLI
If you have a ChatGPT Plus subscription, you have access to Codex CLI.
Install it and connect it to your OpenClaw.
### Step 2: Tell your agent about the new muscle
I've installed Codex CLI. From now on, whenever you need to:
- Write code
- Edit files
- Run commands
- Do repetitive tasks
Use Codex instead of doing it yourself. You handle the planning and decisions.
Codex handles the execution.
### Step 3: Watch your token usage drop
Opus now only handles the thinking. Codex does the typing.
## What Gets Routed Where
| Task Type | Model | Why |
|---|---|---|
| Planning what to build | Opus | Requires judgment |
| Actually writing code | Codex | Execution, not thinking |
| Complex analysis | Opus | Requires understanding |
| Simple summaries | Sonnet/local | Just compression |
| File operations | Codex | Mechanical task |
| Decision making | Opus | Core capability |
| Research synthesis | Opus | Connecting ideas |
| Formatting output | Sonnet/local | Simple transform |
The rule: if it requires judgment, Opus. If it requires typing, something cheaper.
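The table above can be expressed as a simple lookup. Here is a minimal Python sketch — the task-type labels are illustrative, not an OpenClaw API:

```python
# Illustrative routing table: task type -> model tier.
# The labels mirror the table above; they are not OpenClaw identifiers.
ROUTING = {
    "planning": "opus",            # requires judgment
    "write_code": "codex",         # execution, not thinking
    "complex_analysis": "opus",    # requires understanding
    "summary": "sonnet",           # just compression
    "file_ops": "codex",           # mechanical task
    "decision": "opus",            # core capability
    "research_synthesis": "opus",  # connecting ideas
    "format_output": "sonnet",     # simple transform
}

def pick_model(task_type: str) -> str:
    """Judgment goes to Opus; typing goes to something cheaper.

    Unknown task types default to Opus so nothing important gets downgraded.
    """
    return ROUTING.get(task_type, "opus")
```

Note the failure mode is deliberate: when the router doesn't recognize a task, it errs on the expensive side rather than handing judgment work to a weak model.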
## Local Models for Simple Tasks
Take it further with local models:
Install Ollama. Your OpenClaw can install and manage local models automatically; just tell it: "Set up Ollama with a local model."
Use it for:
- Website summaries
- Simple text extraction
- Format conversion
- Quick lookups
Save API credits for complex tasks.
One user reported:
> "Just had OpenClaw set up Ollama with a local model. Now it handles website summaries and simple tasks locally instead of burning API credits. Blown away that an AI just installed another AI to save me money."
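Under the hood, a local summary is just a call to Ollama's `/api/generate` endpoint. Here is a minimal sketch, assuming a default Ollama install on `localhost:11434` and an already-pulled model (the model name is only an example, and the `opener` parameter is an invented hook so the function can be exercised without a live server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def summarize_local(text, model="llama3.2", url=OLLAMA_URL, opener=None):
    """Summarize text with a local Ollama model -- no API credits burned.

    `opener` is injectable purely so this sketch is testable offline;
    by default it uses urllib against the local Ollama server.
    """
    payload = json.dumps({
        "model": model,
        "prompt": f"Summarize the following in three bullet points:\n\n{text}",
        "stream": False,  # ask for one JSON object, not a token stream
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with (opener or urllib.request.urlopen)(req) as resp:
        return json.loads(resp.read())["response"]
```

Every call like this is a call that never touches your Anthropic or OpenAI bill.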
## The Cascade Pattern
Set up a cascade for task routing:
When you receive a task, evaluate complexity:
1. Simple/mechanical → Local model or Codex
2. Moderate complexity → Claude Sonnet
3. High complexity → Claude Opus
Default to the cheapest option that can handle the task.
Only escalate when necessary.
This is how you run 24/7 without burning $500/mo on API calls.
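The cascade can be sketched as a router that defaults cheap and escalates only when a complexity check says so. The scoring heuristic below is made up for illustration; a real agent would classify the task itself:

```python
def complexity_score(task: str) -> int:
    """Crude illustrative heuristic: count signals that a task needs judgment.

    A real router would use richer signals; this only shows the cascade shape.
    """
    judgment_words = ("decide", "plan", "analyze", "architect", "tradeoff")
    return sum(word in task.lower() for word in judgment_words)

def route(task: str) -> str:
    """Default to the cheapest tier; escalate only when necessary."""
    score = complexity_score(task)
    if score == 0:
        return "local-or-codex"   # simple / mechanical
    if score == 1:
        return "sonnet"           # moderate complexity
    return "opus"                 # high complexity
```

The important property is the default: anything that doesn't positively signal "judgment" stays on the cheap tier.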
## Coding Workflow Example
Here's how a proactive coding session should work:
Opus (brain):
- Reviews your project
- Identifies what to build
- Creates a plan
- Breaks into tasks
Codex (muscle):
- Writes the code
- Creates the files
- Runs the tests
- Makes the PR
Opus (brain):
- Reviews the result
- Suggests improvements
- Decides if done
Opus touches the task twice. Codex does all the heavy lifting.
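The loop above can be sketched with the model calls injected as plain functions. Nothing here is a real OpenClaw or Codex API — `plan_fn`, `execute_fn`, and `review_fn` stand in for an Opus call and a shell-out to the Codex CLI:

```python
def coding_session(project_state, plan_fn, execute_fn, review_fn):
    """Brain-muscle loop: Opus plans and reviews; Codex executes each task.

    The three callables are placeholders for real model calls.
    """
    tasks = plan_fn(project_state)              # Opus touch #1: make the plan
    results = [execute_fn(t) for t in tasks]    # Codex does the heavy lifting
    return review_fn(results)                   # Opus touch #2: review & decide

# Toy stand-ins so the shape is visible without any model calls:
verdict = coding_session(
    "todo app, missing tests",
    plan_fn=lambda state: ["write unit tests", "wire up CI"],
    execute_fn=lambda task: f"done: {task}",
    review_fn=lambda results: "ship it"
    if all(r.startswith("done") for r in results) else "revise",
)
```

However many tasks the plan contains, Opus is still invoked exactly twice.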
## Token Savings Math
Without multi-model routing:
- 10 coding sessions/day
- ~50k tokens each
- 500k tokens/day
- At Opus rates: expensive
With multi-model routing:
- Same 10 sessions
- Opus: ~5k tokens (planning)
- Codex: ~45k tokens (execution)
- 90% of tokens on cheaper model
Real-world reports: 60-80% cost reduction.
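Written out, the arithmetic looks like this. The token counts are the illustrative numbers above; the per-token prices are placeholders (substitute your real rates), with the cheap tier assumed ~10x cheaper:

```python
SESSIONS_PER_DAY = 10
TOKENS_PER_SESSION = 50_000

# Placeholder prices per 1k tokens -- NOT real rates, substitute your own.
PRICE_OPUS = 0.03
PRICE_CHEAP = 0.003  # assumes the cheap tier is ~10x cheaper

def daily_cost(opus_tokens, cheap_tokens):
    return (opus_tokens / 1000) * PRICE_OPUS + (cheap_tokens / 1000) * PRICE_CHEAP

total = SESSIONS_PER_DAY * TOKENS_PER_SESSION           # 500k tokens/day
all_opus = daily_cost(total, 0)                         # everything on Opus
routed = daily_cost(SESSIONS_PER_DAY * 5_000,           # Opus: planning only
                    SESSIONS_PER_DAY * 45_000)          # Codex: execution
savings = 1 - routed / all_opus
```

With a 10x price gap, shifting 90% of tokens to the cheap tier cuts the bill by roughly 80% — consistent with the real-world reports above. The exact figure depends entirely on the price ratio you plug in.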
## The Proactive Overnight Pattern
This matters most for overnight work.
Your agent builds features while you sleep. Without routing, that's:
- Hours of Opus usage
- Massive token consumption
- Hitting rate limits
With routing:
- Opus plans the features
- Codex builds them
- Opus reviews results
Same output. Fraction of the cost.
## Configuration Example
Add this to your agent's instructions:
Token optimization rules:
1. For any coding task, use Codex CLI
2. For website summaries, use local Ollama
3. For format conversions, use local Ollama
4. For complex analysis, use Claude Opus
5. For research synthesis, use Claude Opus
When uncertain, ask: "Does this require judgment or just execution?"
- Judgment → Opus
- Execution → Codex/local
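Those rules can be sketched as an ordered keyword check. The keywords and model names are illustrative, not OpenClaw configuration syntax; a real agent would classify the task itself rather than string-match:

```python
def choose_model(task: str, requires_judgment: bool = False) -> str:
    """Apply the five routing rules in order, then the judgment question.

    Keyword matching here is a stand-in for the agent's own classification.
    """
    t = task.lower()
    if any(w in t for w in ("code", "coding", "refactor", "bugfix")):
        return "codex"                        # rule 1: coding -> Codex CLI
    if "summar" in t and ("website" in t or "page" in t):
        return "ollama"                       # rule 2: website summaries -> local
    if "convert" in t or "reformat" in t:
        return "ollama"                       # rule 3: format conversion -> local
    if "analy" in t or "research" in t:
        return "opus"                         # rules 4-5: analysis & synthesis
    # Uncertain: judgment -> Opus, execution -> Codex/local.
    return "opus" if requires_judgment else "codex"
```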
## Monitoring Usage
Track your savings:
At the end of each day, report:
- Tasks completed
- Which model handled each
- Estimated tokens saved
Include in my morning brief.
This helps you tune the routing over time.
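A sketch of the end-of-day tally such a brief could include. The log format is invented for illustration — the point is just to count tokens per model and see how much stayed off the expensive tier:

```python
from collections import Counter

# Hypothetical day's log: (task, model_used, tokens_spent)
DAY_LOG = [
    ("morning brief", "sonnet", 3_000),
    ("fix failing tests", "codex", 42_000),
    ("plan refactor", "opus", 4_500),
    ("website summary", "ollama", 2_000),
]

def daily_report(log):
    """Aggregate tasks and token counts per model for the morning brief."""
    by_model = Counter()
    for _task, model, tokens in log:
        by_model[model] += tokens
    total = sum(by_model.values())
    saved = total - by_model["opus"]  # tokens kept off the expensive model
    return {
        "tasks": len(log),
        "by_model": dict(by_model),
        "tokens_off_opus": saved,
    }

report = daily_report(DAY_LOG)
```

A few days of these numbers tell you which tasks are still landing on Opus that shouldn't be.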
## Common Mistakes

### Mistake 1: Routing everything to local
Local models are noticeably weaker than frontier models. Don't use them for:
- Complex reasoning
- Nuanced decisions
- Creative work
You'll get bad output and waste time fixing it.
### Mistake 2: Not routing anything
The opposite problem. Using Opus for everything is wasteful.
Find the balance.
### Mistake 3: Forgetting about Sonnet
Claude Sonnet is in the middle. Good for moderate tasks. Cheaper than Opus.
## Why This Matters for Clawctl
Clawctl makes this easier:
| DIY Challenge | With Clawctl |
|---|---|
| Configure model routing manually | Built-in routing rules |
| Install and maintain Ollama | Managed local models |
| Track token usage yourself | Usage dashboard |
| No cost alerts | Budget notifications |
Your agent is cost-optimized out of the box.
## The Bottom Line
Claude Opus is brilliant. Use it for brilliant things.
Everything else? Route to cheaper options.
- Brain (Opus) = planning, decisions, analysis
- Muscles (Codex, local) = execution, typing, mechanical work
This is how you run proactive agents 24/7 without going broke.