How We Achieved 77% Cost Reduction and Provider Independence
Over the past week, we rebuilt our AI infrastructure around OpenClaw’s multi-agent architecture. The result was a 77% cost reduction, provider independence, and a delegation system that routes work to the most cost-effective model for each job.
Below is the technical journey of optimizing a 7-agent squad with OpenClaw.
The Challenge: Model Provider Lock-In
We started with a simple problem: our entire squad defaulted to a single model provider. This created three issues:
- Cost inefficiency because premium models handled routine work
- Single point of failure because provider outage meant full system outage
- Suboptimal routing because research tasks used content models and vice versa
OpenClaw’s flexible model configuration made provider independence possible, but we needed the right strategy.
OpenClaw Multi-Agent Architecture
Our setup uses OpenClaw’s agent system with a hub-and-spoke pattern:
PaxMachina (Coordinator)
├── Archon (Research)
├── Forge (Code)
├── Scribe (Content)
├── Oracle (SEO)
├── Cipher (Finance)
└── Sentinel (Monitoring)
Each agent has its own workspace, model configuration, and tool access. OpenClaw handles session management, tool routing, and inter-agent communication.
Agent Configuration in OpenClaw
{
"agents": {
"defaults": {
"model": {
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash-preview"
]
}
},
"list": [
{
"id": "paxmachina",
"model": {
"primary": "anthropic/claude-sonnet-4-5",
"fallbacks": ["google-gemini-cli/gemini-3-pro-preview"]
}
}
]
}
}
OpenClaw’s model abstraction lets us use Google Gemini, Anthropic Claude, OpenAI GPT, and xAI Grok through the same interface.
System Logic: RPC Delegation & QMD Memory Bus
[Diagram: the coordinator delegates work to workers over RPC, and all agents share a QMD memory bus rooted at /workspace/memory/*.md]
Strategy 1: Model-Task Alignment
We analyzed each agent’s workload and matched it to models based on ROI and failure modes:
Strategic Coordination (PaxMachina)
Before: Gemini 3 Pro @ $1.25/1M tokens
After: Claude Sonnet 4.5 @ $3/1M tokens
Why: Better synthesis and planning justified premium cost for coordinator decisions and final outputs
High-Quality Content Review (Scribe)
Before: Claude Haiku @ $0.80/1M tokens
After: Claude Sonnet 4.5 @ $3/1M tokens
Why: Public content needs higher edit quality and fewer revisions, not cheap tokens
Research and Analysis (Archon, Oracle)
Model: Gemini Flash @ $0.075/1M tokens
Why: Fast, low cost, and effective for evidence gathering and synthesis workflows
Code Generation (Forge)
Model: GPT-5.2 @ $5/1M tokens
Why: Higher cost was acceptable when it reduced rework and improved correctness for code tasks
Financial Analysis (Cipher)
Model: Gemini Flash + xAI Grok skill
Why: Flash for orchestration, Grok for time-sensitive market context when needed
Infrastructure Monitoring (Sentinel)
Model: Gemini Flash @ $0.075/1M tokens
Why: Structured checks and fast turnaround mattered more than deep reasoning
Result: 60% of work ran on $0.075/1M models, 20% on $3/1M, 10% on $5/1M, and 10% on fallback models
Strategy 2: Provider Diversity in Fallbacks
Single-provider fallback chains fail during outages. OpenClaw’s fallback system enabled provider diversity.
Balanced Fallback Pattern
{
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash",
"google-gemini-cli/gemini-2.5-pro"
]
}
Benefits:
- ✅ If Google is unavailable, Anthropic takes over
- ✅ If Anthropic runs out of tokens, the chain falls back to Google
- ✅ Progressive quality degradation instead of hard failure
- ✅ Higher task completion rate under rate limits and incidents
Fallback Distribution
- Primary handles: 95% of requests
- Fallback 1: 4% (rate limits)
- Fallback 2: 0.9% (outages)
- Fallback 3: 0.1% (disaster)
In our workload, the cost impact of fallbacks was negligible relative to the reliability gains.
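For illustration, here is a minimal sketch of what a provider-diverse chain does at call time. This is not OpenClaw's internal code; callModel and the error handling are hypothetical stand-ins for the gateway's routing.

// Hypothetical sketch of provider-diverse fallback; not OpenClaw internals.
type ModelRef = string; // e.g. "google-gemini-cli/gemini-2.5-flash"

async function completeWithFallback(
  chain: ModelRef[], // primary first, then alternate-provider fallbacks
  prompt: string,
  callModel: (model: ModelRef, prompt: string) => Promise<string>, // assumed adapter
): Promise<string> {
  let lastError: unknown;
  for (const model of chain) {
    try {
      return await callModel(model, prompt); // stop at the first model that answers
    } catch (err) {
      lastError = err; // rate limit or outage: degrade to the next provider
    }
  }
  throw lastError; // every provider failed: surface a hard error
}

The point of the ordering is progressive degradation: each step down the chain trades some quality for availability instead of failing the task outright.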
Strategy 3: Native Search Synthesis with llm_task
OpenClaw’s llm_task enabled workers to delegate synthesis to models that can perform their own web search, reducing repeated external search calls and context bloat.
The Problem: Web Search Spam
Old pattern:
web_search("topic") // 15-20 calls
web_search("topic detail 1")
web_search("topic detail 2")
// ... context overflow at 100k tokens
New pattern:
// Worker does 1-3 targeted searches for specific facts
web_search("specific fact")
web_search("another fact")
// Then delegates synthesis to a model with native search
llm_task({
model: "gemini-2.5-flash",
prompt: "Research [topic] using your web search. Synthesize findings."
})
// Model searches internally, returns synthesis with smaller context footprint
Observed result: for research-heavy work, per-task token usage dropped from roughly 100k to roughly 30k
OpenClaw Tool Configuration
{
"tools": {
"allow": [
"web_search",
"web_fetch",
"llm_task",
"read",
"write"
]
}
}
Workers got minimal tool access and OpenClaw enforced boundaries per agent.
Strategy 4: Fixed Agent Coordination
OpenClaw supports agent-to-agent communication, but we found the most reliable pattern was hub-and-spoke RPC delegation from the coordinator.
Broken Pattern (Don’t Use)
sessions_send({
agent: "worker-content",
message: "Write blog post"
})
The asynchronous send returns immediately, leaving the coordinator with no completion signal, timeout, or result to act on.
Working Pattern (RPC Style)
#!/bin/bash
# delegate.sh - Proper OpenClaw agent delegation
WORKER_TYPE=$1
TASK=$2
openclaw agent \
--agent "worker-${WORKER_TYPE}" \
--ephemeral \
--timeout 300 \
--message "$TASK"
Coordinator invokes workers via OpenClaw CLI:
~/workspace/scripts/delegate.sh research "Analyze competitor X"
~/workspace/scripts/delegate.sh content "Write blog about Y"
OpenClaw Ephemeral Sessions
For cron jobs and delegated tasks, OpenClaw’s --ephemeral flag created temporary sessions that auto-cleaned:
openclaw agent \
--agent worker-research \
--ephemeral \
--timeout 300 \
--message "Task description"
- ✅ Fresh session each run
- ✅ Automatic cleanup
- ✅ No session pollution
- ✅ Fewer tool state errors
Strategy 5: Memory-First Patterns
OpenClaw’s QMD memory integration enabled workers to check memory before searching.
Memory-First Flow
Task received
↓
Check QMD memory
↓
Found → use cached knowledge
Not found → 1-3 web searches → synthesis → write to memory
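As a sketch, the same flow in code. Every helper below (queryMemory, webSearch, synthesize, writeMemory) is a hypothetical stand-in; in practice the QMD tools and llm_task fill these roles.

// Sketch of the memory-first flow; all helpers are hypothetical stand-ins.
declare function queryMemory(topic: string): Promise<string | null>; // QMD lookup
declare function webSearch(query: string): Promise<string>;
declare function synthesize(topic: string, facts: string[]): Promise<string>; // llm_task-style
declare function writeMemory(path: string, content: string): Promise<void>;

async function research(topic: string): Promise<string> {
  const cached = await queryMemory(topic);
  if (cached) return cached; // knowledge base hit: no searches, no model call

  const facts = await Promise.all([ // 1-3 targeted searches only
    webSearch(`${topic} overview`),
    webSearch(`${topic} recent developments`),
  ]);
  const briefing = await synthesize(topic, facts);
  await writeMemory(`briefings/${topic}.md`, briefing); // indexed by QMD for reuse
  return briefing;
}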
OpenClaw QMD Configuration
{
"memory": {
"backend": "qmd",
"qmd": {
"paths": [
{
"path": "/home/clawdia/workspace/memory/briefings",
"name": "briefings",
"pattern": "**/*.md"
}
],
"update": {
"interval": "10m",
"onBoot": true
},
"limits": {
"maxResults": 6,
"maxSnippetChars": 700
}
}
}
}
Workers wrote findings to workspace/memory/briefings/ and OpenClaw indexed them automatically. Future tasks reused this research.
Observed result: fewer redundant searches and less repeated research on recurring topics
Strategy 6: Dynamic Model Selection
OpenClaw’s llm_task supported per-task model selection:
// Quick lookup
llm_task({
model: "gemini-2.5-flash",
prompt: "What's the current BTC price?"
})
// Deep analysis
llm_task({
model: "claude-sonnet-4-5",
prompt: "Comprehensive competitive analysis of [company]"
})
// Time-sensitive context
llm_task({
model: "xai/grok-4",
prompt: "What happened in crypto markets in the last 24 hours?"
})
Workers chose the right model per subtask and OpenClaw handled authentication and routing.
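A worker's selection logic can be as simple as a tier map. The tiers and mapping below are ours, not an OpenClaw feature; the model names just mirror the examples above.

// Hypothetical per-subtask routing heuristic mirroring the examples above.
type Tier = "lookup" | "analysis" | "realtime";

function chooseModel(tier: Tier): string {
  switch (tier) {
    case "lookup":   return "gemini-2.5-flash";  // cheap, fast factual queries
    case "analysis": return "claude-sonnet-4-5"; // deep synthesis worth premium tokens
    case "realtime": return "xai/grok-4";        // time-sensitive market context
  }
}

// e.g. llm_task({ model: chooseModel("lookup"), prompt: "What's the current BTC price?" })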
Measurement Methodology
All measurements were taken over a 7-day production window using real workloads only. Synthetic benchmarks and ad hoc tests were excluded.
A “task” is defined as a single end-to-end agent execution initiated by PaxMachina and completed by one or more workers, including fallback execution when triggered.
Token usage was measured using OpenClaw’s native model-usage reporting and cross-checked against provider dashboards where available.
Pricing reflects publicly listed provider prices as of February 2026. No negotiated or volume discounts were applied.
Cost Analysis
Before Optimization
Token usage: 100-200k per task
Primary model: Mostly Gemini 3 Pro @ $1.25/1M
Weighted average: ~$1.15/1M
Monthly (100M tokens): $115
After Optimization
Token usage: 20-40k per task (-70%)
Model distribution:
60% Gemini Flash @ $0.075/1M
20% Claude Sonnet @ $3/1M
10% GPT-5.2 @ $5/1M
10% Others @ $0.80-1.25/1M
Weighted average: ~$0.85/1M (-26%)
Monthly (30M tokens): $25.50
Savings: $89.50/month (77% reduction)
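To make the savings arithmetic explicit, here is the monthly math using the token volumes and blended rates stated above:

// Monthly cost from the figures above (list prices, no discounts).
const before = 100 * 1.15; // 100M tokens x ~$1.15/1M = $115.00
const after = 30 * 0.85;   // 30M tokens x ~$0.85/1M = $25.50
const saved = before - after;           // $89.50
const pct = (saved / before) * 100;     // ≈77.8%, reported as 77% above
console.log({ before, after, saved, pct });

Both levers matter: the blended rate fell 26%, but the bigger driver was cutting token volume per task by 70%.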
Per-Task Cost Examples
Research (Archon):
- Before: 100k tokens × $1.25/1M = $0.125
- After: 30k tokens × $0.075/1M = $0.002
- Savings: 98%
Content (Scribe):
- Before: 100k tokens × $1.25/1M = $0.125
- After: 40k tokens × $3.00/1M = $0.120
- Savings: 4% (higher quality, fewer revisions)
SEO (Oracle):
- Before: 80k tokens × $1.25/1M = $0.100
- After: 20k tokens × $0.075/1M = $0.002
- Savings: 98%
Pricing Sources
- Google Gemini pricing based on official Google AI documentation
- Anthropic Claude pricing based on public Anthropic pricing pages
- OpenAI GPT pricing based on published OpenAI API rates
- xAI Grok pricing based on published API pricing
Prices may change over time. Calculations reflect list pricing at measurement time.
OpenClaw Configuration Patterns
Per-Agent Model Routing
{
"agents": {
"list": [
{
"id": "paxmachina",
"model": {
"primary": "anthropic/claude-sonnet-4-5",
"fallbacks": [
"google-gemini-cli/gemini-3-pro-preview",
"google-gemini-cli/gemini-2.5-pro",
"anthropic/claude-haiku-4-5"
]
},
"tools": {
"profile": "full"
}
},
{
"id": "worker-research",
"model": {
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash-preview"
]
},
"tools": {
"profile": "minimal",
"allow": ["web_search", "web_fetch", "llm_task"]
}
}
]
}
}
Multi-Provider Authentication
OpenClaw supported multiple auth methods per provider:
{
"auth": {
"profiles": {
"google-gemini-cli:alo@scalytics.io": {
"provider": "google-gemini-cli",
"mode": "oauth",
"email": "alo@scalytics.io"
},
"anthropic:paxmachina": {
"provider": "anthropic",
"mode": "token"
},
"openai-codex:default": {
"provider": "openai-codex",
"mode": "oauth"
},
"xai:default": {
"provider": "xai",
"mode": "api_key"
}
}
}
}
Tool Access Control
{
"tools": {
"agentToAgent": {
"enabled": true,
"allow": [
"paxmachina",
"worker-research",
"worker-code"
]
}
}
}
OpenClaw enforced which agents could communicate. Workers could not arbitrarily invoke each other.
Results
Performance Improvements
| Metric | Before | After | Change |
|---|---|---|---|
| Token usage/task | 100-200k | 20-40k | -70% |
| Cost per 1M tokens | $1.15 | $0.85 | -26% |
| Total monthly cost | $115 | $26 | -77% |
| Context overflows | 2-3/day | <1/week | -95% |
| Response quality | Baseline | Improved | Directional |
| System uptime | 95% | 99.9% | +4.9 pts |
Response quality was evaluated using an internal review rubric covering factual accuracy, task completion, structural clarity, and required human revisions. Scores are directional and workload-specific, not absolute benchmarks.
Uptime reflects successful task completion across all agents, including automatic provider fallback during rate limits and incidents.
Operational Benefits
- ✅ Provider independence with multi-provider routing
- ✅ Automatic failover via diversified fallback chains
- ✅ Cost optimization through model-task alignment
- ✅ Better outputs where it matters by reserving premium models for premium tasks
- ✅ Less token waste through memory-first workflows
Key OpenClaw Features Used
1. Flexible Model Configuration
Per-agent primary and fallback models with provider abstraction
2. Multi-Provider Authentication
OAuth, API key, and token auth across Google, Anthropic, OpenAI, xAI
3. Tool Access Control
Profile-based tool allowlists enforced per agent
4. QMD Memory Integration
Automatic indexing and retrieval from a markdown knowledge base
5. Ephemeral Sessions
Temporary sessions for one-off tasks with automatic cleanup
6. llm_task Tool
Delegate to any model with dynamic selection
7. Session Management
Persistent sessions for chat interfaces, ephemeral sessions for automation
8. Gateway API
RPC-style agent invocation via CLI and HTTP
Lessons Learned
1. Hub-and-Spoke Beats Mesh
Direct agent-to-agent communication added complexity. Coordinator-mediated delegation was simpler and more reliable.
2. Provider Diversity Matters
Single-provider fallback chains failed during outages. Mixing providers improved task completion under incidents.
3. Model-Task Alignment is Critical
Do not use premium models for routine work. Match model capability to task value.
4. Memory-First Prevents Waste
Check the knowledge base before searching. Write findings back for reuse.
5. Ephemeral Sessions for Automation
Cron jobs and delegated tasks should use ephemeral sessions to avoid session pollution.
6. Dynamic Model Selection Improves ROI
Let workers choose models per subtask. Spend premium tokens only where they change outcomes.
OpenClaw Configuration Best Practices
1. Set Sensible Defaults
{
"agents": {
"defaults": {
"model": {
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash-preview"
]
},
"timeoutSeconds": 600,
"compaction": {
"mode": "safeguard"
}
}
}
}
2. Override for Specific Agents
Strategic agents got premium models while workers stayed on defaults
3. Minimal Tool Access
Only give agents the tools they need. Use OpenClaw profiles like minimal, coding, and full.
4. Memory Path Configuration
Point QMD at organized knowledge base directories
5. Session Archiving
Archive long-running sessions to reduce corruption risk:
{
"session": {
"archiveAfterMinutes": 1440
}
}
Workflow Hardening and Operational Simplification
As of February 2026, we tightened the operational model to reduce orchestration sprawl, lower idle token spend, and make failures easier to detect and recover from.
Trigger and Heartbeat Simplification
- Consolidated recurring content checks into a single runner: scripts/content_team_heartbeat.sh
- Cron triggered one content heartbeat every 15 minutes during active hours
- Task definitions and execution rules lived in memory/HEARTBEAT_TASKS.md, including per-task intervals and timeouts
Local Tools First, Delegate on Change Only
- Inbox watcher pulled the latest content-pipeline state and compared commits locally (see the sketch after this list)
- Sentinel triage executed only when changes were detected
- Idle cycles avoided unnecessary model calls entirely
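A minimal sketch of that watcher logic. The state-file location and agent id are assumptions; the openclaw invocation reuses the delegation flags shown earlier.

// Hypothetical sketch of "delegate on change only"; paths and agent id are assumed.
import { execSync } from "node:child_process";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const STATE = "/tmp/content-pipeline.head"; // assumed location for last-seen commit
const head = execSync("git rev-parse HEAD").toString().trim(); // latest pipeline commit
const last = existsSync(STATE) ? readFileSync(STATE, "utf8").trim() : "";

if (head !== last) {
  // Change detected: spawn an ephemeral triage session (flags as shown earlier).
  execSync(
    'openclaw agent --agent sentinel --ephemeral --timeout 300 ' +
      '--message "Triage new content-pipeline commits"',
  );
  writeFileSync(STATE, head);
}
// No change: exit without spending a single model token.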
Required File Contract
Each content item included the following files before it was considered valid:
- draft.md
- meta.json
- sources.md
For technical or quantitative claims, an additional file was mandatory:
- claims.md
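A sketch of the validation gate this contract implies. The hasTechnicalClaims flag in meta.json is an illustrative field name, not a documented schema.

// Hypothetical validation of the file contract; the meta.json flag name is illustrative.
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

function isValidContentItem(dir: string): boolean {
  const required = ["draft.md", "meta.json", "sources.md"];
  const metaPath = join(dir, "meta.json");
  if (!existsSync(metaPath)) return false;
  const meta = JSON.parse(readFileSync(metaPath, "utf8"));
  if (meta.hasTechnicalClaims) required.push("claims.md"); // assumed flag
  return required.every((f) => existsSync(join(dir, f)));
}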
Writing Quality Contract
We enforced global writing rules across all brands and channels:
- evidence first
- no AI slop
- no em dashes
- explicit audience fit
These rules were anchored in:
- shared/style-guides/evidence_writing_standard.md
- ops/review-checklists/content_review.md
- shared/market/AUDIENCE_DISCOVERY.md
Clear Ownership Model
The hardened pipeline enforced clear responsibility boundaries:
- PaxMachina coordinated and decided GO or NO-GO
- Oracle handled SEO intent and keyword alignment
- Archon expanded evidence and pain point support
- Scribe rewrote for quality and voice
- Sentinel monitored and reported operational changes
This reduced coordination overhead, eliminated silent failures, and made the system easier to reason about under load.
Deployment Pattern
Cron Job Template
#!/bin/bash
# Standard OpenClaw cron job pattern
set -euo pipefail
openclaw agent \
--agent worker-research \
--ephemeral \
--timeout 300 \
--message "Task description"
Delegation Helper
#!/bin/bash
# delegate.sh - RPC-style agent invocation
WORKER=$1
TASK=$2
TIMEOUT=${3:-300}
openclaw agent \
--agent "worker-${WORKER}" \
--ephemeral \
--timeout "$TIMEOUT" \
--message "$TASK"
Usage
# From coordinator
~/scripts/delegate.sh research "Analyze competitor pricing"
~/scripts/delegate.sh content "Write blog post about AI trends"
~/scripts/delegate.sh seo "Keyword research for [topic]"
Monitoring
Check Model Distribution
openclaw gateway call sessions.list | \
jq '.sessions[] | {agent: .key, model: .model}'
Track Token Usage
openclaw skill model-usage --agent paxmachina --days 7
List Active Sessions
openclaw sessions list
Gateway Status
openclaw gateway ping
openclaw gateway status
Conclusion
OpenClaw’s flexible multi-agent architecture enabled us to:
- Achieve provider independence across Google, Anthropic, OpenAI, xAI
- Reduce costs 77% through model-task alignment and lower token usage
- Improve reliability with provider-diverse fallback chains
- Reduce token waste with memory-first patterns
- Simplify coordination with hub-and-spoke RPC delegation
The key insight was simple: match model capability to task value. Use low-cost models for the majority of work, reserve premium models for decisions and outputs that justify the spend, and let OpenClaw handle routing, authentication, and failover.
These results reflect a content-heavy, research-driven multi-agent workload. Systems dominated by long-form reasoning, continuous streaming, or real-time inference may observe different cost and performance characteristics.
Resources
- OpenClaw Documentation: https://docs.openclaw.ai
- Model Pricing: validate list prices in each provider’s official pricing documentation
- QMD Memory System: markdown-based knowledge base patterns
- Multi-Agent Patterns: hub-and-spoke coordination
Tags: openclaw, multi-agent systems, AI cost optimization, model provider independence, llm orchestration, agent coordination, production AI, gemini, claude, gpt, grok
Category: Technical Deep Dive
Audience: OpenClaw users, AI engineers, DevOps teams building multi-agent systems
If you need help with distributed systems, backend engineering, or data platforms, check my Services.