Building a Model-Agnostic Multi-Agent System with OpenClaw

Over one week we rebuilt our AI stack around OpenClaw’s multi-agent architecture to avoid provider lock-in and stop wasting premium tokens. By aligning models to tasks, diversifying fallbacks across providers, enforcing minimal tool access, and switching to memory-first workflows with ephemeral sessions, we reduced token usage per task by about 70% and cut our monthly bill by 77% while improving operational resilience.

How We Achieved 77% Cost Reduction and Provider Independence

Over the past week, we rebuilt our AI infrastructure around OpenClaw’s multi-agent architecture. The result was a 77% cost reduction, provider independence, and a delegation system that routes work to the most cost-effective model for each job.

Below is the technical journey of optimizing a 7-agent squad with OpenClaw.


The Challenge: Model Provider Lock-In

We started with a simple problem: our entire squad defaulted to a single model provider. This created three issues:

  1. Cost inefficiency because premium models handled routine work
  2. Single point of failure because provider outage meant full system outage
  3. Suboptimal routing because research tasks used content models and vice versa

OpenClaw’s flexible model configuration made provider independence possible, but we needed the right strategy.


OpenClaw Multi-Agent Architecture

Our setup uses OpenClaw’s agent system with a hub-and-spoke pattern:

PaxMachina (Coordinator)
├── Archon (Research)
├── Forge (Code)
├── Scribe (Content)
├── Oracle (SEO)
├── Cipher (Finance)
└── Sentinel (Monitoring)

Each agent has its own workspace, model configuration, and tool access. OpenClaw handles session management, tool routing, and inter-agent communication.

Agent Configuration in OpenClaw

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "google-gemini-cli/gemini-2.5-flash",
        "fallbacks": [
          "anthropic/claude-haiku-4-5",
          "google-gemini-cli/gemini-3-flash-preview"
        ]
      }
    },
    "list": [
      {
        "id": "paxmachina",
        "model": {
          "primary": "anthropic/claude-sonnet-4-5",
          "fallbacks": ["google-gemini-cli/gemini-3-pro-preview"]
        }
      }
    ]
  }
}

OpenClaw’s model abstraction lets us use Google Gemini, Anthropic Claude, OpenAI GPT, and xAI Grok through the same interface.

System Logic: RPC Delegation & QMD Memory Bus

PaxMachina (Coordinator) delegates work to workers via RPC calls. All agents share a knowledge core (QMD) auto-indexed from /workspace/memory/*.md, so knowledge is shared across all agents and processes:

  • Archon (Research) ⇄ QMD: two-way sync
  • Scribe (Content) ← QMD: read
  • Forge (Code) → QMD: write

Design properties:

  • Ephemeral: fresh container per task
  • Hub-and-spoke: centralized control
  • Isolated: no process pollution

Strategy 1: Model-Task Alignment

We analyzed each agent’s workload and matched it to models based on ROI and failure modes:

Strategic Coordination (PaxMachina)

Before: Gemini 3 Pro @ $1.25/1M tokens
After: Claude Sonnet 4.5 @ $3/1M tokens
Why: Better synthesis and planning justified premium cost for coordinator decisions and final outputs

High-Quality Content Review (Scribe)

Before: Claude Haiku @ $0.80/1M tokens
After: Claude Sonnet 4.5 @ $3/1M tokens
Why: Public content needs higher edit quality and fewer revisions, not cheap tokens

Research and Analysis (Archon, Oracle)

Model: Gemini Flash @ $0.075/1M tokens
Why: Fast, low cost, and effective for evidence gathering and synthesis workflows

Code Generation (Forge)

Model: GPT-5.2 @ $5/1M tokens
Why: Higher cost was acceptable when it reduced rework and improved correctness for code tasks

Financial Analysis (Cipher)

Model: Gemini Flash + xAI Grok skill
Why: Flash for orchestration, Grok for time-sensitive market context when needed

Infrastructure Monitoring (Sentinel)

Model: Gemini Flash @ $0.075/1M tokens
Why: Structured checks and fast turnaround mattered more than deep reasoning

Result: 60% of work on $0.075/1M models, 20% on $3/1M, 10% on $5/1M, 10% fallbacks
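
In practice, the alignment lives in each worker's OpenClaw config, so routing reduces to picking the right worker for the task type. A minimal sketch, assuming the worker-* agent ids from our delegate.sh convention:

#!/bin/bash
# route_task.sh - illustrative sketch: pick the worker whose OpenClaw
# config already pins the right model for this task type
# (agent ids assume the worker-* convention used by delegate.sh)

TASK_TYPE=$1   # research | seo | content | code
TASK=$2

case "$TASK_TYPE" in
  research) WORKER="research" ;;  # Archon -> Gemini Flash ($0.075/1M)
  seo)      WORKER="seo"      ;;  # Oracle -> Gemini Flash ($0.075/1M)
  content)  WORKER="content"  ;;  # Scribe -> Claude Sonnet 4.5 ($3/1M)
  code)     WORKER="code"     ;;  # Forge  -> GPT-5.2 ($5/1M)
  *) echo "unknown task type: $TASK_TYPE" >&2; exit 1 ;;
esac

# The model choice is enforced by the worker's config, not this script
~/workspace/scripts/delegate.sh "$WORKER" "$TASK"

Keeping model choice in config rather than in routing code means a price or model change is a one-line config edit, not a code change.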


Strategy 2: Provider Diversity in Fallbacks

Single-provider fallback chains fail during outages. OpenClaw’s fallback system enabled provider diversity.

Balanced Fallback Pattern

{
  "primary": "google-gemini-cli/gemini-2.5-flash",
  "fallbacks": [
    "anthropic/claude-haiku-4-5",
    "google-gemini-cli/gemini-3-flash",
    "google-gemini-cli/gemini-2.5-pro"
  ]
}

Benefits:

  • ✅ If Google is not accessible, Anthropic can take over
  • ✅ If Anthropic runs out of quota, the chain falls back to Google
  • ✅ Progressive quality degradation instead of hard failure
  • ✅ Higher task completion rate under rate limits and incidents

Fallback Distribution

  • Primary handles: 95% of requests
  • Fallback 1: 4% (rate limits)
  • Fallback 2: 0.9% (outages)
  • Fallback 3: 0.1% (disaster)

In our workload, the cost impact of fallbacks was negligible relative to the reliability gains.
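
Conceptually, the chain behaves like a loop over providers. The sketch below is not OpenClaw internals, just an illustration of the failover order the config above expresses, with try_model as a hypothetical stand-in for a single model call:

#!/bin/bash
# failover_sketch.sh - illustration only, NOT OpenClaw's implementation
# try_model is a hypothetical stand-in for "send this request to one model"

PROMPT=$1

MODELS=(
  "google-gemini-cli/gemini-2.5-flash"  # primary
  "anthropic/claude-haiku-4-5"          # fallback 1: different provider
  "google-gemini-cli/gemini-3-flash"    # fallback 2: back to Google
  "google-gemini-cli/gemini-2.5-pro"    # fallback 3: disaster recovery
)

for MODEL in "${MODELS[@]}"; do
  if try_model "$MODEL" "$PROMPT"; then
    exit 0  # first model that answers wins
  fi
  echo "$MODEL failed (rate limit or outage), trying next" >&2
done

echo "all models in the chain failed" >&2
exit 1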


Strategy 3: Native Search Synthesis with llm_task

OpenClaw’s llm_task enabled workers to delegate synthesis to models that can perform their own web search, reducing repeated external search calls and context bloat.

The Problem: Web Search Spam

Old pattern:

web_search("topic")  // 15-20 calls
web_search("topic detail 1")
web_search("topic detail 2")
// ... context overflow at 100k tokens

New pattern:

// Worker does 1-3 targeted searches for specific facts
web_search("specific fact")
web_search("another fact")

// Then delegates synthesis to a model with native search
llm_task({
  model: "gemini-2.5-flash",
  prompt: "Research [topic] using your web search. Synthesize findings."
})
// Model searches internally, returns synthesis with smaller context footprint

Observed result: for research-heavy work, per-task token usage dropped from roughly 100k to 30k tokens

OpenClaw Tool Configuration

{
  "tools": {
    "allow": [
      "web_search",
      "web_fetch",
      "llm_task",
      "read",
      "write"
    ]
  }
}

Workers got minimal tool access and OpenClaw enforced boundaries per agent.


Strategy 4: Fixed Agent Coordination

OpenClaw supports agent-to-agent communication, but we found the most reliable pattern was hub-and-spoke RPC delegation from the coordinator.

Broken Pattern (Don’t Use)

sessions_send({
  agent: "worker-content",
  message: "Write blog post"
})

Working Pattern (RPC Style)

#!/bin/bash
# delegate.sh - Proper OpenClaw agent delegation

WORKER_TYPE=$1
TASK=$2

openclaw agent \
  --agent "worker-${WORKER_TYPE}" \
  --ephemeral \
  --timeout 300 \
  --message "$TASK"

Coordinator invokes workers via OpenClaw CLI:

~/workspace/scripts/delegate.sh research "Analyze competitor X"
~/workspace/scripts/delegate.sh content "Write blog about Y"

OpenClaw Ephemeral Sessions

For cron jobs and delegated tasks, OpenClaw’s --ephemeral flag created temporary sessions that auto-cleaned:

openclaw agent \
  --agent worker-research \
  --ephemeral \
  --timeout 300 \
  --message "Task description"
  • ✅ Fresh session each run
  • ✅ Automatic cleanup
  • ✅ No session pollution
  • ✅ Fewer tool state errors

Strategy 5: Memory-First Patterns

OpenClaw’s QMD memory integration enabled workers to check memory before searching.

Memory-First Flow

Task received
  ↓
Check QMD memory
  ↓
Found? → Use cached knowledge
  ↓
Not found? → 1-3 web searches → synthesis → write to memory
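
A minimal sketch of that flow, with grep standing in for a QMD query and paths assumed from the config below:

#!/bin/bash
# memory_first.sh - sketch of the memory-first flow
# (grep stands in for QMD retrieval; paths match our QMD config)

TOPIC=$1
MEMORY_DIR="$HOME/workspace/memory/briefings"

# 1. Check existing briefings before spending tokens on search
if grep -rli "$TOPIC" "$MEMORY_DIR" 2>/dev/null; then
  exit 0   # cache hit: the matching briefings are the answer
fi

# 2. Cache miss: delegate research; the worker writes findings back
#    to memory/briefings/ so the next task on this topic is a hit
~/workspace/scripts/delegate.sh research \
  "Research '$TOPIC' with 1-3 targeted searches, then write a briefing to memory/briefings/"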

OpenClaw QMD Configuration

{
  "memory": {
    "backend": "qmd",
    "qmd": {
      "paths": [
        {
          "path": "/home/clawdia/workspace/memory/briefings",
          "name": "briefings",
          "pattern": "**/*.md"
        }
      ],
      "update": {
        "interval": "10m",
        "onBoot": true
      },
      "limits": {
        "maxResults": 6,
        "maxSnippetChars": 700
      }
    }
  }
}

Workers wrote findings to workspace/memory/briefings/ and OpenClaw indexed them automatically. Future tasks reused this research.

Observed result: fewer redundant searches and fewer repeat calls on recurring topics
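
For illustration, a hypothetical briefing file matching the **/*.md pattern above (filename, layout, and content are invented for this example, not an OpenClaw requirement):

# memory/briefings/competitor-x-pricing.md

## Summary
Competitor X moved to usage-based pricing in Q1; entry tier now undercuts ours.

## Key facts
- Entry tier: $49/month (was $79)
- Overage billed per 1k requests

## Sources
- See the item's sources.md for underlying links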


Strategy 6: Dynamic Model Selection

OpenClaw’s llm_task supported per-task model selection:

// Quick lookup
llm_task({
  model: "gemini-2.5-flash",
  prompt: "What's the current BTC price?"
})

// Deep analysis
llm_task({
  model: "claude-sonnet-4-5",
  prompt: "Comprehensive competitive analysis of [company]"
})

// Time-sensitive context
llm_task({
  model: "xai/grok-4",
  prompt: "What happened in crypto markets in the last 24 hours?"
})

Workers chose the right model per subtask and OpenClaw handled authentication and routing.


Measurement Methodology

All measurements were taken over a 7-day production window using real workloads only. Synthetic benchmarks and ad hoc tests were excluded.

A “task” is defined as a single end-to-end agent execution initiated by PaxMachina and completed by one or more workers, including fallback execution when triggered.

Token usage was measured using OpenClaw’s native model-usage reporting and cross-checked against provider dashboards where available.

Pricing reflects publicly listed provider prices as of February 2026. No negotiated or volume discounts were applied.


Cost Analysis

Before Optimization

Token usage: 100-200k per task
Primary model: Mostly Gemini 3 Pro @ $1.25/1M
Weighted average: ~$1.15/1M
Monthly (100M tokens): $115

After Optimization

Token usage: 20-40k per task (-70%)
Model distribution:
  60% Gemini Flash @ $0.075/1M
  20% Claude Sonnet @ $3/1M
  10% GPT-5.2 @ $5/1M
  10% Others @ $0.80-1.25/1M
Weighted average: ~$0.85/1M (-26%)
Monthly (30M tokens): $25.50

Savings: $89.50/month (77% reduction)

Pre-optimization: $115/month → Post-optimization: $26/month (77% drop)

  • Token volume reduction: -70% (via memory)
  • Average price per 1M tokens: -26% (via alignment)
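
The headline numbers are reproducible from the reported volumes and weighted prices. A quick check, assuming the figures above:

awk 'BEGIN {
  before = 100 * 1.15  # 100M tokens/month at ~$1.15/1M -> $115.00
  after  = 30  * 0.85  # 30M tokens/month at ~$0.85/1M  -> $25.50
  printf "before: $%.2f  after: $%.2f  saved: $%.2f (%.1f%%)\n",
         before, after, before - after, (before - after) / before * 100
}'
# -> before: $115.00  after: $25.50  saved: $89.50 (77.8%)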

Per-Task Cost Examples

Research (Archon):

  • Before: 100k tokens × $1.25/1M = $0.125
  • After: 30k tokens × $0.075/1M = $0.002
  • Savings: 98%

Content (Scribe):

  • Before: 100k tokens × $1.25/1M = $0.125
  • After: 40k tokens × $3.00/1M = $0.120
  • Savings: 4% (higher quality, fewer revisions)

SEO (Oracle):

  • Before: 80k tokens × $1.25/1M = $0.100
  • After: 20k tokens × $0.075/1M = $0.002
  • Savings: 98%

Pricing Sources

  • Google Gemini pricing based on official Google AI documentation
  • Anthropic Claude pricing based on public Anthropic pricing pages
  • OpenAI GPT pricing based on published OpenAI API rates
  • xAI Grok pricing based on published API pricing

Prices may change over time. Calculations reflect list pricing at measurement time.


OpenClaw Configuration Patterns

Per-Agent Model Routing

{
  "agents": {
    "list": [
      {
        "id": "paxmachina",
        "model": {
          "primary": "anthropic/claude-sonnet-4-5",
          "fallbacks": [
            "google-gemini-cli/gemini-3-pro-preview",
            "google-gemini-cli/gemini-2.5-pro",
            "anthropic/claude-haiku-4-5"
          ]
        },
        "tools": {
          "profile": "full"
        }
      },
      {
        "id": "worker-research",
        "model": {
          "primary": "google-gemini-cli/gemini-2.5-flash",
          "fallbacks": [
            "anthropic/claude-haiku-4-5",
            "google-gemini-cli/gemini-3-flash-preview"
          ]
        },
        "tools": {
          "profile": "minimal",
          "allow": ["web_search", "web_fetch", "llm_task"]
        }
      }
    ]
  }
}

Multi-Provider Authentication

OpenClaw supported multiple auth methods per provider:

{
  "auth": {
    "profiles": {
      "google-gemini-cli:alo@scalytics.io": {
        "provider": "google-gemini-cli",
        "mode": "oauth",
        "email": "alo@scalytics.io"
      },
      "anthropic:paxmachina": {
        "provider": "anthropic",
        "mode": "token"
      },
      "openai-codex:default": {
        "provider": "openai-codex",
        "mode": "oauth"
      },
      "xai:default": {
        "provider": "xai",
        "mode": "api_key"
      }
    }
  }
}

Tool Access Control

{
  "tools": {
    "agentToAgent": {
      "enabled": true,
      "allow": [
        "paxmachina",
        "worker-research",
        "worker-code"
      ]
    }
  }
}

OpenClaw enforced which agents could communicate. Workers could not arbitrarily invoke each other.


Results

Performance Improvements

Metric               Before      After      Change
Token usage/task     100-200k    20-40k     -70%
Cost per 1M tokens   $1.15       $0.85      -26%
Total monthly cost   $115        $26        -77%
Context overflows    2-3/day     <1/week    -95%
Response quality     Baseline    Improved   Directional
System uptime        95%         99.9%      +4.9 pts

Response quality was evaluated using an internal review rubric covering factual accuracy, task completion, structural clarity, and required human revisions. Scores are directional and workload-specific, not absolute benchmarks.

Uptime reflects successful task completion across all agents, including automatic provider fallback during rate limits and incidents.

Operational Benefits

  • Provider independence with multi-provider routing
  • Automatic failover via diversified fallback chains
  • Cost optimization through model-task alignment
  • Better outputs where it matters by reserving premium models for premium tasks
  • Less token waste through memory-first workflows

Key OpenClaw Features Used

1. Flexible Model Configuration

Per-agent primary and fallback models with provider abstraction

2. Multi-Provider Authentication

OAuth, API key, and token auth across Google, Anthropic, OpenAI, xAI

3. Tool Access Control

Profile-based tool allowlists enforced per agent

4. QMD Memory Integration

Automatic indexing and retrieval from a markdown knowledge base

5. Ephemeral Sessions

Temporary sessions for one-off tasks with automatic cleanup

6. llm_task Tool

Delegate to any model with dynamic selection

7. Session Management

Persistent sessions for chat interfaces, ephemeral sessions for automation

8. Gateway API

RPC-style agent invocation via CLI and HTTP


Lessons Learned

1. Hub-and-Spoke Beats Mesh

Direct agent-to-agent communication added complexity. Coordinator-mediated delegation was simpler and more reliable.

2. Provider Diversity Matters

Single-provider fallback chains failed during outages. Mixing providers improved task completion under incidents.

3. Model-Task Alignment is Critical

Do not use premium models for routine work. Match model capability to task value.

4. Memory-First Prevents Waste

Check the knowledge base before searching. Write findings back for reuse.

5. Ephemeral Sessions for Automation

Cron jobs and delegated tasks should use ephemeral sessions to avoid session pollution.

6. Dynamic Model Selection Improves ROI

Let workers choose models per subtask. Spend premium tokens only where they change outcomes.


OpenClaw Configuration Best Practices

1. Set Sensible Defaults

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "google-gemini-cli/gemini-2.5-flash",
        "fallbacks": [
          "anthropic/claude-haiku-4-5",
          "google-gemini-cli/gemini-3-flash-preview"
        ]
      },
      "timeoutSeconds": 600,
      "compaction": {
        "mode": "safeguard"
      }
    }
  }
}

2. Override for Specific Agents

Strategic agents got premium models while workers stayed on defaults

3. Minimal Tool Access

Only give agents the tools they need. Use OpenClaw profiles like minimal, coding, and full.
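
A worker-level example, reusing the same shape as the worker-research config shown earlier:

{
  "tools": {
    "profile": "minimal",
    "allow": ["web_search", "web_fetch", "llm_task"]
  }
}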

4. Memory Path Configuration

Point QMD at organized knowledge base directories

5. Session Archiving

Archive long-running sessions to reduce corruption risk:

{
  "session": {
    "archiveAfterMinutes": 1440
  }
}

Workflow Hardening and Operational Simplification

As of February 2026, we tightened the operational model to reduce orchestration sprawl, lower idle token spend, and make failures easier to detect and recover from.

Trigger and Heartbeat Simplification

  • Consolidated recurring content checks into a single runner: scripts/content_team_heartbeat.sh.
  • Cron triggered one content heartbeat every 15 minutes during active hours.
  • Task definitions and execution rules lived in memory/HEARTBEAT_TASKS.md, including per-task intervals and timeouts.
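
For illustration, the trigger reduces to a single crontab entry. The 07-19 active-hours window and the script path here are assumptions; the real per-task intervals live in HEARTBEAT_TASKS.md:

# One content heartbeat every 15 minutes during active hours
*/15 7-19 * * * /home/clawdia/workspace/scripts/content_team_heartbeat.sh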

Local Tools First, Delegate on Change Only

  • Inbox watcher pulled the latest content-pipeline state and compared commits locally.
  • Sentinel triage executed only when changes were detected.
  • Idle cycles avoided unnecessary model calls entirely.
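
A minimal sketch of that watch-then-delegate loop; the repo path, state file, and worker name are assumptions for this example:

#!/bin/bash
# inbox_watcher.sh - sketch of "delegate on change only"
set -euo pipefail

REPO="$HOME/workspace/content-pipeline"    # assumed repo location
STATE="$HOME/workspace/.last_seen_commit"  # assumed state file

git -C "$REPO" fetch --quiet origin
LATEST=$(git -C "$REPO" rev-parse origin/main)
LAST=$(cat "$STATE" 2>/dev/null || echo "none")

# Idle cycle: nothing changed, so no model call and zero tokens spent
[ "$LATEST" = "$LAST" ] && exit 0

# Change detected: trigger Sentinel triage, then record the new state
~/workspace/scripts/delegate.sh monitoring \
  "Triage content-pipeline changes ${LAST}..${LATEST}"
echo "$LATEST" > "$STATE"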

Required File Contract

Each content item included the following files before it was considered valid:

  • draft.md
  • meta.json
  • sources.md

For technical or quantitative claims, an additional file was mandatory:

  • claims.md
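
To keep the contract checkable, validation is a few lines of shell. The --has-claims flag is invented for this example:

#!/bin/bash
# validate_item.sh - sketch enforcing the per-item file contract

ITEM_DIR=$1
REQUIRED=(draft.md meta.json sources.md)

# claims.md is only mandatory for technical or quantitative content
[ "${2:-}" = "--has-claims" ] && REQUIRED+=(claims.md)

STATUS=0
for FILE in "${REQUIRED[@]}"; do
  if [ ! -f "$ITEM_DIR/$FILE" ]; then
    echo "MISSING: $ITEM_DIR/$FILE" >&2
    STATUS=1
  fi
done
exit $STATUS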

Writing Quality Contract

We enforced global writing rules across all brands and channels:

  • evidence first
  • no AI slop
  • no em dashes
  • explicit audience fit

These rules were anchored in:

  • shared/style-guides/evidence_writing_standard.md
  • ops/review-checklists/content_review.md
  • shared/market/AUDIENCE_DISCOVERY.md

Clear Ownership Model

The hardened pipeline enforced clear responsibility boundaries:

  • PaxMachina coordinated and decided GO or NO-GO
  • Oracle handled SEO intent and keyword alignment
  • Archon expanded evidence and pain point support
  • Scribe rewrote for quality and voice
  • Sentinel monitored and reported operational changes

This reduced coordination overhead, eliminated silent failures, and made the system easier to reason about under load.


Deployment Pattern

Cron Job Template

#!/bin/bash
# Standard OpenClaw cron job pattern
set -euo pipefail

openclaw agent \
  --agent worker-research \
  --ephemeral \
  --timeout 300 \
  --message "Task description"

Delegation Helper

#!/bin/bash
# delegate.sh - RPC-style agent invocation

WORKER=$1
TASK=$2
TIMEOUT=${3:-300}

openclaw agent \
  --agent "worker-${WORKER}" \
  --ephemeral \
  --timeout "$TIMEOUT" \
  --message "$TASK"

Usage

# From coordinator
~/scripts/delegate.sh research "Analyze competitor pricing"
~/scripts/delegate.sh content "Write blog post about AI trends"
~/scripts/delegate.sh seo "Keyword research for [topic]"

Monitoring

Check Model Distribution

openclaw gateway call sessions.list | \
  jq '.sessions[] | {agent: .key, model: .model}'

Track Token Usage

openclaw skill model-usage --agent paxmachina --days 7

List Active Sessions

openclaw sessions list

Gateway Status

openclaw gateway ping
openclaw gateway status

Conclusion

OpenClaw’s flexible multi-agent architecture enabled us to:

  1. Achieve provider independence across Google, Anthropic, OpenAI, xAI
  2. Reduce costs 77% through model-task alignment and lower token usage
  3. Improve reliability with provider-diverse fallback chains
  4. Reduce token waste with memory-first patterns
  5. Simplify coordination with hub-and-spoke RPC delegation

The key insight was simple: match model capability to task value. Use low-cost models for the majority of work, reserve premium models for decisions and outputs that justify the spend, and let OpenClaw handle routing, authentication, and failover.

These results reflect a content-heavy, research-driven multi-agent workload. Systems dominated by long-form reasoning, continuous streaming, or real-time inference may observe different cost and performance characteristics.


Resources

  • OpenClaw Documentation: https://docs.openclaw.ai
  • Model Pricing: validate list prices in each provider’s official pricing documentation
  • QMD Memory System: markdown-based knowledge base patterns
  • Multi-Agent Patterns: hub-and-spoke coordination

Tags: openclaw, multi-agent systems, AI cost optimization, model provider independence, llm orchestration, agent coordination, production AI, gemini, claude, gpt, grok

Category: Technical Deep Dive

Audience: OpenClaw users, AI engineers, DevOps teams building multi-agent systems

If you need help with distributed systems, backend engineering, or data platforms, check my Services.

This post is about superposition and interference in simple, intuitive terms. It describes how quantum states combine, how probability amplitudes add, and why interference patterns appear in systems such as electrons, photons and waves. The goal is to give a clear, non mathematical understanding of how quantum behavior emerges from the rules of wave functions and measurement. If you’ve ever heard the words superposition or entanglement thrown around in conversations about quantum physics, you may have nodded politely while your brain quietly filed them away in the "too confusing to deal with" folder.  These aren't just theoretical quirks; they're the foundation of mind-bending tech like Google's latest quantum chip, the Willow with its 105 qubits. Superposition challenges our understanding of reality, suggesting that particles don't have definite states until observed. This principle is crucial in quantum technologies, enabling phenomena like quantum comp...