How We Achieved 77% Cost Reduction and Provider Independence
Over the past week, we rebuilt our AI infrastructure around OpenClaw’s multi-agent architecture. The result was a 77% cost reduction, provider independence, and a delegation system that routes work to the most cost-effective model for each job.
Below is the technical journey of optimizing a 7-agent squad with OpenClaw.
The Challenge: Model Provider Lock-In
We started with a simple problem: our entire squad defaulted to a single model provider. This created three issues:
- Cost inefficiency because premium models handled routine work
- Single point of failure because provider outage meant full system outage
- Suboptimal routing because research tasks used content models and vice versa
OpenClaw’s flexible model configuration made provider independence possible, but we needed the right strategy.
OpenClaw Multi-Agent Architecture
Our setup uses OpenClaw’s agent system with a hub-and-spoke pattern:
PaxMachina (Coordinator)
├── Archon (Research)
├── Forge (Code)
├── Scribe (Content)
├── Oracle (SEO)
├── Cipher (Finance)
└── Sentinel (Monitoring)
Each agent has its own workspace, model configuration, and tool access. OpenClaw handles session management, tool routing, and inter-agent communication.
Agent Configuration in OpenClaw
{
"agents": {
"defaults": {
"model": {
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash-preview"
]
}
},
"list": [
{
"id": "paxmachina",
"model": {
"primary": "anthropic/claude-sonnet-4-5",
"fallbacks": ["google-gemini-cli/gemini-3-pro-preview"]
}
}
]
}
}
OpenClaw’s model abstraction lets us use Google Gemini, Anthropic Claude, OpenAI GPT, and xAI Grok through the same interface.
System Logic: RPC Delegation & QMD Memory Bus
[Diagram: the coordinator delegates work to workers over RPC, and all agents share a QMD memory bus rooted at /workspace/memory/*.md]
Strategy 1: Model-Task Alignment
We analyzed each agent’s workload and matched it to models based on ROI and failure modes:
Strategic Coordination (PaxMachina)
Before: Gemini 3 Pro @ $1.25/1M tokens
After: Claude Sonnet 4.5 @ $3/1M tokens
Why: Better synthesis and planning justified premium cost for coordinator decisions and final outputs
High-Quality Content Review (Scribe)
Before: Claude Haiku @ $0.80/1M tokens
After: Claude Sonnet 4.5 @ $3/1M tokens
Why: Public content needs higher edit quality and fewer revisions, not cheap tokens
Research and Analysis (Archon, Oracle)
Model: Gemini Flash @ $0.075/1M tokens
Why: Fast, low cost, and effective for evidence gathering and synthesis workflows
Code Generation (Forge)
Model: GPT-5.2 @ $5/1M tokens
Why: Higher cost was acceptable when it reduced rework and improved correctness for code tasks
Financial Analysis (Cipher)
Model: Gemini Flash + xAI Grok skill
Why: Flash for orchestration, Grok for time-sensitive market context when needed
Infrastructure Monitoring (Sentinel)
Model: Gemini Flash @ $0.075/1M tokens
Why: Structured checks and fast turnaround mattered more than deep reasoning
Result: 60% of work ran on $0.075/1M models, 20% on $3/1M, 10% on $5/1M, and 10% on fallback models
Strategy 2: Provider Diversity in Fallbacks
Single-provider fallback chains fail during outages. OpenClaw’s fallback system enabled provider diversity.
Balanced Fallback Pattern
{
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash",
"google-gemini-cli/gemini-2.5-pro"
]
}
Benefits:
- ✅ If Google is unavailable, Anthropic takes over
- ✅ If Anthropic runs out of tokens, the chain falls back to Google
- ✅ Progressive quality degradation instead of hard failure
- ✅ Higher task completion rate under rate limits and incidents
Fallback Distribution
- Primary handles: 95% of requests
- Fallback 1: 4% (rate limits)
- Fallback 2: 0.9% (outages)
- Fallback 3: 0.1% (disaster)
In our workload, the cost impact of fallbacks was negligible relative to the reliability gains.
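For illustration, here is a minimal sketch of what a provider-diverse chain does at call time. This is not OpenClaw's internal code; callModel and the error handling are hypothetical stand-ins for the gateway's routing.

// Hypothetical sketch of provider-diverse fallback; not OpenClaw internals.
type ModelRef = string; // e.g. "google-gemini-cli/gemini-2.5-flash"

async function completeWithFallback(
  chain: ModelRef[], // primary first, then alternate-provider fallbacks
  prompt: string,
  callModel: (model: ModelRef, prompt: string) => Promise<string>, // assumed adapter
): Promise<string> {
  let lastError: unknown;
  for (const model of chain) {
    try {
      return await callModel(model, prompt); // stop at the first model that answers
    } catch (err) {
      lastError = err; // rate limit or outage: degrade to the next provider
    }
  }
  throw lastError; // every provider failed: surface a hard error
}

The point of the ordering is progressive degradation: each step down the chain trades some quality for availability instead of failing the task outright.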
Strategy 3: Native Search Synthesis with llm_task
OpenClaw’s llm_task enabled workers to delegate synthesis to models that can perform their own web search, reducing repeated external search calls and context bloat.
The Problem: Web Search Spam
Old pattern:
web_search("topic") // 15-20 calls
web_search("topic detail 1")
web_search("topic detail 2")
// ... context overflow at 100k tokens
New pattern:
// Worker does 1-3 targeted searches for specific facts
web_search("specific fact")
web_search("another fact")
// Then delegates synthesis to a model with native search
llm_task({
model: "gemini-2.5-flash",
prompt: "Research [topic] using your web search. Synthesize findings."
})
// Model searches internally, returns synthesis with smaller context footprint
Observed result: for research-heavy work, per-task token usage dropped from roughly 100k to roughly 30k
OpenClaw Tool Configuration
{
"tools": {
"allow": [
"web_search",
"web_fetch",
"llm_task",
"read",
"write"
]
}
}
Workers got minimal tool access and OpenClaw enforced boundaries per agent.
Strategy 4: Fixed Agent Coordination
OpenClaw supports agent-to-agent communication, but we found the most reliable pattern was hub-and-spoke RPC delegation from the coordinator.
Broken Pattern (Don’t Use)
sessions_send({
agent: "worker-content",
message: "Write blog post"
})
The asynchronous send returns immediately, leaving the coordinator with no completion signal, timeout, or result to act on.
Working Pattern (RPC Style)
#!/bin/bash
# delegate.sh - Proper OpenClaw agent delegation
WORKER_TYPE=$1
TASK=$2
openclaw agent \
--agent "worker-${WORKER_TYPE}" \
--ephemeral \
--timeout 300 \
--message "$TASK"
Coordinator invokes workers via OpenClaw CLI:
~/workspace/scripts/delegate.sh research "Analyze competitor X"
~/workspace/scripts/delegate.sh content "Write blog about Y"
OpenClaw Ephemeral Sessions
For cron jobs and delegated tasks, OpenClaw’s --ephemeral flag created temporary sessions that auto-cleaned:
openclaw agent \
--agent worker-research \
--ephemeral \
--timeout 300 \
--message "Task description"
- ✅ Fresh session each run
- ✅ Automatic cleanup
- ✅ No session pollution
- ✅ Fewer tool state errors
Strategy 5: Memory-First Patterns
OpenClaw’s QMD memory integration enabled workers to check memory before searching.
Memory-First Flow
Task received
↓
Check QMD memory
↓
Found → use cached knowledge
Not found → 1-3 web searches → synthesis → write to memory
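As a sketch, the same flow in code. Every helper below (queryMemory, webSearch, synthesize, writeMemory) is a hypothetical stand-in; in practice the QMD tools and llm_task fill these roles.

// Sketch of the memory-first flow; all helpers are hypothetical stand-ins.
declare function queryMemory(topic: string): Promise<string | null>; // QMD lookup
declare function webSearch(query: string): Promise<string>;
declare function synthesize(topic: string, facts: string[]): Promise<string>; // llm_task-style
declare function writeMemory(path: string, content: string): Promise<void>;

async function research(topic: string): Promise<string> {
  const cached = await queryMemory(topic);
  if (cached) return cached; // knowledge base hit: no searches, no model call

  const facts = await Promise.all([ // 1-3 targeted searches only
    webSearch(`${topic} overview`),
    webSearch(`${topic} recent developments`),
  ]);
  const briefing = await synthesize(topic, facts);
  await writeMemory(`briefings/${topic}.md`, briefing); // indexed by QMD for reuse
  return briefing;
}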
OpenClaw QMD Configuration
{
"memory": {
"backend": "qmd",
"qmd": {
"paths": [
{
"path": "/home/clawdia/workspace/memory/briefings",
"name": "briefings",
"pattern": "**/*.md"
}
],
"update": {
"interval": "10m",
"onBoot": true
},
"limits": {
"maxResults": 6,
"maxSnippetChars": 700
}
}
}
}
Workers wrote findings to workspace/memory/briefings/ and OpenClaw indexed them automatically. Future tasks reused this research.
Observed result: fewer redundant searches and less repeated research on recurring topics
Strategy 6: Dynamic Model Selection
OpenClaw’s llm_task supported per-task model selection:
// Quick lookup
llm_task({
model: "gemini-2.5-flash",
prompt: "What's the current BTC price?"
})
// Deep analysis
llm_task({
model: "claude-sonnet-4-5",
prompt: "Comprehensive competitive analysis of [company]"
})
// Time-sensitive context
llm_task({
model: "xai/grok-4",
prompt: "What happened in crypto markets in the last 24 hours?"
})
Workers chose the right model per subtask and OpenClaw handled authentication and routing.
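A worker's selection logic can be as simple as a tier map. The tiers and mapping below are ours, not an OpenClaw feature; the model names just mirror the examples above.

// Hypothetical per-subtask routing heuristic mirroring the examples above.
type Tier = "lookup" | "analysis" | "realtime";

function chooseModel(tier: Tier): string {
  switch (tier) {
    case "lookup":   return "gemini-2.5-flash";  // cheap, fast factual queries
    case "analysis": return "claude-sonnet-4-5"; // deep synthesis worth premium tokens
    case "realtime": return "xai/grok-4";        // time-sensitive market context
  }
}

// e.g. llm_task({ model: chooseModel("lookup"), prompt: "What's the current BTC price?" })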
Measurement Methodology
All measurements were taken over a 7-day production window using real workloads only. Synthetic benchmarks and ad hoc tests were excluded.
A “task” is defined as a single end-to-end agent execution initiated by PaxMachina and completed by one or more workers, including fallback execution when triggered.
Token usage was measured using OpenClaw’s native model-usage reporting and cross-checked against provider dashboards where available.
Pricing reflects publicly listed provider prices as of February 2026. No negotiated or volume discounts were applied.
Cost Analysis
Before Optimization
Token usage: 100-200k per task
Primary model: Mostly Gemini 3 Pro @ $1.25/1M
Weighted average: ~$1.15/1M
Monthly (100M tokens): $115
After Optimization
Token usage: 20-40k per task (-70%)
Model distribution:
60% Gemini Flash @ $0.075/1M
20% Claude Sonnet @ $3/1M
10% GPT-5.2 @ $5/1M
10% Others @ $0.80-1.25/1M
Weighted average: ~$0.85/1M (-26%)
Monthly (30M tokens): $25.50
Savings: $89.50/month (77% reduction)
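To make the savings arithmetic explicit, here is the monthly math using the token volumes and blended rates stated above:

// Monthly cost from the figures above (list prices, no discounts).
const before = 100 * 1.15; // 100M tokens x ~$1.15/1M = $115.00
const after = 30 * 0.85;   // 30M tokens x ~$0.85/1M = $25.50
const saved = before - after;           // $89.50
const pct = (saved / before) * 100;     // ≈77.8%, reported as 77% above
console.log({ before, after, saved, pct });

Both levers matter: the blended rate fell 26%, but the bigger driver was cutting token volume per task by 70%.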
Per-Task Cost Examples
Research (Archon):
- Before: 100k tokens × $1.25/1M = $0.125
- After: 30k tokens × $0.075/1M = $0.002
- Savings: 98%
Content (Scribe):
- Before: 100k tokens × $1.25/1M = $0.125
- After: 40k tokens × $3.00/1M = $0.120
- Savings: 4% (higher quality, fewer revisions)
SEO (Oracle):
- Before: 80k tokens × $1.25/1M = $0.100
- After: 20k tokens × $0.075/1M = $0.002
- Savings: 98%
Pricing Sources
- Google Gemini pricing based on official Google AI documentation
- Anthropic Claude pricing based on public Anthropic pricing pages
- OpenAI GPT pricing based on published OpenAI API rates
- xAI Grok pricing based on published API pricing
Prices may change over time. Calculations reflect list pricing at measurement time.
OpenClaw Configuration Patterns
Per-Agent Model Routing
{
"agents": {
"list": [
{
"id": "paxmachina",
"model": {
"primary": "anthropic/claude-sonnet-4-5",
"fallbacks": [
"google-gemini-cli/gemini-3-pro-preview",
"google-gemini-cli/gemini-2.5-pro",
"anthropic/claude-haiku-4-5"
]
},
"tools": {
"profile": "full"
}
},
{
"id": "worker-research",
"model": {
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash-preview"
]
},
"tools": {
"profile": "minimal",
"allow": ["web_search", "web_fetch", "llm_task"]
}
}
]
}
}
Multi-Provider Authentication
OpenClaw supported multiple auth methods per provider:
{
"auth": {
"profiles": {
"google-gemini-cli:alo@scalytics.io": {
"provider": "google-gemini-cli",
"mode": "oauth",
"email": "alo@scalytics.io"
},
"anthropic:paxmachina": {
"provider": "anthropic",
"mode": "token"
},
"openai-codex:default": {
"provider": "openai-codex",
"mode": "oauth"
},
"xai:default": {
"provider": "xai",
"mode": "api_key"
}
}
}
}
Tool Access Control
{
"tools": {
"agentToAgent": {
"enabled": true,
"allow": [
"paxmachina",
"worker-research",
"worker-code"
]
}
}
}
OpenClaw enforced which agents could communicate. Workers could not arbitrarily invoke each other.
Results
Performance Improvements
| Metric | Before | After | Change |
|---|---|---|---|
| Token usage/task | 100-200k | 20-40k | -70% |
| Cost per 1M tokens | $1.15 | $0.85 | -26% |
| Total monthly cost | $115 | $26 | -77% |
| Context overflows | 2-3/day | <1/week | -95% |
| Response quality | Baseline | Improved | Directional |
| System uptime | 95% | 99.9% | +4.9 pts |
Response quality was evaluated using an internal review rubric covering factual accuracy, task completion, structural clarity, and required human revisions. Scores are directional and workload-specific, not absolute benchmarks.
Uptime reflects successful task completion across all agents, including automatic provider fallback during rate limits and incidents.
Operational Benefits
- ✅ Provider independence with multi-provider routing
- ✅ Automatic failover via diversified fallback chains
- ✅ Cost optimization through model-task alignment
- ✅ Better outputs where it matters by reserving premium models for premium tasks
- ✅ Less token waste through memory-first workflows
Key OpenClaw Features Used
1. Flexible Model Configuration
Per-agent primary and fallback models with provider abstraction
2. Multi-Provider Authentication
OAuth, API key, and token auth across Google, Anthropic, OpenAI, xAI
3. Tool Access Control
Profile-based tool allowlists enforced per agent
4. QMD Memory Integration
Automatic indexing and retrieval from a markdown knowledge base
5. Ephemeral Sessions
Temporary sessions for one-off tasks with automatic cleanup
6. llm_task Tool
Delegate to any model with dynamic selection
7. Session Management
Persistent sessions for chat interfaces, ephemeral sessions for automation
8. Gateway API
RPC-style agent invocation via CLI and HTTP
Lessons Learned
1. Hub-and-Spoke Beats Mesh
Direct agent-to-agent communication added complexity. Coordinator-mediated delegation was simpler and more reliable.
2. Provider Diversity Matters
Single-provider fallback chains failed during outages. Mixing providers improved task completion under incidents.
3. Model-Task Alignment is Critical
Do not use premium models for routine work. Match model capability to task value.
4. Memory-First Prevents Waste
Check the knowledge base before searching. Write findings back for reuse.
5. Ephemeral Sessions for Automation
Cron jobs and delegated tasks should use ephemeral sessions to avoid session pollution.
6. Dynamic Model Selection Improves ROI
Let workers choose models per subtask. Spend premium tokens only where they change outcomes.
OpenClaw Configuration Best Practices
1. Set Sensible Defaults
{
"agents": {
"defaults": {
"model": {
"primary": "google-gemini-cli/gemini-2.5-flash",
"fallbacks": [
"anthropic/claude-haiku-4-5",
"google-gemini-cli/gemini-3-flash-preview"
]
},
"timeoutSeconds": 600,
"compaction": {
"mode": "safeguard"
}
}
}
}
2. Override for Specific Agents
Strategic agents got premium models while workers stayed on defaults
3. Minimal Tool Access
Only give agents the tools they need. Use OpenClaw profiles like minimal, coding, and full.
4. Memory Path Configuration
Point QMD at organized knowledge base directories
5. Session Archiving
Archive long-running sessions to reduce corruption risk:
{
"session": {
"archiveAfterMinutes": 1440
}
}
Workflow Hardening and Operational Simplification
As of February 2026, we tightened the operational model to reduce orchestration sprawl, lower idle token spend, and make failures easier to detect and recover from.
Trigger and Heartbeat Simplification
- Consolidated recurring content checks into a single runner: scripts/content_team_heartbeat.sh
- Cron triggered one content heartbeat every 15 minutes during active hours
- Task definitions and execution rules lived in memory/HEARTBEAT_TASKS.md, including per-task intervals and timeouts
Local Tools First, Delegate on Change Only
- Inbox watcher pulled the latest content-pipeline state and compared commits locally (see the sketch after this list)
- Sentinel triage executed only when changes were detected
- Idle cycles avoided unnecessary model calls entirely
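A minimal sketch of that watcher logic. The state-file location and agent id are assumptions; the openclaw invocation reuses the delegation flags shown earlier.

// Hypothetical sketch of "delegate on change only"; paths and agent id are assumed.
import { execSync } from "node:child_process";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const STATE = "/tmp/content-pipeline.head"; // assumed location for last-seen commit
const head = execSync("git rev-parse HEAD").toString().trim(); // latest pipeline commit
const last = existsSync(STATE) ? readFileSync(STATE, "utf8").trim() : "";

if (head !== last) {
  // Change detected: spawn an ephemeral triage session (flags as shown earlier).
  execSync(
    'openclaw agent --agent sentinel --ephemeral --timeout 300 ' +
      '--message "Triage new content-pipeline commits"',
  );
  writeFileSync(STATE, head);
}
// No change: exit without spending a single model token.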
Required File Contract
Each content item included the following files before it was considered valid:
- draft.md
- meta.json
- sources.md
For technical or quantitative claims, an additional file was mandatory:
- claims.md
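A sketch of the validation gate this contract implies. The hasTechnicalClaims flag in meta.json is an illustrative field name, not a documented schema.

// Hypothetical validation of the file contract; the meta.json flag name is illustrative.
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

function isValidContentItem(dir: string): boolean {
  const required = ["draft.md", "meta.json", "sources.md"];
  const metaPath = join(dir, "meta.json");
  if (!existsSync(metaPath)) return false;
  const meta = JSON.parse(readFileSync(metaPath, "utf8"));
  if (meta.hasTechnicalClaims) required.push("claims.md"); // assumed flag
  return required.every((f) => existsSync(join(dir, f)));
}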
Writing Quality Contract
We enforced global writing rules across all brands and channels:
- evidence first
- no AI slop
- no em dashes
- explicit audience fit
These rules were anchored in:
- shared/style-guides/evidence_writing_standard.md
- ops/review-checklists/content_review.md
- shared/market/AUDIENCE_DISCOVERY.md
Clear Ownership Model
The hardened pipeline enforced clear responsibility boundaries:
- PaxMachina coordinated and decided GO or NO-GO
- Oracle handled SEO intent and keyword alignment
- Archon expanded evidence and pain point support
- Scribe rewrote for quality and voice
- Sentinel monitored and reported operational changes
This reduced coordination overhead, eliminated silent failures, and made the system easier to reason about under load.
Deployment Pattern
Cron Job Template
#!/bin/bash
# Standard OpenClaw cron job pattern
set -euo pipefail
openclaw agent \
--agent worker-research \
--ephemeral \
--timeout 300 \
--message "Task description"
Delegation Helper
#!/bin/bash
# delegate.sh - RPC-style agent invocation
WORKER=$1
TASK=$2
TIMEOUT=${3:-300}
openclaw agent \
--agent "worker-${WORKER}" \
--ephemeral \
--timeout "$TIMEOUT" \
--message "$TASK"
Usage
# From coordinator
~/scripts/delegate.sh research "Analyze competitor pricing"
~/scripts/delegate.sh content "Write blog post about AI trends"
~/scripts/delegate.sh seo "Keyword research for [topic]"
Monitoring
Check Model Distribution
openclaw gateway call sessions.list | \
jq '.sessions[] | {agent: .key, model: .model}'
Track Token Usage
openclaw skill model-usage --agent paxmachina --days 7
List Active Sessions
openclaw sessions list
Gateway Status
openclaw gateway ping
openclaw gateway status
Conclusion
OpenClaw’s flexible multi-agent architecture enabled us to:
- Achieve provider independence across Google, Anthropic, OpenAI, xAI
- Reduce costs 77% through model-task alignment and lower token usage
- Improve reliability with provider-diverse fallback chains
- Reduce token waste with memory-first patterns
- Simplify coordination with hub-and-spoke RPC delegation
The key insight was simple: match model capability to task value. Use low-cost models for the majority of work, reserve premium models for decisions and outputs that justify the spend, and let OpenClaw handle routing, authentication, and failover.
These results reflect a content-heavy, research-driven multi-agent workload. Systems dominated by long-form reasoning, continuous streaming, or real-time inference may observe different cost and performance characteristics.
Resources
- OpenClaw Documentation: https://docs.openclaw.ai
- Model Pricing: validate list prices in each provider’s official pricing documentation
- QMD Memory System: markdown-based knowledge base patterns
- Multi-Agent Patterns: hub-and-spoke coordination
Tags: openclaw, multi-agent systems, AI cost optimization, model provider independence, llm orchestration, agent coordination, production AI, gemini, claude, gpt, grok
Category: Technical Deep Dive
Audience: OpenClaw users, AI engineers, DevOps teams building multi-agent systems
If you need help with distributed systems, backend engineering, or data platforms, check my Services.