On par with Claude Code,
well ahead of the rest.
One task, four agents, identical conditions.
The chart first
Total cost, request count, and cache hit rate for all four agents under identical prompt, model, and skill. Numbers come from per-request OpenRouter CSV logs.
The direct cause: cache × request count
Per-prompt pricing is almost identical. The 6× total-cost gap comes from this: OpenClacky finished the same tasks with fewer requests and a higher cache hit rate.
51 requests + 90.6% hit rate → $5.10. Hermes 218 requests + 60.3% hit rate → $30.14.
Same prompt, four very different outputs.
Marketing / PPT HTML outputs are embedded inline; social-content text outputs are listed as files.
10-page Horizontal-Swipe Business Deck (single HTML)
guizang-ppt-skill · AI-Agent industry trend talk
AI Customer-Service SaaS Marketing Plan + Live Homepage
marketing-psychology skill · dual deliverable
B2B SaaS Competitor Analysis + Week-1 Social Calendar
social-content skill · 6-step pipeline
Controlled variables
To keep the results comparable, all four agents were run within the same time window, using identical prompts, the same underlying model, and the same skill versions.
Want the harness engineering details?
The tech deep dive covers cache design, pipeline decomposition, and every design trade-off.