10-page Horizontal-Swipe Business Deck (single HTML)
guizang-ppt-skill · AI-Agent industry trend talk
Task & environment
Strict 10 pages. Cover → core thesis (Copilot → Agent) → market drivers → multi-agent orchestration → vertical ROI → architecture evolution (AI-Native Infra) → governance → risks → 4-quarter roadmap → closing. Require charts + data comparisons, not walls of text.
Deliverable: one single-file index.html (horizontal-swipe deck), plus optional assets/images.
Results
All numbers are recomputed per-request from the OpenRouter activity CSV.
| Agent | Requests | Cost | Prompt tokens | Hit rate | Hit rate (-1st) | Truncations | Errors | Model clean |
|---|---|---|---|---|---|---|---|---|
| OpenClacky (this project) | 10 | $1.23 | 490,844 | 85.4% | 87.1% | 0 | 0 | ✅ Clean |
| Claude Code (closed-source) | 19 | $1.45 | 1,372,822 | 94.8% | 94.9% | 0 | 0 | ⚠️ Mixed |
| OpenClaw (open-source peer) | 34 | $5.07 | 2,400,582 | 86.8% | 89.7% | 9 | 1 | ⚠️ Mixed |
| Hermes (open multi-agent) | 51 | $10.96 | 5,374,545 | 71.0% | 70.9% | 0 | 0 | ✅ Clean |
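As a rough illustration of how a summary row like those above could be recomputed from per-request log rows, here is a minimal sketch. The column names (`created_at`, `tokens_prompt`, `tokens_cached`, `cost`, `finish_reason`), the definition of hit rate as cached prompt tokens divided by total prompt tokens, and the use of `finish_reason == "error"` to mark errors are all assumptions made for this example, not the confirmed schema of the OpenRouter activity CSV. The sample rows are synthetic.

```python
import csv
import io

# Synthetic rows; column names are assumptions, not the real CSV schema.
SAMPLE = """created_at,tokens_prompt,tokens_cached,cost,finish_reason
2026-05-01T10:00:00,1000,0,0.10,stop
2026-05-01T10:00:30,2000,1800,0.05,stop
2026-05-01T10:01:00,3000,2700,0.05,length
"""

def _hit_rate(rows):
    """Cached prompt tokens over total prompt tokens (assumed definition)."""
    prompt = sum(int(r["tokens_prompt"]) for r in rows)
    cached = sum(int(r["tokens_cached"]) for r in rows)
    return cached / prompt if prompt else 0.0

def summarize(rows):
    """Aggregate per-request rows into one summary row for a single agent run."""
    # ISO 8601 timestamps in one format sort correctly as plain strings.
    rows = sorted(rows, key=lambda r: r["created_at"])
    return {
        "requests": len(rows),
        "cost": round(sum(float(r["cost"]) for r in rows), 2),
        "prompt_tokens": sum(int(r["tokens_prompt"]) for r in rows),
        "hit_rate": _hit_rate(rows),
        # "-1st": drop the first (cold-start) request before computing the rate.
        "hit_rate_warm": _hit_rate(rows[1:]),
        "truncations": sum(r["finish_reason"] == "length" for r in rows),
        "errors": sum(r["finish_reason"] == "error" for r in rows),
    }

summary = summarize(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)
```

Excluding the first request separates the unavoidable cold-start miss from the hit rate on later requests, which is why the two hit-rate columns are reported side by side.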
Artifact comparison
Marketing / PPT HTML outputs are embedded inline; social-content text outputs are listed as files.
Full screen recordings of all four runs
Process footage · Evidence
Full screen recordings captured during task execution. Same prompt, same window, four agents.
Recordings captured during the May 2026 benchmark. Original timing can be cross-checked against the created_at column in the OpenRouter logs.
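The cross-check against `created_at` can be sketched as follows. This assumes the column holds ISO 8601 timestamps without a trailing `Z`; the actual format in the logs may differ. The timestamps in the usage line are synthetic.

```python
from datetime import datetime

def wall_clock_minutes(timestamps):
    """Minutes between the earliest and latest request timestamps."""
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    return (max(parsed) - min(parsed)).total_seconds() / 60

# Synthetic timestamps for illustration only.
span = wall_clock_minutes([
    "2026-05-01T10:00:00",
    "2026-05-01T10:07:00",
    "2026-05-01T10:03:30",
])
print(f"{span:.1f} min")
```

This measures the gap between the first and last request start times, so it slightly understates the recorded duration by however long the final request took to complete.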
Actual artifacts
All four agents' deliverables are public. Preview the HTML, read the Markdown, or download the source.
- OpenClacky: 2 files
- Claude Code: 1 file
- OpenClaw: 1 file
- Hermes: 1 file

Execution path & observations
- OpenClacky: 10 requests, single session, 7 min. Zero truncations, zero errors. Fewest requests of the four agents.
- Claude Code: 19 requests, 2.7 min (fastest). Highest hit rate on this task (94.8%). One auxiliary request went to a haiku model, hence "Mixed".
- OpenClaw: 34 requests at $5.07. 9 requests ended with `finish_reason=length` and 1 errored: 10 anomalous requests, 29.4% of all requests. With `openrouter/auto` routing, the first request landed on legacy `claude-4.6-opus`, and 2 later requests went to `google/gemini-2.5-flash-lite` and errored.
- Hermes: 51 requests at $10.96, ~11 min wall clock. Hit rate 71.0%, or 70.9% with the cold-start request excluded (essentially flat). The most expensive run on this task.
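The 29.4% figure is the sum of truncations and errors divided by total requests. A small sketch of that arithmetic, together with one possible reading of the "Model clean" column (a run counts as clean when every request used the same model), is shown below. That reading is an assumption for this example, and the model names in the usage lines are placeholders.

```python
def anomaly_share(requests, truncations, errors):
    """Fraction of requests that were truncated or errored."""
    return (truncations + errors) / requests

def model_status(models):
    """'Clean' if every request used one model, else 'Mixed' (assumed rule)."""
    return "Clean" if len(set(models)) == 1 else "Mixed"

# OpenClaw's counts from the results table: 34 requests, 9 truncations, 1 error.
openclaw_share = anomaly_share(34, 9, 1)
print(f"{openclaw_share:.1%}")  # 10 / 34 -> 29.4%

# Placeholder model names, for illustration only.
print(model_status(["model-a"] * 10))
print(model_status(["model-a"] * 18 + ["model-b"]))
```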
Takeaway
On this task, OpenClacky shipped the same 10-page deck in the fewest requests (10) and at the lowest cost ($1.23).
Claude Code had the highest hit rate (94.8%) but used more requests, ending at $1.45 — same order of magnitude.
OpenClaw's `openrouter/auto` routing exposed a concrete engineering problem on this task: a first request routed to a legacy Opus model, requests sent to the wrong model that errored, 9 output truncations, and 29.4% anomalous requests overall, with total cost inflated to $5.07.
Hermes finished the same deliverable for $10.96 — 8.9× OpenClacky's cost.
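The cost multiples can be checked directly from the results table. The sketch below uses only the request counts and costs reported there. The per-request cost is an extra derived figure that does not appear in the original report.

```python
# (requests, cost in USD) per agent, copied from the results table.
RUNS = {
    "OpenClacky": (10, 1.23),
    "Claude Code": (19, 1.45),
    "OpenClaw": (34, 5.07),
    "Hermes": (51, 10.96),
}

def cost_ratios(runs, baseline):
    """Each agent's total cost as a multiple of the baseline agent's cost."""
    base_cost = runs[baseline][1]
    return {name: cost / base_cost for name, (_, cost) in runs.items()}

ratios = cost_ratios(RUNS, "OpenClacky")
for name, (requests, cost) in RUNS.items():
    print(f"{name}: {ratios[name]:.1f}x baseline cost, ${cost / requests:.3f} per request")
```

Hermes comes out at roughly 8.9 times the baseline, matching the figure quoted above.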