01 · guizang-ppt-skill · 2026-05-02

10-page Horizontal-Swipe Business Deck (single HTML)

guizang-ppt-skill · AI-Agent industry trend talk

Task & environment

Original prompt

Strict 10 pages. Cover → core thesis (Copilot → Agent) → market drivers → multi-agent orchestration → vertical ROI → architecture evolution (AI-Native Infra) → governance → risks → 4-quarter roadmap → closing. Require charts + data comparisons, not walls of text.

Read the full prompt →
Expected artifacts

One single-file index.html (horizontal swipe deck) + optional assets/images

Results

All numbers are recomputed per-request from the OpenRouter activity CSV.

Agent Requests Cost Prompt Hit rate Hit rate (-1st) Truncations Errors Model clean
OpenClacky
This project
10 $1.23 490,844 85.4% 87.1% 0 0 ✅ Clean
Claude Code
Closed-source
19 $1.45 1,372,822 94.8% 94.9% 0 0 ⚠️ Mixed
OpenClaw
Open-source peer
34 $5.07 2,400,582 86.8% 89.7% 9 1 ⚠️ Mixed
Hermes
Open multi-agent
51 $10.96 5,374,545 71.0% 70.9% 0 0 ✅ Clean
Claude Code: haiku×1 + opus×18(非 opus 占比 <5%)
OpenClaw: openrouter/auto 路由混入 opus-4.6×1 + gemini-flash×2,异常模型花费占比 8.3%

Artifact comparison

Marketing / PPT HTML outputs are embedded inline; social-content text outputs are listed as files.

Full screen recordings of all four runs

Process footage · Evidence

Full screen recordings captured during task execution. Same prompt, same window, four agents.

OpenClacky
MP4
Claude Code
MP4
OpenClaw
MP4
Hermes
MP4

Recordings captured during the May 2026 benchmark. Original timing can be cross-checked against the created_at column in the OpenRouter logs.

Actual artifacts

All four agents' deliverables are public. Preview the HTML, read the Markdown, or download the source.

OpenClacky

2 files

Claude Code

1 files

OpenClaw

1 files

Hermes

1 files

Execution path & observations

  • OpenClacky — 10 requests, single session, 7 min. Zero truncations, zero errors. Fewest requests across all four agents.
  • Claude Code — 19 requests, 2.7 min (fastest). Highest hit rate on this task (94.8%). Mixed in 1 haiku request as an auxiliary step.
  • OpenClaw — 34 requests at $5.07. 9 finish_reason=length + 1 error — 10 anomalous requests, 29.4% of all requests. With openrouter/auto routing, the first request landed on legacy claude-4.6-opus and 2 later requests went to google/gemini-2.5-flash-lite and errored.
  • Hermes — 51 requests at $10.96, ~11 min wall clock. Hit rate 71.0% → 70.9% cold-start-excluded (flat). The most expensive run on this task.

Takeaway

On this task, OpenClacky shipped the same 10-page deck in fewer requests (10) and at the lowest cost ($1.23).

Claude Code had the highest hit rate (94.8%) but used more requests, ending at $1.45 — same order of magnitude.

OpenClaw's openrouter/auto routing exposed a concrete engineering problem on this task — legacy-opus pollution, wrong-model errors, 9 output truncations, 29.4% anomalous requests, total cost inflated to $5.07.

Hermes finished the same deliverable for $10.96 — 8.9× OpenClacky's cost.

Other tasks

02 · marketing-psychology
AI Customer-Service SaaS Marketing Plan + Live Homepage
marketing-psychology skill · dual deliverable
03 · social-content
B2B SaaS Competitor Analysis + Week-1 Social Calendar
social-content skill · 6-step pipeline
← Back to benchmark overview