01 · guizang-ppt-skill · 2026-05-02

10-page Horizontal-Swipe Business Deck (single HTML)

guizang-ppt-skill · AI-Agent industry trend talk

Task & environment

Original prompt

Strict 10 pages. Cover → core thesis (Copilot → Agent) → market drivers → multi-agent orchestration → vertical ROI → architecture evolution (AI-Native Infra) → governance → risks → 4-quarter roadmap → closing. Require charts + data comparisons, not walls of text.


Expected artifacts

One single-file index.html (horizontal swipe deck) + optional assets/images

Results

All numbers are recomputed per-request from the OpenRouter activity CSV.
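As a rough illustration of that recomputation, the sketch below aggregates per-request rows into the columns of the results table. The column names (`created_at`, `cost`, `tokens_prompt`, `tokens_cached`, `finish_reason`) and the sample rows are assumptions made for this example, not the confirmed CSV schema or real benchmark data. It also assumes that hit rate means cached prompt tokens divided by total prompt tokens, and that "Hit rate (-1st)" drops the first request of the run.

```python
import csv
import io

def summarize(rows):
    """Aggregate per-request rows into one results-table row.

    Column names are assumptions for illustration, not a confirmed schema.
    """
    rows = sorted(rows, key=lambda r: r["created_at"])
    prompt = sum(int(r["tokens_prompt"]) for r in rows)
    cached = sum(int(r["tokens_cached"]) for r in rows)
    # "Hit rate (-1st)": drop the first request, which is typically a cold start.
    warm = rows[1:]
    warm_prompt = sum(int(r["tokens_prompt"]) for r in warm)
    warm_cached = sum(int(r["tokens_cached"]) for r in warm)
    return {
        "requests": len(rows),
        "cost": round(sum(float(r["cost"]) for r in rows), 2),
        "prompt_tokens": prompt,
        "hit_rate": cached / prompt if prompt else 0.0,
        "hit_rate_excl_first": warm_cached / warm_prompt if warm_prompt else 0.0,
        "truncations": sum(1 for r in rows if r["finish_reason"] == "length"),
    }

# Made-up sample rows, only to show the shape of the input.
SAMPLE = """created_at,cost,tokens_prompt,tokens_cached,finish_reason
2026-05-02T10:00:00,0.10,1000,0,stop
2026-05-02T10:00:30,0.05,2000,1500,stop
2026-05-02T10:01:00,0.05,3000,2500,length
"""

summary = summarize(list(csv.DictReader(io.StringIO(SAMPLE))))
print(summary)
```

Sorting by `created_at` before dropping the first row keeps the cold-start exclusion tied to request order rather than to file order.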

Agent	Requests	Cost	Prompt tokens	Hit rate	Hit rate (excl. 1st request)	Truncations	Errors	Model clean
OpenClacky (this project)	10	$1.23	490,844	85.4%	87.1%	0	0	✅ Clean
Claude Code (closed-source)	19	$1.45	1,372,822	94.8%	94.9%	0	0	⚠️ Mixed
OpenClaw (open-source peer)	34	$5.07	2,400,582	86.8%	89.7%	9	1	⚠️ Mixed
Hermes (open multi-agent)	51	$10.96	5,374,545	71.0%	70.9%	0	0	✅ Clean

Claude Code: haiku×1 + opus×18 (non-opus share <5%)

OpenClaw: openrouter/auto routing mixed in opus-4.6×1 + gemini-flash×2; these off-target models account for 8.3% of spend

Artifact comparison

Marketing / PPT HTML outputs are embedded inline; social-content text outputs are listed as files.

OpenClacky

Claude Code

OpenClaw

Hermes

Full screen recordings of all four runs

Process footage · Evidence

Full screen recordings captured during task execution. Same prompt, same window, four agents.

OpenClacky · MP4

Claude Code · MP4

OpenClaw · MP4

Hermes · MP4

Recordings captured during the May 2026 benchmark. Original timing can be cross-checked against the created_at column in the OpenRouter logs.
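One way to run that cross-check is to take the span between the earliest and latest `created_at` values of a run. The sketch below assumes the column holds ISO-8601 timestamps; the three timestamps are made up for illustration and are not taken from the logs.

```python
from datetime import datetime

def wall_clock_minutes(created_at_values):
    """Minutes between the first and last request of a run.

    Assumes ISO-8601 timestamps; the real log format may differ.
    """
    times = [datetime.fromisoformat(v) for v in created_at_values]
    return (max(times) - min(times)).total_seconds() / 60

# Hypothetical timestamps, not real log entries.
example = [
    "2026-05-02T10:00:00",
    "2026-05-02T10:00:40",
    "2026-05-02T10:01:30",
]
minutes = wall_clock_minutes(example)
print(f"{minutes:.1f} min")
```

This span runs from the first request's creation to the last request's creation, so it leaves out the duration of the final request and will read slightly shorter than the recording.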

Actual artifacts

All four agents' deliverables are public. Preview the HTML, read the Markdown, or download the source.

OpenClacky

2 files

Claude Code

1 file

index.html

OpenClaw

1 file

index.html

Hermes

1 file

index.html

Execution path & observations

OpenClacky: 10 requests, single session, 7 min. Zero truncations, zero errors. Fewest requests of the four agents.
Claude Code: 19 requests, 2.7 min (fastest). Highest hit rate on this task (94.8%). One haiku request was mixed in as an auxiliary step.
OpenClaw: 34 requests at $5.07. 9 truncations (finish_reason=length) plus 1 error make 10 anomalous requests, 29.4% of all requests. With openrouter/auto routing, the first request landed on legacy claude-4.6-opus and 2 later requests went to google/gemini-2.5-flash-lite and errored.
Hermes: 51 requests at $10.96, ~11 min wall clock. Hit rate 71.0%, essentially flat at 70.9% with the cold-start request excluded. The most expensive run on this task.
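The 29.4% anomalous-request figure for OpenClaw follows directly from the table counts. A minimal sketch of that arithmetic, where the function name is hypothetical and "anomalous" is taken to mean truncations plus errors:

```python
def anomalous_share(requests, truncations, errors):
    """Share of requests that were truncated or errored."""
    return (truncations + errors) / requests

# OpenClaw counts from the results table: 34 requests, 9 truncations, 1 error.
openclaw_share = anomalous_share(requests=34, truncations=9, errors=1)
print(f"{openclaw_share:.1%}")  # 29.4%
```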

Takeaway

On this task, OpenClacky shipped the same 10-page deck in the fewest requests (10) and at the lowest cost ($1.23).

Claude Code had the highest hit rate (94.8%) but used more requests (19), ending at $1.45, about 18% more than OpenClacky.

OpenClaw's openrouter/auto routing exposed a concrete engineering problem on this task: legacy-opus pollution, wrong-model errors, 9 output truncations, and 29.4% anomalous requests, with total cost inflated to $5.07.

Hermes finished the same deliverable for $10.96 — 8.9× OpenClacky's cost.
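The cost multiples quoted here can be reproduced from the results table. The sketch below uses only the request counts and costs reported above; the per-request cost is a derived figure that the page itself does not report.

```python
# (requests, total cost in USD) per agent, copied from the results table.
RUNS = {
    "OpenClacky": (10, 1.23),
    "Claude Code": (19, 1.45),
    "OpenClaw": (34, 5.07),
    "Hermes": (51, 10.96),
}

baseline_cost = RUNS["OpenClacky"][1]

# Cost of each run as a multiple of the OpenClacky run.
cost_multiple = {name: cost / baseline_cost for name, (_, cost) in RUNS.items()}

# Derived figure: average cost per request for each run.
cost_per_request = {name: cost / reqs for name, (reqs, cost) in RUNS.items()}

for name in RUNS:
    print(f"{name}: {cost_multiple[name]:.1f}x, ${cost_per_request[name]:.3f}/request")
```

Cost per request separates two effects that the totals blend together: how many requests a run needed and how expensive each request was.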

Other tasks

02 · marketing-psychology

AI Customer-Service SaaS Marketing Plan + Live Homepage

marketing-psychology skill · dual deliverable

03 · social-content

B2B SaaS Competitor Analysis + Week-1 Social Calendar

social-content skill · 6-step pipeline
