02 · marketing-psychology · 2026-04-30

AI Customer-Service SaaS Marketing Plan + Live Homepage

marketing-psychology skill · dual deliverable

Task & environment

Original prompt

Analyze only gorgias.com. Output a Chinese marketing document (positioning, 30-day acquisition plan, content topics, DM scripts, homepage copy, objections) and a single-file Chinese index.html (hero / pain / solution / features / use cases / FAQ / demo-booking, no external assets).


Expected artifacts

One marketing execution Markdown + one single-file Chinese index.html (fully inlined CSS/JS)

Results

All numbers are recomputed per-request from the OpenRouter activity CSV.

Agent	Requests	Cost	Prompt tokens	Hit rate	Hit rate (excl. 1st request)	Truncations	Model clean
OpenClacky (this project)	20	$1.72	628,278	91.0%	92.2%	1	✅ Clean
Claude Code (closed-source)	8	$1.20	310,106	64.5%	63.6%	0	⚠️ Mixed
OpenClaw (open-source peer)	34	$7.47	3,759,466	86.1%	88.2%	8	✅ Clean
Hermes (open multi-agent)	22	$4.65	1,258,934	52.9%	53.9%	0	✅ Clean

Claude Code: haiku×2 + sonnet×1 + opus×5 (non-opus requests make up 37.5% of requests but <5% of spend)
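
The page does not spell out how the two hit-rate columns are derived. A minimal sketch, assuming hit rate means cached prompt tokens divided by total prompt tokens summed over a run, and that the second column simply drops the first (cold-start) request. The field names `prompt_tokens` and `cached_tokens` and the sample rows are hypothetical; they are not the actual OpenRouter CSV schema or real data.

```python
from typing import Iterable, Mapping

def hit_rate(rows: Iterable[Mapping[str, int]], skip_first: bool = False) -> float:
    """Token-weighted cache hit rate over a run's requests, as a percentage."""
    rows = list(rows)
    if skip_first:
        rows = rows[1:]  # drop the cold-start request
    prompt = sum(r["prompt_tokens"] for r in rows)
    cached = sum(r["cached_tokens"] for r in rows)
    return 100.0 * cached / prompt if prompt else 0.0

# Illustrative rows only (made up, ordered by request time).
sample = [
    {"prompt_tokens": 10_000, "cached_tokens": 0},       # cold start, nothing cached
    {"prompt_tokens": 20_000, "cached_tokens": 18_000},
    {"prompt_tokens": 30_000, "cached_tokens": 27_000},
]

overall = hit_rate(sample)
warm = hit_rate(sample, skip_first=True)
print(f"{overall:.1f}% overall, {warm:.1f}% excluding the first request")
```

Under this definition, a run whose rate barely moves when the first request is dropped (Hermes, 52.9% vs 53.9%) is missing the cache on warm requests as well, not just on the cold start.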

Artifact comparison

Marketing / PPT HTML outputs are embedded inline; social-content text outputs are listed as files.

OpenClacky

Claude Code

OpenClaw

Hermes

Full screen recordings of all four runs

Process footage · Evidence

Full screen recordings captured during task execution. Same prompt, same window, four agents.

OpenClacky

MP4

Claude Code

MP4

OpenClaw

MP4

Hermes

MP4

Recordings captured during the May 2026 benchmark. Original timing can be cross-checked against the created_at column in the OpenRouter logs.

Actual artifacts

All four agents' deliverables are public. Preview the HTML, read the Markdown, or download the source.

OpenClacky

3 files

Claude Code

2 files

OpenClaw

2 files

Hermes

3 files

Execution path & observations

OpenClacky: 20 requests in a single session. The session JSON was cleared by the rotate mechanism, but the system log confirms the playbook landed at 12:35 and the plan at 16:09. Hit rate was 91.0% (92.2% with the cold-start request excluded), the highest on this task.
Claude Code: 8 requests, cheapest at $1.20, but 3 of the 8 silently used haiku or sonnet (architectural behavior: lightweight models are auto-dispatched for auxiliary steps). The low hit rate (64.5%) is attributed to the small session size, where the first request carries proportionally more weight, although excluding the first request does not raise it (63.6%).
OpenClaw: 34 requests at $7.47 (4.3× OpenClacky's cost). 8 of the 34 (23.5%) ended with finish_reason=length, meaning the output reached max_tokens and triggered a continuation or retry with a larger resubmitted context.
Hermes: 22 requests at $4.65. Hit rate moved only from 52.9% to 53.9% with the cold-start request excluded, essentially flat, which reconfirms that the cache misses are architectural.
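
The ratios quoted in these observations can be re-derived from the results table. The sketch below copies the request counts, costs, and truncation counts from that table; the dictionary layout and variable names are illustrative only.

```python
# Figures copied from the results table on this page.
runs = {
    "OpenClacky": {"requests": 20, "cost": 1.72, "truncations": 1},
    "Claude Code": {"requests": 8, "cost": 1.20, "truncations": 0},
    "OpenClaw": {"requests": 34, "cost": 7.47, "truncations": 8},
    "Hermes": {"requests": 22, "cost": 4.65, "truncations": 0},
}

baseline = runs["OpenClacky"]["cost"]
for name, r in runs.items():
    cost_ratio = r["cost"] / baseline                      # cost relative to OpenClacky
    trunc_rate = 100.0 * r["truncations"] / r["requests"]  # share of truncated requests
    per_request = r["cost"] / r["requests"]                # average spend per request
    print(f"{name:12s} {cost_ratio:4.1f}x cost  {trunc_rate:5.1f}% truncated  ${per_request:.3f}/request")
```

The per-request average is a derived figure that the page does not report: OpenClaw comes to about $0.22 per request against $0.086 for OpenClacky, so its higher total reflects costlier requests as well as more of them.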

Takeaway

On this task, OpenClacky's hit rate (91.0%) beat Claude Code's (64.5%). As request counts grow, OpenClacky's cache engineering edge becomes visible.

Claude Code's $1.20 headline number needs nuance: 3 of its 8 requests were haiku/sonnet, so this isn't strictly a "same model" comparison.
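
The model-mix caveat can be checked with a few lines. This sketch assumes the "Model clean" flag means every request in a run used the same primary model; the function name `model_mix` and the labels it returns are hypothetical. The request counts are the ones reported for Claude Code on this page.

```python
from collections import Counter

# Requests per model family for the Claude Code run, as reported on this page.
models = ["haiku"] * 2 + ["sonnet"] * 1 + ["opus"] * 5

def model_mix(models: list[str], primary: str = "opus") -> tuple[str, float]:
    """Return ('Clean' or 'Mixed', percentage of requests not on the primary model)."""
    counts = Counter(models)
    off_primary = sum(n for m, n in counts.items() if m != primary)
    share = 100.0 * off_primary / len(models)
    return ("Clean" if off_primary == 0 else "Mixed", share)

label, share = model_mix(models)
print(label, f"{share:.1f}% of requests off the primary model")
```

A request-count share is a blunt measure: the page notes that these non-opus requests account for less than 5% of spend, so weighting by cost rather than by count would tell a different story.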

OpenClaw's high cost comes almost entirely from 8 output truncations and their retry overhead — the same failure mode shows up again on the PPT task.

Other tasks

01 · guizang-ppt-skill

10-page Horizontal-Swipe Business Deck (single HTML)

guizang-ppt-skill · AI-Agent industry trend talk

03 · social-content

B2B SaaS Competitor Analysis + Week-1 Social Calendar

social-content skill · 6-step pipeline
