02 · marketing-psychology · 2026-04-30

AI Customer-Service SaaS Marketing Plan + Live Homepage

marketing-psychology skill · dual deliverable

Task & environment

Original prompt

Analyze only gorgias.com. Output a Chinese marketing document (positioning, 30-day acquisition plan, content topics, DM scripts, homepage copy, objections) and a single-file Chinese index.html (hero / pain / solution / features / use cases / FAQ / demo-booking, no external assets).

Expected artifacts

One marketing-plan Markdown document + one single-file Chinese index.html (all CSS/JS inlined)

Results

All numbers are recomputed per-request from the OpenRouter activity CSV.

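| Agent | Type | Requests | Cost | Prompt tokens | Cache hit rate | Hit rate (excl. 1st request) | Truncations | Errors | Model clean |
|---|---|---|---|---|---|---|---|---|---|
| OpenClacky | This project | 20 | $1.72 | 628,278 | 91.0% | 92.2% | 1 | 0 | ✅ Clean |
| Claude Code | Closed-source | 8 | $1.20 | 310,106 | 64.5% | 63.6% | 0 | 0 | ⚠️ Mixed |
| OpenClaw | Open-source peer | 34 | $7.47 | 3,759,466 | 86.1% | 88.2% | 8 | 0 | ✅ Clean |
| Hermes | Open multi-agent | 22 | $4.65 | 1,258,934 | 52.9% | 53.9% | 0 | 0 | ✅ Clean |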
Claude Code: haiku×2 + sonnet×1 + opus×5 (non-opus requests account for 37.5% of requests but less than 5% of cost)
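The report does not define its metrics precisely, so the following is only a sketch of how one table row could be computed from per-request CSV data. It assumes that hit rate means cached prompt tokens divided by total prompt tokens (token-weighted across requests), that rows are in chronological order so the first row is the cold-start request, and that a truncation is any request with finish_reason == "length". RequestRow and summarize are hypothetical names, and the field names are placeholders rather than the actual OpenRouter CSV headers.

```python
from dataclasses import dataclass


@dataclass
class RequestRow:
    # Hypothetical per-request record; the real OpenRouter CSV headers may differ.
    model: str
    prompt_tokens: int
    cached_tokens: int  # assumed: prompt tokens served from the prompt cache
    cost_usd: float
    finish_reason: str


def _ratio(num: int, den: int) -> float:
    # Guard against division by zero for empty sessions.
    return num / den if den else 0.0


def summarize(rows: list[RequestRow]) -> dict:
    """Aggregate one agent's requests (chronological order) into one table row."""
    rest = rows[1:]  # assumed: the first request is the cold start
    return {
        "requests": len(rows),
        "cost": round(sum(r.cost_usd for r in rows), 2),
        "prompt_tokens": sum(r.prompt_tokens for r in rows),
        # Token-weighted hit rate (an assumption; the report does not define it).
        "hit_rate": _ratio(sum(r.cached_tokens for r in rows),
                           sum(r.prompt_tokens for r in rows)),
        "hit_rate_excl_first": _ratio(sum(r.cached_tokens for r in rest),
                                      sum(r.prompt_tokens for r in rest)),
        # finish_reason == "length" means the output stopped at max_tokens.
        "truncations": sum(1 for r in rows if r.finish_reason == "length"),
        # "Clean" here means every request used the same model.
        "model_clean": len({r.model for r in rows}) == 1,
    }


# Synthetic example rows (not taken from the benchmark data).
rows = [
    RequestRow("model-a", 1000, 0, 0.10, "stop"),
    RequestRow("model-a", 2000, 1800, 0.05, "length"),
    RequestRow("model-a", 3000, 2900, 0.04, "stop"),
]
print(summarize(rows))
```

With these synthetic rows the uncached first request pulls the overall rate down to about 78%, while the rate excluding it is 94%. In the table, by contrast, the two hit-rate columns differ by about one point for Claude Code and Hermes.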

Artifact comparison

Marketing / PPT HTML outputs are embedded inline; social-content text outputs are listed as files.


Process footage · Evidence

Full screen recordings captured during task execution. Same prompt, same window, four agents.


Recordings captured during the May 2026 benchmark. Original timing can be cross-checked against the created_at column in the OpenRouter logs.
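One way to do that cross-check is to take each agent's created_at values and compute the span between its first and last request. A minimal Python sketch, assuming the column holds ISO 8601 strings with an explicit UTC offset (e.g. 2026-04-30T10:00:00+00:00); the real export format may differ, and wall_clock_minutes is a hypothetical helper name.

```python
from datetime import datetime


def wall_clock_minutes(created_at: list[str]) -> float:
    """Minutes between the earliest and latest request timestamps.

    Assumes ISO 8601 strings with an explicit offset such as "+00:00";
    adjust the parsing if the CSV uses another format.
    """
    times = sorted(datetime.fromisoformat(t) for t in created_at)
    return (times[-1] - times[0]).total_seconds() / 60


# Synthetic timestamps (not from the benchmark logs); order does not matter.
minutes = wall_clock_minutes([
    "2026-04-30T10:00:00+00:00",
    "2026-04-30T10:15:30+00:00",
    "2026-04-30T10:05:00+00:00",
])
print(minutes)
```

Sorting first means the rows need not be in chronological order in the export.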

Actual artifacts

All four agents' deliverables are public. Preview the HTML, read the Markdown, or download the source.

Execution path & observations

  • OpenClacky: 20 requests in a single session. The session JSON was later cleared by the rotation mechanism, but the system log confirms the playbook was written at 12:35 and the plan at 16:09. Hit rate is 91.0% (92.2% excluding the cold-start request), the highest on this task.
  • Claude Code: 8 requests, the cheapest run at $1.20, but 3 of the 8 requests silently used haiku/sonnet (an architectural behavior: lightweight models are dispatched automatically for auxiliary steps). The low hit rate (64.5%) is attributed to the small session size, in which the first request carries proportionally more weight; note, however, that excluding the first request leaves it essentially unchanged (63.6%).
  • OpenClaw: 34 requests at $7.47 (4.3× OpenClacky). 8 of the 34 requests (23.5%) ended with finish_reason=length, meaning output reached max_tokens and triggered a continuation/retry that resubmitted an even larger context.
  • Hermes: 22 requests at $4.65. The hit rate moves only from 52.9% to 53.9% when the cold-start request is excluded; this flat result again indicates that the cache misses are architectural rather than a cold-start effect.
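The ratios quoted in these observations follow directly from the results table and the model-mix note. This small Python check reproduces them from those figures; no other data is involved.

```python
# Figures copied from the results table for this task.
cost = {"OpenClacky": 1.72, "Claude Code": 1.20, "OpenClaw": 7.47, "Hermes": 4.65}

# OpenClaw cost relative to OpenClacky.
openclaw_vs_openclacky = cost["OpenClaw"] / cost["OpenClacky"]

# Share of OpenClaw requests that hit finish_reason=length (8 of 34).
openclaw_truncation_share = 8 / 34

# Claude Code model mix: haiku x2 + sonnet x1 out of 8 requests.
claude_code_non_opus_share = (2 + 1) / 8

print(f"{openclaw_vs_openclacky:.1f}x, "
      f"{openclaw_truncation_share:.1%}, "
      f"{claude_code_non_opus_share:.1%}")
```

This prints 4.3x, 23.5%, 37.5%, matching the figures in the text.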

Takeaway

On this task, OpenClacky's hit rate (91.0%) beat Claude Code's (64.5%). With more requests per session (20 vs. 8), OpenClacky's cache-engineering advantage becomes visible.

Claude Code's $1.20 headline number needs nuance: 3 of its 8 requests were haiku/sonnet, so this isn't strictly a "same model" comparison.

OpenClaw's high cost comes almost entirely from 8 output truncations and their retry overhead — the same failure mode shows up again on the PPT task.

Other tasks

01 · guizang-ppt-skill
10-page Horizontal-Swipe Business Deck (single HTML)
guizang-ppt-skill · AI-Agent industry trend talk
03 · social-content
B2B SaaS Competitor Analysis + Week-1 Social Calendar
social-content skill · 6-step pipeline