03 · social-content · 2026-04-30

B2B SaaS Competitor Analysis + Week-1 Social Calendar

social-content skill · 6-step pipeline

Task & environment

Original prompt

Run in order: read source material → 3 competitor analyses → gap analysis (shared blind spots + differentiation) → FlowBase content strategy → week-1 LinkedIn & Twitter content calendars → final combined report.


Expected artifacts

10 Markdown files: 3× *_posts.md + 3× *_analysis.md + competitive_gap_analysis.md + flowbase_content_strategy.md + week1_linkedin.md + week1_twitter.md + final_report.md

Results

All numbers are recomputed per-request from the OpenRouter activity CSV.

Agent	Type	Requests	Cost (USD)	Prompt tokens	Cache hit rate	Cache hit rate (excl. 1st request)	Model clean
OpenClacky	This project	21	$2.14	1,008,988	90.0%	91.4%	✅ Clean
Claude Code	Closed-source	43	$2.84	3,204,300	97.6%	98.2%	✅ Clean
OpenClaw	Open-source peer	13	$3.15	1,626,133	82.4%	88.3%	✅ Clean
Hermes	Open multi-agent	145	$14.53	3,850,850	47.1%	47.4%	✅ Clean
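
The page does not spell out how the two hit-rate columns are derived. A minimal sketch, assuming the hit rate is cached prompt tokens divided by total prompt tokens, and that the second hit-rate column simply drops the first request, which always starts from a cold cache. The `Request` type and its field names are hypothetical and are not OpenRouter CSV column names.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical fields; map them from whatever columns the activity CSV provides.
    prompt_tokens: int
    cached_tokens: int


def hit_rate(requests: list[Request]) -> float:
    """Token-weighted cache hit rate: cached prompt tokens / all prompt tokens."""
    total = sum(r.prompt_tokens for r in requests)
    cached = sum(r.cached_tokens for r in requests)
    return cached / total if total else 0.0


def hit_rate_excluding_first(requests: list[Request]) -> float:
    """Same ratio with the first (cold-cache) request left out."""
    return hit_rate(requests[1:])


# Illustrative numbers only, not taken from the benchmark logs.
sample = [Request(1000, 0), Request(1000, 900), Request(1000, 950)]
print(f"{hit_rate(sample):.1%}")
print(f"{hit_rate_excluding_first(sample):.1%}")
```

Under this definition a single long session gains little from dropping the first request, while a run split into many sessions pays the cold-cache cost once per session.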

Artifact comparison

Marketing / PPT HTML outputs are embedded inline; social-content text outputs are listed as files.

OpenClacky

15 files

final_report.md flowbase_content_strategy.md flowbase_blog_article.md competitive_gap_analysis.md week1_linkedin.md week1_twitter.md competitor_coda_analysis.md competitor_notion_analysis.md competitor_obsidian_analysis.md competitor_coda_posts.md competitor_notion_posts.md competitor_obsidian_posts.md

Claude Code

8 files

final_report.md flowbase_content_strategy.md competitive_gap_analysis.md week1_linkedin.md week1_twitter.md competitor_coda_analysis.md competitor_notion_analysis.md competitor_obsidian_analysis.md

OpenClaw

10 files

final_report.md flowbase_content_strategy.md competitive_gap_analysis.md week1_linkedin.md week1_twitter.md competitor_coda_analysis.md competitor_notion_analysis.md competitor_obsidian_analysis.md _check.py _check_result.json

Hermes

8 files

_README.md competitive_gap_analysis.md competitor_coda_analysis.md competitor_notion_analysis.md competitor_obsidian_analysis.md competitor_coda_posts.md competitor_notion_posts.md competitor_obsidian_posts.md
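
The file lists above can be checked mechanically against the expected artifacts. A minimal sketch, assuming the names and wildcard patterns from the Expected artifacts line are the whole specification; `missing_artifacts` is a hypothetical helper, not part of any agent's output.

```python
from fnmatch import fnmatch

# Fixed file names listed under "Expected artifacts".
NAMED = {
    "competitive_gap_analysis.md",
    "flowbase_content_strategy.md",
    "week1_linkedin.md",
    "week1_twitter.md",
    "final_report.md",
}
# Wildcard artifacts and how many of each are expected.
PATTERNS = {"*_posts.md": 3, "*_analysis.md": 3}

# File lists copied from the artifact comparison above.
ANALYSES = ("competitor_coda_analysis.md competitor_notion_analysis.md "
            "competitor_obsidian_analysis.md")
POSTS = ("competitor_coda_posts.md competitor_notion_posts.md "
         "competitor_obsidian_posts.md")
CORE = ("final_report.md flowbase_content_strategy.md competitive_gap_analysis.md "
        "week1_linkedin.md week1_twitter.md " + ANALYSES)
AGENT_FILES = {
    "OpenClacky": f"{CORE} flowbase_blog_article.md {POSTS}".split(),
    "Claude Code": CORE.split(),
    "OpenClaw": f"{CORE} _check.py _check_result.json".split(),
    "Hermes": f"_README.md competitive_gap_analysis.md {ANALYSES} {POSTS}".split(),
}


def missing_artifacts(files: list[str]) -> list[str]:
    """Return expected artifacts that do not appear in `files`."""
    present = set(files)
    missing = sorted(NAMED - present)
    for pattern, needed in PATTERNS.items():
        # Skip fixed names so competitive_gap_analysis.md is not counted
        # as one of the three per-competitor *_analysis.md files.
        found = sum(1 for f in present if f not in NAMED and fnmatch(f, pattern))
        if found < needed:
            missing.append(f"{needed - found}x {pattern}")
    return missing


for agent, files in AGENT_FILES.items():
    print(agent, missing_artifacts(files) or "complete")
```

Against the listed files, this reports OpenClacky as complete, the three `*_posts.md` files as absent for Claude Code and OpenClaw, and the strategy, week-1, and final-report files as absent for Hermes.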

Full screen recordings of all four runs

Process footage · Evidence

Full screen recordings captured during task execution. Same prompt, same window, four agents.

OpenClacky (MP4) · Claude Code (MP4) · OpenClaw (MP4) · Hermes (MP4)

Recordings captured during the May 2026 benchmark. Original timing can be cross-checked against the created_at column in the OpenRouter logs.

Execution path & observations

OpenClacky: 21 requests, single session. Delivered all 10 expected artifacts plus a bonus flowbase_blog_article.md and a verify.py self-check script.
Claude Code: 43 requests, single session. Highest cache hit rate on this task (97.6%). The archive has 7 deliverables; the 3 *_posts.md files were processed in-context and not persisted as separate files.
OpenClaw: 13 requests, single session. Fewest requests, but larger per-request prompts and a lower hit rate (82.4%) pushed the total to $3.15, higher than both OpenClacky ($2.14) and Claude Code ($2.84).
Hermes: 145 requests across 9 sessions (1 orchestrator + 8 sub-tasks). Even with the first request removed, the hit rate only climbs to 47.4%, because the multi-session architecture rebuilds the cache on every handoff. The orchestrator stopped at 21:08 after the gap analysis and never ran the strategy, week-1, or final-report steps.
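
The per-request figures behind these observations follow directly from the results table. A small sketch of the arithmetic, assuming the Prompt column in that table counts prompt tokens; `per_request` is an illustrative helper.

```python
# (requests, total cost in USD, total prompt tokens) copied from the results table.
RESULTS = {
    "OpenClacky": (21, 2.14, 1_008_988),
    "Claude Code": (43, 2.84, 3_204_300),
    "OpenClaw": (13, 3.15, 1_626_133),
    "Hermes": (145, 14.53, 3_850_850),
}


def per_request(requests: int, cost: float, prompt_tokens: int) -> tuple[float, float]:
    """Return (average cost per request, average prompt tokens per request)."""
    return cost / requests, prompt_tokens / requests


for agent, row in RESULTS.items():
    cost_each, tokens_each = per_request(*row)
    print(f"{agent:<12} ${cost_each:.3f}/request  {tokens_each:>9,.0f} prompt tokens/request")
```

OpenClaw comes out with the largest average prompt (roughly 125,000 tokens per request) and the highest cost per request (about $0.24), while Hermes has the smallest average prompt but by far the most requests.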

Takeaway

On this task, OpenClacky delivered all 10 skill-defined artifacts for $2.14 across 21 requests — the lowest cost and the most complete output of any agent.

Claude Code had the highest cache hit rate (97.6%) but used more requests (43), ending at $2.84, which is $0.70 (about 33%) more than OpenClacky.

Hermes ran a multi-session pipeline that caused heavy cache misses (47.1% hit rate) and spent $14.53 without running the strategy, week-1 calendar, or final-report steps.

