B2B SaaS Competitor Analysis + Week-1 Social Calendar
social-content skill · 6-step pipeline
Task & environment
Run in order: read source material → 3 competitor analyses → gap analysis (shared blind spots + differentiation) → FlowBase content strategy → week-1 LinkedIn & Twitter content calendars → final combined report.
Read the full prompt →10 Markdown files: 3× *_posts.md + 3× *_analysis.md + competitive_gap_analysis.md + flowbase_content_strategy.md + week1_linkedin.md + week1_twitter.md + final_report.md
Results
All numbers are recomputed per-request from the OpenRouter activity CSV.
| Agent | Requests | Cost | Prompt | Hit rate | Hit rate (-1st) | Truncations | Errors | Model clean |
|---|---|---|---|---|---|---|---|---|
|
OpenClacky
This project
|
21 | $2.14 | 1,008,988 | 90.0% | 91.4% | 0 | 0 | ✅ Clean |
|
Claude Code
Closed-source
|
43 | $2.84 | 3,204,300 | 97.6% | 98.2% | 0 | 0 | ✅ Clean |
|
OpenClaw
Open-source peer
|
13 | $3.15 | 1,626,133 | 82.4% | 88.3% | 0 | 0 | ✅ Clean |
|
Hermes
Open multi-agent
|
145 | $14.53 | 3,850,850 | 47.1% | 47.4% | 0 | 0 | ✅ Clean |
Full screen recordings of all four runs
Process footage · EvidenceFull screen recordings captured during task execution. Same prompt, same window, four agents.
Recordings captured during the May 2026 benchmark. Original timing can be cross-checked against the created_at column in the OpenRouter logs.
Execution path & observations
- OpenClacky — 21 requests, single session. Delivered all 10 expected artifacts plus a bonus
flowbase_blog_article.mdand averify.pyself-check script. - Claude Code — 43 requests, single session. Highest cache rate on this task (97.6%). The archive has 7 deliverables; the 3
*_posts.mdfiles were processed in-context and not persisted as separate files. - OpenClaw — 13 requests, single session. Fewest requests, but larger per-request prompts + 82.4% hit rate pushed total to $3.15 — higher than OpenClacky.
- Hermes — 145 requests across 9 sessions (1 orchestrator + 8 sub-tasks). Even with the first request removed, hit rate only climbs to 47.4% — the multi-session architecture rebuilds cache on every handoff. The orchestrator stopped at 21:08 after gap analysis; it never ran the strategy / week-1 / final-report steps.
Takeaway
On this task, OpenClacky delivered all 10 skill-defined artifacts for $2.14 across 21 requests — the lowest cost and the most complete output of any agent.
Claude Code had the highest cache hit rate (97.6%) but used more requests (43), ending at $2.84 — same order of magnitude as OpenClacky.
Hermes ran a multi-session pipeline that caused heavy cache misses and spent $14.53 without finishing the second half.