Round 2 · same skill, same model, same prompt

OpenClacky vs Generic Agent —
same PPT task, two scheduling styles, real numbers.

One guizang-ppt-skill, one claude-4.7-opus, one prompt — the only variable is the agent framework itself. Both shipped an 11-slide deck. This page lays out the process data without picking a winner; you decide.

OpenClacky (clacky-ai/openclacky) · Generic Agent (lsdefine/GenericAgent)

Generic Agent's stated goal is "a self-evolving agent that grows a skill tree from a 3.3K-line seed, achieving full system control with 6× less token consumption" (their README). Its lightweight context is not a flaw; it is the core design intent. This page measures that philosophy against OpenClacky on the same PPT task, using the same metrics for both.

Bottom line · Total cost
$1.18 vs $1.82
Same skill, same claude-4.7-opus, same prompt. OpenClacky finished for 35% less than Generic Agent. Both shipped 11-slide decks of equivalent structure.
Cache hit rate
90.7% vs 73.6%
Total requests
14 vs 22
In three sentences
  • OpenClacky's cache hit rate is exceptionally high (90.7%). Every request from #3 onward reused prompt cache built by earlier turns. Its scheduler keeps accumulating context so each new turn re-hits the cached prefix, and that is the main reason its total cost is lower.
  • Generic Agent sends 45% fewer prompt tokens per turn. Its average prompt is 30.8k vs OpenClacky's 55.6k (0.55×), and its per-turn generation is faster too (8.0s vs 10.6s). Its "lightweight context" design goal is genuinely realized in the data.
  • The cost: its cache broke 3 times mid-run (#10 / #15 / #20). The hit rate dropped to 0% on those turns, so each of those prompts was billed in full at the uncached rate; together with 22 requests vs OpenClacky's 14, that outweighed the smaller per-turn prompts. Under prompt-caching pricing, "light per turn" and "high cross-turn hit rate" pull against each other.
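A rough way to see why the third point holds is to weight each input token by its price. The sketch below assumes cache reads cost 0.1× the base input rate, which is the ratio Anthropic publishes for prompt caching; it ignores cache-write premiums and output tokens, and OpenRouter's actual billing for these runs may differ. The function name `effective_input_units` is made up for illustration.

```python
def effective_input_units(prompt_tokens: int, cached_tokens: int,
                          read_multiplier: float = 0.1) -> float:
    """Input tokens weighted by price: uncached at 1.0x, cache reads at read_multiplier.

    Assumption: cache reads cost 0.1x the base input rate. Cache-write premiums
    and output tokens are deliberately ignored in this sketch.
    """
    uncached = prompt_tokens - cached_tokens
    return uncached + cached_tokens * read_multiplier


# Run totals (total prompt tokens, total cached tokens) from the metrics table.
openclacky = effective_input_units(778_403, 705_787)
generic = effective_input_units(677_409, 498_294)

print(f"OpenClacky:    {openclacky:,.0f} base-rate token equivalents")
print(f"Generic Agent: {generic:,.0f} base-rate token equivalents")
print(f"Ratio: {generic / openclacky:.2f}x")
```

Under these assumptions Generic Agent's weighted input comes out at roughly 1.6× OpenClacky's, even though it sent about 13% fewer raw prompt tokens. The billed ratio ($1.82 / $1.18 ≈ 1.54×) also includes output tokens and cache writes, so the two figures are not expected to match exactly.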

Experiment setup

The only variable is the agent framework.

Skill
guizang-ppt-skill
Same skill loaded on both sides; deliverable structure is identical.
Model
claude-4.7-opus
Every request on both sides used this model; no model mixing.
Prompt
10-slide AI Agent trends deck
Audience: enterprise decision-makers. Style: dark, executive.
Test date
2026-05-09
Two runs ~2 minutes apart, same model instance.
Prompt summary:
Task: use guizang-ppt-skill to produce "2026 AI Agent Industry Trends:
Strategic Paradigm for Enterprise Intelligence Transformation".
Audience: enterprise leaders / mid-to-senior decision-makers (focus on
ROI, cost reduction, organizational change).
Length: 20-min talk. Pages: ≤10. Style: premium, minimal, executive, tech.
Outline: cover / core thesis (Copilot→Agent) / market drivers / multi-agent
collaboration / industry ROI / architecture / governance / risks /
quarterly roadmap / closing.
View original prompt.md →

Process recordings

Process · Evidence

Two things to watch: (1) what granularity of decision the agent makes per turn; (2) whether you see it re-read the same file repeatedly.

OpenClacky recording · ≈ 3 min
Generic Agent recording · ≈ 4 min

Recorded 2026-05-09. Timestamps line up with the OpenRouter CSVs row-by-row.

All key metrics

All numbers come from per-request OpenRouter activity CSVs plus session metadata. No estimates, no averages dressed up as totals.

Metric | OpenClacky | Generic Agent | Notes
Requests | 14 | 22 | Total upstream model calls
Agent iterations | 11 | 22 | OpenClacky: from session.stats; Generic Agent: equal to request count
Wall-clock duration | 168 s | 220 s | First request → last request
Total cost | $1.18 | $1.82 | OpenRouter billing (cache discount applied)
Total prompt tokens | 778,403 | 677,409 | Σ input tokens sent to the model
Total cached tokens | 705,787 | 498,294 | Tokens served from prompt cache
Overall cache hit rate | 90.7% | 73.6% | Σcached / Σprompt
Cache breaks | 1 | 4 | Requests with hit rate < 50% (cold start #1 excluded)
Avg prompt tokens per turn | 55,600 | 30,791 | Total prompt tokens / requests
Avg completion tokens per turn | 1,054 | 828 | Tokens generated per response
Avg generation time per turn | 10.6 s | 8.0 s | OpenRouter generation_time_ms
Avg TTFT per turn | 1,560 ms | 1,111 ms | OpenRouter time_to_first_token_ms
Output truncations | 0 | 0 | finish_reason = length
Errors | 0 | 0 | finish_reason = error / content_filter
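The table's run-level numbers are sums and ratios over per-request rows. A minimal sketch of that aggregation follows, assuming a CSV with one row per request. The generation_time_ms, time_to_first_token_ms and finish_reason names come from the table notes; tokens_prompt and tokens_cached are hypothetical placeholders for whatever the OpenRouter export actually calls its prompt and cached-token columns, and the sample rows are invented.

```python
import csv
import io


def summarize(csv_text: str) -> dict:
    """Aggregate per-request rows into run-level metrics like those in the table.

    Hypothetical column names: tokens_prompt, tokens_cached.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n = len(rows)
    prompt = sum(int(r["tokens_prompt"]) for r in rows)
    cached = sum(int(r["tokens_cached"]) for r in rows)
    return {
        "requests": n,
        "total_prompt": prompt,
        "total_cached": cached,
        # Ratio of sums (Σcached / Σprompt), not the mean of per-request hit rates.
        "hit_rate": cached / prompt,
        "avg_prompt": prompt / n,
        "avg_generation_s": sum(int(r["generation_time_ms"]) for r in rows) / n / 1000,
        "avg_ttft_ms": sum(int(r["time_to_first_token_ms"]) for r in rows) / n,
        # Booleans sum as 0/1, giving counts of matching finish reasons.
        "truncations": sum(r["finish_reason"] == "length" for r in rows),
        "errors": sum(r["finish_reason"] in ("error", "content_filter") for r in rows),
    }


# Invented sample rows, not taken from either run.
sample = """tokens_prompt,tokens_cached,generation_time_ms,time_to_first_token_ms,finish_reason
1000,0,9000,1200,stop
2000,1800,7000,1000,stop
3000,2700,8000,1100,length
"""
print(summarize(sample))
```

Because the hit rate is a ratio of sums, heavy late turns weigh more than a small cold-start request; averaging per-request hit rates would give a different number.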


Per-request timeline · how cache breaks happen

Each row lists the request number, the prompt tokens sent that turn, and that turn's cache hit rate. When the hit rate collapses to 0% mid-run, the agent's context was reset and the tens of thousands of previously cached tokens had to be sent again at the uncached rate.

OpenClacky
14 requests (request · prompt tokens · cache hit rate)
#1 13 0%
#2 21,363 0%
#3 40,959 52%
#4 55,665 74%
#5 56,470 99%
#6 57,015 99%
#7 57,922 98%
#8 67,509 86%
#9 68,421 99%
#10 69,055 99%
#11 70,068 99%
#12 70,287 100%
#13 71,459 98%
#14 72,197 98%
The curve climbs steadily to 72k. After the warm-up turns (#1 to #4), every turn hits ≥ 86%, and nothing is flushed mid-run. The cost: each later turn is heavier.
Generic Agent
22 requests (request · prompt tokens · cache hit rate)
#1 4,883 0%
#2 5,627 87%
#3 11,553 49%
#4 13,899 83%
#5 16,523 84%
#6 18,190 91%
#7 24,390 75%
#8 30,226 81%
#9 40,607 74%
#10 33,309 0%
#11 40,536 82%
#12 42,591 95%
#13 46,429 92%
#14 48,221 96%
#15 29,459 0%
#16 31,334 94%
#17 33,257 94%
#18 37,223 89%
#19 47,005 79%
#20 38,028 0%
#21 40,945 93%
#22 43,174 95%
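Requests #10 / #15 / #20 show three clear context resets: the hit rate falls from 74–96% on the preceding turn to 0%, and the prompt shrinks each time (at #15 it falls from 48k to 29k, −39%). The 49% at #3 is an early dip on a growing prompt, not a reset. This is the price of keeping per-turn prompts small.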
Hit-rate bands: ≥ 80%, 50–80%, and < 50% (cold start or context reset).
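The two timelines can be checked mechanically. The sketch below copies the rows above (prompt tokens and rounded hit percentages), lists requests under the 50% threshold with request #1 treated as the cold start, and separately flags "resets", a hypothetical heuristic defined here as a 0% hit on a prompt smaller than the previous turn's.

```python
# Per-request (prompt tokens, cache hit %) copied from the timelines above.
OPENCLACKY = [(13, 0), (21_363, 0), (40_959, 52), (55_665, 74), (56_470, 99),
              (57_015, 99), (57_922, 98), (67_509, 86), (68_421, 99), (69_055, 99),
              (70_068, 99), (70_287, 100), (71_459, 98), (72_197, 98)]
GENERIC = [(4_883, 0), (5_627, 87), (11_553, 49), (13_899, 83), (16_523, 84),
           (18_190, 91), (24_390, 75), (30_226, 81), (40_607, 74), (33_309, 0),
           (40_536, 82), (42_591, 95), (46_429, 92), (48_221, 96), (29_459, 0),
           (31_334, 94), (33_257, 94), (37_223, 89), (47_005, 79), (38_028, 0),
           (40_945, 93), (43_174, 95)]


def cache_breaks(timeline, threshold=50):
    """1-based request numbers with hit % below threshold, skipping #1 (cold start)."""
    return [i for i, (_, hit) in enumerate(timeline, start=1)
            if i > 1 and hit < threshold]


def context_resets(timeline):
    """Hypothetical heuristic: 0% hit on a prompt smaller than the previous one."""
    return [i for i in range(2, len(timeline) + 1)
            if timeline[i - 1][1] == 0 and timeline[i - 1][0] < timeline[i - 2][0]]


for name, tl in (("OpenClacky", OPENCLACKY), ("Generic Agent", GENERIC)):
    total = sum(p for p, _ in tl)
    print(name, len(tl), "requests,", f"{total:,}", "prompt tokens,",
          "breaks:", cache_breaks(tl), "resets:", context_resets(tl))
```

The prompt columns sum exactly to the total prompt tokens reported for each run. Because the hit percentages are rounded, multiplying them back into cached-token counts would only approximate the CSV totals.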

Technical traits (each side's tradeoff)

The two frameworks made different design tradeoffs. Each pair below is tied to facts visible in this run, without ranking them.

Context management

Mirror-image attitudes toward "accumulate vs. compress" — leading to two different cost curves.

OpenClacky
Continuously accumulates context: the prompt grows monotonically from 13 tokens to 72k over 14 turns, with an overall hit rate of 90.7%. Pro: maximizes cache reuse. Con: each later turn is heavier.
Generic Agent
Three clear context resets during the run (#10 / #15 / #20), each pulling the prompt back down to 29–38k. Pro: per-turn prompt size stays bounded (never above 49k here). Con: every reset wipes out the previously cached prefix.
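To see why appending keeps hits high while compression zeroes them, here is a toy model of prefix caching: a request's hit rate is the share of its prompt that exactly matches the start of the previous prompt. This is a simplification (real providers such as Anthropic cache at explicit breakpoints and have minimum cacheable lengths), and the `accumulate` / `reset` policies, the `budget` parameter and the half-budget summary are hypothetical stand-ins, not either framework's actual scheduler.

```python
from typing import List, Tuple


def prefix_hit_rate(prev: List[int], cur: List[int]) -> float:
    """Share of `cur` covered by its longest common prefix with `prev`."""
    if not cur:
        return 0.0
    shared = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        shared += 1
    return shared / len(cur)


def simulate(policy: str, turns: int = 8, step: int = 10,
             budget: int = 60) -> List[Tuple[int, float]]:
    """Per-turn (prompt length, hit rate) for a toy 'accumulate' or 'reset' scheduler."""
    prev: List[int] = []
    prompt: List[int] = []
    next_id = 0  # token ids are unique integers, so rewritten text never matches old text
    out = []
    for _ in range(turns):
        new = list(range(next_id, next_id + step))
        next_id += step
        if policy == "reset" and len(prompt) + step > budget:
            # Compression: a fresh summary (half the budget) replaces the whole history,
            # so the new prompt shares no prefix with the previous one.
            summary = list(range(next_id, next_id + budget // 2))
            next_id += budget // 2
            prompt = summary + new
        else:
            prompt = prompt + new  # append-only: the old prompt stays an exact prefix
        out.append((len(prompt), prefix_hit_rate(prev, prompt)))
        prev = prompt
    return out


for policy in ("accumulate", "reset"):
    print(policy, [(n, round(h, 2)) for n, h in simulate(policy)])
```

With the defaults, `accumulate` grows to 80 tokens with hit rates climbing toward 1, while `reset` never exceeds 60 tokens but drops to a 0% hit on the turn it compresses (the seventh). The shapes mirror the two timelines on this page, though the numbers are purely synthetic.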

Tool granularity

OpenClacky offers many cohesive specialized tools; Generic Agent leans on file_read / code_run / file_patch.

OpenClacky
13 tool_use calls spread across file_reader / edit / glob / grep / write and others. Specialized tools tend to return more targeted information per round trip, which is consistent with finishing in 14 requests.
Generic Agent
24 tool_use calls, 23 of them file_read×11, code_run×8 and file_patch×4. The 11 file_read calls suggest repeated re-checking of intermediate files; a generic read tool offers no shortcut for "read the same file again".
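Tool-call counts like these come from tallying tool_use blocks in the agent's responses. A minimal sketch, assuming responses are stored as Anthropic Messages API-style dicts whose content list holds blocks with type "tool_use" and a name field; the log shape and sample data are illustrative, not taken from either session.

```python
from collections import Counter
from typing import Iterable


def tally_tool_calls(responses: Iterable[dict]) -> Counter:
    """Count tool_use blocks by tool name across assistant responses.

    Assumed shape: {"content": [{"type": "tool_use", "name": "file_read", ...}, ...]}
    """
    counts: Counter = Counter()
    for resp in responses:
        for block in resp.get("content", []):
            if block.get("type") == "tool_use":
                counts[block["name"]] += 1
    return counts


# Made-up responses for illustration; these are not from either run.
responses = [
    {"content": [{"type": "text", "text": "Reading the outline."},
                 {"type": "tool_use", "name": "file_read", "input": {}}]},
    {"content": [{"type": "tool_use", "name": "code_run", "input": {}},
                 {"type": "tool_use", "name": "file_read", "input": {}}]},
    {"content": [{"type": "text", "text": "Done."}]},
]
counts = tally_tool_calls(responses)
print(counts.most_common(), sum(counts.values()))
```

On the real Generic Agent session, a tally like this would have to account for 24 calls; file_read, code_run and file_patch cover 23 of them.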

Per-turn weight

"Light per turn, more turns" vs "heavy per turn, fewer turns" — the most direct difference here.

OpenClacky
Avg prompt 55.6k per turn, avg generation 10.6s, TTFT 1560ms. Slower, heavier per turn.
Generic Agent
Avg prompt 30.8k per turn, avg generation 8.0s, TTFT 1111ms. Lighter, faster per turn — Generic Agent's real advantage.

Resilience

Neither framework hit any failure path on this task, so resilience is a draw.

OpenClacky
0 truncations, 0 errors. All 14 turns finish_reason=stop.
Generic Agent
0 truncations, 0 errors. All 22 turns finish_reason=stop.
How to read this set of numbers: Generic Agent's "light context + generic tools" yielded smaller per-turn token footprint and faster single-turn latency; OpenClacky's "accumulating context + specialized tools" maxed out cache hit rate and information density per tool call. Both completed the deck. On this run OpenClacky leads on requests (−36%), total cost (−35%) and hit rate (+17pp); Generic Agent leads on per-turn prompt volume (−45%) and per-turn generation time (−25%). Which tradeoff fits depends on whether you'd rather be "fast each turn" or "cheap overall".
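The headline deltas in the paragraph above are plain relative differences, except the hit-rate gap, which is in percentage points. A small sketch reproducing them from the metrics table (the helper name `pct_change` is made up for illustration):

```python
def pct_change(value: float, baseline: float) -> float:
    """Relative difference of `value` versus `baseline`, in percent."""
    return (value / baseline - 1) * 100


# OpenClacky relative to Generic Agent.
print(f"requests:        {pct_change(14, 22):+.0f}%")
print(f"total cost:      {pct_change(1.18, 1.82):+.0f}%")
print(f"cache hit rate:  {90.7 - 73.6:+.1f} pp")  # difference in points, not percent
# Generic Agent relative to OpenClacky (the dimensions where it leads).
print(f"avg prompt/turn: {pct_change(30_791, 55_600):+.0f}%")
print(f"avg gen time:    {pct_change(8.0, 10.6):+.0f}%")
```

Expressed as a relative change instead, OpenClacky's hit rate would be about 23% higher (90.7 / 73.6 ≈ 1.23), which is why the gap is reported in points.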

Deliverable comparison

Both deliverables share the same skill template (11 slides, dual-canvas background, same fonts).


Both decks ship with the skill's motion.min.js and a WebGL dual-canvas background, so a desktop browser is recommended for viewing them.

Summary

Same skill, same claude-4.7-opus, same prompt. Both shipped 11-slide decks of structurally equivalent quality — visible quality differences are minimal.

The differences live in process: OpenClacky took the "accumulate context + specialized tools" path, finishing in 14 requests at 90.7% hit rate and $1.18; Generic Agent took the "light context + generic tools" path, with smaller per-turn prompts and faster per-turn latency, ending at 22 requests / $1.82.

Worth respecting: "lightweight context" is Generic Agent's explicit design goal (its README slogan literally reads "6× less token consumption"). On this task its average per-turn prompt is 0.55× OpenClacky's (30.8k vs 55.6k) — it did achieve that. The trade-off is that "lightweight" and "high cache hit rate" pull against each other when prompt caching is in play.

This is an honest pair of design tradeoffs: neither dominates, and each fits different scenarios. Based on this run, OpenClacky's accumulating scheduler is likely to cost less on long tasks where the agent needs to revisit early context, while Generic Agent's lighter scheduler keeps single-turn latency down on short tasks that favor tight responses.

All raw data (OpenRouter CSVs, session JSON, screen recordings) is attached to this page — go verify it yourself, and visit Generic Agent's GitHub to read the author's own design rationale.

See other tasks

01 · guizang-ppt-skill
10-page Horizontal-Swipe Business Deck (single HTML)
guizang-ppt-skill · AI-Agent industry trend talk
02 · marketing-psychology
AI Customer-Service SaaS Marketing Plan + Live Homepage
marketing-psychology skill · dual deliverable
03 · social-content
B2B SaaS Competitor Analysis + Week-1 Social Calendar
social-content skill · 6-step pipeline