OpenClacky vs Generic Agent: same PPT task, two scheduling styles, real numbers.
One guizang-ppt-skill, one claude-4.7-opus, one prompt — the only variable is the agent framework itself. Both shipped an 11-slide deck. This page lays out the process data without picking a winner; you decide.
Generic Agent's stated goal is "a self-evolving agent that grows a skill tree from a 3.3K-line seed, achieving full system control with 6× less token consumption" (their README). Its lightweight context is not a bug — it's the core design intent. This page measures that philosophy against OpenClacky on the same PPT task, in the same coordinate system.
- OpenClacky's cache hit rate is exceptionally high (90.7%). 13 of 14 requests reused the previous turn's prompt cache. The scheduler accumulates context and lets the model hit that cache again on every turn, which is the main reason its total cost comes out lower.
- Generic Agent uses 45% fewer tokens per turn. Average prompt size is 30.8k vs OpenClacky's 55.6k (0.55×), with faster per-turn generation too. Its "lightweight context" design goal is genuinely realized in the data.
- The cost: the cache broke 3 times mid-run (#10 / #15 / #20), with hit rate dropping to 0% on those turns. Combined with the higher request count (22 vs OpenClacky's 14), that pushed Generic Agent's total cost to $1.82 against OpenClacky's $1.18. Under prompt-caching pricing, "light per turn" and "high cross-turn hit rate" pull against each other.
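The tension in the last bullet can be made concrete with simple arithmetic on the token totals reported in the metrics table on this page. This is a minimal sketch that assumes only that uncached tokens are the difference between prompt tokens and cached tokens; it does not assume any per-token price.

```python
# Token totals as reported in the metrics table on this page.
runs = {
    "OpenClacky": {"prompt": 778_403, "cached": 705_787},
    "Generic Agent": {"prompt": 677_409, "cached": 498_294},
}


def uncached_tokens(run):
    """Prompt tokens that were NOT served from the prompt cache."""
    return run["prompt"] - run["cached"]


uncached = {name: uncached_tokens(run) for name, run in runs.items()}
ratio = uncached["Generic Agent"] / uncached["OpenClacky"]

for name, tokens in uncached.items():
    print(f"{name}: {tokens:,} uncached prompt tokens")
print(f"Generic Agent / OpenClacky: {ratio:.2f}x")
```

Generic Agent sends about 13% fewer prompt tokens in total, yet roughly 2.5 times as many of them miss the cache, and those are the tokens that do not receive the cache-read discount.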
Experiment setup
The only variable is the agent framework.
Task: use guizang-ppt-skill to produce "2026 AI Agent Industry Trends: Strategic Paradigm for Enterprise Intelligence Transformation". Audience: enterprise leaders / mid-to-senior decision-makers (focus on ROI, cost reduction, organizational change). Length: 20-min talk. Pages: ≤10. Style: premium, minimal, executive, tech. Outline: cover / core thesis (Copilot→Agent) / market drivers / multi-agent collaboration / industry ROI / architecture / governance / risks / quarterly roadmap / closing.
Process recordings
Two things to watch: (1) what granularity of decision the agent makes per turn; (2) whether you see it re-read the same file repeatedly.
Recorded 2026-05-09. Timestamps line up with the OpenRouter CSVs row-by-row.
All key metrics
All numbers come from per-request OpenRouter activity CSVs plus session metadata. No estimates, no averages dressed up as totals.
| Metric | OpenClacky | Generic Agent | Notes |
|---|---|---|---|
| Requests | 14 | 22 | Total upstream model calls |
| Agent iterations | 11 | 22 | OpenClacky: from session.stats; Generic Agent: equals request count |
| Wall-clock duration | 168s | 220s | First request → last request |
| Total cost | $1.18 | $1.82 | OpenRouter billing (cache discount applied) |
| Total prompt tokens | 778,403 | 677,409 | Σ input tokens sent to model |
| Total cached tokens | 705,787 | 498,294 | Tokens served from prompt cache |
| Overall cache hit rate | 90.7% | 73.6% | Σcached / Σprompt |
| Cache breaks | 1 | 5 | Requests with hit < 50% (cold start excluded) |
| Avg prompt per turn | 55,600 | 30,791 | Tokens fed in per request |
| Avg completion per turn | 1,054 | 828 | Tokens generated per response |
| Avg generation time per turn | 10.6s | 8.0s | OpenRouter generation_time_ms |
| Avg TTFT per turn | 1,560 ms | 1,111 ms | OpenRouter time_to_first_token_ms |
| Output truncations | 0 | 0 | finish_reason = length |
| Errors | 0 | 0 | finish_reason = error / content_filter |
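The derived rows in the table (overall cache hit rate, average prompt per turn) follow directly from the totals in the same table. A short sketch of that check, using the formulas given in the Notes column; the function name `derived_metrics` is just an illustrative helper, not part of either framework.

```python
def derived_metrics(requests, prompt_tokens, cached_tokens):
    """Recompute the derived table rows from the raw totals.

    Returns (hit rate in percent, average prompt tokens per request).
    """
    hit_rate = cached_tokens / prompt_tokens   # sum(cached) / sum(prompt)
    avg_prompt = prompt_tokens / requests      # tokens fed in per request
    return round(hit_rate * 100, 1), round(avg_prompt)


openclacky = derived_metrics(14, 778_403, 705_787)
generic = derived_metrics(22, 677_409, 498_294)

print(openclacky)  # (90.7, 55600)
print(generic)     # (73.6, 30791)

# Per-turn prompt size of Generic Agent relative to OpenClacky.
print(round(generic[1] / openclacky[1], 2))  # 0.55
```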
Green marks the better number on metrics with a clear better direction; grey marks neutral metrics (e.g. raw token volume isn't inherently better or worse).
Per-request timeline · how cache breaks happen
Bar length = prompt token volume that turn; color = cache hit rate that turn. When a bar turns red, the agent's context was reset and the tens of thousands of tokens cached on earlier turns could no longer be reused.
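The "cache break" rule used on this page (hit rate below 50%, cold start excluded) can be applied to per-request rows like this. This is a sketch under stated assumptions: the rows are plain `(prompt_tokens, cached_tokens)` pairs, the sample values are hypothetical and not taken from the real CSVs, and `find_cache_breaks` is an illustrative name.

```python
from typing import List, Tuple


def find_cache_breaks(rows: List[Tuple[int, int]], threshold: float = 0.5) -> List[int]:
    """Return 1-based request numbers whose cache hit rate is below threshold.

    The first request is skipped because a cold start has nothing to hit.
    """
    breaks = []
    for number, (prompt, cached) in enumerate(rows, start=1):
        if number == 1 or prompt == 0:
            continue
        if cached / prompt < threshold:
            breaks.append(number)
    return breaks


# Hypothetical (prompt_tokens, cached_tokens) rows, for illustration only.
sample = [
    (12_000, 0),       # cold start, excluded from the count
    (15_000, 11_800),
    (18_500, 14_900),
    (9_000, 0),        # context was reset: nothing served from cache
    (11_000, 8_700),
]

print(find_cache_breaks(sample))  # [4]
```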
Technical traits (each side's tradeoff)
The two frameworks made different design tradeoffs. We pair them up against facts visible in this run — without ranking them.
Context management
Mirror-image attitudes toward "accumulate vs. compress" — leading to two different cost curves.
Tool granularity
OpenClacky offers many cohesive specialized tools; Generic Agent leans on file_read / code_run / file_patch.
Generic Agent's tool calls in this run: file_read×11, code_run×8, file_patch×4. The 11 file_read calls point to repeated re-confirmation of intermediate files; generic tools have no shortcut for "read the same file again".
Per-turn weight
"Light per turn, more turns" vs "heavy per turn, fewer turns" — the most direct difference here.
Resilience
Neither framework triggered any failure path on this task — quality-side draw.
finish_reason=stop on both sides.
Deliverable comparison
Both deliverables share the same skill template (11 slides, dual-canvas background, same fonts).
Both decks ship with the skill's motion.min.js + WebGL dual-canvas, so desktop browsers are recommended for viewing them.
Summary
Same skill, same claude-4.7-opus, same prompt. Both shipped 11-slide decks of structurally equivalent quality — visible quality differences are minimal.
The differences live in process: OpenClacky took the "accumulate context + specialized tools" path, finishing in 14 requests at 90.7% hit rate and $1.18; Generic Agent took the "light context + generic tools" path, with smaller per-turn prompts and faster per-turn latency, ending at 22 requests / $1.82.
Worth respecting: "lightweight context" is Generic Agent's explicit design goal (its README slogan literally reads "6× less token consumption"). On this task its average per-turn prompt is 0.55× OpenClacky's (30.8k vs 55.6k) — it did achieve that. The trade-off is that "lightweight" and "high cache hit rate" pull against each other when prompt caching is in play.
This is an honest pair of design tradeoffs: neither dominates — each fits different scenarios. For long tasks where the agent needs to revisit early context, OpenClacky's accumulating scheduler costs less; for short tasks favoring tight single-turn responses, Generic Agent's lighter scheduler saves single-turn latency.
All raw data (OpenRouter CSVs, session JSON, screen recordings) is attached to this page — go verify it yourself, and visit Generic Agent's GitHub to read the author's own design rationale.