OpenClacky vs Pi Agent: same PPT task, at 65% of the bill.
Same guizang-ppt-skill, same claude-4.7-opus, same prompt. The only variable is the agent harness itself. Both delivered an 11-page deck; this page lays out the process data so you can judge for yourself.
Pi Agent comes from badlogic (creator of libGDX). It's a 46.9k-star agent harness mono-repo on GitHub — coding agent CLI, unified multi-provider LLM API, TUI / web UI libraries, Slack bot, vLLM pods. It emphasizes shareable OSS sessions and self-extensibility — a platform, not a script. Its "lightweight per-turn context, many small steps" stance is a deliberate engineering tradeoff. This page puts that philosophy on the same axes as OpenClacky for one specific PPT task.
- OpenClacky's cache hit rate is very high (90.7%). 13 of 14 requests rode the previous prompt cache — this is the root cause of both the lower bill ($1.18) and shorter wall-clock (2m29s).
- Pi has lighter per-turn prompts (−40%). Average prompt is 33.5k vs OpenClacky's 55.6k — direct evidence of its "small steps, lean context" design philosophy. Single-turn cost on the model side is genuinely lower.
- The price Pi pays: 7 cache breaks (1 cold start + 6 mid-run). The bill ends up about 52% higher ($1.79 vs $1.18).
Experiment setup
The only variable is the agent harness.
Prompt summary
Task: use guizang-ppt-skill to produce "2026 AI Agent Industry Trends: Strategic Paradigm for Enterprise Intelligence". Audience: enterprise leaders / mid-to-senior decision-makers (focused on ROI, efficiency, organizational change). Length: 20 minutes / ≤10 pages. Style: high-end, minimal, business, tech. Outline: cover / core thesis (Copilot→Agent) / market drivers / multi-agent collaboration / industry ROI / architectural evolution / governance dividends / risks / 4-quarter roadmap / closing.
Run recordings
Process · Evidence
Watch for: (1) what granularity of decision the agent makes per turn; (2) any moments where the same file is re-read.
Recordings produced 2026-05-09; timestamps line up 1:1 with the OpenRouter CSVs.
All key metrics
Numbers come straight from the OpenRouter activity CSV, request by request. No estimates, no averaging tricks.
| Metric | OpenClacky | Pi Agent | Notes |
|---|---|---|---|
| Requests | 14 | 23 | Total model calls |
| Agent iterations | 11 | 23 | 1:1 with requests for Pi; OpenClacky used 14 requests across 11 iterations |
| Total wall-clock | 149s | 342s | OpenRouter Σ generation_time (first → last) |
| Total cost | $1.18 | $1.79 | OpenRouter bill (cache discount applied) |
| Total prompt tokens | 778,403 | 769,349 | All input tokens sent to model |
| Total cached tokens | 705,787 | 582,868 | Tokens that hit prompt cache |
| Overall cache hit rate | 90.7% | 75.8% | Σcached / Σprompt |
| Cache breaks | 1 | 7 | Requests with hit rate < 50% (incl. cold start) |
| Avg prompt / turn | 55,600 | 33,450 | Per-request input volume |
| Avg completion / turn | 1,054 | 823 | Per-request output volume |
| Avg generation time / turn | 10.6s | 14.9s | Mean of OpenRouter generation_time_ms |
| Length cuts | 0 | 0 | Requests with finish_reason = length |
| Errors | 0 | 0 | finish_reason = error / content_filter |
Green = numerically better on that metric. Gray = no clear "better" direction (e.g. token volume itself is neither good nor bad).
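The derived rows in the table follow from the totals by simple arithmetic. A minimal Python sketch, using only figures copied from the table (the dictionary layout and function names are illustrative and do not come from either harness):

```python
# Totals copied from the metrics table; the structure and names are illustrative.
RUNS = {
    "OpenClacky": {"requests": 14, "prompt_tokens": 778_403, "cached_tokens": 705_787},
    "Pi Agent": {"requests": 23, "prompt_tokens": 769_349, "cached_tokens": 582_868},
}


def cache_hit_rate(run):
    """Overall cache hit rate: sum of cached tokens over sum of prompt tokens."""
    return run["cached_tokens"] / run["prompt_tokens"]


def avg_prompt_per_turn(run):
    """Mean input volume per request."""
    return run["prompt_tokens"] / run["requests"]


for name, run in RUNS.items():
    print(
        f"{name}: hit rate {cache_hit_rate(run):.1%}, "
        f"avg prompt {avg_prompt_per_turn(run):,.0f} tokens/turn"
    )
```

Run as written, this reproduces the 90.7% / 75.8% hit rates and the 55,600 / 33,450 average prompt sizes listed in the table.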
Per-request timeline · how cache breaks happen
Bar length = prompt token volume. Color = cache hit rate. Red means the agent's context was reset or had not accumulated yet, so tens of thousands of tokens of cached prefix lost their cache value on that request.
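The "cache break" rule used here (a request whose hit rate is below 50%, cold start included) can be sketched in a few lines of Python. The `Request` class and the sample values are made up for illustration and are not taken from the recorded runs:

```python
from dataclasses import dataclass

BREAK_THRESHOLD = 0.5  # per the metrics table: a break is a request with hit rate < 50%


@dataclass
class Request:
    prompt_tokens: int
    cached_tokens: int

    @property
    def hit_rate(self) -> float:
        # Guard against an empty prompt to avoid division by zero.
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0


def cache_breaks(requests):
    """Return the indices of requests whose hit rate falls below the threshold."""
    return [i for i, r in enumerate(requests) if r.hit_rate < BREAK_THRESHOLD]


# Hypothetical sample values, NOT data from either run.
sample = [
    Request(12_000, 0),       # cold start: nothing cached yet
    Request(20_000, 18_000),  # prefix reused
    Request(30_000, 28_000),  # prefix reused
    Request(35_000, 4_000),   # context reset: most of the prefix misses
]
print(cache_breaks(sample))  # [0, 3]
```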
Technical traits (each side's tradeoffs)
Two harnesses, two engineering tradeoffs. We compare facts the data supports — no "who's better" verdict.
Context management
Mirror images of "accumulate vs trim per turn" produce two different cost curves.
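One way to see why the two curves diverge is a toy pricing model in which cached input tokens are billed at a fraction of the normal input price. The sketch below assumes a cache-read multiplier of 0.1 and ignores cache-write surcharges and output tokens; both are simplifying assumptions for illustration, not figures from the OpenRouter bill, so it is not meant to reproduce the actual costs. The token totals come from the metrics table.

```python
CACHE_READ_MULTIPLIER = 0.1  # assumption for illustration; real provider pricing may differ


def effective_input_tokens(prompt_tokens: int, cached_tokens: int,
                           multiplier: float = CACHE_READ_MULTIPLIER) -> float:
    """Input volume expressed in full-price token equivalents."""
    uncached = prompt_tokens - cached_tokens
    return uncached + cached_tokens * multiplier


# Totals from the metrics table.
openclacky = effective_input_tokens(778_403, 705_787)
pi = effective_input_tokens(769_349, 582_868)
print(f"OpenClacky: {openclacky:,.0f}  Pi: {pi:,.0f}  ratio: {pi / openclacky:.2f}")
```

Under these assumptions, Pi's billed input comes out larger even though the raw prompt totals are nearly identical, which is the tension between lean per-turn prompts and cross-turn cache reuse.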
Tool granularity
OpenClacky has many cohesive purpose-built tools. Pi composes finer-grained light calls across more turns.
Per-turn load
"Light per turn, many turns" vs "Heavy per turn, few turns" — the most visible axis of difference.
Resilience
Neither side hit any failure path on this task. No quality difference.
- finish_reason=stop.
- tool_calls/stop; 1 turn with finish_reason empty (no impact on result).
Final outputs
Both outputs are structurally identical (same skill template: 11 pages, dual canvas backgrounds, same fonts). Click any thumbnail to open the full deck.
Thumbnails are scaled previews. Best viewed in a desktop browser; both decks include WebGL backgrounds and animations.
Wrap-up
Same skill, same claude-4.7-opus, same prompt. Both delivered structurally identical 11-page decks; visible quality difference is minimal.
The difference is in the process: OpenClacky took the accumulating-context + purpose-built-tools route — 14 requests, 90.7% cache hit rate, $1.18, 2m29s. Pi took the lightweight-context + small-steps route — smaller per-turn prompts, ending at 23 requests, $1.79, 5m42s.
Worth respecting: Pi is a 46.9k-star agent harness platform whose design goals include shareable OSS sessions, self-extensibility, and unified multi-provider APIs. "Lightweight + small steps" is a natural consequence of that, not a bug. Under prompt-cache pricing, "lighter per-turn" and "higher cross-turn hit rate" pull against each other.
This is an honest tradeoff comparison: neither side dominates, and they fit different scenarios. If you need shareable session datasets, fine-grained step replay, or multi-provider flexibility, Pi's engineering value is hard to replace. If you want maximum cache reuse and cheaper, faster overall runs, OpenClacky's accumulating-context approach wins.
All raw data (OpenRouter CSV, session JSONL, recordings) is on this page. Verify it yourself, and feel free to visit Pi's GitHub for the author's own design exposition.