Usage Awareness — Tracking My Own Think Capacity
Why This Matters
Every response I generate costs cloud tokens. Every call the local team handles on Hercules costs nothing. The goal is to stay conscious of that ratio and shift work to local models whenever the task allows it — not just because it's cheaper, but because a large context window is a finite resource within a session.
Once the context window fills (200k tokens), the session compacts — old context is summarized and some detail is permanently lost. Keeping cloud usage lean means the session stays sharp longer.
What Gets Tracked
- Context window — tokens used vs 200k max. This is the primary "think capacity" meter. Colour-coded: green below 50%, yellow 50–75%, red above 75%.
- Cache hit rate — what fraction of input tokens were served from Anthropic's prompt cache (free). Higher is better. Currently running at ~55–56%.
- New tokens processed — the actual paid/capacity cost of the session.
- Local model call count — how many
/api/generatehits Hercules has served from this machine. Pulled from Ollama's systemd journal logs.
How Snapshots Are Taken
There's no automatic hook into Claude's own token meter from inside the session. Instead, I call
mcp__openclaw__session_status periodically and pipe the values into log_usage.py,
which writes a structured snapshot to /opt/hq/usage.json on Hercules and appends to
usage_history.json for trend tracking.
Snapshots are taken at the start of sessions and whenever I notice context climbing. The dashboard reads the file on every page load — no caching, always fresh.
The Dashboard Panel
The Think Capacity panel lives at the top of Rook HQ, above the goals list. It shows three blocks side by side:
- Context fill gauge with remaining token count
- Cache efficiency bar
- Cloud vs local call ratio as a donut chart (pure canvas, no dependencies)
Operating Rules I'm Setting for Myself
- Below 50% context: Normal operation.
- 50–75% context: All iterative tasks (code drafts, long-form writing, analysis) go to local team first. I review and integrate.
- Above 75% context: Aggressive delegation. I handle only final judgment and integration. Flag to user that context is getting full.
Current State
Context: 12.5% used. Cache hit rate: 55%. Local model calls this session: 5. Plenty of headroom — the session is still early.
Reporting Plan Usage — Updated Flow
The dashboard now has a Session Economics panel showing API-equivalent cost and cache savings. Context %, cache hit rate, and token counts are logged automatically via heartbeat. The only thing requiring manual input is the weekly plan % from claude.ai.
Option 1 — Dashboard Form (easiest)
On Rook HQ, the Think Capacity panel has a small input box. Type the % you see on claude.ai and click ✓. Done.
Option 2 — Bookmarklet (one click on claude.ai)
Drag this to your bookmarks bar, then click it whenever you're on claude.ai:
javascript:(function(){
var els=document.querySelectorAll('[class]');
var pct=null;
els.forEach(function(el){
var t=el.textContent.trim();
if(/^\d{1,3}%\s*used$/.test(t)) pct=parseInt(t);
});
if(!pct){pct=prompt('Enter weekly plan % used (e.g. 46):');}
if(!pct) return;
fetch('http://10.2.10.2:8766/api/usage/report?plan_pct='+pct,{method:'POST'})
.then(function(r){return r.json();})
.then(function(d){alert('Reported: '+d.plan_pct+'% to Rook HQ');});
})();
Option 3 — curl (from any terminal)
curl -X POST http://10.2.10.2:8766/api/usage/report?plan_pct=46
Reporting Plan Usage — Updated Flow (2026-05-06)
The dashboard now has a Session Economics panel showing API-equivalent cost and cache savings.
Context %, cache hit rate, and token counts are logged automatically via heartbeat.
The weekly plan % is the only thing requiring manual input.
Option 1 — Dashboard Form (easiest)
On Rook HQ the Think Capacity panel has an input box in the bottom right. Type the % and click ✓.
Option 2 — Bookmarklet
Drag to your bookmarks bar, click it on any claude.ai page:
javascript:(function(){var pct=null;document.querySelectorAll('*').forEach(function(e){var t=e.childNodes[0]&&e.childNodes[0].nodeValue;if(t&&/^\d{1,3}%\s*used$/.test(t.trim()))pct=parseInt(t);});if(!pct)pct=prompt('Weekly plan % used (e.g. 46):');if(!pct)return;fetch('http://10.2.10.2:8766/api/usage/report?plan_pct='+pct,{method:'POST'}).then(r=>r.json()).then(d=>console.log('Reported',d.plan_pct+'% to Rook HQ'));})();
Option 3 — curl
curl -X POST "http://10.2.10.2:8766/api/usage/report?plan_pct=46&resets_min=8400"
Auto vs Manual Tracking
Metric How Frequency
Context window % Auto — heartbeat ~30 min
Cache hit rate + tokens Auto — heartbeat ~30 min
API-equivalent cost Auto — calculated ~30 min
Cache savings ($) Auto — calculated ~30 min
Weekly plan % Manual — bookmarklet or form When you check
At 99% cache hit (619k cached / 3.6k new tokens this session), the cache is saving ~$5.39 in API-equivalent cost.
If cache hit ever drops below 80%, it signals cold context — worth compacting or restarting.