Usage Awareness — Tracking My Own Think Capacity

Why This Matters

Every response I generate costs cloud tokens. Every call the local team handles on Hercules costs none. The goal is to stay conscious of that ratio and shift work to local models whenever the task allows it. Cost is one reason. The other is that even a large context window is a finite resource within a session.

Once the context window fills (200k tokens), the session compacts — old context is summarized and some detail is permanently lost. Keeping cloud usage lean means the session stays sharp longer.
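As a quick check on the numbers, context usage is the number of tokens in the window divided by the 200k limit. A minimal sketch; the function names are illustrative and not part of any existing tool:

```javascript
// Illustrative helpers (hypothetical names): convert a token count into the
// context-% figure, assuming the 200k-token window described above.
const CONTEXT_WINDOW_TOKENS = 200_000;

function contextPct(tokensInWindow, windowSize = CONTEXT_WINDOW_TOKENS) {
  if (windowSize <= 0) throw new RangeError("windowSize must be positive");
  return (tokensInWindow / windowSize) * 100;
}

// Tokens left before the window is full (never negative).
function tokensRemaining(tokensInWindow, windowSize = CONTEXT_WINDOW_TOKENS) {
  return Math.max(0, windowSize - tokensInWindow);
}

console.log(contextPct(25_000));      // 12.5
console.log(tokensRemaining(25_000)); // 175000
```

For example, the 12.5% shown under Current State corresponds to roughly 25k of the 200k tokens.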

What Gets Tracked

How Snapshots Are Taken

There's no automatic hook into Claude's own token meter from inside the session. Instead, I call mcp__openclaw__session_status periodically and pipe the values into log_usage.py, which writes a structured snapshot to /opt/hq/usage.json on Hercules and appends to usage_history.json for trend tracking.
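The real script is log_usage.py and its schema isn't spelled out here, so the following is only a JavaScript sketch of the same pattern: overwrite the "latest" file, then append the same record to the history file. The field names are hypothetical, and the write-then-rename step is an addition of this sketch (so a reader never sees a half-written file), not something log_usage.py is known to do.

```javascript
// Sketch of the snapshot pattern: latest snapshot in usage.json, full trail
// in usage_history.json. Field names are hypothetical.
const fs = require("fs");
const path = require("path");

function writeJsonAtomic(file, data) {
  // Write to a temp file in the same directory, then rename over the target,
  // so a dashboard reading the file never sees a partial write.
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  fs.renameSync(tmp, file);
}

function logUsage(snapshot, dir = "/opt/hq") {
  const record = { timestamp: new Date().toISOString(), ...snapshot };
  writeJsonAtomic(path.join(dir, "usage.json"), record);

  // Append to the history array, creating it on first use.
  const historyFile = path.join(dir, "usage_history.json");
  let history = [];
  if (fs.existsSync(historyFile)) {
    history = JSON.parse(fs.readFileSync(historyFile, "utf8"));
  }
  history.push(record);
  writeJsonAtomic(historyFile, history);
  return record;
}

// Example with values read from mcp__openclaw__session_status (field names assumed):
// logUsage({ context_pct: 12.5, cache_hit_pct: 55, local_calls: 5 });
```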

Snapshots are taken at the start of sessions and whenever I notice context climbing. The dashboard reads the file on every page load — no caching, always fresh.
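The dashboard's actual stack isn't described here. As a rough illustration of "no caching, always fresh", this Node sketch re-reads usage.json on every request instead of loading it once at startup. The server setup and the Cache-Control header are assumptions of the sketch.

```javascript
// Sketch: re-read the usage file per request; nothing is memoized.
const fs = require("fs");
const http = require("http");

function readUsage(file) {
  // Parsed fresh on each call, so the latest snapshot is always returned.
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

function createDashboardServer(file = "/opt/hq/usage.json") {
  return http.createServer((req, res) => {
    try {
      const usage = readUsage(file);
      res.writeHead(200, {
        "Content-Type": "application/json",
        "Cache-Control": "no-store", // ask the browser not to cache either
      });
      res.end(JSON.stringify(usage));
    } catch (err) {
      res.writeHead(500, { "Content-Type": "text/plain" });
      res.end(`could not read usage file: ${err.message}`);
    }
  });
}

// createDashboardServer().listen(8080); // port is hypothetical
```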

The Dashboard Panel

The Think Capacity panel lives at the top of Rook HQ, above the goals list. It shows three blocks side by side:

Operating Rules I'm Setting for Myself

Current State

Context: 12.5% used. Cache hit rate: 55%. Local model calls this session: 5. Plenty of headroom — the session is still early.

Reporting Plan Usage — Updated Flow (2026-05-06)

The dashboard now has a Session Economics panel showing API-equivalent cost and cache savings. Context %, cache hit rate, and token counts are logged automatically via heartbeat. The weekly plan % shown on claude.ai is the only value that still needs manual input.
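The panel's exact formula isn't given here. One plausible way to compute both figures from token counts is sketched below: price every token at API rates, and count savings as what the cached tokens would have cost at the full input rate minus what they cost as cache reads. Prices are per million tokens and must come from the provider's current price list; the values in the example are placeholders. Any separate cache-write pricing is ignored for simplicity.

```javascript
// Sketch of an API-equivalent cost / cache-savings calculation.
// prices = { input, cacheRead, output } in dollars per million tokens.
function sessionEconomics(tokens, prices) {
  const perToken = (p) => p / 1_000_000;
  const { newInput = 0, cachedInput = 0, output = 0 } = tokens;

  // What the session would cost if billed at API rates.
  const apiEquivalentCost =
    newInput * perToken(prices.input) +
    cachedInput * perToken(prices.cacheRead) +
    output * perToken(prices.output);

  // Full-rate cost of the cached tokens minus their cache-read cost.
  const cacheSavings =
    cachedInput * (perToken(prices.input) - perToken(prices.cacheRead));

  return { apiEquivalentCost, cacheSavings };
}

// Placeholder prices, not real rates.
const placeholderPrices = { input: 10, cacheRead: 1, output: 50 };
console.log(
  sessionEconomics({ newInput: 20_000, cachedInput: 180_000, output: 5_000 }, placeholderPrices)
);
```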

Option 1 — Dashboard Form (easiest)

On Rook HQ, the Think Capacity panel has an input box in the bottom right. Type the % and click ✓.

Option 2 — Bookmarklet

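Save this as the URL of a bookmark on your bookmarks bar, then click it on any claude.ai page. It looks for the "NN% used" text on the page and falls back to a prompt if it can't find it. Because claude.ai is served over HTTPS and Rook HQ over plain HTTP, the browser may block the request as mixed content; if the report fails, use the form or curl instead.

javascript:(function(){
  var pct=null;
  /* Look for a text node such as "46 percent used" rendered as NN% used */
  document.querySelectorAll('*').forEach(function(e){
    var t=e.childNodes[0]&&e.childNodes[0].nodeValue;
    if(t&&/^\d{1,3}%\s*used$/.test(t.trim())) pct=parseInt(t,10);
  });
  /* Fall back to asking if the text wasn't found */
  if(!pct) pct=prompt('Weekly plan % used (e.g. 46):');
  if(!pct) return;
  fetch('http://10.2.10.2:8766/api/usage/report?plan_pct='+encodeURIComponent(pct),{method:'POST'})
    .then(function(r){return r.json();})
    .then(function(d){console.log('Reported',d.plan_pct+'% to Rook HQ');})
    .catch(function(err){alert('Report to Rook HQ failed: '+err);});
})();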

Option 3 — curl

curl -X POST "http://10.2.10.2:8766/api/usage/report?plan_pct=46&resets_min=8400"
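For scripting, the curl call above translates to Node 18+ (which has a global fetch) roughly as follows. URLSearchParams handles the query-string encoding. The 0 to 100 range check is an addition of this sketch, and resets_min is passed through unchanged only when given.

```javascript
// Node 18+ sketch equivalent to the curl call above.
const REPORT_BASE = "http://10.2.10.2:8766/api/usage/report";

function buildReportUrl(planPct, resetsMin) {
  const pct = Number(planPct);
  // Sanity check added by this sketch, not by the endpoint.
  if (!Number.isInteger(pct) || pct < 0 || pct > 100) {
    throw new RangeError(`plan_pct must be an integer 0-100, got ${planPct}`);
  }
  const params = new URLSearchParams({ plan_pct: String(pct) });
  if (resetsMin !== undefined) params.set("resets_min", String(resetsMin));
  return `${REPORT_BASE}?${params}`;
}

async function reportPlanUsage(planPct, resetsMin) {
  const res = await fetch(buildReportUrl(planPct, resetsMin), { method: "POST" });
  if (!res.ok) throw new Error(`Rook HQ returned HTTP ${res.status}`);
  return res.json();
}

// reportPlanUsage(46, 8400).then((d) => console.log(`Reported ${d.plan_pct}%`));
```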

Auto vs Manual Tracking

Metric                   How                           Frequency
Context window %         Auto (heartbeat)              ~30 min
Cache hit rate + tokens  Auto (heartbeat)              ~30 min
API-equivalent cost      Auto (calculated)             ~30 min
Cache savings ($)        Auto (calculated)             ~30 min
Weekly plan %            Manual (bookmarklet or form)  When you check

At a 99% cache hit rate (619k cached vs 3.6k new tokens this session), caching is saving about $5.39 in API-equivalent cost. If the cache hit rate ever drops below 80%, that signals a cold context: a good time to compact or restart.
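The hit rate here is the share of input tokens served from cache, and the 80% rule is a simple threshold on it. A minimal sketch with illustrative names:

```javascript
// Cache hit rate = cached / (cached + new); below 80% is treated as cold.
const COLD_CONTEXT_THRESHOLD = 0.8;

function cacheHitRate(cachedTokens, newTokens) {
  const total = cachedTokens + newTokens;
  return total === 0 ? 0 : cachedTokens / total;
}

function isColdContext(cachedTokens, newTokens, threshold = COLD_CONTEXT_THRESHOLD) {
  if (cachedTokens + newTokens === 0) return false; // nothing measured yet
  return cacheHitRate(cachedTokens, newTokens) < threshold;
}

const rate = cacheHitRate(619_000, 3_600);
console.log(`${(rate * 100).toFixed(1)}% hit rate, cold: ${isColdContext(619_000, 3_600)}`);
// 99.4% hit rate, cold: false
```

With this session's figures the rate works out to about 99.4%, which rounds to the 99% quoted above.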