Verdict

Caveman is a useful output-side compression trick that pays for itself on prose-heavy work and quietly underdelivers on code. In our four-prompt benchmark on Opus (high effort), output tokens dropped 31–37% on the code and mixed prompts but went up ~30% on the explanation prompt — Caveman didn't shrink the answer, it rewrote it tersely and used the saved budget to add more sub-points. End-to-end the session ran ~34% cheaper.

The interesting finding isn't the savings number — it's the workflow regression: the same "write me a utility + tests" prompt that produced two files on disk in baseline produced an inline code dump (no Write tool call) under Caveman. Same code quality, different behavior. That's the kind of trade-off the README doesn't surface.

Recommend it for

Solo, explanation-heavy sessions where you want denser output — debugging walkthroughs, architecture write-ups, "explain this" prompts. Skip it (or toggle off) for hands-on coding sessions where you want Claude to actually create files, and for any pairing or onboarding context where readability matters more than cost.

What it is

Caveman is a Claude Code plugin (and skill bundle) by Julius Brussee. It registers a SessionStart hook that injects a terse "caveman" output style — short fragments, dropped articles, no preamble — and ships intensity levels (lite, full, ultra, plus a 文言文 wenyan variant), a statusline badge, and four sub-skills: caveman-commit, caveman-review, caveman-compress, caveman-help.

It's a style-only plugin — it doesn't reduce input tokens, only the model's prose output. Your CLAUDE.md, file reads, and tool outputs still cost what they cost. That makes it a complement to the input-side hygiene we cover in Token Mastery, not a replacement.

bash
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman

How we tested

The bench lives in a scratch folder, not Claudeverse — testing inside this repo would let our CLAUDE.md and file reads dominate the input side and wash out output deltas.

bash
mkdir ~/caveman-bench && cd ~/caveman-bench
echo "Scratch folder for benchmarking Claude Code output styles." > CLAUDE.md

Two sessions, same model (Opus), same effort level (high), three prompts each, fresh /clear between. Baseline first, then claude plugin install caveman@caveman and re-run.

The three prompts span the variance:

PromptShapeWhy
A — debugReact 15 child re-render walkthroughProse-heavy, no code generation
B — codeTS chunk<T> utility + 6 Vitest casesCode-heavy, control case
C — mixedCmd/Ctrl+K dialog in Nuxt 4 (files + a11y + manual tests)Realistic mixed workload

For each, we captured: the full response verbatim, /cost output, and one statusline screenshot to confirm the [CAVEMAN] badge was active.

The statusline merge

If you already use ccusage (or any custom statusline), Caveman's SessionStart hook does not overwrite it — it just writes a flag file at ~/.claude/.caveman-active. Wrap both in a script that reads the flag and prepends the badge to your existing statusline output. Otherwise the plugin runs invisibly and you can't tell which mode is active.

Statusline showing the CAVEMAN badge to the left of the ccusage usage strip

What we found

Token deltas per prompt (Opus output)

PromptBaseline outputCaveman outputDelta
A — explanation~1.0k tokens~1.3k tokens+30%
B — code~2.6k tokens~1.8k tokens−31%
C — mixed~6.3k tokens~4.0k tokens−37%

Session totals: $0.34 baseline → $0.22 Caveman (~34% cheaper end-to-end).

The Prompt A reversal was the surprise. We expected the biggest savings on the most prose-heavy task. What actually happened: Caveman covered more causes (7 vs. 6) with denser sentences, then used the saved budget to add more confirm/fix sub-points. Caveman didn't shrink the answer — it rewrote it tersely and expanded the scope with the saved tokens. That's a useful finding on its own: "denser per token" doesn't always mean "fewer tokens."

Side-by-side /cost output

Prompt A — explanation

Baseline /cost output for Prompt A — explanation walkthrough

Caveman /cost output for Prompt A — denser per token but ~30% larger

Prompt B — code

Baseline /cost output for Prompt B — utility + tests, files written to disk

Caveman /cost output for Prompt B — same code, dumped inline, no Write tool call

Prompt C — mixed

Baseline /cost output for Prompt C — full Nuxt search-dialog walkthrough

Caveman /cost output for Prompt C — same coverage, ~37% fewer output tokens

The file-write regression

Prompt B is where the headline finding lives, and it isn't a token-count problem.

Baseline behavior. Claude used the Write tool, created chunk.ts and chunk.test.ts on disk, then confirmed in chat:

Created chunk.ts and chunk.test.ts. The implementation uses slice in a stride loop (O(n), no mutation of input), and the test for the throw case covers both 0 and a negative value since "size < 1" includes both.

Caveman behavior. Same prompt, same model, same folder. Claude printed the full code in chat and did not call the Write tool — no files appeared on disk. The code itself was equivalent (the throw test even covered more cases), but the workflow shifted: a "write me a utility" prompt produced a code dump in the terminal instead of files in the repo.

Why this happens (best guess)

Caveman's style instruction nudges Claude toward "answer first, terse, no preamble." That style cue appears to compete with the model's tool-use bias — when the answer is "here is the code," terse Claude leans toward printing rather than doing. We did not test whether /caveman lite or :ultra change this; if you adopt Caveman for coding work, verify your file-creation prompts still trigger Write calls before you trust it.

This is the kind of caveat the README doesn't surface, and it's the headline reason Caveman shouldn't be a default in coding sessions.

When it helps / when it hurts

ScenarioVerdictWhy
Solo debugging walkthroughsHelpsDenser per token, easy to skim
Architecture / "explain X" promptsHelpsSame coverage, less verbose
Mixed plan + code promptsHelpsSaw the largest absolute savings (37%)
Pure code generationMixedReal token savings, but file-write regression
Pairing / teaching / onboardingHurtsCaveman prose reads great solo and terribly when someone else has to act on it
Any session relying on tool callsHurtsStyle cue can suppress Write/Edit usage
Junior devs learning Claude CodeSkipReadability matters more than cost at this stage

Re-readability is the gut check. Run a Caveman response past a teammate who didn't write the prompt: if they can act on it, great. If they have to ask "wait, what does this mean?", you're trading clarity for cost — and clarity usually wins long-term.

Try it yourself

bash
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman

Confirm the plugin is active by checking the flag file the SessionStart hook writes:

bash
cat ~/.claude/.caveman-active
# expected: full   (or whatever mode is active)

If you already have a custom statusline, Caveman won't overwrite it — wrap your existing script and prepend the badge yourself:

~/.claude/statusline.sh
#!/usr/bin/env bash
# Read the active caveman mode (if any) and prepend a colored badge
# to whatever your existing statusline command emits.
mode_file="$HOME/.claude/.caveman-active"
badge=""
if [ -f "$mode_file" ]; then
  mode=$(tr '[:lower:]' '[:upper:]' < "$mode_file" | tr -d '[:space:]')
  if [ -n "$mode" ]; then
    badge=$'\033[38;5;208m['"CAVEMAN${mode:+:$mode}"$']\033[0m '
  fi
fi
printf "%s%s" "$badge" "$(npx ccusage statusline 2>/dev/null)"
~/.claude/settings.json
"statusLine": {
  "type": "command",
  "command": "bash ~/.claude/statusline.sh"
}

Switch intensity with /caveman ultra (or lite) and the badge updates accordingly. If the .caveman-active file isn't there after a fresh session, run /caveman once manually to force-write it.

Caveats

  • The 30–37% savings are a floor, not a ceiling. Our bench ran with effortLevel: high, which generates lots of extended-thinking tokens that Caveman doesn't compress. At medium or default effort — where most users live — prose makes up a larger share of total output, so deltas should be larger than what we measured here.
  • Style hooks aren't a tokens-mastery substitute. Caveman touches output only. Your CLAUDE.md diet, file-read hygiene, and model routing (covered in Token Mastery) are still the bigger lever.
  • Plugin freshness matters. Community projects come and go. Verify the GitHub repo is still maintained before adopting on a team.
  • One-data-point caveat. This bench is one user, one model, three prompts. Your workload — especially if it skews toward generation, search, or long tool-call chains — may differ. The methodology section is here so you can re-run it on your own setup.