Why Claude Code tokens run out so fast

A bug has been found: auto-memory duplicates the entire context with a background call. Plus tips for conserving your limits.

Author: Michael Kokin


Over the past few days, limits in Claude Code have (anecdotally) been burning through 2-3x faster, even on the maximum plan, and users got worried. On Reddit, someone appears to have found a bug in the decompiled code (versions 2.1.74-2.1.83): if auto-memory is enabled — and it's enabled by default — then after every message Claude Code silently fires a parallel API call that duplicates the entire conversation context. Roughly speaking, in a 200K-token conversation you're burning 400K tokens per turn. The call can't be canceled, doesn't show up in logs, and in fast sessions it can fire 2-3 times per message.
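The reported cost blowup is simple multiplication. A back-of-the-envelope sketch, using the numbers from the post (not measured; the function name is mine):

```python
def tokens_per_turn(context_tokens: int, duplicate_calls: int) -> int:
    """Tokens billed per turn: the main call sends the full context,
    and each hidden auto-memory call reportedly resends it again."""
    return context_tokens * (1 + duplicate_calls)

# A 200K-token conversation with one hidden duplicate call per turn:
print(tokens_per_turn(200_000, 1))  # → 400000
# If the call fires 3 times in a fast session:
print(tokens_per_turn(200_000, 3))  # → 800000
```

Note the cost scales with conversation length, so the longer the session, the worse each duplicated call gets.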

How to fix: run /memory and disable auto-memory.

Other things I keep an eye on to stay within subscription limits and avoid overspending through the API:
- CLAUDE.md is loaded into context in full. If it's over 200 lines, it's burning extra tokens every session, so I move the overflow to .claude/rules/.
- Sometimes it's cheaper to start a new session, or to press Escape twice to rewind to an earlier point in the conversation. Either is definitely cheaper than dragging a conversation all the way to compaction (when the context window fills up and Claude summarizes the dialogue, often losing details).
- Each connected MCP server (GitHub, Jira, Docker) consumes tokens before you make a single request, so I disable whatever I'm not using in a given session. This especially applies to the desktop app and the Cowork tab.
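The CLAUDE.md check above is easy to automate. A minimal sketch — the helper name, path, and 200-line threshold are my assumptions, not part of any official tooling:

```python
from pathlib import Path

def claude_md_report(path: str = "CLAUDE.md", limit: int = 200) -> str:
    """Report how many lines CLAUDE.md contributes to every session's
    context, and flag it once it exceeds the assumed line limit."""
    p = Path(path)
    if not p.exists():
        return f"{path} not found"
    lines = len(p.read_text(encoding="utf-8").splitlines())
    verdict = "consider moving extras to .claude/rules/" if lines > limit else "OK"
    return f"{path}: {lines} lines ({verdict})"

print(claude_md_report())
```

Run from the repo root; anything flagged is a candidate for splitting into smaller rule files that aren't loaded on every turn.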