Why Claude Code tokens run out so fast

A bug has been found: auto-memory duplicates the entire context with a background call. Plus tips for conserving your limits.

Author: Michael Kokin


Over the past few days, limits in Claude Code have (anecdotally) been burning through 2-3x faster, even on the maximum plan, and users got worried. On Reddit, someone appears to have found a bug in the decompiled code (versions 2.1.74-2.1.83): if auto-memory is enabled — and it's enabled by default — then after every message Claude Code silently fires a parallel API call that duplicates the entire conversation context. Roughly speaking, in a 200K-token conversation you're burning 400K tokens per turn. The call can't be canceled, doesn't show up in logs, and in fast sessions it can fire 2-3 times per message.
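The reported cost blowup is simple multiplication. A back-of-the-envelope sketch, using the numbers from the post (not measured; the function name is mine):

```python
def tokens_per_turn(context_tokens: int, duplicate_calls: int) -> int:
    """Tokens billed per turn: the main call sends the full context,
    and each hidden auto-memory call reportedly resends it again."""
    return context_tokens * (1 + duplicate_calls)

# A 200K-token conversation with one hidden duplicate call per turn:
print(tokens_per_turn(200_000, 1))  # → 400000
# If the call fires 3 times in a fast session:
print(tokens_per_turn(200_000, 3))  # → 800000
```

Note the cost scales with conversation length, so the longer the session, the worse each duplicated call gets.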

How to fix: run /memory and disable auto-memory.

Other things I keep an eye on to stay within subscription limits and avoid overspending through the API:
- CLAUDE.md is loaded into context in full. If it's over 200 lines, it's burning extra tokens every session, so I move the overflow to .claude/rules/.
- Sometimes it's cheaper to start a new session, or to press Escape twice to rewind to an earlier point in the conversation. Either is definitely cheaper than dragging a conversation all the way to compaction (when the context window fills up and Claude summarizes the dialogue, often losing details).
- Each connected MCP server (GitHub, Jira, Docker) consumes tokens before you make a single request, so I disable whatever I'm not using in a given session. This especially applies to the desktop app and the Cowork tab.
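The CLAUDE.md check above is easy to automate. A minimal sketch — the helper name, path, and 200-line threshold are my assumptions, not part of any official tooling:

```python
from pathlib import Path

def claude_md_report(path: str = "CLAUDE.md", limit: int = 200) -> str:
    """Report how many lines CLAUDE.md contributes to every session's
    context, and flag it once it exceeds the assumed line limit."""
    p = Path(path)
    if not p.exists():
        return f"{path} not found"
    lines = len(p.read_text(encoding="utf-8").splitlines())
    verdict = "consider moving extras to .claude/rules/" if lines > limit else "OK"
    return f"{path}: {lines} lines ({verdict})"

print(claude_md_report())
```

Run from the repo root; anything flagged is a candidate for splitting into smaller rule files that aren't loaded on every turn.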