
Manage costs effectively - Claude Code Docs

Claude Code consumes tokens for each interaction. Costs vary based on codebase size, query complexity, and conversation length. The average cost is $6 per developer per day, and daily costs stay below $12 for 90% of users. For team usage, Claude Code bills by API token consumption; on average it costs ~$100-200 per developer per month with Sonnet 4.6, though there is large variance depending on how many instances users run and whether they use it in automation. This page covers how to track your costs, manage costs for teams, and reduce token usage.

Track your costs

Using the /cost command

The /cost command provides detailed token usage statistics for your current session:

Total cost:            $0.55
Total duration (API):  6m 19.7s
Total duration (wall): 6h 33m 10.2s
Total code changes:    0 lines added, 0 lines removed

Managing costs for teams

When using the Claude API, you can set workspace spend limits on total Claude Code workspace spend. Admins can view cost and usage reporting in the Console.

On Bedrock, Vertex, and Foundry, Claude Code does not send metrics from your cloud. To get cost metrics, several large enterprises reported using LiteLLM, which is an open-source tool that helps companies track spend by key. This project is unaffiliated with Anthropic and has not been audited for security.

Rate limit recommendations

When setting up Claude Code for teams, consider these Token Per Minute (TPM) and Request Per Minute (RPM) per-user recommendations based on your organization size:

| Team size     | TPM per user | RPM per user |
|---------------|--------------|--------------|
| 1-5 users     | 200k-300k    | 5-7          |
| 5-20 users    | 100k-150k    | 2.5-3.5      |
| 20-50 users   | 50k-75k      | 1.25-1.75    |
| 50-100 users  | 25k-35k      | 0.62-0.87    |
| 100-500 users | 15k-20k      | 0.37-0.47    |
| 500+ users    | 10k-15k     | 0.25-0.35    |

For example, if you have 200 users, you might request 20k TPM for each user, or 4 million total TPM (200*20,000 = 4 million). The TPM per user decreases as team size grows because fewer users tend to use Claude Code concurrently in larger organizations. These rate limits apply at the organization level, not per individual user, which means individual users can temporarily consume more than their calculated share when others aren’t actively using the service.
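The sizing arithmetic above can be sketched with a small helper (a hypothetical function for illustration, not part of Claude Code):

```python
def org_tpm(users: int, tpm_per_user: int) -> int:
    """Total organization-level TPM to request for a team of a given size."""
    return users * tpm_per_user

# 200 users at 20k TPM per user -> 4,000,000 TPM org-wide
print(org_tpm(200, 20_000))  # prints 4000000
```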

Agent team token costs

Agent teams spawn multiple Claude Code instances, each with its own context window. Token usage scales with the number of active teammates and how long each one runs, so keep each teammate's task small and self-contained to limit costs.

Reduce token usage

Token costs scale with context size: the more context Claude processes, the more tokens you use. Claude Code automatically optimizes costs through prompt caching (which reduces costs for repeated content like system prompts) and auto-compaction (which summarizes conversation history when approaching context limits). The following strategies help you keep context small and reduce per-message costs.

Manage context proactively

Use /cost to check your current token usage, or configure your status line to display it continuously.
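As a sketch, a command-based status line can be configured in your settings.json; the script path here is a placeholder, and the exact fields available to the script may vary by version:

```json
{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline-cost.sh"
  }
}
```

The command receives session data as JSON on stdin and prints the line to display; check the status line documentation for the fields available in your version.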

You can also customize compaction behavior in your CLAUDE.md:

# Compact instructions

When you are using compact, please focus on test output and code changes

Choose the right model

Sonnet handles most coding tasks well and costs less than Opus. Reserve Opus for complex architectural decisions or multi-step reasoning. Use /model to switch models mid-session, or set a default in /config. For simple subagent tasks, specify model: haiku in your subagent configuration.
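For instance, a subagent definition in .claude/agents/ can pin a cheaper model via its frontmatter (the name and description below are illustrative):

```markdown
---
name: doc-fetcher
description: Fetches and summarizes documentation pages
model: haiku
---

Fetch the requested documentation and return a concise summary,
keeping only the sections relevant to the current task.
```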

Reduce MCP server overhead

Each MCP server adds tool definitions to your context, even when idle. Run /context to see what’s consuming space.

Install code intelligence plugins for typed languages

Code intelligence plugins give Claude precise symbol navigation instead of text-based search, reducing unnecessary file reads when exploring unfamiliar code. A single “go to definition” call replaces what might otherwise be a grep followed by reading multiple candidate files. Installed language servers also report type errors automatically after edits, so Claude catches mistakes without running a compiler.

Offload processing to hooks and skills

Custom hooks can preprocess data before Claude sees it. Instead of Claude reading a 10,000-line log file to find errors, a hook can grep for ERROR and return only matching lines, reducing context from tens of thousands of tokens to hundreds.

A skill can give Claude domain knowledge so it doesn’t have to explore. For example, a “codebase-overview” skill could describe your project’s architecture, key directories, and naming conventions. When Claude invokes the skill, it gets this context immediately instead of spending tokens reading multiple files to understand the structure.

As a concrete example, the following PreToolUse hook filters test output to show only failures.

Add this to your settings.json to run the hook before every Bash command:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/filter-test-output.sh"
          }
        ]
      }
    ]
  }
}

The hook calls this script, which checks if the command is a test runner and modifies it to show only failures:

#!/bin/bash
input=$(cat)
cmd=$(echo "$input" | jq -r '.tool_input.command')

# If running tests, filter to show only failures
if [[ "$cmd" =~ ^(npm test|pytest|go test) ]]; then
  filtered_cmd="$cmd 2>&1 | grep -A 5 -E '(FAIL|ERROR|error:)' | head -100"
  # Build the JSON response with jq so quotes in the command are escaped correctly
  jq -n --arg cmd "$filtered_cmd" \
    '{hookSpecificOutput: {hookEventName: "PreToolUse", permissionDecision: "allow", updatedInput: {command: $cmd}}}'
else
  echo "{}"
fi

Move instructions from CLAUDE.md to skills

Your CLAUDE.md file is loaded into context at session start. If it contains detailed instructions for specific workflows (like PR reviews or database migrations), those tokens are present even when you’re doing unrelated work. Skills load on-demand only when invoked, so moving specialized instructions into skills keeps your base context smaller. Aim to keep CLAUDE.md under ~500 lines by including only essentials.
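As a sketch, a specialized workflow moved out of CLAUDE.md might live in a skill file such as .claude/skills/pr-review/SKILL.md (the name, description, and instructions below are illustrative):

```markdown
---
name: pr-review
description: Instructions for reviewing pull requests in this repository
---

When reviewing a PR, check test coverage for changed files,
verify the changelog entry, and flag any breaking API changes.
```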

Adjust extended thinking

Extended thinking is enabled by default with a budget of 31,999 tokens because it significantly improves performance on complex planning and reasoning tasks. However, thinking tokens are billed as output tokens, so for simpler tasks where deep reasoning isn’t needed, you can reduce costs by lowering the effort level in /model for Opus 4.6, disabling thinking in /config, or lowering the budget (for example, MAX_THINKING_TOKENS=8000).
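For example, to lower the thinking budget for a session using the environment variable from the paragraph above:

```shell
# Cap extended thinking at 8,000 tokens for this session
export MAX_THINKING_TOKENS=8000
claude
```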

Delegate verbose operations to subagents

Running tests, fetching documentation, or processing log files can consume significant context. Delegate these to subagents so the verbose output stays in the subagent’s context while only a summary returns to your main conversation.

Manage agent team costs

Agent teams use approximately 7x more tokens than standard sessions when teammates run in plan mode, because each teammate maintains its own context window and runs as a separate Claude instance. Keep team tasks small and self-contained to limit per-teammate token usage. See agent teams for details.

Write specific prompts

Vague requests like “improve this codebase” trigger broad scanning. Specific requests like “add input validation to the login function in auth.ts” let Claude work efficiently with minimal file reads.

Work efficiently on complex tasks

For longer or more complex work, plan the approach before making changes and course-correct early; these habits help avoid wasting tokens on work that heads down the wrong path.

Background token usage

Claude Code uses tokens for some background functionality, such as conversation summarization, even when idle. These background processes consume a small amount of tokens (typically under $0.04 per session) even without active interaction.

Understanding changes in Claude Code behavior

Claude Code regularly receives updates that may change how features work, including cost reporting. Run claude --version to check your current version. For specific billing questions, contact Anthropic support through your Console account. For team deployments, start with a small pilot group to establish usage patterns before wider rollout.