
Understanding AI Agent Token Costs: Input, Output, and Cache

The per-token prices on provider pricing pages are not what you actually pay. What you pay depends on how many tokens you send, how many you receive, and how much of that repeated context the model serves from cache. This article explains the mechanics so you can estimate session costs before they show up on your bill.

What Is a Token?

A token is a chunk of text, roughly 3-4 characters in English. A common word like "function" is usually a single token; a short line of code like `const x = 42;` is about 5-6 tokens. A 100-line TypeScript file is roughly 800-1,200 tokens, depending on verbosity.

Tokenization varies by model. Claude and GPT use different tokenizers, so the same text produces slightly different token counts. The difference is usually under 10% for code. A practical rule of thumb: 1,000 tokens is approximately 750 words of English text, or about 40-60 lines of code.
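That rule of thumb can be expressed as a quick heuristic (a character count, not a real tokenizer; `estimateTokens` is an illustrative name):

```typescript
// Rough token estimate: ~4 characters per token in English text.
// Real tokenizers differ by model, typically within ~10% for code.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// "const x = 42;" is 13 characters → estimate 4 tokens
// (the real count is closer to 5-6; treat this as a floor, not a figure).
```

For anything where the estimate drives a billing decision, use the provider's own token-counting endpoint or tokenizer library instead.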

Input vs. Output Pricing

Every model charges different rates for tokens you send (input) and tokens you receive (output). Output tokens are more expensive because they cost more to produce: the model generates them one at a time, each requiring a full forward pass through the network.

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Output/Input Ratio |
|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | 5x |
| Claude Sonnet 4 | $3.00 | $15.00 | 5x |
| GPT-4o | $2.50 | $10.00 | 4x |
| Gemini 2.5 Pro | $1.25 | $10.00 | 8x |

The ratio matters more than the headline rate. Claude models charge 5x more for output than input. Gemini charges 8x more. A verbose agent that generates long explanations alongside code costs significantly more than one that produces concise output.
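To see why the ratio dominates, here is the per-message arithmetic at the Sonnet 4 rates from the table (a sketch, not a billing API; the function name is illustrative):

```typescript
// Per-message cost at the Claude Sonnet 4 rates listed above
// ($3 per 1M input tokens, $15 per 1M output tokens).
const INPUT_PER_M = 3.0;
const OUTPUT_PER_M = 15.0;

function messageCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_PER_M + outputTokens * OUTPUT_PER_M) / 1_000_000;
}

// 50K input costs $0.15; just 5K output costs $0.075.
// A tenth of the tokens, half the cost, because of the 5x ratio.
```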

What Counts as Input Tokens

In a coding agent session, input tokens include everything the model receives:

  • System prompt. The agent's instructions. For Claude Code, this is substantial: tool definitions, safety rules, and behavioral guidelines.
  • Your prompt. The task you asked the agent to do.
  • Code context. Files the agent reads to understand your codebase. This is often the largest component.
  • Conversation history. In multi-turn sessions, all previous exchanges are sent as context for each new turn.
  • Tool results. Output from commands the agent ran (test results, build output, file contents).

In a typical 20-turn session, conversation history grows with each turn. By turn 20, the input includes 19 previous exchanges. This is why long sessions are disproportionately expensive compared to short ones, even when the task itself is simple.
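The growth pattern can be sketched with illustrative numbers (the 50K base context and 5K per exchange in the comments are assumptions, not measured values):

```typescript
// Total input tokens billed across a session. Each turn re-sends the
// base context (system prompt + code files) plus every prior exchange,
// so per-turn input grows linearly and the session total quadratically.
function totalInputTokens(
  turns: number,
  baseContext: number, // system prompt + code context, re-sent each turn
  perExchange: number, // tokens one exchange adds to the history
): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += baseContext + (turn - 1) * perExchange;
  }
  return total;
}

// With 50K base context and 5K per exchange, turn 20 alone sends
// 50K + 19 * 5K = 145K tokens, nearly three times what turn 1 sends.
```

Before caching enters the picture, a 20-turn session with these numbers bills 1.95M input tokens in total, which is where the next section's discount matters.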

Cache Tokens: The Discount Mechanism

When consecutive requests repeat the same input prefix, the provider can serve that prefix from cache. Cached tokens are charged at a fraction of the input price:

  • Claude: 10% of the input price for cache reads (writing a prefix to the cache costs 25% more than normal input)
  • GPT-4o: 50% of the input price for cache reads, applied automatically to repeated prefixes

In a multi-turn session, most of the input is repeated context: the system prompt, codebase files, and previous conversation turns. Only the new turn is fresh input. This means effective input costs are much lower than the headline rate.

Example: A session with 100K input tokens per turn, where 90K are cached:

  • Without cache: 100K tokens at $3.00/M = $0.30 per turn
  • With cache: 10K fresh at $3.00/M + 90K cached at $0.30/M = $0.057 per turn

Caching reduces the input cost by over 80% in this scenario. This is why multi-turn sessions are not as expensive as naive token math suggests.
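The arithmetic of that example, at Sonnet 4 rates (a sketch using the prices quoted in this article):

```typescript
// Input cost for one turn, splitting fresh tokens from cache reads.
const FRESH_PER_M = 3.0; // $ per 1M fresh input tokens (Sonnet 4)
const CACHE_PER_M = 0.3; // $ per 1M cached tokens (10% of input price)

function turnInputCost(freshTokens: number, cachedTokens: number): number {
  return (freshTokens * FRESH_PER_M + cachedTokens * CACHE_PER_M) / 1_000_000;
}

// 100K all fresh: $0.30. 10K fresh + 90K cached: $0.057, an 81% reduction.
```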

Estimating Session Costs

A formula for rough estimates:

Session cost ≈
  (fresh_input_tokens × input_price) +
  (cached_input_tokens × cache_price) +
  (output_tokens × output_price)

For a 20-turn Sonnet 4 session:
  Fresh input per turn: ~10K tokens
  Cached input per turn: ~90K tokens (grows over session)
  Output per turn: ~3K tokens

  Total fresh input: 20 × 10K = 200K → 200K × $3/M = $0.60
  Total cached: 20 × 90K = 1.8M → 1.8M × $0.30/M = $0.54
  Total output: 20 × 3K = 60K → 60K × $15/M = $0.90

  Estimated session cost: ~$2.04
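The formula and the worked example above can be combined into one sketch (the per-turn token counts are the rough averages from this section; real sessions vary turn to turn):

```typescript
// Session cost from per-turn averages, using Sonnet 4 rates ($ per 1M tokens).
const PRICE = { input: 3.0, cache: 0.3, output: 15.0 };

function sessionCost(
  turns: number,
  freshPerTurn: number,
  cachedPerTurn: number,
  outputPerTurn: number,
): number {
  const fresh = turns * freshPerTurn * PRICE.input;
  const cached = turns * cachedPerTurn * PRICE.cache;
  const output = turns * outputPerTurn * PRICE.output;
  return (fresh + cached + output) / 1_000_000;
}

// sessionCost(20, 10_000, 90_000, 3_000) ≈ $0.60 + $0.54 + $0.90 = $2.04
```

Plugging in a different model only means swapping the three prices; the structure of the estimate stays the same.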

Cost Reduction Strategies

  • Use cheaper models for simple tasks. Sonnet handles most implementation work. Reserve Opus for complex architecture decisions and difficult debugging.
  • Keep sessions focused. Shorter sessions with less context are cheaper. Start a new session for a new task instead of continuing an existing one with accumulated history.
  • Be selective with context. Sending your entire codebase as context is expensive. Point the agent at specific files.
  • Watch for retry loops. An agent retrying the same approach costs tokens without progress. Intervene early with better context.

Where Styrby Fits

Styrby records input, output, and cache tokens per message and calculates costs using current model pricing. The dashboard shows these breakdowns per session and per agent. Tag sessions by client or project to see where your spend is going.

Ready to manage your AI agents from one place?

Styrby gives you cost tracking, remote permissions, and session replay across five agents.