The price of an AI token has been falling faster than almost any cost in the history of technology. Yet enterprise AI bills keep climbing. This is not a contradiction; it is the single most important thing to understand about AI token cost. The price you pay per token is collapsing, but the number of tokens you consume is growing even faster. This guide explains what AI tokens actually cost businesses, why total spend rises as unit prices fall, and how to manage token cost through the variable that matters: consumption.
Key takeaways
- Token prices are collapsing. Stanford HAI found the inference cost for GPT-3.5-level performance fell roughly 280x in two years. Per-token prices for a given capability fall on the order of 10x per year.
- Total bills are rising anyway. This is the Jevons paradox: cheaper tokens unlock more usage, and the volume growth outruns the price decline. Goldman Sachs projects token consumption growing about 24x by 2030.
- Reasoning and agentic models are token-hungry. They consume far more tokens per task than chatbots, which is why "the model got cheaper" rarely lowers the bill.
- The frontier tier barely deflates. The newest flagship models tend to launch near the price of the previous flagship, so the most ambitious workloads see little benefit from falling prices.
- Token cost concentrates by team. In Pylon's published breakdown, engineering was roughly 63% of a ~$118K monthly Claude forecast, with a long tail of teams spending almost nothing.
- Manage consumption, not price. Because unit price cannot predict your bill, the only reliable lever is visibility into how many tokens you consume, where, and whether they produce value.
How AI token pricing works
Large language models price by the token, a chunk of text roughly three-quarters of a word. You pay for input tokens (your prompt and context) and output tokens (the model's response), usually quoted per million tokens. Two facts about that pricing matter for cost:
- Prices vary enormously by model. Effective rates can differ by 20x or more, from a few cents per million tokens on the smallest models to several dollars per million on the frontier tier.
- Prices are falling fast for a fixed capability. The cost to achieve a given level of performance drops dramatically year over year, often cited at around 10x annually, and Stanford HAI measured a 280x drop for GPT-3.5-level quality over two years.
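Because input and output tokens are priced separately, the cost of a single request is simple arithmetic. Here is a minimal Python sketch of that calculation. The per-million prices in the example are hypothetical round numbers, not any provider's published rates, and real bills may include adjustments this ignores, such as cached-input or batch discounts.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given prices quoted per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical prices ($ per 1M tokens) for a "frontier" and a "small" model.
    frontier = call_cost(2_000, 500, input_price_per_m=3.00, output_price_per_m=15.00)
    small = call_cost(2_000, 500, input_price_per_m=0.10, output_price_per_m=0.40)
    print(f"frontier: ${frontier:.4f} per call")
    print(f"small:    ${small:.4f} per call")
```

Under these assumed prices, the identical request costs more than 30 times as much on the frontier model, which is why model choice matters as much as the headline price trend.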
So the natural assumption is that AI is getting cheaper. At the unit level, it is. At the invoice level, for most enterprises, it is not.
Why your token bill rises even as prices fall
This is the Jevons paradox, an economic pattern first observed with coal: when a resource becomes cheaper or more efficient to use, demand rises so much that total consumption increases even as unit cost drops. Applied to AI tokens, three forces push total cost up at the same time prices fall:
- New workloads become viable. Use cases that were too expensive to justify last year are worth running once tokens are cheap, so more workflows move into production.
- Existing workloads run more often. When a workflow works and is cheap, teams use it far more, and usage scales with success.
- Models consume more tokens per task. Reasoning models and agents generate and process far more tokens per request than older chatbots, looping and re-checking to reach an answer. Goldman Sachs projects total token consumption growing about 24x by 2030.
The result: the per-token price line falls while the monthly bill rises. Falling token prices are not the same as falling token costs. Price is per unit; cost is volume times price, and volume is exploding.
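Since cost is volume times price, whether the bill grows depends on which multiplier wins. A minimal Python sketch, assuming constant annual multipliers. The starting volume, starting price, and rates in the example are hypothetical and deliberately milder than the figures quoted above.

```python
def project_monthly_bill(tokens_m: float, price_per_m: float,
                         price_factor: float, volume_factor: float,
                         years: int) -> list[float]:
    """Monthly bill (volume x price) for year 0 through `years`.

    tokens_m:      monthly consumption, in millions of tokens
    price_per_m:   blended price per million tokens
    price_factor:  yearly price multiplier (0.5 = price halves each year)
    volume_factor: yearly consumption multiplier (3.0 = volume triples)
    """
    bills = []
    for _ in range(years + 1):
        bills.append(tokens_m * price_per_m)
        tokens_m *= volume_factor
        price_per_m *= price_factor
    return bills


if __name__ == "__main__":
    # Hypothetical: 1,000M tokens/month at $10 per million; price halves, volume triples.
    for year, bill in enumerate(project_monthly_bill(1_000, 10.0, 0.5, 3.0, 3)):
        print(f"year {year}: ${bill:,.0f}/month")
```

The bill rises whenever volume_factor times price_factor exceeds 1. Here 3 x 0.5 = 1.5, so spend grows 50% a year even though the unit price halves every year.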
A second trap compounds this. The frontier tier barely deflates. The newest, most capable model usually launches at roughly the price the previous flagship launched at, so teams doing the most ambitious work see little of the savings everyone is talking about.
What AI tokens actually cost a business: the Pylon example
Averages hide the real story, which is that token cost concentrates heavily by team. A useful public example comes from Pylon, whose CEO Marty Kausas shared the company's forecasted Claude spend broken down by department.
Out of roughly $118,000 per month, engineering alone accounted for about 63% (around $75,000). Support followed at 9%, sales at 8%, and founders at 5%. Spend then dropped into the low single digits for customer success, marketing, and data, and trailed off to near zero across product, EA, solutions, design, and finance. On a per-person basis the picture shifts again: the data and engineering functions had the highest spend per head, while the large support and sales teams had lower per-person spend because their totals were spread across many people.
The lesson is not that engineering overspends. It is that AI token cost is concentrated, and concentration is exactly what finance and procurement need to see. When one function carries nearly two-thirds of the token bill, a should-cost conversation with that team is worth more than scrutinizing all the other departments combined. You only get that insight by attributing token consumption to teams; on a blended invoice, every department looks the same.
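Once each usage record carries a team label, attribution itself is a simple aggregation; the hard part in practice is getting that label onto spend from every source. A minimal Python sketch, assuming usage has already been normalized into hypothetical (team, cost_usd) records. The team names and amounts are made up for illustration and are not Pylon's data.

```python
from collections import defaultdict


def spend_by_team(records: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Aggregate (team, cost_usd) records into (team, total, share) rows,
    sorted from the largest spender to the smallest."""
    totals = defaultdict(float)
    for team, cost in records:
        totals[team] += cost
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Guard against division by zero when every record has zero cost.
    return [(team, total, total / grand_total if grand_total else 0.0)
            for team, total in ranked]


if __name__ == "__main__":
    usage = [  # hypothetical usage records
        ("engineering", 600.0),
        ("support", 120.0),
        ("engineering", 150.0),
        ("sales", 80.0),
        ("finance", 50.0),
    ]
    for team, total, share in spend_by_team(usage):
        print(f"{team:<12} ${total:>8,.2f}  {share:6.1%}")
```

Sorting by total puts the concentrated spender at the top, which is where a should-cost conversation should start.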
How to manage AI token cost
Because unit price cannot predict your bill, manage the variable that can: consumption.
- Budget for growth, not the price drop. Assume token consumption rises faster than prices fall, because for most organizations it has. Treat falling prices as a tailwind, not a plan.
- Attribute consumption by team and use case. Find the function carrying most of the token bill and focus there. This requires connecting token usage across models and bundled tools into one view, a spend visibility exercise.
- Set should-cost benchmarks for high-volume workloads so you know what a process ought to cost and can flag drift, rather than reacting at month-end.
- Use the efficiency levers where they fit: route routine tasks to cheaper models, cache repeated prompts, and trim context. These lower cost per call without degrading output, provided you check quality on the tasks you route to smaller models.
- Connect tokens to value. A team burning many tokens and producing real value is doing what you want; the same token count spent on low-value work is waste. Token count alone cannot tell them apart, so tie spend to outcomes, the same way procurement has to prove the value its own work delivers.
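The should-cost and value points can be combined into one check: express each workload's spend per unit of work it produced, then compare that figure with a benchmark. A minimal Python sketch. The Workload fields, the 20% tolerance, and the example workloads are hypothetical. Choosing the right unit of work, such as a resolved ticket or a reviewed pull request, is the judgment call the code cannot make for you.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    should_cost_per_unit: float  # benchmark: $ of tokens per unit of work
    actual_spend: float          # $ of tokens actually spent this period
    units: int                   # units of work delivered this period


def flag_drift(workloads: list[Workload], tolerance: float = 0.2) -> list[str]:
    """Names of workloads whose cost per unit exceeds the should-cost
    benchmark by more than `tolerance` (0.2 means 20%), plus any workload
    that spent money while delivering no units at all."""
    flagged = []
    for w in workloads:
        if w.units == 0:
            if w.actual_spend > 0:
                flagged.append(w.name)  # spend with no measurable output
            continue
        cost_per_unit = w.actual_spend / w.units
        if cost_per_unit > w.should_cost_per_unit * (1 + tolerance):
            flagged.append(w.name)
    return flagged


if __name__ == "__main__":
    workloads = [  # hypothetical workloads and benchmarks
        Workload("support-triage", 0.05, 900.0, 10_000),  # $0.09/unit vs $0.05 benchmark
        Workload("code-review", 0.50, 4_000.0, 10_000),   # $0.40/unit, under benchmark
        Workload("stalled-pilot", 0.10, 250.0, 0),         # spend with no output
    ]
    print(flag_drift(workloads))
```

Running this check weekly rather than at month-end is what turns a benchmark into early warning.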
Doing this from a blended invoice or a spreadsheet is nearly impossible, because token spend is fragmented across direct APIs, AI bundled into SaaS, and cloud, and it changes weekly. Suplari approaches token and AI cost as a spend intelligence problem for finance and procurement: consolidate every source of spend, attribute it to teams and use cases, and connect it to the value it produced, so you can manage the volume that actually drives the bill.
The bottom line
AI tokens are getting cheaper per unit and more expensive in total, at the same time. That is not a paradox to resolve; it is the central fact of AI cost in 2026, and it means the price list is the wrong place to look. Your budget lives at the bottom of the invoice, where volume wins. Measure consumption, attribute it to the teams where it concentrates, benchmark the heavy workloads, and tie tokens to the value they produce. Do that and falling prices become a real tailwind. Ignore it, and you will keep reading that AI is getting cheaper while your bill tells you otherwise.
Trying to figure out what AI tokens really cost your business? Suplari is an AI-ready procurement intelligence platform that consolidates and attributes AI spend across every source.
