- Unlike seat-based SaaS, AI spend is pure consumption with no procurement gate, no audit trail, and no built-in ceiling.
- Even as token prices fall, total AI bills keep climbing because multi-agent workflows multiply the tokens consumed per task.
- The root cause is a principal-agent problem: the people consuming tokens are not the ones accountable for the cost.
- Control starts with four levers: one API key per workflow, daily spend caps, activity-level cost tracking, and chargeback, so each team owns what it generates.
- Once token costs are attributed by team and workflow, they become a forecastable driver in the financial plan.
- Agentic AI will far exceed today's AI spending, so finance leaders need to build controls now.
A few weeks ago, we hosted a happy hour with CJ Gustafson in San Francisco for a room full of FP&A and finance leaders. Discussions at these events usually span a wide range of topics. But this one was different. One topic seemed to dominate the conversation: How do you control AI token costs?
No one had a clear answer, but everyone related to the pain. One finance leader shared that one of their engineers burned $6,000 in a single day with a misused key, and they only found out about it at the end of the month when the invoice arrived. Another said, "We keep getting surprised by thousands of dollars showing up on a key at the end of the month. It feels like I handed someone my credit card and they're out there mining bitcoins."
At Drivetrain, we’ve seen CFOs solve this problem in a variety of ways across our broad customer base, which gives us a lot of visibility into the cutting-edge frameworks evolving to address it.
So, I thought I’d share the framework we use to effectively manage our AI spending, which is informed by the insights we’ve gained from the finance leaders we work with every day.
Why AI token costs will be far worse than SaaS sprawl
Here's the mental model: five years ago, SaaS sprawl caught most CFOs off guard. It crept in through departmental credit cards, forgotten renewals, and tools no one was tracking. But there was a limit to how far it could run: spend scaled with headcount. Seat-based pricing meant maximum spend for every SaaS subscription was bounded by the number of employees using it.
Token sprawl will be 100x worse because token spending is completely uncapped: no procurement gate, no audit trail, just pure consumption with zero friction.
The root cause: a principal-agent problem
The reason it’s so hard to control AI spending is clear. The people deciding how to use AI tokens are not the same people accountable for how much they spend.
Say your engineering team is using Claude to ship features. Every prompt and retry gets billed in tokens, and on your books, it all lands in one line item: "AI spend." Both the budget owner (the engineering leader in this case) and Finance see the total, but neither can tell what workflows or decisions drove it. The cost is invisible until it lands in the P&L as a single line item weeks later.
That blind spot is the principal‑agent problem. Without this information, the engineering leader cannot effectively manage what developers are spending on AI tokens. And finance cannot attribute, charge back, or forecast a number it can't see.
There are also conflicting incentives at work here. Engineering leaders want to ship features customers will pay for, within a budget. Developers want to ship fast using the best tools.
While both share the same goal of shipping new features, the leader has a cost constraint that the developer doesn't feel. The developers who default to the most powerful model or leave retries running never see the cost trace back to them, so they have no reason to optimize their token consumption.
Meanwhile, the finance leader’s goal is to ensure innovation is cost-effective, with predictable budgets and minimal financial risk. Finance is also responsible for tracking whether AI spending is staying within budget and turning into the revenue it was supposed to generate according to the plan. But the only way to do either is to have a more detailed view than what a single line item on your P&L can provide.
The only way to fix the principal-agent problem is to first build a means for tracking AI token spending at a granular level. This will eliminate the blind spots so you can understand your unit economics and implement the accountability measures necessary to effectively manage your AI spending.
A practical AI cost management framework for CFOs
With my four-step framework, you can start getting AI spending back under control.
Step 1. Isolate API keys by purpose
Most organizations today use one or two API keys per model for everything.
That’s like handing a single credit card to 20 people with no spending limits and no audit trail.
When you have one API key doing five different jobs, you know how much money was spent, but not how it was spent or by whom.
This is why you need to isolate API keys by purpose: one unique key per task or workflow, never shared.
With this approach, you’ll always know exactly what each individual task or workflow costs, who owns it, and which ones are burning far more money than they should.
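As a rough illustration of what per-key attribution makes possible, here is a minimal Python sketch. The key names, owners, and usage records are all hypothetical, and the usage list stands in for whatever export your provider offers.

```python
from collections import defaultdict

# Hypothetical registry: one key per workflow, each with a named owner.
KEY_REGISTRY = {
    "key-support-triage": {"workflow": "support ticket triage", "owner": "Support"},
    "key-code-review": {"workflow": "automated code review", "owner": "Engineering"},
}

# Hypothetical usage records as (key_id, cost_usd) pairs.
usage = [
    ("key-support-triage", 4.20),
    ("key-code-review", 12.50),
    ("key-support-triage", 3.10),
]

def spend_by_workflow(records, registry):
    """Sum cost per (owner, workflow) by looking up each record's key."""
    totals = defaultdict(float)
    for key_id, cost_usd in records:
        entry = registry[key_id]
        totals[(entry["owner"], entry["workflow"])] += cost_usd
    return dict(totals)

report = spend_by_workflow(usage, KEY_REGISTRY)
for (owner, workflow), total in sorted(report.items()):
    print(f"{owner:12s} {workflow:25s} ${total:.2f}")
```

Because each key maps to exactly one workflow and one owner, the lookup is unambiguous. With a shared key, the same records could only be summed into a single total.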
Step 2. Set daily and weekly caps, not annual budgets
Annual budgets don’t work with AI spending because most teams will burn through their allocation in the first quarter if not sooner. One of the most notable examples of runaway AI spending is Uber, which blew through its entire 2026 AI budget in just the first four months of the year.
Daily caps fail fast and visibly, which is exactly what you want. With AI, a runaway loop can burn a month’s worth of budget in a day. Monthly and annual caps just record the damage, but daily caps put a hard ceiling on runaway AI token costs before they spiral out of control.
Remember the finance leader who only learned about that $6,000 single-day spend when the invoice arrived at the end of the month? A daily cap would have stopped it the same day.
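To make the mechanics concrete, here is a minimal sketch of a daily cap enforced in application code. The `DailyCap` class, the $200 limit, and the dollar amounts are hypothetical; a provider's built-in spend limits, where available, would serve the same purpose.

```python
from datetime import date

class DailyCap:
    """Tracks spend for one API key and signals when the day's cap is reached."""

    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.day = None
        self.spent_usd = 0.0

    def _roll(self, today):
        # Reset the running total when the calendar day changes.
        if today != self.day:
            self.day = today
            self.spent_usd = 0.0

    def record(self, cost_usd, today):
        self._roll(today)
        self.spent_usd += cost_usd

    def allows_more(self, today):
        self._roll(today)
        return self.spent_usd < self.cap_usd

cap = DailyCap(cap_usd=200.0)  # hypothetical limit for one workflow's key
day_one, day_two = date(2025, 1, 6), date(2025, 1, 7)

cap.record(150.0, day_one)
ok_after_first = cap.allows_more(day_one)   # under the cap, calls may continue
cap.record(75.0, day_one)
ok_after_second = cap.allows_more(day_one)  # cap reached, further calls should be blocked
ok_next_day = cap.allows_more(day_two)      # new day, the counter resets
```

Note that this sketch checks the cap before each call, so the last call of the day can push spend slightly past the limit. The point is that the overshoot is bounded by one call rather than by the length of the billing cycle.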
Step 3. Track activity, not just cost
Tracking your cost per API key is essential but it’s not enough. You need to understand the unit economics of AI for your business. And this requires tracking at the activity level.
Let’s say we have two tasks, each using a different API key. Task A costs $10 per day and completes 1,000 jobs. Task B costs $5 per day and completes 10 jobs. At first glance, Task B looks cheaper, but its cost per job is far higher: $0.50 versus $0.01 for Task A. That makes Task B the workflow to investigate and optimize first.
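The arithmetic behind that comparison is simple enough to sketch in a few lines of Python. The figures are the ones from the example above; the dictionary layout and function name are just illustrative.

```python
# Daily cost and job counts from the Task A / Task B example.
tasks = {
    "Task A": {"daily_cost_usd": 10.0, "jobs_per_day": 1000},
    "Task B": {"daily_cost_usd": 5.0, "jobs_per_day": 10},
}

def cost_per_job(task):
    """Unit cost: what one completed job costs in dollars."""
    return task["daily_cost_usd"] / task["jobs_per_day"]

unit_costs = {name: cost_per_job(task) for name, task in tasks.items()}

# The workflow with the highest unit cost is the first one to investigate.
most_expensive = max(unit_costs, key=unit_costs.get)

for name, unit_cost in unit_costs.items():
    print(f"{name}: ${unit_cost:.2f} per job")
```

Ranking by cost per job rather than total daily cost is what surfaces Task B, even though its daily bill is half of Task A's.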
Factories have done this for more than a century: track unit economics to find what they need to optimize. We need to apply the same operational discipline to AI.
If you know which job is more expensive to produce, you can work with the engineering lead and the developer to understand what’s driving that cost (e.g., an expensive model or an unoptimized prompt) and optimize that workflow to reduce it.
Step 4. Use chargeback to close the principal-agent loop
The first three steps in this framework work together to eliminate the blind spots and surface your unit economics so you can identify and optimize your AI spending.
But visibility alone doesn't change behavior. Chargeback does. Assigning each team's token costs back to the team that generated them creates the accountability that was missing.
Accountability is key to realigning the incentives at the core of the principal‑agent problem. When a team's token costs come out of its own budget, the ROI question emerges on its own: Is this feature worth what it costs?
With chargeback, the developer who defaults to the most powerful model for every task, leaves retry logic unconfigured, or builds a workflow that runs continuously without a clear trigger now has a budget reason to make different choices.
To implement chargeback, start with showback. AI token costs are complex, and teams will need some time to understand how their decisions are influencing usage and cost. Once they do, you can move to chargeback, requiring each team to own their AI token spend.
Chargeback is the same playbook you’d use for almost every other shared cost category, from cloud infrastructure to office space. You just need to apply it to AI.
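The difference between showback and chargeback can be sketched as a single switch. This is an illustrative Python example, not a prescribed accounting treatment; the team names, budgets, and the `allocate` function are hypothetical.

```python
def allocate(costs_by_team, budgets, mode="showback"):
    """Build a per-team statement of AI cost and remaining budget.

    In showback mode, costs are reported but budgets are untouched.
    In chargeback mode, costs are deducted from each team's budget.
    """
    if mode not in ("showback", "chargeback"):
        raise ValueError(f"unknown mode: {mode}")
    statement = {}
    for team, budget_usd in budgets.items():
        cost_usd = costs_by_team.get(team, 0.0)
        remaining = budget_usd - cost_usd if mode == "chargeback" else budget_usd
        statement[team] = {"ai_cost_usd": cost_usd, "remaining_budget_usd": remaining}
    return statement

# Hypothetical monthly figures.
costs = {"Engineering": 1200.0, "Support": 300.0}
budgets = {"Engineering": 10000.0, "Support": 5000.0}

showback = allocate(costs, budgets, mode="showback")
chargeback = allocate(costs, budgets, mode="chargeback")
```

Running showback first gives teams the same statement they will later be charged against, so the switch to chargeback changes the consequences without changing the numbers they have already learned to read.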
Controlling AI spending is a forecasting problem
In most organizations, AI token spend sits with IT, which provisions the models and watches the dashboards. The cost gets tracked after it's incurred.
But you can’t control AI cost by reading last month's total. You can only control it if you can see what's coming: what a team will spend if usage doubles next quarter, and whether the budget can absorb it. That's a forecasting question, and forecasting belongs to FP&A.
This framework solves that problem, too. Once token spend is attributed by team and workflow, you can model it the way you model any other driver: tie it to usage growth or feature launches, run scenarios against it, and fold it into the same plan as the rest of the business.
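As a simple illustration of treating token spend as a driver, the sketch below projects one workflow's cost as runs multiplied by cost per run, with usage compounding each period. The function name, run counts, unit cost, and growth rates are hypothetical placeholders for figures you would pull from your own attribution data.

```python
def forecast_cost_by_period(runs_per_period, cost_per_run_usd, growth_per_period, periods):
    """Project spend per period as runs x cost per run, compounding usage growth."""
    forecast = []
    runs = runs_per_period
    for _ in range(periods):
        forecast.append(runs * cost_per_run_usd)
        runs *= 1 + growth_per_period
    return forecast

# Hypothetical scenarios for one workflow: 1,000 runs per quarter at $0.50 per run.
scenarios = {
    "flat usage": 0.0,
    "usage doubles each quarter": 1.0,
}
projections = {
    name: forecast_cost_by_period(1000, 0.50, growth, periods=3)
    for name, growth in scenarios.items()
}

for name, costs in projections.items():
    print(name, [f"${c:,.0f}" for c in costs])
```

The same structure extends to other drivers, such as a step change in runs at a feature launch, once cost per run is known for each workflow.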
It's the kind of work that robust financial modeling software like Drivetrain is designed for: modeling consumption‑based costs like AI tokens as a forecastable driver rather than a number you reconcile later.
Why finance leaders need to start building AI cost controls now
If you’ve never been hit with an AI bill that far exceeded your budget, you will be. And if your current AI spending already feels unmanageable, agentic AI will make it look like a rounding error.
Autonomous agents that run 24/7, chain workflows, trigger external API calls, and generate new tasks without human approval are already in production at many organizations. The cost profile is categorically different from anything finance teams have dealt with before.
This framework is a great place to start, and Drivetrain can help. Drivetrain is the first FP&A platform to bring token costs, API key spend, and usage from major AI model providers directly into your plan, allowing finance leaders to see, attribute, and govern AI spend.
If you’re ready to bring AI token cost under control, book a demo with our team or reach out to me on LinkedIn. I’m always happy to have a conversation.
Frequently asked questions
AI spend is pure consumption with no built-in ceiling: every prompt, retry, and workflow run is billed in tokens, and a single runaway process can fire thousands of calls before anyone notices. Most of that cost stays invisible until the invoice lands weeks later, and vendor bills rarely show which team or workflow drove the spike. Without tracking at the workflow level, the number is nearly impossible to predict.
Agentic AI multiplies exposure: autonomous agents chain model calls, trigger external APIs, spawn new sub‑tasks, and run around the clock without human approval. A prompt that once cost a few hundred tokens can consume tens of thousands inside a multi‑agent workflow, and because agents self‑initiate, there's no natural ceiling. That's why daily caps and attribution matter more, not less, as you adopt agents.
Four levers: isolate one API key per task or workflow, set a default daily cap on each, track cost at the activity level, and use chargeback so every team owns the spend it generates. Most LLM providers support budget caps natively, so the controls are already there to switch on. Once teams feel the cost of their own design choices, behavior changes fast.
Budget at the workflow level, not the department level: model what each use case costs per run and how often it runs, then stress‑test against growth scenarios. Treat AI as consumption rather than a fixed line item, and reforecast on a rolling basis, with daily caps as your real‑time guardrail between budget cycles.
Drivetrain is the first FP&A solution to offer native integrations with Anthropic and OpenAI. This means you can bring comprehensive token cost data from either or both vendors into the platform automatically and model your AI token spend as a consumption-based driver alongside the rest of the business plan.
Once your team has attributed costs by workflow and department using our four-step framework, Drivetrain makes it easy to tie those costs to headcount, usage growth, and feature launches, and to run scenarios that turn a reactive line item into a forecastable number.







