This article explains why traditional SaaS metrics are no longer sufficient for AI-native SaaS companies reporting to investors and offers context for understanding the real impact of variable usage on your business. We’ll introduce a new metric that does work, show you how to forecast your revenue accurately, and lay out a roadmap for transitioning to a hybrid AI revenue model, along with tips on protecting your margins from unpredictable inference costs.
The global AI market is expected to exceed $390B in 2025, roughly 40% growth over 2024. So far, it has been a year in which the industry has witnessed a tectonic shift in how SaaS companies deliver value.
However, traditional SaaS accounting has lagged behind in its ability to connect the value delivered with the growth and profit realized by the companies that provide it. Classic SaaS growth and profitability metrics like ARR and COGS were designed for predictable seat-based models. They weren’t built for businesses where AI features introduce new, highly variable infrastructure costs driven by customer usage that’s difficult to predict.
In this article, we dive into the limitations of traditional SaaS metrics for companies offering AI-powered features and look at how the emergence of the AI era has redefined COGS and changed the way finance teams forecast volatile usage patterns. We offer tips on how to protect your margins from unpredictable inference costs and provide a roadmap for making the transition to a hybrid AI revenue model.
Why traditional ARR falls short for AI-native SaaS companies
Traditional seat-based ARR assumes an almost linear relationship between seats purchased and value delivered. That link doesn’t hold when we look at AI-native SaaS products. For them, value and cost scale with the amount of compute required (tokens, queries, automations, model calls) instead of the number of users.
This is because different users can have radically different AI consumption profiles, which in turn drive inference costs. So, the revenue and cost dynamics of two customers paying the same subscription can vary wildly.
Inference costs include all the ongoing expenses associated with running pre-trained AI models to generate outputs. These costs include the compute (i.e., the specialized hardware, such as graphics processing units or GPUs) plus the memory required to process data, the cloud services needed to host and run the models, and the associated energy costs. Depending on the agreement between the SaaS company and the model provider, these costs may be explicit or instead bundled into a per-unit cost.
For a detailed discussion of how inference costs impact SaaS infrastructure costs, see our guide, How to manage AI infrastructure costs in a rapidly evolving AI landscape. But here’s a quick overview of how inference costs are challenging for CFOs to wrap their heads around:
- Variable inference costs don’t play ball with fixed pricing: Inference cost rises with every incremental workflow that your users run, even if your ARR stays flat.
- Per-seat revenue is wildly skewed by power users: For example, Business Insider refers to “inference whales”—customers who “generated $35,000 in inference usage while paying $200 monthly” on an unlimited plan. That’s roughly a 175x gap between realized usage and recognized subscription revenue for that month.
- CFO anxiety is rational: And if inference whales weren’t enough to keep a CFO up at night, consider that inference costs that vary this wildly make it extremely difficult to forecast revenue and profit. CFOs know that seat-only ARR can underestimate the economic potential of heavy users and hide the margin risk their usage creates.
The new metric: ARR + annualized usage in practice
Several AI-native companies now report a hybrid figure more aligned with reality: annual recurring revenue + annualized usage (ARR+AU). This metric is sometimes called committed ARR plus annualized usage.
Many AI-native finance teams have begun to use this metric—otherwise most companies would either “be underselling or undercounting what [they] actually earn.”
Here’s what goes into ARR+AU:
- Committed ARR: This is the part you already know. It’s the contracted annual subscription revenue based on seats, platform fees, etc.
- Annualized usage: This is the annualized expansion revenue above any included/committed allotments. This is typically computed as a trailing average and scaled to a year.
Here's the basic formula:
ARR+AU = CARR + Annualized usage
To understand how this is calculated, let's look at a company with:
- Committed ARR (based on the number of seats + a platform minimum usage) = $50M
- Revenue from usage exceeding the minimum over the last three months = $1.5M, $2M, $2.5M (respectively)
Based on this information, we start by calculating the trailing three-month average usage (i.e., the usage above the defined platform minimum usage):
Trailing three-month average usage = [($1.5M + $2M + $2.5M) / 3] = $2M/month
Then we annualize the average usage:
Annualized usage = $2M/month x 12 months = $24M
Finally, we add the CARR and the Annualized usage to get our ARR+AU:
ARR+AU = $50M + $24M = $74M
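The worked example above can be sketched in a few lines of Python, using the same illustrative figures:

```python
# ARR+AU calculation from the example above. Figures are the
# illustrative ones from the text, not real company data.

def arr_plus_au(committed_arr: float, monthly_usage: list[float]) -> float:
    """Committed ARR plus annualized trailing-average usage revenue."""
    trailing_avg = sum(monthly_usage) / len(monthly_usage)  # $2M/month
    annualized_usage = trailing_avg * 12                    # $24M
    return committed_arr + annualized_usage

total = arr_plus_au(50_000_000, [1_500_000, 2_000_000, 2_500_000])
print(f"ARR+AU = ${total / 1e6:.0f}M")  # ARR+AU = $74M
```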
So, the real question is, do you really need to go to the trouble of calculating ARR+AU? The answer is yes, when:
- You have hybrid pricing (subscription + metered AI features) and a significant percentage of customers regularly exceed included credits or quotas.
- AI usage is heterogeneous, and power users drive the majority of value/consumption, making seat-level metrics misleading.
- You want to communicate upside and avoid undercounting revenue from enterprise workloads adopting agents/automations mid-term, without waiting for true-up renewals.
Redefining COGS: What AI costs belong in your cost structure
Traditionally, the marginal cost for SaaS has been near zero. Once you build the product, the cost of delivery is minimal. But AI-native SaaS products have incremental costs for every query, inference, and GPU hour.
Here’s a practical framework to help you determine what belongs in COGS vs. OpEx vs. R&D:
- COGS (Cost of Goods Sold): Anything directly required to deliver the AI-powered service to your customers goes in this category. Inference costs (API calls, GPU cycles, vector DB lookups, etc.), human adjudication and review if it scales with customer usage, and third-party AI model fees can all be classified as COGS.
- OpEx (Operating Expenses): These are usually fixed infrastructure and team costs that don’t scale with customer usage. This category mostly includes the costs of DevOps teams, platform engineering, security monitoring, customer support overhead, and sales and marketing.
- R&D (Research & Development): Examples of R&D costs include the cost of model training, experimentation, and fundamental project innovation—per the company’s accounting guidelines.
If you’re ever confused when classifying an AI-related cost, here’s the decision criteria you can use:
- Does the cost scale with customer usage? If yes, it’s COGS.
- Is it tied to product delivery but relatively fixed? If yes, it’s OpEx.
- Is it speculative or future-facing? If yes, it’s R&D.
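The decision criteria above can be expressed as a simple helper. This is a minimal sketch with illustrative category labels, not an accounting standard:

```python
# Sketch of the COGS vs. OpEx vs. R&D decision criteria above.
# The flags and labels are illustrative; real classification
# should follow your company's accounting guidelines.

def classify_ai_cost(scales_with_usage: bool, fixed_delivery: bool,
                     speculative: bool) -> str:
    if scales_with_usage:     # cost grows with customer usage
        return "COGS"
    if fixed_delivery:        # tied to delivery but relatively fixed
        return "OpEx"
    if speculative:           # future-facing experimentation
        return "R&D"
    return "Review with accounting"

print(classify_ai_cost(True, False, False))   # COGS (e.g., inference API fees)
print(classify_ai_cost(False, True, False))   # OpEx (e.g., platform engineering)
print(classify_ai_cost(False, False, True))   # R&D (e.g., model training)
```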
Remember: Treating inference as COGS can have a major impact on gross margins.
An 80% gross margin can suddenly drop to 40–50% with inference costs factored in. But it’s still the best way to measure true unit economics because it factors in a real and significant cost of delivering your product. And seeing the real cost of serving your customers is critical to making better pricing decisions as AI usage continues to evolve.
Treating Human-in-the-Loop (HITL) as a first-class cost component
Human-in-the-loop (HITL) means your model doesn’t make decisions by itself. You assign humans to review, correct, or adjudicate model outputs. This means your people moderate content, approve decisions, and verify compliance checks.
HITL costs are included in COGS when they scale with customer volume. More users require more adjudications, which means you need a bigger review team. This team is as essential to delivering service as inference itself.
The June 2025 newsletter from a16z included interviews with CFOs highlighting this shift—some AI companies measure the unit cost of adjudication teams alongside compute and explicitly put them in COGS. Over time, they track margin expansion as automation improves and reduces the need for human staff.
Third-party model and GPU costs in COGS
An AI-native company’s biggest COGS components are:
- Third-party APIs (e.g., OpenAI, Anthropic, ElevenLabs): Every token, every generation carries a line-item expense (dependent on your region’s accounting policies). These are booked to COGS because they scale proportionally to customer usage.
- GPU compute (self-hosted AI models): Running powerful GPUs for inference comes with huge costs. Whether you own or lease them, per-inference GPU time is a direct delivery cost that belongs in COGS.
These two costs can lead to several margin risks, such as:
- Usage spikes: An increase in the use of automated workflows and AI-heavy features can spike inference costs while seat-based revenue lags.
- Uncapped plans: “Unlimited AI” pricing leaves your gross margin vulnerable to significant shrinkage if you don’t meter or cap.
- Cloud GPU shortages: Spot pricing or surge pricing on GPUs can hit without warning and compress your margins overnight.
Here are a few strategies you might consider to protect your margins against these costs:
- Usage-based pricing: Charge per 1,000 tokens, seconds of audio, inference calls, or anything else that better aligns your revenue with cost.
- Overage billing: Include some usage in the base plan, then charge extra if customers go over.
- Dynamic margin alerts: Define a threshold beyond which you will re-evaluate your price and/or pricing strategy. For example, if inference costs climb over 40% of revenue (your threshold), it acts as a signal to raise prices or revisit contracts.
- Hedging GPU supply: Long-term capacity commitments with GPU manufacturers and cloud service providers can smooth volatility.
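The dynamic margin alert described above amounts to a simple threshold check. Here’s a minimal Python sketch, where the 40% threshold and dollar figures are the illustrative values from the text:

```python
# Sketch of a dynamic margin alert. The 40% threshold is the example
# figure from the text; tune it for your own cost structure.

def margin_alert(inference_cost: float, revenue: float,
                 threshold: float = 0.40) -> bool:
    """True when inference cost exceeds the threshold share of revenue."""
    return inference_cost / revenue > threshold

# A customer paying $10K/month whose workloads cost $4.5K to serve:
print(margin_alert(4_500, 10_000))  # True -> revisit pricing or contracts
print(margin_alert(3_000, 10_000))  # False -> within threshold
```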
The forecasting challenge: Predicting variable AI usage
Revenue forecasting used to be a simpler exercise for SaaS CFOs. Take contracted ARR, apply a churn/growth assumption, and voilà, you’ve got your revenue projections. Now, variable usage drives a huge share of revenue and costs for AI-native SaaS, and the usage is notoriously volatile.
Customers might suddenly launch a new AI-powered workflow or roll out agents across multiple departments and spike inference volumes overnight. Traditional forecasting models simply cannot handle that level of volatility.
However, there are ways to deal with it:
- Model the consumption of tokens explicitly: Build models that translate customer behavior (queries, automations, interactions per seat) into tokens consumed. For example, one customer service agent using an AI assistant might use about 500K tokens/month.
- Look for seasonal patterns: AI tools might also have seasonal spikes in different industries. For example, US Internal Revenue Service (IRS) call centers typically surge during tax season, and companies that offer online tools and services for educators might see heavy usage at the beginning of each new semester. Consider this information in your model.
- Segment by customer type: Power users and casual adopters have significantly different usage patterns. Instead of averaging usage numbers across all segments, create separate cohorts in your forecast based on usage.
- Tie usage to customer growth trajectories: When a customer adds hundreds of users or starts using new AI workflows, their token usage can increase dramatically. Your forecast should account for these abrupt shifts to the extent you can anticipate them.
- Use rolling, dynamic forecasts: Don’t lock in 12-month forecasts. Update them monthly or quarterly based on trailing usage to keep the model fresh and align it more closely with reality.
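Putting a few of these ideas together, here is a minimal Python sketch of a cohort-based token-consumption forecast. The cohort names, per-seat token rates, blended price, and seasonal multiplier are all illustrative assumptions, not benchmarks:

```python
# Sketch of a cohort-based token-consumption forecast combining the
# ideas above. All rates and prices below are illustrative assumptions.

COHORTS = {
    # cohort: (seats, tokens consumed per seat per month)
    "power_users":  (200, 2_000_000),
    "casual_users": (1_000, 100_000),
}
PRICE_PER_1K_TOKENS = 0.02  # assumed blended usage price, $/1K tokens

def monthly_usage_revenue(seasonal_multiplier: float = 1.0) -> float:
    """Translate per-cohort behavior into monthly usage revenue."""
    tokens = sum(seats * rate for seats, rate in COHORTS.values())
    return tokens * seasonal_multiplier / 1_000 * PRICE_PER_1K_TOKENS

print(f"${monthly_usage_revenue():,.0f}/month baseline")
print(f"${monthly_usage_revenue(1.5):,.0f}/month in a seasonal peak")
```

Refreshing the cohort inputs monthly from trailing actuals is what turns this from a static estimate into the rolling forecast described above.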
Investor relations: How VCs value variable usage revenue
Investors love predictability. That’s why ARR has been the north star for SaaS valuations for decades. AI-native companies can seem more complex from an investor’s perspective because a growing slice of their revenue is variable usage that doesn’t fit neatly into ARR.
VCs are rewriting their playbooks to deal with this uncertainty. Let’s see how:
- ARR + annualized usage is the new baseline: Instead of ignoring usage revenue, VCs now ask founders to present the new ARR+AU metric (aka “hybrid run rate”).
- Multiples get tiered by revenue type: Contracted ARR might still command a 10-12x multiple, but VCs are likely to value usage revenue more conservatively (say, 3-6x), depending on volatility and gross margin.
- Durability matters more than labels: For example, a $10M ARR business with 90% gross margins and steady usage growth makes investors feel more comfortable than a $15M “hybrid” business with a 40% month-to-month swing in usage.
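The tiered-multiple approach can be sketched as follows. The 10x and 4x multiples are illustrative points within the ranges quoted above, not market data:

```python
# Sketch of tiered-multiple valuation: contracted ARR and usage revenue
# are valued separately. Multiples are illustrative, not market data.

def blended_valuation(contracted_arr: float, annualized_usage: float,
                      arr_multiple: float = 10.0,
                      usage_multiple: float = 4.0) -> float:
    return contracted_arr * arr_multiple + annualized_usage * usage_multiple

# Using the $50M CARR / $24M annualized usage example from earlier:
print(f"${blended_valuation(50e6, 24e6) / 1e6:.0f}M")  # $596M
```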
If you’re about to pitch investors, here’s some guidance:
- Show both contracted and usage revenue: Separate variable revenue from your reported GAAP topline. Present ARR and annualized usage as individual values, then combine them into ARR+AU.
- Prove the stickiness of usage: Highlight cohorts that have steadily increased their usage over 6-12 months. Investors will appreciate this predictability even in “variable” revenue.
- Gross margin proof points: Explicitly state how inference costs and HITL expenses affect your margins. For instance, a $1M usage line with an 80% gross margin is more valuable to investors than $2M with a 30% margin.
- Educate with peer examples: Use examples such as ElevenLabs and RunwayML to show how usage is already being factored into valuation multiples across the market.
Implementation roadmap: Making the transition to a hybrid AI revenue model
Transitioning from traditional ARR to a hybrid AI-native revenue model requires an operational overhaul. Billing systems, finance ops, product, and go-to-market all need to work together.
Here’s a roadmap to help you make your journey easier:
Audit your billing stack
Check to see if your current system can handle metering. Legacy billing platforms assume flat per-seat pricing. AI-native models require granular metering for tokens, API calls, and GPU minutes. Your billing system also needs to integrate with product telemetry to capture actual usage. Without telemetry, it can’t track billing events and revenue.
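As a minimal sketch of what granular metering involves, here is a hypothetical usage event record. The field names are illustrative and not tied to any particular billing platform’s schema:

```python
# Hypothetical usage-metering event, assuming product telemetry can
# emit one record per billable action. Field names are illustrative.

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MeteringEvent:
    customer_id: str
    metric: str        # e.g., "tokens", "api_calls", "gpu_minutes"
    quantity: float
    occurred_at: datetime

def record_event(customer_id: str, metric: str, quantity: float) -> MeteringEvent:
    """Capture a billable usage event with a UTC timestamp."""
    return MeteringEvent(customer_id, metric, quantity,
                         datetime.now(timezone.utc))

event = record_event("acme-corp", "tokens", 12_500)
print(event.metric, event.quantity)  # tokens 12500
```

Aggregating events like these per customer per period is what lets the billing system compute overages against included allotments.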
Define your metrics and revenue model
Discuss with your leaders how you’ll define ARR+AU, what counts as COGS, and which metrics you’ll present to investors. And define thresholds (like base credits and overage triggers) so finance, product, customer success, and other teams are aligned on the outcomes.
Make sure all teams are prepared
Finance needs accurate revenue recognition rules. Product and engineering need to implement usage metering and safeguards against runaway costs. And, sales and customer success need clear playbooks for explaining usage-based pricing. Make sure everyone has the information they need.
Start with a pilot
Start with a small group of customers or a single product line. Monitor margin impact and billing accuracy before a complete rollout. But be prepared to tweak thresholds (such as included credits) after actual usage data comes in.
Set a realistic timeline
If you’re starting from scratch, set aside 3-6 months for the pilot. Over the next 6-12 months, work on the full transition across products and geographies. You might also want to build a buffer for edge cases, like enterprise accounts demanding custom billing integrations.
Communicate
Proactive communication throughout the process is non-negotiable. Educate your internal teams on why gross margins may look messy for a few quarters. At the same time, guide customers and investors through this shift. Aim to build trust through transparency to avoid confusion around spikes in usage-based bills.
Modernize your finance stack to go from ARR to AI-ready
It has become abundantly clear that spreadsheets can’t handle volatility in usage or track hybrid metrics like ARR+AU. AI is not only changing the metrics that CFOs use to measure performance but also how they approach their work. Today’s AI-empowered CFOs demand innovative FP&A solutions that can track fluctuations in the data and quickly extract meaningful insights.
Modern AI-powered FP&A software like Drivetrain can easily tackle these challenges. Once your billing system is metering usage properly, Drivetrain can pull that (and other critical data) into the platform, enabling your team to work from a single source of truth, create custom metrics (like ARR+AU and token cost per dollar of revenue), build accurate forecasts, and communicate effectively with investors and the board.
The AI era requires financial models as adaptive as the technology itself. With Drivetrain, you can stop dealing with chaotic spreadsheets and run the finance function with the same intelligence you’re building into your product.

Book a demo today to learn more about how Drivetrain can enable you to scale your business in an AI world.
Frequently asked questions
How should human-in-the-loop (HITL) costs be classified?
HITL costs should be classified as COGS if they scale directly with customer usage.
For example, if every AI decision requires human review or adjudication, HITL costs should be treated as COGS because they’re part of delivering the service, just like inference costs.
This is a standard practice based on insights from the a16z CFO roundtable, where many leaders mentioned they treat HITL as COGS when it’s directly tied to product delivery.
However, if human work is more fixed overhead, like a small QA team validating models or spot-checking outputs regardless of volume, it should be classified as OpEx.
Which AI costs count as COGS, and which as R&D?
Any cost directly tied to delivering the product to customers, such as inference API fees, GPU compute for production workloads, vector DB queries, and human review that scales with usage, should be treated as COGS.
On the other hand, upfront or experimental spending on model training, fine-tuning, and research that isn’t tied to immediate customer delivery is treated as R&D.
How can finance teams budget for and control AI spend?
Start by estimating your token usage and looking at your model’s pricing to get an idea of your AI spend. Then, set clear budget thresholds and implement show-back or charge-back models so teams have visibility over the cost impact of their experiments.
Review actuals against forecasts and see if there are major deviations from estimates. If there are, investigate further. At the same time, create a feedback loop between finance and engineering to course-correct quickly when usage spikes or assumptions change.
How do you forecast revenue when usage varies dramatically between customers?
When usage varies dramatically between customers, forecast based on averages for each customer segment instead of averaging figures for your entire customer base.
Build cohorts (such as high-usage enterprise, mid-tier, light users, etc.) and apply different usage assumptions to each. Then factor in contracted revenue as your stable base and model variable usage revenue on top using scenarios (conservative, expected, and aggressive).
Do investors accept ARR+AU as a reporting metric?
Yes, investors accept ARR+AU as long as it’s presented clearly and consistently. Traditional ARR is still the baseline because it shows contracted, predictable revenue, but with AI and usage-based models, most investors now also look for annualized usage.
To calculate ARR+AU for reporting, normalize the usage revenue of the past 3-6 months and project it over 12 months to make it comparable to ARR. Then report ARR+AU, showing both of them separately.