Saturday, May 23, 2026

AI Inference Pricing: Flat Rate, Usage, Hybrid

“Heavy users” are a recurring issue for any computing service using fixed rate billing.


For consumer or enterprise users, a general rule of thumb is that customers prefer predictable billing, and hence set rates. That might not be an issue if usage is predictable and moderate over time. 


But fixed rate billing can undermine a business model if usage and costs to provision are highly variable (marginal cost is high). 


On the other hand, fixed costs encourage sampling and experimentation, encouraging market growth. 


But there are customer hurdles when usage-based pricing is used. For suppliers, there is:

  • Revenue volatility

  • Usage spikes or dips can meaningfully swing revenue

  • Sales friction. 


For buyers, the issues include”

  • Bill shocks

  • Complex buying process at scale

  • Loss of cost prediction. 


Variable costs match “value” and “cost to supply,” as heavier users pay more. But that comes at the price of cost predictability, which tends to limit adoption. 


Fixed cost pricing (generally with usage allowances)  might include:

  • Per-seat: $X per user/month

  • Per-account: $Y per workspace or tenant/month

  • Tiered plans: Basic, Pro, Enterprise (cost increases with tier)


Metered or usage-based billing might feature prices based on:

  • Per API call: $0.001 per request

  • Per token: $/1,000 tokens in or out

  • Per compute hour: $/GPU-hour or vCPU-hour

  • Per output unit: per generated document, per image, per transcription minute.


The perhaps-obvious compromise are hybrid models including both fixed cost and variable cost charging. For suppliers and customers alike, this provides some predictability of revenue or cost; plus some ability to match usage volume with cost. 


For most suppliers, usage-based billing makes most sense when:

  • You supply infrastructure

  • Usage and costs are highly variable

  • Customers are developers and customers experimenting

  • Cash and margin discipline are top of mind.


Fixed pricing will tend to work better when:

  • AI is a feature in a broader SaaS product

  • Usage per customer is relatively consistent or capped

  • Buyers want simple prices for yearly contracts

  • Model and cloud costs are low and covered at the plan price.


Hybrid appeals when:

  • You want predictable baseline revenue and upside from power users

  • You have meaningful unit costs but strong sales

  • You sell into finance or procurement buyers and need to protect margin.


Hybrid is generally preferable, if only because AI inference costs are highly variable and scale directly with usage. 


Also, as a practical matter, hybrid helps assure some level of ability to plan for recurring revenue magnitudes, per account or per user. 


And most suppliers will sell to consumers, developers, smaller and enterprise-sized customers, some preferring low entry costs; others preferring cost predictability at scale. 


And a reasonable assumption is that usage will climb over time, for virtually every customer set. 


All that noted, many prior computing products have shifted from flat rate to metered, or metered to flat rate. The shift to cloud computing involved another change from “own” to “rent,” with the emergence of subscriptions (flat rate, generally) and end of perpetual license (one payment, upfront). 


And where “subscription” is the product payment structure, flat rate tends to be preferred, where possible. That tends to be the case where costs are not highly variable. 


The issue with AI inference is simply that costs are fairly linear with usage. 


Also, there should eventually be a stronger trend towards "billable units" that align with standard business workflows. In healthcare, that might be cost per-patient, per-test, or per-episode. 


In other businesses, perhaps the shift is to per-user and per-task metrics. There also is much talk of pricing based on outcomes or value, but that always requires clear metrics, always a difficult task. 


Product/Shift

Direction

Effects on Volume/Use Cases

AWS Cloud (2006+)

To usage-based (pay-as-you-go) from traditional on-prem/capex

Massive increase in volume; enabled elastic scaling, burst workloads, startups, and new experiments. Lowered barriers, exponential growth in compute/storage usage.

Adobe Creative Suite to Creative Cloud

Perpetual/one-time to subscription (flat recurring)

Increased customer engagement, continuous updates, higher lifetime value; broader access but some backlash from users preferring ownership. Steady revenue growth.

Microsoft Office/Enterprise Software

Perpetual to subscription (e.g., Office 365)

Higher retention, regular updates; shifted to predictable opex. Increased adoption in teams via per-user flat elements; some optimization of usage.

Mobile Data Plans

Usage-based/metered to flat-rate "unlimited" (then back with caps)

Flat-rate caused surge in data consumption, app innovation (streaming, social); re-introducing usage elements controlled heavy users without killing growth much.

Snowflake/Databricks (cloud data)

Usage-based (compute credits)

Enabled pay-for-query model; grew adoption for variable analytics workloads; high volume in data-heavy use cases with efficiency focus.

GitHub Copilot (2026)

Flat-rate/subscription to usage-based

Ongoing; expected to better align costs with heavy AI coding use; may gate extreme users but improve sustainability and model choice.

No comments:

AI Inference Pricing: Flat Rate, Usage, Hybrid

“Heavy users” are a recurring issue for any computing service using fixed rate billing. For consumer or enterprise users, a general rule of ...