The Margin Trap: What AI Features Actually Cost SaaS Companies in 2026

AI inference now consumes 23% of revenue at scaling-stage B2B companies. Cursor reportedly paid Anthropic $650M against $500M of revenue. The 2026 margin math founders are getting wrong.

Abhishek

Founder, Brand Bakery

Apr 22, 2026 · 10 min read

A venture investor on a January call mentioned, almost in passing, that one of his portfolio companies had watched its software costs triple in 90 days. The cause was not engineering hires. It was not cloud expansion. It was a single AI feature shipped to production in November.

That story is no longer rare. As of April 2026, the average AI-first B2B company spends roughly 23 percent of revenue on inference alone — a line item that did not exist on a SaaS P&L five years ago, and one that has quietly redrawn the economics of building software. The 75-to-85-percent gross margins that anchor every comparable-company multiple in public SaaS markets are not what AI-first companies are reporting. ICONIQ Capital's January 2026 State of AI report puts the cohort average at 52 percent — up from 41 percent in 2024, but still 20 to 30 points below mature SaaS benchmarks.

The companies posting these numbers are not unprofitable accidents. They are the fastest-growing software businesses in history. That is the point.

The Cursor Math: Anatomy of a Billion-Dollar Negative Margin

Anysphere, the company behind Cursor, crossed $2 billion in annualised revenue in April 2026 — the fastest B2B SaaS company ever to reach that milestone, eclipsing Slack, Zoom, and Snowflake. It is also, by external estimates, one of the largest single line items on Anthropic's enterprise revenue ledger. One investment firm reportedly calculated that Anysphere was paying Anthropic approximately $650 million annually against $500 million of its own revenue — a negative 30 percent gross margin on the inference layer alone.

Anysphere does not appear to dispute the basic shape of the math. Its public response has been to launch a proprietary model in November 2025, integrate cheaper third-party alternatives including Kimi, and project gross margin recovery from 74 percent to 85 percent by 2027. As of April 2026, the company reports positive gross margins on enterprise contracts and continued losses on individual developer subscriptions. The path back to a defensible margin is real — and it requires the kind of capital and engineering depth that most AI startups do not have.

The lesson is not that Cursor was reckless. It is that even the most successful AI product of this generation could not buy its way out of the inference cost problem with growth alone.

The 23 Percent Problem: How Inference Quietly Eats Revenue

ICONIQ Capital's January 2026 report places average inference spend at 23 percent of revenue across scaling-stage AI B2B companies. That figure compounds in two directions at once. First, every product surface that uses AI is consuming variable cost on every interaction — there is no marginal-cost-approaches-zero curve to ride down, the way SaaS companies historically rode compute and storage. Second, the most valuable interactions, the ones that drive retention and expansion, tend to be the most expensive.

Output tokens cost three to ten times more than input tokens. A reasoning-mode call on Claude Opus 4.7 at $25 per million output tokens, or its GPT-class equivalent, can rack up multi-cent costs per request. For a feature that triggers ten times per session on a $30-a-month seat, the unit economics are no longer hypothetical — they are visible in the next billing cycle.
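To make that visible, here is a minimal back-of-envelope sketch in Python. The $25-per-million output price is the figure cited above; the input price, token counts, and session frequency are illustrative assumptions, not vendor quotes.

```python
# Back-of-envelope per-call cost. The $25/M output price is the figure
# cited above; the input price, token counts, and call frequency are
# illustrative assumptions.

OUTPUT_PRICE_PER_M = 25.00  # $ per million output tokens (reasoning tier)
INPUT_PRICE_PER_M = 5.00    # $ per million input tokens (assumed)

def per_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A reasoning-mode call: modest prompt, long generated output.
call = per_call_cost(input_tokens=2_000, output_tokens=3_000)  # ~$0.085

# Ten triggers per session, ~20 sessions a month, on a $30 seat:
monthly = call * 10 * 20
print(f"per call ${call:.3f}, monthly ${monthly:.2f}, "
      f"share of a $30 seat {monthly / 30:.0%}")  # ~57% of seat revenue
```

At those assumptions, more than half of the seat price is gone before hosting, support, or payroll sees a dollar.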

The compounding effect is structural. Cloud hosting and inference together are consuming up to 80 percent of revenue at some early-stage AI companies — a figure that, in a traditional SaaS context, would imply imminent insolvency. In AI, it is increasingly treated as the cost of being in the market.

The LLMflation Paradox: Why Cheaper Tokens Do Not Save You

The optimist's response to all of this is well-rehearsed. Andreessen Horowitz coined the term LLMflation to describe the roughly tenfold annual decline in inference cost for equivalent model performance. By that math, the cost problem solves itself. Wait twelve months and the same feature costs a tenth of what it costs today.

The data does not bear this out. According to a16z's own OpenRouter analysis, the volume of tokens consumed quintupled over the same period that per-token prices dropped to one-third. Five times the volume at one-third the price is roughly 1.7 times the total spend: inference bills went up, not down. This is a textbook Jevons paradox: an efficiency gain lowers the effective price of a resource, demand rises faster than the price falls, and aggregate consumption ends up higher.

In practical terms: every time a model gets cheaper, product teams ship more AI features. Every new feature consumes more tokens. The customer experience gets richer, the cost line item gets larger, and the gross margin does not improve. AWS raised GPU Capacity Block pricing 15 percent in January 2026. Lead times on H100 and H200 clusters now exceed 30 weeks. The compute layer beneath every API call is not getting cheaper at the rate the model layer is.

LLMflation is a real signal. It is not, by itself, a margin strategy.

What Actually Defends Margin in 2026

The companies pulling out of the margin trap are doing four things at once.

First, they are building or fine-tuning proprietary models for their highest-volume use cases. Cursor's own model, Notion's purpose-built classifiers, Perplexity's distilled stack — these are not vanity exercises. They are the only durable way to escape paying retail for inference at scale.

Second, they are aggressive about prompt caching and batched inference. The combination — 90 percent savings on cached input plus 50 percent off batched requests through Anthropic's Message Batches API — can push effective inference cost down by up to 95 percent on the right workloads. Most teams discover these levers six months after they should have.
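The arithmetic behind that 95 percent ceiling is worth making explicit. A minimal sketch, using the discount factors cited above and an illustrative base input price; it ignores the one-time cache-write premium for simplicity:

```python
# Effective input-token cost under prompt caching plus batch discounts.
# Discount factors are the ones cited above; the base price is
# illustrative, and the cache-write premium is ignored for simplicity.

BASE_INPUT = 5.00         # $ per million input tokens, illustrative
CACHE_READ_FACTOR = 0.10  # cached reads billed at 10% of base (90% off)
BATCH_FACTOR = 0.50       # batched requests billed at 50% of base

def effective_input_cost(cache_hit_rate: float, batched: bool) -> float:
    """Blended $ per million input tokens at a given cache hit rate."""
    blended = (cache_hit_rate * CACHE_READ_FACTOR
               + (1 - cache_hit_rate)) * BASE_INPUT
    return blended * (BATCH_FACTOR if batched else 1.0)

print(effective_input_cost(0.90, batched=True))  # $0.475/M, ~90% below base
print(effective_input_cost(1.00, batched=True))  # $0.25/M, the 95% ceiling
```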

Third, they are routing intelligently between models. A query that genuinely needs Opus 4.7's reasoning costs $25 per million output tokens. The same query handled by Haiku 4.5 costs one-fifth as much. A well-designed router preserves quality on the queries that need it and recovers margin on the ones that do not.
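The shape of such a router is simple even when the tuning is not. A sketch under the pricing cited above; the heuristic and the one-in-five routing ratio are invented for illustration, where production routers typically use a small classifier tuned on logged traffic:

```python
# Minimal model-router sketch. Prices are the figures cited above; the
# needs_reasoning() heuristic and the one-in-five ratio are invented.

COST_PER_M_OUTPUT = {"opus": 25.00, "haiku": 5.00}

def needs_reasoning(query: str) -> bool:
    """Crude stand-in for a learned routing classifier."""
    hard_markers = ("prove", "refactor", "debug", "multi-step")
    return len(query) > 500 or any(m in query.lower() for m in hard_markers)

def route(query: str) -> str:
    return "opus" if needs_reasoning(query) else "haiku"

# If one query in five genuinely needs the reasoning model:
blended = 0.2 * COST_PER_M_OUTPUT["opus"] + 0.8 * COST_PER_M_OUTPUT["haiku"]
print(f"blended ${blended:.2f}/M vs ${COST_PER_M_OUTPUT['opus']:.2f}/M all-opus")
# -> $9.00/M output, a 64% reduction, with quality preserved where it matters
```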

Fourth — and this is the harder change — they are rebuilding their pricing models. Flat per-seat subscriptions, the foundation of two decades of SaaS revenue, do not survive contact with variable inference costs. The shift toward outcomes-based pricing, usage tiers, and credit systems is not a fad. It is a structural response to the fact that the cost of serving the marginal user is no longer zero.

The Pricing Question Most Founders Are Avoiding

The most uncomfortable conversation in any AI product team in 2026 is about pricing. Per-seat pricing makes the customer happy and the CFO miserable. Pure usage-based pricing makes the CFO comfortable and the customer hesitant. Outcomes-based pricing — where the customer pays only when the AI delivers a measurable result — sounds like the answer until someone has to operationalise the measurement.

The pattern emerging across the most disciplined AI companies is hybrid. A modest base seat or platform fee covers the fixed cost of being in the product. Variable consumption — calls, tokens, agents launched, documents processed — covers the inference layer. Unit economics are managed at the feature level rather than the company level, and pricing is revised quarterly rather than annually.
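In code, the hybrid model is not complicated; the discipline is in the rates. A sketch with invented numbers for the base fee, credit price, and bundle size:

```python
# The hybrid bill described above: base platform fee plus metered
# consumption. Every rate here is invented for illustration.

BASE_FEE = 15.00        # $ per seat per month, covers fixed platform cost
CREDIT_PRICE = 0.02     # $ per consumption credit (calls, tokens, docs)
CREDITS_INCLUDED = 500  # credits bundled into each seat

def monthly_bill(seats: int, credits_used: int) -> float:
    overage = max(0, credits_used - seats * CREDITS_INCLUDED)
    return seats * BASE_FEE + overage * CREDIT_PRICE

# Light team: usage stays in the bundle, the bill feels like per-seat SaaS.
print(monthly_bill(seats=10, credits_used=4_000))   # $150.00
# Heavy team: the inference-driven tail surfaces as metered overage.
print(monthly_bill(seats=10, credits_used=25_000))  # $550.00
```

The point of the bundle is that light users keep the predictability of a seat price while heavy users fund the inference they actually consume.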

This is not a model most early-stage teams know how to operate. It requires per-feature cost telemetry, real-time margin dashboards, and the willingness to deprecate features that consistently lose money. Few founders are emotionally prepared for the moment a beloved feature has to be killed because the margin math will not bend.
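The telemetry side can start small. A sketch of a per-feature margin check, with invented cost and revenue attributions and an arbitrary review threshold:

```python
# Per-feature margin check of the kind described above. In practice the
# cost column comes from tagged API usage logs; these figures are invented.

from dataclasses import dataclass

@dataclass
class FeatureEconomics:
    name: str
    monthly_inference_cost: float  # $ attributed via per-feature telemetry
    attributed_revenue: float      # $ revenue the feature supports

    @property
    def gross_margin(self) -> float:
        return 1 - self.monthly_inference_cost / self.attributed_revenue

features = [
    FeatureEconomics("autocomplete", 42_000, 180_000),
    FeatureEconomics("agent_mode", 95_000, 60_000),
]

for f in features:
    flag = "REVIEW" if f.gross_margin < 0.5 else "ok"  # arbitrary threshold
    print(f"{f.name}: margin {f.gross_margin:.0%} [{flag}]")
# autocomplete: margin 77% [ok]
# agent_mode: margin -58% [REVIEW]
```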

What This Means If You Are Shipping Your First AI Feature

For the founder weighing whether to put an AI feature into production this quarter, the question is not whether AI is worth shipping. The question is whether the specific feature, at the specific price point, with the specific expected usage pattern, will earn a margin you can defend in front of your next investor or at your next board meeting.

Three diagnostic questions are worth answering before any code ships.

What is the per-call cost at the model and quality level you actually need? Not the headline pricing on a marketing page, but the realistic figure with output tokens included and no caching assumed.

How often will the average user trigger the feature in a billing period? Multiply that by per-call cost. Compare to the marginal revenue from that user. If the second number is not at least three times the first, the feature is structurally unprofitable.

What is your fallback if model prices rise rather than fall? A token tax is not theoretical. Anthropic, OpenAI, and Google have all adjusted prices upward on specific models in the last twelve months when capacity tightened. The default assumption that inference becomes cheaper every quarter is not a plan.
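The first two questions compress into a single check. A sketch using the per-call figure from the earlier example and an invented trigger frequency; the three-times threshold is the rule of thumb from question two:

```python
# Go/no-go check wiring together the first two diagnostic questions.
# per_call_cost should include output tokens with no caching assumed
# (question one); the 3x multiple is the rule of thumb from question two.

def feature_is_viable(per_call_cost: float,
                      calls_per_user_per_month: float,
                      marginal_revenue_per_user: float,
                      multiple: float = 3.0) -> bool:
    monthly_inference_cost = per_call_cost * calls_per_user_per_month
    return marginal_revenue_per_user >= multiple * monthly_inference_cost

# ~$0.085 per call, 200 triggers a month, $30 of marginal revenue:
print(feature_is_viable(0.085, 200, 30.0))  # False -- $30 < 3 * $17
# The same feature behind a router and cache at ~$0.02 per call:
print(feature_is_viable(0.02, 200, 30.0))   # True  -- $30 >= 3 * $4
```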

If the answers to those three questions do not yield a defensible margin, the feature is not ready — regardless of how impressive the demo is.

The companies that will define this decade are not the ones shipping the most AI. They are the ones shipping AI that pays for itself — and the ones honest enough to ship less of it where it does not.

About the author

Abhishek

Founder, Brand Bakery

Abhishek is the founder of Brand Bakery — a digital agency that crafts premium brand identities and digital experiences. With a background in full-stack engineering and DevOps, he brings a rare combination of technical depth and design sensibility to every project.
