Monetization: the math behind a sustainable $9.99.
Price is not a value judgement — it is a function of three numbers: customer acquisition cost, AI inference cost per active user, and how long the user keeps paying. In an AI product like Babelio, the inference cost is the binding constraint: a heavy user can wipe out the entire $9.99 tier in a single month if you let them. This lesson teaches you to compute the five numbers that decide whether your price survives contact with reality.
why this matters for you
- context: Babelio's AI cost per heavy Pro user (4 hrs/mo) with ElevenLabs is $12.68, which is gross-margin negative at $9.99. The model is what tells you which features to gate, where to route cheap users, and when the price stops working.
- decision: Unit economics also decide the deepest founder question: raise or bootstrap. Babelio is bootstrap-able to $10K MRR; a raise only makes sense to launch Studio or enterprise. The numbers, not the founder's mood, make that call.
What this lesson does / does not do.
Does
- Explain tiered freemium and when a reverse trial outperforms a free trial.
- Compute AI cost per user at median and p95 for a real audio pipeline.
- Derive LTV, CAC and payback period from first principles.
- Frame the raise-vs-bootstrap decision as a numeric trigger, not a vibe.
Does not
- Pick your accountant, tax structure or stock-option pool.
- Negotiate term sheets — that's a different skill, hire a lawyer.
- Replace bottoms-up financial modelling at Series A scale.
- Set prices in markets you haven't validated (that was Lesson 03).
Free is the funnel, not the product.
A freemium tier is a customer-acquisition channel disguised as a product. The job is to give away the cheap-to-serve feature and gate the wow-moment behind paid.
The standard structure has three tiers. Free covers the lowest-cost surface area and exists to seed word-of-mouth; Pro is the consumer-priced workhorse where you make your money; a Studio or Team tier captures the long tail of power users at 3–5× the Pro price. Designed well, Free is unprofitable per user but profitable per cohort, because Free users convert and refer.
A reverse trial inverts the usual flow. Instead of free → paywall, every signup gets the Pro experience for 7 days, then drops to Free. Superhuman, Krisp and Notion AI all report 2–3× conversion lift versus a flat free trial — losing the magic feels worse than never having it, and that loss-aversion is the conversion engine. The cost is one week of paid features per signup, which is trivial for cheap-to-serve software and expensive for heavy AI workloads. Size the trial usage cap accordingly.
in your startup
- free: $0 · 60 min/mo translated audio, subtitles only. Single TTS preview voice, one language pair. Subtitles cost ~$0.40/user/mo to serve, a viable loss leader. Screenshots and clips spread the brand virally.
- pro: $9.99/mo (or $79/yr, −34%). Unlimited minutes, voice dubbing, 30+ languages, voice picker, low-latency mode. This is where the wow lives and where the revenue lives.
- studio: $29/mo (roadmap Q4). Voice cloning, speaker diarization, 3 team seats, priority infra. Targets creators and small media teams who currently pay Rask AI $50–$120.
- trial: 7-day reverse trial. Every signup gets Pro by default, then downgrades to Free unless they pay. Cap trial usage at 4 hrs to prevent abuse on the dubbing tier.
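The trial cap can be sanity-checked with a few lines of arithmetic. The sketch below is a rough estimate, not Babelio's billing logic: the per-minute STT rate is back-derived from the cost figures quoted in this lesson (about $0.69 per 90 minutes), MT is treated as a flat $0.02, and `conversion_rate` is a purely hypothetical input.

```python
# Assumed rates, back-derived from the lesson's cost figures.
# Replace them with numbers from real invoices before relying on the output.
STT_PER_MIN = 0.0077                                    # $/min, assumption
MT_FLAT = 0.02                                          # $ per user, assumption
TTS_PER_MIN = {"cartesia": 0.03, "elevenlabs": 0.045}   # $/min, from the lesson

CAP_MINUTES = 4 * 60  # the 4-hour trial cap


def trial_cost(minutes: float, provider: str) -> float:
    """Inference cost of one trial user who consumes `minutes` of dubbed audio."""
    return minutes * (STT_PER_MIN + TTS_PER_MIN[provider]) + MT_FLAT


def trial_cost_per_conversion(minutes: float, provider: str,
                              conversion_rate: float) -> float:
    """Trial spend that each paying user must absorb, given a conversion rate."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be in (0, 1]")
    return trial_cost(minutes, provider) / conversion_rate


for provider in TTS_PER_MIN:
    worst = trial_cost(CAP_MINUTES, provider)
    print(f"{provider}: worst-case trial cost ${worst:.2f}")
```

Under these assumptions, a trial user who hits the cap costs roughly $9 on Cartesia and close to $12.70 on ElevenLabs, so both the cap and the default provider matter. Dividing by a conversion rate shows how much trial spend each paying user has to carry on top of CAC.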
Five numbers that decide everything.
Unit economics is a five-number system: ARPU, gross margin, retention, LTV and CAC. If you cannot recite all five from memory, you do not yet know your business.
ARPU is the blended monthly revenue per paying user. Gross margin is what survives after you subtract variable cost — for AI products that means inference, hosting and payment fees. Retention is how many months the average user stays before churning; for consumer SaaS, 14–22 months is the realistic band. LTV is the product of all three: ARPU × retention × gross margin. CAC is what you spend to acquire one paying user, all-in: ads, content, influencer seeds, tooling, the part of your founder time that actually moves users.
Two ratios matter. LTV:CAC ≥ 3× means the business has headroom to grow. Payback ≤ 12 months for consumer (≤ 18 for B2B) means you can reinvest revenue into more acquisition without burning a hole. Anything worse and you are subsidising your users — sometimes correct, but only if you know the bill.
in your startup
- arpu: Blended monthly + annual Pro: $8.50/mo. Annual at $79 drags the headline figure down but locks in LTV.
- cogs: Median Pro user (90 min/mo, Cartesia + caching): STT $0.69 + MT $0.02 + TTS $2.70 ≈ $3.40/user/mo. GM = (8.50 − 3.40) / 8.50 = 60%, at the low end of a 60–66% range. With Stripe fees and infra included, the model uses a blended ~60%.
- retention: Target 18 months (consumer SaaS habit benchmark; Krisp/Otter report 14–22). Conservative assumption.
- ltv: LTV = $8.50 × 18 × 0.60 = $91.80.
- cac & payback: CAC target $25–$30 (paid social + influencer). LTV:CAC = 3.1–3.7×. Payback at a $28 CAC = $28 / ($8.50 × 0.60) = 5.5 months. Both inside the consumer band.
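The five numbers above fit in a small calculator. This is a minimal sketch using the formulas and inputs stated in this section; the class name `UnitEconomics` and the `healthy` helper are illustrative names, not part of any Babelio codebase.

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    arpu: float              # blended monthly revenue per paying user, $
    gross_margin: float      # fraction of revenue left after variable cost
    retention_months: float  # average paying lifetime in months
    cac: float               # all-in cost to acquire one paying user, $

    @property
    def ltv(self) -> float:
        # LTV = ARPU x retention x gross margin
        return self.arpu * self.retention_months * self.gross_margin

    @property
    def ltv_to_cac(self) -> float:
        return self.ltv / self.cac

    @property
    def payback_months(self) -> float:
        # Months of gross profit needed to earn back the acquisition cost
        return self.cac / (self.arpu * self.gross_margin)

    def healthy(self, min_ratio: float = 3.0, max_payback: float = 12.0) -> bool:
        # Consumer thresholds from the lesson: LTV:CAC >= 3x, payback <= 12 months
        return self.ltv_to_cac >= min_ratio and self.payback_months <= max_payback


babelio = UnitEconomics(arpu=8.50, gross_margin=0.60, retention_months=18, cac=28)
print(f"LTV ${babelio.ltv:.2f}")
print(f"LTV:CAC {babelio.ltv_to_cac:.2f}x")
print(f"Payback {babelio.payback_months:.1f} months")
print(f"Inside the consumer band: {babelio.healthy()}")
```

Changing one input at a time (for example, dropping `gross_margin` to 0.45 or raising `cac` to 40) is a quick way to see which assumption the price is most sensitive to.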
Inference is a ceiling, not a line item.
In traditional SaaS, COGS is roughly flat per user. In AI products, COGS scales with usage and is dominated by inference. That makes price inference-bound — your heavy users can torch the tier even when your median is profitable.
The standard model assumes the median user is the average user. In a heavy-tailed product that's false: the top decile of usage often consumes 5–10× the median. Multiply that by the per-call cost of streaming STT, MT and TTS and the picture changes — a $10 subscription that gives unlimited inference is a put option the user can exercise against you. Once the cost per heavy user exceeds the price, you lose money the more successful the user becomes, which is the worst possible flywheel.
Three structural defences exist. First, gate truly expensive features (voice cloning, dual-stream dub) behind a higher tier whose price covers the worst case. Second, route cheap users to cheaper providers — every dollar of inference spend should have a cheaper substitute the moment quality permits. Third, set a soft fair-use cap high enough that 99% of users never see it but low enough that the long-tail abuser cannot exist.
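The gap between the median user and the heavy tail is easy to see on a usage sample. The sketch below uses a made-up list of 20 monthly usage figures (illustrative only, not real Babelio data) and an assumed all-in cost of about $0.0377 per minute (STT plus Cartesia TTS, back-derived from this lesson's figures). The percentile uses the simple nearest-rank method.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical monthly usage in minutes for 20 Pro users (illustrative data).
usage_minutes = [20, 35, 40, 55, 60, 70, 80, 85, 90, 90,
                 95, 100, 110, 120, 140, 160, 200, 240, 480, 900]

COST_PER_MIN = 0.0377  # assumed: STT ~0.0077 + TTS 0.03, in $/min
PRICE = 9.99           # Pro price, $/mo

median_min = statistics.median(usage_minutes)
p95_min = percentile(usage_minutes, 95)

for label, minutes in (("median", median_min), ("p95", p95_min)):
    cost = minutes * COST_PER_MIN
    margin = (PRICE - cost) / PRICE
    print(f"{label}: {minutes} min, cost ${cost:.2f}, gross margin {margin:.0%}")
```

In this sample the median user is comfortably profitable while the p95 user costs more than the subscription price, which is the pattern the three defences are meant to contain.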
in your startup
- heavy: Heavy Pro (4 hrs/mo, ElevenLabs Flash): STT $1.85 + MT $0.02 + TTS $10.80 ≈ $12.68/user. At $9.99, that's a −27% gross margin. Death if uncapped.
- gate: Move voice cloning + speaker diarization to Studio at $29. Pro stays single-voice + standard dub. The wow is still there; the cost ceiling moves.
- route: Default new Pro users to Cartesia Sonic ($0.03/min) instead of ElevenLabs Flash ($0.05/1K chars ≈ $0.045/min). MT defaults to Gemini Flash for non-nuanced pairs; reserve GPT-4o for idiomatic ones. Saves ~33% on TTS, ~50% on MT.
- cap: Soft fair-use at 20 hrs/mo on Pro (>99.5% of users untouched). Usage above 10 hrs/mo prompts a Studio upsell in-app.
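The heavy-user arithmetic and the routing saving can be reproduced in a few lines. This is a sketch under stated assumptions: the STT rate of $0.0077/min is back-derived from the lesson's "$1.85 for 4 hours", MT is a flat $0.02, and payment fees are ignored, so the results land within a cent or two of the figures above rather than matching them exactly.

```python
PRICE = 9.99          # Pro price, $/mo
STT_PER_MIN = 0.0077  # $/min, assumption derived from the lesson's figures
MT_FLAT = 0.02        # $ per user per month, assumption
TTS_PER_MIN = {"cartesia_sonic": 0.03, "elevenlabs_flash": 0.045}  # $/min


def monthly_cost(minutes: float, provider: str) -> float:
    """Inference cost for one user at a given monthly usage."""
    return minutes * (STT_PER_MIN + TTS_PER_MIN[provider]) + MT_FLAT


def gross_margin(minutes: float, provider: str, price: float = PRICE) -> float:
    return (price - monthly_cost(minutes, provider)) / price


def break_even_minutes(provider: str, price: float = PRICE) -> float:
    """Monthly minutes at which inference cost equals the subscription price."""
    return (price - MT_FLAT) / (STT_PER_MIN + TTS_PER_MIN[provider])


heavy = 4 * 60
for provider in TTS_PER_MIN:
    print(f"{provider}: cost ${monthly_cost(heavy, provider):.2f}, "
          f"margin {gross_margin(heavy, provider):.0%}, "
          f"break-even {break_even_minutes(provider) / 60:.1f} hrs/mo")

tts_saving = 1 - TTS_PER_MIN["cartesia_sonic"] / TTS_PER_MIN["elevenlabs_flash"]
print(f"TTS saving from routing to Cartesia: {tts_saving:.0%}")
```

With these assumed rates the break-even point falls somewhere between three and four and a half hours a month depending on the provider, which suggests the routing default and the upsell prompt do most of the protective work well before the fair-use cap is reached.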
Don't promise "unlimited" without a fair-use cap
"Unlimited" in marketing copy + heavy-tailed usage = guaranteed margin collapse the month a streamer adopts you. Use "unlimited for normal use" and define the cap in the ToS.
Don't lock yourself into a single TTS vendor
Cartesia, ElevenLabs and Deepgram compete on price quarterly. Architect the audio layer behind an interface so you can switch providers in one PR. Vendor lock turns inference-bound pricing into a hostage situation.
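One way to keep the audio layer swappable is a small provider interface with one adapter per vendor. The sketch below is illustrative only: `TTSProvider`, the two `Fake...` adapters and `pick_provider` are hypothetical names, and the adapters return placeholder bytes instead of calling any real vendor SDK.

```python
from typing import Protocol


class TTSProvider(Protocol):
    """Anything with a name, a per-minute cost and a synthesize() method."""
    name: str
    cost_per_min: float

    def synthesize(self, text: str, voice: str) -> bytes: ...


class FakeCartesia:
    name = "cartesia"
    cost_per_min = 0.03

    def synthesize(self, text: str, voice: str) -> bytes:
        # A real adapter would call the vendor API here; omitted on purpose.
        return f"[{self.name}:{voice}] {text}".encode()


class FakeElevenLabs:
    name = "elevenlabs"
    cost_per_min = 0.045

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"[{self.name}:{voice}] {text}".encode()


PROVIDERS = {p.name: p for p in (FakeCartesia(), FakeElevenLabs())}


def pick_provider(needs_premium_voice: bool) -> TTSProvider:
    """Route to the cheapest provider unless the request needs the premium one."""
    if needs_premium_voice:
        return PROVIDERS["elevenlabs"]
    return min(PROVIDERS.values(), key=lambda p: p.cost_per_min)


def dub(text: str, voice: str, provider: TTSProvider) -> bytes:
    # Application code depends only on the interface, never on a vendor class.
    return provider.synthesize(text, voice)


print(dub("hola", "v1", pick_provider(needs_premium_voice=False)))
```

Because `dub` only sees the interface, adding or replacing a vendor means writing one adapter class and registering it in `PROVIDERS`; the routing rule and the rest of the product stay untouched.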
Capital is not the goal; it's a tool.
Raising money is not validation. It is buying time you would otherwise have to earn. The right question is not "can I raise?" but "what specific thing do I need cash to do that revenue can't fund?"
Most consumer software at $10/mo is bootstrap-able to first signal. The AI infra cost on the first thousand paying users is a few thousand dollars per month — covered by revenue if your gross margin holds. Marketing in the early phase is content, Product Hunt, and influencer seeds, not paid acquisition. You raise when you hit a step-function expense that revenue cannot bridge: a second segment that needs a sales motion, infrastructure that needs SOC 2, or a geography that needs a local operator.
The discipline is to write the trigger before you have the option. "Raise at $X MRR" or "raise when we hire engineer #3" is a trigger a founder can defend; "raise when it feels right" is one a founder regrets. Bootstrapping to a clear PMF signal can also double or triple your valuation when you do raise: investors price uncertainty, and revenue is the cheapest uncertainty-removal you can buy.
in your startup
- bootstrap: Months 0–9: ~$500–$3K/mo burn (no salary, AI on free credits then revenue-covered). Bootstrap-able to $10K MRR by month 12 in the base case (1,150 paying users).
- trigger: Raise pre-seed only at $10K MRR + Studio launch (~month 9–12). Use of funds: voice-cloning infra, SOC 2, one sales hire. Target: $500K–$1M at $5–8M post.
- don't: Don't raise to "hire more engineers": Babelio is a 1–2 engineer product until distribution proves out. Don't raise to "extend runway": that's a symptom that PMF hasn't landed, and money won't fix it.
- expansion: Series A later, only when (a) Studio shows $30K MRR independent of Pro, or (b) one geo (Korea, Japan, LatAm) hits $5K MRR organically and demands localisation operators.
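Writing the triggers as predicates makes them hard to fudge later. The sketch below encodes the thresholds listed above; the function names and the `Snapshot` structure are hypothetical, and the $8.50 ARPU is the blended figure from the unit-economics section.

```python
import math
from dataclasses import dataclass

ARPU = 8.50  # blended monthly revenue per paying user, $


def paying_users_for(mrr_target: float, arpu: float = ARPU) -> int:
    """Paying users needed to reach an MRR target at a given ARPU."""
    return math.ceil(mrr_target / arpu)


@dataclass
class Snapshot:
    mrr: float             # total monthly recurring revenue, $
    studio_launched: bool  # has the Studio tier shipped?


def should_raise_pre_seed(s: Snapshot, mrr_trigger: float = 10_000) -> bool:
    # Pre-seed trigger: $10K MRR AND Studio launch
    return s.mrr >= mrr_trigger and s.studio_launched


def should_raise_series_a(studio_mrr: float, geo_mrr: dict[str, float]) -> bool:
    # Series A trigger: Studio at $30K MRR on its own, OR one geo at $5K MRR
    return studio_mrr >= 30_000 or any(v >= 5_000 for v in geo_mrr.values())


print(paying_users_for(10_000))
print(should_raise_pre_seed(Snapshot(mrr=10_400, studio_launched=False)))
print(should_raise_series_a(12_000, {"Korea": 5_200, "Japan": 900}))
```

At exactly $8.50 ARPU the $10K target works out to a little under 1,200 paying users, in the same range as the base case's 1,150. The second call returns `False` on purpose: MRR alone does not fire the pre-seed trigger until Studio has launched.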
Checklist for this week.
Six concrete actions. By Friday you should be able to recite five numbers about your unit economics from memory and point at the exact MRR trigger that flips you from bootstrap to raise.
«Price is the model talking back.»