Product: the smallest thing that proves the bet.
An MVP is not "less of the full product" — it is the smallest experiment that proves the Job-to-be-Done. For Babelio the bet is one sentence: a user opens any app on their machine, presses one key, and translation Just Works in under 800ms. This lesson turns that bet into a four-week build, a single North Star metric, and an aggressive cut list that fits on a sticky note.
why this matters for you
- context: Babelio could spiral into Mac + Windows + web + mobile + voice-clone + team console inside a month — every one of those features is "obvious" in isolation. Cut now, or you ship in 2027.
- risk: The window between the macOS 14+ CoreAudio Process Taps API maturing and Apple shipping a system-wide translate in macOS 16 is ~12 months. Each deferred cut decision burns a week of that runway.
What this lesson does / does not do.
Does
- Define a single North Star Metric for Babelio and explain why it captures value.
- Frame the MVP as a falsifiable experiment, not a feature subset.
- Name the seven screens that ship in v1 — and only those.
- Make the cut list explicit: what is consciously not in v1, and why.
Does not
- Design pixel-level UI or pick a typeface — that lives in the design file.
- Pick a backend stack, STT/MT/TTS vendor, or hosting region — see research/tech.md.
- Set pricing or define the trial flow — that is Lesson 04.
- Choose channels, ads, or creator partnerships — that is Lesson 06.
One number that means the product is working.
A North Star Metric is a single proxy for product value. Not revenue, not signups — the unit of customer-perceived utility that, if it grows, everything else eventually follows.
The North Star Metric is the one number you would optimise if you could only optimise one. It has three properties. It must be a measure of actual usage, not intent — DAU is closer than signups; minutes watched is closer than DAU. It must correlate with revenue across a horizon long enough to matter — six to twelve months, not a week. And it must be hard to game without delivering real value — counting "sessions started" rewards a confused user opening the app five times; counting "minutes successfully translated" only goes up when the product actually works.
The discipline of choosing one NSM is mostly the discipline of refusing the others. Revenue is downstream. Retention is a constraint, not a target. Engagement-without-value is the trap that killed Quibi. Pick the one number that, when it moves, means your customer just had a moment they could not have had without you.
in your startup
- NSM: Minutes of audio translated per active user per week (MATAUW). Captures depth (do they use it during real meetings?), retention (weekly active), and value (more minutes = more displaced friction).
- target: 60 minutes per active user per week by month 3. Equivalent to two 30-min Zoom calls or one foreign-language YouTube session per week.
- why not: Not DAU (people open the app, never start translation). Not signups (free trials inflate it). Not revenue (lags by 60 days under a monthly plan). Not "sessions" (gameable by reconnects).
- read: If MATAUW grows, it means real meetings are happening through Babelio. If it stalls under 10 min/wk, the product is a curiosity, not a tool — kill the bet.
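The metric above is simple enough to compute from a flat event log. A minimal sketch, assuming a hypothetical event shape of (user_id, iso_week, minutes_translated) — not a real Babelio schema:

```python
from collections import defaultdict

def matauw(events):
    """Minutes of Audio Translated per Active User per Week.

    `events` is a list of (user_id, iso_week, minutes_translated) tuples —
    an illustrative event-log shape, not a real Babelio schema.
    A user counts as active in a week only if they translated > 0 minutes,
    which is what makes the metric hard to game with empty sessions.
    """
    per_week = defaultdict(lambda: defaultdict(float))
    for user_id, week, minutes in events:
        per_week[week][user_id] += minutes

    return {
        week: sum(users.values()) / len(users)  # total minutes / active users
        for week, users in per_week.items()
        if users
    }

events = [
    ("alice", "2025-W01", 30.0),  # one 30-min Zoom call
    ("alice", "2025-W01", 30.0),  # second call, same week
    ("bob",   "2025-W01", 10.0),  # curiosity-level usage
]
print(matauw(events))  # → {'2025-W01': 35.0}
```

Note that averaging over active users (not all signups) is what keeps free-trial inflation out of the number.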
The MVP is the experiment, not the product.
A Minimum Viable Product is the smallest build that lets you falsify one specific hypothesis about user behaviour. It is not "the cheap version of the real product" — it is the experiment that decides whether the real product should exist.
Eric Ries' framing is widely misread. "Minimum" does not mean "ugly". "Viable" does not mean "feature-complete-but-small". An MVP is viable if and only if it can produce a clean yes/no answer to a hypothesis you wrote down beforehand. Everything that does not contribute to that answer is excluded — not deferred, excluded. The discipline of an MVP is subtraction.
Write the hypothesis as one sentence with a measurable threshold. If you cannot, you do not have an experiment, you have a project. Then ask: what is the smallest artefact that, when 50 real users touch it, will move the threshold up or down decisively? That artefact is the MVP. Anything beyond it is a guess about what to build for the version of the company that does not yet exist.
in your startup
- hypothesis: "A user opens any app, presses one key, and hears a fluent dub of the speaker within 800ms — and comes back next week to do it again." Threshold: 40% Week-2 retention on a cohort of 100 paid users.
- scope: 5 source langs × 5 target langs (EN, ES, ZH, RU, PT), simultaneous voice-dub + subtitle overlay, p50 latency <800ms, auto-volume duck, global hotkey, source-lang autodetect.
- build: 4-6 weeks on top of the existing prototype. Mac + Windows in one Tauri binary. Three vendor pipelines wired (STT/MT/TTS) with feature flags to swap.
- test: 50 paying alpha users recruited from the 60-person waitlist + r/remotework + 5 prosumer streamers. Two-week paid trial at $9.99 ends with a retention readout.
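The retention readout should be a mechanical yes/no, computed exactly as the hypothesis was written down beforehand. A minimal sketch, assuming hypothetical identifiers for the cohort and the week-2 active set:

```python
def week2_retention(cohort, active_week2, threshold=0.40):
    """Clean yes/no readout for the MVP hypothesis.

    cohort: set of user_ids who started the paid trial (week 0).
    active_week2: set of user_ids who translated audio in week 2.
    The 0.40 threshold is the one written into the hypothesis up front,
    so the verdict cannot be re-argued after the data comes in.
    """
    retained = cohort & active_week2
    rate = len(retained) / len(cohort)
    return rate, rate >= threshold

# Illustrative numbers: 43 of 100 trial users came back in week 2.
cohort = {f"user{i}" for i in range(100)}
active = {f"user{i}" for i in range(43)}
rate, keep_going = week2_retention(cohort, active)
print(f"{rate:.0%} W2 retention → {'proceed' if keep_going else 'kill the bet'}")
# prints "43% W2 retention → proceed"
```

Writing the threshold into the function signature, not a slide deck, is the point: the experiment either clears 0.40 or it does not.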
One second where the user becomes a believer.
Every product that retains has a single moment — usually under two seconds long — where the user stops evaluating and starts believing. Your job is to name that moment, design the path to it, and remove every obstacle in between.
Slack's moment is the first message that lands in a channel and someone in another room replies in 30 seconds. Figma's is the second cursor appearing on your canvas. Loom's is hitting "Stop recording" and seeing the link auto-copy. None of these are "features" — they are perceptual events that reveal what the product fundamentally is. Users do not buy features; they buy the feeling that the perceptual event made permanent.
If you cannot name the moment, you have not yet found the product. If you can name it but cannot reach it in under 60 seconds from first launch, your onboarding is the product's actual UX problem — not the dashboard, not the settings, not the integrations.
in your startup
- the moment: The user opens their Tokyo standup on Zoom, presses Cmd+Shift+B, and ~600ms after a colleague speaks Japanese they hear a fluent voice in their own language — over a quiet Japanese murmur. No tab juggling, no copy-paste.
- path: Install .dmg → one OS permission dialog → pick target language → loopback sample plays in user's language → HUD docks → open Zoom → press hotkey. Target: 60 seconds from download to first dub.
- obstacles: Permission dialogs that confuse, audio routing that needs explanation, a sample test that fails silently, latency above 1s on the first packet. Each one is an explicit kill-this-step item in the onboarding spec.
- read: If a user reaches the moment, retention probability triples in our analog data (Krisp, Wispr). If they do not reach it in the first session, they almost never come back. Onboarding is the product.
The list of things you will not ship in v1.
A roadmap without a cut list is wishful thinking. The cut list is a public, explicit list of features you have consciously refused to ship in v1, with a one-sentence justification for each. It is the single artefact that protects the four-week build from creep.
Most product timelines slip not because the listed work was hard, but because unlisted work crept in. A cut list reverses the default: instead of "ship these features", it commits to "we will not ship these features, even though they are obvious, useful, and someone has asked for them". Each cut needs a reason — cost, latency, legal exposure, distraction, distribution mismatch. Without a reason, the cut will not survive contact with the first customer who asks for it.
The right test for a cut is the "Friday at 5pm" test. It is Friday at 5pm, week three of the build. A founder, an engineer, and a designer are arguing about whether to add the cut item. The cut list ends the argument in fifteen seconds, because the reason is already written down. If the cut list cannot end the argument, it is just decoration.
No voice cloning in v1
3-5× the cost per minute of TTS, requires explicit consent of the speaker (legal exposure), and undermines the "fast + easy" positioning. Ship a single neutral voice per language; revisit in v2.
No mobile app
iOS does not allow system-wide audio capture. Android is partial and inconsistent. Mobile is a different product on a different bet — defer until the desktop wedge has paid off.
No browser extension
Per-tab audio access, no system-wide capture, different distribution funnel. It would dilute the "one button, every app" message. Revisit after 1,000 paying users.
No recording or audio export
Invites copyright and consent complexity. Replay is local-only, last 10 minutes, no export to disk. Keeps Babelio out of the rights-management swamp.
No multi-speaker diarization
"Alice said X, Bob said Y" doubles the latency budget and adds a second model in the hot path. The 800ms hypothesis dies the moment we add it. Land v1 first.
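The latency argument can be made concrete with a budget table. Every number below is an illustrative assumption for the sketch, not a measured figure — the point is only that a second model in the hot path consumes headroom the 800ms hypothesis does not have:

```python
# Illustrative p50 latency budget for the translation hot path.
# All stage timings are assumptions for this sketch, not measurements.
budget_ms = {
    "audio capture + chunking": 150,
    "streaming STT":            250,
    "machine translation":      120,
    "TTS first audio packet":   200,
    "network round trips":       60,
}

total = sum(budget_ms.values())
assert total <= 800, f"hot path over budget: {total}ms"
print(f"p50 budget used: {total}ms of 800ms")  # prints "p50 budget used: 780ms of 800ms"

# Adding diarization — assumed here at ~200ms per utterance — blows the budget:
with_diarization = total + 200
print(f"with diarization: {with_diarization}ms > 800ms → hypothesis dies")
```

Whatever the real stage timings turn out to be, keeping the budget as a checked table like this makes "it fits" or "it does not fit" a compile-time-style failure instead of a debate.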
in your startup
- 7 screens in: Onboarding · Main HUD · App Picker · Subtitle Overlay · Settings · History/Replay · Tray Menu. Anything not on this list does not exist in v1.
- cut for v1: Voice cloning · tone register · mobile · browser ext · recording/export · diarization · team accounts/SSO · admin console · file translation. Each justified above.
- post-it: Print the cut list on a single A4. Tape it above the engineer's monitor. Every "what about X?" Slack message gets the cut-list line number as the reply.
Checklist for this week.
Five concrete actions. By Friday you should be able to recite the NSM in one sentence, name the seven screens without looking, and point at a printed cut list on the wall.
«The MVP is the experiment, not the product.»