Market: how to read the rings.
Markets are not single numbers. They are nested rings — total opportunity, the slice you can reach, and the wedge you can actually win in three years. This lesson teaches you to read those rings, treat competitors as evidence that the problem is real, and find the empty quadrant on a 2×2 map. We apply each idea to Babelio's space: AI dubbing, speech-to-speech translation, and consumer voice-OS overlays.
why this matters for you
- context: Babelio sits inside three growing markets at once: AI dubbing ($1.15B, 17.7% CAGR), speech-to-speech translation, and the voice-OS overlay category that Wispr Flow proved fundable at $700M. You need to know which ring funds which kind of story.
- risk: Apple, Microsoft and DeepL all moved in 2025 (AirPods Live Translation, Teams Interpreter, DeepL Voice). Mis-size the market and you'll either pick a fight you can't win or miss a wedge sitting in plain sight.
What this lesson does / does not do.
Does
- Teach the TAM / SAM / SOM frame and how to sanity-check each ring.
- Show how to use competitors as validation, not threat.
- Read a 2×2 positioning map and find empty quadrants.
- Translate "market trends" into specific founder decisions.
Does not
- Hand you a finished investor-pitch market slide.
- Cover audience psychographics — that is Lesson 02.
- Price a category (Lesson 04) or pick channels (Lesson 06).
- Replace primary research with paid analyst reports.
Three rings of the same market.
TAM, SAM and SOM are not three different markets — they are three views of the same market at different distances. Confusing them is the most common pitch-deck mistake.
TAM is total addressable market: every dollar that could be spent on this category by everyone, everywhere, if there were no friction. It tells you whether the problem is large enough to matter. SAM is the slice you can serve with your current product, geography and channel — same problem, narrower lens. SOM is what you can realistically capture in three years given your team, money and execution speed.
The mistake is using TAM as a proof of seriousness ("$100B market!") and SOM as if it were guaranteed. The correct reading is inverse: TAM proves the category isn't a hobby; SOM proves you've done the maths. A founder who can't compute SOM bottom-up is asking you to trust their vibes.
in your startup
- TAM: the AI in Language Translation umbrella, $2.94B (2025) → $3.68B (2026), 25.2% CAGR. Adjacent Language Translation Software: $116.55B by 2035.
- SAM: Speech-to-Speech Translation, $481.61M (2025) → $1.19B by 2030 (9.5–10.1% CAGR). Adjacent AI Dubbing Tools: $1.15B → $2.56B (17.7% CAGR).
- SOM Y3: Bottom-up: 32.6M US remote workers × 15% multilingual × 5% reachable × $8/mo × 12 = ~$23M ARR, US alone. Add streamer + student segments → ceiling ~$50M ARR, conservative target $5M ARR Y2.
- implication: You sit in two double-digit-CAGR rings simultaneously. The story is "consumer voice-OS layer", anchored in the AI-dubbing ring for the pitch and the speech-to-speech ring for technical depth.
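The bottom-up arithmetic and the growth rates above can be re-derived in a few lines. The following is a minimal sketch, not a forecasting model: the function names are illustrative, the inputs are the figures quoted in this lesson, and the 2025 → 2030 span is assumed to be five compounding years. Under that assumption the speech-to-speech endpoints imply a faster rate than the quoted 9.5–10.1% range, which may only mean the source report uses a different base year; it is the kind of gap worth resolving before the number goes on a slide.

```python
def bottom_up_som(population, filters, price_per_month):
    """Annual recurring revenue from a population narrowed by successive fractions."""
    reachable = population
    for fraction in filters:
        reachable *= fraction
    return reachable * price_per_month * 12


def implied_cagr(start, end, years):
    """Compound annual growth rate implied by two endpoints `years` apart."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the lesson: 32.6M US remote workers,
# 15% multilingual, 5% reachable, $8 per month.
us_som = bottom_up_som(32_600_000, [0.15, 0.05], 8.0)
print(f"US bottom-up SOM: ${us_som / 1e6:.1f}M ARR")  # about $23.5M

# TAM ring: $2.94B (2025) -> $3.68B (2026), one year apart.
tam_cagr = implied_cagr(2.94, 3.68, 1)
print(f"TAM implied CAGR: {tam_cagr:.1%}")  # about 25%

# SAM ring: $481.61M (2025) -> $1.19B (2030); five years is an assumption.
s2s_cagr = implied_cagr(481.61, 1190.0, 5)
print(f"Speech-to-speech implied CAGR: {s2s_cagr:.1%}")
```

Changing one filter at a time (for example, 5% reachable to 2%) shows how sensitive the SOM is to each assumption, which is usually the first question an investor asks.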
Competitors are your audit, not your enemy.
A market with no competitors is usually a market with no customers. Competitors are third-party evidence that the willingness-to-pay you assume actually exists.
Read competitors in three layers. Direct: same job, same channel — they reveal your real price ceiling and feature floor. Adjacent: same job, different shape — they teach you which assumptions to challenge. Substitutes: not products at all — the hack people use today (a side tab with Google Translate, a teammate translating inline). Substitutes are the truest measure of pain.
The worst category is one with many adjacents and zero direct — it usually means the problem is too narrow to fund a dedicated tool. The best category is one with several well-funded direct competitors all attacking from one side and ignoring another. The ignored side is your wedge.
in your startup
- direct: DeepL Voice ($8.74/mo, $2B valuation): best MT, but app-locked to Teams/Zoom/mobile. Microsoft Teams Interpreter: Teams-only, enterprise-bundled. Krisp ($37.7M revenue, S2S in 2025): system-audio, but pivoting to B2B call centres.
- adjacent: HeyGen (~$100M ARR) and ElevenLabs ($3.3B val): async dubbing for video. Otter.ai ($100M ARR, 35M users): captions, English-centric. Wispr Flow ($700M val, Nov 2025): proves OS-overlay is fundable.
- substitute: The real competitor is a Chrome tab with Google Translate + manual copy-paste, plus "asking a teammate to repeat".
- read: No one occupies system-wide × voice-dub × consumer. DeepL/Teams are app-locked; HeyGen/Rask are async; Krisp is B2B. Wispr proves OS-overlay distribution. Combined, that is your fundability story.
Find the empty quadrant.
A 2×2 is not a marketing artefact. It is the cheapest tool for seeing which orthogonal axis everyone else is ignoring.
Pick two axes that are independent — not correlated with each other and not a restatement of price-vs-quality. Plot every direct and adjacent competitor. The valuable output is not where you sit; it is the quadrant that stays empty. An empty quadrant is either a wedge (no one has solved it yet) or a graveyard (no one wants it). The interview script in Lesson 02 tells you which.
Two warnings. First, do not invent axes to put yourself top-right; investors see this in seconds. Second, an empty quadrant is a hypothesis, not a moat — Lesson 06 explains how to defend it before Big Tech notices.
in your startup
- axes: X: per-app scope ↔ system-wide. Y: subtitles ↔ voice dub. Both axes describe real architectural choices, not vibes.
- filled: Lower-left: Otter, Chrome Live Caption (per-app captions). Lower-right: Krisp, Wispr (system-wide but no translate). Upper-left: HeyGen, DeepL Voice, Teams Interpreter (voice dub but app-locked).
- empty: Upper-right: system-wide voice dubbing for consumers. Babelio's lane. No incumbent has it as of 2026-05.
- window: Apple macOS 16 system-wide translate is rumoured for WWDC. Your wedge has a ~12-month moat window; ship inside it.
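The map above can also be written down as data, which makes the empty quadrant something you compute rather than eyeball. This is a minimal sketch: the coordinates are illustrative placements read off the lesson's map (only their sign matters), and the helper names are hypothetical.

```python
from collections import defaultdict

# x < 0: per-app scope, x > 0: system-wide.
# y < 0: subtitles,     y > 0: voice dub.
# Placements follow the lesson's map; the magnitudes are arbitrary.
COMPETITORS = {
    "Otter": (-1, -1),
    "Chrome Live Caption": (-1, -1),
    "Krisp": (1, -1),
    "Wispr": (1, -1),
    "HeyGen": (-1, 1),
    "DeepL Voice": (-1, 1),
    "Teams Interpreter": (-1, 1),
}

ALL_QUADRANTS = ["upper-left", "upper-right", "lower-left", "lower-right"]


def quadrant(x, y):
    """Name the quadrant a point falls in, using the sign of each axis."""
    vertical = "upper" if y > 0 else "lower"
    horizontal = "right" if x > 0 else "left"
    return f"{vertical}-{horizontal}"


def map_quadrants(points):
    """Return (empty quadrants, {quadrant: [names]}) for the plotted players."""
    filled = defaultdict(list)
    for name, (x, y) in points.items():
        filled[quadrant(x, y)].append(name)
    empty = [q for q in ALL_QUADRANTS if q not in filled]
    return empty, dict(filled)


empty, filled = map_quadrants(COMPETITORS)
print(empty)  # ['upper-right']
```

Re-running this with a different pair of axes is a quick way to test whether the gap survives a change of framing or only exists because of how the axes were chosen.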
A trend is a decision in disguise.
Most "trends" slides are wallpaper. The useful trick is to convert each trend into a specific decision you make differently because of it.
For every trend you write down, finish this sentence: "because of this, we will do X and not Y." If you cannot finish it, the trend is decoration. Three buckets matter: technology curves (what becomes possible), consumer behaviour (what becomes habitual), and regulation (what becomes mandatory). Each demands different proof — benchmarks for tech, surveys for behaviour, statutes for regulation.
Founders fail at this in opposite directions. They either invoke trends abstractly ("AI is changing everything") or worship one trend so hard they ignore the others ("on-device latency dropped, therefore we're a winner"). The market sits at the intersection of all three. So does your decision.
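One way to enforce the sentence test is to store each trend with its decision attached and flag any entry where the sentence cannot be completed. The sketch below is illustrative only: the `Trend` class, its field names, and the two sample entries are assumptions made for the example, not a prescribed template.

```python
from dataclasses import dataclass

BUCKETS = {"technology", "behaviour", "regulation"}


@dataclass(frozen=True)
class Trend:
    bucket: str            # one of BUCKETS
    observation: str       # the trend as stated
    will_do: str = ""      # X in "we will do X and not Y"
    will_not_do: str = ""  # Y in "we will do X and not Y"

    def __post_init__(self):
        if self.bucket not in BUCKETS:
            raise ValueError(f"unknown bucket: {self.bucket!r}")

    def is_decision(self) -> bool:
        """True only when both halves of the sentence are filled in."""
        return bool(self.will_do.strip()) and bool(self.will_not_do.strip())

    def sentence(self) -> str:
        return (f"{self.observation}; because of this, "
                f"we will {self.will_do} and not {self.will_not_do}.")


# Sample entries, for illustration only.
trends = [
    Trend("technology", "Speech pipeline latency fell below one second",
          will_do="ship now", will_not_do="wait for the window to normalise"),
    Trend("technology", "AI is changing everything"),
]

decoration = [t.observation for t in trends if not t.is_decision()]
print(decoration)  # ['AI is changing everything']
```

Anything that lands in `decoration` either gets a decision attached or comes off the slide.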
in your startup
- tech: Three latency curves crossed in 2025: STT <300ms, MT <200ms, TTS <200ms, for a total under 700ms end-to-end. Decision: ship now, before the window normalises.
- os: macOS 14+ CoreAudio Process Taps + Windows 11 WASAPI per-process loopback. Decision: no kernel extensions, no driver install; ship a 3–10MB Tauri binary, not 80MB Electron.
- behaviour: 52.9% of EU enterprises run remote meetings; Gen Z consumes ~30% foreign-language content. Decision: consumer wedge first, not enterprise sales cycle.
- reg: EU AI Act Article 50 lands Aug 2026. Synthetic voice must be labelled; fines for transparency breaches run up to 3% of global annual turnover (the 7% ceiling applies to prohibited practices). Decision: watermarking + UI disclosure from day 1, not retrofitted later.
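The latency claim in the tech bullet is a budget, and a budget can be checked stage by stage. The sketch below takes the per-stage ceilings from that bullet; the measured values are hypothetical placeholders to be replaced with your own benchmark numbers, and the function name is illustrative.

```python
# Per-stage ceilings from the lesson (milliseconds).
STAGE_BUDGET_MS = {"stt": 300, "mt": 200, "tts": 200}
END_TO_END_TARGET_MS = sum(STAGE_BUDGET_MS.values())  # 700


def check_pipeline(measured_ms):
    """Return (total latency, stages that exceed their own ceiling)."""
    total = sum(measured_ms.values())
    over = {stage: ms for stage, ms in measured_ms.items()
            if ms > STAGE_BUDGET_MS[stage]}
    return total, over


# Hypothetical measurements, not real benchmark results.
measured = {"stt": 280, "mt": 150, "tts": 240}

total, over = check_pipeline(measured)
within_target = total <= END_TO_END_TARGET_MS
print(total, within_target, over)  # 670 True {'tts': 240}
```

The sample shows why both checks matter: the pipeline can meet the end-to-end target while one stage is already over its ceiling and eating the slack of the others.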
Wedges are not moats yet.
A gap is what you find on the 2×2. A moat is what keeps the gap closed behind you. They are different objects with different lifecycles.
Wedges expire. The minute your gap is visible — through a Product Hunt launch, a TechCrunch piece, an investor announcement — the clock starts. Moats grow inside that window: distribution lead, integration depth, data flywheel, brand. A founder who confuses "no one has done this" with "no one can do this" is borrowing a moat they have not earned.
The honest read on most software moats is brutal — model quality is commodity, APIs are interchangeable, and design is copyable. What is not copyable in twelve months: cumulative OS-integration work, a base of paying habits, and a creator community that names you as the default.
Don't claim "AI moat" if your AI is GPT/Gemini/Claude APIs
Investors will read it as naïve. Foundation models are commodity in 2026 — your moat is the audio routing, OS integrations, per-app glossaries, and creator distribution you build on top.
Don't pick a wedge that's also Apple's roadmap
macOS 16 is rumoured to ship system-wide translation. If Apple ships it Apple-only at WWDC, you need a Windows-first counter-position and an obvious "works everywhere Apple doesn't" story already in market.
in your startup
- wedge: OS-wide audio capture + auto-mute + voice dub at a $5–$15/mo consumer price. Empty quadrant on the 2×2. None of DeepL, Teams, HeyGen, Otter or Krisp combines all three.
- moat: Three candidates from the research: (1) audio-routing IP (auto-duck + dub mix), (2) latency optimisation across STT→MT→TTS, (3) prosumer distribution before Big Tech ships a consumer equivalent.
- analog: Krisp proved system-wide audio enhancement scales to $37.7M revenue bootstrapped at $8/mo. Same architecture, 10× larger problem. Use this story directly.
Checklist for this week.
Six concrete actions. By Friday you should be able to recite five numbers about your market from memory and point at one empty quadrant on a 2×2 you drew yourself.
«A market is a rumour about money. Your job is to verify it.»