Demo · test idea — Babelio is an exploratory concept, not a real product.
Babelio Playbook Lesson 05 / 08
2026-05-16
Lesson 05 · Tech

Tech: boring stack, ruthless latency.

For a real-time desktop AI app, the hard problem isn't the model — it's the audio pipeline and the latency budget that decides whether the product feels like magic or like a bad phone line. Pick boring tech for the shell, the backend, the billing, and the auth. Spend all your complexity where it actually matters: the path from a microphone sample to a translated voice in someone's ear.

Duration: 18 min read
Format: read + checklist
Goal: Stack / Latency / Scale path
Outcome: A latency budget and an MVP stack you can defend

What this lesson does / does not do.

Does
  • Explain the boring-tech bias and where it does and does not apply.
  • Draw the end-to-end pipeline from microphone to virtual output.
  • Give you a frame-by-frame latency budget with hard caps per component.
  • Sketch the 100 → 10K → 100K user scale path, with the inflection points marked.
Does not
  • Write a single line of Rust for you.
  • Pick your IDE, your test runner, or your branch strategy.
  • Replace the detailed spec in research/tech.md.
  • Cover GTM (Lesson 06) or fundraising (Lesson 07).
01.
Concept 01 · Boring tech > shiny tech

Spend weirdness where it pays.

4 min read

Every startup has a complexity budget. Choosing exotic tech in places that don't matter burns the budget before you reach the part that actually differentiates you.

Boring tech means choosing the option that has been in production for a decade, that hires can pick up in a week, and that does not surprise you on a Tuesday. Postgres instead of a new database. A managed PaaS instead of self-managed Kubernetes. A signed installer instead of a custom auto-updater. The boring choice is rarely the best on any single axis — it is the best on the axis that matters most for a small team: predictability.

The corollary: weirdness is a finite resource, and you should spend it on the part of the product that is the product. For an AI-first desktop app, weirdness belongs in the audio path, the model router, and the per-OS capture code — not in the billing system, not in the auth stack, not in the queue between two HTTP services.

02.
Concept 02 · The latency budget

Latency is a frame-by-frame contract.

5 min read

A latency target is not "fast". It is a budget assigned to each component, with hard caps and known headroom, written down before any code is shipped.

A real-time system has one rule: every component owes the pipeline a fixed number of milliseconds, and when it overspends, the whole product breaks. The budget is not allocated by goodwill — it is computed top-down from the perceived-latency threshold (~700ms for voice to still feel live) and divided across the stages. Each stage has a p50 target and a p95 hard cap. If a component cannot meet its cap, you do not relax the budget. You change the component or split the work.
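
A one-page budget of this kind can be sketched in a few lines of Python. The stage names and millisecond figures below are illustrative assumptions, not the numbers from research/tech.md; only the ~700 ms ceiling comes from the text above. The check treats the p95 caps as additive, which is a simplification that keeps the arithmetic auditable.

```python
from dataclasses import dataclass

PERCEIVED_LATENCY_MS = 700  # ceiling from the lesson; everything else is assumed


@dataclass(frozen=True)
class StageBudget:
    name: str
    p50_ms: int  # typical-case target
    p95_ms: int  # hard cap


# Hypothetical stages and numbers, for illustration only.
BUDGET = [
    StageBudget("capture", 10, 20),
    StageBudget("voice_activity_detection", 20, 40),
    StageBudget("speech_to_text", 150, 220),
    StageBudget("translation", 80, 130),
    StageBudget("text_to_speech", 120, 180),
    StageBudget("virtual_output", 10, 20),
]


def headroom_ms(budget, ceiling_ms=PERCEIVED_LATENCY_MS):
    """Ceiling minus the sum of p95 caps; negative means over budget."""
    return ceiling_ms - sum(stage.p95_ms for stage in budget)


def validate(budget, ceiling_ms=PERCEIVED_LATENCY_MS):
    """Raise if a stage is inconsistent or the caps exceed the ceiling."""
    for stage in budget:
        if stage.p50_ms > stage.p95_ms:
            raise ValueError(f"{stage.name}: p50 target exceeds p95 cap")
    if headroom_ms(budget, ceiling_ms) < 0:
        raise ValueError("sum of p95 caps exceeds the perceived-latency ceiling")


validate(BUDGET)  # passes: the assumed caps leave 90 ms of headroom
```

With these assumed figures, raising any single cap by more than 90 ms makes `validate` fail, which is the point: the stage has to change, not the ceiling.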

The most common failure is treating latency as a property of the system rather than a sum of properties of the parts. The cure is mechanical: write the budget on one page, instrument every stage with structured timing, alert on p95 drift, and refuse any feature that does not respect the contract.
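
As a rough illustration of "instrument every stage and alert on p95 drift", the sketch below times a block of code, emits one structured log line per measurement, and reports which stages exceed their cap. The names `timed`, `p95`, and `stages_over_cap` are hypothetical helpers, and the in-memory sample store is an assumption that stands in for whatever metrics system you actually use.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager

samples_ms = defaultdict(list)  # stage name -> observed durations in ms


@contextmanager
def timed(stage):
    """Record the wall-clock duration of the wrapped block and log it as JSON."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        samples_ms[stage].append(elapsed_ms)
        print(json.dumps({"stage": stage, "elapsed_ms": round(elapsed_ms, 3)}))


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def stages_over_cap(observed, caps_ms):
    """Return the stages whose observed p95 exceeds their hard cap."""
    return sorted(
        stage
        for stage, values in observed.items()
        if values and stage in caps_ms and p95(values) > caps_ms[stage]
    )
```

Usage would look like `with timed("speech_to_text"): ...` around each stage, followed by a periodic `stages_over_cap(samples_ms, caps)` check that feeds an alert.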

03.
Concept 03 · The audio capture problem

OS audio is three different problems wearing one name.

5 min read

Capturing audio sounds like one thing. In practice it is three orthogonal problems — per-OS API, per-process scoping, and virtual output routing — and each has its own version floor, entitlement, and trap.

Desktop audio on macOS and Windows is not symmetric. The APIs differ, the privacy models differ, the version floors differ, and the path for shipping a virtual output device differs by an order of magnitude in effort. Treating "audio capture" as one item on the spec is how teams lose two months. Treating it as three problems with three owners, three test matrices, and three fallbacks is how you ship.

The non-obvious rule: do not invent your own kernel layer in the MVP. Use the official per-process capture APIs that Apple and Microsoft already ship (Core Audio process taps on recent macOS, WASAPI process loopback on recent Windows builds); for the virtual output side, ride on existing signed drivers (BlackHole, VB-CABLE) until you have revenue, then ship your own. The order is API → routing → driver, not the other way around.

Don't ship your own kernel extension in MVP

macOS kexts are functionally dead; Apple wants user-space Audio Server Plug-ins instead. Windows kernel drivers need WHQL signing, an EV code-signing certificate ($300+), and weeks of approval. Use BlackHole / VB-CABLE until product-market fit.

Don't run audio through the webview

The Web Audio API in a Tauri webview adds an IPC hop and 20–40 ms of jitter. Keep audio entirely in Rust; the webview only renders the overlay and settings.

04.
Concept 04 · Scale path without YAGNI

Mark the inflection, don't build it.

4 min read

Premature scale architecture is the most expensive mistake an early-stage team makes. The cure is not to ignore scale — it is to mark the inflection points and refuse to build past the next one.

Every system has natural breakpoints — user counts, request rates, or cost curves where the cheap approach stops working and a different architecture becomes ROI-positive. A founder's job is to know where those points sit, write them down, and stay one step ahead — not three. Kubernetes at a hundred users is masochism; Postgres at a million users is debt. Both are failures of the same skill: matching architecture to the actual load curve.
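
Writing the inflection points down can be as literal as a constant in the repo. The sketch below uses the 100 → 10K → 100K thresholds from this lesson; the `lead_fraction` trigger (start planning at half the next threshold) is an assumption for illustration, not a rule from the playbook.

```python
import bisect

# User-count thresholds from the lesson's 100 -> 10K -> 100K scale path.
INFLECTIONS = [100, 10_000, 100_000]


def next_inflection(active_users):
    """Return the next threshold above the current user count, or None."""
    index = bisect.bisect_right(INFLECTIONS, active_users)
    return INFLECTIONS[index] if index < len(INFLECTIONS) else None


def should_plan_now(active_users, lead_fraction=0.5):
    """True once usage passes lead_fraction of the next inflection point."""
    upcoming = next_inflection(active_users)
    return upcoming is not None and active_users >= lead_fraction * upcoming
```

The function only ever looks one threshold ahead, which mirrors the "one step ahead, not three" discipline.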

The trick is leaving doors unlocked but unopened. Pick boring components that can be swapped without rewriting business logic. Keep stateless services stateless. Keep your data model normal. When the inflection comes, the change is contained, not a rewrite.
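
One way to keep a door "unlocked but unopened" is to have business logic depend on a narrow interface rather than on a concrete component. The sketch below is a generic illustration; `JobQueue`, `InMemoryQueue`, and `drain` are hypothetical names, not parts of the Babelio stack.

```python
from collections import deque
from typing import Optional, Protocol


class JobQueue(Protocol):
    """The only surface the business logic is allowed to see."""

    def push(self, job: str) -> None: ...
    def pop(self) -> Optional[str]: ...


class InMemoryQueue:
    """Boring default: holds jobs in process memory."""

    def __init__(self) -> None:
        self._jobs: deque = deque()

    def push(self, job: str) -> None:
        self._jobs.append(job)

    def pop(self) -> Optional[str]:
        return self._jobs.popleft() if self._jobs else None


def drain(queue: JobQueue) -> list:
    """Business logic: depends only on the JobQueue interface."""
    done = []
    while (job := queue.pop()) is not None:
        done.append(job.upper())  # stand-in for real work
    return done
```

When the inflection arrives, a queue backed by a managed service can implement the same two methods, and `drain` does not change.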

Checklist for this week.

Six concrete actions. By Friday you should have a one-page latency budget pinned above your monitor and a written commitment to not touch Kubernetes until 10K users.

lesson mantra

«Latency is the product.»

Next lesson · 06

Growth: the wedge channel before the rest.