#001 May 2026

Latency Is A Trust Problem

tags: voice · AI APIs · product decisions · Gemini · OpenAI
reading time: 3 min

I built Curious Garden — a voice-first AI learning companion for children — in four days for a Google hackathon. The constraint was clear: use the Gemini Live API. So I did.

It worked. I deployed it to Google Cloud Run, recorded a demo, and missed the submission deadline by two minutes because I didn't know I needed a YouTube link. That is a story for another note.

But here is what I actually learned in the weeks that followed — something no benchmark or pricing page will tell you.

During the build, Gemini felt slow.

When you are building a conversational voice experience, latency is not a performance metric. It is a trust metric.

A slow response in a text chat is mildly annoying. A slow response in a voice conversation feels broken. The user stops believing the system is listening. They repeat themselves. They speak louder. The natural rhythm of conversation collapses and what should feel like talking to a companion starts to feel like leaving a voicemail.

I felt this during the build. The responses arrived with a gap just long enough to make the conversation feel unnatural. So I tested the OpenAI Realtime API.

The difference was immediate. Responses came back fast enough to feel present. Interruptions were handled naturally. The conversation had rhythm.
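
If you want to quantify that feeling rather than trust your ears, the number that matters is time-to-first-audio: the gap between the end of the user's speech and the first audio chunk coming back. Here is a minimal sketch, assuming a hypothetical session interface you would implement on top of whichever realtime SDK you are testing:

```typescript
// Hypothetical minimal interface over any realtime voice session
// (Gemini Live, OpenAI Realtime, or whatever else you wrap).
interface RealtimeSession {
  sendAudio(utterance: Uint8Array): void;        // push the user's speech
  onFirstAudioChunk(callback: () => void): void; // fires when the reply starts
}

// Time-to-first-audio: the gap the user actually feels as "is it listening?"
function probeTimeToFirstAudio(
  session: RealtimeSession,
  utterance: Uint8Array,
): Promise<number> {
  return new Promise((resolve) => {
    const start = performance.now();
    session.onFirstAudioChunk(() => resolve(performance.now() - start));
    session.sendAudio(utterance);
  });
}
```

Run the same recorded utterance through both sessions a few dozen times and compare distributions, not single samples. As the rest of this note shows, one number can lie.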

The problem: the OpenAI Realtime API burns through tokens quickly and has no free tier. Gemini's free tier is genuinely generous. For a solo founder prototyping at speed, that gap matters. I made a note of both and kept building on Gemini.

Then something unexpected happened.

Weeks after the deadline passed, I opened Curious Garden in a park with a friend. Cellular connection, no special setup, completely casual.

El, Curious Garden's voice companion, responded almost instantly. The conversation flowed. My friend was surprised. I was shocked.

The app I had wrestled with during the build — the one that felt sluggish at 2am four days before a deadline — was now performing beautifully in the open air on a phone.

I had one immediate thought: the hackathon.

The Gemini Live Agent Challenge had thousands of participants building against the same deadline, all hammering the same API endpoints at once. What I had experienced during the build was almost certainly not Gemini's baseline performance. It was Gemini under load: shared infrastructure absorbing the traffic of an entire global hackathon converging on a single submission window.

I had benchmarked an API at its worst possible moment and drawn conclusions from it.

Conclusion

Latency in AI voice apps is not fixed. It is environmental, load-dependent, and will surprise you — in both directions.

The honest comparison between Gemini Live API and OpenAI Realtime API is still worth making — they do feel different under normal conditions, and OpenAI's voice quality and response speed remain strong. But "Gemini is slow" is not the right takeaway. "Gemini under hackathon load is slow" is closer to the truth, and even that deserves more testing before it becomes a conclusion.
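
That testing does not need heavy tooling. Here is a sketch of what I mean by "more testing", assuming you already measure time-to-first-audio per turn; all field names are my own invention, not any SDK's schema:

```typescript
// One sample per conversational turn, tagged with enough context to
// separate the model from the environment it was measured in.
interface LatencySample {
  api: "gemini-live" | "openai-realtime";
  timeToFirstAudioMs: number;
  recordedAt: string; // ISO timestamp: 2am crunch vs a quiet afternoon
  network: string;    // "wifi", "cellular", ... however your platform reports it
}

const samples: LatencySample[] = [];

function recordTurn(sample: Omit<LatencySample, "recordedAt">): void {
  samples.push({ ...sample, recordedAt: new Date().toISOString() });
}

// A median over many turns in many places says more than any single test.
function medianTimeToFirstAudioMs(): number {
  const sorted = samples
    .map((s) => s.timeToFirstAudioMs)
    .sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)] ?? 0;
}
```

With samples tagged like this, "Gemini is slow" stops being a feeling and becomes a question you can actually answer: slow where, when, and under what load?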

For Curious Garden, I kept Gemini. The free tier gave me space to build and iterate without burning money. And the park moment reminded me that real-world performance is the only benchmark that actually counts.

One deliberate UX decision I had made during the build also turned out to matter more than the API choice: visual support cards. Instead of expecting the voice response to carry everything, the interface surfaces spelling cards, number helpers, and definition cards while El speaks. That decision reduces the perceived latency regardless of which API is running underneath — because something useful is already happening on screen.

That was not a workaround. It turned out to be a better product decision regardless of API speed.
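
Here is a minimal sketch of that pattern. The card shapes mirror what the interface shows; the rendering function is a stand-in for the real UI layer:

```typescript
// The card types Curious Garden surfaces alongside El's voice.
type SupportCard =
  | { kind: "spelling"; word: string }
  | { kind: "number"; value: number; hint: string }
  | { kind: "definition"; term: string; text: string };

// Stand-in for the real UI layer: log instead of drawing.
function renderCard(card: SupportCard): void {
  console.log("card on screen:", card);
}

// Show something useful the moment the user's intent is known, before the
// first byte of audio arrives. Perceived latency drops even when measured
// latency does not.
async function handleTurn(
  intentCard: SupportCard | null,
  speakResponse: () => Promise<void>,
): Promise<void> {
  if (intentCard) renderCard(intentCard); // immediate and local, no API wait
  await speakResponse();                  // audio streams in whenever it arrives
}
```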

The takeaway

If you are building a voice-first app, do not benchmark your API during a hackathon. Do not benchmark it at 2am under deadline pressure either.

Take it to a park. Hand it to a friend. Watch their face.

That is the only benchmark that actually tells you something.

Michael · #001