The Debrief
Tech · 2 min read · 8 May 2026

OpenAI Just Launched Three Voice Intelligence APIs. The Conversational Interface Is No Longer Optional.

Three new models dropped: GPT-Realtime-2 for reasoning voice agents, GPT-Realtime-Translate for 70-plus language translation and GPT-Realtime-Whisper for streaming speech-to-text. Voice just became a production-grade marketing channel.

The gap between a voice chatbot and a voice agent is reasoning. OpenAI just closed that gap.


OpenAI has released three new voice intelligence APIs that move conversational AI from demo to production grade. GPT-Realtime-2 adds reasoning capability to voice agents, meaning they can think through complex queries mid-conversation rather than pattern-matching from a script. GPT-Realtime-Translate handles real-time translation across more than 70 languages. GPT-Realtime-Whisper provides streaming speech-to-text with accuracy that matches or exceeds existing transcription services.

The three models work together as a stack. A customer calls a support line: Whisper transcribes the speech in real time, Realtime-2 reasons through the query, and Translate handles multilingual conversations without a handoff. The entire interaction happens in natural voice with sub-second latency.

70+ languages supported by GPT-Realtime-Translate for real-time voice translation

For marketing and customer experience teams, the implications are immediate. Voice-based customer service that can actually solve problems, not just route calls, is now buildable at the API level. Multilingual campaign support that does not require hiring native speakers for every market is available today. And the transcription layer means every customer interaction produces a transcript that can be structured and fed back into the CRM.

The pricing model is consumption-based. Input audio tokens and output audio tokens are billed separately. For high-volume customer service applications, the unit economics need modelling. But for marketing use cases like conversational landing pages, voice-enabled product demos and real-time event translation, the costs are likely lower than the human alternative.
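To show what that modelling might look like, here is a minimal Python sketch of per-call and monthly cost with input and output audio tokens priced separately. Every figure in it is a hypothetical placeholder: the per-million-token rates, the assumed tokens-per-audio-minute conversion and the call volume are not OpenAI's published pricing, and the names `AudioPricing`, `CallProfile`, `cost_per_call` and `monthly_cost` are illustrative only. Substitute current rates and your own call data before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class AudioPricing:
    # HYPOTHETICAL placeholder rates in dollars per 1M audio tokens; not published pricing.
    input_per_million: float
    output_per_million: float


@dataclass
class CallProfile:
    caller_minutes: float           # customer speech sent to the model
    agent_minutes: float            # speech the model generates in reply
    tokens_per_audio_minute: float  # ASSUMED conversion rate, used for both input and output


def cost_per_call(pricing: AudioPricing, call: CallProfile) -> float:
    """Price input and output audio tokens separately, then add them."""
    input_tokens = call.caller_minutes * call.tokens_per_audio_minute
    output_tokens = call.agent_minutes * call.tokens_per_audio_minute
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


def monthly_cost(pricing: AudioPricing, call: CallProfile, calls_per_month: int) -> float:
    """Scale the per-call cost by volume to get a monthly budget figure."""
    return cost_per_call(pricing, call) * calls_per_month


if __name__ == "__main__":
    # Every number here is an illustrative placeholder.
    pricing = AudioPricing(input_per_million=10.0, output_per_million=20.0)
    call = CallProfile(caller_minutes=3.0, agent_minutes=2.0, tokens_per_audio_minute=1_000)
    print(f"Per call: ${cost_per_call(pricing, call):.2f}")
    print(f"Per month at 10,000 calls: ${monthly_cost(pricing, call, 10_000):,.2f}")
```

With these placeholder inputs the sketch reports $0.07 per call and $700.00 a month. Keeping the two rates apart matters when output audio is priced higher than input: a call where the agent does most of the talking then costs more than one of the same length where the customer does, so the talk-time split belongs in the model alongside call volume.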

Why it matters

Voice has been the missing channel in most digital marketing stacks. Text chat, email, social, paid media and content are all mature channels with established tools and workflows. Voice has been stuck in the IVR (interactive voice response) era: rigid menus, scripted responses, frustrated customers. These APIs change the capability baseline. A voice agent that can reason, translate and transcribe is a fundamentally different proposition from a phone tree.

The competitive pressure is also real. ElevenLabs, Google and Amazon are all building in the same space. The voice AI market is moving from novelty to infrastructure faster than most marketing teams have planned for.

What to do about it

Identify your highest-volume customer interaction channel. If it involves voice or could benefit from voice, map the workflow against these APIs. Start with transcription (Whisper) as the lowest-risk entry point. Every call your team handles today that is not being transcribed and analysed is data you are throwing away. The reasoning and translation layers come next, but the transcription layer alone changes how you learn from customer conversations.

Filip Ivanković · Founder, New Rebellion
The Debrief: one every morning. Strategy, benchmarks, and what's actually moving in Australian marketing. Four-minute read. Six months in, you'll see the patterns most don't. The reps compound.