You know that feeling when you’re on a customer service call and you just know the person on the other end is reading from a script? Well, that script might soon be replaced entirely, not by a better-trained human but by a very smart machine that can actually tell when you’re frustrated.
On March 26, Google and Cohere both dropped new audio tools, and together they cover two very different problems people deal with all the time.
Google’s side of the story
Google launched Gemini 3.1 Flash Live, a model built specifically for voice interactions. Think of it as the brain behind an AI that can pick up a customer service call, understand what you’re saying, figure out your mood, and actually respond in a helpful way.
What makes it stand out from older voice tools is that it doesn’t just hear words; it picks up on frustration and confusion too. So if you’re annoyed, it adjusts. It can also look at images you send, which means if your wifi router is acting up, you could snap a photo and the AI would help you troubleshoot it on the spot. Google says it scored over 90% on a tough industry benchmark, a nearly 20-point jump from its previous model.
The model is already powering Gemini Live and Google’s Search Live feature under the hood. If you’ve noticed Gemini getting better at holding longer conversations, this upgrade is the reason.
Cohere’s side of the story
Cohere went in a completely different direction. Their new model, Cohere Transcribe, does exactly one thing: it listens to speech and turns it into accurate, clean text. No chatting, no voice agents. Just transcription, done really well.
It beat out competitors including ElevenLabs Scribe and IBM’s model on a public leaderboard, landing an average word error rate of just 5.42%. In plain terms, out of every 100 words it hears, it only gets about five wrong. It handles 14 languages and can chew through over 500 minutes of audio in a single minute of real time.
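If you want to see where a number like 5.42% comes from, here is a minimal Python sketch of the usual word error rate calculation: count the word substitutions, insertions, and deletions needed to turn the transcript into the reference text, then divide by the number of reference words. The function names, the sample sentences, and the simple lowercase-and-split normalization are assumptions made for illustration; the leaderboard's own text normalization may differ, and this is not Cohere's evaluation code. The second helper just does the arithmetic behind the "500 minutes per minute" claim.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def seconds_to_transcribe(audio_minutes: float, speedup: float = 500.0) -> float:
    """Wall-clock seconds needed at `speedup` minutes of audio per minute."""
    return audio_minutes * 60 / speedup


if __name__ == "__main__":
    truth = "please restart the router and check the lights"
    heard = "please restart the writer and check lights"
    # One wrong word plus one dropped word out of eight reference words.
    print(word_error_rate(truth, heard))  # 0.25
    # A one-hour meeting at the quoted 500x rate.
    print(seconds_to_transcribe(60))      # 7.2
```

By this measure, a 5.42% score means roughly 5 word-level mistakes per 100 reference words, averaged across the leaderboard's test sets. And since the article says "over 500 minutes", the 7.2 seconds for a one-hour recording is an upper bound on the time at that rate.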
The biggest win for businesses? It’s fully open source and free. Teams that rely on AI-powered transcription for meetings, interviews, or call logs can run it on their own servers without paying per minute. Cohere plans to integrate it into its North platform, which helps companies search documents and automate repetitive tasks.
Why does any of this matter?
When Google and Cohere launch new audio AI models on the same day, it’s a clear signal that audio is the next big battleground in the AI race. Voice is how most of us naturally communicate, and companies are moving fast to make sure machines can keep up. Whether it’s a voice agent handling customer calls or a tool quietly turning your team meeting into a clean text document, the gap between human and machine listening is closing faster than most people realize.