Alexa, How Will Voice Technology Define 2022?

I was on a podcast recently and the sound engineer told me something impressive.

If I missed a word or forgot a detail, they could fix it in post-production with speech synthesis. Just to be clear, that doesn’t mean I would record a new line at home and they’d slot it in. Instead, they can use technology to model my voice and produce sounds, snippets, and words that fit seamlessly into my dialog. I’ve heard the finished result; the edits are imperceptible, and the effect is amazing.

This got me thinking about voice technology more generally.

I can remember a time when you’d spend hours training a dictation program on your voice only to have it mistranscribe every second word, get confused at different intonations, and turn simple writing tasks into day-long ordeals. I used early voice-to-text services that sounded like bad 1980s sci-fi movies. And I’ve seen brilliant engineers struggle to teach programs to understand even simple sentences.

Over the past few years, we’ve made huge leaps forward in speech recognition, natural language processing (NLP), and speech synthesis. In 2022, we’re going to see voice technology gain traction in previously unexplored niches and fields. We’ll enjoy unexpected innovations and exponential improvements to the services we’ve already got.

The advancement of technology is notoriously difficult to predict. Just think about how often you’ve read articles on the imminent arrival of self-driving cars (by my count, we should already have fully autonomous vehicles from Tesla, Toyota, General Motors, and Google). Despite the challenge, I wanted to put down some markers and describe three key advances I think are likely to arrive within the next few years.

Tapping the last offline data source

Great managers are always on the lookout for warning signs—falling productivity, increased frustration, that sort of thing. If they see an employee struggling, they’ll step in and offer help. When it works, it’s great.

Managers can catch employees before they burn out and rebalance their workload. But what about when things fall through the cracks? What about when managers are distracted by a dozen different things? In those cases, things get missed.

So how can voice technology help?

If you’re automatically recording, transcribing, and analyzing an employee’s communications, you can see their burnout coming a mile off. Now it’s a system highlighting potential burnout, not a manager. It’s proactive, rather than reactive.

And this is just one benefit that comes from tapping the last offline data source.

When you normalize recording and analyzing conversations, you unlock a ton of insights. Hearing discovery calls and demos reveals what’s really going on in the sales cycle. Analyzing support and service calls allows you to compute how happy customers are before they tell you. It’s called predictive customer satisfaction. In the same vein, you can calculate predictive churn rates, helping you get in front of unhappy customers before they decide to leave.
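To make the idea concrete, here’s a deliberately tiny sketch of what a churn-risk score over a call transcript could look like. A real system would run trained NLP models over full conversations; this toy version (with invented signal-word lists, purely for illustration) just shows the shape: count frustration cues against satisfaction cues and emit a score.

```python
# Toy illustration of predictive churn scoring from a call transcript.
# Both word lists are invented for this example; production systems
# use trained sentiment/intent models, not keyword matching.

NEGATIVE_SIGNALS = {"cancel", "frustrated", "competitor", "refund", "disappointed"}
POSITIVE_SIGNALS = {"great", "love", "renew", "helpful", "thanks"}

def churn_risk(transcript: str) -> float:
    """Return a 0..1 risk score: the share of signal words that are negative."""
    words = (w.strip(".,!?") for w in transcript.lower().split())
    neg = pos = 0
    for word in words:
        if word in NEGATIVE_SIGNALS:
            neg += 1
        elif word in POSITIVE_SIGNALS:
            pos += 1
    total = neg + pos
    return neg / total if total else 0.0

print(churn_risk("I'm frustrated and may cancel unless I get a refund."))  # high risk
print(churn_risk("Thanks, the support was great and really helpful."))     # low risk
```

A score trending upward across a customer’s calls is the kind of early-warning signal that lets you get in front of churn before the cancellation email arrives.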

That’s just the simple stuff.

When organizations get their hands on this data, they’re going to do some amazing things.

Passive voice tech becomes active

Think back to the last time you interacted with a computer via voice. You used a command phrase, right?

● Hey Google, did the Sacramento Kings win?

● Alexa, play the Beatles.

● Siri, call my brother.

All existing digital assistants are passive. They sit on your countertop or on your phone and wait for you to tell them what to do. It’s only when you say the command phrase that they pay attention to the precise words you say next. And those words are usually very simple: What’s the weather? Where’s my Uber? Dim the lights.
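That gating behavior is simple enough to sketch. To be clear, this is not any vendor’s actual code; it’s a minimal illustration of how a wake phrase separates “ignore everything” from “treat the next few words as a command.”

```python
# Minimal illustration of wake-phrase gating in a passive assistant.
# Real assistants do this with on-device audio models, not string matching.

WAKE_PHRASES = ("hey google", "alexa", "siri")

def extract_command(utterance: str):
    """Return the command following a wake phrase, or None if no phrase was heard."""
    lowered = utterance.lower()
    for phrase in WAKE_PHRASES:
        if lowered.startswith(phrase):
            # Only the words after the wake phrase are treated as a command.
            return utterance[len(phrase):].strip(" ,")
    return None  # passive: no wake phrase, so nothing gets processed

print(extract_command("Alexa, play the Beatles"))  # "play the Beatles"
print(extract_command("play the Beatles"))         # None
```

Everything outside that narrow window is, by design, thrown away; that’s exactly the limitation the next section of transcript exposes.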

Now, compare that to a section of real transcribed speech:

Um, okay. I’ll talk on… I’ll talk on — probably on — like, similar themes. One is real-time transcriptions become prevalent across everything we do. Because most of the stuff I think about is probably on like three themes, right? And we’ve talked a little bit about… Like you’ve got podcasts that have transcriptions, live captions on any videos, that we do recording of any video we do that comes with.

As a human, you can probably understand what’s going on. But a computer? That’s a difficult challenge. When we speak, our thoughts aren’t perfect prose. We jump forward and loop back. We ask rhetorical questions and pose hypothetical examples. There are vocal tics and language quirks. It takes humans years of constant learning to understand speech. Although there’s a really long way to go, we’re starting to get there with machines.

When technology can understand not just simple instructions, but also general speech, it can begin to play a more active role in our work lives.

Say I offer to send a prospect our pricing information during a sales call. Maybe my personal assistant locates the document, drafts an email, and leaves it in my outbox for approval. I’ve not explicitly told it to do anything, but because it can follow conversations, it can step in and help out.
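Spotting that kind of commitment in free-flowing speech is the hard part. As a hedged sketch only (a real assistant would use a trained language model, and every name here is invented), a rule-based version of “catch an offer to send something” might look like this:

```python
# Toy sketch of action-item detection in conversational speech.
# A production assistant would use an NLP model; this regex version
# exists only to show the shape of the idea.

import re

PATTERN = re.compile(
    r"\b(?:send|share|forward)\b (?:you |over )?(?:our |the |my )?"
    r"(.+?)(?: during| after| by|[.?!]|$)"
)

def detect_send_commitment(sentence: str):
    """Return what the speaker offered to send, or None if no offer was found."""
    match = PATTERN.search(sentence.lower())
    return match.group(1).strip() if match else None

print(detect_send_commitment("I'll send over our pricing information after the call."))
```

Once the system knows *what* was promised, chaining on the follow-up steps (find the document, draft the email, queue it for approval) is comparatively straightforward.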

Technology is for everyone

Designing technology for a diverse audience is difficult. People have different expectations, behaviors, goals, preconceptions, and so on. Nowhere is this more true than in speech technology. People speak and sound wildly different depending on their backgrounds, cultures, contexts, and more. Where technology once overlooked differences, now it’s embracing them.

Take Google. In 2019, the search giant launched Project Euphonia, a broad effort to improve its speech recognition models for those with atypical speech. More recently, they announced Project Relate, a new app they claim will help those with speech impairments better communicate with others.

These are relatively new developments and progress is still slow, but the work is much needed. Technology is for everyone. We should do everything we can to widen its scope and make products and services available to all.


About the Author

Dan O’Connell is the Chief Strategy Officer and a board member at Dialpad. Previously, he was the CEO of TalkIQ, a real-time speech recognition and natural language processing start-up that Dialpad acquired in May of 2018. Prior to TalkIQ, he held various sales leadership positions at AdRoll and Google.

Featured image: ©Made360
