Voice-first: Is NLP bringing communications full circle?

It may be a long time before we see the full likeness of “HAL” from “2001: A Space Odyssey”, but technology that can improve the way a business operates is already here, writes Craig Walker, Director Cloud Services at Alcatel-Lucent Enterprise.

With the wave of personal assistants such as Siri, Cortana and Google Assistant, and new startups leveraging AI and analytics to build personal companions, it is becoming clear that we are moving toward a new voice-controlled relationship with technology. As we have already seen in the consumer market, it is all but a given that these voice-activated systems will eventually make it into the enterprise environment, given their tremendous potential to simplify and automate everyday activities.

Lights, camera, action!

Think how much easier it would be for a physician to simply say: “System: update Mary Smith’s chart with the following: ‘Patient experiencing abdominal pain, issue pharmacy order for 200mg of SuperAntiGas, signed Dr. FeelBetter.’” Or, in a conference room, instead of struggling to figure out which remote control turns on the projector and the screen, a simple voice request: “System: turn on projector, turn on TV and dim lights.”
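To make the conference-room request concrete, here is a minimal sketch of how a voice platform might turn such an utterance into device actions. The grammar, device names and preset brightness level are all illustrative assumptions, not a real product API:

```python
import re

# Hypothetical meeting-room controller; device names and grammar are illustrative.
def handle_command(utterance: str) -> dict:
    """Parse a simple 'System: ...' utterance into device actions."""
    actions = {}
    body = re.sub(r"^system:\s*", "", utterance.strip(), flags=re.IGNORECASE)
    # Split the request into clauses on commas and the word "and".
    for clause in re.split(r",\s*|\s+and\s+", body):
        m = re.match(r"turn (on|off) (projector|tv)", clause, re.IGNORECASE)
        if m:
            actions[m.group(2).lower()] = (m.group(1).lower() == "on")
        elif re.match(r"dim (the )?lights", clause, re.IGNORECASE):
            actions["lights"] = 30  # dim to an assumed preset brightness
    return actions

print(handle_command("System: turn on projector, turn on TV and dim lights"))
# → {'projector': True, 'tv': True, 'lights': 30}
```

Real systems replace the regular expressions with trained intent and entity models, but the shape of the problem – utterance in, structured actions out – is the same.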

The challenges

So, where are we on the road to voice-first? The voice analytics firm VoiceLabs has provided a view of the various layers needed to support a voice-first approach in the consumer world. However, to move from simple consumer use cases to a true voice-first environment in the enterprise, a few things will need to happen.

Security will be critical if enterprise systems are to rely on voice commands – should anyone be able to command critical equipment or systems just by speaking? The answer, clearly, is no. Privacy is also a top concern, and while the physician example above seems simple enough, we need to think about it in the context of regulations. Are a patient’s rights – under HIPAA regulations in the US – violated if these verbal commands expose the patient’s medical information to third parties?

Secure access

We are already seeing the next step of voice recognition systems where the technology is able to support secure access.

Banks are among those introducing voice authentication to their telephone banking systems. While this may leave some customers a little concerned about the security of their accounts, my feeling is that it will follow the adoption cycle we saw in e-commerce, where initial concerns about credit card fraud had to be overcome before we saw the meteoric rise in online purchasing.

We will see continued innovation in voice recognition systems, with improvements that will make voice security viable in an enterprise environment and ensure that only authorised users with the right privileges can perform the associated actions.

And while your microwave might not be spying on you, some devices will be always on, always listening and potentially recording. A few well-publicised cases of privacy invasion, commercial espionage or legal jeopardy could stall adoption. This suggests that a prominent on/off switch or function needs to be included in voice-first products, so that users can get the benefits without risking the downsides of constant monitoring. Secure software access would also need to be built into these products to prevent and detect hacking attempts.

Building even more effective voice recognition systems

The first use cases are primarily voice response systems – whether in call centres or implemented in our cars and smartphones. But as many of us know from first-hand experience, these work marginally at best. Recognition and contextualisation need to be refined through further technological development before we can realistically think about enterprise-wide adoption.

Research programmes such as Carnegie Mellon University’s Sphinx project continue to enhance language recognition capabilities. An Internet Trends report by Mary Meeker indicated that in 2016 Google’s voice recognition system could recognise over five million words with around 90 per cent accuracy – but that is still not extensive or accurate enough. Is 90 per cent accuracy good enough to interact with a life support system in a hospital, or with a utility provider’s network?

It is not just about recognising the words, either; it is about what to do with them. Here is where cognitive engines and AI come into play. Offerings from some of the biggest players in the industry – for example Microsoft, with its open source cognitive recognition engine – can be leveraged to understand the context of the words. “How do I get to Green Park?” may sound simple enough, but it needs to be put into context. Location awareness could indicate that you likely mean Green Park in London, and assumptions must also be made about your mode of transport. If you were sitting at Piccadilly Circus, the answer could be “Take one stop, westbound, on the Piccadilly line” – but here we assumed it was Green Park in London, and not Green Park in Manchester or Birmingham.
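The location-awareness step above can be sketched very simply: given several places sharing a name, pick the one nearest the speaker. The candidate list and coordinates below are rough, illustrative values, and real assistants would weigh many more signals (search history, transport mode, popularity):

```python
from math import hypot

# Illustrative candidates; coordinates are rough (latitude, longitude).
CANDIDATES = {
    "Green Park, London": (51.5067, -0.1428),
    "Green Park, Birmingham": (52.4500, -1.9300),
}

def disambiguate(place: str, speaker_pos: tuple) -> str:
    """Pick the named place closest to the speaker (crude planar distance)."""
    matches = {name: pos for name, pos in CANDIDATES.items()
               if name.startswith(place)}
    return min(matches, key=lambda name: hypot(matches[name][0] - speaker_pos[0],
                                               matches[name][1] - speaker_pos[1]))

piccadilly_circus = (51.5101, -0.1340)
print(disambiguate("Green Park", piccadilly_circus))  # → Green Park, London
```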

The search for a deeper meaning

The real challenge lies in what sits behind the voice recognition system – both integrating IoT devices with the system itself and ensuring the commands requested make sense. Here, we need to further leverage those cognitive engines as a check and validation layer. Think of someone accidentally giving the command “Turn off cooling system to reactor 4” when they meant reactor 3 – which has already been shut down – or a doctor using the system to prescribe a harmful dose of medication because he accidentally said 400 grams instead of 400 milligrams. These may be extreme examples, but a holistic view of the actions being automated will be needed to prevent human error, along with broader intelligence to understand the actions related to voice-controlled requests. For example, perhaps “Turn off cooling system to reactor 4” was correct – but the system would then need to understand the set of operational procedures required to implement that action.
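A check-and-validation layer of the kind described could start as simple guard rules in front of the command executor. The dose limits, drug name and reactor states below are invented for illustration; a real deployment would draw them from clinical and operational systems of record:

```python
# Hypothetical safety-check layer; all limits and states are illustrative.
DOSE_LIMITS_MG = {"SuperAntiGas": (50, 400)}          # allowed range in mg
REACTOR_STATE = {3: "shut_down", 4: "running"}

def dose_is_safe(drug: str, dose_mg: float) -> bool:
    """Reject prescriptions outside the drug's known safe range."""
    low, high = DOSE_LIMITS_MG[drug]
    return low <= dose_mg <= high

def shutdown_is_safe(reactor: int) -> bool:
    """Refuse to cut cooling on a reactor that is still running."""
    return REACTOR_STATE.get(reactor) == "shut_down"

print(dose_is_safe("SuperAntiGas", 200))       # the intended 200mg order → True
print(dose_is_safe("SuperAntiGas", 400_000))   # 400 g misheard as mg → False
print(shutdown_is_safe(4))                     # reactor 4 still running → False
```

The point is not the rules themselves but where they sit: between recognition and execution, so a misheard word is caught before it becomes an action.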

Creating an API platform for true voice integrated solutions

An interesting element that could tie in strategically with the development of true voice-controlled enterprise environments comes from the innovations occurring in the traditional voice communications world. We are seeing the explosion of CPaaS (Communications Platform as a Service) in the enterprise, which leverages APIs to transform today’s applications into voice-integrated solutions. Some of the major voice communication vendors are now entering this market, providing CPaaS infrastructures with a standardised set of APIs that enable companies to integrate communications into their business processes.

We traditionally think of integration as incorporating voice and video services into existing applications – think of a banking application that lets you move from an online session to a voice call with your banking advisor. But I believe CPaaS will play a big part in the voice-first environment too, by leveraging its rich API infrastructure to communicate with applications and things.

Beyond the communications infrastructure requirements, how CPaaS or other platforms communicate with devices really needs to be standardised before we will see rapid development of voice technology. Each of today’s consumer voice-controlled systems has its own interfaces and its own API integrations and, as with the historic “Beta vs. VHS” battle of decades ago, this may lead to product obsolescence. Just as a consumer doesn’t want to invest in the latest smart coffee maker only to find that the platform controlling it has just been discontinued, an enterprise wants to ensure that the investments it makes in new technologies won’t be obsolete before it can realise a return.
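One common way to hedge against that kind of obsolescence is to put a vendor-neutral interface between the voice platform and the devices, so a discontinued product can be swapped out behind the same contract. The sketch below uses the classic adapter pattern; the vendor names and actions are entirely made up:

```python
from abc import ABC, abstractmethod

class VoiceDevice(ABC):
    """A vendor-neutral contract that a voice platform could target."""
    @abstractmethod
    def execute(self, action: str) -> str: ...

class AcmeProjector(VoiceDevice):
    # Adapter wrapping an imaginary vendor SDK behind the common interface.
    def execute(self, action: str) -> str:
        return f"acme-projector: {action}"

class OtherBrandTV(VoiceDevice):
    def execute(self, action: str) -> str:
        return f"other-tv: {action}"

def dispatch(devices: dict, name: str, action: str) -> str:
    """Route a recognised command to whichever vendor device is registered."""
    return devices[name].execute(action)

room = {"projector": AcmeProjector(), "tv": OtherBrandTV()}
print(dispatch(room, "projector", "on"))  # → acme-projector: on
```

Standardisation efforts aim to define exactly this kind of shared contract at the industry level, rather than leaving each integrator to invent one.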

The best is yet to come

The good news is that a set of technologies is in the works to help minimise potential obsolescence. Frameworks such as IoTivity are being developed to build a standardised platform. We are already seeing the value, benefits and rapid expansion of new voice applications for consumers. In the near term, we will see some of the basic use cases move into the enterprise. Longer term, as advances continue in voice recognition, voice security and the simplification and standardisation of device connectivity, we will see more and more voice-first activities in both the consumer and enterprise worlds, helping to reduce complexity and improve our productivity.