NDA Viewpoints: Voice – from crap wizard to personalised genie

Following the World Federation of Advertisers’ launch of its Voice Coalition, NDA is running a series of articles looking at the future of voice commerce, asking industry experts for their views on its opportunities.

By Charlie Cadbury, CEO and co-founder, Say It Now

This year’s Cog X (the Festival of AI and Emerging Technology) provided me with my current favourite quote. Grace Boswood Karthi, COO for the BBC’s Design and Engineering team, noted that we’re in ‘the crap wizard’ phase of voice – meaning that, while we know that our smart speakers can do a lot, we don’t know the ‘spells’, the right words to say to make ‘Alexa’ or ‘OK Google’ do the thing.

For many, the assistants are limited to delivering simple tasks: a ‘glorified switch’, if you like, that lets people turn lights on and off and select music to play without having to press any buttons. It’s a bit like the early days of computing, when there was only a blank command line and users needed to know what to type to produce any meaningful results.

However, voice differs in two key ways: rapid adoption (smart speakers are the fastest-selling consumer good ever) and people’s willingness to try it, meaning it’s being used by the masses. Search by voice, for example, is very much on the rise.

Looking into the not-too-distant future, the platform (Google Assistant, Amazon Alexa, Microsoft Cortana, Samsung’s Bixby) that creates the highly personalised go-to assistant will get to know all about each individual and become a trusted advisor, ultimately having the power to nudge each of us towards products and services of its choosing. This makes it a winner-takes-all race.

The platforms, acutely aware of their current limitations, are working at breakneck speed to augment core functionality and reduce friction to beat off the competition. 

There is a vast array of challenges to solve, not least the complexity of language, but the major platforms keep adding support that increases the functionality and robustness of responses; each new capability is released to the developer community and supported to maturity. Looking at the Alexa platform, for example, the following are currently core areas:

Conversations: This will allow Alexa to deal with much of the nuanced engagement required to anticipate how a conversation will flow between the initial user request and the final outcome delivered. As a result, it will be far easier for a company to create certain functionality when developing a skill.

Name-free skill interaction (NFSI): At the moment, using a skill requires asking for it by name: ‘Alexa, open Talisker Tasting’ (the skill Say It Now developed for Diageo), for example. NFSI will allow people to speak more naturally to their assistant, getting tasks completed without having to remember the name of a specific skill.

Cross-skill actions: For more complex tasks, like planning a night out including taxi rides, restaurants and possibly a club, it might be necessary to switch between skills; this functionality will enable that.

And the attention being given to voice means the platforms are actually getting pretty good. A recent study (source below) shows Google Assistant getting the answer to brand and product category questions (‘What is the longest lasting lipstick?’, ‘What is Brand X / Brand Y?’, for example) right nine times out of ten. It’s also interesting to note that Google Assistant performs 11.5% better on a smartphone, where it can build a trusted and personalised view of what the user is trying to achieve, than on a smart speaker (Google Home).

Source: Voicebot.ai

This may all feel a bit ‘space-agey’, but 20 years ago, who would have thought we’d entrust so much of our lives to a personal computer small enough to fit into our pocket?

We’re entering a new era where very, very slowly we will grow to trust our assistants a little more with each successful task completed, gradually delegating increasing amounts of ‘life admin’ to them. They will learn more and more in order to better anticipate our requests and serve our every desire. From gradual beginnings, growth will then be exponential.

Looking back on today’s basic assistants, we’ll realise we have graduated from being crap wizards to being fully enabled by genie-like assistants able to deliver on our every wish. Say It Now is pretty excited about playing a part in creating this magic.