On a drizzly night last Christmas, I met up with a group of old friends to drink and catch up on news.
We hit several pubs and then clambered back to a friend’s house, where we were introduced to the latest addition to his family…his Alexa.
That year our friend had become attached to Alexa.
He talked to her about his health. He spoke about his plans. He said he spent an hour each evening dictating to her.
We all welcomed Alexa to the group.
And she slipped very comfortably into our company – fielding song requests, correcting us on points of contention.
In retrospect, this was an impressive feat.
Irish conversation in the late hours can flow very quickly, with several people speaking at once and in several strange dialects.
But Alexa could register our requests…
She could parse through the noise and slurred speech…
And for the most part, she managed to deliver the appropriate response.
A threshold has been crossed
Voice recognition is one of the great breakthroughs of the last decade.
You only have to look at the recent Google Duplex video to see how far we’ve come…
It shows Google Assistant calling up a hairdresser’s to schedule an appointment.
Despite several unusual lines of questioning and nuances, a successful booking is made…
Google: Hi, I’m looking to book a woman’s haircut.
Receptionist: Sure, give me one second. For what time are you looking for around?
Google: At 12pm.
Receptionist: The closest we have to that is at 1.15.
Google: Ok do you have anything between 10am and 12pm?
Receptionist: Well depending on what service she would like…what service is she looking for?
Google: Just a woman’s haircut for now.
Receptionist: Ok we have a 10am.
Google: 10am is fine.
Receptionist: Ok, what’s the first name?
Google: The first name is Lisa.
Receptionist: Perfect, I will see Lisa at 10am on May 3rd.
It’s a quick exchange.
But the Google voice seems to be able to respond to nuances in the conversation without pausing…
And the receptionist never registers that she is speaking to a machine.
This exchange is the culmination of decades of research into natural language processing and artificial intelligence.
Audrey, a machine created by Bell Labs in the fifties, was one of the first machines to successfully recognise speech.
Audrey could recognise the numbers 0–9 when its inventor spoke to it.
IBM Tangora, released in the 1980s, required slow deliberate speech and no background noise, but could recognise up to 20,000 words after 20 minutes of conversation.
The real breakthrough has come in the last few years…
By training artificial intelligence on millions of voice messages and prompts, the tech giants have managed to recognise human speech in conversation with close to human-level accuracy.
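Accuracy claims like this are usually measured as word error rate (WER): the word-level edit distance between what the recogniser heard and a reference transcript, divided by the transcript length. A minimal sketch of the metric (illustrative only, not any vendor's actual scoring code):

```python
# A minimal word error rate (WER) sketch: the standard accuracy metric for
# speech recognition, computed as word-level edit distance between a
# reference transcript and the recogniser's hypothesis, divided by the
# number of reference words.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One misheard word out of five: 20% WER, i.e. 80% word accuracy
print(word_error_rate("turn on the porch light", "turn on the torch light"))  # 0.2
```

Human transcribers score roughly 5% WER on conversational speech, which is the bar the tech giants' systems are now approaching.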
A Star Trek computer in your home
How will this technology be used?
Well we have all received a call out of the blue from someone offering us payment protection…
Or asking us “if we’ve had an accident recently”.
Robocalls and chatbots have become one of the intensely annoying parts of modern life.
But the voice recognition technology by the likes of Amazon, Google and Apple is part of a much bigger plan: a step change in the way we communicate with machines.
Let’s take Amazon’s Alexa…
What we have here is an always-on artificial intelligence in our home, waiting for its next command.
There are no buttons, screens or keyboards.
No more smartphones.
No more crouching in front of a computer screen with a crooked back.
Just dialogue, back and forth.
Right now, Alexa is being positioned as a command console in your house – collecting data about your life from your home, storing and parsing that data in the cloud, and continuing to learn as it connects to more and more of your devices.
But Amazon, who make Alexa, won’t stop there.
Amazon researchers working on Alexa report that, again and again, CEO Jeff Bezos vetoes new features because they would stop Alexa from being simple and hassle-free.
Alexa is designed to be invisible, ever present and helpful.
Last year Alexa caused a stir when Ford demoed her as the voice and intelligence for its prototype SYNC in-car technology, which let the driver open the garage doors, turn on the porch light, check whether the car was locked and start the engine, all by verbal command.
In your house, in your car, Bezos will look to have a Star Trek-style “Computer”, a voice always on hand to help and guide you to your next purchase.
It’s a huge investment story…
By 2020, more than half of Google search queries will be spoken rather than typed, predicts Comscore.
The market for ads delivered in response to voice queries will be $12 billion, according to Juniper Research.
Aside from the market for voice search, Comscore sees a ‘wearables’ market of $16 billion within five years, with bean-sized earbuds that translate, filter out noise and let us control our devices by voice.
The bargain we will have to make
Is this an improvement?
Will we really stop using our smartphones?
Well, AI scientist Andrew Ng makes the entirely valid point that humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone.
“Speech has always been a much more natural way for humans to communicate with each other.”
We can speak a lot faster than we type.
And as we talk, the voice will learn more about us. It will become smarter.
It will glean the daily details of our lives – what we eat, what we watch, the conversations we have with our family – and gradually it’ll start to understand us.
This is the bargain the likes of Amazon, Google and Apple are asking us to make.
Give up the mundane details of your life.
Give up on privacy.
And we’ll feed you entertainment…conveniences…we’ll get to know you…we’ll listen.
It sounds like a disturbing deal.
But it’s one people are already making.
In China, a tabletop robot called Rokid, which has an alluring voice and impressive conversational skills, is proving enormously popular.
And Softbank‘s jokey Pepper robot is getting backing to go global from Alibaba and Foxconn.
When people in the UK were asked recently what they wished their smart speaker could do, the answers were very personal…
“Want help telling jokes”.
“Want help to be funnier and more attractive”.
These are basic sentiments.
In the last ten years, the tech giants have managed to create a rich and addictive mobile lifestyle full of daily digital dopamine fixes – taking pictures, joking with family on WhatsApp.
In the next ten years, they’ll get even more personal…
Are you Sinus or Atrial?
Just look at last week’s announcement of the latest Apple Watch…
The Apple Watch Series 4 will include a more advanced heart-monitoring technology called electrocardiogram. This feature has received clearance from the US Food and Drug Administration, meaning it can be used as a medical device.
Like most wearables, it monitors heart rate using green LED lights embedded in the device. The light reflects off the skin to detect the pulse and changes in blood volume, which are converted into a heart rate number.
On the Series 4, users put a finger on the digital crown, completing a circuit that lets the Watch track the heart’s electrical signals directly, which is far more accurate than inferring rhythm from the pulse.
The process is meant to take about 30 seconds and the user will receive a heart rhythm classification. Normal rhythms will be classified as “sinus” rhythm, and the Watch will also classify irregular rhythms, such as atrial fibrillation.
The information will be stored in the Apple Health app.
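To make the classification step concrete: atrial fibrillation shows up as unusually irregular gaps between successive heartbeats, while sinus rhythm is comparatively steady. The sketch below is purely illustrative – the function name and threshold are assumptions for this example, not Apple’s algorithm – but it captures the idea of classifying a rhythm from beat-to-beat (RR) intervals:

```python
# Illustrative only: a toy rhythm classifier over RR intervals (the gaps in
# milliseconds between successive heartbeats). Atrial fibrillation produces
# highly irregular intervals; sinus rhythm is comparatively steady.
# The 0.12 threshold is an assumption for the sketch, not a clinical value.

def classify_rhythm(rr_intervals_ms, irregularity_threshold=0.12):
    """Return 'sinus' or 'atrial fibrillation' from beat-to-beat intervals,
    using the coefficient of variation (std / mean) as a crude
    irregularity measure."""
    n = len(rr_intervals_ms)
    mean = sum(rr_intervals_ms) / n
    std = (sum((x - mean) ** 2 for x in rr_intervals_ms) / n) ** 0.5
    return "atrial fibrillation" if std / mean > irregularity_threshold else "sinus"

steady = [800, 810, 795, 805, 800, 798]     # ~75 bpm, regular beats
erratic = [620, 1040, 710, 980, 560, 1150]  # wildly varying gaps
print(classify_rhythm(steady))   # sinus
print(classify_rhythm(erratic))  # atrial fibrillation
```

The real feature runs over a 30-second ECG trace and was reviewed by the FDA; the point here is simply that the raw signal the Watch collects is rich enough to support this kind of medical inference.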
This is very intimate information.
And there are a host of small companies who are helping the tech titans to collect all this stuff.
For example, German start-up Bragi has brought a smart earbud to market that hooks up with IBM’s Watson.
And Chinese start-up Mobvoi, a spin-out from Google’s former China operation, is well advanced in developing voice-driven technology for Android wearables.
Both are worth watching.
The Shenzhen-listed iFlyTek looks really interesting.
It is developing an AI speech open platform that could be used by smart service robots in our homes.
The AI speech open platform is expected to get the biggest boost, as iFlyTek intends to invest a total of £230m in the project, with more private money to follow.
China’s search engine giant Baidu is another company in the vanguard of voice technology.
It has a neural-network-based Deep Speech 2 system with a near-perfect understanding of Mandarin and English, and even of the ‘mixed’ Mandarin and English which many Han Chinese now speak.
Next step: Scottish accents