Many of you are old enough to recall the personal computer OS “war” of the late ’80s and early ’90s. I can clearly remember being pressured by my boss to make a choice between IBM OS/2 and Microsoft Windows.
I chose the wrong one. Then came the browser war, followed by the smartphone war. And more recently the streaming music and video wars.
The next technology war on the horizon is shaping up to be consumer artificial intelligence (AI) platforms like Amazon Echo, Google Home, Microsoft Cortana and Apple Siri. Consumers don’t want to buy more than one, but they just might have to in order to reap all the benefits. If you are Google-centric for mail, calendar, documents and search, then Google Home is probably a great choice. But you may also be an Amazon Prime member, so the Echo is a great fit. And if you love your iPhone, then you already have Siri — though unless your Apple TV is on 24x7, you have to find your iOS device before you can shout “Hey Siri!” You could easily end up with all three AI platforms. And don’t forget that Microsoft Cortana will likely be there at work on your Windows 10 device.
But do you really want to have to remember which alert phrase (“Alexa…”, “Hey Siri…”, “OK Google…”) to use based on the result you want? One phrase to turn on the lights, another to check your calendar, a different one to contact your healthcare provider, another to play some classic rock? Or should consumers just wait until the AI vendors all develop similar capabilities? The browsers are just about all the same now. Smartphones are nearly equal in form and function.
Since that convergence has not yet happened with AI, independent developers need to understand the significant limits of this new technology.
If you are an independent developer who wants to leverage these cool voice interaction features, you must either choose a horse to ride or port your solution to three or four different platforms. This is grueling. Developers already have to stay compatible with three or four browsers and at least two smartphone/tablet OSes. Before we get too far, this burgeoning AI field needs to define a standard API: a layer that sits on top of (or perhaps under) the voice response layer of these disparate AI platforms, so that developers can build a single back end. As it stands, developers need to write Skills for Alexa, Actions for Google Assistant, and link to Siri through the SiriKit API (iOS 10 only, and limited to just six app domains). Microsoft’s soon-to-be-released Cortana Skills Kit purports to leverage already-developed Alexa Skills, but time will tell whether that is a help or a capability-limiting approach.
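The abstraction layer proposed above might look roughly like the sketch below. Everything here is hypothetical — the `VoiceRequest`, `AssistantAdapter` and `AlexaAdapter` names and the simplified request payload are illustrative assumptions, not any vendor's actual SDK — but it shows the idea: one adapter per platform normalizes that platform's native format, so a single shared back end can serve them all.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class VoiceRequest:
    """Normalized request, independent of which assistant produced it."""
    intent: str                     # e.g. "turn_on_lights"
    slots: dict = field(default_factory=dict)  # parsed parameters
    user_id: str = ""

@dataclass
class VoiceResponse:
    speech: str                     # text for the assistant to speak back

class AssistantAdapter(ABC):
    """One adapter per platform translates to/from its native payload."""
    @abstractmethod
    def parse(self, payload: dict) -> VoiceRequest: ...
    @abstractmethod
    def render(self, response: VoiceResponse) -> dict: ...

class AlexaAdapter(AssistantAdapter):
    """Illustrative adapter for a simplified, Alexa-like JSON payload."""
    def parse(self, payload: dict) -> VoiceRequest:
        intent = payload["request"]["intent"]
        slots = {k: v.get("value") for k, v in intent.get("slots", {}).items()}
        return VoiceRequest(intent=intent["name"], slots=slots,
                            user_id=payload["session"]["user"]["userId"])

    def render(self, response: VoiceResponse) -> dict:
        return {"version": "1.0",
                "response": {"outputSpeech": {"type": "PlainText",
                                              "text": response.speech}}}

def handle(request: VoiceRequest) -> VoiceResponse:
    """The single back end, shared by every platform adapter."""
    if request.intent == "turn_on_lights":
        room = request.slots.get("room", "living room")
        return VoiceResponse(speech=f"Turning on the {room} lights.")
    return VoiceResponse(speech="Sorry, I can't do that yet.")
```

Adding support for Google Assistant or Cortana would then mean writing one more adapter, not another copy of the business logic.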
And while I am making demands: another challenge is that all of these AI solutions are designed for a home or personal environment and are generally linked to a single user account. In a corporate environment we need the ability to deploy devices connected to a generic account or a secure domain over a highly secure Wi-Fi network, to let users quickly authenticate to their locally authorized account when requesting information or actions, and then, after a period of inactivity, to automatically disconnect the previous user and await the next one. This would enable corporate use cases that today are too difficult to deploy and manage.
The potential of this technology is staggering. But to truly make it ubiquitous and increase adoption, the challenges for independent developers and corporate users need to be addressed. Just as passengers on the Starship Enterprise did, we all will want to ask our virtual assistant for answers or to take actions.
We should not need multiple assistants or worry about whether our information is secure.
Brian Wells is associate vice president of health technology and academic computing at Penn Medicine.