Natural UIs Become Reality as Microsoft Unleashes TellMe Speech APIs to Developers
Back when Microsoft was still working on Windows Mobile, the company announced that it was researching new and exciting ways to evolve the smartphone user interface. We’ve seen early results of the Redmond, Washington, software company’s efforts in the Metro UI of Windows Phone 7, with its hubs and live tiles, but Microsoft wants to tackle natural user interfaces beyond the touchscreen. In the next generation of Windows Phone 7, speech recognition and voice will become key areas of the platform’s evolution.
Microsoft Voice Command
Microsoft was one of the early players in voice and speech technology on mobile handsets, releasing Microsoft Voice Command for Pocket PC and Windows Mobile. The optional software suite let users play music, launch programs, dial contacts, and hear text messages and caller ID announcements using voice and speech. Since then, other platforms have evolved to integrate voice, including Google’s voice input on Android, the rumored Nuance voice integration on iOS, and third-party apps like Vlingo.
The company subsequently acquired TellMe while it was working on Windows Phone 7, and these early demos of TellMe integration on future Windows Phone 7 devices may be the most powerful examples of speech and voice interfaces on a mobile handset to date.
TellMe on Windows Phone 7 Mango
Rather than offering generic voice control, TellMe will allow developers to create personalized voice UI experiences that greet Windows Phone 7 owners by name and ask them what they want to do. It’s like having your own concierge or personal assistant rather than dictating to a mechanical voice or robot.
TellMe on Windows Phone 7 will support voice search queries as well as advanced speech-to-text and text-to-speech (STT and TTS) features.
Voice on Tablets Will Become Increasingly Important
As tablets support a different paradigm of text input, voice input and recognition will be an important part of the upcoming Windows 8 slates, according to Ilya Bukshteyn, TellMe’s director of sales and marketing. Since tablets are often used without a mouse or keyboard, and on-screen keyboards still don’t feel natural to many people, voice may help Microsoft bridge the computing gap for many tablet owners considering its Windows 8 slates.
Additionally, with robust HTML5 support in Windows 8, web application developers can begin to use HTML tags to bring TellMe’s speech capabilities into their programs and offerings.
It’s About Conversation, Not Dictation
While Android’s and iOS’s speech engines are already pretty remarkable, they still require a form of dictation. Though the commands are simple, like ‘Call John Smith at Work,’ you still have to remember specific phrases that are linked to specific tasks. Also, without system-wide third-party app support, some apps and programs may not be able to tap into the full voice command features on rival platforms. For example, though you can tell those Nuance-driven engines to play music using the default music player, you can’t command them to play a certain playlist in Pandora or another third-party music streaming service.
Additionally, with conversation-based commands, the voice UI will become simpler, and consumers can begin talking to their phones as they would to an assistant, rather than trying to control a computer with their voice. The experience becomes more personal, more natural, and more intuitive:
“We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that’s important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using.”
Microsoft will be leveraging its Bing search engine to help drive its natural voice interface. According to ZDNet, “As the Tellme team pushes beyond speech recognition and into conversational understanding, scenarios become even more interesting, Bukshteyn said. When CEO Steve Ballmer recently touted the ability of Bing to support complex natural-language-query commands, he didn’t explain what would make that magic happen. It turns out it’s Tellme’s voice technology, combined with social-graph information delivered via Windows Live, plus Bing’s search functionality. (‘Windows Live is a social graph hub for Facebook, Twitter and LinkedIn,’ Bukshteyn explained.)”
Interestingly, it looks like TellMe will take a different approach. Rather than the app-driven model of rival operating systems, where you open an app and then perform the task, Windows Phone 7 will be contact-driven. If you want to schedule dinner with your BFF Joe next week, on Android you’d have to open Yelp, look up restaurant reviews, and then switch to your contacts and email apps to coordinate with Joe. With voice through TellMe on Windows Phone, you can just say, “Arrange dinner with Joe in NYC next Thursday,” and Windows Phone will pull Joe’s information from various address books (LinkedIn, Facebook, etc.), figure out which Joe you mean, use Bing search to look up restaurants that you have both ‘liked’ on Facebook, and then compare shared calendars to propose some potential times.
Will Voice Win?
TellMe promises to change the phone experience once again. The iPhone moved the smartphone UI from a menu-driven, stylus-based approach to a broad graphical approach that’s more conducive to touch. Now, with TellMe, we’re seeing the smartphone UI evolve to become voice-driven. Given that Microsoft’s Metro UI still hasn’t really caught on, it’s unclear whether the company’s TellMe integration will make a big dent in market share.