Voice-first design

Voice-first? Insights from testing multi-mode visual and voice interfaces


2018 is the year ‘over-the-top’ (OTT) devices broke through into the mainstream with most high-speed connected homes streaming video to their TV sets.[1] While streaming content has become the norm, we’re still very early in determining the best interaction models to connect users to their content. Today’s consumers are plagued by experiences ill-suited for the task at hand; anyone who has had to type their username and password with a game controller or remote can tell you that.

It was precisely these shared negative experiences that made our Fell Swoop UX team so excited to take on a Fire TV app design project melding entertainment and commerce features. With over 70% of US adults using a second screen while watching TV [2], it seemed natural to blur the lines, fusing content and commerce onto a single screen.

Amazon, not surprisingly, is at the forefront of development in the space. And while their new Watch & Shop platform shows promise, it’s still quite new and leaves a great deal of room for improvement, especially for brands that want to offer a unique and ownable experience. The key question we had going into this was how to subtly blend entertainment with commerce in a Fire TV app that was not only useful but also engaging enough to convert a sale. With this challenge in mind, we embarked on a classic user-centered design process to seek answers.

Voice design starts with a pencil

To get started, we conducted a rapid ideation workshop (we call them Clarity Labs) exploring a variety of methods for delivering a great user experience. We typically go ‘blue sky’ during these sessions, generating a broad array of ideas with hand sketches, user stories, and occasionally scripts and storyboards. It was during this phase that we began exploring how voice interaction within the Fire TV app could not only complement the visual user interface, but possibly supplant it altogether.

As our designs developed, it became clear that relying on both visual user interface (UI) and voice UI interactions could offer a compelling experience and leverage some of the consumer excitement around voice UIs in the Amazon ecosystem. Through the design we learned that most tasks could be accomplished with the visual mode, the voice mode, or a combination of the two. The visual UI responds to voice interactions with helpful contextual displays, facilitating a ‘hands free’ experience, but voice can be relied on entirely if the user wishes.

The next question was whether users would understand the offering and successfully complete key tasks using the visual and voice UIs. Any time you introduce a new feature, discoverability presents a challenge, particularly when the feature has to compete for screen real estate with many other elements. Voice interfaces, being so nascent and lacking visual triggers, present a merchandising problem. How will users engage with a feature if they don’t know it’s there or what it’s capable of?

To answer these basic questions, we built a functional prototype of the app to run on a Fire TV device with both visual and voice UI enabled. The Fire TV remote was the primary input device, along with a connected Alexa device for voice interaction. We’ll cover how we prototyped this experience with Alexa and Fire TV integration in a future post, but in the meantime, here are a few interesting insights we gathered during our research.

Participants easily completed complex buy flows using a voice-first interaction

While it was clear the cognitive load was high, participants were still able to easily complete a buy flow using voice interaction on a Fire TV interface. The app we designed requires Amazon account authentication to enable access to the user’s saved billing and shipping preferences. While that simplifies the process overall, the users must still make several key decisions such as initiating a shopping mode, selecting a product, choosing product attributes (like size and color), and completing the purchase.
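The decision points above can be sketched as a small state machine that accepts the same intent from either input mode. This is a minimal illustration under stated assumptions, not the prototype we built: the step names, the intent names, and the BuyFlow class are all hypothetical and are not taken from any Alexa or Fire TV API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Step(Enum):
    BROWSING = auto()
    SHOPPING = auto()
    PRODUCT_SELECTED = auto()
    ATTRIBUTES_CHOSEN = auto()
    PURCHASED = auto()


# Hypothetical intent names, invented for this sketch.
# Each (current step, intent) pair maps to the step that follows it.
TRANSITIONS = {
    (Step.BROWSING, "start_shopping"): Step.SHOPPING,
    (Step.SHOPPING, "select_product"): Step.PRODUCT_SELECTED,
    (Step.PRODUCT_SELECTED, "choose_attributes"): Step.ATTRIBUTES_CHOSEN,
    (Step.ATTRIBUTES_CHOSEN, "confirm_purchase"): Step.PURCHASED,
}


@dataclass
class BuyFlow:
    step: Step = Step.BROWSING
    history: list = field(default_factory=list)

    def handle(self, intent: str, mode: str) -> bool:
        """Advance the flow if the intent is valid at the current step.

        mode records which input was used ("voice" or "remote"); the
        transition itself is identical for both.
        """
        next_step = TRANSITIONS.get((self.step, intent))
        if next_step is None:
            return False  # intent not valid here; the flow stays put
        self.history.append((intent, mode))
        self.step = next_step
        return True


# A mixed-mode session: the user speaks most steps but picks the
# product with the remote.
flow = BuyFlow()
flow.handle("start_shopping", "voice")
flow.handle("select_product", "remote")
flow.handle("choose_attributes", "voice")
flow.handle("confirm_purchase", "voice")
print(flow.step.name)
```

Keeping the transitions independent of the input mode is the point of the sketch: a user can switch between voice and remote at any step without the flow needing to know which one was used.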

During the study one participant said, “that was too easy.” If conspiracy theorists are right, and Amazon was listening in over our lab-connected Echo, they likely stifled a smile. The users were genuinely surprised at the ease of the voice-only process, and some said it was so easy to use that they were concerned about how it could be used by children in their homes if security measures weren’t taken. I observed this pattern of user behavior repeatedly throughout the study.

While current data indicates that voice-only shopping on Alexa is quite low [3], our team’s hypothesis is that a visual user interface can more easily establish trust in the app or offering. While it is debatable precisely how much non-verbal communication factors into establishing trust and confidence [4], it’s fair to say visual information is still quite influential, especially when shopping. One way this might occur in a Fire TV app is when users can see the Prime logo. Several participants in our study indicated that seeing the Prime logo would help build their confidence in the app. When shopping, it’s the accumulation of several pieces of information, such as a product’s photo, title, price and Prime eligibility, that can sway the consumer more effectively than voice alone. Voice-only UIs are simply limited in how much information they can share at any one time, creating more friction when shopping for unfamiliar items.

Voice UI functionality was unexpected

While voice assistants are frequently built into our OTT remotes, those experiences are typically reserved for single tasks such as voice-initiated search. Our app, by contrast, supports a complete buy flow with many branches. Users didn’t anticipate this level of functionality from a voice-only UI. Even though the app interface we designed promoted these voice features, it just wasn’t something users expected to see, and our subtle approach to messaging wasn’t strong enough to stand out.

My hypothesis here is that it will take some time for consumer awareness to build around these types of experiences. Just as the hamburger menu took time to develop as an arguable standard for mobile navigation, it will take time for voice-driven patterns to emerge. We’re in the early stages of the technology’s development, and consumer education will take time. If these features are critical additions to your Fire TV app today, a significant amount of education in the customer journey will be required to build more awareness of the voice UI features and capabilities.

The power of multi-mode visual and voice user interfaces is highly encouraging

Initially I was skeptical of introducing voice UI functionality into a Fire TV app, which is primarily a visual-first experience. Now, having seen how easy it was for users to interact with the app in a multi-mode experience, using voice and visual user interfaces in tandem, I believe this is the sweet spot for consumers in the coming years. While many multi-mode apps and products already exist (automobiles, Siri on an iPhone, Amazon Echo Show), few platforms can so easily ‘bolt on’ voice capabilities with real improvements to the experience. In my opinion, over-the-top platforms such as Fire TV are ripe for more of this type of experience.

As the Internet of Things (IoT) phenomenon continues to develop, we’ll see more and more interesting multimodal experiences such as this. What’s exciting is that the combination of visual and voice user interface capabilities makes for a much more ‘human’ experience in my opinion, one more akin to what we rely on in our daily communication with other people. It’s visual and auditory communication combined that enriches our lives. While a disembodied voice has its place, it might not be that satisfying over time, and might just get a bit creepy to boot [5].

[1] OTT Market Maturing, But ‘Opportunity’ For Growth Remains

[2] EMarketer: 70% of US adults ‘second-screen’ while watching TV

[3] The Reality Behind Voice Shopping Hype

[4] Is Nonverbal Communication a Numbers Game?

[5] Alexa is laughing at users and creeping them out

All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them. 

