How a paired voice and screen experience could change the way we shop

This post was originally published on GeekWire.

In the past twenty years we’ve seen continuous e-commerce innovation. Omnichannel investments have blurred the line between channels with conveniences like buy-online, pick-up-in-store. Specialty retailers are investing in a wide range of offerings, from expert consultations to in-store classes and special events. Mobile device proliferation has enabled us to order nearly anything from anywhere. And subscribe-and-save offerings mean our cupboards never run dry.

However, despite these innovations and the billions invested, the actual experience of shopping online remains fundamentally the same. For the first ten years we hunched over our desktops and laptops, pogo-sticking from page to page looking for that elusive item. Fortunately, best practices emerged, making shopping incrementally easier over time. The websites got better, and we got more reps. Faster connections, CDNs, and improved tech made the experience snappier. And tighter security has made things, well, more secure.

But it’s still basically the same. The main experience shift in the past ten years is that we’re now hunched over our phones instead, trading a larger screen for the convenience of portability. We’re still bouncing from page to page, still fumbling with the same filters to narrow our choices, and still enduring the tedious taps of checkout.

Where are the experiential leaps forward? For years we’ve heard about the promise of augmented reality, but it has yet to fundamentally change how we shop. Voice assistants work if I want to replenish a predetermined item but are nearly impossible to shop with. Has businesses’ penchant for risk aversion limited us to convergent thinking? Maybe, just maybe, we’ve optimized ourselves right smack into a local maximum.

Where are the divergent investments that revolutionize how we shop in the same way Netflix changed our Friday nights, or Uber changed how we get around? Arguably the logistics of e-commerce have seen more innovation than the consumer experience. Robots have scaled the ability to ship goods to unprecedented levels. AI-enhanced systems make countless recommendations and optimizations behind the scenes. RFID tagging has improved inventory tracking. It’s time for the innovation to move from back office to front of house.

The future should be now.

The frustrating part is that we have the ingredients. The devices are already in our homes and the software is in the cloud. What we need is a bold chef with a fresh recipe, one that combines emerging technologies in a way that leverages their strengths in an arresting fashion.

What’s the answer? It starts, but doesn’t end, with voice-first design. Voice-driven experiences have the potential to change shopping as we know it, but they face some challenges. The first challenge is one of retained context. Humans are pros at conversation – we can shift seamlessly from one topic to the next and back thanks to non-verbal cues, gestures, non-linear thinking, and lots of practice. Voice assistants, by comparison, struggle to hold on to the thread.

Pair this constraint with the obvious inability to visualize a set of options, and shopping by voice is just plain hard. It doesn’t cripple buying if you know what you want, but it sure makes shopping hard when you don’t.

It’s not for lack of trying.

While startups are often hailed as industry disrupters, Amazon has the best chance to upend shopping as we know it. To continue with the cooking metaphor – they not only have all the ingredients (expertise, resources, device penetration, data, etc.), they also have the motivation. Unlocking the next-generation shopping experience could yield billions.

Shop for an item on an Alexa-enabled device and you’ll quickly experience the limitations firsthand. Just listening to a lengthy, SEO-optimized item name is tedious and violates one of the principles of voice-first design: brevity. It’s a sure sign that Amazon has yet to properly leverage their platform in a voice-friendly fashion.

However, what Alexa says next is telling. “Would you like me to send some selections to your mobile device?” A simple notification on my phone yields the same experience as if I had typed the request into my Prime app myself. With the virtual hand-off complete, I’ve simply transitioned from one device to another. As solutions go, it’s incomplete, but it does shed light on the answer to the most glaring problem – pairing voice inputs with the screens we already have.

Experiment with Amazon devices that come with screens, like the Echo Show or a Fire tablet, and you’ll see the future more clearly. Thanks to Amazon’s relatively new Alexa Presentation Language (APL), which connects voice inputs to screen outputs, customers can see the promise of a ‘say-see’ experience firsthand. Regrettably, it’s a multimodal promise not yet fully realized.

My initial use case of “Amazon, show me a new sports coat” first yielded the score of the Mariners’ season opener in Japan. After some clarification, my sports scores were replaced with sports jackets. Rows of numbered options, each complete with a photo, were available onscreen to browse with a simple swipe. Using the onscreen buttons, I could either make a purchase or get more details, but my ability to refine my choices by voice was still quite constrained. Requests to “Show me more choices like #1” yielded women’s blazers instead of men’s. Other verbal requests kept returning to sports scores.

As experiences go, it was clear the more efficient path was to simply abandon my voice commands and pick up my device in favor of touch-driven inputs. Despite my initial disappointment, the potential is palpable. By fusing voice-driven inputs with their proprietary screens, Amazon is experimenting with the ingredients that have the potential to disrupt the shopping experience as we know it. They just have a few hurdles in front of them.

The first is the limitations of APL. In its present form, APL requires that voice-enabled experiences be created from the ground up. It’s a daunting task for a time-starved team and one that individual retailers might not be willing to take on. Factor in the limited audience with screen-enhanced Echo devices and it’s a recipe for limited traction. A better path to adoption might be in working with platform providers. Imagine if Adobe’s Magento or Oracle’s Commerce Cloud enabled APL integration as part of their platforms. It could open up voice-driven shopping on a larger scale. Better still, a next generation of APL could work more effectively with sites as they are built today, shedding its current constraints.

Take things beyond Amazon’s proprietary screens and enable pairing with something millions of consumers already have – the ubiquitous flat screen – and you’re off to the races.

Imagine this future.

Say I need a new jacket for an important work event. With a simple “show me some sport coats,” my Alexa device hears my request, gathers some AI-enabled recommendations from my shopping profile, and sends them to my networked TV. In just five words, I’m presented with an array of options from my preferred and like-minded brands right on the big screen. The essential hook is making my voice assistant aware of what I’m looking at, providing it the context on which to pivot and make voice-driven recommendations.

Instead of tapping tedious filters, my options are continuously refreshed based on simple verbal commands. “Show me more choices like option #2” (APL-enabled options are already numbered for easy recognition) or “do you have #3 in a plaid?” Customer-centric options could not only show me new choices available for purchase, but also integrate complementary accessory choices based on items I already own. After all, Zappos has a lock on my shoe tree and Amazon knows all too well of my wristwatch addiction. With API access enabled, I could tap directly into my existing closet too.
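
For the technically curious, the "aware of what I'm looking at" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not how Alexa or APL actually works: the `Option` and `ScreenContext` names, the product titles, and the pattern used to spot a spoken option number are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    number: int    # the on-screen label, e.g. 2 for "#2"
    title: str     # hypothetical product title
    category: str  # kept so a follow-up search can stay in the same category


class ScreenContext:
    """Hypothetical record of the numbered options currently on screen."""

    # Matches "#2", "option 2", "option #2", or "number 2" (assumed phrasings).
    _REFERENCE = re.compile(r"(?:#|option|number)\s*#?\s*(\d+)", re.IGNORECASE)

    def __init__(self, options: List[Option]):
        self._by_number = {o.number: o for o in options}

    def resolve(self, utterance: str) -> Optional[Option]:
        """Return the displayed option the utterance points at, or None."""
        match = self._REFERENCE.search(utterance)
        if match is None:
            return None
        return self._by_number.get(int(match.group(1)))


screen = ScreenContext([
    Option(1, "Navy wool sport coat", "men's sport coats"),
    Option(2, "Grey tweed sport coat", "men's sport coats"),
])

picked = screen.resolve("Show me more choices like option #2")
print(picked.title if picked else "Nothing on screen matches that request")
```

Carrying the category along with each displayed option is one way a "more like this" request could stay within men's jackets rather than drifting to another department or back to sports scores.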

Taken a step further, ARKit-like enhancements could be extended to the smart TV, taking advantage of any networked cameras – like my Kinect for Xbox One. With the right integration, I could be presented with my mirror image on the big screen, one that’s digitally enhanced with the new sport coat and shirt, my favorite jeans, and shoes. Need the personal touch? Integration opportunities abound. Skype or FaceTime could let me dial a fashion-forward friend. Want to call in a pro? A wardrobe consultant is standing by.

With a simple “I’ll take it,” I’m presented with a confirmation of my checkout preferences. Tiny screens and tedious keystrokes are replaced by giant screens and verbal commands. My potential for sartorial missteps is mitigated through easy access to the expert eye.

A powerful potential.

While I used a fashion emergency in my scenario, the potential of paired voice and screen experiences is tremendous. Collaborative decisions might benefit even more. No more emailing furniture, vacation, or real estate choices back and forth or peering over shoulders. Let’s sit in front of the big screen and do it together. Complex decisions could be enhanced by networked sales agents and even consultants.

Admittedly, I’ve taken some shortcuts in this fantasy thought experiment. I’m not a technologist. But it doesn’t take one to see that, with a little vision, the future is right around the corner. I just want it now.
