Rohit Prasad: Amazon Alexa and Conversational AI | Lex Fridman Podcast #57
The following is a conversation with Rohit Prasad. He's the vice president and head scientist of Amazon Alexa and one of its original creators. The Alexa team embodies some of the most challenging, incredible, impactful, and inspiring work that is done in AI today. The team has to both solve problems at the cutting edge of natural language processing and provide a trustworthy, secure, and enjoyable experience to millions of people. This is where state-of-the-art methods in computer science meet the challenges of real-world engineering. In many ways, Alexa and the other voice assistants are the voices of artificial intelligence to millions of people and an introduction to AI for people who have only encountered it in science fiction. This is an important and exciting opportunity. And so the work that Rohit and the Alexa team are doing is an inspiration to me and to many researchers and engineers in the AI community.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N. If you leave a review on Apple Podcasts especially, but also Castbox, or comment on YouTube, consider mentioning topics, people, ideas, questions, quotes in science, tech, or philosophy that you find interesting, and I'll read them on this podcast. I won't call out names, but I love comments with kindness and thoughtfulness in them, so I thought I'd share them.
Someone on YouTube highlighted a quote from the conversation with Ray Dalio, where he said that you have to appreciate all the different ways that people can be A players. This connected with me too. On teams of engineers, it's easy to think that raw productivity is the measure of excellence, but there are others. I worked with people who brought a smile to my face every time I got to work in the morning. Their contribution to the team is immeasurable.
I recently started doing podcast ads at the end of the introduction. I'll do one or two minutes after introducing the episode, and never any ads in the middle that break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.
This show is presented by Cash App, the number one finance app in the App Store. I personally use Cash App to send money to friends, but you can also use it to buy, sell, and deposit Bitcoin in just seconds. Cash App also has a new investing feature. You can buy fractions of a stock, say $1 worth, no matter what the stock price is. Brokerage services are provided by Cash App Investing, a subsidiary of Square, and member SIPC.
I'm excited to be working with Cash App to support one of my favorite organizations called FIRST, best known for their FIRST Robotics and LEGO competitions. They educate and inspire hundreds of thousands of students in over 110 countries, and have a perfect rating on Charity Navigator, which means the donated money is used to maximum effectiveness. When you get Cash App from the App Store or Google Play and use code LexPodcast, you'll get $10, and Cash App will also donate $10 to FIRST, which again is an organization that I've personally seen inspire girls and boys to dream of engineering a better world.
This podcast is also supported by ZipRecruiter. Hiring great people is hard, and to me is one of the most important elements of a successful mission-driven team. I've been fortunate to be a part of and lead several great engineering teams. The hiring I've done in the past was mostly through tools we built ourselves, but reinventing the wheel was painful. ZipRecruiter is a tool that's already available for you. It seeks to make hiring simple, fast, and smart. For example, Codable cofounder Gretchen Hebner used ZipRecruiter to find a new game artist to join her education tech company. By using ZipRecruiter's screening questions to filter candidates, Gretchen found it easier to focus on the best candidates and finally hired the perfect person for the role in less than two weeks from start to finish. ZipRecruiter, the smartest way to hire. See why ZipRecruiter is effective for businesses of all sizes by signing up, as I did, for free at ziprecruiter.com slash lexpod, that's ziprecruiter.com slash lexpod.
And now, here's my conversation with Rohit Prasad.
In the movie Her, I'm not sure if you've ever seen Her, a human falls in love with the voice of an AI system. Let's start at the highest philosophical level before we get to deep learning and some of the fun things. Do you think this, what the movie Her shows, is within our reach?
I think, not specifically about Her, but I think what we are seeing is a massive increase in adoption of AI assistants, or AI, in all parts of our social fabric. And what I do believe is that the utility these AIs provide, some of the functionalities that are shown, are absolutely within reach.
So some of the functionality in terms of the interactive elements, but in terms of the deep connection that's purely voice based, do you think such a close connection is possible?
It's been a while since I saw Her, but I would say in terms of interactions which are both human-like and in these AI assistants, you have to value what is also superhuman. We as humans can be in only one place. AI assistants can be in multiple places at the same time: one with you on your mobile device, one at your home, one at work. So you have to respect these superhuman capabilities too. Plus, as humans, we have certain attributes we're very good at, very good at reasoning. AI assistants are not yet there, but in the realm of AI assistants, what they're great at is computation and memory. It's infinite and pure. These are the attributes you have to start respecting. So I think the comparison with human-like versus the other aspect, which is also superhuman, has to be taken into consideration. So I think we need to elevate the discussion to not just human-like.
So there's certainly elements you just mentioned. Alexa is everywhere, computationally speaking. So this is a much bigger infrastructure than just the thing that sits there in the room with you. But it certainly feels to us mere humans that there's just another little creature there when you're interacting with it. You're not interacting with the entirety of the infrastructure, you're interacting with the device. The feeling is, okay, sure, we anthropomorphize things, but that feeling is still there. So what do you think we as humans, in the purity of the interaction with a smart assistant, what do you think we look for in that interaction?
I think in certain interactions, it will be very much where it does feel like a human, because it has a persona of its own. And in certain ones, it wouldn't be. So I think a simple example to think of is if you're walking through the house and you just wanna turn your lights on and off and you're issuing a command, that's not very much like a human-like interaction. And that's where the AI shouldn't come back and have a conversation with you. It should simply complete that command. So I think the blend of, we have to think about this as not human-human alone. It is a human-machine interaction, and certain aspects of humans are needed, and certain aspects and situations demand it to be like a machine.
So I told you, it's gonna be philosophical in parts. What's the difference between human and machine in that interaction? When we interact with humans, especially those who are friends and loved ones, versus you and a machine that you also are close with.
I think you have to think about the roles the AI plays, right? And it differs from customer to customer, situation to situation. Especially, I can speak from Alexa's perspective: it is a companion, a friend at times, an assistant, and an advisor down the line. So I think most AIs will have this kind of attributes, and it will be very situational in nature.
So where is the boundary?
I think the boundary depends on the exact context in which you're interacting with the AI.
So the depth and the richness of natural language conversation has been, by Alan Turing, used to try to define what it means to be intelligent. There's a lot of criticism of that kind of test, but what do you think is a good test of intelligence, in your view, in the context of the Turing test and Alexa, with the Alexa Prize, this whole realm? Do you think about this human intelligence, what it means to define it, what it means to reach that level?
I do think the ability to converse is a sign of an ultimate intelligence. I think there's no question about it. So if you think about all aspects of humans, there are sensors we have, and those are basically a data collection mechanism. And based on that, we make some decisions with our sensory brains, right? And from that perspective, I think there are elements we have to talk about: how we sense the world and then how we act based on what we sense. Those elements clearly machines have. But then there's the other aspect of computation that is way better. I also mentioned memory again, in terms of being near infinite, depending on the storage capacity you have. And the retrieval can be extremely fast and pure, in terms of, like, there's no ambiguity of who did I see when, right? I mean, machines can remember that quite well. So again, on a philosophical level, I do subscribe to the fact that to be able to converse, and as part of that to be able to reason based on the world knowledge you've acquired and the sensory knowledge that is there, is definitely very much the essence of intelligence. But intelligence can go beyond human-level intelligence, based on what machines are getting capable of.
So what do you think, maybe stepping outside of Alexa, broadly as an AI field, what do you think is a good test of intelligence? Put it another way, outside of Alexa, because so much of Alexa is a product, is an experience for the customer. On the research side, what would impress the heck out of you if you saw it? What is the test where you said, wow, this thing is now starting to encroach into the realm of what we loosely think of as human intelligence?
So, well, we think of it as AGI and human intelligence altogether, right? So in some sense, and I think we are quite far from that. I think an unbiased view I have is that Alexa's intelligence capability is a great test. I think of it as, there are many other proof points, like self-driving cars, game playing like Go or chess. Let's take those two as an example. They clearly require a lot of data-driven learning and intelligence, but it's not as hard a problem as conversing, as an AI, with humans to accomplish certain tasks, or open-domain chat, as you mentioned, Alexa Prize. In those settings, the key difference is that the end goal is not defined, unlike game playing. You also do not know exactly what state you are in in a particular goal completion scenario. In a certain sense, sometimes you can if it's a simple goal, but even in certain examples like planning a weekend, you can imagine how many things change along the way. You look for weather, you may change your mind and you change the destination, or you want to catch a particular event and then you decide, no, I want this other event. So these dimensions of how many different steps are possible when you're conversing as a human with a machine make it an extremely daunting problem. And I think it is the ultimate test for intelligence.
And don't you think that natural language is enough to prove that, conversation? Just pure conversation.
From a scientific standpoint, natural language is a great test, but I would go beyond. I don't want to limit it to natural language as simply understanding an intent or parsing for entities and so forth. We are really talking about dialogue. So I would say human-machine dialogue is definitely one of the best tests of intelligence.
So can you briefly speak to the Alexa Prize for people who are not familiar with it, and also just maybe where things stand, and what have you learned and what's surprising? What have you seen that's surprising from this incredible competition?
Absolutely, it's a very exciting competition. Alexa Prize is essentially a grand challenge in conversational artificial intelligence, where we threw the gauntlet to the universities who do active research in the field to say, can you build what we call a social bot that can converse with you coherently and engagingly for 20 minutes? That is an extremely hard challenge. Talking to someone who you're meeting for the first time, or even if you've met them quite often, to speak for 20 minutes on any topic, and the evolving nature of topics, is super hard. We have completed two successful years of the competition. The first was won by the University of Washington, the second by the University of California. We are in our third instance. We have an extremely strong team of 10 cohorts, and the third instance of the Alexa Prize is underway now. And we are seeing a constant evolution. First year was definitely a learning. It was a lot of things to be put together. We had to build a lot of infrastructure to enable these universities to be able to build magical experiences and do high quality research.
Just a few quick questions, sorry for the interruption. What does failure look like in the 20 minute session? So what does it mean to fail, not to reach the 20 minute mark?
So, first of all, I forgot to mention one more detail. It's not just 20 minutes, but the quality of the conversation too that matters. And the beauty of this competition, before I answer that question on what failure means, is first that you actually converse with millions and millions of customers as these social bots. So during the judging phases, there are multiple phases. Before we get to the finals, which is a very controlled judging in a situation where we bring in judges and we have contractors who interact with these social bots, that is a much more controlled setting. But till the point we get to the finals, all the judging is essentially by the customers of Alexa. And there you basically rate on a simple question: how good your experience was. So that's where we are not testing for a 20 minute boundary being crossed, because you do want a clear-cut winner to be chosen, and it's an absolute bar. So did you really break that 20 minute barrier is why we have to test it in a more controlled setting with actors, essentially interactors, and see how the conversation goes. So this is why it's a subtle difference between how it's being tested in the field with real customers versus in the lab to award the prize. So on the latter one, what it means is that essentially there are three judges, and two of them have to say this conversation has stalled, essentially.
And the judges are human experts.
Judges are human experts.
So this is in the third year. So what's been the evolution? How far, so in the DARPA challenge, in the first year, the autonomous vehicles, nobody finished. In the second year, a few more finished in the desert. So how far along in this, I would say, much harder challenge are we?
This challenge has come a long way, to the extent that, we're definitely not close to the 20 minute barrier being met with coherent and engaging conversation. I think we are still five to 10 years away in that horizon to complete that. But the progress is immense. Like, what you're finding is the accuracy, and what kind of responses these social bots generate, is getting better and better. What's even amazing to see is that now there's humor coming in. The bots are quite... You're talking about ultimate signs of intelligence. I think humor is a very high bar in terms of what it takes to create humor. And I don't mean just being goofy. I really mean a good sense of humor is also a sign of intelligence in my mind, and something very hard to do. So these social bots are now exploring not only what we think of as natural language abilities but also personality attributes, and aspects of when to inject an appropriate joke, when you don't know the domain, how you come back with something more intelligible so that you can continue the conversation. If you and I are talking about AI and we are domain experts, we can speak to it. But if you suddenly switch a topic to that, how do I change the conversation? So you're starting to notice these elements as well. And that's coming partly from the nature of the 20 minute challenge, that people are getting quite clever on how to really converse and essentially mask some of the understanding defects.
So some of this, this is not Alexa the product. This is somewhat for fun, for research, for innovation. I have a question, sort of in this modern era, there's a lot of, if you look at Twitter and Facebook and so on, there's discourse, public discourse going on, and some things that are a little bit too edgy, people get blocked and so on. I'm just out of curiosity: are people in this context pushing the limits? Is anyone using the F word? Is anyone sort of pushing back, sort of arguing, I guess I should say, as part of the dialogue to really draw people in?
First of all, let me just back up a bit in terms of why we are doing this, right? So you said it's fun. I think fun is more part of the engaging part for customers. It is one of the most used skills as well in our skill store. But that apart, the real goal was, essentially what was happening is, with a lot of AI research moving to industry, we felt that academia has the risk of not being able to have the same resources at disposal that we have, which is lots of data, massive computing power, and clear ways to test these AI advances with real customer benefits. So we brought all these three together in the Alexa Prize. That's why it's one of my favorite projects in Amazon. And with that, the secondary effect is, yes, it has become engaging for our customers as well. We're not there in terms of where we want it to be, right? But it's a huge progress. But coming back to your question on how do the conversations evolve: yes, there are some natural attributes of what you said in terms of argument and some amount of swearing. The way we take care of that is that there is a sensitive filter we have built.
That's some keywords and so on.
It's more than keywords, a little more in terms of, of course, there's keyword based too, but there's more in terms of, these words can be very contextual, as you can see. And also the topic can be something that you don't want a conversation to happen on, because this is a communal device as well. A lot of people use these devices. So we have put a lot of guardrails for the conversation to be more useful for advancing AI and not so much of these other issues you attributed to what's happening in the AI field as well.
Right, so this is actually a serious opportunity. I didn't use the right word, fun. I think it's an open opportunity to do some of the best innovation in conversational agents. Why just universities?
Why just universities? Because, as I said, I really felt the young minds, it's also, if you think about the other aspect of where the whole industry is moving with AI, there's a dearth of talent given the demands. So you do want universities to have a clear place where they can invent and research and not fall behind such that they can't motivate students. Imagine all grad students left to industry, like us, or faculty members, which has happened too. So this is a way that, if you're so passionate about the field where you feel industry and academia need to work well, this is a great example and a great way for universities to participate.
So what do you think it takes to build a system that wins the Alexa Prize?
I think you have to start focusing on aspects of reasoning. There are still more lookups of what intents the customer is asking for and responding to those, rather than really reasoning about the elements of the conversation. For instance, if the conversation is about games and it's about a recent sports event, there's so much context involved, and you have to understand the entities that are being mentioned so that the conversation is coherent, rather than you suddenly just switch to knowing some fact about a sports entity and you're just relaying that rather than understanding the true context of the game. Like, if you just said, I learned this fun fact about Tom Brady, rather than really say how he played the game the previous night, then the conversation is not really that intelligent. So you have to go to more reasoning elements of understanding the context of the dialogue and giving more appropriate responses, which tells you that we are still quite far, because a lot of times it's more facts being looked up and something that's close enough as an answer but not really the answer. So that is where the research needs to go more, in actual true understanding and reasoning. And that's why I feel it's a great way to do it, because you have an engaged set of users working to help these AI advances happen in this case.
You mentioned customers there quite a bit, and there's a skill. What is the experience for the user that's helping? So just to clarify, as far as I understand, this skill is a standalone for the Alexa Prize. I mean, it's focused on the Alexa Prize. It's not you ordering certain things on Amazon.com or checking the weather or playing Spotify, right? It's a separate skill. And so you're focused on helping that. I don't know, how do customers think of it? Are they having fun? Are they helping teach the system? What's the experience like?
I think it's both, actually, and let me tell you how you invoke this skill. So all you have to say is, Alexa, let's chat. And then the first time you say, Alexa, let's chat, it comes back with a clear message that you're interacting with one of those university social bots, and there's a clear, so you know exactly how you interact, right? And that is why it's very transparent. You are being asked to help, right? And we have a lot of mechanisms where, as we are in the first phase of the feedback phase, we send a lot of emails to our customers, and then they know that the team needs a lot of interactions to improve the accuracy of the system. So we know we have a lot of customers who really want to help these university bots, and they're conversing with that. And some are just having fun with just saying, Alexa, let's chat. And also some adversarial behavior to see how much do you understand as a social bot? So I think we have a good, healthy mix of all three situations.
So what is the, if we talk about solving the Alexa challenge, the Alexa Prize, what does the data set of really engaging, pleasant conversations look like? Because if we think of this as a supervised learning problem, I don't know if it has to be, but if it does, maybe you can comment on that. Do you think there needs to be a data set of what it means to be an engaging, successful, fulfilling conversation?
I think that's part of the research question here. This was, I think, we at least got the first part right, which is have a way for universities to build and test in a real-world setting. Now you're asking in terms of the next phase of questions, which we are still, we're also asking, by the way: what does success look like from an optimization function? That's what you're asking. In terms of, we as researchers are used to having a great corpus of annotated data and then sort of tuning our algorithms. And fortunately and unfortunately, in this world of Alexa Prize, that is not the way we are going after it. So you have to focus more on learning based on live feedback. That is another element that's unique, where just now I started with giving you how you ingress and experience this capability as a customer. What happens when you're done? So they ask you a simple question: on a scale of one to five, how likely are you to interact with this social bot again? That is good feedback, and customers can also leave more open-ended feedback. And I think partly that, to me, is one part of the question you're asking, which I'm saying is a mental model shift, that as researchers also, you have to change your mindset that this is not a DARPA evaluation or NSF funded study where you have a nice corpus. This is where it's real world. You have real data. The scale is amazing and that's a beautiful thing.
And then the customer, the user can quit the conversation
Exactly, the user can. That is also a signal for how good you were at that point.
So, and then on a scale of one to five, one to three, do they say how likely are you, or is it just a binary?
That's such a beautifully constructed challenge. You said the only way to make a smart assistant really smart is to give it eyes and let it explore the world. I'm not sure, you might have been taken out of context, but can you comment on that? Can you elaborate on that idea? I personally also find that idea super exciting from a social robotics, personal robotics perspective.
Yeah, a lot of things do get taken out of context. This particular one was just a philosophical discussion we were having in terms of what does intelligence look like. And the context was in terms of learning. I think just we said we as humans are empowered with many different sensory abilities. I do believe that eyes are an important aspect of it. In terms of, if you think about how we as humans learn, it is quite complex, and it's also not unimodal, that you are fed a ton of text or audio and you just learn that way. No, you learn by experience, you learn by seeing, you're taught by humans, and we are very efficient. Machines on the contrary are very inefficient on how they learn, especially these AIs. I think the next wave of research is going to be with less data, not just with less labeled data but also with a lot of weak supervision, and where you can increase the learning rate. I don't mean less data in terms of not having a lot of data to learn from. We are generating so much data, but it is more from an aspect of how fast can you learn.
So improving the quality of the data, that's the quality of data and the learning process?
I think more on the learning process. I think we have to, we as humans learn with a lot of noisy data, right? And I think that's the part that I don't think should change. What should change is how we learn, right? So if you look at, you mentioned supervised learning, we are making transformative shifts from moving to more unsupervised, more weak supervision. Those are the key aspects of how to learn. And I think in that setting, I hope you agree with me that having other senses is very crucial in terms of how you learn.
So absolutely, and from a machine learning perspective
link |
which I hope we get a chance to talk to a few aspects
link |
that are fascinating there, but to stick on the point
link |
of sort of a body, you know, an embodiment.
link |
So Alexa has a body.
link |
It has a very minimalistic, beautiful interface
link |
where there's a ring and so on.
link |
I mean, I'm not sure of all the flavors of the devices
link |
that Alexa lives on, but there's a minimalistic,
link |
And nevertheless, we humans, so I have a Roomba,
link |
but I have all kinds of robots all over everywhere.
link |
So what do you think the Alexa of the future looks like
link |
if it begins to shift what its body looks like?
link |
What, maybe beyond the Alexa, what do you think
link |
are the different devices in the home
link |
as they start to embody their intelligence more and more?
link |
What do you think that looks like?
link |
Philosophically, a future, what do you think that looks like?
link |
I think let's look at what's happening today.
link |
You mentioned, I think, other devices, as in Amazon devices,
link |
but I also wanted to point out Alexa
link |
is already integrated in a lot of third party devices
link |
which also come in lots of forms and shapes.
link |
Some in robots, right?
link |
Some in microwaves, some in appliances
link |
that you use in everyday life.
link |
So I think it's not just the shape Alexa takes
link |
in terms of form factors,
link |
but it's also where all it's available.
link |
And it's getting in cars,
link |
it's getting in different appliances in homes,
link |
even toothbrushes, right?
link |
So I think you have to think about it
link |
as not a physical assistant.
link |
It will be in some embodiment as you said,
link |
we already have these nice devices.
link |
But I think it's also important to think of it
link |
as a virtual assistant.
link |
It is superhuman in the sense
link |
that it is in multiple places at the same time.
link |
So I think the actual embodiment in some sense
link |
to me doesn't matter.
link |
I think you have to think of it as not as human like
link |
and more of what its capabilities are
link |
that derive a lot of benefit for customers
link |
and how there are different ways to delight it
link |
and delight customers and different experiences.
link |
And I think I'm a big fan of it
link |
not being just human like,
link |
it should be human like in certain situations.
link |
The Alexa Prize social bot, in terms of conversation,
link |
is a great way to look at it,
link |
but there are other scenarios where human like
link |
I think is underselling the abilities of this AI.
link |
So if I could trivialize what we're talking about.
link |
So if you look at the way Steve Jobs thought
link |
about the interaction with the device
link |
that Apple produced,
link |
there was an extreme focus on controlling the experience
link |
by making sure there's only these Apple produced devices.
link |
You see the voice of Alexa being,
link |
taking all kinds of forms
link |
depending on what the customers want.
link |
And that means it could be anywhere
link |
from the microwave to vacuum cleaner to the home
link |
and so on the voice is the essential element
link |
of the interaction.
link |
I think voice is an essence.
link |
It's not all, but it's a key aspect.
link |
I think to your question in terms of
link |
you should be able to recognize Alexa.
link |
And that's a huge problem.
link |
I think in terms of a huge scientific problem,
link |
I should say like what are the traits?
link |
What makes it look like Alexa,
link |
especially in different settings
link |
and especially if it's primarily voice what it is.
link |
But Alexa is not just voice either, right?
link |
I mean, we have devices with a screen.
link |
Now you're seeing just other behaviors of Alexa.
link |
So I think we're in very early stages of what that means.
link |
And this will be an important topic for the following years.
link |
But I do believe that being able to recognize
link |
and tell when it's Alexa versus it's not
link |
is going to be important from an Alexa perspective.
link |
I'm not speaking for the entire AI community,
link |
but from, but I think attribution.
link |
And as we go into more of understanding
link |
who did what, that identity of the AI
link |
is crucial in the coming world.
link |
I think from the broad AI community perspective,
link |
that's also a fascinating problem.
link |
So basically if I close my eyes and listen to the voice,
link |
what would it take for me to recognize that this is Alexa?
link |
Or at least the Alexa that I've come to know
link |
from my personal experience in my home
link |
through my interactions that come.
link |
And the Alexa here in the US is very different.
link |
The Alexa in UK and the Alexa in India,
link |
even though they are all speaking English
link |
or the Australian version.
link |
So again, when, so now think about
link |
when you go into a different culture,
link |
a different community, but you travel there,
link |
would you recognize Alexa?
link |
I think these are super hard questions actually.
link |
So there's a team that works on personality.
link |
So if we talk about those different flavors
link |
of what it means, culturally speaking,
link |
India, UK, US, what does it mean to add?
link |
So the problem that we just stated is just fascinating.
link |
How do we make it purely recognizable that it's Alexa?
link |
Assuming that the qualities of the voice are not sufficient.
link |
It's also the content of what is being said.
link |
How do we do that?
link |
How does the personality keep on coming to play?
link |
What would that research look like?
link |
I mean, it's such a fascinating.
link |
We have some very fascinating folks who,
link |
from both the UX background and human factors,
link |
are looking at these aspects and these exact questions.
link |
But I'll definitely say it's not just how it sounds,
link |
the choice of words, the tone,
link |
not just, I mean, the voice identity of it,
link |
but the tone matters, the speed matters,
link |
how you speak, how you enunciate words,
link |
how, what choice of words are you using?
link |
How terse are you or how lengthy in your explanations you are?
link |
All of these are factors.
link |
And you also mentioned something crucial
link |
that you may have personalized Alexa to some extent
link |
in your homes or in the devices you are interacting with.
link |
So you, as an individual,
link |
how you prefer Alexa to sound
link |
can be different than how I prefer.
link |
And the amount of customizability you want to give
link |
is also a key debate we always have.
link |
But I do want to point out it's more than the voice actor
link |
that recorded and it sounds like that actor.
link |
It is more about the choices of words,
link |
the attributes of tonality,
link |
the volume in terms of how you raise your pitch
link |
and so forth, all of that matters.
link |
This is such a fascinating problem
link |
from a product perspective.
link |
I could see those debates just happening
link |
inside of the Alexa team
link |
of how much personalization do you do
link |
for the specific customer?
link |
Because you're taking a risk if you over personalize.
link |
Because you don't,
link |
if you create a personality for a million people,
link |
you can test that better.
link |
You can create a rich, fulfilling experience
link |
that will do well.
link |
But the more you personalize it,
link |
the less you can test it,
link |
the less you can know that it's a great experience.
link |
So how much personalization, what's the right balance?
link |
I think the right balance depends on the customer.
link |
Give them the control.
link |
So I'll say, I think the more control you give customers,
link |
the better it is for everyone.
link |
And I'll give you some key personalization features.
link |
I think we have a feature called remember this,
link |
which is where you can tell Alexa to remember something.
link |
There you have an explicit sort of control
link |
in customer's hand
link |
because they have to say Alexa, remember X, Y, Z.
link |
What kind of things would that be used for?
link |
So you can like use it.
link |
I have stored my tire specs for my car
link |
because it's so hard to go and find and see what it is
link |
right when you're having some issues.
link |
I store my mileage plan numbers
link |
for all the frequent flyer ones
link |
where I'm sometimes just looking at it and it's not handy.
link |
So those are my own personal choices I've made
link |
for Alexa to remember something on my behalf.
link |
So again, I think the choice was to be explicit
link |
about how you provide that to a customer as a control.
link |
So I think these are the aspects of what you do.
link |
Like think about where we can use
link |
speaker recognition capabilities that it's,
link |
if you taught Alexa that you are Lex
link |
and this person in your household is person two,
link |
then you can personalize the experiences.
link |
Again, these are very in the CX customer experience patterns
link |
are very clear about and transparent
link |
when a personalization action is happening.
link |
And then you have other ways
link |
like you go through explicit control right now
link |
through your app that your multiple service providers,
link |
let's say for music, which one is your preferred one?
link |
So when you say play Sting,
link |
depending on whether you have preferred Spotify
link |
or Amazon music or Apple music
link |
that the decision is made where to play it from.
link |
So what's Alexa's backstory from her perspective?
link |
Is there, I remember just asking as probably
link |
a lot of us are just the basic questions
link |
about love and so on of Alexa,
link |
just to see what the answer would be.
link |
Just it feels like there's a little bit of a back,
link |
this feels like there's a little bit of personality
link |
Does Alexa have a metaphysical presence
link |
in this human universe we live in
link |
or is it something more ambiguous?
link |
Is there a family kind of idea
link |
even for joking purposes and so on?
link |
I think, well, it does tell you,
link |
if I think you, I should double check this,
link |
but if you said, when were you born?
link |
I think we do respond.
link |
I need to double check that,
link |
but I'm pretty positive about it.
link |
I think that you do it
link |
because I think I've tested that.
link |
But that's like a, that's like how,
link |
like I was born in your brand of champagne
link |
and whatever the year kind of thing.
link |
So on terms of the metaphysical,
link |
I think it's early,
link |
does it have the historic knowledge about herself
link |
to be able to do that?
link |
Maybe, have we crossed that boundary?
link |
In terms of being, thank you.
link |
We have thought about it quite a bit,
link |
but I wouldn't say that we have come to a clear decision
link |
in terms of what it should look like.
link |
But you can imagine though,
link |
and I bring this back to the Alexa Prize social bot one,
link |
there you will start seeing some of that.
link |
Like you, these bots have their identity.
link |
And in terms of that,
link |
you may find, you know,
link |
this is such a great research topic
link |
that some academic team may think of these problems
link |
and start solving them too.
link |
So let me ask a question.
link |
It's kind of difficult, I think,
link |
but it feels fascinating to me
link |
because I'm fascinated with psychology.
link |
It feels that the more personality you have,
link |
the more dangerous it is.
link |
In terms of a customer perspective, a product,
link |
if you want to create a product that's useful.
link |
By dangerous, I mean creating an experience that upsets me.
link |
And so, how do you get that right?
link |
Because if you look at the relationships,
link |
maybe I'm just a screwed up Russian,
link |
but if you look at the human relationship,
link |
some of our deepest relationships have fights,
link |
have tension, have the push and pull,
link |
have a little flavor in them.
link |
Do you want to have such flavor
link |
in an interaction with Alexa?
link |
How do you think about that?
link |
So there's one other common thing that you didn't say,
link |
but we think of it as paramount for any deep relationship.
link |
So I think if you trust every attribute you said,
link |
a fight, some tension is all healthy.
link |
But what is sort of unnegotiable in this instance is trust.
link |
And I think the bar to earn customer trust for AI
link |
is very high, in some sense, more than a human.
link |
It's not just about personal information or your data,
link |
it's also about your actions on a daily basis.
link |
How trustworthy are you in terms of consistency,
link |
in terms of how accurate are you in understanding me?
link |
Like if you're talking to a person on the phone,
link |
if you have a problem with your,
link |
let's say your internet or something,
link |
if the person's not understanding,
link |
you lose trust right away.
link |
You don't want to talk to that person.
link |
That whole example gets amplified by a factor of 10
link |
because when you're a human interacting with an AI,
link |
you have a certain expectation.
link |
Either you expect it to be very intelligent
link |
and then you get upset, why is it behaving this way?
link |
Or you expect it to be not so intelligent
link |
and when it surprises you are like,
link |
really, you're trying to be too smart.
link |
So I think we grapple with these hard questions as well,
link |
but I think the key is actions need to be trustworthy
link |
from these AIs, not just about data protection,
link |
your personal information protection,
link |
but also from how accurately it accomplishes
link |
all commands or all interactions.
link |
Well, it's tough to hear because trust,
link |
you're absolutely right,
link |
but trust is such a high bar with AI systems
link |
because people, and I see this
link |
because I work with autonomous vehicles,
link |
I mean, the bar that's placed on AI systems
link |
is unreasonably high.
link |
Yeah, that is going to be, I agree with you.
link |
And I think of it as, it's a challenge
link |
and it also keeps my job, right?
link |
So from that perspective, I totally,
link |
I think of it at both sides as a customer and as a researcher.
link |
I think as a researcher,
link |
yes, occasionally it will frustrate me
link |
that why is the bar so high for these AIs?
link |
And as a customer, then I say absolutely
link |
it has to be that high, right?
link |
So I think that's the trade off we have to balance,
link |
but doesn't change the fundamentals
link |
that trust has to be earned.
link |
And the question then becomes is,
link |
are we holding the AIs to a different bar
link |
in accuracy and mistakes than we hold humans?
link |
That's going to be a great societal question
link |
for years to come, I think for us.
link |
Well, one of the questions that we grapple with
link |
as a society now that I think about a lot,
link |
I think a lot of people in AI think about a lot
link |
and Alexa is taking on head on is privacy.
link |
Is the reality is us giving over data
link |
to any AI system can be used to enrich our lives
link |
So if basically any product that does anything awesome
link |
for you, the more data it has,
link |
the more awesome things it can do.
link |
And yet, at the other side,
link |
people imagine the worst case possible scenario
link |
of what can you possibly do with that data?
link |
People, it boils down to trust, as you said before.
link |
There's a fundamental distrust
link |
of in certain groups of governments and so on,
link |
depending on the government, depending on who's in power,
link |
depending on all these kinds of factors.
link |
And so here's Alexa in the middle of all of it
link |
in the home trying to do good things for the customers.
link |
So how do you think about privacy in this context,
link |
the smart assistants in the home?
link |
How do you maintain, how do you earn trust?
link |
Absolutely, so as you said, trust is the key here.
link |
So you start with trust and then privacy
link |
is a key aspect of it.
link |
It has to be designed from very beginning about that.
link |
And we believe in two fundamental principles.
link |
One is transparency and second is control.
link |
So by transparency, I mean when we build
link |
what is now called a smart speaker, or the first Echo,
link |
we were quite judicious about making these right tradeoffs
link |
on customers behalf that it is pretty clear
link |
when the audio is being sent to cloud.
link |
The light ring comes on when it has heard you say
link |
the wake word and then the streaming happens, right?
link |
So when the light ring comes up, we also had,
link |
we put a physical mute button on it,
link |
just so if you didn't want it to be listening,
link |
even for the wake word,
link |
then you turn the mute button on
link |
and that disables the microphones.
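[Editor's sketch] The behavior described here (the device listens locally for the wake word, the light ring comes on and streaming starts only after the wake word is heard, and a mute button disables the microphones) can be pictured as a small state machine. The Python below is a toy illustration under stated assumptions, not Amazon's implementation: the class `ToyDevice`, its fields, and the text-matching "detector" are all hypothetical stand-ins for what would really be on-device audio processing.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical model of the wake-word gate described in the conversation."""

    wake_word: str = "alexa"
    muted: bool = False
    light_ring_on: bool = False
    streamed: list = field(default_factory=list)  # stands in for audio sent to the cloud

    def press_mute(self) -> None:
        # Assumption: the mute button toggles the microphones off and on.
        self.muted = not self.muted
        self.light_ring_on = False

    def hear(self, utterance: str) -> None:
        # With the microphones disabled, nothing is processed at all.
        if self.muted:
            return
        words = utterance.lower().split()
        # Stand-in detector: a real one scores audio, not the first text token.
        if words and words[0].strip(",.!?") == self.wake_word:
            self.light_ring_on = True  # visible signal that streaming has started
            self.streamed.append(" ".join(words[1:]))
        else:
            self.light_ring_on = False  # no wake word, so nothing leaves the device
```

In this sketch, `ToyDevice().hear("what a nice day")` streams nothing, while `hear("Alexa, play Sting")` turns the light ring on and records the request; after `press_mute()`, even an utterance that starts with the wake word is ignored.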
link |
That's just the first decision
link |
on essentially transparency and control.
link |
Then, even when we launched,
link |
we gave the control in the hands of the customers
link |
that you can go and look at any of your individual utterances
link |
that is recorded and delete them anytime.
link |
And we've got to do that promise, right?
link |
So and that is super, again, a great instance
link |
of showing how you have the control.
link |
Then we made it even easier.
link |
You can say, like I said, delete what I said today.
link |
So that is now making it even just more control
link |
in your hands with what's most convenient
link |
about this technology is voice.
link |
You delete it with your voice now.
link |
So these are the types of decisions we continually make.
link |
We just recently launched this feature called
link |
what we think of it as if you wanted humans
link |
not to review your data because you mentioned
link |
supervised learning, right?
link |
So in supervised learning,
link |
humans have to give some annotation.
link |
And that also is now a feature where you can,
link |
essentially, if you've selected that flag,
link |
your data will not be reviewed by a human.
link |
So these are the types of controls
link |
that we have to constantly offer with customers.
link |
So why do you think it bothers people so much
link |
that, so everything you just said is really powerful.
link |
So the control, the ability to delete,
link |
because we collect, we have studies here running at MIT
link |
that collect huge amounts of data
link |
and people consent and so on.
link |
The ability to delete that data is really empowering.
link |
And almost nobody ever asked to delete it,
link |
but the ability to have that control is really powerful.
link |
But still, there's these popular anecdotal evidence
link |
that people say they like to tell that
link |
them and a friend were talking about something,
link |
I don't know, sweaters for cats.
link |
And all of a sudden they'll have advertisements
link |
for cat sweaters on Amazon.
link |
There's that, that's a popular anecdote
link |
as if something is always listening.
link |
Can you explain that anecdote,
link |
that experience that people have?
link |
What's the psychology of that?
link |
What's that experience?
link |
And can you, you've answered it,
link |
but let me just ask, is Alexa listening?
link |
No, Alexa listens only for the wake word on the device, right?
link |
And the wake word is?
link |
The words like Alexa, Amazon, Echo,
link |
and you, but you only choose one at a time.
link |
So you choose one and it listens only
link |
for that on our devices.
link |
So that's first, from a listening perspective,
link |
you have to be very clear that it's just the wake word.
link |
So you said, why is there this anxiety, if you may?
link |
It's because there's a lot of confusion
link |
what it really listens to, right?
link |
And I think it's partly on us to keep educating
link |
our customers and the general media more
link |
in terms of like what really happens
link |
and we've done a lot of it.
link |
And our pages on information are clear,
link |
but still people have to have more,
link |
there's always a hunger for information and clarity.
link |
And we'll constantly look at how best to communicate.
link |
If you go back and read everything,
link |
yes, it states exactly that.
link |
And then people could still question it.
link |
And I think that's absolutely okay to question.
link |
What we have to make sure is that we are,
link |
because our fundamental philosophy is customer first,
link |
customer obsession is our leadership principle.
link |
If you put, as researchers,
link |
I put myself in the shoes of the customer
link |
and all decisions in Amazon are made with that and that.
link |
And trust has to be earned
link |
and we have to keep earning the trust
link |
of our customers in this setting.
link |
And to your other point on like,
link |
is there something showing up
link |
based on your conversations?
link |
No, I think the answer is like you,
link |
a lot of times when those experiences happen,
link |
you have to also know that, okay,
link |
it may be a winter season,
link |
people are looking for sweaters, right?
link |
And it shows up on your Amazon.com
link |
because it is popular.
link |
So there are many of these,
link |
you mentioned that personality or personalization.
link |
Turns out we are not that unique either, right?
link |
So those things we, as humans, start thinking,
link |
oh, must be because something was heard
link |
and that's why this other thing showed up.
link |
Probably it is just the season for sweaters.
link |
I'm not gonna ask you this question
link |
because it's just, because you're also,
link |
because people have so much paranoia.
link |
But for my, let me just say, from my perspective,
link |
I hope there's a day when the customer
link |
can ask Alexa to listen all the time
link |
to improve the experience, to improve,
link |
because I personally don't see the negative
link |
because if you have the control
link |
and if you have the trust,
link |
there's no reason why it shouldn't be listening
link |
all the time to the conversations
link |
to learn more about you.
link |
Because ultimately, as long as you have control and trust,
link |
every data you provide to the device
link |
that the device wants is going to be useful.
link |
And so to me, as a machine learning person,
link |
I think it worries me how sensitive people are
link |
about their data relative to how empowering
link |
it could be for the devices around them,
link |
enriching it could be for their own life
link |
to improve the product.
link |
So it's something I think about sort of a lot,
link |
how do we make that devices?
link |
Obviously Alexa thinks about it a lot as well.
link |
I don't know if you wanna comment on that.
link |
So have you seen, let me ask it in the form of a question.
link |
Okay, have you seen an evolution
link |
in the way people think about their private data
link |
in the previous several years?
link |
So as we as a society get more and more comfortable
link |
to the benefits we get by sharing more data.
link |
First, let me answer that part
link |
and then I'll wanna go back
link |
to the other aspect you were mentioning.
link |
So as a society, on a general,
link |
we are getting more comfortable as a society.
link |
Doesn't mean that everyone is
link |
and I think we have to respect that.
link |
I don't think one size fits all
link |
is always gonna be the answer for all, right?
link |
So I think that's something to keep in mind in these.
link |
Going back to your point on what more magical experiences
link |
can be launched in these kind of AI settings.
link |
I think again, if you give the control,
link |
it's possible certain parts of it.
link |
So we have a feature called followup mode
link |
where if you turn it on and Alexa,
link |
after you've spoken to it will open the mics again,
link |
thinking you will answer something again.
link |
Like if you're adding items to your
link |
shopping list or to do list, you're not done.
link |
So in that setting, it's awesome
link |
that it opens the mic for you to say eggs and milk
link |
and then bread, right?
link |
So these are the kind of things which you can empower.
link |
So, and then another feature we have
link |
which is called Alexa Guard.
link |
I said it only listens for the wake word, all right?
link |
But if you have a, let's say you're going to say,
link |
Alexa, you leave your home and you want Alexa
link |
to listen for a couple of sound events
link |
like smoke alarm going off
link |
or someone breaking your glass, right?
link |
So it's like just to keep your peace of mind.
link |
So you can say Alexa on guard or I'm away
link |
or and then it can be listening for these sound events.
link |
And when you're home, you come out of that mode, right?
link |
So this is another one where you again gave controls
link |
in the hands of the user or the customer
link |
and to enable some experience that is high utility
link |
and maybe even more delightful in the certain settings
link |
like follow up mode and so forth.
link |
And again, this general principle is the same,
link |
control in the hands of the customer.
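[Editor's sketch] The three listening behaviors mentioned in this answer (wake word only by default, follow-up mode that reopens the microphone after a response, and a guard mode that listens for specific sound events such as a smoke alarm or breaking glass) can be summarized as a simple decision rule. This is a hedged Python sketch of that rule as described in the conversation; the names `Mode`, `GUARD_EVENTS`, and `should_process`, and the event labels, are hypothetical and do not reflect any real Alexa API.

```python
from enum import Enum, auto


class Mode(Enum):
    WAKE_WORD_ONLY = auto()  # default: act only after the wake word
    FOLLOW_UP = auto()       # customer opted in: mic reopens after a response
    GUARD = auto()           # customer said they are away: listen for sound events


# Assumption: the two sound events named in the conversation.
GUARD_EVENTS = {"smoke_alarm", "glass_break"}


def should_process(mode: Mode, event: str, has_wake_word: bool = False) -> bool:
    """Return True if the toy device would act on this input."""
    if event == "speech" and has_wake_word:
        return True  # the wake word always gets the device's attention
    if mode is Mode.FOLLOW_UP:
        return event == "speech"  # e.g. "eggs", "milk", "bread" after the first request
    if mode is Mode.GUARD:
        return event in GUARD_EVENTS
    return False
```

The point of the sketch is the design choice the speaker emphasizes: every mode beyond the default is something the customer explicitly turns on, so the rule never widens what is processed without an opt-in.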
link |
So I know we kind of started with a lot of philosophy
link |
and a lot of interesting topics
link |
and we're just jumping all over the place.
link |
But really some of the fascinating things
link |
that the Alexa team and Amazon is doing
link |
is in the algorithm side, the data side,
link |
the technology, the deep learning, machine learning
link |
So can you give a brief history of Alexa
link |
from the perspective of just innovation,
link |
the algorithms, the data of how it was born,
link |
how it came to be, how it has grown, where it is today?
link |
Yeah, it starts with the, in Amazon,
link |
everything starts with the customer.
link |
And we have a process called working backwards.
link |
Alexa, and more specifically than the product Echo,
link |
there was a working backwards document essentially
link |
that reflected what it would be,
link |
started with a very simple vision statement, for instance,
link |
that morphed into a full fledged document
link |
along the way it changed into what all it can do, right?
link |
You can, but the inspiration was the Star Trek computer.
link |
So when you think of it that way,
link |
everything is possible, but when you launch a product,
link |
you have to start with someplace.
link |
And when I joined, the product was already in conception
link |
and we started working on the far field speech recognition
link |
because that was the first thing to solve.
link |
By that we mean that you should be able to speak
link |
to the device from a distance.
link |
And in those days, that wasn't a common practice.
link |
And even in the previous research world I was in,
link |
it was considered an unsolvable problem then
link |
in terms of whether you can converse from a length.
link |
And here I'm still talking about the first part
link |
of the problem where you say,
link |
get the attention of the device,
link |
as in by saying what we call the wake word,
link |
which means the word Alexa has to be detected
link |
with a very high accuracy because it is a very common word.
link |
It has sound units that map with words like I like you
link |
or Alec, Alex, right?
link |
So it's an undoubtedly hard problem to detect
link |
the right mentions of Alexa addressed to the device
link |
versus I like Alexa.
link |
So you have to pick up that signal
link |
when there's a lot of noise.
link |
Not only noise, but a lot of conversation in the house, right?
link |
You remember on the device,
link |
you're simply listening for the wake word Alexa.
link |
And there's a lot of words being spoken in the house.
link |
How do you know it's Alexa?
link |
And directed at Alexa.
link |
Because I could say, I love my Alexa.
link |
I want Alexa to do this.
link |
And in all these three sentences I said Alexa,
link |
I didn't want it to wake up.
link |
So can I just pause on that for a second?
link |
What would be your advice that I should probably
link |
in the introduction of this conversation give to people
link |
in terms of with them turning off their Alexa device,
link |
if they're listening to this podcast conversation out loud?
link |
Like what's the probability that an Alexa device
link |
will go off because we mentioned Alexa
link |
like a million times.
link |
So it will, we have done a lot of different things
link |
where we can figure out that there is the device,
link |
the speech is coming from a human versus over the air.
link |
Also, I mean, in terms of like also it is think about ads
link |
or so we also launched a technology
link |
for watermarking kind of approaches
link |
in terms of filtering it out.
link |
But yes, if this kind of a podcast is happening,
link |
it's possible your device will wake up a few times, right?
link |
It's an unsolved problem, but it is definitely
link |
something we care very much about.
link |
But the idea is you want to detect Alexa.
link |
Meant for the device.
link |
I mean, first of all, just even hearing Alexa
link |
versus I like something, I mean, that's a fascinating part.
link |
So that was the first relief.
link |
The world's best detector of Alexa.
link |
Yeah, the world's best wake word detector
link |
in a far field setting, not like something
link |
where the phone is sitting on the table.
link |
This is like people have devices 40 feet away,
link |
like in my house or 20 feet away
link |
and you still get an answer.
link |
So that was the first part.
link |
The next is, okay, you're speaking to the device.
link |
Of course, you're gonna issue many different requests.
link |
Some may be simple, some may be extremely hard,
link |
but it's a large vocabulary speech recognition problem,
link |
essentially, where the audio is now not coming
link |
onto your phone or a handheld mic like this
link |
or a close talking mic, but it's from 20 feet away
link |
where if you're in a busy household,
link |
your son may be listening to music,
link |
your daughter may be running around with something
link |
and asking your mom something and so forth, right?
link |
So this is like a common household setting
link |
where the words you're speaking to Alexa
link |
need to be recognized with very high accuracy, right?
link |
Now, we're still just in the recognition problem.
link |
You haven't yet come to the understanding one, right?
link |
And if we pause, I'm sorry, once again,
link |
what year was this, is this before neural networks
link |
began to start to seriously prove themselves
link |
in the audio space?
link |
Yeah, this is around, so I joined in 2013 in April, right?
link |
So the early research in neural networks coming back
link |
and showing some promising results
link |
in speech recognition space had started happening,
link |
but it was very early.
link |
But we just now build on that on the very first thing we did
link |
when I joined the team and remember,
link |
it was a very much of a startup environment,
link |
which is great about Amazon.
link |
And we doubled down on deep learning right away
link |
and we knew we'll have to improve accuracy fast.
link |
And because of that, we worked on and the scale of data
link |
once you have a device like this, if it is successful,
link |
will improve big time.
link |
Like you'll suddenly have large volumes of data
link |
to learn from to make the customer experience better.
link |
So how do you scale deep learning?
link |
So we did one of the first works
link |
in training with distributed GPUs
link |
and where the training time was linear
link |
in terms of like in the amount of data.
link |
So that was quite important work
link |
where it was algorithmic improvements
link |
as well as a lot of engineering improvements
link |
to be able to train on thousands and thousands of hours of speech.
link |
And that was an important factor.
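[Editor's sketch] The claim that training time was linear in the amount of data can be made concrete with a little arithmetic. The Python below is an idealized model, not a description of the team's actual system: it assumes data-parallel training where each worker processes a fixed amount of speech per hour, and the function name, parameters, and all numbers are made up for illustration. The `efficiency` factor is a hypothetical knob for communication overhead.

```python
def training_hours(data_hours: float, workers: int,
                   speech_hours_per_worker_hour: float,
                   efficiency: float = 1.0) -> float:
    """Idealized wall-clock time for one pass over the training data.

    time = data / (workers * per-worker throughput * efficiency)
    """
    if workers < 1 or speech_hours_per_worker_hour <= 0 or not 0 < efficiency <= 1:
        raise ValueError("workers, throughput, and efficiency must be positive")
    return data_hours / (workers * speech_hours_per_worker_hour * efficiency)


# Made-up numbers: 1,000 hours of speech, 10 GPUs, 5 speech-hours per GPU-hour.
base = training_hours(1_000, 10, 5)
doubled_data = training_hours(2_000, 10, 5)
doubled_gpus = training_hours(2_000, 20, 5)
```

Under these assumptions, doubling the data doubles the training time, and doubling the number of GPUs brings it back down. Real systems fall short of this ideal, which is why the speaker credits both algorithmic and engineering improvements.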
link |
So if you ask me like back in 2013 and 2014
link |
when we launched Echo,
link |
the combination of large scale data,
link |
deep learning progress, near infinite GPUs
link |
we had available on AWS even then
link |
all came together for us to be able to
link |
solve the far field speech recognition
link |
to the extent it could be useful to the customers.
link |
It's still not solved.
link |
Like I mean, it's not that we are perfect at recognizing speech
link |
but we are great at it in terms of the settings
link |
that are in homes, right?
link |
So and that was important even in the early stages.
link |
So first of all, just even I'm trying to look back
link |
at that time, if I remember correctly it was,
link |
it seems like the task would be pretty daunting.
link |
So like, so we kind of take it for granted that it works now.
link |
Yes, so you're right.
link |
So let me like how, first of all you mentioned startup
link |
I wasn't familiar how big the team was.
link |
I kind of, because I know there's a lot of really smart
link |
people working on it.
link |
So now it's a very, very large team.
link |
How big was the team?
link |
How likely were you to fail in the eyes of everyone else?
link |
And yourself, so like what?
link |
I'll give you a very interesting anecdote on that.
link |
When I joined the team, the speech recognition team
link |
was six people, my first meeting
link |
and we had hired a few more people, it was 10 people.
link |
Nine out of 10 people thought it can't be done, right?
link |
The one was me, say, actually I should say,
link |
and one was semi optimistic and eight were trying to convince
link |
let's go to the management and say,
link |
let's not work on this problem,
link |
let's work on some other problem like either telephony speech
link |
for customer service calls and so forth.
link |
But this is the kind of belief you must have.
link |
And I had experience with far field speech recognition
link |
and my eyes lit up when I saw a problem like that saying,
link |
okay, we have been in speech recognition
link |
always looking for that killer app.
link |
And this was a killer use case
link |
to bring something delightful in the hands of customers.
link |
So you mentioned the way you kind of think of it
link |
in the product way in the future,
link |
have a press release and an FAQ and you think backwards.
link |
Did you have, did the team have the echo in mind?
link |
So this far field speech recognition,
link |
actually putting a thing in the home that works
link |
that's able to interact with,
link |
was that the press release?
link |
Very close, I would say in terms of the,
link |
as I said, the vision was the Star Trek computer, right?
link |
So, or the inspiration.
link |
And from there, I can't divulge all the exact specifications,
link |
but one of the first things that was magical on Alexa was music.
link |
It brought me back to music
link |
because my taste is still in when I was an undergrad.
link |
So I still listen to those songs
link |
and I, it was too hard for me
link |
to be a music fan with a phone, right?
link |
So I hate things in my ears.
link |
So from that perspective, it was quite hard
link |
and music was part of the,
link |
at least the documents I've seen, right?
link |
So from that perspective, I think, yes,
link |
in terms of how far are we from the original vision?
link |
I can't reveal that,
link |
but that's why I have a ton of fun at work
link |
because every day we go in
link |
and thinking like these are the new set of challenges to solve.
link |
Yeah, it's a great way to do great engineering
link |
as you think of the product, the press release.
link |
I like that idea actually.
link |
Maybe we'll talk about it a bit later,
link |
which is a super nice way to have a focus.
link |
I'll tell you this, you're a scientist
link |
and a lot of my scientists have adopted that.
link |
They have now, they love it as a process
link |
because it was very,
link |
as scientists, you're trained to write great papers,
link |
but they are all after you've done the research
link |
or you've proven like, and your PhD dissertation proposal
link |
is something that comes closest
link |
or a DARPA proposal or an NSF proposal
link |
is the closest that comes to a press release.
link |
But that process is now ingrained in our scientists,
link |
which is like delightful for me to see.
link |
You write the paper first and then make it happen.
link |
I mean, in fact, it's not...
link |
State of the art results.
link |
Or you leave the results section open,
link |
but you have a thesis about here's what I expect, right?
link |
And here's what it will change, right?
link |
So I think it is a great thing.
link |
It works for researchers as well.
link |
So far field recognition.
link |
What was the big leap?
link |
What were the breakthroughs
link |
and what was that journey like to today?
link |
Yeah, I think the, as you said first,
link |
there was a lot of skepticism
link |
on whether far field speech recognition
link |
will ever work to be good enough, right?
link |
And what we first did was got a lot of training data
link |
in a far field setting.
link |
And that was extremely hard to get
link |
because none of it existed.
link |
So how do you collect data in far field setup, right?
link |
With no customer base at this time.
link |
With no customer base, right?
link |
So that was first innovation.
link |
And once we had that, the next thing was, okay,
link |
if you have the data, first of all,
link |
we didn't talk about like,
link |
what would magical mean in this kind of a setting?
link |
What is good enough for customers, right?
link |
That's always, since you've never done this before,
link |
what would be magical?
link |
So it wasn't just a research problem.
link |
You had to put some, in terms of accuracy
link |
and customer experience features,
link |
some stakes in the ground saying,
link |
here's where I think it should get to.
link |
So you established a bar
link |
and then how do you measure progress towards it?
link |
Given you have no customers right now.
link |
So from that perspective, we went,
link |
so first was the data without customers.
link |
Second was doubling down on deep learning as a way to learn.
link |
And I can just tell you that the combination of the two
link |
got our error rates down by a factor of five.
link |
From where we were when I started to,
link |
within six months of having that data,
link |
we, at that point, I got the conviction
link |
that this will work, right?
link |
So because that was magical
link |
in terms of when it started working.
link |
And that reached the magic bar, became close to the magical bar.
link |
That to the bar, right?
link |
That we felt would be where people will use it,
link |
which was critical.
link |
Because you really have one chance at this.
link |
If we had launched in November,
link |
2014 is when we launched,
link |
if it was below the bar,
link |
I don't think this category exists if you don't meet the bar.
link |
Yeah, and just having looked at voice based interactions,
link |
like in the car or earlier systems,
link |
it's a source of huge frustration for people.
link |
In fact, we use voice based interaction
link |
for collecting data on subjects to measure frustration.
link |
So as a training set for computer vision, for face data,
link |
so we can get a data set of frustrated people.
link |
That's the best way to get frustrated people
link |
is having them interact with a voice based system in the car.
link |
So that bar, I imagine, was pretty high.
link |
And we talked about how also errors are perceived
link |
from AIs versus errors by humans.
link |
But we are not done with the problems that ended up,
link |
we had to solve to get it to launch.
link |
So do you want the next one?
link |
Yeah, the next one.
link |
So the next one was what I think of as
link |
multi domain natural language understanding.
link |
It's very, I wouldn't say easy,
link |
but it is during those days,
link |
solving it, understanding in one domain, a narrow domain,
link |
was doable, but for these multiple domains,
link |
like music, like information,
link |
other kinds of household productivity, alarms, timers,
link |
even though it wasn't as big as it is,
link |
in terms of the number of skills Alexa has
link |
and the confusion space has grown by three orders of magnitude,
link |
it was still daunting even those days.
link |
Again, no customer base yet.
link |
Again, no customer base.
link |
So now you're looking at meaning understanding
link |
and intent understanding and taking actions
link |
on behalf of customers based on their requests.
link |
And that is the next hard problem.
link |
Even if you have gotten the words recognized,
link |
how do you make sense of them?
link |
In those days, there was still a lot of emphasis
link |
on rule based systems for writing grammar patterns
link |
to understand the intent,
link |
but we had a statistical first approach even then,
link |
where for our language understanding,
link |
we had in even those starting days an entity recognizer
link |
and an intent classifier, which was all trained statistically.
link |
In fact, we had to build the deterministic matching
link |
as a follow up to fix bugs that statistical models have, right?
link |
So it was just a different mindset
link |
where we focused on data driven statistical understanding.
link |
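The "statistical first, deterministic fix-up second" mindset described here can be illustrated with a minimal sketch. This is not Alexa's implementation: the training examples, the intent names, the word-count scoring, and the `EXACT_MATCH_FIXES` table are all hypothetical, and only the intent classifier half of the pipeline is shown (the entity recognizer is omitted).

```python
from collections import Counter

# Hypothetical training data: (utterance, intent) pairs.
TRAINING = [
    ("play songs by the stones", "PlayMusic"),
    ("play some jazz", "PlayMusic"),
    ("set a timer for ten minutes", "SetTimer"),
    ("set an alarm for six", "SetAlarm"),
    ("what is the weather today", "GetWeather"),
]

# Hypothetical deterministic overrides, added later to patch known model errors.
EXACT_MATCH_FIXES = {"stop": "Stop"}


def train(examples):
    """Count how often each word appears under each intent."""
    counts = {}
    for text, intent in examples:
        counts.setdefault(intent, Counter()).update(text.split())
    return counts


def classify(utterance, counts):
    """Apply a deterministic fix if one exists, else take the statistical guess."""
    text = utterance.lower().strip()
    if text in EXACT_MATCH_FIXES:
        return EXACT_MATCH_FIXES[text]
    words = text.split()
    # Score each intent by summing the training counts of the words in the request.
    scores = {intent: sum(c[w] for w in words) for intent, c in counts.items()}
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("set a timer for five minutes", model))
print(classify("stop", model))
```

The design point is that the learned model handles the open-ended requests, while the lookup table stays small and exists only to correct specific failures.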
Wins in the end if you have a huge data set?
link |
Yes, it is contingent on that.
link |
And that's why it came back to how do you get the data?
link |
Before customers, the fact that this is why data
link |
becomes crucial to get to the point
link |
that you have the understanding system built in, built up.
link |
And notice that we were talking about human machine dialogue
link |
and even those early days,
link |
even it was very much transactional,
link |
do one thing, one shot utterances in great way.
link |
There was a lot of debate on how much should Alexa talk back
link |
in terms of if it misunderstood you
link |
or you said play songs by the Stones
link |
and let's say it doesn't know early days,
link |
knowledge can be sparse.
link |
Who are the stones, right?
link |
It's the Rolling Stones, right?
link |
So, and you don't want the match
link |
to be Stone Temple Pilots or Rolling Stones, right?
link |
So you don't know which one it is.
link |
So these kind of other signals to,
link |
no, there we had great assets, right?
link |
From Amazon in terms of.
link |
UX, like what is it?
link |
What kind of, yeah, how do you solve that problem?
link |
In terms of what we think of it
link |
as an entity resolution problem, right?
link |
So, which one is it, right?
link |
I mean, even if you figured out the Stones as an entity,
link |
you have to resolve it to whether it's the Rolling Stones
link |
or the Stone Temple Pilots or some other Stones.
link |
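A minimal sketch of the entity resolution step for a mention like "the Stones", assuming a tiny hypothetical catalog with made-up popularity priors. The scoring rule (token overlap times prior) and the `margin` threshold are illustrative choices, not the method used in production.

```python
# Hypothetical catalog: artist name -> popularity prior (made-up numbers).
CATALOG = {
    "The Rolling Stones": 0.90,
    "Stone Temple Pilots": 0.60,
    "Stone Sour": 0.30,
}


def _tokens(text):
    """Lowercase, drop 'the', and crudely strip trailing 's' so 'stones' matches 'stone'."""
    return {w.rstrip("s") for w in text.lower().split() if w != "the"}


def resolve(mention, catalog=CATALOG, margin=0.2):
    """Return (artist, None) when confident, or (None, question) when ambiguous."""
    want = _tokens(mention)
    scored = []
    for name, prior in catalog.items():
        overlap = len(want & _tokens(name))
        if overlap:
            scored.append((overlap * prior, name))
    if not scored:
        return None, "Sorry, I did not understand you."
    scored.sort(reverse=True)
    # If the top two candidates are too close, ask instead of guessing.
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None, f"Did you mean {scored[0][1]} or {scored[1][1]}?"
    return scored[0][1], None


print(resolve("the stones"))
print(resolve("the stones", margin=0.5))
```

With the default margin the popular candidate wins outright; raising the margin forces a clarifying question, which is the algorithm-versus-UX trade-off raised in the next exchange.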
Maybe I misunderstood, is the resolution
link |
the job of the algorithm or is the job of UX
link |
communicating with the human to help the resolution?
link |
Well, there is both, right?
link |
It is, you want 90% or high 90s to be done
link |
without any further questioning or UX, right?
link |
So, but it's absolutely okay.
link |
Just like as humans, we ask the question,
link |
I didn't understand you, Lex.
link |
It's fine for Alexa to occasionally say,
link |
I did not understand you, right?
link |
And that's an important way to learn.
link |
And I'll talk about where we have come
link |
with more self learning with these kind of feedback signals.
link |
But in those days, just solving the ability
link |
of understanding the intent and resolving to an action
link |
where action could be play a particular artist
link |
or a particular song was super hard.
link |
Again, the bar was high as we were talking about, right?
link |
So, while we launched it in sort of 13 big domains,
link |
I would say in terms of our thing,
link |
we think of it as 13 of the big skills we had,
link |
like music is a massive one when we launched it.
link |
And now we have 90,000 plus skills on Alexa.
link |
So, what are the big skills?
link |
Can you just go over them?
link |
Because the only thing I use it for
link |
is music, weather, and shopping.
link |
So, we think of it as music information, right?
link |
So, weather is a part of information, right?
link |
So, when we launched, we didn't have smart home,
link |
but within, by smart home, I mean,
link |
you connect your smart devices,
link |
you control them with voice.
link |
If you haven't done it, it's worth,
link |
it will change your life.
link |
By turning on the lights and so on.
link |
Yeah, turning on your light to anything that's connected
link |
and has a, it's just that.
link |
What's your favorite smart device for you?
link |
And now you have the smart plug with,
link |
and you don't, you also have this Echo plug, which is...
link |
Oh yeah, you can plug in anything.
link |
You can plug in anything,
link |
and now you can turn that one on and off.
link |
I'll use this conversation as motivation
link |
to get one.
link |
The garage door, you can check your status
link |
of the garage door and things like,
link |
and we have gone on to make Alexa more and more proactive,
link |
where it even has hunches now,
link |
hunches like, you left your light on.
link |
Let's say you've gone to your bed
link |
and you left the garage light on.
link |
So, it will help you out in these settings, right?
link |
That's smart devices.
link |
Information smart devices, you said music.
link |
Yeah, so I don't remember everything we had,
link |
but alarms and timers were the big ones,
link |
like that was, the timers were very popular right away.
link |
Music also, like you could play song, artist, album,
link |
everything, and so that was like a clear win
link |
in terms of the customer experience.
link |
So that's, again, this is language understanding.
link |
Now things have evolved, right?
link |
So where we want Alexa definitely to be
link |
more accurate, competent, trustworthy,
link |
based on how well it does these core things,
link |
but we have evolved in many different dimensions.
link |
First is what I think of as being
link |
more conversational for high utility,
link |
not just for chat, right?
link |
And there, at re:MARS this year,
link |
which is our AI conference,
link |
we launched what is called Alexa Conversations.
link |
That is providing the ability for developers to author
link |
multi turn experiences on Alexa with no code, essentially,
link |
where in terms of the dialogue code,
link |
initially it was like, you know, all these IVR systems,
link |
you have to fully author,
link |
if the customer says this, do that, right?
link |
So the whole dialogue flow is hand authored.
link |
And with Alexa Conversations,
link |
the way it is that you just provide a sample interaction data
link |
with your service or an API,
link |
let's say you're Atom Tickets
link |
that provides a service for buying movie tickets.
link |
You provide a few examples
link |
of how your customers will interact with your APIs.
link |
And then the dialogue flow is automatically constructed
link |
using a recurrent neural network trained on that data.
link |
So that simplifies the developer experience.
link |
We just launched our preview for the developers
link |
to try this capability out.
link |
And then the second part of it,
link |
which shows even increased utility for customers,
link |
is you and I, when we interact with Alexa or any customer,
link |
as I'm coming back to our initial part of the conversation,
link |
the goal is often unclear or unknown to the AI.
link |
If I say, Alexa, what movies are playing nearby?
link |
Am I trying to just buy movie tickets?
link |
Am I actually even, do you think I'm looking
link |
for just movies for curiosity,
link |
whether the Avengers is still in theater or when is it?
link |
Maybe it's gone and maybe I missed it.
link |
So I may watch it on Prime, which happened to me.
link |
So from that perspective now,
link |
you're looking into what is my goal?
link |
And let's say I now complete the movie ticket purchase.
link |
Maybe I would like to get dinner nearby.
link |
So what is really the goal here?
link |
Is it night out or is it movies?
link |
As in just go watch a movie?
link |
The answer is, we don't know.
link |
So can Alexa now figure we have the intelligence
link |
that I think this meta goal is really night out
link |
or at least say to the customer
link |
when you've completed the purchase of movie tickets
link |
from Atom Tickets or Fandango or pick your anyone.
link |
Then the next thing is,
link |
do you want to get an Uber to the theater?
link |
Or do you want to book a restaurant next to it?
link |
And then not ask the same information over and over again.
link |
What time, how many people in your party?
link |
So this is where you shift the cognitive burden
link |
from the customer to the AI,
link |
where it's thinking of what is your,
link |
it anticipates your goal
link |
and takes the next best action to complete it.
link |
Now that's the machine learning problem.
link |
But essentially the way we solved this first instance
link |
and we have a long way to go to make it scale
link |
to everything possible in the world.
link |
But at least for this situation,
link |
it is from at every instance,
link |
Alexa is making the determination,
link |
whether it should stick with the experience
link |
with Atom Tickets or offer or you,
link |
based on what you say,
link |
whether either you have completed the interaction
link |
or you said, no, get me an Uber now.
link |
So it will shift context
link |
into another experience or skill or another service.
link |
So that's a dynamic decision making.
link |
That's making Alexa,
link |
you can say more conversational
link |
for the benefit of the customer
link |
rather than simply complete transactions
link |
which are well thought through.
link |
You as a customer have fully specified
link |
what you want to be accomplished.
link |
It's accomplishing that.
link |
So it's kind of as,
link |
we do this with pedestrians,
link |
like intent modeling is predicting
link |
what your possible goals are
link |
and what's the most likely goal
link |
and then switching that depending on the things you say.
link |
So my question is there,
link |
it seems maybe it's a dumb question,
link |
but it would help a lot if Alexa remembered me
link |
what I said previously.
link |
Is it trying to use some memories
link |
for the customers?
link |
It is using a lot of memory within that.
link |
So right now, not so much in terms of,
link |
okay, which restaurant do you prefer?
link |
That is a more long term memory,
link |
but within the short term memory,
link |
within the session,
link |
it is remembering how many people did you,
link |
so if you said buy four tickets,
link |
now it has made an implicit assumption
link |
that you are gonna have,
link |
you need at least four seats at a restaurant, right?
link |
So these are the kind of context it's preserving
link |
between these skills, but within that session,
link |
but you're asking the right question
link |
in terms of for it to be more and more useful,
link |
it has to have more long term memory
link |
and that's also an open question.
link |
And again, these are still early days.
link |
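The within-session memory described above (buy four tickets, then assume at least four seats at the restaurant) can be sketched as a shared slot store. The `Session` class, the two skill functions, and the slot names are hypothetical and exist only to show context being carried from one skill to the next.

```python
class Session:
    """Short-term memory shared by the skills invoked in one session."""

    def __init__(self):
        self.slots = {}

    def remember(self, **slots):
        self.slots.update(slots)


def buy_tickets(session, movie, count, time):
    # Hypothetical ticketing skill: record what the customer already told us.
    session.remember(party_size=count, show_time=time)
    return f"Bought {count} tickets for {movie} at {time}."


def book_restaurant(session, party_size=None):
    # Hypothetical restaurant skill: reuse the session value before asking again.
    size = party_size or session.slots.get("party_size")
    if size is None:
        return "How many people are in your party?"
    return f"Booked a table for {size}."


session = Session()
print(buy_tickets(session, "Avengers", 4, "7pm"))
print(book_restaurant(session))
```

Long-term memory, such as which restaurant a customer prefers, would need a store that outlives the session, which the speaker describes as still an open question.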
So for me, I mean, everybody's different,
link |
but yeah, I'm definitely not representative
link |
of the general population
link |
in the sense that I do the same thing every day.
link |
Like I eat the same,
link |
like I do everything the same, the same thing.
link |
Wear the same thing clearly, this or the black shirt.
link |
So it's frustrating when Alexa doesn't get what I'm saying
link |
because I have to correct her every time in the exact same way.
link |
This has to do with certain songs.
link |
Like she doesn't know certain weird songs.
link |
And doesn't know, I've complained to Spotify about this,
link |
I talked to the head of R&D at Spotify, Stairway to Heaven.
link |
I have to correct it every time.
link |
It doesn't play Led Zeppelin correctly.
link |
It plays cover of Led Zeppelin.
link |
So you should figure out,
link |
you should send me your next time it fails.
link |
Feel free to send it to me.
link |
We will take care of it.
link |
Because Led Zeppelin is one of my favorite bands
link |
that it works for me.
link |
So I'm like shocked, it doesn't work for you.
link |
This is an official bug report.
link |
I'll put it, I'll make it public or make everybody retweet it.
link |
We're gonna fix the Stairway to Heaven problem.
link |
Anyway, but the point is,
link |
you know, I'm pretty boring and do the same thing.
link |
But I'm sure most people do the same set of things.
link |
Do you see Alexa sort of utilizing that in the future
link |
for improving the experience?
link |
And not only utilizing, it's already doing some of it.
link |
We call it where Alexa is becoming more self learning.
link |
So Alexa is now auto correcting millions and millions
link |
of utterances in the US without any human supervision involved.
link |
The way it does it is, let's take an example
link |
of a particular song didn't work for you.
link |
What do you do next?
link |
You either, it played the wrong song and you said,
link |
Alexa, no, that's not the song I want.
link |
Or you say Alexa, play that, you try it again.
link |
And that is a signal to Alexa
link |
that she may have done something wrong.
link |
And from that perspective, we can learn
link |
if there's that failure pattern or that action
link |
of song A was played when song B was requested.
link |
And it's very common with station names because play NPR,
link |
you can have N be confused as an M.
link |
And then you, for a certain accent like mine,
link |
people confuse my N and M all the time.
link |
And because I have an Indian accent,
link |
they're confusable to humans.
link |
It is for Alexa too.
link |
And in that part, but it starts auto correcting.
link |
And we collect, we correct a lot of these automatically
link |
without a human looking at the failures.
link |
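A minimal sketch of the self-learning loop just described, assuming the only signal is a failed request followed by a successful rephrase (for example "play MPR" followed by "play NPR"). The `RewriteLearner` class and its `min_count` threshold are hypothetical; the source does not say how the real system aggregates these signals.

```python
from collections import Counter


class RewriteLearner:
    """Learn utterance rewrites from customers' own corrections, with no labels."""

    def __init__(self, min_count=3):
        self.min_count = min_count  # hypothetical evidence threshold
        self.pairs = Counter()      # (failed, corrected) -> times observed

    def observe(self, failed, corrected):
        """Record that a request was rejected and then rephrased successfully."""
        self.pairs[(failed, corrected)] += 1

    def rewrite(self, utterance):
        """Return the best-supported correction, or the utterance unchanged."""
        candidates = [
            (n, fix)
            for (bad, fix), n in self.pairs.items()
            if bad == utterance and n >= self.min_count
        ]
        return max(candidates)[1] if candidates else utterance


learner = RewriteLearner(min_count=2)
learner.observe("play mpr", "play npr")
learner.observe("play mpr", "play npr")
print(learner.rewrite("play mpr"))
```

Requiring several independent observations before rewriting keeps one customer's unusual follow-up from changing behavior for everyone.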
So the, one of the things that's for me missing in Alexa,
link |
I don't know if I'm a representative customer,
link |
but every time I correct it,
link |
it would be nice to know that that made a difference.
link |
You know what I mean?
link |
Like the sort of like, I heard you like a sort of.
link |
Some acknowledgement of that.
link |
We work a lot with Tesla, we study autopilot and so on.
link |
And a large amount of the customers that use Tesla autopilot,
link |
they feel like they're always teaching the system.
link |
They're almost excited by the possibility
link |
that they're teaching.
link |
I don't know if Alexa customers generally think of it
link |
as they're teaching to improve the system.
link |
And that's a really powerful thing.
link |
Again, I would say it's a spectrum.
link |
Some customers do think that way.
link |
And some would be annoyed by Alexa acknowledging that.
link |
Or so there's a, again, no one,
link |
you know, while there are certain patterns,
link |
not everyone is the same in this way.
link |
But we believe that again, customers helping Alexa
link |
is a tenet for us in terms of improving it.
link |
And more self learning is by, again,
link |
this is like fully unsupervised, right?
link |
There is no human in the loop and no labeling happening.
link |
And based on your actions as a customer,
link |
Alexa becomes smarter.
link |
Again, it's early days,
link |
but I think this whole area of teachable AI
link |
is gonna get bigger and bigger in the whole space,
link |
especially in the AI assistant space.
link |
So that's the second part where I mentioned
link |
more conversational, this is more self learning.
link |
The third is more natural.
link |
And the way I think of more natural
link |
is we talked about how Alexa sounds.
link |
And we've done a lot of advances in our text to speech
link |
by using again, neural network technology
link |
for it to sound very human like.
link |
From the individual texture of the sound
link |
to the timing, the tonality, the tone, everything.
link |
I would think in terms of,
link |
there's a lot of controls in each of the places
link |
for how, I mean, the speed of the voice,
link |
the prosodic patterns,
link |
the actual smoothness of how it sounds.
link |
All of those are factored and we do ton of listening tests
link |
to make sure it was that,
link |
but naturalness, how it sounds should be very natural.
link |
How it understands requests is also very important.
link |
Like, and in terms of, like we have 95,000 skills
link |
and if we have, imagine that in many of these skills,
link |
you have to remember the skill name
link |
and say Alexa, ask the Tide skill to tell me X, right?
link |
Or now, if you have to remember the skill name,
link |
that means the discovery and the interaction is unnatural.
link |
And we are trying to solve that by what we think of as,
link |
again, this was, you don't have to have the app metaphor here.
link |
These are not individual apps, right?
link |
Even though they're,
link |
so you're not sort of opening one at a time and interacting.
link |
So it should be seamless because it's voice.
link |
And when it's voice,
link |
you have to be able to understand these requests
link |
independent of the specificity, like a skill name.
link |
And to do that, what we have done is again,
link |
built a deep learning based capability
link |
where we shortlist a bunch of skills
link |
when you say, Alexa, get me a car.
link |
And then we figure it out, okay,
link |
it's meant for an Uber skill versus a Lyft
link |
or based on your preferences.
link |
And then you can rank the responses from the skill
link |
and then choose the best response for the customer.
link |
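The shortlist-then-rank flow for a name-free request such as "Alexa, get me a car" can be sketched as two steps. The source describes a deep learning based shortlister; this sketch swaps in plain keyword overlap so it stays self-contained, and the `SKILLS` registry and preference scores are hypothetical.

```python
# Hypothetical skill registry: skill name -> keywords it can handle.
SKILLS = {
    "Uber": {"car", "ride"},
    "Lyft": {"car", "ride"},
    "Domino's": {"pizza"},
}


def shortlist(utterance, skills=SKILLS):
    """Keep only skills whose keywords overlap the words in the request."""
    words = set(utterance.lower().replace(",", "").split())
    return [name for name, keywords in skills.items() if words & keywords]


def choose_skill(utterance, preferences, skills=SKILLS):
    """Rank shortlisted skills by a per-customer preference score."""
    candidates = shortlist(utterance, skills)
    if not candidates:
        return None
    return max(candidates, key=lambda name: preferences.get(name, 0.0))


print(shortlist("Alexa, get me a car"))
print(choose_skill("Alexa, get me a car", {"Lyft": 0.8, "Uber": 0.3}))
```

Splitting the work this way means the expensive ranking step only sees a handful of candidates rather than the whole skill catalog.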
So that's on the more natural,
link |
other examples of more natural is like,
link |
we were talking about lists, for instance.
link |
And you want to, you don't want to say Alexa add milk,
link |
Alexa add eggs, Alexa add cookies.
link |
No, Alexa add cookies, milk and eggs,
link |
and that in one shot, right?
link |
So that works, that helps with the naturalness.
link |
We talked about memory, like if you said,
link |
you can say Alexa, remember, I have to go to mom's house
link |
or you may have entered a calendar event
link |
through your calendar that's linked to Alexa.
link |
You don't want to remember whether it's in my calendar
link |
or did I tell you to remember something
link |
or some other reminder, right?
link |
So you have to now, independent of how customers
link |
create these events, it should just say Alexa,
link |
when do I have to go to mom's house?
link |
And it tells you when you have to go to mom's house.
link |
Now that's a fascinating problem.
link |
Who's that problem on?
link |
So there's people who create skills.
link |
Who's tasked with integrating all of that knowledge together?
link |
So the skills become seamless.
link |
Is it the creators of the skills or is it an infrastructure
link |
that Alexa provides problem?
link |
I think the large problem in terms
link |
of making sure your skill quality is high,
link |
that has to be done by our tools because it's a,
link |
so these skills, just to put the context,
link |
they're built through Alexa Skills Kit,
link |
which is a self serve way of building an experience on Alexa.
link |
This is like any developer in the world
link |
could go to Alexa Skills Kit
link |
and build an experience on Alexa.
link |
Like if you're Domino's,
link |
you can build a Domino's skill.
link |
For instance, that does pizza ordering.
link |
When you've authored that,
link |
you do want to now, if people say Alexa, open Domino's
link |
or Alexa, ask Domino's to get a particular type of pizza,
link |
that will work, but the discovery is hard.
link |
You can't just say Alexa, get me a pizza
link |
and then Alexa figures out what to do.
link |
That latter part is definitely our responsibility
link |
in terms of when the request is not fully specific.
link |
How do you figure out what's the best skill
link |
or a service that can fulfill the customer's request?
link |
And it can keep evolving.
link |
Imagine going to the situation I said,
link |
which was the night out planning that it,
link |
the goal could be more than that individual request
link |
that came up. A pizza ordering could mean a night in,
link |
When you're having an event with your kids in their house
link |
and you're, so this is welcome to the world of conversational AI.
link |
This is super exciting because it's not the academic problem
link |
of NLP, of natural language processing,
link |
understanding dialogue.
link |
This is like real world.
link |
And there's the stakes are high in the sense
link |
that customers get frustrated quickly,
link |
people get frustrated quickly.
link |
So you have to get it right.
link |
You have to get that interaction right.
link |
So it's, I love it.
link |
But so from that perspective, what are the challenges today?
link |
What are the problems that really need to be solved
link |
in the next few years?
link |
I think first and foremost, as I mentioned that
link |
get the basics right is still true.
link |
Basically, even the one shot request,
link |
which we think of as transactional request
link |
needs to work magically, no question about that.
link |
If it doesn't turn your light on and off,
link |
you'll be super frustrated.
link |
Even if I can complete the night out for you
link |
and not do that, that is unacceptable for you as a customer.
link |
So you have to get the foundational understanding right.
link |
The second aspect when I said more conversational
link |
is, as you imagine, is more about reasoning.
link |
It is really about figuring out what the latent goal is
link |
of the customer based on what I have the information now
link |
and the history and what's the next best thing to do.
link |
So that's a complete reasoning
link |
and decision making problem.
link |
Just like your self driving car,
link |
but the goal is still more finite.
link |
Here it evolves, your environment is super hard
link |
and self driving and the cost of a mistake is huge.
link |
Here, but there are certain similarities,
link |
but if you think about how many decisions Alexa is making
link |
or evaluating at any given time,
link |
it's a huge hypothesis space.
link |
And we've only talked so far
link |
about what I think of as reactive decisions
link |
in terms of you asked for something
link |
and Alexa is reacting to it.
link |
If you bring the proactive part,
link |
which is Alexa having hunches.
link |
So any given instance then,
link |
it's really a decision at any given point
link |
based on the information.
link |
Alexa has to determine what's the best thing it needs to do.
link |
So this is the ultimate AI problem
link |
about decisions based on the information you have.
link |
Do you think, just from my perspective,
link |
I work a lot with sensing of the human face.
link |
Do you think they'll,
link |
and we touched this topic a little bit earlier,
link |
but do you think there'll be a day soon
link |
when Alexa can also look at you
link |
to help improve the quality of the hunch it has
link |
or at least detect frustration or detect,
link |
improve the quality of its perception
link |
of what you're trying to do?
link |
I mean, let me again bring back to what it already does.
link |
We talked about how based on you barge in over Alexa,
link |
clearly it's a very high probability
link |
it must have done something wrong.
link |
That's why you barged in.
link |
The next extension of whether frustration
link |
is a signal or not,
link |
of course, is a natural thought
link |
in terms of how that should be a signal too.
link |
You can get that from voice.
link |
You can get from voice, but it's very hard.
link |
Like, I mean, frustration as a signal historically,
link |
if you think about emotions of different kinds,
link |
there's a whole field of affective computing,
link |
something that MIT has also done a lot of research in.
link |
And you're now talking about a far field device
link |
as in you're talking from a distance, in a noisy environment.
link |
And in that environment,
link |
it needs to have a good sense for your emotions.
link |
This is a very, very hard problem.
link |
Very hard problem, but you haven't shied away
link |
from hard problems.
link |
So deep learning has been at the core
link |
of a lot of this technology.
link |
Are you optimistic about the current deep learning approaches
link |
to solving the hardest aspects of what we're talking about?
link |
Or do you think there will come a time
link |
where new ideas need to,
link |
if you look at reasoning,
link |
so OpenAI, DeepMind,
link |
a lot of folks are now starting to work in reasoning,
link |
trying to see how we can make neural networks reason.
link |
Do you see that new approaches need to be invented
link |
to take the next big leap?
link |
Absolutely, I think there has to be a lot more investment
link |
and I think in many different ways.
link |
And there are these, I would say nuggets of research
link |
forming in a good way,
link |
like learning with less data
link |
or like zero-shot learning, one-shot learning.
link |
And the active learning stuff you've talked about
link |
is incredible stuff.
link |
So transfer learning is also super critical,
link |
especially when you're thinking about applying knowledge
link |
from one task to another or one language to another, right?
link |
That's really ripe.
link |
So these are great pieces.
link |
Deep learning has been useful too.
link |
And now we are sort of marrying deep learning
link |
with transfer learning and active learning,
link |
of course, that's more straightforward
link |
in terms of applying deep learning
link |
in an active learning setup.
link |
But I do think in terms of now looking
link |
into more reasoning based approaches
link |
is going to be key for our next wave of the technology.
link |
But there is good news.
link |
The good news is that I think for keeping on
link |
to delight customers,
link |
that a lot of it can be done by prediction tasks.
link |
So, and so we haven't exhausted that.
link |
So we don't need to give up
link |
on the deep learning approaches for that.
link |
So that's just, I wanted to sort of
link |
creating a rich, fulfilling, amazing experience
link |
that makes Amazon a lot of money
link |
and everybody a lot of money
link |
because it does awesome things, deep learning is enough.
link |
The point, the point.
link |
I don't think, no, I mean,
link |
I wouldn't say deep learning is enough.
link |
I think for the purposes of Alexa
link |
accomplishing tasks for customers,
link |
I'm saying there's still a lot of things we can do
link |
with prediction based approaches that do not reason.
link |
I'm not saying that, and we haven't exhausted those,
link |
but for the kind of high utility experiences
link |
that I'm personally passionate about
link |
of what Alexa needs to do, reasoning has to be solved.
link |
To the same extent as you can think
link |
of natural language understanding
link |
and speech recognition to the extent
link |
of understanding intents has been,
link |
how accurate it has become.
link |
But with reasoning, we are in very, very early days.
link |
Let me ask you another way.
link |
How hard of a problem do you think that is?
link |
I would say the hardest of them because, again,
link |
the hypothesis space is really, really large.
link |
And when you go back in time, like you were saying,
link |
I want Alexa to remember more things.
link |
That once you go beyond a session of interaction,
link |
which is by session, I mean a time span,
link |
which is today, versus remembering
link |
which restaurant I like.
link |
And then when I'm planning a night out to say,
link |
do you want to go to the same restaurant?
link |
Now you've upped the stakes big time.
link |
And this is where the reasoning dimension
link |
also goes way, way bigger.
link |
So you think the space, let me elaborate on
link |
that a little bit, just philosophically speaking.
link |
Do you think when you reason about trying to model
link |
what the goal of a person is in the context
link |
of interacting with Alexa, you think that space is huge?
link |
Do you think so like another sort of devil's advocate
link |
would be that we human beings are really simple
link |
and we all want like just a small set of things.
link |
And so you think it's possible because we're not talking
link |
about a fulfilling general conversation.
link |
Perhaps actually the Alexa prize
link |
is a little bit more about after that.
link |
Creating a customer, like there's so many
link |
of the interactions, it feels like are clustered
link |
in groups that don't require general reasoning.
link |
I think yeah, you're right in terms of the head
link |
of the distribution of all the possible things
link |
customers may want to accomplish.
link |
But the tail is long and it's diverse, right?
link |
So from that perspective, I think you have
link |
to solve that problem otherwise.
link |
And everyone's very different.
link |
Like I mean, we see this already in terms of the skills, right?
link |
I mean, if you're an avid surfer, which I am not, right?
link |
But somebody is asking Alexa about surfing conditions, right?
link |
And there's a skill that is there for them to get to, right?
link |
That tells you that the tail is massive.
link |
Like in terms of like what kind of skills
link |
people have created, it's humongous in terms of it.
link |
And which means there are these diverse needs.
link |
And when you start looking at the combinations of these, right?
link |
Even if you just take pairs of skills, 90,000 choose two,
link |
it's still a big combination.
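To put a number on "90,000 choose two", here is a minimal sketch in Python. The 90,000 figure is the skill count quoted in the conversation; the function name `skill_pairs` is made up for illustration and is not anything from Alexa.

```python
from math import comb


def skill_pairs(num_skills: int) -> int:
    """Count unordered pairs of distinct skills, i.e. n choose 2."""
    return comb(num_skills, 2)


# 90,000 is the skill count mentioned in the conversation (an assumption here).
print(skill_pairs(90_000))  # 90000 * 89999 / 2 = 4049955000
```

That is roughly four billion possible pairs, before considering triples or longer chains of skills.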
link |
So I'm saying there's a huge to do here now.
link |
And I think customers are wonderfully frustrated with things
link |
and we have to keep getting better at doing things for them.
link |
And they're not known to be super patient.
link |
So you have to do it fast.
link |
You have to do it fast.
link |
So you've mentioned the idea of a press release,
link |
that in research and development at Amazon Alexa, and Amazon in general,
link |
you kind of think of what the future product will look like
link |
and you kind of make it happen, you work backwards.
link |
So can you draft for me?
link |
You probably already have one, but can you make up one?
link |
For 10, 20, 30, 40 years out
link |
that you see the Alexa team putting out
link |
just in broad strokes, something that you dream about?
link |
I think let's start with the five years first.
link |
Right? And then I'll get to the 40, which is too hard to take on.
link |
Because I'm pretty sure you have a real five year one.
link |
Because I didn't want to.
link |
But yeah, in broad strokes, let's start with five years.
link |
I think the five years is where, I mean, I think of in these spaces,
link |
it's hard, especially if you're in the thick of things,
link |
to think beyond the five years space
link |
because a lot of things change, right?
link |
I mean, if you ask me five years back,
link |
will Alexa be here?
link |
I wouldn't have, I think it has surpassed
link |
my imagination of that time, right?
link |
So I think from the next five years perspective,
link |
from an AI perspective, what we're going to see
link |
is that notion which you said,
link |
goal oriented dialogues and open domain like Alexa Prize.
link |
I think that bridge is going to get closed.
link |
They won't be different.
link |
And I'll give you why that's the case.
link |
You mentioned shopping. How do you shop?
link |
Do you shop in one shot?
link |
Sure, your AA batteries, paper towels, yes.
link |
How long does it take for you to buy a camera?
link |
You do a ton of research.
link |
Then you make a decision.
link |
So is that a goal oriented dialogue
link |
when somebody says, Alexa, find me a camera?
link |
Is it simply inquisitiveness, right?
link |
So even in something that you think of as shopping,
link |
which you said, you yourself use a lot of.
link |
If you go beyond where it's reorders or items
link |
where you sort of are not brand conscious and so forth.
link |
So that was just in shopping.
link |
Yeah, just to comment quickly,
link |
I've never bought anything through Alexa
link |
that I haven't bought before on Amazon on the desktop
link |
after I clicked in a bunch of, read a bunch of reviews,
link |
that kind of stuff. So it's repurchase.
link |
So now you think, even for something that you felt like
link |
is a finite goal, I think the space is huge
link |
because even products, the attributes are many,
link |
and you want to look at reviews, some on Amazon,
link |
some outside, some you want to look at what CNET is saying
link |
or another consumer forum is saying
link |
about even a product, for instance, right?
link |
So that's just shopping where you could argue
link |
the ultimate goal is sort of known.
link |
And we haven't talked about Alexa,
link |
what's the weather in Cape Cod this weekend, right?
link |
So why am I asking that weather question, right?
link |
So I think of it as how do you complete goals
link |
with minimum steps for our customers, right?
link |
And when you think of it that way,
link |
the distinction between goal oriented and conversations
link |
for, say, open domain goes away.
link |
I may want to know what happened in the presidential debate,
link |
right? And is it that I'm just seeking information
link |
or am I looking at who's winning the debates, right?
link |
So these are all quite hard problems.
link |
So even the five year horizon problem,
link |
I'm like, I sure hope we'll solve these.
link |
And you're optimistic because that's a hard problem.
link |
The reasoning enough to be able to help explore
link |
complex goals that are beyond something simplistic.
link |
That feels like it could be, well, five years is a nice.
link |
Is a nice bar for that, right?
link |
I think you will, it's a nice ambition.
link |
And do we have press releases for that?
link |
Absolutely, can I tell you what specifically
link |
the roadmap will be? No, right?
link |
And will we solve all of it in the five year space? No.
link |
We'll work on this forever, actually.
link |
This is the hardest of the AI problems.
link |
And I don't see that being solved even in a 40 year horizon
link |
because even if you limit to the human intelligence,
link |
we know we are quite far from that.
link |
In fact, in every aspect, from our sensing to neural processing
link |
to how the brain stores information and how it processes it,
link |
we don't yet know how to represent knowledge, right?
link |
So we are still in those early stages.
link |
So that's why I wanted to start at the five year.
link |
Because the five year success would look like that
link |
and solving these complex goals.
link |
And the 40 year would be where it's just natural
link |
to talk to these in terms of more of these complex goals.
link |
Right now, we've already come to the point where
link |
these transactions you mentioned of asking for weather
link |
or reordering something or listening to your favorite tune,
link |
it's natural for you to ask Alexa.
link |
It's now unnatural to pick up your phone, right?
link |
And that I think is the first five year transformation.
link |
The next five year transformation would be,
link |
okay, I can plan my weekend with Alexa
link |
or I can plan my next meal with Alexa
link |
or my next night out with seamless effort.
link |
So just to pause and look back at the big picture of it all,
link |
you're a part of a large team
link |
that's creating a system that's in the home,
link |
that's not human, that gets to interact with human beings.
link |
So we human beings, we these descendants of apes
link |
have created an artificial intelligence system
link |
that's able to have conversations.
link |
I mean, that to me, the two most transformative robots
link |
of this century, I think will be autonomous vehicles,
link |
but they're a little bit transformative in a more boring way.
link |
I think conversational agents in the home is like an experience.
link |
How does that make you feel
link |
that you're at the center of creating that?
link |
Did you sit back in awe sometimes?
link |
What is your feeling?
link |
What is your feeling about the whole mess of it?
link |
Can you even believe that we're able to create something like this?
link |
I think it's a privilege.
link |
I'm so fortunate, like where I ended up, right?
link |
And it's been a long journey,
link |
like I've been in this space for a long time in Cambridge, right?
link |
And it's so heartwarming to see
link |
the kind of adoption conversational agents are having now.
link |
Five years back, it was almost like,
link |
should I move out of this because we are unable to find
link |
this killer application that customers would love
link |
that would not simply be a good to have thing in research labs.
link |
And it's so fulfilling to see it make a difference
link |
to millions and billions of people worldwide.
link |
The good thing is that it's still very early.
link |
So I have another 20 years of job security doing what I love.
link |
Like, so I think from that perspective, I feel,
link |
I tell every researcher that joins or every member of my team,
link |
that this is a unique privilege.
link |
Like, I think, and we have,
link |
and I would say not just launching Alexa in 2014,
link |
which was first of its kind,
link |
along the way we have, when we launched the Alexa Skills Kit,
link |
it became democratizing AI,
link |
when before that there was no good evidence
link |
of an SDK for speech and language.
link |
Now we are coming to this where you and I,
link |
are having this conversation where I'm not saying,
link |
oh, Lex, planning a night out with an AI agent, impossible.
link |
I'm saying it's in the realm of possibility
link |
and not only a possibility, we'll be launching this, right?
link |
So some elements of that,
link |
every, it will keep getting better.
link |
We know that is a universal truth.
link |
Once you have these kinds of agents out there being used,
link |
they get better for your customers.
link |
And I think that's where,
link |
I think the amount of research topics we are throwing out
link |
at our budding researchers
link |
is just gonna be exponentially hard.
link |
And the great thing is you can now get immense satisfaction
link |
by having customers use it,
link |
not just a paper in NeurIPS or another conference.
link |
I think everyone, myself included,
link |
is deeply excited about that future.
link |
So I don't think there's a better place to end, Rohit.
link |
Thank you so much for talking to us.
link |
Thank you so much.
link |
Thank you, same here.
link |
Thanks for listening to this conversation
link |
with Rohit Prasad.
link |
And thank you to our presenting sponsor, Cash App.
link |
Download it, use code LexPodcast.
link |
You'll get $10 and $10 will go to FIRST,
link |
a STEM education nonprofit
link |
that inspires hundreds of thousands of young minds
link |
to learn and to dream of engineering our future.
link |
If you enjoy this podcast, subscribe on YouTube,
link |
give it five stars on Apple Podcasts,
link |
support it on Patreon,
link |
or connect with me on Twitter.
link |
And now let me leave you with some words of wisdom
link |
from the great Alan Turing.
link |
Sometimes it is the people no one can imagine anything of
link |
who do the things no one can imagine.
link |
Thank you for listening and hope to see you next time.