Rohit Prasad: Amazon Alexa and Conversational AI | Lex Fridman Podcast #57
The following is a conversation with Rohit Prasad. He's the vice president and head scientist of Amazon Alexa and one of its original creators. The Alexa team embodies some of the most challenging, incredible, impactful, and inspiring work being done in AI today. The team has to both solve problems at the cutting edge of natural language processing and provide a trustworthy, secure, and enjoyable experience to millions of people. This is where state-of-the-art methods in computer science meet the challenges of real-world engineering. In many ways, Alexa and the other voice assistants are the voices of artificial intelligence to millions of people, and an introduction to AI for people who have only encountered it in science fiction. This is an important and exciting opportunity. So the work that Rohit and the Alexa team are doing is an inspiration to me and to many researchers and engineers in the AI community.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N. If you leave a review on Apple Podcasts especially, but also Castbox, or comment on YouTube, consider mentioning topics, people, ideas, questions, or quotes in science, tech, or philosophy that you find interesting, and I'll read them on this podcast. I won't call out names, but I love comments with kindness and thoughtfulness in them, so I thought I'd share them.
Someone on YouTube highlighted a quote from the conversation with Ray Dalio, where he said that you have to appreciate all the different ways that people can be A players. This connected with me. On teams of engineers, it's easy to think that raw productivity is the measure of excellence, but there are others. I've worked with people who brought a smile to my face every time I got to work in the morning. Their contribution to the team is immeasurable.
I recently started doing podcast ads at the end of the introduction. I'll do one or two minutes after introducing the episode, and never any ads in the middle that break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.
This show is presented by Cash App, the number one finance app in the App Store. I personally use Cash App to send money to friends, but you can also use it to buy, sell, and deposit Bitcoin in just seconds. Cash App also has a new investing feature. You can buy fractions of a stock, say $1 worth, no matter what the stock price is. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC. I'm excited to be working with Cash App to support one of my favorite organizations, called FIRST, best known for their FIRST Robotics and Lego competitions. They educate and inspire hundreds of thousands of students in over 110 countries, and have a perfect rating on Charity Navigator, which means that donated money is used to maximum effectiveness. When you get Cash App from the App Store or Google Play and use code LexPodcast, you'll get $10, and Cash App will also donate $10 to FIRST, which, again, is an organization that I've personally seen inspire girls and boys to dream of engineering a better world.
This podcast is also supported by ZipRecruiter. Hiring great people is hard, and to me, it is one of the most important elements of a successful mission-driven team. I've been fortunate to be a part of, and to lead, several great engineering teams. The hiring I've done in the past was mostly through tools we built ourselves, but reinventing the wheel was painful. ZipRecruiter is a tool that's already available for you. It seeks to make hiring simple, fast, and smart. For example, Codable cofounder Gretchen Huebner used ZipRecruiter to find a new game artist to join their education tech company. By using ZipRecruiter's screening questions to filter candidates, Gretchen found it easier to focus on the best candidates and finally hired the perfect person for the role in less than two weeks, from start to finish. ZipRecruiter, the smartest way to hire. See why ZipRecruiter is effective for businesses of all sizes by signing up, as I did, for free, at ziprecruiter.com slash lexpod. That's ziprecruiter.com slash lexpod.
And now, here's my conversation with Rohit Prasad.
In the movie Her, and I'm not sure if you've ever seen it, a human falls in love with the voice of an AI system. Let's start at the highest philosophical level before we get to deep learning and some of the fun things. Do you think what the movie Her shows is within our reach?
Not specifically about Her, but I think what we are seeing is a massive increase in the adoption of AI assistants, or AI, in all parts of our social fabric. What I do believe is that the utility these AIs provide, some of the functionalities that are shown, are absolutely within reach.
So some of the functionality, in terms of the interactive elements. But in terms of the deep connection that's purely voice based, do you think such a close connection is possible?
It's been a while since I saw Her, but I would say, in terms of interactions with these AI systems that are human-like, you also have to value what is superhuman. We as humans can be in only one place. AI assistants can be in multiple places at the same time: one with you on your mobile device, one at your home, one at work. So you have to respect these superhuman capabilities too. Plus, as humans, we have certain attributes we are very good at, such as reasoning. AI assistants are not yet there, but what they're great at is computation and memory; it's infinite and pure. These are the attributes you have to start respecting. So I think the comparison with human-like, versus the other aspect, which is superhuman, has to be taken into consideration. We need to elevate the discussion beyond just human-like.
So there are certainly elements there. As we just mentioned, Alexa is everywhere, computationally speaking. So this is a much bigger infrastructure than just the thing that sits there in the room with you. But it certainly feels to us mere humans that there's just another little creature there when you're interacting with it. You're not interacting with the entirety of the infrastructure; you're interacting with the device. Okay, sure, we anthropomorphize things, but that feeling is still there. So in the purity of the interaction with a smart device, a smart assistant, what do you think we as humans look for in that interaction?
I think certain interactions will very much feel like talking to a human, because the AI has a persona of its own, and certain ones won't. A simple example: if you're walking through the house and you just want to turn your lights on and off, you're issuing a command. That's not very much a human-like interaction, and that's where the AI shouldn't come back and have a conversation with you; it should simply complete that command. So I think it's a blend. We have to think of this not as a human-human interaction alone. It is a human-machine interaction, where certain aspects of humans are needed and certain situations demand that it be like a machine.
I told you it's going to be philosophical in parts. What's the difference between human and machine in that interaction, when we interact with humans, especially friends and loved ones, versus you and a machine that you are also close with?
I think you have to think about the roles the AI plays, right? And it differs from customer to customer and from situation to situation. I can speak especially from Alexa's perspective: it is a companion, a friend at times, an assistant, an advisor down the line. So I think most AIs will have these kinds of attributes, and it will be very situational in nature. So where is the boundary? I think the boundary depends on the exact context in which you're interacting with the AI.
So the depth and the richness of natural language conversation has been used, going back to Alan Turing, to try to define what it means to be intelligent. There's a lot of criticism of that kind of test, but what do you think is a good test of intelligence, in the context of the Turing test and Alexa or the Alexa Prize, this whole realm? Do you think about human intelligence, what it means to define it, what it means to reach that level?
I do think the ability to converse is a sign of ultimate intelligence. I think there's no question about it. If you think about all aspects of humans, there are sensors we have, and those are basically a data collection mechanism. Based on that, we make decisions with our sensory brains, right? From that perspective, there are elements we have to talk about: how we sense the world, and then how we act based on what we sense. Machines clearly have those elements, but then there are other aspects, like computation, where machines are way better. I also mentioned memory, in terms of being near infinite, depending on the storage capacity you have, and the retrieval can be extremely fast and pure, in the sense that there's no ambiguity about who I saw when, right? Machines can remember that quite well. So again, on a philosophical level, I do subscribe to the view that being able to converse, and as part of that, being able to reason based on the world knowledge you've acquired and the sensory knowledge that is there, is very much the essence of intelligence. But intelligence can go beyond human-level intelligence, based on what machines are becoming capable of.
So, maybe stepping outside of Alexa, broadly as an AI field, what do you think is a good test of intelligence? Put it another way: outside of Alexa, because so much of Alexa is a product, an experience for the customer, on the research side, what would impress the heck out of you if you saw it? What is the test where you'd say, wow, this thing is now starting to encroach into the realm of what we loosely think of as human intelligence?
Well, we think of it as AGI and human intelligence altogether, right? And in some sense, I think we are quite far from that. An unbiased view I have is that Alexa's intelligence capability is a great test. I think of it as one of many proof points, like self-driving cars, or game playing like Go or chess. Take those two as examples: they clearly require a lot of data-driven learning and intelligence, but they're not as hard a problem as an AI conversing with humans to accomplish certain tasks, or open-domain chat, as you mentioned with the Alexa Prize. In those settings, the key difference is that the end goal is not defined, unlike in game playing. You also do not know exactly what state you are in within a particular goal-completion scenario. In a certain sense, sometimes you can, if it's a simple goal. But take even certain examples like planning a weekend: you can imagine how many things change along the way. You look at the weather, you may change your mind and change the destination, or you want to catch a particular event and then you decide, no, I want this other event. So these dimensions of how many different steps are possible when you're conversing as a human with a machine make it an extremely daunting problem. And I think it is the ultimate test for intelligence.
And do you think that natural language, just pure conversation, is enough to prove that?
From a scientific standpoint, natural language is a great test, but I would go beyond it. I don't want to limit it to natural language as simply understanding an intent or parsing for entities and so forth. We are really talking about dialogue. So I would say human-machine dialogue is definitely one of the best tests of intelligence.
So can you briefly speak to the Alexa Prize for people who are not familiar with it, and also maybe where things stand, what you have learned, and what's surprising? What have you seen that's surprising from this incredible competition?
Absolutely, it's a very exciting competition. The Alexa Prize is essentially a grand challenge in conversational artificial intelligence, where we threw down the gauntlet to universities that do active research in the field, to say: can you build what we call a social bot that can converse with you coherently and engagingly for 20 minutes? That is an extremely hard challenge. Talking to someone you're meeting for the first time, or even someone you've met quite often, for 20 minutes on any topic, with an evolving set of topics, is super hard. We have completed two successful years of the competition. The first was won by the University of Washington, the second by the University of California. We are in our third instance, with an extremely strong cohort of 10 teams, and the third Alexa Prize is underway now. And we are seeing a constant evolution. The first year was definitely a learning experience. There were a lot of things to put together. We had to build a lot of infrastructure to enable these universities to build magical experiences and do high-quality research.
Just a few quick questions, sorry for the interruption. What does failure look like in the 20-minute session? What does it mean to fail, not to reach the 20-minute mark?
Oh, awesome question. First of all, I forgot to mention one more detail: it's not just the 20 minutes; the quality of the conversation matters too. And the beauty of this competition, before I answer that question on what failure means, is first that the social bots actually converse with millions and millions of customers. There are multiple judging phases before we get to the finals, which involve very controlled judging, where we bring in judges and have interactors who interact with these social bots; that is a much more controlled setting. But until we get to the finals, all the judging is essentially done by the customers of Alexa. And there you basically answer a simple question on how good your experience was. So that's where we are not testing for a 20-minute boundary being crossed, because you do want a clear-cut winner to be chosen, and it's an absolute bar. Whether you really broke that 20-minute barrier is why we have to test it in a more controlled setting with actors, essentially interactors, and see how the conversation goes. So there's a subtle difference between how it's being tested in the field with real customers versus in the lab to award the prize. On the latter, what it means is that there are three judges, and two of them have to say that this conversation has stalled.
And the judges are human experts?
Judges are human experts.
So this is the third year. What's been the evolution? In the DARPA challenge for autonomous vehicles, nobody finished in the first year. In the second year, a few more finished in the desert. So how far along are we in this, I would say, much harder challenge?
This challenge has come a long way, although we're definitely not close to the 20-minute barrier with coherent and engaging conversation. I think we are still five to 10 years away on that horizon. But the progress is immense. What we're finding is that the accuracy and the kinds of responses these social bots generate are getting better and better. What's even more amazing to see is that now there's humor coming in. The bots are quite... You know, you're talking about the ultimate science of intelligence. I think humor is a very high bar in terms of what it takes to create it. And I don't mean just being goofy. I really mean that a good sense of humor is also a sign of intelligence in my mind, and something very hard to do. So these social bots are now exploring not only what we think of as natural language abilities, but also personality attributes: when to inject an appropriate joke, and, when you don't know the domain, how you come back with something more intelligible so that you can continue the conversation. If you and I are talking about AI and we are domain experts, we can speak to it. But if you suddenly switch to a topic that I don't know about, how do I change the conversation? So we're starting to notice these elements as well. And that's coming partly from the nature of the 20-minute challenge: people are getting quite clever about how to really converse and essentially mask some of the understanding defects.
So some of this is not Alexa, the product. This is somewhat for fun, for research, for innovation, and so on. I have a question about this modern era. If you look at Twitter and Facebook and so on, there's public discourse going on, and when some things get a little bit too edgy, people get blocked and so on. Just out of curiosity, are people in this context pushing the limits? Is anyone using the F word? Is anyone pushing back, or arguing, I guess I should say, as part of the dialogue to really draw people in?
First of all, let me just back up a bit in terms of why we are doing this, right? You said it's fun. I think fun is more the engaging part for customers. It is one of the most used skills in our skill store as well. But that apart, the real goal was this: with a lot of AI research moving to industry, we felt that academia risked not having the same resources at its disposal that we have, which are lots of data, massive computing power, and clear ways to test these AI advances with real customer benefit. So we brought all three of these together in the Alexa Prize. That's why it's one of my favorite projects at Amazon. And with that, the secondary effect is that, yes, it has become engaging for our customers as well. We're not yet where we want it to be, right? But it's huge progress. But coming back to your question on how the conversations evolve: yes, there are some natural attributes of what you said, in terms of argument and some amount of swearing. The way we take care of that is with a sensitive-content filter we have built that looks at keywords. It's more than keywords; of course there's a keyword-based part too, but these words can be very contextual, as you can imagine, and the topic can also be something you don't want a conversation about, because this is a communal device as well. A lot of people use these devices. So we have put a lot of guardrails in place for the conversation to be more useful for advancing AI, and less about these other issues you attributed to what's happening in the AI field as well.
Right, so this is actually a serious opportunity. I didn't use the right word with fun. I think it's an open opportunity to do some of the best innovation in conversational agents in the world. Why just universities?
Why just universities? Because, as I said, I really felt... Young minds. It's also that, if you think about the other aspect of where the whole industry is moving with AI, there's a dearth of talent given the demand. So you do want universities to have a clear place where they can invent and do research, and not fall behind to the point that they can't motivate students. Imagine if all the grad students, or the faculty members, left for industry like us, which has happened too. So if you're passionate about the field and feel that industry and academia need to work well together, this is a great example and a great way for universities to participate.
So what do you think it takes to build a system that wins the Alexa Prize?
I think you have to start focusing on aspects of reasoning. Right now there are still more lookups of what intents customers are asking for, and responses to those, rather than really reasoning about the elements of the conversation. For instance, if the conversation is about games, say a recent sports event, there's so much context involved, and you have to understand the entities being mentioned so that the conversation is coherent, rather than suddenly switching to some fact you know about a sports entity and just relaying it without understanding the true context of the game. If you just said, "I learned this fun fact about Tom Brady," rather than talking about how he played the game the previous night, then the conversation is not really that intelligent. So you have to move to more reasoning: understanding the context of the dialogue and giving more appropriate responses. That tells you we are still quite far, because a lot of the time it's facts being looked up, and something that's close enough as an answer but not really the answer. So that is where the research needs to go: toward true understanding and reasoning. And that's why I feel this is a great way to do it, because you have an engaged set of users working to help these AI advances happen.
You mentioned customers quite a bit, and there's a skill. What is the experience for the user that's helping? Just to clarify, as far as I understand, this skill is standalone, for the Alexa Prize. I mean, it's focused on the Alexa Prize. It's not you ordering certain things on Amazon, or checking the weather, or playing Spotify, right? This is a separate skill. So how do customers think of it? Are they having fun? Are they helping teach the system? What's the experience like?
I think it's both, actually. Let me tell you how you invoke this skill. All you have to say is, "Alexa, let's chat." The first time you say "Alexa, let's chat," it comes back with a clear message that you're interacting with one of those university social bots, so you know exactly how you're interacting, right? That is why it's very transparent. You are being asked to help, right? And we have a lot of mechanisms: as we are in the first phase, the feedback phase, we send a lot of emails to our customers, and then they know that the teams need a lot of interactions to improve the accuracy of the system. So we know we have a lot of customers who really want to help these university bots, and they're conversing with them. Some are just having fun by saying, "Alexa, let's chat." And there's also some adversarial behavior, to see how much you understand as a social bot. So I think we have a good, healthy mix of all three situations.
If we talk about solving the Alexa challenge, what does the data set of really engaging, pleasant conversations look like? If we think of this as a supervised learning problem, and I don't know if it has to be, but if it does, maybe you can comment on that, do you think there needs to be a data set of what makes an engaging, successful, fulfilling conversation?
I think that's part of the research question here. I think we at least got the first part right, which is having a way for universities to build and test in a real-world setting. Now you're asking about the next phase of questions, which we are also asking, by the way: what does success look like from an optimization-function standpoint? We as researchers are used to having a great corpus of annotated data and then tuning our algorithms on it, right? Fortunately and unfortunately, in this world of the Alexa Prize, that is not the way we are going after it. You have to focus more on learning based on live feedback. That is another element that's unique. I started by describing how you get into and experience this capability as a customer. What happens when you're done? They ask you a simple question: on a scale of one to five, how likely are you to interact with this social bot again? That is good feedback, and customers can also leave more open-ended feedback. And to me, that is one part of the question you're asking. It's a mental model shift: as researchers, you have to change your mindset, because this is not a DARPA evaluation or an NSF-funded study where you have a nice corpus. This is the real world. You have real data. The scale is amazing, and that's a beautiful thing.
And then the customer, the user, can quit the conversation at any time.
Exactly, the user can, and that is also a signal for how good you were at that point.
And then on a scale of one to five, or one to three, do they ask how likely you are, or is it just binary?
Wow, okay, that's such a beautifully constructed challenge. You said the only way to make a smart assistant really smart is to give it eyes and let it explore the world. I'm not sure, it might've been taken out of context, but can you comment on that? Can you elaborate on that idea? I personally also find that idea super exciting from a social robotics, personal robotics perspective.
Yeah, a lot of things do get taken out of context. This particular one was just a philosophical discussion we were having about what intelligence looks like. And the context was learning: we said that we as humans are empowered with many different sensory abilities. I do believe that eyes are an important aspect of it. If you think about how we as humans learn, it is quite complex, and it's also not unimodal, where you are fed a ton of text or audio and you just learn that way. No, you learn by experience, you learn by seeing, you're taught by humans, and we are very efficient in how we learn. Machines, on the contrary, are very inefficient in how they learn, especially these AIs. I think the next wave of research is going to be with less data: not just with less labeled data, but also with a lot of weak supervision, and where you can increase the learning rate. I don't mean less data in terms of not having a lot of data to learn from, since we are generating so much data; it is more about how fast you can learn.
So improving the quality of the data, the quality of the data and the learning process?
I think more the learning process. We as humans learn with a lot of noisy data, right? And I think that's the part that I don't think should change. What should change is how we learn, right? So if you look at, you mentioned supervised learning, we are making transformative shifts, moving to more unsupervised learning and more weak supervision. Those are the key aspects of how to learn. And I think in that setting, I hope you agree with me, having other senses is very crucial in terms of how you learn.
And from a machine learning perspective, which I hope we get a chance to talk about, there are a few aspects that are fascinating, but to stick on the point of a body: Alexa has a body. It has a very minimalistic, beautiful interface, where there's a ring and so on. I'm not sure of all the flavors of the devices that Alexa lives on, but there's a minimalistic, basic interface. And nevertheless, we humans... I have a Roomba, I have all kinds of robots all over everywhere. So what do you think the Alexa of the future looks like if it begins to shift what its body looks like? Maybe beyond Alexa, what do you think the different devices in the home look like as they start to embody their intelligence more and more? Philosophically, in the future, what do you think that looks like?
I think let's look at what's happening today.
You mentioned, I think, our devices as Amazon devices,
but I also wanted to point out Alexa is already integrated into
a lot of third-party devices,
which also come in lots of forms and shapes,
some in robots, some in microwaves,
some in appliances that you use in everyday life.
So I think it's not just the shape Alexa takes
in terms of form factors,
but it's also all the places where it's available.
And it's getting into cars,
it's getting into different appliances in homes,
even toothbrushes, right?
So I think you have to think about it
as not a physical assistant.
It will be in some embodiment, as you said,
we already have these nice devices,
but I think it's also important to think of it
as a virtual assistant.
It is superhuman in the sense that it is in multiple places at the same time.
So I think the actual embodiment in some sense,
to me doesn't matter.
I think you have to think of it not as humanlike
and more in terms of what its capabilities are
that derive a lot of benefit for customers
and how there are different ways to delight it
and delight customers and different experiences.
And I think I'm a big fan of it not being just humanlike;
it should be humanlike in certain situations.
The Alexa Prize socialbot in terms of conversation
is a great way to look at it,
but there are other scenarios where humanlike,
I think, is underselling the abilities of this AI.
So if I could trivialize what we're talking about.
So if you look at the way Steve Jobs thought
about the interaction with the device that Apple produced,
there was an extreme focus on controlling the experience
by making sure there are only these Apple-produced devices.
You see the voice of Alexa taking all kinds of forms
depending on what the customers want.
And that means it could be anywhere
from the microwave to a vacuum cleaner to the home
and so on. The voice is the essential element
of the interaction.
I think voice is the essence. It's not all,
but it's a key aspect.
I think to your question in terms of,
you should be able to recognize Alexa
and that's a huge problem.
I think in terms of a huge scientific problem,
I should say like, what are the traits?
What makes it look like Alexa,
especially in different settings
and especially if it's primarily voice, what it is,
but Alexa is not just voice either, right?
I mean, we have devices with a screen.
Now you're seeing just other behaviors of Alexa.
So I think we're in very early stages of what that means
and this will be an important topic for the following years.
But I do believe that being able to recognize
and tell when it's Alexa versus when it's not
is going to be important from an Alexa perspective.
I'm not speaking for the entire AI community,
but I think attribution, as we go into more
of understanding who did what,
that identity of the AI is crucial in the coming world.
I think from the broad AI community perspective,
that's also a fascinating problem.
So basically if I close my eyes and listen to the voice,
what would it take for me to recognize that this is Alexa?
Or at least the Alexa that I've come to know
from my personal experience in my home
through my interactions that come through.
Yeah, and the Alexa here in the US is very different
than the Alexa in the UK and the Alexa in India,
even though they are all speaking English,
or the Australian version.
So again, now think about when you go
into a different culture, a different community,
and you travel there, how do you recognize Alexa?
I think these are super hard questions actually.
So there's a team that works on personality.
So if we talk about those different flavors
of what it means culturally speaking,
India, UK, US, what does it mean to add?
So the problem that we just stated,
it's just fascinating, how do we make it purely recognizable
that it's Alexa, assuming that the qualities
of the voice are not sufficient?
It's also the content of what is being said.
How do we do that?
How does the personality come into play?
What's that research gonna look like?
I mean, it's such a fascinating subject.
We have some very fascinating folks
who, from both UX and human factors backgrounds,
are looking at these aspects and these exact questions.
But I'll definitely say it's not just how it sounds,
the choice of words, the tone, not just, I mean,
the voice identity of it, but the tone matters,
the speed matters, how you speak,
how you enunciate words, what choice of words
you are using, how terse you are,
or how lengthy in your explanations you are.
All of these are factors.
And you also, you mentioned something crucial,
that you may have personalized Alexa
to some extent in your homes
or in the devices you are interacting with.
So you, as an individual, how you prefer Alexa to sound
can be different than how I prefer it.
And the amount of customizability you want to give
is also a key debate we always have.
But I do want to point out it's more than the voice actor
that recorded it and that it sounds like that actor.
It is more about the choices of words,
the attributes of tonality, the volume,
in terms of how you raise your pitch and so forth.
All of that matters.
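The delivery attributes listed above (speed, pitch, volume, and terse versus lengthy wording) can be made concrete with SSML, the W3C Speech Synthesis Markup Language, whose `<prosody>` element takes `rate`, `pitch`, and `volume` attributes. This is an illustrative sketch only: `VoiceStyle` and `render` are hypothetical names rather than an Alexa API, and nothing here describes how Alexa's personality is actually built.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class VoiceStyle:
    """Hypothetical bundle of delivery attributes for one persona."""
    rate: str = "medium"    # SSML prosody rate, e.g. "slow", "fast", "90%"
    pitch: str = "medium"   # SSML prosody pitch, e.g. "low", "+5%"
    volume: str = "medium"  # SSML prosody volume, e.g. "soft", "loud", "+2dB"
    terse: bool = False     # word choice: prefer the short phrasing


def render(style: VoiceStyle, short_text: str, long_text: str) -> str:
    """Choose the wording for the persona, then wrap it in SSML prosody markup."""
    text = short_text if style.terse else long_text
    return (
        "<speak>"
        f"<prosody rate={quoteattr(style.rate)} pitch={quoteattr(style.pitch)} "
        f"volume={quoteattr(style.volume)}>"
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</prosody></speak>"
    )
```

Keeping word choice and prosody in one small value object makes the "how much customizability" debate tangible: each field is a knob that could be exposed to customers or kept fixed.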
This is such a fascinating problem
from a product perspective.
I could see those debates just happening
inside of the Alexa team of how much personalization
do you do for the specific customer?
Because you're taking a risk if you overpersonalize.
Because if you create a personality
for a million people, you can test that better.
You can create a rich, fulfilling experience
that will do well.
But the more you personalize it, the less you can test it,
the less you can know that it's a great experience.
So how much personalization, what's the right balance?
I think the right balance depends on the customer.
Give them the control.
So I'll say, I think the more control you give customers,
the better it is for everyone.
And I'll give you some key personalization features.
I think we have a feature called Remember This,
which is where you can tell Alexa to remember something.
There you have an explicit sort of control
in the customer's hands because they have to say,
"Alexa, remember X, Y, Z."
What kind of things would that be used for?
So, like, I have stored my tire specs
for my car because it's so hard to go and find
and see what it is, right?
When you're having some issues.
I store my mileage plan numbers
for all the frequent flyer ones
where I'm sometimes just looking at it and it's not handy.
So those are my own personal choices I've made
for Alexa to remember something on my behalf, right?
So again, I think the choice was to be explicit
about how you provide that to a customer as a control.
So I think these are the aspects of what you do.
Like think about where we can use speaker recognition
capabilities, where if you taught Alexa
that you are Lex and this person in your household
is person two, then you can personalize the experiences.
Again, the CX (customer experience) patterns
are very clear about and transparent
when a personalization action is happening.
And then you have other ways, like you go
through explicit control right now through your app
to choose, among your multiple service providers,
let's say for music, which one is your preferred one.
So when you say "play Sting," depending on
whether you have preferred Spotify or Amazon Music
or Apple Music, the decision is made
where to play it from.
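The two personalization mechanisms just described, speaker recognition and an explicitly chosen music provider, can be combined in a small resolution rule. This is a hypothetical sketch, not Amazon's logic: the per-person overrides, the `PER_SPEAKER` and `HOUSEHOLD_DEFAULT` names, and the speaker IDs are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical settings: a household default chosen in the companion app,
# plus optional per-person overrides for recognized speakers.
HOUSEHOLD_DEFAULT = "Amazon Music"
PER_SPEAKER: Dict[str, str] = {"lex": "Spotify", "person_two": "Apple Music"}


def resolve_music_provider(speaker_id: Optional[str]) -> str:
    """Return the provider to play from for a request like "play Sting".

    speaker_id is None when speaker recognition did not identify anyone.
    In that case, or when the person has no override, fall back to the
    household default instead of guessing a preference.
    """
    if speaker_id is not None and speaker_id in PER_SPEAKER:
        return PER_SPEAKER[speaker_id]
    return HOUSEHOLD_DEFAULT
```

Falling back to the explicit household setting, rather than inferring a preference, mirrors the point that personalization should stay transparent and under the customer's control.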
So what's Alexa's backstory from her perspective?
Is there, I remember just asking, as probably a lot
of us do, just the basic questions about love
and so on of Alexa, just to see what the answer would be.
It feels like there's a little bit of a personality.
Does Alexa have a metaphysical presence
in this human universe we live in
or is it something more ambiguous?
Is there a family kind of idea
even for joking purposes and so on?
I think, well, it does tell you if I think you,
I should double check this, but if you said
"When were you born?", I think we do respond.
I need to double check that
but I'm pretty positive about it.
I think you do actually because I think I've tested that.
But that's like how I was born in your brand of champagne
and whatever the year kind of thing, yeah.
So in terms of the metaphysical, I think it's early.
Does it have the historic knowledge about herself
to be able to do that?
Maybe, have we crossed that boundary?
In terms of being, thank you.
We have thought about it quite a bit
but I wouldn't say that we have come to a clear decision
in terms of what it should look like.
But you can imagine though, and I bring this back
to the Alexa Prize socialbot one,
there you will start seeing some of that.
Like these bots have their identity
and in terms of that, you may find,
this is such a great research topic
that some academic team may think of these problems
and start solving them too.
So let me ask a question.
It's kind of difficult, I think,
but it feels, and fascinating to me
because I'm fascinated with psychology.
It feels that the more personality you have,
the more dangerous it is
in terms of a customer perspective of a product,
if you want to create a product that's useful.
By dangerous, I mean creating an experience that upsets me.
And so how do you get that right?
Because if you look at the relationships,
maybe I'm just a screwed up Russian,
but if you look at human-to-human relationships,
some of our deepest relationships have fights,
have tension, have the push and pull,
have a little flavor in them.
Do you want to have such flavor in an interaction with Alexa?
How do you think about that?
So there's one other common thing that you didn't say,
but we think of it as paramount for any deep relationship.
So I think if you have trust, every attribute you said,
a fight, some tension, is all healthy.
But what is sort of non-negotiable in this instance is trust.
And I think the bar to earn customer trust for AI
is very high, in some sense higher than for a human.
It's not just about personal information or your data.
It's also about your actions on a daily basis.
How trustworthy are you in terms of consistency,
in terms of how accurate you are in understanding me?
Like if you're talking to a person on the phone,
if you have a problem with your,
let's say your internet or something,
if the person's not understanding,
you lose trust right away.
You don't want to talk to that person.
That whole example gets amplified by a factor of 10,
because when you're a human interacting with an AI,
you have a certain expectation.
Either you expect it to be very intelligent
and then you get upset, why is it behaving this way?
Or you expect it to be not so intelligent
and when it surprises you, you're like,
really, you're trying to be too smart?
So I think we grapple with these hard questions as well.
But I think the key is actions need to be trustworthy
from these AIs, not just data protection,
your personal information protection,
but also how accurately it accomplishes
all commands or all interactions.
Well, it's tough to hear because trust,
you're absolutely right,
but trust is such a high bar with AI systems
because people, and I see this
because I work with autonomous vehicles.
I mean, the bar that's placed on AI systems
is unreasonably high.
Yeah, that is going to be, I agree with you.
And I think of it as a challenge
and it also keeps my job, right?
So from that perspective, I totally,
I think of it from both sides, as a customer
and as a researcher.
I think as a researcher, yes, occasionally it will frustrate
me: why is the bar so high for these AIs?
And as a customer, then I say,
absolutely, it has to be that high, right?
So I think that's the trade-off we have to balance,
but it doesn't change the fundamentals.
That trust has to be earned, and the question then becomes:
are we holding the AIs to a different bar
in accuracy and mistakes than we hold humans?
That's going to be a great societal question
for years to come, I think, for us.
Well, one of the questions that we grapple with as a society now
that I think about a lot,
I think a lot of people in AI think about a lot,
and Alexa is taking on head-on, is privacy.
The reality is us giving over data to any AI system
can be used to enrich our lives in profound ways.
So basically any product that does anything awesome
for you, the more data it has,
the more awesome things it can do.
And yet on the other side,
people imagine the worst possible scenario
of what you could possibly do with that data.
It all goes down to trust, as you said before.
There's a fundamental distrust,
in certain groups, of governments and so on.
And depending on the government,
depending on who's in power,
depending on all these kinds of factors.
And so here's Alexa in the middle of all of it in the home,
trying to do good things for the customers.
So how do you think about privacy in this context
of smart assistants in the home?
How do you maintain, how do you earn trust?
So as you said, trust is the key here.
So you start with trust
and then privacy is a key aspect of it.
It has to be designed in from the very beginning.
And we believe in two fundamental principles.
One is transparency and the second is control.
So by transparency, I mean,
when we built what is now called a smart speaker,
or the first Echo,
we were quite judicious about making the right trade-offs
on the customer's behalf,
so that it is pretty clear
when the audio is being sent to the cloud:
the light ring comes on
when it has heard you say the wake word,
and then the streaming happens, right?
So when the light ring comes up,
we also had, we put a physical mute button on it,
just so if you didn't want it to be listening,
even for the wake word,
then you turn the power button or the mute button on,
and that disables the microphones.
That's just the first decision on essentially transparency.
Then, even when we launched,
we gave the control in the hands of the customers,
so that you can go and look at any of your individual utterances
that are recorded and delete them anytime.
And we've stayed true to that promise, right?
So, and that is, again,
a great instance of showing how you have the control.
Then we made it even easier.
You can say, like I said, "delete what I said today."
So that is now putting even more control
in your hands, with what's most convenient
about this technology, which is voice.
You delete it with your voice now.
So these are the types of decisions we continually make.
We just recently launched this feature,
what we think of it as:
if you wanted humans not to review your data,
because you've mentioned supervised learning, right?
So in supervised learning,
humans have to give some annotation.
And that also is now a feature
where, essentially, if you've selected that flag,
your data will not be reviewed by a human.
So these are the types of controls
that we have to constantly offer customers.
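The two customer controls described above, deleting recordings by voice ("delete what I said today") and opting out of human review of data used for supervised learning, can be modeled as a tiny data structure. This is a toy sketch under stated assumptions: `Utterance`, `VoiceHistory`, `delete_day`, and `annotation_candidates` are hypothetical names, and a real service would also have to purge derived copies and downstream datasets, which this does not model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Utterance:
    text: str
    day: date


@dataclass
class VoiceHistory:
    """Hypothetical per-customer record of recorded utterances and privacy settings."""
    utterances: List[Utterance] = field(default_factory=list)
    allow_human_review: bool = True  # cleared when the customer opts out of review

    def delete_day(self, day: date) -> int:
        """Handle a request like "delete what I said today"; return how many were removed."""
        before = len(self.utterances)
        self.utterances = [u for u in self.utterances if u.day != day]
        return before - len(self.utterances)

    def annotation_candidates(self) -> List[Utterance]:
        """Utterances that may be sent for human annotation; none if the customer opted out."""
        if not self.allow_human_review:
            return []
        return list(self.utterances)
```

The design point is that the opt-out flag is checked at the single place where data leaves for annotation, so the customer's choice cannot be bypassed by a separate code path.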
So why do you think it bothers people so much that,
so everything you just said is really powerful.
So the control, the ability to delete,
because we collect, we have studies running here at MIT
that collect huge amounts of data
and people consent and so on.
The ability to delete that data is really empowering
and almost nobody ever asks to delete it,
but the ability to have that control is really powerful.
But still, there are these popular anecdotes,
anecdotal evidence that people say,
they like to tell, that
they and a friend were talking about something,
I don't know, sweaters for cats.
And all of a sudden they'll have advertisements
for cat sweaters on Amazon.
That's a popular anecdote,
as if something is always listening.
What, can you explain that anecdote,
that experience that people have?
What's the psychology of that?
What's that experience?
And can you, you've answered it,
but let me just ask, is Alexa listening?
No, Alexa listens only for the wake word on the device.
And the wake word is?
Words like Alexa, Amazon, Echo,
but you only choose one at a time.
So you choose one and it listens only
for that on our devices.
From a listening perspective,
we have to be very clear that it's just the wake word.
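The listening contract described here (on-device matching of a single chosen wake word, the light ring coming on before anything is streamed, and a mute button that disables the microphones) can be written as a toy state machine. This is an illustrative model, not Amazon's firmware: `ToySmartSpeaker`, `detect_wake_word`, and `request_frames` are hypothetical, audio frames are plain strings for readability, and a fixed number of frames stands in for real end-of-request detection.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToySmartSpeaker:
    """Toy model of a wake-word privacy gate.

    detect_wake_word is a hypothetical stand-in for an on-device keyword
    spotter that only recognizes the one wake word the customer selected.
    """
    detect_wake_word: Callable[[str], bool]
    muted: bool = False          # physical mute button: microphones disabled
    light_ring_on: bool = False  # visible signal that audio is being streamed
    sent_to_cloud: List[str] = field(default_factory=list)

    def process(self, frames: List[str], request_frames: int = 2) -> None:
        """Run frames through the gate; only frames after the wake word leave the device."""
        remaining = 0  # frames left to stream for the current request
        for frame in frames:
            if self.muted:
                continue  # mics are off: nothing is inspected or sent
            if remaining > 0:
                self.sent_to_cloud.append(frame)
                remaining -= 1
                if remaining == 0:
                    self.light_ring_on = False  # request captured, stop streaming
            elif self.detect_wake_word(frame):
                self.light_ring_on = True  # the ring comes on before any streaming
                remaining = request_frames
```

Everything before the wake word is inspected locally and discarded; in this model the only path to `sent_to_cloud` runs through the branch that has already turned the light ring on.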
So you said, why is there this anxiety, if you will?
It's because there's a lot of confusion
about what it really listens to, right?
And I think it's partly on us to keep educating
our customers and the general media more
in terms of what really happens.
And we've done a lot of it.
And our information pages are clear,
but still people want to have more;
there's always a hunger for information and clarity.
And we'll constantly look at how best to communicate.
If you go back and read everything,
yes, it states exactly that.
And then people could still question it.
And I think that's absolutely okay to question.
What we have to make sure is that we are,
because our fundamental philosophy is customer first,
customer obsession is our leadership principle.
As a researcher, I put myself
in the shoes of the customer,
and all decisions in Amazon are made with that.
And trust has to be earned,
and we have to keep earning the trust
of our customers in this setting.
And to your other point on like,
is there something showing up
based on your conversations?
No, I think the answer is like,
a lot of times when those experiences happen,
you have to also know that, okay,
it may be a winter season,
people are looking for sweaters, right?
And it shows up on your amazon.com because it is popular.
So there are many of these,
you mentioned that personality or personalization,
turns out we are not that unique either, right?
So those things we as humans start thinking,
oh, must be because something was heard,
and that's why this other thing showed up.
Probably it is just the season for sweaters.
I'm not gonna ask you this question
because people have so much paranoia.
But let me just say from my perspective,
I hope there's a day when a customer can ask Alexa
to listen all the time,
to improve the experience,
to improve, because I personally don't see the negative,
because if you have the control and if you have the trust,
there's no reason why it shouldn't be listening
all the time to the conversations to learn more about you.
Because ultimately,
as long as you have control and trust,
every bit of data you provide to the device
that the device wants is going to be useful.
And so to me, as a machine learning person,
I think it worries me how sensitive people are
about their data relative to how empowering it could be
for the devices around them,
how enriching it could be for their own life
to improve the product.
So I just, it's something I think about sort of a lot,
how do we make that devices,
obviously Alexa thinks about a lot as well.
I don't know if you wanna comment on that,
sort of, okay, have you seen,
let me ask it in the form of a question, okay.
Have you seen an evolution in the way people think about
their private data in the previous several years?
So as we as a society get more and more comfortable
with the benefits we get by sharing more data.
First, let me answer that part
and then I'll wanna go back
to the other aspect you were mentioning.
So as a society, in general,
we are getting more comfortable.
That doesn't mean that everyone is,
and I think we have to respect that.
I don't think one size fits all
is always gonna be the answer for all, right?
So I think that's something to keep in mind in these.
Going back to your point on what more
magical experiences can be launched
in these kinds of AI settings.
I think again, if you give the control,
it's possible for certain parts of it.
So we have a feature called Follow-Up Mode
where, if you turn it on,
Alexa, after you've spoken to it,
will open the mics again,
thinking you will answer something again.
Like if you're adding items to your shopping list
or to-do list, right?
You want to keep adding, so in that setting,
it's awesome that it opens the mic
for you to say eggs and milk and then bread, right?
So these are the kinds of things which you can empower.
And then another feature we have
is called Alexa Guard.
I said it only listens for the wake word, right?
But let's say
you leave your home and you want Alexa to listen
for a couple of sound events, like a smoke alarm going off
or someone breaking glass, right?
So it's just for your peace of mind.
So you can say "Alexa, on guard" or "I'm away,"
and then it can be listening for these sound events.
And when you're home, you come out of that mode, right?
So this is another one where, again, we gave controls
in the hands of the user or the customer
to enable some experience that is high-utility
and maybe even more delightful in certain settings,
like Follow-Up Mode and so forth.
And again, this general principle is the same:
control in the hands of the customer.
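Both opt-in features widen what the device reacts to, but only after the customer turns them on. The rule can be sketched as a single decision function. This is a hypothetical illustration, not the actual behavior of Follow-Up Mode or Alexa Guard: the event labels, the `Mode` enum, `GUARD_EVENTS`, and the 5-second follow-up window are all assumptions.

```python
from enum import Enum, auto
from typing import FrozenSet, Optional


class Mode(Enum):
    WAKE_WORD_ONLY = auto()  # default: only the wake word triggers anything
    AWAY = auto()            # customer explicitly armed a Guard-style away mode


# Hypothetical labels that an on-device sound classifier might emit.
GUARD_EVENTS: FrozenSet[str] = frozenset({"smoke_alarm", "glass_break"})


def should_act(mode: Mode, event: str, follow_up_enabled: bool,
               seconds_since_response: Optional[float],
               follow_up_window_s: float = 5.0) -> bool:
    """Decide whether a detected event should be acted on."""
    if event == "wake_word":
        return True
    # Guard-style listening: only the listed sound events, only in away mode.
    if mode is Mode.AWAY and event in GUARD_EVENTS:
        return True
    # Follow-up: speech without the wake word is accepted only if the customer
    # turned the feature on and we are inside the short window after a reply.
    if (event == "speech" and follow_up_enabled
            and seconds_since_response is not None
            and seconds_since_response <= follow_up_window_s):
        return True
    return False
```

Every branch other than the wake word is gated on a setting the customer chose, which is the "control in the hands of the customer" principle expressed as code.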
So I know we kind of started with a lot of philosophy
and a lot of interesting topics
and we're just jumping all over the place,
but really some of the fascinating things
that the Alexa team and Amazon are doing
are on the algorithm side, the data side,
the technology, the deep learning, the machine learning.
So can you give a brief history of Alexa
from the perspective of just innovation,
the algorithms, the data, of how it was born,
how it came to be, how it's grown, where it is today?
Yeah, it starts with, in Amazon,
everything starts with the customer,
and we have a process called Working Backwards.
For Alexa, and more specifically the product Echo,
there was a Working Backwards document essentially
that reflected what it would be,
that started with a very simple vision statement, for instance,
that morphed into a full-fledged document
and along the way changed into what all it can do, right?
But the inspiration was the Star Trek computer.
So when you think of it that way,
everything is possible, but when you launch a product,
you have to start somewhere.
And when I joined, the product was already in conception
and we started working on far-field speech recognition
because that was the first thing to solve.
By that we mean that you should be able to speak
to the device from a distance.
And in those days, that wasn't a common practice.
And even in the previous research world I was in,
it was considered an unsolvable problem then
in terms of whether you can converse from a distance.
And here I'm still talking about the first part
of the problem, where you
get the attention of the device
by saying what we call the wake word,
which means the word Alexa has to be detected
with a very high accuracy because it is a very common word.
It has sound units that map to words like "I like you"
or Alec, Alex, right?
So it's an undoubtedly hard problem to detect
the right mentions of Alexa addressed to the device
versus "I like Alexa."
So you have to pick up that signal
when there's a lot of noise.
Not only noise but a lot of conversation in the house, right?
Remember, on the device,
you're simply listening for the wake word, Alexa.
And there are a lot of words being spoken in the house.
How do you know it's Alexa and directed at Alexa?
Because I could say, "I love my Alexa," "I hate my Alexa,"
"I want Alexa to do this."
And in all three of these sentences I said "Alexa,"
but I didn't want it to wake up.
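One generic ingredient of wake-word detection is how a detector turns per-frame scores into a yes-or-no decision. A widely used post-processing step in keyword-spotting research is to smooth the frame-level wake-word posteriors over a short window and fire only when the smoothed score crosses a threshold, so a single noisy spike from a similar-sounding word does not trigger the device. The sketch below shows only that step, with hypothetical scores, window, and threshold. It is not Alexa's detector, and it does not address the harder problem raised above of deciding whether a correctly heard "Alexa" is actually addressed to the device.

```python
from typing import List, Optional


def smooth(posteriors: List[float], window: int) -> List[float]:
    """Moving average over the last `window` frames (fewer frames at the start)."""
    out = []
    for i in range(len(posteriors)):
        lo = max(0, i - window + 1)
        chunk = posteriors[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def first_trigger(posteriors: List[float], window: int = 3,
                  threshold: float = 0.8) -> Optional[int]:
    """Return the first frame whose smoothed wake-word score reaches
    `threshold`, or None if the detector never fires."""
    for i, score in enumerate(smooth(posteriors, window)):
        if score >= threshold:
            return i
    return None
```

Raising the threshold or widening the window trades fewer false wakes for more missed wakes; choosing that operating point is where far-field noise and household conversation make the problem hard.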
Can I just pause on that for a second?
What would be your advice that I should probably,
in the introduction of this conversation, give to people
in terms of them turning off their Alexa device
if they're listening to this podcast conversation out loud?
Like what's the probability that an Alexa device
will go off, because we mentioned Alexa like a million times.
So it will. We have done a lot of different things
where we can figure out, at the device, whether
the speech is coming from a human versus over the air.
Also, think about ads,
so we have also launched a technology
for watermarking kind of approaches
in terms of filtering it out.
But yes, if this kind of a podcast is happening,
it's possible your device will wake up a few times.
It's an unsolved problem,
but it is definitely something we care very much about.
But the idea is you wanna detect Alexa
meant for the device.
First of all, just even hearing Alexa versus I like something.
I mean, that's a fascinating part.
So that was the first relief.
The world's best detector of Alexa.
Yeah, the world's best wake word detector
in a far-field setting,
not like something where the phone is sitting on the table.
This is like people have devices 40 feet away,
like in my house, or 20 feet away, and you still get an answer.
So that was the first part.
The next is, okay, you're speaking to the device.
Of course, you're gonna issue many different requests.
Some may be simple, some may be extremely hard,
but it's a large-vocabulary speech recognition problem
essentially, where the audio is now not coming
onto your phone or a handheld mic like this
or a close-talking mic, but it's from 20 feet away,
where if you're in a busy household,
your son may be listening to music,
your daughter may be running around with something
and asking your mom something and so forth, right?
So this is like a common household setting
where the words you're speaking to Alexa
need to be recognized with very high accuracy, right?
Now we are still just in the recognition problem.
We haven't yet come to the understanding one, right?
And if I pause there, sorry, once again,
what year was this?
Is this before neural networks began
to seriously prove themselves in the audio space?
Yeah, this is around, so I joined in April 2013, right?
So the early research on neural networks coming back
and showing some promising results
in the speech recognition space had started happening,
but it was very early.
But we built on that
with the very first thing we did when I joined the team.
And remember, it was very much a startup environment,
which is great about Amazon.
And we doubled down on deep learning right away.
And we knew we'd have to improve accuracy fast.
And because of that, we worked on,
and the scale of data, once you have a device like this,
if it is successful, will improve big time.
Like you'll suddenly have large volumes of data
to learn from to make the customer experience better.
So how do you scale deep learning?
So we did one of the first works
in training with distributed GPUs
where the training time was linear
in terms of the amount of data.
So that was quite important work
where it was algorithmic improvements
as well as a lot of engineering improvements
to be able to train on thousands and thousands of hours of speech.
And that was an important factor.
So if you ask me, like, back in 2013 and 2014,
when we launched Echo,
the combination of large-scale data,
deep learning progress, near-infinite GPUs
we had available on AWS even then,
all came together for us to be able
to solve far-field speech recognition
to the extent it could be useful to the customers.
It's still not solved.
Like, I mean, it's not that we are perfect
at recognizing speech, but we are great at it
in terms of the settings that are in homes, right?
So, and that was important even in the early stages.
So first of all, just even,
link |
I'm trying to look back at that time.
link |
If I remember correctly,
link |
it was, it seems like the task would be pretty daunting.
link |
So like, so we kind of take it for granted
link |
that it works now.
link |
Yes, you're right.
link |
So let me, like how, first of all, you mentioned startup.
link |
I wasn't familiar how big the team was.
link |
I kind of, cause I know there's a lot
link |
of really smart people working on it.
link |
So now it's a very, very large team.
link |
How big was the team?
link |
How likely were you to fail in the eyes of everyone else?
link |
I'll give you a very interesting anecdote on that.
link |
When I joined the team,
link |
the speech recognition team was six people.
link |
My first meeting, and we had hired a few more people,
link |
Nine out of 10 people thought it can't be done.
link |
The one was me, say, actually I should say,
link |
and one was semi optimistic.
link |
And eight were trying to convince,
link |
let's go to the management and say,
link |
let's not work on this problem.
link |
Let's work on some other problem,
link |
like either telephony speech for customer service calls
link |
But this was the kind of belief you must have.
link |
And I had experience with far field speech recognition
link |
and my eyes lit up when I saw a problem like that saying,
link |
okay, we have been in speech recognition,
link |
always looking for that killer app.
link |
And this was a killer use case
link |
to bring something delightful in the hands of customers.
link |
So you mentioned the way you kind of think of it
in the product way: in the future,
you have a press release and an FAQ, and you work backwards.
Did you have, did the team have the Echo in mind?
So this far-field speech recognition,
actually putting a thing in the home that works,
that you're able to interact with,
was that the press release?
Very close, I would say, in terms of the,
as I said, the vision was the Star Trek computer, right?
Or the inspiration.
And from there, I can't divulge
all the exact specifications,
but one of the first things that was magical on Alexa was music.
It brought me back to music,
because my taste is still stuck in when I was an undergrad.
So I still listen to those songs, and
it was too hard for me to be a music fan with a phone, right?
And I hate things in my ears.
So from that perspective, it was quite hard,
and music was part of the,
at least the documents I have seen, right?
So from that perspective, I think, yes.
In terms of how far we are from the original vision,
I can't reveal that, but
that's why I have so much fun at work,
because every day we go in thinking,
here is the new set of challenges to solve.
Yeah, that's a great way to do great engineering,
to think of the press release first.
I like that idea actually.
Maybe we'll talk about it a bit later,
but it's just a super nice way to have a focus.
I'll tell you this, you're a scientist,
and a lot of my scientists have adopted that.
They love it as a process
because, as scientists,
you're trained to write great papers,
but they all come after you've done the research
or proven something, and your PhD dissertation proposal
is something that comes closest,
or a DARPA proposal or an NSF proposal
is the closest that comes to a press release.
But that process is now ingrained in our scientists,
which is delightful for me to see.
You write the paper first and then make it happen.
In fact, it's not.
State-of-the-art results.
Or you leave the results section open,
where you have a thesis about here's what I expect, right?
And here's what it will change, right?
So I think it is a great thing.
It works for researchers as well.
So, far-field recognition.
What was the big leap?
What were the breakthroughs,
and what was that journey like to today?
Yeah, I think the, as you said first,
there was a lot of skepticism
on whether far-field speech recognition
would ever work well enough, right?
And what we first did was get a lot of training data
in a far-field setting.
And that was extremely hard to get
because none of it existed.
So how do you collect data in a far-field setup, right?
With no customer base at this time.
With no customer base, right?
So that was the first innovation.
And once we had that, the next thing was,
okay, if you have the data,
first of all, we haven't talked about
what magical would mean in this kind of setting.
What is good enough for customers, right?
That's always, since you've never done this before,
what would be magical?
So it wasn't just a research problem.
You had to put down, in terms of accuracy
and customer experience features,
some stakes in the ground, saying,
here's where I think it should get to.
So you established a bar,
and then how do you measure progress
towards it, given you have no customers right now?
So from that perspective, we went,
so first was the data without customers.
Second was doubling down on deep learning
as a way to learn.
And I can just tell you that the combination of the two
brought our error rates down by a factor of five,
from where we were when I started
to within six months of having that data.
At that point, I got the conviction
that this would work, right?
So, because that was magical
in terms of when it started working, and...
That reached the magical bar?
That came close to the magical bar.
To the bar, right?
That we felt would be where people would use it.
That was critical.
Because you really have one chance at this.
We launched in November 2014.
If we had launched below the bar,
if we hadn't met the bar,
I don't think this category would exist.
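The "factor of five" above refers to error rates. For speech recognition the standard metric is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The conversation does not say exactly which metric the team tracked, so treat this as a sketch of standard WER with made-up example utterances.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "play songs by the stones"
    print(word_error_rate(ref, "play song by the stone"))  # 0.4 (2 substitutions / 5 words)
```

A reduction "by a factor of five" means the new rate is one fifth of the old one, whatever the starting value was.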
Yeah, and just having looked at voice-based interactions,
like in the car or earlier systems,
it's a source of huge frustration for people.
In fact, we use voice-based interaction
for collecting data on subjects to measure frustration,
as a training set for computer vision,
for face data, so we can get a data set
of frustrated people.
The best way to get frustrated people
is to have them interact with a voice-based system.
So that bar, I imagine, is pretty high.
And we talked about how errors are perceived
from AIs versus errors by humans.
But we are not done with the problems we ended up
having to solve to get it to launch.
So, do you want the next one?
Yeah, the next one.
So, the next one was what I think of as
multi-domain natural language understanding.
I wouldn't say it's easy,
but during those days,
solving understanding in one domain,
a narrow domain, was doable,
but for these multiple domains like music,
like information, other kinds of household productivity,
alarms, timers, even though it wasn't as big as it is now
in terms of the number of skills Alexa has,
and the confusion space has grown
by three orders of magnitude,
it was still daunting even in those days.
And again, no customer base yet.
Again, no customer base.
So, now you're looking at meaning understanding
and intent understanding and taking actions
on behalf of customers
based on their requests.
And that is the next hard problem.
Even if you have gotten the words recognized,
how do you make sense of them?
In those days, there was still a lot of emphasis
on rule-based systems for writing grammar patterns
to understand the intent.
But we had a statistical-first approach even then,
where for our language understanding we had,
even in those starting days,
an entity recognizer and an intent classifier,
which were all trained statistically.
In fact, we had to build the deterministic matching
as a follow-up to fix bugs that statistical models have.
So, it was just a different mindset,
where we focused on data-driven statistical understanding.
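The setup described above, a statistically trained intent classifier with deterministic matching added afterwards to patch its mistakes, can be sketched roughly as follows. This is an assumption-laden toy: multinomial naive Bayes stands in for whatever models were actually used, the intent names and training utterances are hypothetical, and the entity recognizer is omitted.

```python
import math
from collections import Counter, defaultdict


class IntentClassifier:
    """Toy multinomial naive Bayes intent classifier with a deterministic
    exact-match override table consulted before the statistical model."""

    def __init__(self, examples):
        self.intent_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, intent in examples:
            tokens = text.lower().split()
            self.intent_counts[intent] += 1
            self.word_counts[intent].update(tokens)
            self.vocab.update(tokens)
        self.overrides = {}

    def add_override(self, text, intent):
        # Deterministic fix for an utterance the model gets wrong (or cannot know).
        self.overrides[text.lower()] = intent

    def predict(self, text):
        key = text.lower()
        if key in self.overrides:
            return self.overrides[key]
        tokens = key.split()
        total = sum(self.intent_counts.values())
        best, best_score = None, -math.inf
        for intent, count in self.intent_counts.items():
            denom = sum(self.word_counts[intent].values()) + len(self.vocab)
            # log prior + log likelihoods with add-one (Laplace) smoothing
            score = math.log(count / total)
            score += sum(math.log((self.word_counts[intent][t] + 1) / denom) for t in tokens)
            if score > best_score:
                best, best_score = intent, score
        return best


# Hypothetical training utterances and intent labels.
TRAINING = [
    ("play songs by the rolling stones", "PlayMusic"),
    ("play some jazz", "PlayMusic"),
    ("set a timer for ten minutes", "SetTimer"),
    ("set an alarm for seven", "SetAlarm"),
    ("what is the weather today", "GetWeather"),
    ("will it rain tomorrow", "GetWeather"),
]

if __name__ == "__main__":
    clf = IntentClassifier(TRAINING)
    print(clf.predict("play the stones"))   # PlayMusic
    clf.add_override("alexa stop", "Stop")  # hypothetical rule for an unseen intent
    print(clf.predict("Alexa stop"))        # Stop
```

The override table is consulted first, so a known-bad utterance can be fixed immediately without retraining, while everything else still goes through the statistical model.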
It wins in the end if you have a huge data set.
Yes, it is contingent on that.
And that's why it came back to how you get the data
before customers. This is why data
becomes crucial to get to the point
that you have the understanding system built up.
And notice that, for you,
we were talking about human-machine dialogue,
and even in those early days,
even though it was very much transactional,
do one thing, one-shot utterances done in a great way,
there was a lot of debate on how much Alexa should talk back
in terms of if it misunderstood you.
If it misunderstood you, or you said, play songs by the Stones,
and let's say it doesn't know, in the early days,
when knowledge can be sparse, who the Stones are.
It's the Rolling Stones.
And you don't want the match to be Stone Temple Pilots
or Rolling Stones.
So, you don't know which one it is.
So, these kinds of other signals,
now there we had great assets from Amazon in terms of...
UX, like what is it, what kind of...
Yeah, how do you solve that problem?
In terms of, we think of it
as an entity resolution problem, right?
So, because which one is it, right?
I mean, even if you figured out the Stones as an entity,
you have to resolve it to whether it's the Rolling Stones
or Stone Temple Pilots or some other Stones.
Maybe I misunderstood: is the resolution
the job of the algorithm, or is it the job of UX
communicating with the human to help the resolution?
Well, it's both, right?
You want 90% or the high 90s to be done
without any further questioning or UX, right?
But it's absolutely okay, just like as humans
we ask the question, I didn't understand you, Lex.
It's fine for Alexa to occasionally say,
I did not understand you, right?
And that's an important way to learn.
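The "the Stones" example above is entity resolution: mapping a recognized mention to one catalog entry, answering directly when confident and asking a clarifying question otherwise. A minimal sketch, assuming a tiny hypothetical catalog with made-up popularity priors, token-overlap (Jaccard) similarity in place of a learned matcher, and an arbitrary confidence margin of 0.15:

```python
# Hypothetical catalog: artist name -> made-up popularity prior.
CATALOG = {
    "The Rolling Stones": 0.9,
    "Stone Temple Pilots": 0.6,
    "Stone Sour": 0.2,
}

STOPWORDS = {"the", "by", "a"}


def normalize(text):
    """Lowercase, drop stopwords, crude plural stripping ('stones' -> 'stone')."""
    tokens = set()
    for t in text.lower().split():
        if t in STOPWORDS:
            continue
        tokens.add(t[:-1] if t.endswith("s") and len(t) > 3 else t)
    return tokens


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def resolve(mention, catalog=CATALOG, margin=0.15):
    """Return (entity, None) when confident, else (None, clarifying question)."""
    m = normalize(mention)
    scored = sorted(
        ((jaccard(m, normalize(name)) * prior, name) for name, prior in catalog.items()),
        reverse=True,
    )
    (best_score, best), (second_score, second) = scored[0], scored[1]
    if best_score - second_score >= margin:
        return best, None
    return None, f"Did you mean {best} or {second}?"


if __name__ == "__main__":
    print(resolve("the stones"))    # ('The Rolling Stones', None)
    print(resolve("stone temple"))  # (None, 'Did you mean Stone Temple Pilots or The Rolling Stones?')
```

With these made-up numbers, "the stones" resolves straight to The Rolling Stones, while "stone temple" scores too close between two artists, so the sketch asks instead of guessing, in the spirit of the "I did not understand you" fallback discussed above.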
And I'll talk about where we have come
with more self-learning with these kinds of feedback signals.
But in those days, just solving the problem
of understanding the intent and resolving it to an action,
where the action could be playing a particular artist
or a particular song, was super hard.
Again, the bar was high, as we were talking about, right?
So, while we launched it in sort of 13 big domains,
I would say in terms of,
we think of them as the 13 big skills we had,
like music was a massive one when we launched.
And now we have 90,000-plus skills on Alexa.
So, what are the big skills?
Can you just go over them?
Because the only thing I use it for
is music, weather, and shopping.
So, we think of it as music, information, right?
So, weather is a part of information, right?
So, when we launched, we didn't have smart home,
but within... by smart home I mean,
you connect your smart devices,
you control them with voice.
If you haven't done it, it's worth it,
it will change your life.
Like turning on the lights and so on.
Turning on your lights, to anything that's connected
and has a, it's just that.
What's your favorite smart device?
And now you have the smart plug,
and we also have this Echo plug, which is...
Oh yeah, you can plug in anything.
You can plug in anything,
and now you can turn that one on and off.
I'll use this conversation as motivation to get one.
Garage door: you can check the status of your garage door
and things like that, and we have gone on to
make Alexa more and more proactive,
where it even has hunches now,
like, oh, it looks like you left your light on.
Let's say you've gone to bed
and you left the garage light on.
So it will help you out in these settings, right?
That's smart devices, information, smart devices.
Yeah, so I don't remember everything we had,
but alarms and timers were the big ones.
Like that was, you know,
the timers were very popular right away.
Music also, like you could play a song, artist, album,
everything, and so that was like a clear win
in terms of the customer experience.
So that's, again, this is language understanding.
Now things have evolved, right?
So we definitely want Alexa to be more accurate,
competent, trustworthy,
based on how well it does these core things,
but we have evolved in many different dimensions.
First is what I think of as being more conversational
for high utility, not just for chat, right?
And there, at re:MARS this year, which is our AI conference,
we launched what is called Alexa Conversations.
That is providing the ability for developers
to author multi-turn experiences on Alexa
with no code, essentially,
in terms of the dialogue code.
Initially it was like, you know, all these IVR systems:
you have to fully author, if the customer says this, do that.
So the whole dialogue flow is hand-authored.
And with Alexa Conversations,
the way it works is that you just provide
sample interaction data with your service or your API.
Let's say you're Atom Tickets, which provides a service
for buying movie tickets.
You provide a few examples of how your customers
will interact with your APIs.
And then the dialogue flow is automatically constructed
using a recurrent neural network trained on that data.
So that simplifies the developer experience.
We just launched our preview for developers
to try this capability out.
And then the second part of it,
which shows even increased utility for customers,
is you and I, when we interact with Alexa, or any customer,
as I'm coming back to our initial part of the conversation,
the goal is often unclear or unknown to the AI.
If I say, Alexa, what movies are playing nearby?
Am I trying to just buy movie tickets?
Am I actually even,
or am I just looking for movies out of curiosity,
whether the Avengers is still in theaters or when it's playing?
Maybe it's gone, and maybe I missed it.
So I may watch it on Prime, right?
Which happened to me.
So from that perspective now,
you're looking into what is my goal?
And let's say I now complete the movie ticket purchase.
Maybe I would like to get dinner nearby.
So what is really the goal here?
Is it a night out, or is it movies?
As in just go watch a movie?
The answer is, we don't know.
So can Alexa now figuratively have the intelligence
to think this meta-goal is really a night out,
or at least say to the customer,
when you've completed the purchase of movie tickets
from Atom Tickets or Fandango,
or pick any one,
then the next thing is,
do you want to get an Uber to the theater, right?
Or do you want to book a restaurant next to it?
And then not ask the same information over and over again,
what time, how many people in your party, right?
So this is where you shift the cognitive burden
from the customer to the AI,
where it's thinking of what your goal is,
it anticipates your goal
and takes the next best action to complete it.
Now that's the machine learning problem.
But essentially the way we solved this first instance,
and we have a long way to go to make it scale
to everything possible in the world,
but at least for this situation,
is that at every instance,
Alexa is making the determination
whether it should stick with the experience
with Atom Tickets or not,
or offer you, based on what you say,
whether you have completed the interaction,
or you said, no, get me an Uber now.
So it will shift context into another experience or skill
or another service.
So that's dynamic decision making.
That's making Alexa, you could say, more conversational
for the benefit of the customer,
rather than simply completing transactions
which are well thought through,
where you as a customer have fully specified
what you want to be accomplished,
and it's accomplishing that.
So it's kind of like what we do with pedestrians,
like intent modeling is predicting
what your possible goals are and what's the most likely goal,
and switching that depending on the things you say.
So my question there is,
it seems maybe it's a dumb question,
but it would help a lot if Alexa remembered me,
what I said previously.
Is it trying to use some memories for the customer?
Yeah, it is using a lot of memory within that.
So right now, not so much in terms of,
okay, which restaurant do you prefer, right?
That is more long-term memory,
but within the short-term memory, within the session,
it is remembering how many people,
so if you said buy four tickets,
now it has made an implicit assumption
that you were gonna have,
you need at least four seats at a restaurant, right?
So these are the kinds of context it's preserving
between these skills, but within that session.
But you're asking the right question
in terms of, for it to be more and more useful,
it has to have more long-term memory,
and that's also an open question,
and again, these are still early days.
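The four-tickets example above amounts to carrying slot values from one skill to the next within a session. A minimal sketch, assuming hypothetical slot names and a hand-written carryover table (the conversation does not describe how Alexa actually maps slots between skills):

```python
# Hypothetical carryover rules: slot a later skill needs -> slot an earlier
# skill already filled in the same session.
CARRYOVER = {"party_size": "ticket_count", "date": "show_date", "near": "theater"}


def prefill(session_slots, required_slots):
    """Split a skill's required slots into ones inferable from the session and
    ones the assistant still has to ask the customer about."""
    filled, missing = {}, []
    for slot in required_slots:
        source = CARRYOVER.get(slot, slot)  # same-named slots carry over directly
        if source in session_slots:
            filled[slot] = session_slots[source]
        else:
            missing.append(slot)
    return filled, missing


if __name__ == "__main__":
    # Slots left behind by a (hypothetical) movie-ticket skill earlier in the session.
    session = {"ticket_count": 4, "show_date": "Friday", "theater": "Downtown Cinema"}
    filled, missing = prefill(session, ["party_size", "date", "time", "near"])
    print(filled)   # {'party_size': 4, 'date': 'Friday', 'near': 'Downtown Cinema'}
    print(missing)  # ['time']
```

Only the time is left to ask about, which is the "don't ask the same information over and over" behavior described above; long-term preferences such as a favorite restaurant would need a store that outlives the session.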
So for me, I mean, everybody's different,
but yeah, I'm definitely not representative
of the general population in the sense
that I do the same thing every day.
Like I eat the same,
I do everything the same, the same thing,
wear the same thing clearly, this or the black shirt.
So it's frustrating when Alexa doesn't get what I'm saying
because I have to correct her every time
in the exact same way.
This has to do with certain songs,
like she doesn't know certain weird songs I like
and doesn't know, I've complained to Spotify about this,
talked to the head of R&D at Spotify,
Stairway to Heaven.
I have to correct it every time.
It doesn't play Led Zeppelin correctly.
It plays a cover of Led Zeppelin's Stairway to Heaven.
You should figure, you should send me your,
next time it fails, feel free to send it to me,
we'll take care of it.
Because Led Zeppelin is one of my favorite bands,
it works for me, so I'm shocked it doesn't work for you.
This is an official bug report.
I'll put it, I'll make it public,
I'll make everybody retweet it.
We're gonna fix the Stairway to Heaven problem.
Anyway, but the point is,
you know, I'm pretty boring and do the same things,
but I'm sure most people do the same set of things.
Do you see Alexa sort of utilizing that in the future
for improving the experience?
Yes, and not only utilizing,
it's already doing some of it.
This is where Alexa is becoming more self-learning.
So, Alexa is now auto-correcting millions and millions
of utterances in the US
without any human supervision involved.
The way it does it is,
let's take an example of a particular song
that didn't work for you.
What do you do next?
Either it played the wrong song
and you said, Alexa, no, that's not the song I want.
Or you say, Alexa, play that, and you try it again.
And that is a signal to Alexa
that she may have done something wrong.
And from that perspective,
we can learn if there's that failure pattern,
or that action where song A was played
when song B was requested.
And it's very common with station names,
because with play NPR, you can have the N confused with an M.
And then, for a certain accent like mine,
people confuse my N and M all the time.
And because I have an Indian accent,
they're confusable to humans.
It is for Alexa too.
And in that case, it starts auto-correcting,
and we correct a lot of these automatically
without a human looking at the failures.
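The self-learning loop described above, a defect such as a barge-in or an immediate retry followed by a request that succeeds, can be sketched as mining rewrite pairs from session logs. This is a hypothetical illustration, not Alexa's pipeline: the `"defect"`/`"success"` labels, the log format, and the `min_count` threshold are all assumptions.

```python
from collections import Counter


def mine_rewrites(sessions, min_count=3):
    """sessions: list of turn lists; each turn is (utterance, outcome), where
    outcome is "defect" (barge-in, "no, that's not it", immediate retry) or
    "success". Returns {failing utterance: rewrite}."""
    pairs = Counter()
    for turns in sessions:
        for (prev, prev_outcome), (nxt, next_outcome) in zip(turns, turns[1:]):
            # A defect immediately followed by a different, successful request
            # is treated as an implicit correction.
            if prev_outcome == "defect" and next_outcome == "success" and prev != nxt:
                pairs[(prev, nxt)] += 1
    rewrites = {}
    for (src, dst), count in pairs.most_common():
        # Keep only the most frequent correction per source, and only if it
        # recurs often enough not to be noise.
        if count >= min_count and src not in rewrites:
            rewrites[src] = dst
    return rewrites


if __name__ == "__main__":
    # "play mpr" stands for a misrecognized "play NPR", as in the N/M example above.
    logs = [[("play mpr", "defect"), ("play npr", "success")]] * 3
    print(mine_rewrites(logs))  # {'play mpr': 'play npr'}
```

Requiring a correction to recur several times keeps one-off rephrasings out of the rewrite table, which matters when no human reviews the learned rewrites.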
So one of the things that's missing in Alexa for me,
I don't know if I'm a representative customer,
but every time I correct it,
it would be nice to know that that made a difference.
You know what I mean?
Like some sort of, I heard you.
Some acknowledgement of that.
We work a lot with Tesla, we study Autopilot and so on.
And a large number of the customers
that use Tesla Autopilot,
they feel like they're always teaching the system.
They're almost excited
by the possibility that they're teaching.
I don't know if Alexa customers generally think of it
as teaching to improve the system.
And that's a really powerful thing.
Again, I would say it's a spectrum.
Some customers do think that way,
and some would be annoyed by Alexa acknowledging that.
So there's, again, no one,
while there are certain patterns,
not everyone is the same in this way.
But we believe that, again, customers helping Alexa
is a tenet for us in terms of improving it.
And more self-learning is, again,
like fully unsupervised, right?
There is no human in the loop and no labeling happening.
And based on your actions as a customer,
Alexa becomes smarter.
Again, it's early days,
but I think this whole area of teachable AI
is gonna get bigger and bigger in the whole space,
especially in the AI assistant space.
So that's the second part.
First I mentioned more conversational;
this is more self-learning.
The third is more natural.
And the way I think of more natural
is, we talked about how Alexa sounds.
And we have made a lot of advances in our text-to-speech
by using, again, neural network technology
for it to sound very humanlike.
From the individual texture of the sound to the timing,
the tonality, the tone, everything, the whole thing.
I would think in terms of,
there are a lot of controls in each of the places
for how, I mean, the speed of the voice,
the prosodic patterns,
the actual smoothness of how it sounds,
all of those are factored in,
and we do a ton of listening tests to make sure.
But naturalness, how it sounds, should be very natural.
How it understands requests is also very important.
And in terms of, we have 95,000 skills.
And imagine that in many of these skills,
you have to remember the skill name
and say, Alexa, ask the tide skill to tell me X.
Now, if you have to remember the skill name,
that means the discovery and the interaction are unnatural.
And we are trying to solve that
by what we think of as, again,
you don't have to have the app metaphor here.
These are not individual apps, right?
Even though they're,
so you're not sort of opening one at a time and interacting.
So it should be seamless because it's voice.
And when it's voice,
you have to be able to understand these requests
independent of the specificity, like a skill name.
And what we have done is, again,
built a deep-learning-based capability
where we shortlist a bunch of skills
when you say, Alexa, get me a car.
And then we figure it out, okay,
it's meant for an Uber skill versus a Lyft,
or based on your preferences.
And then you can rank the responses from the skills
and then choose the best response for the customer.
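The skill-routing flow described above has two stages: shortlist candidate skills for an underspecified request such as "get me a car", then rank them using signals like customer preference. The conversation says the real shortlister is deep-learning based; the sketch below substitutes plain word overlap so it stays self-contained, and the skill catalog and preference scores are hypothetical.

```python
# Hypothetical skill catalog: skill id -> short description of what it handles.
SKILLS = {
    "uber": "get me a car ride taxi",
    "lyft": "get me a car ride",
    "dominos": "order me a pizza delivery",
    "weather": "weather forecast rain",
}


def shortlist(utterance, skills=SKILLS, k=2):
    """Stage 1: score every skill cheaply (word overlap) and keep the top k."""
    words = set(utterance.lower().split())
    scores = {name: len(words & set(desc.split())) for name, desc in skills.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)  # stable: ties keep catalog order
    return {name: scores[name] for name in ranked[:k] if scores[name] > 0}


def rerank(candidates, preferences):
    """Stage 2: add a per-customer preference bonus and pick the best skill."""
    if not candidates:
        return None  # nothing relevant: fall back to a clarifying prompt
    return max(candidates, key=lambda name: candidates[name] + preferences.get(name, 0.0))


if __name__ == "__main__":
    cands = shortlist("get me a car")
    print(cands)                           # {'uber': 4, 'lyft': 4}
    print(rerank(cands, {"lyft": 1.0}))    # lyft (hypothetical stored preference)
```

Splitting the work this way keeps the expensive ranking step limited to a handful of candidates, even when the catalog holds tens of thousands of skills.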
So that's on the more natural side.
Other examples of more natural are,
we were talking about lists, for instance,
and you don't wanna say, Alexa, add milk,
Alexa, add eggs, Alexa, add cookies.
No, Alexa, add cookies, milk, and eggs,
and that in one shot, right?
So that works, that helps with the naturalness.
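Handling "add cookies, milk, and eggs" in one shot means splitting a single utterance into several list items. A rule-based sketch, assuming English commas and "and" are the only separators and that the optional wake word and "to my shopping list" suffix look exactly as written in the patterns (a production system would more likely use a trained slot tagger):

```python
import re

# Optional wake word, the verb "add", the item list, optional "to (my) (shopping) list".
_ADD = re.compile(
    r"^\s*(?:alexa,?\s+)?add\s+(.+?)(?:\s+to\s+(?:my\s+)?(?:shopping\s+)?list)?\s*$",
    re.IGNORECASE,
)
# Separators: ", ", ", and ", or " and ".
_SEP = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+", re.IGNORECASE)


def parse_add_items(utterance):
    """Return the list items named in an 'add ...' request, or [] if it isn't one."""
    m = _ADD.match(utterance)
    if not m:
        return []
    return [item.strip() for item in _SEP.split(m.group(1)) if item.strip()]


if __name__ == "__main__":
    print(parse_add_items("Alexa, add cookies, milk, and eggs"))     # ['cookies', 'milk', 'eggs']
    print(parse_add_items("add milk and eggs to my shopping list"))  # ['milk', 'eggs']
```

Naive splitting on "and" would break items such as "macaroni and cheese", which is one reason a learned model is the more robust choice.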
We talked about memory, like if you said,
you can say, Alexa, remember I have to go to mom's house,
or you may have entered a calendar event
through your calendar that's linked to Alexa.
You don't wanna remember whether it's in my calendar
or did I tell you to remember something
or some other reminder, right?
So now, independent of how customers
create these events, they should just say,
Alexa, when do I have to go to mom's house?
And it tells you when you have to go to mom's house.
Now that's a fascinating problem.
Who's that problem on?
So there are people who create skills.
Who's tasked with integrating all of that knowledge together
so the skills become seamless?
Is it the creators of the skills,
or is it a problem for the infrastructure that Alexa provides?
I think the large problem in terms of making sure
your skill quality is high,
that has to be done by our tools,
because it's just, so these skills,
just to put the context,
they are built through the Alexa Skills Kit,
which is a self-serve way of building
an experience on Alexa.
This is like any developer in the world
could go to the Alexa Skills Kit
and build an experience on Alexa.
Like if you're Domino's, you can build a Domino's skill,
for instance, that does pizza ordering.
When you have authored that,
you do want to now,
if people say, Alexa, open Domino's,
or Alexa, ask Domino's to get a particular type of pizza,
that will work, but the discovery is hard.
You can't just say, Alexa, get me a pizza,
and then Alexa figures out what to do.
That latter part is definitely our responsibility,
in terms of, when the request is not fully specific,
how do you figure out what's the best skill
or service that can fulfill the customer's request?
And it can keep evolving.
Imagine going back to the situation I described,
which was the night out planning,
where the goal could be more than that individual request.
Ordering a pizza could mean a night in,
where you're having an event with your kids
in their house, and you're, so this is,
welcome to the world of conversational AI.
This is super exciting because it's not
the academic problem of NLP,
of natural language processing, understanding, dialogue.
This is like real world.
And the stakes are high in the sense
that customers get frustrated quickly,
people get frustrated quickly.
So you have to get it right,
you have to get that interaction right.
So it's, I love it.
But so from that perspective,
what are the challenges today?
What are the problems that really need to be solved
in the next few years?
First and foremost, as I mentioned,
get the basics right is still true.
Basically, even the one-shot requests,
which we think of as transactional requests,
need to work magically, no question about that.
If it doesn't turn your light on and off,
you'll be super frustrated.
Even if I can complete the night out for you
and not do that, that is unacceptable as a customer, right?
So you have to get the foundational understanding right.
The second aspect, when I said more conversational,
is, as you can imagine, more about reasoning.
It is really about figuring out what the latent goal
of the customer is, based on the information I have now
and the history, and what's the next best thing to do.
So that's a complete reasoning and decision-making problem.
Just like your self-driving car,
but the goal is still more finite there.
Here it evolves. Your environment is super hard
in self-driving, and the cost of a mistake is huge there,
but there are certain similarities.
But if you think about how many decisions Alexa is making
or evaluating at any given time,
it's a huge hypothesis space.
And we've only talked so far
about what I think of as reactive decisions,
in terms of you asked for something
and Alexa is reacting to it.
If you bring in the proactive part,
which is Alexa having hunches,
then at any given instance it's really a decision
at any given point based on the information.
Alexa has to determine what's the best thing it needs to do.
So this is the ultimate AI problem,
about decisions based on the information you have.
Do you think, just from my perspective,
I work a lot with sensing of the human face.
Do you think they'll, and we touched on this topic
a little bit earlier, but do you think there'll be a day soon
when Alexa can also look at you to help improve the quality
of the hunch it has, or at least detect frustration
or improve the quality of its perception
of what you're trying to do?
I mean, let me again bring it back to what it already does. We talked about how, when you barge in over Alexa, there's clearly a very high probability it must have done something wrong. That's why you barged in. The next extension, whether frustration is a signal or not, is of course a natural thought in terms of how that should be a signal to it. You can get that from voice. You can get it from voice, but it's very hard. I mean, frustration as a signal historically, if you think about emotions of different kinds, there's a whole field of affective computing, something that MIT has also done a lot of research in. And you are now talking about a far-field device, as in you're talking from a distance in a noisy environment. And in that environment, it needs to have a good sense of your emotions. This is a very, very hard problem.
Very hard problem, but you haven't shied away from hard problems. So, deep learning has been at the core of a lot of this technology. Are you optimistic about the current deep learning approaches to solving the hardest aspects of what we're talking about? Or do you think there will come a time when new ideas are needed? If we look at reasoning, OpenAI, DeepMind, and a lot of other folks are now starting to work on reasoning, trying to see how we can make neural networks reason. Do you see that new approaches need to be invented to take the next big leap?
Absolutely. I think there has to be a lot more investment, in many different ways. And there are these, I would say, nuggets of research forming in a good way, like learning with less data, or zero-shot learning and one-shot learning. And the active learning stuff you've talked about is incredible stuff. Transfer learning is also super critical, especially when you're thinking about applying knowledge from one task to another, or from one language to another, right? So these are great pieces. Deep learning has been useful too. And now we are sort of marrying deep learning with transfer learning and active learning. Of course, that's more straightforward in terms of applying deep learning in an active learning setup. But I do think that looking into more reasoning-based approaches is going to be key for our next wave of the technology. But there is good news. The good news is that, for continuing to delight customers, I think a lot of it can be done with prediction tasks. We haven't exhausted that, so we don't need to give up on the deep learning approaches for that. That's just something I wanted to point out.
So for creating a rich, fulfilling, amazing experience that makes Amazon a lot of money, and everybody a lot of money, because it does awesome things, deep learning is enough?
I wouldn't say deep learning is enough. For the purposes of Alexa accomplishing tasks for customers, I'm saying there are still a lot of things we can do with prediction-based approaches that do not reason, and we haven't exhausted those. But for the kind of high-utility experiences that I'm personally passionate about, of what Alexa needs to do, reasoning has to be solved to the same extent that natural language understanding and speech recognition have been, given how accurate they have become at understanding intents. But with reasoning, we are in very, very early days.
Let me ask it another way. How hard of a problem do you think that is?
I would say the hardest of them, because again, the hypothesis space is really, really large. And when you go back in time, like you were saying, I want Alexa to remember more things once you go beyond a session of interaction, and by session I mean a time span, which is today, versus remembering which restaurant I like and then, when I'm planning a night out, saying, do you wanna go to the same restaurant? Now you've upped the stakes big time. And this is where the reasoning dimension also gets way, way bigger.
So you think the space, just to elaborate on that a little bit, philosophically speaking: when you reason about trying to model what the goal of a person is in the context of interacting with Alexa, you think that space is huge?
It's huge, absolutely huge.
So another sort of devil's advocate position would be that we human beings are really simple and we all want just a small set of things. So do you think it's possible? Because we're not talking about a fulfilling general conversation; perhaps the Alexa Prize is a little bit more after that. But for customers, it feels like so many of the interactions are clustered in groups that don't require general reasoning.
I think you're right in terms of the head of the distribution of all the possible things customers may wanna accomplish. But the tail is long and it's diverse, right? There are many, many long tails. So from that perspective, I think you have to solve that problem, and everyone's very different. I mean, we see this already in terms of the skills, right? If you're an avid surfer, which I am not, right, somebody is asking Alexa about surfing conditions, and there's a skill there for them to get to. That tells you that the tail is massive. In terms of what kind of skills people have created, it's humongous, which means there are these diverse needs. And when you start looking at the combinations, even if you only had pairs of skills, 90,000 choose two is still a big set of combinations. So I'm saying there's a huge amount to do here now. And I think customers are, you know, wonderfully frustrated with things, and we have to keep getting better at doing things for them. And they're not known to be super patient. You have to do it fast.
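Rohit's point about pairs of skills is a quick piece of combinatorics. Taking the roughly 90,000 skills mentioned in the conversation at face value (an approximate figure from the discussion, not an official count), the number of unordered pairs is C(90000, 2) = 90000 × 89999 / 2 = 4,049,955,000, or about four billion. Below is a minimal Python sketch of that count using the standard library's `math.comb` (Python 3.8+); the helper name `skill_combinations` is hypothetical and chosen only for illustration.

```python
from math import comb


def skill_combinations(num_skills: int, group_size: int = 2) -> int:
    """Count unordered groups of `group_size` distinct skills out of `num_skills`.

    Hypothetical helper for illustration; it is simply the binomial coefficient.
    """
    return comb(num_skills, group_size)


# ~90,000 is the approximate skill count quoted in the conversation, not an official figure.
pairs = skill_combinations(90_000)
print(f"{pairs:,}")  # 4,049,955,000
```

Calling the same helper with `group_size=3` shows how much faster the space grows once a goal chains three skills instead of two.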
So you've mentioned the idea of a press release. In research and development at Amazon Alexa, and at Amazon in general, you kind of think of what the future product will look like, and you kind of make it happen. You work backwards. So can you draft one for me? You probably already have one, but can you make one up for 10, 20, 30, 40 years out that you see the Alexa team putting out, just in broad strokes, something that you dream about?
I think let's start with the five years first, right? And I'll get to the 40 years too.
'Cause I'm pretty sure you have a real five-year one. That's why I didn't want to, but yeah, in broad strokes, let's start with five years.
I think the five year is where, I mean, in these spaces it's hard, especially if you're in the thick of things, to think beyond the five-year space, because a lot of things change, right? I mean, if you had asked me five years back whether Alexa would be here, I wouldn't have... I think it has surpassed my imagination of that time, right? So from the next five years' perspective, from an AI perspective, what we're gonna see is that the notion you mentioned, goal-oriented dialogues versus open domain like the Alexa Prize, that bridge is gonna get closed. They won't be different. And I'll give you why that's the case. You mentioned shopping. Do you shop in one shot? Sure, for your double A batteries, paper towels. Yes. But how long does it take you to buy a camera? You do a ton of research, then you make a decision. So is that a goal-oriented dialogue when somebody says, "Alexa, find me a camera"? Or is it simply inquisitiveness, right? So even in something that you think of as shopping, which you said you yourself use a lot, if you go beyond reorders, or items where you're not brand conscious... So that was just in shopping.
Just to comment quickly, I've never bought anything through Alexa that I haven't bought before on Amazon on the desktop, after I clicked around and read a bunch of reviews, that kind of stuff. So it's repurchase.
So now think about it: even for something that you felt like is a finite goal, I think the space is huge, because even for products, the attributes are many, and you wanna look at reviews, some on Amazon, some outside; you wanna look at what CNET or another consumer forum is saying about a product, for instance, right? So that's just shopping, where you could argue the ultimate goal is sort of known. And we haven't talked about "Alexa, what's the weather in Cape Cod this weekend?" Why am I asking that weather question, right? So I think of it as: how do you complete goals with minimum steps for our customers? And when you think of it that way, the distinction between goal-oriented and open-domain conversations goes away. I may wanna know what happened in the presidential debate, right? And am I seeking just information, or am I looking at who's winning the debates? So these are all quite hard problems. So even for the five-year horizon problem, I'm like, I sure hope we'll solve these.
And you're optimistic, because that's a hard problem: reasoning well enough to be able to help explore complex goals that are beyond something simplistic. That feels like it could be, well, five years is a nice...
It's a nice bar for it, right? I think it's a nice ambition. And do we have press releases for that? Absolutely. Can I tell you specifically what the roadmap will be? And will we solve all of it in the five-year space? No, we'll work on this forever, actually. This is the hardest of the AI problems, and I don't see it being solved even in a 40-year horizon, because even if you limit it to human intelligence, we know we are quite far from that. In fact, in every aspect, from our sensing to neural processing to how the brain stores and processes information, we don't yet know how to represent knowledge, right? So we are still in those early stages. That's why I wanted to start at the five years, because five-year success would look like that: solving these complex goals. And the 40 years would be where it's just natural to talk to these agents about more of these complex goals.
Right now, we've already come to the point where, for the transactions you mentioned, asking for the weather or reordering something or listening to your favorite tune, it's natural for you to ask Alexa. It's now unnatural to pick up your phone, right? And that, I think, is the first five-year transformation. The next five-year transformation would be: okay, I can plan my weekend with Alexa, or I can plan my next meal or my next night out with Alexa, with seamless effort.
So just to pause and look back at the big picture of it all: you're part of a large team that's creating a system that's in the home, that's not human, that gets to interact with human beings. So we human beings, we descendants of apes, have created an artificial intelligence system that's able to have conversations. I mean, to me, the two most transformative robots of this century, I think, will be autonomous vehicles, though they're transformative in a slightly more boring way, and conversational agents in the home, which are more of an experience. How does that make you feel, that you're at the center of creating that? Do you sit back in awe sometimes? What is your feeling about the whole mess of it? Can you even believe that we're able to create something like this?
I think it's a privilege. I'm so fortunate in where I ended up, right? And it's been a long journey. I've been in this space for a long time in Cambridge, right, and it's so heartwarming to see the kind of adoption conversational agents are having now. Five years back, it was almost like, should I move out of this, because we were unable to find the killer application that customers would love, one that would not simply be a good-to-have thing. And it's so fulfilling to see it make a difference to millions and billions of people worldwide. The good thing is that it's still very early, so I have another 20 years of job security doing what I love.
So from that perspective, I tell every researcher that joins, or every member of my team, that this is a unique privilege. And I would point not just to launching Alexa in 2014, which was the first of its kind. Along the way, when we launched the Alexa Skills Kit, it was democratizing AI, when before that there was no good evidence of an SDK for speech and language. Now we've come to where you and I are having this conversation, and I'm not saying, oh, Lex, planning a night out with an AI agent is impossible. I'm saying it's in the realm of possibility, and not only possibility: we'll be launching this, right?
So we'll launch some elements of that, and it will keep getting better. We know that is a universal truth: once you have these kinds of agents out there being used, they get better for your customers. And I think the research topics we are throwing out at our budding researchers are just gonna get exponentially harder. And the great thing is you can now get immense satisfaction from having customers use it, not just from a paper in NeurIPS or another conference.
I think everyone, myself included, is deeply excited about that future. So I don't think there's a better place to end, Rohit. Thank you so much for talking to us.
Thank you so much.
Thank you, same here.
Thanks for listening to this conversation with Rohit Prasad. And thank you to our presenting sponsor, Cash App. Download it and use code LexPodcast; you'll get $10, and $10 will go to FIRST, a STEM education nonprofit that inspires hundreds of thousands of young minds to learn and to dream of engineering our future. If you enjoy this podcast, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or connect with me on Twitter.
And now let me leave you with some words of wisdom from the great Alan Turing: "Sometimes it is the people no one can imagine anything of who do the things no one can imagine."
Thank you for listening, and hope to see you next time.