Elon Musk: Tesla Autopilot | Lex Fridman Podcast #18
The following is a conversation with Elon Musk. He's the CEO of Tesla, SpaceX, Neuralink, and a co-founder of several other companies. This conversation is part of the Artificial Intelligence Podcast. The series includes leading researchers in academia and industry, including CEOs and CTOs of automotive, robotics, AI, and technology companies.

This conversation happened after the release of the paper from our group at MIT on driver functional vigilance during use of Tesla's Autopilot. The Tesla team reached out to me, offering a podcast conversation with Mr. Musk. I accepted, with full control of the questions I could ask and the choice of what is released publicly. I ended up editing out nothing of substance. I've never spoken with Elon before this conversation, publicly or privately. Neither he nor his companies have any influence on my opinion, nor on the rigor and integrity of the scientific method that I practice in my position at MIT. Tesla has never financially supported my research, I've never owned a Tesla vehicle, and I've never owned Tesla stock.

This podcast is not a scientific paper; it is a conversation. I respect Elon, as I do all the other leaders and engineers I've spoken with. We agree on some things and disagree on others. My goal with these conversations is always to understand the way the guest sees the world. One particular point of disagreement in this conversation was the extent to which camera-based driver monitoring will improve outcomes, and for how long it will remain relevant for AI-assisted driving. As someone who works on and is fascinated by human-centered artificial intelligence, I believe that if implemented and integrated effectively, camera-based driver monitoring is likely to be of benefit in both the short term and the long term. In contrast, Elon and Tesla's focus is on the improvement of Autopilot such that its statistical safety benefits override any concern of human behavior and psychology.

Elon and I may not agree on everything, but I deeply respect the engineering and innovation behind the efforts that he leads. My goal here is to catalyze a rigorous, nuanced, and objective discussion in industry and academia on AI-assisted driving, one that ultimately makes for a safer and better world. And now, here's my conversation with Elon Musk.
What was the vision, the dream, of Autopilot in the beginning, the big picture, system level, when it was first conceived and started being installed in 2014 in the hardware of the cars? What was the vision, the dream?

I would characterize the vision, or dream, simply as: there are obviously two massive revolutions in the automobile industry. One is the transition to electrification, and then the other is autonomy. And it became obvious to me that, in the future, any car that does not have autonomy would be about as useful as a horse. Which is not to say that there's no use, it's just rare and somewhat idiosyncratic if somebody has a horse at this point. It's just obvious that cars will drive themselves completely; it's just a question of time. And if we did not participate in the autonomy revolution, then our cars would not be useful to people relative to cars that are autonomous. I mean, an autonomous car is arguably worth five to ten times more than a car which is not autonomous.
It depends on what you mean by long term, but let's say at least for the next five years.

So there are a lot of very interesting design choices with Autopilot early on. First is showing, on the instrument cluster, or in the Model 3 on the center stack display, what the combined sensor suite sees. What was the thinking behind that choice? Was there a debate? What was the process?
The whole point of the display is to provide a health check on the vehicle's perception of reality. So the vehicle is taking in information from a bunch of sensors, primarily cameras, but also radar and ultrasonics. That information is then rendered into vector space, with a bunch of objects with properties like lane lines and traffic lights and other cars. And then, in vector space, that is re-rendered onto a display, so you can confirm whether the car knows what's going on or not by looking out the window.

Right, I think that's an extremely powerful thing for people to get an understanding, to become one with the system, and to understand what the system is capable of.
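The pipeline described here, raw per-sensor detections fused into a vector-space object list and then re-rendered for the driver, can be sketched roughly as follows. The names and fields are illustrative stand-ins, not Tesla's actual internal representation:

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    """Hypothetical vector-space record for one perceived object."""
    kind: str   # e.g. "car", "lane_line", "traffic_light"
    x: float    # longitudinal offset from the ego vehicle, meters
    y: float    # lateral offset, meters

def to_vector_space(detections):
    """Fuse raw per-sensor detections into a single object list."""
    return [TrackedObject(d["kind"], d["x"], d["y"]) for d in detections]

def render(objects, max_range=50.0):
    """Keep only objects near the ego vehicle, nearest first,
    as a stand-in for drawing them on the display."""
    visible = [o for o in objects if abs(o.x) <= max_range]
    return sorted(visible, key=lambda o: o.x)
```

A driver glancing at the rendered list (or display) can then compare it against what they see out the window, which is the "health check" being described.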
Now, have you considered showing more? So if we look at the computer vision, you know, like road segmentation, lane detection, vehicle detection, object detection, underlying the system, there is at the edges some uncertainty. Have you considered revealing the parts that have uncertainty in the system, the sort of problems associated with, say, image recognition or something like that? Right now it shows the vehicles in the vicinity as a very clean, crisp image, and people do confirm that there's a car in front of me and the system sees there's a car in front of me. But to help people build an intuition of what computer vision is, by showing some of the uncertainty?
Well, in my car I always look at the sort of debug view, and there are two debug views. One is augmented vision, which I'm sure you've seen, where we basically draw boxes and labels around objects that are recognized. And then there's what we call the visualizer, which is basically a vector-space representation summing up the input from all sensors. That does not show any pictures; it basically shows the car's view of the world in vector space. But I think this is very difficult for normal people to understand; they would not know what they're looking at. So it's almost an HMI challenge. The things currently being displayed are optimized for the general public's understanding of what the system is capable of. It's like, if you have no idea how computer vision works, or anything, you can still look at the screen and see if the car knows what's going on. And then if you're a development engineer, or if you have the development build like I do, then you can see all the debug information. But that would just be gibberish to most people.
What's your view on how to best distribute effort? So there are three, I would say, technical aspects of Autopilot that are really important. There's the underlying algorithms, like the neural network architecture; there's the data it's trained on; and then there's the hardware development. There may be others. But so: algorithm, data, hardware. You only have so much money, only have so much time. What do you think is the most important thing to allocate resources to? Do you see it as pretty evenly distributed between those three?
We automatically get vast amounts of data, because all of our cars have eight external-facing cameras and radar, and usually 12 ultrasonic sensors, GPS obviously. So we basically have a fleet that has, we've got about 400,000 cars on the road that have that level of data.

I think you keep quite close track of it, actually.

Yeah, so we're approaching half a million cars on the road that have the full sensor suite. I'm not sure how many other cars on the road have this sensor suite, but I'd be surprised if it's more than 5,000, which means that we have 99% of all the data.

So there's this huge inflow of data.

Absolutely, a massive inflow of data.
And then it's taken about three years, but now we've finally developed our full self-driving computer, which is an order of magnitude more capable than the NVIDIA system that we currently have in the cars. And to use it, you just unplug the NVIDIA computer and plug the Tesla computer in. In fact, we're still exploring the boundaries of its capabilities, but we're able to run the cameras at full frame rate, full resolution, not even cropping the images, and it still has headroom, even on just one of the systems. The full self-driving computer is really two computers, two systems on a chip, that are fully redundant. So you could put a bolt through basically any part of that system and it still works.
The redundancy, are they perfect copies of each other? Or is it purely for redundancy, as opposed to an arguing-machine kind of architecture where they're both making decisions?

This is purely for redundancy. I think it's more like, if you have a twin-engine commercial aircraft: the system will operate best if both systems are operating, but it's capable of operating safely on one. But as it is right now, we haven't even hit the edge of performance, so there's no need to actually distribute functionality across both SoCs. We can actually just run a full duplicate on each one.

You haven't really explored or hit the limit of the system?

We haven't hit the limit, not yet.
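The "full duplicate on each SoC" arrangement can be pictured as follows: both chips run the same pipeline, the vehicle keeps operating if one fails, and agreement between the two doubles as a cheap integrity check. This is only an illustration of the redundancy idea, not Tesla's actual fault-handling logic:

```python
def drive_decision(soc_outputs):
    """Combine results from fully redundant SoCs.

    soc_outputs: one entry per SoC; None stands for a failed unit.
    The system operates best with both healthy, but safely on one.
    """
    healthy = [out for out in soc_outputs if out is not None]
    if not healthy:
        raise RuntimeError("total compute failure")
    if len(healthy) == 2 and healthy[0] != healthy[1]:
        # Disagreement suggests a hardware fault in one unit; a real
        # system would isolate the faulty SoC, not just pick one.
        return healthy[0]
    return healthy[0]
```

With both units healthy the outputs match and either can be used; with one unit down (a `None` entry) the survivor's output is used alone.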
So the magic of deep learning is that it gets better with data. You said there's a huge inflow of data, but the thing about driving is that the really valuable data to learn from is the edge cases. I've heard you talk somewhere about Autopilot disengagements as being an important moment in time to use. Are there other edge cases, or perhaps can you speak to those edge cases, what aspects of them might be valuable, or, if you have other ideas, how to discover more and more and more edge cases in driving?
Well, there are a lot of things that are learned. There are certainly edge cases where, say, somebody's on Autopilot and they take over. And then, okay, that's a trigger that goes to a system that says, okay, did they take over for convenience, or did they take over because Autopilot wasn't working properly? There's also, let's say we're trying to figure out what is the optimal spline for traversing an intersection. Then the ones where there are no interventions are the right ones. So you then say, okay, when it looks like this, do the following. And then you get the optimal spline for navigating a complex intersection.
So that's the common case: you're trying to capture a huge amount of samples of a particular intersection, of how things went right. And then there's the edge case where, as you said, not for convenience, but something didn't go exactly right; somebody took over, somebody asserted manual control from Autopilot.

And really, the way to look at this is: view all input as error. If the user had to do input, there's something wrong. All input is error.
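The "all input is error" heuristic amounts to a triage step over takeover events: every intervention is logged as a potential error, and follow-up signals separate convenience takeovers from genuine system failures. The field names and rules below are invented for illustration, not Tesla's actual trigger system:

```python
def label_takeover(event):
    """Triage one manual takeover under the 'all input is error' view.

    event: dict of signals around the disengagement (fields invented).
    """
    if event.get("navigation_intent"):
        return "convenience"   # driver wanted a maneuver, e.g. an exit
    if event.get("hard_braking") or event.get("sharp_steering"):
        return "safety"        # abrupt correction: likely a system error
    return "review"            # ambiguous: queue for further analysis
```

Events labeled "safety" would be the valuable edge cases to mine for training data; "convenience" takeovers are the ones Navigate on Autopilot is meant to eliminate.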
That's a powerful line, to think of it that way, because it may very well be error. But if you want to exit the highway, or if you want to, it's a navigation decision that Autopilot is not currently designed to make, then the driver takes over. How do you know the difference?

Yeah, that's going to change with Navigate on Autopilot, which we've just released, and without stalk confirm. So the navigation, the lane-change-based asserting of control, asserting control in order to do a lane change, or exit a freeway, or do a highway interchange, the vast majority of that will go away with the release that just went out.

Yeah, I don't think people quite understand how big of a step that is.

If you drive the car, then you do.

So you still have to keep your hands on the steering wheel currently when it does the automatic lane change?
So there have been these big leaps through the development of Autopilot, through its history. What stands out to you as the big leaps?

I would say this one: Navigate on Autopilot without having to confirm is a huge leap. It also automatically overtakes slow cars. So it's both navigating and seeking the fastest lane. It'll overtake slow cars and exit the freeway and take highway interchanges. And then we have traffic light recognition, which is introduced initially as a warning. I mean, on the development version that I'm driving, the car fully stops and goes at traffic lights.

So those are the steps, right? You just mentioned something that's sort of a step towards full autonomy. What would you say are the biggest technological roadblocks to full self-driving?

Actually, I don't think... The full self-driving computer that we just, what we call the FSD computer, that's now in production. So if you order any Model S or X, or any Model 3 that has the full self-driving package, you'll get the FSD computer. That's important, to have enough base computation. Then it's refining the neural net and the control software. But all of that can just be provided as an over-the-air update. The thing that's really profound, and what I'll be emphasizing at the investor day that we're having focused on autonomy, is that the cars currently being produced, with the hardware currently being produced, are capable of full self-driving.
But capable is an interesting word, because...

The hardware is. And as we refine the software, the capabilities will increase dramatically, and then the reliability will increase dramatically, and then it will receive regulatory approval. So, essentially, buying a car today is an investment in the future. You're essentially buying... I think the most profound thing is that if you buy a Tesla today, I believe you are buying an appreciating asset, not a depreciating asset.

So that's a really important statement there, because if the hardware is capable enough, that's the hard thing to upgrade, usually. So then the rest is a software problem.

Yes. Software has no marginal cost, really.
But what's your intuition on the software side? How hard are the remaining steps to get it to where, you know, the experience, not just the safety but the full experience, is something that people would enjoy?

I think people would enjoy it very much on the highways. It's a total game changer for quality of life, using Tesla Autopilot on the highways. So it's really just extending that functionality to city streets, adding in traffic light recognition, navigating complex intersections, and then being able to navigate complicated parking lots, so the car can exit a parking space and come find you, even if it's in a complete maze of a parking lot. And then it can just drop you off and find a parking spot by itself.

Yeah, in terms of enjoyability, and something that people would actually find a lot of use from, the parking lot is a really... it's rife with annoyance when you have to do it manually, so there's a lot of benefit to be gained from automation there.
So let me start injecting the human into this discussion a little bit. Let's talk about full autonomy. If you look at the current level-four vehicles being tested on road, like Waymo and so on, they're only technically autonomous. They're really level-two systems with just a different design philosophy, because there's always a safety driver in almost all cases, and they're monitoring the system. Maybe Tesla's full self-driving will still, for a time to come, require supervision by the human being. So its capabilities are powerful enough to drive, but it nevertheless requires the human to still be supervising, just like a safety driver is in other fully autonomous vehicles.

I think it will require detecting hands on wheel for at least six months or something like that from here. Really it's a question of, from a regulatory standpoint, how much safer than a person does Autopilot need to be for it to be okay to not monitor the car? And this is a debate that one can have.
But you need a large amount of data, so you can prove with high confidence, statistically speaking, that the car is dramatically safer than a person, and that adding in the person monitoring does not materially affect the safety. So it might need to be like two or three hundred percent safer than a person.

And how do you prove that?

Incidents per mile.

So crashes and fatalities?

Yeah, fatalities would be a factor, but there are just not enough fatalities to be statistically significant at scale. But there are enough crashes; there are far more crashes than there are fatalities. So you can assess what is the probability of a crash. Then there's another step, which is the probability of injury, and the probability of permanent injury, and the probability of death. And all of those need to be much better than a person, by at least, perhaps, two hundred percent.
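The comparison sketched here, incident rates per mile with statistical confidence, might look like the following, treating crash counts as Poisson. The numbers in the test below are made up for illustration, not real Tesla or NHTSA figures:

```python
import math

def safety_ratio(crashes_auto, miles_auto, crashes_human, miles_human):
    """Crash-rate ratio with a rough 95% interval.

    Treats both crash counts as independent Poisson variables and uses
    a normal approximation on the log of the rate ratio. A ratio above
    1 means the automated system crashes less often per mile.
    """
    rate_auto = crashes_auto / miles_auto
    rate_human = crashes_human / miles_human
    ratio = rate_human / rate_auto
    # Standard error of the log rate ratio for two Poisson counts.
    se = math.sqrt(1 / crashes_auto + 1 / crashes_human)
    low = ratio * math.exp(-1.96 * se)
    high = ratio * math.exp(1.96 * se)
    return ratio, (low, high)
```

The interval is the key point being made: with few events (fatalities) the interval is too wide to be significant, while with many events (crashes) the same ratio can be demonstrated with high confidence.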
And you think there's the ability to have a healthy discourse with the regulatory bodies on this topic?

I mean, there's no question that regulators pay a disproportionate amount of attention to that which generates press. This is just an objective fact. And Tesla generates a lot of press. In the United States there are, I think, almost 40,000 automotive deaths per year, but if there are four in a Tesla, they'll probably receive a thousand times more press.
So the psychology of that is actually fascinating. I don't think we'll have enough time to talk about that, but I have to talk to you about the human side of things. Myself and our team at MIT recently released a paper on functional vigilance of drivers while using Autopilot. This is work we've been doing since Autopilot was first released publicly, over three years ago, collecting video of driver faces and driver body. I saw that you tweeted a quote from the abstract, so I can at least guess that you've glanced at it. Can I talk you through what we found?

Okay, so it appears, in the data that we've collected, that drivers are maintaining functional vigilance. We looked at 18,900 disengagements from Autopilot and annotated: were they able to take over control in a timely manner? And they were; they were present, looking at the road, ready to take over control.
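The annotation protocol described reduces, at its simplest, to a proportion over labeled disengagement events. A minimal sketch, with stand-in data rather than the study's actual annotations:

```python
def vigilance_rate(annotations):
    """Fraction of disengagements with a timely driver takeover.

    annotations: one boolean per annotated disengagement event
    (True = driver was attentive and took over in time).
    """
    if not annotations:
        raise ValueError("no annotated events")
    return sum(annotations) / len(annotations)
```

The study's claim is that this fraction stayed high across its roughly 18,900 annotated events, against the expectation from the vigilance-decrement literature.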
Okay, so this goes against what many would predict from the body of literature on vigilance with automation. Now, the question is: do you think these results hold across the broader population? Ours is just a small subset. One of the criticisms is that there may be a small minority of drivers that are highly responsible, where their vigilance decrement would increase with Autopilot use.

I think this is all really going to be swept aside. I mean, the system's improving so much, so fast, that this is going to be a moot point very soon. Where vigilance is, if something's many times safer than a person, then adding a person, the effect on safety is limited. And, in fact, it could be negative.

That's really interesting. So the fact that a human, some percent of the population, may exhibit a vigilance decrement will not affect overall statistics, numbers on safety?

No, in fact, I think it will become, very, very quickly, maybe even towards the end of this year, but I'd say I'd be shocked if it's not next year at the latest, that having a human intervene will decrease safety.
Imagine if you're in an elevator. It used to be that there were elevator operators, and you couldn't go in an elevator by yourself and work the lever to move between floors. And now nobody wants an elevator operator, because the automated elevator that stops at the floors is much safer than the elevator operator. And, in fact, it would be quite dangerous to have someone with a lever that can move the elevator between floors.

So that's a really powerful statement, and a really interesting one.
But I also have to ask, from a user experience and from a safety perspective: one of my passions, algorithmically, is camera-based detection, sensing the human, detecting what the driver is looking at, cognitive load, body pose. On the computer vision side, that's a fascinating problem, and there are many in industry who believe you have to have camera-based driver monitoring. Do you think there could be benefit gained from driver monitoring?

If you have a system that's at or below human-level reliability, then driver monitoring makes sense. But if your system is dramatically better, more reliable, than a human, then driver monitoring does not help much. And, like I said, you wouldn't want someone in the elevator. If you're in an elevator, do you really want someone with a big lever, some random person, operating the elevator between floors? I wouldn't trust that. I would rather have the buttons.
Okay, you're optimistic about the pace of improvement of the system, from what you've seen with the full self-driving computer.

The rate of improvement is exponential.

So one of the other very interesting design choices early on that connects to this is the operational design domain of Autopilot: where Autopilot is able to be turned on. Contrast that with another vehicle system that we're studying, the Cadillac Super Cruise system. In terms of ODD, it's very constrained to particular kinds of highways, well mapped, tested, but it's much narrower than the ODD of Tesla vehicles.
That's good. That's a good line.

What was the design decision, in that different philosophy of thinking? There are pros and cons. What we see with a wide ODD is that Tesla drivers are able to explore more of the limitations of the system, at least early on, and, together with the instrument cluster display, they start to understand what the capabilities are. So that's a benefit. The con is that you're letting drivers use it basically anywhere... well, anywhere it could detect lanes with confidence. Were there design decisions there that were challenging? Or, from the very beginning, was that done on purpose?

Frankly, it's pretty crazy letting people drive a two-ton death machine manually. In the future, people will be like, I can't believe anyone was just allowed to drive one of these two-ton death machines, and they just drove wherever they wanted. Just like elevators: you could just move the elevator with the lever wherever you wanted; it could stop halfway between floors if you wanted. It's pretty crazy. So it's going to seem like a mad thing in the future that people were driving cars.
So I have a bunch of questions about the human psychology, about behavior, and so on. Because you have faith, not faith, but a belief that both the hardware side and the deep learning approach of learning from data will make it just far safer than humans. Recently, there were a few hackers who tricked Autopilot into acting in unexpected ways with adversarial examples. We all know that neural network systems are very sensitive to minor disturbances, to these adversarial examples on input. Do you think it's possible to defend against something like this?

Sure, yeah.

Can you elaborate on the confidence behind that answer?
Well, a neural net is basically just a bunch of matrix math. You'd have to be very sophisticated, somebody who really understands neural nets, and basically reverse-engineer how the matrices are being built, and then create a little thing that exactly causes the matrix math to be slightly off. But it's very easy to then block that, by having basically anti-negative recognition. If the system sees something that looks like a matrix hack, it's excluded. It's such an easy thing to do.

So learn both on the valid data and the invalid data. Basically, learn on the adversarial examples to be able to exclude them.

Yeah, you basically want to both know what is a car and what is definitely not a car. You train for: this is a car, and this is definitely not a car. Those are two different things. People have no idea about neural nets, really. They probably think a neural net involves, you know, a fishing net or something.
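The "this is a car / this is definitely not a car" idea can be illustrated with a toy nearest-centroid classifier over one-dimensional features. A real system would use learned embeddings from the vision network; this is only a sketch of the negative-recognition concept, with invented data:

```python
def build_classifier(positives, negatives):
    """Toy negative recognition: learn what a car looks like AND what a
    known attack / non-car looks like, then exclude inputs closer to
    the negatives. Features are plain floats purely for illustration."""
    pos_centroid = sum(positives) / len(positives)
    neg_centroid = sum(negatives) / len(negatives)

    def classify(x):
        # Reject anything that resembles the trained negatives more
        # than the trained positives.
        if abs(x - neg_centroid) < abs(x - pos_centroid):
            return "excluded"
        return "car"

    return classify
```

Training on adversarial examples as explicit negatives, as suggested above, corresponds to adding them to `positives`-vs-`negatives` style supervision so the model has a class to exclude them into.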
So, taking a step beyond just Tesla and Autopilot: current deep learning approaches still seem, in some ways, to be far from general intelligence systems. Do you think the current approaches will take us to general intelligence, or do totally new ideas need to be invented?

I think we're missing a few key ideas for general intelligence, artificial general intelligence. But it's going to be upon us very quickly, and then we'll need to figure out what we shall do, if we even have that choice. It's amazing how people can't differentiate between, say, the narrow AI that, you know, allows a car to figure out what a lane line is and navigate streets, versus general intelligence. These are just very different things. Like, your toaster and your computer are both machines, but one's much more sophisticated than the other.

You're confident with Tesla you can create the world's best toaster?

The world's best toaster, yes. The world's best self-driving. To me, right now, this seems game, set, match. I don't, I mean, I don't want to be complacent or overconfident, but that's what it appears. That is just literally how it appears right now. I could be wrong, but it appears to be the case that Tesla is vastly ahead of everyone.
Do you think we will ever create an AI system that we can love, and that loves us back in a deep, meaningful way, like in the movie Her?

I think AI will be capable of convincing you to fall in love with it very well.

And that's different than us humans?

You know, we start getting into a metaphysical question: do emotions and thoughts exist in a different realm than the physical? And maybe they do, maybe they don't; I don't know. But from a physics standpoint, I tend to think of things, you know, physics was my main sort of training. And from a physics standpoint, essentially, if it loves you in a way that you can't tell whether it's real or not, it is real.

That's a physics view of love.

If you cannot prove that it does not, if there's no test that you can apply that would allow you to tell the difference, then there is no difference.
And it's similar to seeing our world as a simulation: there may not be a test to tell the difference between the real world and the simulation. And therefore, from a physics perspective, it might as well be the same thing.

There may be ways to test whether it's a simulation.

There might be, I'm not saying there aren't. But you could certainly imagine that a simulation could correct for that: once an entity in the simulation found a way to detect the simulation, it could either restart, you know, pause the simulation, start a new simulation, or do one of many other things that then correct for that error.

So when maybe you, or somebody else, creates an AGI system, and you get to ask her one question, what would that question be?

What's outside the simulation?

Elon, thank you so much for talking today.

All right, thank you.