Sometimes tweets get lost in the hustle and bustle of the ephemeral timeline, so I thought it would be fun to collect some of my favorite tweets. The full collection of my tweets, of course, is here.
Plan is to throw a party in the Andromeda galaxy 1B years from now. Everyone welcome, except for those who litter
— Andrej Karpathy (@karpathy) December 3, 2022
How long until we measure wealth inequality in FLOPS
— Andrej Karpathy (@karpathy) December 6, 2022
Potentially nitpicky but competitive advantage in AI goes not so much to those with data but those with a data engine: iterated data acquisition, re-training, evaluation, deployment, telemetry. And whoever can spin it fastest. Slide from Tesla to ~illustrate but concept is general pic.twitter.com/6O2KxZBg17
— Andrej Karpathy (@karpathy) December 5, 2022
If previous neural nets are special-purpose computers designed for a specific task, GPT is a general-purpose computer, reconfigurable at run-time to run natural language programs. Programs are given in prompts (a kind of inception). GPT runs the program by completing the document
— Andrej Karpathy (@karpathy) November 18, 2022
🤔automated companies made up just of LLMs (CEO LLM, manager LLMs, IC LLMs), running asynchronously and communicating over a Slack-like interface in text...
— Andrej Karpathy (@karpathy) November 17, 2022
Not sure if there is a name for (I think no) the feeling of a deep discomfort when the probability of an interruption is > 0 while trying to work. It’s a kind of fear.
— Andrej Karpathy (@karpathy) November 10, 2022
The Transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer. It is simultaneously:
1) expressive (in the forward pass)
2) optimizable (via backpropagation+gradient descent)
3) efficient (high parallelism compute graph)
— Andrej Karpathy (@karpathy) October 19, 2022
It would be best if people made strong statements that are understood to be only 90% true, and ignore the counterexample police. This saves time and makes direction of statements clear.
— Andrej Karpathy (@karpathy) September 27, 2022
🤔 vision may be a high-enough throughput input to the brain that is also sufficiently connected to its reward modules that AI-assisted generative art may converge to wire-heading. Probably nothing
— Andrej Karpathy (@karpathy) August 30, 2022
There's something deep and borderline unintuitive about most real-world problems just happening to be (informally) NP-Complete: hard to solve but easy to verify a solution to. It's this asymmetry that makes progress possible, as culture can record previous computational work.
— Andrej Karpathy (@karpathy) August 14, 2022
Earth as a dynamical system is a really bad computer. A lot of information processing is concentrated in a few tiny compute nodes (brains, chips) with terrible interconnects, even as bad as use of physical translation and air pressure waves. And powered primitively by combustion.
— Andrej Karpathy (@karpathy) August 13, 2022
Human vision extracts only a tiny amount of information from surrounding EM radiation. Sensitive to narrow wavelength band. Nowhere near a full spectrogram, just ~gaussian sampled at 3 (SML) frequencies. With ok resolution in fovea. Without polarization. At just 2 points. Sad ;(
— Andrej Karpathy (@karpathy) July 23, 2022
AGI is a feeling. Like love. Stop trying to define it.
— Andrej Karpathy (@karpathy) June 4, 2022
real-world data distribution is ~N(0,1)
good dataset is ~U(-2,2)
— Andrej Karpathy (@karpathy) May 22, 2022
Looking back, my most valuable college classes were physics, but for general problem solving intuitions alone:
- modeling systems with increasingly more complex terms
- extrapolating variables to check behaviors at limits
- pursuit of the simplest most powerful solutions
...
— Andrej Karpathy (@karpathy) April 16, 2022
The time evolution of the human condition (approximated as a gaussian) is more that of expanding variance than that of a moving mean.
— Andrej Karpathy (@karpathy) April 16, 2022
Reading sci-fi with humanoid aliens who speak English and have faces is what others must be experiencing as they hear a fork scratching a plate.
— Andrej Karpathy (@karpathy) April 11, 2022
I don’t think a regular person appreciates how insane it is that computers work. I propose we stare at each other mind-blown for about 1 hour/day, in small groups in circles around a chip on a pedestal, appreciating that we can coerce physics to process information like that.
— Andrej Karpathy (@karpathy) March 19, 2022
Humans program each other by prompt engineering too, so it's interesting to see that form of programming becoming increasingly prevalent with computers. Programming turns into a kind of applied psychology of neural nets, biological or synthetic.
— Andrej Karpathy (@karpathy) February 26, 2022
Everybody gangsta until real-world deployment in production.
(OH in a chat somewhere a while ago :D)
— Andrej Karpathy (@karpathy) January 26, 2022
Actually the ATP Synthase (and proton gradients) is by far the coolest molecular invention of life, followed by the Ribosome and then maaaaybe DNA, despite it being so iconic and getting so much press. Tell your friends.
— Andrej Karpathy (@karpathy) December 13, 2021
The ongoing consolidation in AI is incredible. Thread: ➡️ When I started ~decade ago vision, speech, natural language, reinforcement learning, etc. were completely separate; You couldn't read papers across areas - the approaches were completely different, often not even ML based.
— Andrej Karpathy (@karpathy) December 8, 2021
Any binary variable you create in an API for something you'll eventually want to generalize to an int. Then you'll want to upgrade that to a string. Then to a tuple. Then you realize it should be a dict. Eventually it will become a class.
— Andrej Karpathy (@karpathy) December 4, 2021
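A hedged illustration of that lifecycle, using a hypothetical `verbose` option (the names are mine, not from the tweet; just the pattern written out in Python):

```python
# A hypothetical "verbose" option, mutating through the stages above.
verbose = True                                     # v1: binary flag
verbose = 2                                        # v2: int, because one bit wasn't enough
verbose = "debug"                                  # v3: string, named levels read better
verbose = ("debug", "stderr")                      # v4: tuple, level plus destination
verbose = {"level": "debug", "stream": "stderr"}   # v5: dict, keys beat positions

class Verbosity:                                   # v6: the inevitable class
    def __init__(self, level="info", stream="stdout"):
        self.level = level
        self.stream = stream
```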
Various computational workloads exhibit different amounts of parallelism and are accordingly best scheduled on CPU or GPU. Same is true for human organizations/projects/tasks, but it seems rarely analyzed from that perspective. Compiling a project to run fast on people is hard :)
— Andrej Karpathy (@karpathy) September 28, 2021
Deep Learning is a form of human-assisted but mostly constraint-driven software development. It works because a particular smooth relaxation of program space allows a surprisingly efficient and effective local search. Something like that, my favorite definition.
— Andrej Karpathy (@karpathy) September 19, 2021
Browsing the web, 2021 pic.twitter.com/VKJ7OkZ3nr
— Andrej Karpathy (@karpathy) September 9, 2021
A friend yesterday mentioned that semiconductor tech is probably the deepest node in our civilization's explored tech tree. This actually sounds right, but is also a fun concept, any other candidates?
— Andrej Karpathy (@karpathy) August 22, 2021
I like blockchain tech quite a bit because it extends open source to open source+state, a genuine/exciting innovation in computing paradigms. I'm just sad and struggle to get over it coming packaged with so much braindead bs (get rich quick pumps/dumps/scams/spams/memes etc.). Ew
— Andrej Karpathy (@karpathy) June 5, 2021
I like to play co-op against nature.
— Andrej Karpathy (@karpathy) May 28, 2021
WSJ front page every day is like >>> "Stock Market %s!!" % ('rises' if random.random() <= 0.54 else 'falls', )
— Andrej Karpathy (@karpathy) May 4, 2021
current status: C6H12O6 + 6 O2 ----(C8H10N4O2 catalyst)---> 6 CO2 + 6 H2O + code + heat
— Andrej Karpathy (@karpathy) February 27, 2021
Everyone is so obsessed with accelerating neural nets, so as a fun side project I've been building this breadboard 8bit neural net decelerator. It will crawl at best :D. (following along the excellent++ Ben Eater 8-bit computer https://t.co/iDw8gqNnGT) pic.twitter.com/E8NTdfQp43
— Andrej Karpathy (@karpathy) February 7, 2021
Because deep learning is so empirical, success in it is to a large extent proportional to raw experimental throughput - the ability to babysit a large number of experiments at once, staring at plots and tweaking/re-launching what works. This is necessary, but not sufficient.
— Andrej Karpathy (@karpathy) January 16, 2021
“Would aliens also have X?” for almost any X tickles the brain a lot.
The X that primed it for me just now (again) is stainless steel, but almost any generalization of it works.
— Andrej Karpathy (@karpathy) December 24, 2020
If you vibrate the electromagnetic field just right, cars passively awash in the radiation for a while will suddenly drive better.
— Andrej Karpathy (@karpathy) December 14, 2020
Is there a word for that paranoid feeling you get when you think you may be reading/listening to something generated by a GPT? And why should it matter that it was, exactly 🤦‍♂️🤔
— Andrej Karpathy (@karpathy) November 26, 2020
The unambiguously correct place to examine your training data is immediately before it feeds into the network. Take the raw x,y batch tuple, ship it back to CPU, unrender, visualize. V often catches bugs with data augmentation, label preprocessing, samplers, collation, etcetc.
— Andrej Karpathy (@karpathy) November 17, 2020
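A minimal sketch of that habit in PyTorch for an image classification setup; the loader, normalization stats, and grid size are assumptions standing in for whatever your pipeline actually uses:

```python
import matplotlib.pyplot as plt
import torch

def show_batch(loader, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), n=32):
    """Visualize the exact x, y batch the network is about to consume."""
    x, y = next(iter(loader))                    # raw batch, after augmentation/collation
    mean = torch.tensor(mean).view(1, -1, 1, 1)
    std = torch.tensor(std).view(1, -1, 1, 1)
    x = (x.cpu() * std + mean).clamp(0, 1)       # back to CPU, un-normalize ("unrender")
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    for img, label, ax in zip(x[:n], y[:n], axes.flat):
        ax.imshow(img.permute(1, 2, 0).numpy())  # CHW -> HWC for matplotlib
        ax.set_title(int(label))
        ax.axis("off")
    plt.show()

# show_batch(train_loader)   # train_loader: whatever DataLoader feeds your model
```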
How to become expert at thing:
1 iteratively take on concrete projects and accomplish them depth wise, learning “on demand” (ie don’t learn bottom up breadth wise)
2 teach/summarize everything you learn in your own words
3 only compare yourself to younger you, never to others
— Andrej Karpathy (@karpathy) November 7, 2020
Aging has 100% mortality rate and no one cares
— Andrej Karpathy (@karpathy) October 26, 2020
When you sort your dataset descending by loss you are guaranteed to find something unexpected, strange and helpful.
— Andrej Karpathy (@karpathy) October 2, 2020
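One concrete way to do that in PyTorch; a sketch assuming a classification model and a map-style dataset (the names are placeholders, not anything from the tweet):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

@torch.no_grad()
def rank_by_loss(model, dataset, device="cpu", batch_size=256):
    """Return dataset indices sorted by descending per-example loss."""
    model.eval().to(device)
    losses = []
    for x, y in DataLoader(dataset, batch_size=batch_size):  # no shuffle: keep indices aligned
        logits = model(x.to(device))
        # reduction="none" keeps one loss value per example instead of the batch mean
        losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    losses = torch.cat(losses)
    order = torch.argsort(losses, descending=True)
    return order, losses[order]   # worst offenders first: mislabels, outliers, pipeline bugs

# worst_idx, worst_loss = rank_by_loss(model, train_dataset)
```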
“For all x, p(x)”
“Actually, there exists an x, !p(x)”
my least favorite conversation
— Andrej Karpathy (@karpathy) August 24, 2020
Notifications. Masquerading as tiny and helpful but in reality psychologically invasive and damaging to the brain - interrupting complex thought, forcing (expensive, taxing) context switch, spiking dopamine, making thought reactive instead of proactive.
— Andrej Karpathy (@karpathy) August 8, 2020
By posting GPT generated text we’re polluting the data for its future versions
— Andrej Karpathy (@karpathy) July 19, 2020
Trees are solidified air 🌬✨🌳
— Andrej Karpathy (@karpathy) July 13, 2020
A human body is so wonderfully nested. Its ~40T cells descend from individual eukaryotic cells before multi-cellularity. And each has ~1000 mitochondria, which were free-living bacteria before endosymbiosis. And all of it is home to 1-3X as many bacteria in the nooks and crannies
— Andrej Karpathy (@karpathy) July 3, 2020
It's been a while since I last played Factorio, but I can't stop thinking about it, and of the entire economy as being an MMORPG version of it. Factory Town, RimWorld, Banished, Dawn of Man etc are all great too but somehow pack less long-term punch.
— Andrej Karpathy (@karpathy) June 20, 2020
Which style of drawing residual networks is semantically superior? 1: residual connection on the side of the layer or 2: layer on the side of the residual connection? Imo there is a correct answer and I feel strongly about it. pic.twitter.com/hp10PoBDJm
— Andrej Karpathy (@karpathy) March 8, 2020
This is a troll, but I think it would be funny. pic.twitter.com/JfS2w5tudu
— Andrej Karpathy (@karpathy) February 23, 2020
A 4 year old child actually has a few hundred million years of experience, not 4. Their rapid learning/generalization is much less shocking/magical considering this fact.
— Andrej Karpathy (@karpathy) December 15, 2019
Biotech is so much more powerful than our Normaltech. Imagine if we could tap its full potential; maybe your car could just heal itself of any scratches. Or it could give birth to your new car. And then you could feed the old one to your house.
— Andrej Karpathy (@karpathy) August 5, 2019
Protip: move your phone in a wide circle while capturing live photos to “future proof” them, so that the motion parallax information is there for some crazy future neural net to accurately recover full scene geometry and animate it into something amusing.
— Andrej Karpathy (@karpathy) May 29, 2019
The two tech stacks ❤️ pic.twitter.com/eDJx6N6iDJ
— Andrej Karpathy (@karpathy) May 13, 2019
When a person says that f(x) = f(a) + f’(a)(x - a) and someone disagrees strongly because of f’’(a)(x - a)^2/2
— Andrej Karpathy (@karpathy) March 15, 2019
Nature stuff all around us (plants, animals, etc) is best thought of as basically super advanced alien technology. These are nanotechnology devices magically grown in ambient conditions with complex information processing. Synthetic bio is tinkering with / hijacking this tech.
— Andrej Karpathy (@karpathy) January 14, 2019
My parents were visiting me once and as I was leaving for work I saw my mom sitting on the couch in the living room just looking forward. I’m like “mom what are you doing?”, “sitting”, she shrugged. Like not reading, listening, planning, or even meditating. Mind blown.
— Andrej Karpathy (@karpathy) January 3, 2019
Similar to Chemistry making Alchemy rigorous, Divination could be resurrected as a discipline but take on a highly scientific approach, as a degree combining history and CS+stats in equal measure. As a bonus, if you do a PhD you become a certified Oracle :)
— Andrej Karpathy (@karpathy) December 23, 2018
Discovering (paradoxically late in life) that I get more out of textbooks than books and that I don’t have to stop buying them just because I’m out of school. Good reading list pointers: https://t.co/ZDaqr3YQbW
— Andrej Karpathy (@karpathy) August 19, 2018
It took me a while to really admit to myself that just reading a book is not learning but entertainment.
— Andrej Karpathy (@karpathy) July 15, 2018
most common neural net mistakes: 1) you didn't try to overfit a single batch first. 2) you forgot to toggle train/eval mode for the net. 3) you forgot to .zero_grad() (in pytorch) before .backward(). 4) you passed softmaxed outputs to a loss that expects raw logits. ; others? :)
— Andrej Karpathy (@karpathy) July 1, 2018
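The same four mistakes, inverted into the correct PyTorch habits; a toy sketch with a made-up model and random data:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()      # 4) expects raw logits, so no softmax inside the model

x, y = torch.randn(16, 10), torch.randint(0, 3, (16,))

model.train()                        # 2) train mode on (and model.eval() at evaluation time)
for step in range(500):              # 1) sanity check: overfit this one batch first
    optimizer.zero_grad()            # 3) clear stale gradients before .backward()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print(loss.item())                   # should be ~0; if it can't memorize 16 examples, something upstream is broken
```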
1 hour and 5 diagrams later I optimized 100 lines of code that ran in 13 seconds to 20 lines of heavily vectorized code that runs in 0.02 seconds, and this might just be the best day of my life, so far.
— Andrej Karpathy (@karpathy) April 18, 2018
It looks like if you bombard Earth with photons for a while, it can emit a Roadster. hah
— Andrej Karpathy (@karpathy) February 7, 2018
The heat expelled from the back of a computer makes me sad.
— Andrej Karpathy (@karpathy) October 15, 2017
Ideally never absorb information without predicting it first. Then you can update both 1) your knowledge but also 2) your generative model.
— Andrej Karpathy (@karpathy) September 4, 2017
Gradient descent can write code better than you. I'm sorry.
— Andrej Karpathy (@karpathy) August 4, 2017
I've been using PyTorch a few months now and I've never felt better. I have more energy. My skin is clearer. My eye sight has improved.
— Andrej Karpathy (@karpathy) May 26, 2017
Loss addiction: self-destructive behavior of obsessively watching & reading into tiny fluctuations in loss functions of running experiments
— Andrej Karpathy (@karpathy) February 4, 2017
3e-4 is the best learning rate for Adam, hands down.
— Andrej Karpathy (@karpathy) November 24, 2016
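And in code, under the obvious assumption of some PyTorch model you happen to be training:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                    # stand-in for your actual model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)   # 3e-4, hands down
```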
Coworker on RL research: "We were supposed to make AI do all the work and we play games but we do all the work and the AI is playing games!"
— Andrej Karpathy (@karpathy) October 7, 2016
Jeff Dean: "I like your ConvNets in Javascript". Me: "Thank you. I like your map reduce."
— Andrej Karpathy (@karpathy) December 11, 2014
The _real_ Top unsolved problems in Computer Science: Video Conferencing, Presentation, Printing.
— Andrej Karpathy (@karpathy) October 4, 2014