1. First Impressions of Veo 3 and the Importance of Audio
Andrej Karpathy says he is deeply impressed by Veo 3 and the various examples people are discovering on r/aivideo and elsewhere.
"Very impressed with Veo 3 and all the things people are finding on r/aivideo etc. Makes a big difference qualitatively when you add audio."
He notes that adding audio makes a qualitative difference: pairing sound with the visuals makes the generated clips noticeably more immersive and impactful.
2. Four Macro Aspects of Video Generation
Karpathy identifies four major aspects of video generation that aren't fully appreciated.
2-1. Video Is the Highest Bandwidth Input to the Brain
"Video is the highest bandwidth input to brain. Not just entertainment, but also for tasks or learning. Think diagrams, charts, animations."
- Video can convey far more information, far faster, than text.
- It plays an increasingly important role in learning and work.
2-2. Video Is Easy and Fun
"Normal people don't really like reading or writing. It's really hard. Everyone can easily (and wants to) engage with video."
- Reading and writing require significant effort, while video is naturally accessible and enjoyable.
- That's why it has such mass appeal.
2-3. The Barrier to Creating Video Is Approaching Zero
"The barrier to creating video is approaching zero."
- Thanks to AI, an era in which anyone can easily create video is arriving.
2-4. Video Can Now Be Directly Optimized for the First Time
"For the first time, video can be directly optimized. I want to stress the weight of this a bit more."
- Until now, video was created by humans and recommended by AI.
- Now, AI can create video directly and optimize it for a chosen objective, such as viewer engagement or ad clicks.
3. Limitations of Existing Video Systems and AI Video Innovation
Karpathy uses platforms like TikTok as examples to explain how inefficient current video systems are.
"So far, video has been all about indexing, ranking, and serving a finite set of candidates that humans created (at cost). In TikTok's case, creators make videos to capture attention, and the core task is deciding which video to show to whom."
- The model where human creators make videos and algorithms recommend them has inherent optimization limits.
- Karpathy calls this system a "very, very bad optimizer."
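The index-rank-serve loop described above can be sketched roughly as follows. This is only an illustrative toy, not how TikTok or any real recommender works: the two-dimensional embeddings, the dot-product `score` function, and the clip names are all assumptions made up for the example. The point it shows is that the system can only choose among videos that already exist in the catalog.

```python
from typing import Dict, List, Tuple


def score(user: List[float], video: List[float]) -> float:
    """Hypothetical relevance score: dot product of user and video embeddings."""
    return sum(u * v for u, v in zip(user, video))


def recommend(
    user: List[float], catalog: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Score every video in a fixed, human-made catalog and return the top k."""
    ranked = sorted(
        ((video_id, score(user, emb)) for video_id, emb in catalog.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# A finite set of candidates (made-up embeddings, for illustration only).
catalog = {
    "cooking_clip": [0.9, 0.1],
    "dance_clip": [0.2, 0.8],
    "news_clip": [0.5, 0.5],
}
user = [1.0, 0.0]

top = recommend(user, catalog, k=2)
```

However good the scoring function is, the best possible result here is capped by the best item already in `catalog`.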
"It looks like a pretty good system because people are already addicted to TikTok, but in my opinion, it's far from what's possible in principle."
4. New Possibilities with Veo 3 and AI Video Generation
Now Veo 3 and similar AI video generators offer an entirely new approach.
"The videos that Veo 3 and friends produce are the outputs of a neural network. This is a differentiable process. Now you can set arbitrary objectives and use gradient descent to achieve them."
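- Neural networks create the video directly, so the output can be optimized toward an objective such as viewer response or ad clicks.
- Gradient descent is an optimization method that repeatedly nudges a model's parameters in the direction that most reduces a loss (or, equivalently, increases an objective).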
"I expect this optimizer will be much, much more powerful than anything we've seen so far."
- When AI directly optimizes video, the resulting system could be incomparably more powerful than today's TikTok.
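To make the idea of optimizing through a differentiable generator concrete, here is a minimal sketch under heavy assumptions. The `generate` function is a toy stand-in (a `tanh` of two parameters), not a video model, and `WEIGHTS` is a made-up objective; nothing here reflects how Veo 3 is trained. It uses gradient ascent on an objective, which is the same procedure as gradient descent on the negated objective.

```python
import math
from typing import List

# Hypothetical objective weights: how much each output feature contributes
# to the quantity being maximized (an assumption for illustration).
WEIGHTS = [1.0, -0.5]


def generate(theta: List[float]) -> List[float]:
    """Toy differentiable 'generator': parameters -> bounded output features."""
    return [math.tanh(t) for t in theta]


def objective(features: List[float]) -> float:
    """Toy objective: weighted sum of the generated features."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


def gradient(theta: List[float]) -> List[float]:
    """Chain rule: d/dtheta_i [w_i * tanh(theta_i)] = w_i * (1 - tanh(theta_i)^2)."""
    return [w * (1.0 - math.tanh(t) ** 2) for w, t in zip(WEIGHTS, theta)]


def optimize(theta: List[float], lr: float = 0.5, steps: int = 200) -> List[float]:
    """Gradient ascent: step the parameters uphill on the objective."""
    theta = list(theta)
    for _ in range(steps):
        grad = gradient(theta)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


start = [0.0, 0.0]
initial_score = objective(generate(start))
final_theta = optimize(start)
final_score = objective(generate(final_theta))
```

Because every step from parameters to objective is differentiable, the gradient tells the optimizer exactly how to change the generator's output, which is what a catalog of fixed, human-made videos cannot offer.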
5. The Meaning of Infinite Generation and Direct Optimization
"You can now directly optimize generated video for, say, engagement or pupil dilation. Or ad click conversion. Why index a finite set of videos when you can generate infinitely and optimize directly?"
- AI can generate an effectively unlimited number of videos and tune each one toward a chosen objective.
- This is an evolution from a 'recommendation' model to a 'generation + optimization' model.
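The contrast between picking from a finite set and optimizing directly can be illustrated with a one-dimensional toy. Everything here is hypothetical: a "video" is reduced to a single number, and `engagement` is an invented smooth function with its peak at 0.73. Real objectives such as engagement are not known in closed form like this.

```python
from typing import List


def engagement(x: float) -> float:
    """Hypothetical smooth objective with its peak at x = 0.73."""
    return -((x - 0.73) ** 2)


def best_of_catalog(catalog: List[float]) -> float:
    """Recommendation: choose the best item from a finite set."""
    return max(catalog, key=engagement)


def optimize_directly(x: float, lr: float = 0.1, steps: int = 100) -> float:
    """Generation + optimization: follow the gradient of the objective."""
    for _ in range(steps):
        grad = -2.0 * (x - 0.73)  # derivative of engagement with respect to x
        x += lr * grad
    return x


catalog = [0.1, 0.5, 0.9]
picked = best_of_catalog(catalog)
generated = optimize_directly(0.1)
```

In this toy, `picked` is limited to the closest existing candidate, while `generated` converges toward the peak of the objective itself.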
6. AI-Human Communication and a New Chapter for Creativity
"Video has the potential to be a remarkable surface for AI-to-human communication. Ditto for the future AI GUI. Think about how much easier it is to understand a really great diagram or animation than a wall of text."
- An era where AI communicates directly through video could arrive.
- Human creativity may also unfold in new ways through video.
7. But Is the Future of 'Optimized' Video Really a Good Thing?
"But this intrinsic, high-bandwidth medium can now be directly optimized. In my opinion, TikTok is nothing compared to what's coming. And I'm not sure the 'optimized' version is something we'll like."
- Karpathy expresses concern about whether AI-optimized video is truly good for humans.
- New problems like addiction and manipulation could emerge.
8. Real-World Examples of Veo 3
Finally, he mentions stunning examples that appeared just one day after Veo 3's release.
"Google released Veo 3 just one day ago. This new model creates video and audio simultaneously from a single prompt! Here's one of 13 amazing examples: 1. A self-aware AI character."
- Veo 3 can generate video and audio simultaneously from a single prompt.
- Entirely new kinds of videos, such as self-aware AI characters, are emerging.
Key Takeaways
- Veo 3
- AI Video Generation
- Importance of Audio
- Video Bandwidth
- Accessibility / Mass Appeal
- Barrier to Video Creation
- Direct Optimization
- Gradient Descent
- Infinite Generation
- AI-Human Communication
- Creativity
- The Future of Optimization
Closing
Karpathy's post highlights both the opportunities and the concerns that AI video generation brings. It's worth considering how technologies like Veo 3 will change the way we communicate, learn, and create, and whether those changes will ultimately be positive.