Summary Tom Brown is a self-taught engineer who led the engineering behind GPT-3 at OpenAI and co-founded Anthropic. In this video, he discusses the scaling laws he witnessed firsthand along his path from MIT graduate to entrepreneur, engineer, and AI researcher -- along with their far-reaching implications, the story behind Claude's development, the future of AI infrastructure, and advice for young engineers studying AI today. The conversation is filled with real-life stories, concrete examples, and fascinating behind-the-scenes details.


1. From Failure to Success: The Entrepreneurial Path After MIT

Tom Brown graduated from MIT at age 21, then gained his first startup experience at 'Linked Language,' a startup his friends had launched. He recalls, "In college, I was like a dog that only does what it's told, but in a startup, you have to be a wolf actually hunting for survival."

"If you leave a company alone, it dies by default. Everyone had to act on their own initiative, and that was an incredibly valuable experience."

He then moved on to founding his own startup. "I could have joined a major tech company, but the independent, adventurous atmosphere unique to startups appealed to me more."


2. Early Startup Experience: Grouper and New Networking Experiments

Tom Brown later started Grouper, a unique dating service, with colleagues from YC (Y Combinator). The service matched groups of three people so friends could go out together.

"I was a really awkward person, and I needed a service where I could safely go out with friends and meet new people. That was exactly Grouper's purpose."

Before AI was involved, the team manually handled the matching. This experience led to a connection with OpenAI's Greg Brockman, who repeatedly participated in Grouper events and built a friendship that would later become the link to OpenAI.

Then the competitor Tinder emerged and solved the problem Grouper was tackling more smoothly, and Tom cheerfully admits, "When a good solution shows up, you should applaud it without regrets."


3. Joining OpenAI and the Start of an AI Career

After Grouper, Tom was driven by the thought that "AI capable of changing our lives might one day emerge, and I wanted to contribute even a little." But anxiety was also significant.

"I got a B- in linear algebra in college -- was I really qualified to do AI research?"

He created a six-month self-study plan: Coursera machine learning courses, Kaggle projects, statistics and linear algebra textbooks. He even bought his own GPU and accessed it over SSH for hands-on practice. The specificity of his approach was impressive.

"I messaged Greg saying 'I'll sweep floors, do anything, I just want to help.' He replied positively, saying 'We're short on people who know both machine learning and distributed systems.'"

After joining OpenAI, he spent his first nine months on engineering tasks, such as building the StarCraft environment, rather than machine learning, establishing his footing before formally joining the AI research team.


4. GPT-3 and the Discovery of Scaling Laws

At OpenAI, Tom Brown took on a key role in developing GPT-3. The discovery that "training at larger scale consistently produces smarter models" -- the scaling laws -- was pivotal to GPT-3's breakthrough.

"I was stunned when the scaling laws graph showed a straight line spanning 12 orders of magnitude. I had never seen a straight line hold at that scale before."

Combined with improvements in algorithmic efficiency, it became clear that "intelligence would grow exponentially over the coming years."
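The "straight line" here refers to a power law: a relationship of the form L(C) = a·C^(-b) plots as a straight line on log-log axes, no matter how many orders of magnitude it spans. A minimal sketch (the constants are made up for illustration, not the actual GPT-3 fit):

```python
import math

# Hypothetical power-law scaling: loss L(C) = a * C**(-b).
# a and b are illustrative constants, not values from any published fit.
a, b = 10.0, 0.05

def loss(compute):
    return a * compute ** (-b)

# On log-log axes a power law is a straight line:
# log10 L = log10 a - b * log10 C, so the slope between
# any two points on the curve is the same constant, -b.
points = [10**k for k in range(0, 13)]  # spanning 12 orders of magnitude
slopes = []
for c1, c2 in zip(points, points[1:]):
    rise = math.log10(loss(c2)) - math.log10(loss(c1))
    run = math.log10(c2) - math.log10(c1)
    slopes.append(rise / run)

# Every decade-to-decade segment has the same slope: a straight line.
print(all(abs(s + b) < 1e-9 for s in slopes))
```

This linearity is what made the observation remarkable: the same simple relationship held from tiny models up to the largest runs anyone had tried.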

At the time, there was considerable criticism within the industry that it was "just throwing money at the problem," but ultimately the conviction that "it's a dumb approach, but it actually works" prevailed.

"Anthropic's slogan was actually 'Let's do the dumb thing that works.'"


5. Anthropic's Founding and Building a Mission-Driven Organization

The GPT-3 development team at OpenAI held scaling and safety as top priorities, and a subset of this core team left OpenAI at the end of 2020 to found Anthropic.

"Looking at it objectively, we hardly thought we'd succeed. OpenAI had massive capital and star-studded members. We were just 7 people, gathered in the middle of a pandemic, vaguely focused on our mission."

Despite this, all of the initial members were "people truly committed to the mission," which is why, even as the organization grew to 2,000 people (as of 2025), 'purpose' rather than 'politics' remained its foundation.


6. The Emergence of Claude and the Product Development Journey

Anthropic's first goal was building large-scale language-model training infrastructure that could actually be used. In the early days, Claude 1 ran only internally as a Slack bot, and there was much deliberation over whether releasing it to the outside world would actually help, which delayed the product launch considerably.

"We weren't even convinced we should release a product, and in the beginning we didn't even have proper server infrastructure."

But ChatGPT's massive success in late 2022 served as a catalyst. From 2023, external APIs and services launched in earnest, and Claude 3.5 Sonnet made a splash that began shifting the market landscape.


7. Coding Competitiveness: Internal Needs Become the Best Product

With Claude 3.5 Sonnet, code generation and analysis capabilities became overwhelmingly strong, making it the go-to model for developers.

"Among YC (Y Combinator) startup founders in particular, the preferred model for code work has shifted rapidly to Claude since 2024. It might even be at 80-90% now."

Claude's strength in coding didn't come about because "someone ordered it"; it came from a strong desire within the organization for "a model that's really good at coding."

And Claude Code, their dedicated coding tool, actually started as an internal tool:

"It was a tool we Anthropic developers built because we needed it, and we honestly didn't expect this level of response from the outside."

The key lesson here was that "the people who truly understand the user make the best products."

"Our competitive advantage was that while external startups could have built a similar tool, the difference was that we built it as the people who knew Claude's users best."


8. Benchmarks, Product Quality, and Real Value

An interesting point is the disconnect between benchmark evaluations and actual user preferences.

"Benchmarks are like 'test scores' -- there are even dedicated teams that specialize in gaming them. We don't focus on raising test scores. Internally, we have our own evaluation criteria for what we actually want."

Claude's 'personality' -- the warmth and friendliness of its conversations -- is also emphasized as "hard to measure quantitatively, but the most important differentiator."


9. The Massive Expansion of AI Infrastructure and Multi-Chip Strategy

As of 2025, Anthropic is leading "the largest infrastructure investment in history."

"This buildout will be bigger than the Apollo Project, bigger than the Manhattan Project. AI compute investment is already tripling annually. At this pace, it will surpass everything by next year."
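To put "tripling annually" in perspective, compound growth at that rate escalates very quickly. A quick back-of-the-envelope calculation in relative units (no actual dollar figures are implied):

```python
# Relative compute investment under sustained 3x annual growth
# (illustrative arithmetic only, not a forecast of real spending).
level = 1.0
for year in range(1, 6):
    level *= 3
    print(f"year {year}: {level:.0f}x the starting level")
```

One year of tripling already triples the baseline; five years compounds to 243x, which is why a trend like this overtakes historical one-off projects so fast.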

The key bottlenecks are power supply, GPU procurement, and infrastructure construction permits, with particular emphasis on expanding data centers within the U.S. and the permitting process. "We need renewable energy, nuclear power -- all of it," he answers candidly.

Anthropic is the only major lab that uses three or more types of chips (GPUs/TPUs/Trainium) in a hybrid approach.

"We use diverse chips to maximize speed and flexibility. It's hard on the performance optimization team, but assigning the right workloads to the right chips improves efficiency."

He connects this to his experience at OpenAI, where he led the architecture transition from TPUs to GPUs, and says that experience is paying off now.


10. Advice for the Next Generation: Risk and Intrinsic Motivation

Finally, he offers the following advice to young engineers studying AI or contemplating their careers:

"Taking more risks is the wise thing to do. And I want to tell you to do work you'll truly be proud of. Degrees from prestigious schools and jobs at big companies don't matter anymore."

"The most important thing is inner motivation and doing work that is genuinely meaningful."


Closing

Tom Brown's growth story and Anthropic's experiences vividly illustrate the intensity at the forefront of the AI industry and the challenges that rapid technological progress poses to the real world. Particularly striking are the messages that "the greatest innovations begin with small teams deeply committed to a mission" and that "failure, anxiety, and hesitation are themselves part of growth." For anyone who wants to work alongside AI in the future, this is a deeply inspiring record.
