This talk frames Skills, which appeared in October 2025, as no longer a "personal prompt collection" but organization-wide AI infrastructure. The core idea: design skills not as human-readable essays but as contracts agents can invoke. With Anthropic, OpenAI, and Microsoft converging on a common file-based direction, teams that stack skills well pull further ahead over time.
1. Skills shipped in October—but the game already moved
The video opens with the sense that after Anthropic shipped skills, the LLM/agent world changed faster than our mental models updated. Most people picture a specific tool when they say "agent," but the deeper shift is that skills are becoming the substrate for accurate, durable, predictable business outcomes.
"What we're missing is that skills are becoming the foundation for the accurate, lasting, predictable results businesses need."
So the session reframes skills as agent-readable, invocable skills—almost a curriculum update—and announces a community skill repository where people upload and learn together, not only personal packs.
"This will be a place where we learn how to build skills properly—where the community throws skills in and we all get better at meaningful work together."
2. Four big trends reshaping the skill ecosystem
2.1 From personal dotfiles to org infrastructure
The biggest shift is status: what used to be personal convenience is now team/enterprise deploy-once, versioned infrastructure. Skills show up in sidebars and get pulled into Excel, PowerPoint, Claude, Copilot, and more.
"Six months ago it was personal settings; now it's org infrastructure—deployed company-wide, versioned, called from Excel, PowerPoint, Claude, Copilot."
Keywords: shared infra layer and documents humans and agents both read—know-how moves from "who holds it in their head" to "how the org runs it as files."
2.2 Who calls skills shifts from people to agents
In October, humans might use a skill a few times in chat; now a single agent run can invoke a skill hundreds of times. "Good enough for a human" ≠ "stable for an agent."
"Most skill calls aren't human anymore—agents call them. Humans a few times per conversation; agents hundreds per run. So you have to think agent-first."
2.3 Skills aren't dev-only—and big tech knows it
Skills aren't toys for terminal coders; they're vessels for methodology across work and even life, and large vendors are aligned.
"Skills don't live only in the terminal—they span business and personal life."
Anthropic × Microsoft (skills toward Copilot), OpenAI releases—skills are settling as a de facto open standard and common infra vehicle for how AI operates.
"Skills are becoming an open standard—the shared vehicle for how AI will work going forward."
2.4 Alpha isn't hoarded—it's discovered together
Old alpha was secrecy; here best practice is something discovered, not pre-known—community experimentation speeds learning.
"We're trading skills like baseball cards—the whole internet is learning together."
LLMs aren't shipped with finished manuals; people co-discover the manual through trial and error—unlike the 1990s when manuals were fixed and distributed.
"In the '90s the manual was a known thing. With LLMs we discover the manual together."
3. What a skill is and how people actually use it
Skills are "primitive": often one folder, one text file—but that holds methodology and context, so it's powerful.
"A skill is one folder and one text file inside. The only required file is skill.markdown."
Structure splits into:
- Top: metadata
- Bottom: methodology & instructions
Plain-language instructions create predictable input→output behavior for the LLM.
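The metadata-top, methodology-bottom split can be pictured as a minimal skill file. The name, description, and body below are hypothetical sketches, not examples from the talk:

```markdown
---
name: prd-writer
description: Turns a vague feature ask into a one-page PRD in Markdown. Fires on phrases like "write a PRD", "spec this out", "turn this idea into requirements". Output: sections for Problem, Users, Requirements, Out of Scope.
---

# Methodology
1. Restate the ask in one sentence and confirm the target user.
2. Draft the Problem, Users, Requirements, and Out of Scope sections.
3. Keep the whole PRD under one page; list open questions at the end.
```

The top block is what the agent reads when deciding whether to fire the skill; the body is what it follows once the skill is invoked.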
3.1 Common pattern: specialist stack
In Claude production, the specialist stack is common: drop a bundle of skills in a project folder—
- vague ask → PRD skill
- PRD → GitHub issue splitter
- test-writing skill, etc.
Agents call those files to advance work.
"The agent doesn't need bespoke expert instructions every time—the expertise lives in the files."
Skills reduce prompt pain (nuance tweaking, retyping)—less reliance on brittle prompt tricks.
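The specialist stack described above can be pictured as a project folder layout; these file and folder names are illustrative, not from the talk:

```
project/
  skills/
    prd-writer/skill.markdown        # vague ask -> PRD
    issue-splitter/skill.markdown    # PRD -> GitHub issues
    test-writer/skill.markdown       # issues -> test plan
```

The agent walks the chain by invoking one file after another, so each skill's output needs to be usable as the next skill's input.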
3.2 Not only dev: 50k lines of real-estate ops skills
A Texas GP on X stacked 50k lines across 50 repos—rent rolls, comps, cash flow, team handoff protocols.
"Agents can call those skills to run operations predictably—and writing them down helps human onboarding enormously."
Methodology lives in the repo, not only in heads—skills double as documentation machinery.
3.3 More advanced: orchestrator skills and sub-agent routing
Orchestrator skills analyze incoming requests, split work across research/code/UI/docs, and fan out to sub-agents.
"One top-level request comes in—the orchestrator skill fans it out to sub-agents reliably."
Standardization means the same assets reuse across Excel, PowerPoint, Copilot, Claude—higher reuse value everywhere.
4. Why skills compound while prompts evaporate
Prompts are useful, but conversations end and prompts vanish (copy-paste again); skills stay as files—improve, version, share—compound growth.
"Prompts disappear when the chat ends—you paste again. Skills remain. Skills are what lasts."
Six months in, skill authors keep editing files when output disappoints; prompt-only users keep repeating the same paste.
"Skill builders kept refining and compounding; prompt-only users kept copy-pasting the same thing."
Prompts are basic LEGO bricks; castles need reusable special blocks—those blocks are skills. 🧱
5. Building skills that actually work—where people fail most
From here the talk shifts to dense practical guidance, especially on the description field.
5.1 "Skills die in the description"
The speaker is blunt:
"Here's the most important thing: most skills die in the description."
Bad descriptions are abstract and vague—e.g. "helps with competitive analysis"—weak triggers, wrong or missing calls.
Good descriptions should include:
- Output artifact type
- Trigger phrases users actually say
- What the output should look like
"A good description states what you build, what phrases should fire it, and how the output should look."
Anthropic's guidance: skills tend to be under-triggered rather than over-triggered—bias descriptions toward more proactive matching.
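The contrast between weak and strong descriptions can be shown as two hypothetical frontmatter lines; the wording is illustrative, not from the talk:

```yaml
# Bad: abstract, no trigger phrases, no output shape
description: Helps with competitive analysis.

# Good: artifact + trigger phrases + output shape, all on one line
description: Builds a competitor comparison table (Markdown, one row per rival, columns for pricing/positioning/weaknesses). Fires on "compare competitors", "competitive landscape", "how do we stack up".
```

The second version gives the agent both a reason to fire (the trigger phrases) and a picture of what success looks like (the table shape).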
5.2 Critical gotcha: description must be one line
A technical trap many hit:
"The skill description must be a single line. If your formatter inserts a newline, Claude may not read the second line."
Good content can still break discovery—risky when teams share skills.
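One way to guard against this in a shared repo is a small lint check. The function below is a minimal sketch, assuming skills use a `---`-fenced frontmatter block with a plain (unquoted, non-folded) `description:` value:

```python
def description_is_single_line(skill_text: str) -> bool:
    """Check that the frontmatter 'description:' value fits on one line.

    Minimal sketch: assumes '---'-fenced frontmatter and a plain scalar
    description. A wrapped value shows up as an indented follow-on line.
    """
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter at all
    in_description = False
    for line in lines[1:]:
        if line.strip() == "---":
            return True  # frontmatter ended without a wrapped description
        if line.startswith("description:"):
            in_description = True
            continue
        if in_description:
            # An indented continuation means a formatter wrapped the value.
            if line.startswith((" ", "\t")):
                return False
            in_description = False
    return False
```

Running a check like this in CI before a skill lands in the shared repo catches the newline trap before it silently breaks discovery for the whole team.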
5.3 Methodology: not only steps—add reasoning
The talk lists five elements for the skill body; in summary:
- Don't write steps alone—add judgment principles ("reasoning")
Procedure-only skills break on edge cases.
"Procedure-only skills are fragile. You need reasoning to generalize."
- Specify output format concretely
Vague "summarize this" invites regret later.
"Markdown vs Excel vs PDF vs sections—if you don't specify, you'll regret it."
- Write edge cases explicitly
Don't assume the LLM shares human common sense.
"Humans use common sense. Claude doesn't. You must write edge cases."
- Add good examples so the model pattern-matches
You can put multiple files in the skill folder.
- Still keep skills short and lean
Long conflicting instructions bloat context and hurt performance—often cap core files around 100–150 lines.
"Short, reliable-firing skills beat long, conflicting ones."
"In most cases don't write core skill files beyond ~100–150 lines."
Time-budget advice:
"Spend 80% on description, 20% on reasoning/clear instructions."
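A body that touches all five elements might look like this hypothetical sketch (the task, rules, and thresholds are illustrative):

```markdown
## Steps
1. Pull last month's numbers and compute variance vs budget.
2. Write a summary table, then a short narrative.

## Reasoning
Prefer flagging a surprising variance over hitting a length target;
the reader cares about anomalies, not completeness.

## Output format
Markdown: one table (line item, budget, actual, variance %), then at most 5 bullets.

## Edge cases
- Missing month of data: say so explicitly; never interpolate.
- Variance over 20%: mark the row and propose one likely cause.

## Good example
"Cloud spend ran 31% over budget (flagged); likely cause: unreserved instances."
```

Note that the whole body stays well under the suggested 100-150 line cap while still covering steps, reasoning, format, edge cases, and an example.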
6. Agent-first design: skills are now routing and contracts
This matches the title: when humans fail they notice and fix fast, but agents can propagate failures automatically and at scale, so the design philosophy must change.
6.1 Failure modes changed → quantitative tests
Humans notice drift and correct; agents may fail without recovery loops at rising cost—so skills need test suites, versioning, and quantified performance.
"When an agent is wrong there may be no recovery loop—so you must test skill performance quantitatively."
Skill wording nudges the model's latent space—tiny phrasing changes swing outcomes—so A/B phrasing, bump versions, re-measure.
"Small wording changes can swing results a lot—run basket tests, bump versions, measure again."
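A basket test can be sketched as a tiny harness: run each prompt in a fixed basket through two skill phrasings and compare pass rates. `run_skill` below is a stand-in for a real model call, and the stub "model" is purely illustrative:

```python
from typing import Callable

def basket_pass_rate(
    run_skill: Callable[[str, str], str],   # (skill_text, prompt) -> output
    passes: Callable[[str, str], bool],     # (prompt, output) -> did it pass?
    skill_text: str,
    basket: list[str],
) -> float:
    """Fraction of basket prompts whose output passes the check."""
    results = [passes(p, run_skill(skill_text, p)) for p in basket]
    return sum(results) / len(results)

# Illustrative A/B: a stub "model" that only behaves when the skill
# text contains the concrete artifact phrase.
def fake_run_skill(skill_text: str, prompt: str) -> str:
    return "table" if "comparison table" in skill_text else "essay"

basket = ["compare competitors", "how do we stack up"]
check = lambda prompt, output: output == "table"

v1 = basket_pass_rate(fake_run_skill, check, "helps with analysis", basket)
v2 = basket_pass_rate(fake_run_skill, check, "builds a comparison table", basket)
```

In practice `run_skill` would call the model with the skill loaded, and each phrasing change would get a version bump plus a fresh basket run before it ships.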
6.2 Description isn't a label—it's a routing signal
As agents navigate workflows, description is the signpost for when this skill applies.
"Description isn't a label—it's a routing signal. It must tell the agent where to go in the workflow."
6.3 Agents need contracts to move
Declare outputs like API contracts—SLA, fields, constraints. Agents choose correctly when what you get / don't get is explicit.
"Agents need contracts—declaratively write what this skill yields and what it cannot."
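A declarative output contract can be sketched as plain data plus a checker; the skill, field names, and values below are hypothetical:

```python
# Hypothetical contract for a "deal memo" skill's output.
CONTRACT = {
    "produces": "deal_memo.md",
    "required_fields": ["asset", "price", "cap_rate", "risks"],
    "does_not_produce": ["legal advice", "final valuation"],
}

def missing_fields(output: dict, contract: dict) -> list[str]:
    """Return the required fields absent from the output (empty list = OK)."""
    return [f for f in contract["required_fields"] if f not in output]

memo = {"asset": "123 Main St", "price": 4_200_000, "cap_rate": 0.061}
gaps = missing_fields(memo, CONTRACT)  # "risks" has not been produced yet
```

Because the contract states both what the skill yields and what it cannot, a downstream agent can route on it without guessing, and a checker like this can gate the handoff.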
6.4 Composability: must hand off to the next step
Don't treat a skill as a one-shot hammer—produce handoff-friendly artifacts for the next agent or process.
"Don't try to finish in one output—build outputs that hand off cleanly. Otherwise handoffs break."
6.5 Hard-wire behavior in scripts, not skills
The last principle is a practical boundary: skills are natural language and can drift, so anything that must be deterministic should live in scripts, the talk argues.
"If you really want it hard-wired, use a script, not a skill—not because AI is bad, but because you understand AI's limits."
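The boundary can be illustrated with a deterministic calculation: the skill can describe when to compute a cap rate, but the arithmetic itself lives in a script so an agent can never paraphrase the math wrong. The formula (NOI divided by purchase price) is standard; the function and numbers are illustrative:

```python
def cap_rate(annual_noi: float, purchase_price: float) -> float:
    """Deterministic cap-rate calculation, hard-wired in code, not prose.

    A skill written in natural language can drift; this script cannot.
    The skill's job is to decide *when* to call it, not *how* it works.
    """
    if purchase_price <= 0:
        raise ValueError("purchase price must be positive")
    return annual_noi / purchase_price

rate = cap_rate(annual_noi=256_000, purchase_price=4_200_000)
# roughly 0.061
```

The same split applies to anything that must be exact: date math, rounding rules, file naming. Put it in a script the skill invokes, and keep the judgment calls in the skill.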
7. Three-tier skill operations in teams
In mixed human+agent teams, skills are immediately executable context—especially for processes that hand work along. Strong teams split skills three ways:
7.1 Tier 1: Standard skills
Company-wide shared patterns—brand voice, format rules, approved templates.
"Make a brand-voice skill and deploy it org-wide—everyone uses it. If you haven't, try it."
7.2 Tier 2: Methodology skills
How your org does high-value work—often tacit senior know-how that took juniors months to learn.
"If you can ship in a skill what a junior learns in months—that's Tier 2."
The real edge often sits with senior practitioners, not enterprise admins—extract and share it.
7.3 Tier 3: Personal workflow skills
"Under my desk" productivity skills—with a warning: don't trap know-how only on a laptop; vacation, sick leave, or departure can strand the team.
"Don't keep it only on your laptop—someone may need that skill when you're gone."
Skills encode expertise—consciously choose scope (personal / team / enterprise).
8. Why another skill repository: the domain-skills gap
Late segment: a community repo announcement. Plenty of marketplaces and GitHub lists exist—why another? Starter packs for technologists abound; domain-specific skills that solve real problems are scarce.
"What we lack are domain skills that solve real problems—rent-roll analysis doesn't just float on random GitHub."
The community should trade "baseball cards" from real jobs; the repo lives as a section of the Open Brain GitHub repo for easy integration.
Simon Willison (Oct 2025) noted skills might outgrow MCP—we're not fluent enough yet; skill-craft must rise.
"Skills could be bigger than MCP—the problem is we're not fluent enough at building them yet."
Operating model: organize by workflow type, apply a consistent agent readability bar—a practitioner library of concrete skills.
"We'll cover competitive analysis, financial model review, deal memos, research/meeting summaries—very concrete skills."
9. Practical tips to start this week
Closing with a staged path:
- Beginners: pick one repetitive task—something weekly—feed the AI past chats plus examples of good and bad outputs, then draft skill.markdown.
"Pick something you repeat weekly and ask: 'Could this become a skill?'"
- Already building: deepen handoffs, output contracts, and workflow wiring—especially declarative outputs and next-step readiness.
- Team/enterprise: plan a three-tier rollout—brand standards, team methodology, and personal workflows need different governance.
Return to compounding: skills record successful workflow runs—leverage for future smarter agents and future teams.
"Skills compound. Future smarter agents—and future teams—can execute on them."
"Prompts are copy-paste hell—but skills last."
Closing
The talk's thesis: skills aren't prompt wrapping—they're execution infrastructure agents call—so design around triggers (description), contracts (outputs), composition (handoffs), tests (quantitative verification). As standardization lands, teams that version and stack skills well widen their lead over time.
