State-of-the-Art Prompt Engineering for AI Agents

1. The Current State of Prompt Engineering and the Rise of Metaprompting 🚀

Metaprompting is emerging as a powerful tool that every AI startup is paying attention to right now.
The video opens with this observation:

"Metaprompting is becoming a really, really powerful tool that everyone is using. It feels like learning to code in 1995. The tools aren't fully there yet, but you have the sense that you're on a new frontier."
The video emphasizes that prompt engineering isn't simply about issuing commands — it's a process of thinking, much like managing people: "How do I give the AI the right information so it can make good decisions?"

2. How Top AI Startups Actually Write Prompts 🏆

2-1. Real-World Case: Parahelp's Customer Support AI Prompt

Parahelp is a company that handles customer support AI for well-known AI companies including Perplexity, Replit, and Bolt.
They shared their full production prompt — which is highly unusual.

"Prompts for vertical AI agents like this are core company assets and rarely made public. We're really grateful that the Parahelp team opened it up like open source."
Key characteristics of their prompt structure:
- Very long and detailed (six pages)
- Role definition:
  
  "You are a manager of customer service agents."
- Clear task specification:
  
  "Your role is to approve or reject tool calls."
- Step-by-step plan broken into stages:
  - Step 1, Step 2, Step 3... laid out concretely
- Important cautions:
  
  "Be careful not to call other types of tools arbitrarily."
- Explicit output format:
  
  "You must output the approval/rejection result in this format."
- Markdown-style organization:
  - Headings, sub-items, examples — all systematically structured
- Examples provided:
  
  "Using XML tag formats like this helps the LLM follow instructions much more reliably."
How the prompt evolves:
- Because each customer has different requirements, part of the prompt is shared (system prompt) while part is customer-specific (developer prompt).
  
  "Every customer has different workflows and preferences, so how to fork and merge prompts, and which parts are common vs. customer-specific, is a topic that's only just starting to be explored."

3. Prompt Architecture: System / Developer / User Prompts 🏗️

System prompt: Defines the company's core API and operating principles (customer-agnostic)
Developer prompt: Adds customer-specific context and examples
User prompt: The request entered directly by the end user (e.g., "Build me a site with this button")
This structure enables:
- Flexibility and scalability without having to write a new prompt from scratch for every engagement — more like a consulting firm that reuses a proven framework

4. Metaprompting and Prompt Folding: Prompts That Evolve Themselves! 🧠

Prompt Folding:
- A technique where a prompt generates an improved version of itself
- Example: a classifier prompt uses past queries to produce a more specialized prompt
  
  "You take your existing prompt, add examples of failures or cases where it didn't behave as intended, and ask the LLM to 'make this prompt better' — and it produces an improved one on its own."
The power of metaprompting:

"Metaprompting is becoming a really, really powerful tool that everyone is using."
For complex tasks, examples are the best approach:
- For hard problems like automated code bug detection, adding expert-level examples to the prompt dramatically improves LLM performance.
  
  "When it's hard to specify the exact rules, just giving examples is far more effective. It's a lot like test-driven development."

5. LLM Limitations and Providing an "Escape Hatch" 🛑

LLMs tend to produce "plausible-sounding answers" even when they lack sufficient information — this is the hallucination problem.
The fix:

"If you don't have enough information, don't just generate an answer — stop and ask me."
Internally at YC, they add a "complaints" field to the response format, letting the LLM directly report "I don't have enough information here."

"When you run it on real data, the LLM tells you 'I'm confused about this part,' and the developer can see that and improve the prompt. It creates a real to-do list."

6. Practical Metaprompting Techniques and Loop Structures 🔄

An easy way for beginners to start:

"Give the prompt a role and say, 'You are an expert prompt engineer. Critique and improve my prompt.' It will keep producing better prompts for you."
A common pattern is to run the metaprompt through a large model (e.g., Claude, GPT-4, Gemini Pro) to generate an optimized prompt, then apply that prompt to a smaller, lighter model.
As prompts grow longer:
- Keep notes on improvements and pain points in a Google Doc, then tell Gemini Pro (or similar) to "revise the prompt to reflect these notes" — it handles them well.
Leveraging Gemini Pro 2.5's Thinking Trace feature:

"You can watch the LLM's reasoning process in real time, see where it goes wrong, and learn how to steer it."

7. The Importance of Evals: The Real Crown Jewels Aren't the Prompts — They're the Evaluation Data 💎

Why Parahelp was able to share their prompt publicly:

"Because the prompt itself isn't the core asset — the evaluation data (evals) is. Without the eval data, you can't know why the prompt was written that way, and you can't improve it."
True competitive advantage comes from deeply understanding users' real work context and encoding that understanding into evaluation data.

"The real value is sitting next to a tractor sales rep, observing what they actually care about and what decisions they make in different situations, and turning that into eval data."
The core competency required of founders:

"The founder's core skill is understanding their users in a specific domain better than anyone else, and obsessing over the details of their workflow."

8. The Rise of the "Forward Deployed Engineer" Model 🧑‍💻

A concept pioneered by Palantir:
- Engineers go sit directly alongside the customer (e.g., an FBI agent), observe real work in action, and build software to solve it on the spot.
  
  "Other companies sent salespeople; Palantir sent engineers. And they'd say, 'Okay, we built it' — right there in front of you."
Advantages of this model:
- Fast feedback, real problem-solving, building genuine customer trust
- "When you show them something truly powerful, customers say, 'Wow, I've never seen anything like this. I want to buy it right now.'"
An essential competency for AI-era founders:

"Founders need to be their own company's forward deployed engineers. You can't outsource this. You have to meet customers yourself and keep improving the product."

9. Success Stories and Differentiation Strategies for AI Agent Startups 🏅

Giger ML, Happy Robot, and others have had engineers embedded directly at customer sites, rapidly iterating customer-tailored AI agents to close major contracts.

"Two outstanding software engineers went directly on-site, worked side by side with the customer support team, and kept tuning the software. The result was a massive enterprise deal."
Technical differentiation:
- For example, in voice AI agents, innovating a RAG pipeline that nails both accuracy and latency (low response time) creates a clear competitive edge.
  
  "We're now in an era where technical superiority at the demo stage is enough to beat established players."

10. The "Personality" of LLMs and the Nuances of Prompt Design 🤖

Model-specific characteristics:
- Claude: More human-like and flexible to calibrate
- Llama 4: More developer-oriented, requires finer prompt tuning
Using rubrics:

"If you want the LLM to produce a score, you must give it a rubric. But different models interpret rubrics differently."
- GPT-3.5/4: Follows the rubric very strictly
- Gemini 2.5 Pro: More flexible, considers exceptions
  
  "o3 checks the rubric like a soldier following a checklist; Gemini Pro 2.5 is more proactive and thinks, 'this might be an exception.'"

11. The Future of Prompt Engineering: Where Coding Meets People Management 🌏

Prompt engineering sits between coding and managing people:

"It feels like learning to code in 1995, but also like learning how to manage people."
Continuous improvement (Kaizen) and metaprompting:

"The people who really improve a process are the ones actually doing the work. That's exactly the principle behind metaprompting."
A memorable closing line:

"I'm really excited to see what prompts you all come up with!"

Key Keyword Summary

Metaprompting
Prompt Folding
System / Developer / User prompt architecture
Example-based prompting
LLM Escape Hatch
Importance of Evals (evaluation data)
Forward Deployed Engineer
Model personality differences
Rubric usage
Continuous improvement (Kaizen) and real-world feedback

This video offers a vivid, concrete look at the practical know-how behind state-of-the-art AI agent prompt engineering and the future of automation and continuous improvement through metaprompting. "Prompting is both coding and communicating with people." That single line captures it perfectly. Try it yourself, and build your own prompts! 🚀

1. The Current State of Prompt Engineering and the Rise of Metaprompting 🚀

Metaprompting is emerging as a powerful tool that every AI startup is paying attention to right now.
The video opens with this observation:

"Metaprompting is becoming a really, really powerful tool that everyone is using. It feels like learning to code in 1995. The tools aren't fully there yet, but you have the sense that you're on a new frontier."
The video emphasizes that prompt engineering isn't simply about issuing commands — it's a process of thinking, much like managing people: "How do I give the AI the right information so it can make good decisions?"

2. How Top AI Startups Actually Write Prompts 🏆

2-1. Real-World Case: Parahelp's Customer Support AI Prompt

Parahelp is a company that handles customer support AI for well-known AI companies including Perplexity, Replit, and Bolt.
They shared their full production prompt — which is highly unusual.

"Prompts for vertical AI agents like this are core company assets and rarely made public. We're really grateful that the Parahelp team opened it up like open source."
Key characteristics of their prompt structure:
- Very long and detailed (six pages)
- Role definition:
  
  "You are a manager of customer service agents."
- Clear task specification:
  
  "Your role is to approve or reject tool calls."
- Step-by-step plan broken into stages:
  - Step 1, Step 2, Step 3... laid out concretely
- Important cautions:
  
  "Be careful not to call other types of tools arbitrarily."
- Explicit output format:
  
  "You must output the approval/rejection result in this format."
- Markdown-style organization:
  - Headings, sub-items, examples — all systematically structured
- Examples provided:
  
  "Using XML tag formats like this helps the LLM follow instructions much more reliably."
How the prompt evolves:
- Because each customer has different requirements, part of the prompt is shared (system prompt) while part is customer-specific (developer prompt).
  
  "Every customer has different workflows and preferences, so how to fork and merge prompts, and which parts are common vs. customer-specific, is a topic that's only just starting to be explored."

3. Prompt Architecture: System / Developer / User Prompts 🏗️

System prompt: Defines the company's core API and operating principles (customer-agnostic)
Developer prompt: Adds customer-specific context and examples
User prompt: The request entered directly by the end user (e.g., "Build me a site with this button")
This structure enables:
- Flexibility and scalability without having to write a new prompt from scratch for every engagement — more like a consulting firm that reuses a proven framework

4. Metaprompting and Prompt Folding: Prompts That Evolve Themselves! 🧠

Prompt Folding:
- A technique where a prompt generates an improved version of itself
- Example: a classifier prompt uses past queries to produce a more specialized prompt
  
  "You take your existing prompt, add examples of failures or cases where it didn't behave as intended, and ask the LLM to 'make this prompt better' — and it produces an improved one on its own."
The power of metaprompting:

"Metaprompting is becoming a really, really powerful tool that everyone is using."
For complex tasks, examples are the best approach:
- For hard problems like automated code bug detection, adding expert-level examples to the prompt dramatically improves LLM performance.
  
  "When it's hard to specify the exact rules, just giving examples is far more effective. It's a lot like test-driven development."

5. LLM Limitations and Providing an "Escape Hatch" 🛑

LLMs tend to produce "plausible-sounding answers" even when they lack sufficient information — this is the hallucination problem.
The fix:

"If you don't have enough information, don't just generate an answer — stop and ask me."
Internally at YC, they add a "complaints" field to the response format, letting the LLM directly report "I don't have enough information here."

"When you run it on real data, the LLM tells you 'I'm confused about this part,' and the developer can see that and improve the prompt. It creates a real to-do list."

6. Practical Metaprompting Techniques and Loop Structures 🔄

An easy way for beginners to start:

"Give the prompt a role and say, 'You are an expert prompt engineer. Critique and improve my prompt.' It will keep producing better prompts for you."
A common pattern is to run the metaprompt through a large model (e.g., Claude, GPT-4, Gemini Pro) to generate an optimized prompt, then apply that prompt to a smaller, lighter model.
As prompts grow longer:
- Keep notes on improvements and pain points in a Google Doc, then tell Gemini Pro (or similar) to "revise the prompt to reflect these notes" — it handles them well.
Leveraging Gemini Pro 2.5's Thinking Trace feature:

"You can watch the LLM's reasoning process in real time, see where it goes wrong, and learn how to steer it."

7. The Importance of Evals: The Real Crown Jewels Aren't the Prompts — They're the Evaluation Data 💎

Why Parahelp was able to share their prompt publicly:

"Because the prompt itself isn't the core asset — the evaluation data (evals) is. Without the eval data, you can't know why the prompt was written that way, and you can't improve it."
True competitive advantage comes from deeply understanding users' real work context and encoding that understanding into evaluation data.

"The real value is sitting next to a tractor sales rep, observing what they actually care about and what decisions they make in different situations, and turning that into eval data."
The core competency required of founders:

"The founder's core skill is understanding their users in a specific domain better than anyone else, and obsessing over the details of their workflow."

8. The Rise of the "Forward Deployed Engineer" Model 🧑‍💻

A concept pioneered by Palantir:
- Engineers go sit directly alongside the customer (e.g., an FBI agent), observe real work in action, and build software to solve it on the spot.
  
  "Other companies sent salespeople; Palantir sent engineers. And they'd say, 'Okay, we built it' — right there in front of you."
Advantages of this model:
- Fast feedback, real problem-solving, building genuine customer trust
- "When you show them something truly powerful, customers say, 'Wow, I've never seen anything like this. I want to buy it right now.'"
An essential competency for AI-era founders:

"Founders need to be their own company's forward deployed engineers. You can't outsource this. You have to meet customers yourself and keep improving the product."

9. Success Stories and Differentiation Strategies for AI Agent Startups 🏅

Giger ML, Happy Robot, and others have had engineers embedded directly at customer sites, rapidly iterating customer-tailored AI agents to close major contracts.

"Two outstanding software engineers went directly on-site, worked side by side with the customer support team, and kept tuning the software. The result was a massive enterprise deal."
Technical differentiation:
- For example, in voice AI agents, innovating a RAG pipeline that nails both accuracy and latency (low response time) creates a clear competitive edge.
  
  "We're now in an era where technical superiority at the demo stage is enough to beat established players."

10. The "Personality" of LLMs and the Nuances of Prompt Design 🤖

Model-specific characteristics:
- Claude: More human-like and flexible to calibrate
- Llama 4: More developer-oriented, requires finer prompt tuning
Using rubrics:

"If you want the LLM to produce a score, you must give it a rubric. But different models interpret rubrics differently."
- GPT-3.5/4: Follows the rubric very strictly
- Gemini 2.5 Pro: More flexible, considers exceptions
  
  "o3 checks the rubric like a soldier following a checklist; Gemini Pro 2.5 is more proactive and thinks, 'this might be an exception.'"

11. The Future of Prompt Engineering: Where Coding Meets People Management 🌏

Prompt engineering sits between coding and managing people:

"It feels like learning to code in 1995, but also like learning how to manage people."
Continuous improvement (Kaizen) and metaprompting:

"The people who really improve a process are the ones actually doing the work. That's exactly the principle behind metaprompting."
A memorable closing line:

"I'm really excited to see what prompts you all come up with!"

Key Keyword Summary

Metaprompting
Prompt Folding
System / Developer / User prompt architecture
Example-based prompting
LLM Escape Hatch
Importance of Evals (evaluation data)
Forward Deployed Engineer
Model personality differences
Rubric usage
Continuous improvement (Kaizen) and real-world feedback

1. The Current State of Prompt Engineering and the Rise of Metaprompting 🚀

2. How Top AI Startups Actually Write Prompts 🏆

2-1. Real-World Case: Parahelp's Customer Support AI Prompt

3. Prompt Architecture: System / Developer / User Prompts 🏗️

4. Metaprompting and Prompt Folding: Prompts That Evolve Themselves! 🧠

5. LLM Limitations and Providing an "Escape Hatch" 🛑

6. Practical Metaprompting Techniques and Loop Structures 🔄

7. The Importance of Evals: The Real Crown Jewels Aren't the Prompts — They're the Evaluation Data 💎

8. The Rise of the "Forward Deployed Engineer" Model 🧑‍💻

9. Success Stories and Differentiation Strategies for AI Agent Startups 🏅

10. The "Personality" of LLMs and the Nuances of Prompt Design 🤖

11. The Future of Prompt Engineering: Where Coding Meets People Management 🌏

Key Keyword Summary

Related writing

Why Companies Are Done Renting Their AI

Block's AI Champion Strategy: Autonomizing a…

Understanding Society Through Simulation: Simile's Joon Sung Park

Reading

1. The Current State of Prompt Engineering and the Rise of Metaprompting 🚀

2. How Top AI Startups Actually Write Prompts 🏆

2-1. Real-World Case: Parahelp's Customer Support AI Prompt

3. Prompt Architecture: System / Developer / User Prompts 🏗️

4. Metaprompting and Prompt Folding: Prompts That Evolve Themselves! 🧠

5. LLM Limitations and Providing an "Escape Hatch" 🛑

6. Practical Metaprompting Techniques and Loop Structures 🔄

7. The Importance of Evals: The Real Crown Jewels Aren't the Prompts — They're the Evaluation Data 💎

8. The Rise of the "Forward Deployed Engineer" Model 🧑‍💻

9. Success Stories and Differentiation Strategies for AI Agent Startups 🏅

10. The "Personality" of LLMs and the Nuances of Prompt Design 🤖

11. The Future of Prompt Engineering: Where Coding Meets People Management 🌏

Key Keyword Summary

Related writing

Why Companies Are Done Renting Their AI

Block's AI Champion Strategy: Autonomizing a…

Understanding Society Through Simulation: Simile's Joon Sung Park