
Agent Architecture: Router + Skills + Memory
- Router: Decides which skill to invoke (the "boss")
- Skills: Execute actual tasks (API calls, LLM calls)
- Memory: Stores conversation state and history
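
The three components can be sketched as below. This is a minimal illustration, not a reference implementation: the skill names (`lookup_order`, `small_talk`), the `#`-prefixed order ID convention, and the keyword-based `route` function are all hypothetical stand-ins. In practice the router would more likely be an LLM function call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Memory:
    """Stores conversation state and history as (role, content) pairs."""
    history: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.history.append((role, content))


# --- Skills: execute the actual tasks (hypothetical examples) ---
def lookup_order(order_id: str) -> str:
    # Stand-in for an API call.
    return f"Order {order_id}: shipped"


def small_talk(text: str) -> str:
    # Stand-in for an LLM call.
    return "Happy to help!"


SKILLS: Dict[str, Callable[..., str]] = {
    "lookup_order": lookup_order,
    "small_talk": small_talk,
}


# --- Router: decides which skill to invoke, and with which parameters ---
def route(message: str) -> Tuple[str, dict]:
    # Toy rule (an assumption): a token starting with '#' is an order ID.
    for token in message.split():
        if token.startswith("#"):
            return "lookup_order", {"order_id": token[1:].strip("?.!,")}
    return "small_talk", {"text": message}


def run_agent(message: str, memory: Memory) -> str:
    memory.add("user", message)
    skill_name, params = route(message)
    result = SKILLS[skill_name](**params)
    memory.add(skill_name, result)  # record which skill produced the result
    return result
```

Keeping `route` separate from the skills means the router's decision (skill name plus parameters) can be inspected on its own, before any skill runs.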
Evaluation Points
- Router: Did it choose the right skill, with the right parameters?
- Skills: Were the results relevant and accurate? (Use LLM-based or code-based evaluation.)
- Convergence: Does the agent complete the same task in a consistent number of steps across different models?
- Voice agents: Additionally evaluate sentiment, transcription accuracy, tone consistency, and intent detection.
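
The router and convergence checks lend themselves to code-based evaluation. The sketch below assumes a router is any function returning a `(skill_name, params)` pair and that labelled test cases are available. The case format, metric names, and the choice of mean and standard deviation as a convergence summary are illustrative assumptions, not a prescribed method.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple

RouterFn = Callable[[str], Tuple[str, dict]]


def evaluate_router(router: RouterFn, cases: List[dict]) -> Dict[str, float]:
    """Compare the router's chosen skill and parameters against labelled cases."""
    if not cases:
        raise ValueError("need at least one labelled case")
    skill_hits = 0
    param_hits = 0
    for case in cases:
        skill, params = router(case["input"])
        if skill == case["expected_skill"]:
            skill_hits += 1
            # Parameters only count when the skill itself was right.
            if params == case["expected_params"]:
                param_hits += 1
    n = len(cases)
    return {"skill_accuracy": skill_hits / n, "param_accuracy": param_hits / n}


def convergence(step_counts: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Summarise steps-to-completion per model for repeated runs of one task."""
    return {
        model: {"mean_steps": mean(steps), "stdev_steps": pstdev(steps)}
        for model, steps in step_counts.items()
    }
```

A low `param_accuracy` alongside a high `skill_accuracy` points at parameter extraction rather than skill selection. A large `stdev_steps` for one model suggests the agent is not converging consistently on that model.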
Key Takeaway
"If you remember one thing: put evaluations at every stage of your agent. When problems occur, you need to quickly identify which layer failed."