This webinar starts by examining the structural reasons why the intuition that "using more AI raises productivity" so often fails to translate into organizational outcomes. Drawing on Flex team's own experiments, failures, and learnings around the traps of measurement, sequencing, and organizational adoption, it lays out a step-by-step AX design strategy that begins with the last mile and the bottleneck — covering SSOT, evaluation environments, validation, and access/security controls. The conclusion, put simply, is that results come from changing bottlenecks, validation, decision-making, and collaboration structures — not from increasing output volume.
1. Opening: "Rather than a product demo, let's talk about organizational AX strategy"
After a brief BGM and reaction segment, Tae-eun Kim, CCPO of Flex team, opens with a greeting. She explains that because attendees came from a wide range of industries and organizations, she prepared a more general talk rather than a comfortable product walkthrough — and notes that a recording or slide deck will not be available for this webinar.
"It would have been much easier if this were just a product intro or a look at how our team works… I'm a little nervous given how many different organizations are here, but we've prepared as best we can."
She immediately nails down the session's core message in one sentence: organizational AX that connects to results cannot be solved simply by raising individual productivity.
"Organizational AX that connects to organizational outcomes… cannot be resolved by AI adoption and application that only raises individual productivity."
She also sets the direction: today's talk is not about how individuals can "use AI better" (usage tips), but is centered on how organizations should think about AI adoption. She previews that Flex itself went through similar trial and error, and will share what they experimented with, failed at, and learned along the way.
2. "AI Has Clearly Gotten Faster — So Why Are the Metrics the Same?" — The Structure of the Performance Illusion
The presenter says that many organizations, even now in 2026, keep increasing AI budgets, rolling out tools, and seeing employees use them — yet key performance metrics don't change much. Introducing various tools like Gemini, Claude, and Copilot does lead to shorter individual task times and higher output volume, but making that translate into achieving organizational goals is "a different story entirely."
"Output has definitely increased through AI. Reduction in individual working hours is genuinely happening. But making that lead to organizational goal achievement and results… feels like a different story."
She pinpoints a common expected scenario: "faster development → more releases → eventually revenue." But over time, iteration cycles and release improvements don't show up as expected, and the organization ends up asking:
"Everyone else is getting results from AI… are we the only ones struggling?"
Here the presenter identifies the difference between individuals and organizations as the root cause. If a small team or individual contributor works in an environment where their work directly produces results, AI's effect is clear. But the more complex the journey to the last mile of customer delivery, and the more collaboration relationships involved, the harder it is for individual output gains alone to produce results.
"The more complex the path to the last mile, and the more varied the collaboration relationships… the harder it is for increased AI-driven output alone to lead to results."
She adds an important warning: when only one point in the pipeline speeds up, the next point can't keep up, creating new bottlenecks.
"If a PM uses AI to produce many specs and PRDs but development output doesn't increase… it simply becomes a bottleneck."
As another fundamental point, she says organizational productivity is the "final result" of countless decisions. If specs change, handoffs are delayed, or intent is misunderstood, even work done quickly can become meaningless or fail to produce results. So the real problem AI needs to solve is often not code or document output volume, but communication, collaboration, and decision-making systems.
"The organizational decision-making system or collaboration process may be a bigger bottleneck than productivity itself." "The points AI needs to solve may involve far more problems in communication and collaboration."
3. The 3 Traps Flex Ran Into Directly: Measurement, Sequencing, Organizational Adoption
The presenter distills Flex's experience into three traps: (1) the measurement trap, (2) the sequencing trap, and (3) the organizational adoption trap.
3-1. (Trap 1) Measurement: "Confusing Usage and Output Volume with Results" 📈
The most natural metrics to look at after AI adoption are usage volume (tokens), frequency, number of responses, and output count. But she emphasizes that these metrics only signal "we used it a lot" — they are different from business outcomes.
"It's easy to mistake high usage volume or high output volume for results." "But usage and output volume themselves are not results."
She says you need to connect the metrics to organizational goals — contribution to goal achievement, cost reduction, lead time, quality, and so on.
"You need to confirm whether it's connected to metrics tied to organizational goals — like how much it contributed to goal achievement."
Dev Team Experiment: "Running Parallel Agents May Not Reduce Total Time Much"
A Flex developer tracked time on their calendar while running agents, and found that even running multiple agents in parallel, the actual total working time was sometimes not much different from running them sequentially. She adds that the perception that model speeds have slowed down recently is also something you "miss if you don't measure."
"When we ran agents in parallel… and actually calculated total working time, there were cases where it wasn't much different from running them sequentially." "There are cases where we don't perceive this well if we're not actually measuring."
Parallel work also increases context-switching burden, leading to highly variable efficiency depending on the person.
"Doing multiple things in parallel causes a lot of context switching… and the results vary enormously."
Critically, even if code output increases, if code review and verification processes stay the same, that segment becomes a bottleneck.
"AI code output was good. But if a code review process still exists… failing to solve the verification problem creates a bottleneck."
Cost Illusion: "Cheaper Than Hiring Engineers?" → May Not Deliver as Expected
She shares a case where many AI coding tools were adopted but actual output was lower than expected. Looking at PR volume, it came to only about 15%, and there was also a lot of "discarded tokens and work."
"The expectation was to get maybe one to two times faster… but looking at PR count alone, we were at 15%." "There was more discarded work than expected."
Knowledge Bases and Agents Also Fail Without Measuring "What Was Saved"
Flex tried building internal knowledge bases and agents to reduce communication costs, but simply tracking usage counts didn't connect to which roles saved which time and costs. She also emphasizes that errors in responses are hard to catch, and using such tools for high-risk work involving customers or contracts is dangerous.
"We weren't measuring which organizations and roles were having their time and costs saved… and we realized it wasn't being connected." "Cases arose where people took responses at face value and used them directly… errors in responses used in customer-facing situations carry significant risk."
She concludes that SSOT (Single Source of Truth) is essential — someone must be accountable for keeping knowledge current — and that the problem must be solved holistically, including access and permissions management.
"We need an answer to who can guarantee the single source of truth for organizational knowledge." "The problem of managing usage permissions and data access must be solved."
3-2. (Trap 2) Sequencing: "Starting with the Easy Parts Only Speeds Up What Comes Before the Bottleneck" 🧩
Early in AI adoption, organizations typically start "where it's easy and applicable" to build literacy — but the presenter says this can be a major trap. Attaching AI without resolving the bottleneck means only the stages before the bottleneck get faster, and nothing translates into results.
"If you attach AI without resolving the bottleneck, only the stages before the bottleneck speed up — it doesn't translate into results."
She emphasizes that where you start (adoption sequence) has an outsized impact on outcomes, and says she'll revisit this in detail in the strategy section.
AI Literacy Means Understanding, Evaluation, and Accountability — Not Just Usage Skills
She acknowledges the trend of buying many tools to raise literacy, but clarifies that AI literacy isn't just proficiency — it encompasses understanding, evaluation, application, and ethics/accountability. She notes that in organizations, the "evaluation" capability is often the weakest, and that this may have been a root cause of AX failures.
"AI literacy doesn't simply mean familiarity with usage." "The evaluation piece… seems to be something many organizations are not well equipped with… and may have been a problem in AI adoption."
3-3. (Trap 3) Organizational Adoption: "The Moment You Move from Individual Use to Shared Use, Security and Permissions Explode" 🔒
AI used individually is fast and convenient, but the moment an organization tries to use it collectively, completely different problems emerge. Using open-style tools (e.g., open cloud or agentic tools) as an example, she explains that while attractive to individuals, they can create critical problems in organizations — data access, external communications, and exposure to untrusted content.
"Using AI alone has no worries and no security risk… but the moment you try to use it together in an organization, you run into completely different problems."
She also references a real case where granting messenger access led to an AI spiraling out of control and sending more than 500 spam messages. She notes strongly that even when major companies have banned certain tools, research shows 20% of employees still use them without approval.
"Twenty percent of employees are reportedly still using them without approval. I think that's really dangerous."
Many organizations try to aggregate their knowledge and attach agents to it, but the biggest failure cause turns out to be data access control (authorization). New hires and executives need access to different information, and when personnel changes and org restructuring happen frequently, permissions need to update in real time — and failure to solve this creates serious risk.
"Aggregating data in one place doesn't solve it on its own." "Every time there's a role change, org restructuring, or title change, access permissions need to update in real time."
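The pattern described above, where a role change must take effect immediately rather than triggering a manual permission sync, can be sketched as follows. This is a minimal illustration under assumptions of my own: the role names, scopes, and the `AccessDirectory` class are all hypothetical, not Flex AI's actual model.

```python
# Minimal sketch: document access derived from the member's *current* HR role,
# so an appointment or org change takes effect immediately at query time.
# Roles, scopes, and class names here are hypothetical.

ROLE_SCOPES = {
    "new_hire": {"public"},
    "manager": {"public", "team"},
    "executive": {"public", "team", "strategic"},
}

class AccessDirectory:
    def __init__(self):
        self.roles = {}  # member -> current role (the single source for access)

    def assign(self, member, role):
        # An HR event updates the role; no per-document permission sync is
        # needed, because access is resolved from the role on every query.
        self.roles[member] = role

    def can_read(self, member, doc_scope):
        return doc_scope in ROLE_SCOPES.get(self.roles.get(member), set())

acl = AccessDirectory()
acl.assign("alice", "manager")
assert acl.can_read("alice", "team")
assert not acl.can_read("alice", "strategic")
acl.assign("alice", "executive")  # promotion: effective immediately
assert acl.can_read("alice", "strategic")
```

The design choice the sketch illustrates is resolving permissions at query time from one authoritative role record, rather than copying permissions onto each document, which is exactly what goes stale under frequent personnel changes.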
She briefly previews Flex AI as a product that can handle permission decisions at the moment of personnel changes.
4. The Core Engine of Organizational AX: Validation and Harness Engineering
The presenter emphasizes that among all the issues raised, AI output validation is the most fundamental. How well an organization solves validation is decisive in reaching the AX outcomes it wants.
"The domain of validating AI outputs… I think this is the core issue."
Flex's development team says the biggest reason more code didn't translate into results was the unchanged code review process. As AI generates more code, review burden explodes, so "keeping the old process as-is" was meaningless.
"AI-written code also needs validation, but the volume increased more… doing it the same way as before becomes meaningless."
So they stopped relying on human-centered review and started building an environment for validating AI outputs. Examples include test code, having AI write E2E tests that validate actual APIs and databases, and even screen recordings to visualize results. Only after building this scaffolding did things start translating into results.
"We started investing time in building an environment to validate AI outputs." "We went as far as having AI write E2E tests directly."
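The E2E-style validation described above can be sketched in miniature. This is not Flex's actual test suite: the schema and handler are illustrative, and an in-memory SQLite database stands in for the real API and database the talk mentions.

```python
import sqlite3

# Minimal sketch: an end-to-end check that exercises a real storage layer,
# instead of a human reviewing generated code by eye. Schema and handler
# are hypothetical stand-ins for the actual API and database.

def create_user(db, name):
    db.execute("INSERT INTO users (name) VALUES (?)", (name,))
    db.commit()
    return db.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()[0]

def test_create_user_end_to_end():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    user_id = create_user(db, "alice")
    # Verify the observable effect in storage, not the code path the
    # agent *claims* it took.
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    assert row == ("alice",)

test_create_user_end_to_end()
```

The point is the one the presenter makes: once AI generates more code than humans can review, validation has to be an executable environment, not a reading exercise.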
She also pushes back on the misconception that "conventions like avoiding duplicated code aren't needed in the AI era." Because AI still relies on the current codebase's search and patterns, codebase patterns and conventions actually matter more than ever. AI might say it "fixed everything" but only update 9 out of 10 instances.
"AI said it found everything, but actually only 9 out of 10 were fixed… situations like this happened." "Codebase patterns and conventions are important in the AI era… or rather carry even higher value."
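The "9 out of 10" failure mode above suggests a simple discipline: verify migration claims mechanically rather than trusting the agent's self-report. A toy sketch, with a hypothetical deprecated API name and an in-memory stand-in for a codebase:

```python
import re

# Minimal sketch: after an agent claims "all call sites migrated", check
# the codebase mechanically. The pattern and file contents are hypothetical.

def remaining_usages(files: dict, deprecated: str) -> list:
    """Return the names of files that still contain the deprecated pattern."""
    pat = re.compile(re.escape(deprecated))
    return [name for name, src in files.items() if pat.search(src)]

codebase = {
    "a.py": "new_api()",
    "b.py": "new_api()",
    "c.py": "old_api()  # the instance the agent missed",
}
leftovers = remaining_usages(codebase, "old_api")
assert leftovers == ["c.py"]  # the "all fixed" claim fails this check
```

A consistent codebase convention is what makes such a check possible at all: if the old pattern appears in ten different shapes, no single search can confirm the migration is complete.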
This leads into the concept of Harness Engineering — described as "the meta-engineering of putting a harness (constraints, feedback, validation, and cleanup) on a powerful agent so it runs reliably in the desired direction."
"Put a harness and saddle on the powerful horse that is the agent, and let it run safely in the direction we want."
The five components mentioned are context, architectural constraints, feedback loops, self-verification, and garbage collection. She says the engineer's role is shifting from direct coding to becoming an environment designer, intent specifier, and feedback architecture builder.
"Engineers seem to be transforming into environment designers, people who articulate intent to AI… people who build feedback architectures."
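The five harness components above can be sketched as one control loop. All names here (`run_harnessed`, the constraint and verify callables, the toy agent) are illustrative assumptions, not a real framework API:

```python
# Minimal sketch of "harness engineering": an agent's proposal passes through
# constraints, verification, and cleanup before it is accepted.
# Every name below is hypothetical.

def run_harnessed(agent, task, context, max_rounds=3):
    feedback = []
    for _ in range(max_rounds):
        # 1) context: the agent sees curated context plus prior feedback
        output = agent(task, context, feedback)
        # 2) architectural constraints: reject outputs that break conventions
        violations = [c for c in context["constraints"] if not c(output)]
        # 3) self-verification: run the checks the harness owns, not the agent
        passed = context["verify"](output)
        if not violations and passed:
            feedback.clear()  # 5) garbage collection: drop intermediate state
            return output
        feedback.append((violations, passed))  # 4) feedback loop: try again
    return None  # escalate to a human after max_rounds

# Toy usage: the agent must produce an even number >= 10; it fails once,
# then corrects itself on the feedback round.
agent = lambda task, ctx, fb: 7 if not fb else 12
ctx = {"constraints": [lambda o: o >= 10], "verify": lambda o: o % 2 == 0}
assert run_harnessed(agent, "pick a number", ctx) == 12
```

This is the "environment designer" role in miniature: the engineer's leverage is in writing the constraints and the `verify` step, not the output itself.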
She also pushes back on claims that "developers aren't needed" or "SaaS is dead" — engineers who build, manage, and improve validation environments are still needed. She adds the anecdote that Anthropic, which built Claude Code and implied developers might not be needed, actually increased its engineering job postings.
"Saying AI codes isn't necessarily false. But there are times when more engineers are needed to build, manage, and validate AI."
5. Four AX Design Strategies That Lead to Results: Last Mile → SSOT/Evaluation → Role Redefinition → Phased Expansion
The presentation now turns to "So how should you design for results?" and organizes the answer into four strategies. The keywords threading through all of them are last mile, bottleneck removal, SSOT, evaluation/validation, and access/security.
5-1. Strategy 1) "Start from the Last Mile" + Revisit the Bottleneck 🎯
The presenter recommends starting AX at the last mile — the point where customer value is actually realized. That way, AI's effect connects directly to measurable outcome metrics, and the "measurement trap" is easier to avoid.
"AI adoption that leads to results must start at the point most directly connected to goal achievement. That is, it should start from the last mile." "The closer you are to the point connected to results, the simpler and clearer the metrics become."
Starting from the last mile also makes it less likely to create new bottlenecks. She gives the example that in software development, the last mile is not the frontend but the QA stage. Other last-mile examples include CS (customer-facing), operations, consulting/sales, and internal helpdesks.
"Examples of the last mile… the QA stage, the CS team at the customer touchpoint, operations, sales organizations, helpdesks…"
She adds an important caveat: if you apply AI without first cleaning up the process that leads to the last mile, you risk creating yet another bottleneck — so business processes and bottlenecks must be redesigned together.
"Applying AI without resolving the business process… is highly likely to create another bottleneck."
Flex Case: Creating "Last-Mile Impact" in CS, Partners, Internal Teams, and QA
Flex introduced Intercom to improve efficiency for their CS team, which handles upselling and issue response at the customer touchpoint, and shares these outcomes:
- Average 70% problem resolution
- Cost savings of 220 million KRW
- AI customer support satisfaction rate of 83%
- 60% reduction in CX headcount compared to 2022
"We introduced Intercom… resolved an average of 70% of issues… cost savings of 220 million KRW… satisfaction rate of 83%… 60% reduction in CX headcount since 2022."
Rather than starting by embedding AI into the product first, she says they began with HR Partners and Payroll Partners — service-delivery organizations directly connected to revenue — because partner capacity was limited and customers were "waiting in line," making the impact large.
"Flex didn't start by thinking about embedding AI into the product… we started with HR Partners and Payroll Partners."
They also invested in a context-sustaining agent that integrates company information for sales and CS, carrying conversation, meeting, and consulting history from initial inquiry onward.
"From initial inquiry… an agent that continuously carries the context of conversations, meetings, and consulting history was made available."
Internally, they created an "Enhance Team" to solve organizational bottlenecks with AI, offering agents for contract review and company information aggregation.
On QA, they went even further. They redefined the QA organization as a QP (Quality Platform) Team and invested in reducing quality-verification bottlenecks under AI influence. Examples include test set generation, a tool that extracts specs from code and answers questions in Slack, a Chrome extension that records error screens and creates tickets, and an AI performance testing tool. Notably, a quality manager who does not come from an engineering background is achieving all of this with AI.
"We made a Chrome extension ourselves that records the screen when an error occurs and creates a ticket." "A quality manager who doesn't come from an engineering background is achieving all of this with AI…"
5-2. Strategy 2) Without "SSOT (Organizational Truth) + Agent Evaluation Environment," There Is No Organizational Brain 🧠
The second strategy is that organizations need a proper SSOT (Single Source of Truth), and to run it correctly, an evaluation environment for agent/LLM responses is absolutely necessary.
The presenter distinguishes SSOT from "just a data store where data is dumped in one place." SSOT is the organization's commitment: it must be centered on up-to-date, meaningful data with unnecessary content removed, and must deliver "truth" aligned with the intent behind a question. For example, if the PM spec in Notion, the design spec in Figma, and the development code spec are all different, being able to answer "which one is the true spec?" is what makes something SSOT.
"SSOT is like an organizational commitment." "When the Notion spec, Figma spec, and code spec differ… you need to be able to answer 'which is the true spec?'"
She also notes that most organizational problems arise not from production itself, but from communication failures on the path to the last mile — information mismatches, bad decisions, the cost of finding the right person or searching for the right information.
"Most organizational problems are often not in production itself." "Wrong decisions due to information mismatch… the cost of finding the right person or searching for information…"
So organizations need to turn integrated data into SSOT and build agents that use it — but agents also can't improve without an evaluation framework.
"Without a measurement system for SSOT… it's hard to get to solving collaboration problems through agent improvement."
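One common shape for such an evaluation framework is a golden set: a fixed list of questions with acceptance criteria, scored on every agent change. The sketch below is my own illustration; the questions, the containment-based scoring rule, and the stub agent are all hypothetical:

```python
# Minimal sketch: golden-set evaluation for an internal knowledge agent.
# Questions, expected answers, and the scoring rule are hypothetical.

GOLDEN_SET = [
    {"q": "What is the expense limit?", "must_contain": "50,000"},
    {"q": "Who approves leave?", "must_contain": "team lead"},
]

def evaluate(agent, golden_set):
    """Return the fraction of golden questions the agent answers acceptably."""
    hits = sum(case["must_contain"] in agent(case["q"]) for case in golden_set)
    return hits / len(golden_set)

# A stub agent that knows one of the two answers
stub = lambda q: "The limit is 50,000 KRW" if "expense" in q else "HR decides"
score = evaluate(stub, GOLDEN_SET)
assert score == 0.5  # without this number, "the agent improved" is a guess
```

Real setups typically use richer judgments than substring matching, but even this crude score turns "the SSOT agent seems better" into a measurable claim.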
Flex's Evaluation and Operations System: Prompt Versioning + Model Performance/Cost Monitoring + Feedback Loop
Flex built the following operating system to address this:
- Prompt management system (version control, change history, centralized management)
- LLM performance and cost monitoring
- Continuous feedback loop (improvement cycle)
"Prompt version control, change history… centralized management system." "Monitoring LLM performance and cost… cost optimization." "Continuous feedback cycle… feedback loop."
Only with this system in place can AI investment step out of the black box — enabling model selection appropriate to each problem and cost optimization. She also notes that the optimal model differs by task: meeting STT, summarization models, labor law assistants, helpdesks, and so on.
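The first pillar of that operating system, centralized prompt versioning with change history, can be sketched as an append-only registry. The field names and storage model below are assumptions for illustration, not Flex's implementation:

```python
import hashlib

# Minimal sketch: centralized prompt management with version numbers and an
# append-only change history. All field names here are hypothetical.

class PromptRegistry:
    def __init__(self):
        self.history = {}  # prompt name -> list of versions (append-only)

    def publish(self, name, text, author):
        entry = {
            "version": len(self.history.get(name, [])) + 1,
            "sha": hashlib.sha256(text.encode()).hexdigest()[:8],
            "text": text,
            "author": author,
        }
        self.history.setdefault(name, []).append(entry)
        return entry["version"]

    def current(self, name):
        return self.history[name][-1]

reg = PromptRegistry()
reg.publish("labor-law-assistant", "You answer labor-law questions...", "pm-a")
v2 = reg.publish("labor-law-assistant", "You answer ... and cite sources.", "pm-b")
assert v2 == 2
assert reg.current("labor-law-assistant")["author"] == "pm-b"
```

With every version hashed and attributed, a regression in the evaluation scores can be traced to the exact prompt change that caused it, which is what lets the feedback loop close.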
5-3. Strategy 3) "Redefine Roles and Organizations" — AI Is Not a Tool; It's a Change in How You Work 🔄
The third strategy is moving beyond AI tool use to actually changing the way work is done. She says that once collaboration bottlenecks start being solved with AI, role boundaries naturally begin to blur — and this requires more explicit rules and conventions.
"It seems like we need to go beyond using AI tools and create a change in the way we work." "We come to realize that solving collaboration with AI requires far more commitments and rules."
As an example: for QA to validate quality with AI, you need spec document conventions, code that follows those conventions, AI-written test code, and E2E implementation — in short, "commitments." Frontend needs module usage rules and design system specifications. Designers and PMs need to spec things out in a form AI can read and execute. Once that environment is in place, she says it becomes natural for the boundaries between PM, PD, FE, and QA to dissolve.
"PMs now need spec documents for AI… when an environment is created where validation can be appropriately shared between AI and the responsible person… it's natural for those boundaries to dissolve."
Flex Internal Change Example 1: Design System Formalized "for AI to Use"
Flex is expanding its design system (FX) by formalizing it (e.g., MD-file-based specs) so AI can use it. She describes a target workflow where changing a spec document changes the Figma design, and the updated design is reflected in frontend code — noting that this foundation was used in Flex AI's custom page and dashboard features.
"We chose to formalize it so that AI can now use it." "The updated Figma design is reflected directly in FE code…"
Flex Internal Change Example 2: The Role of a PM on the AI Product Team Has Changed
Flex's AI product has become centered on "conversation" rather than "screens," and the PM on that team now sets up evaluation frameworks, manages prompts, and shares source code without waiting for developers. This shifts the development cycle from "persuade, then sprint" to hypothesize-implement-validate, and the time to validation shrinks from weeks to hours.
"PMs no longer wait for developers… they set up evaluation frameworks, manage prompts, and share source code alongside developers." "The cycle shifts to hypothesis-implement-validate… shrinking from week-level to hour-level."
She also highlights that there is no longer a need for the abstract medium of "a spec document" — intent can now be communicated through code. The moment where code written by a PM for agent design communicated design intent "more accurately than any spec document" is noted as striking.
"There was no longer a need to use the abstracted medium of a spec document… intent could be communicated directly through code." "The code the PM wrote conveyed the design intent more accurately than any spec document."
As a concrete example: over two weeks, a PM reviewed and completed 10 PRs, added 4,000+ lines of code, changed 75 files, and completed a knowledge migration that had previously taken 4 hours in 3 minutes — and she notes that this PM is not from an engineering background.
"Over two weeks, 10 PRs… over 4,000 lines of code added… 75 files changed." "Completed in 3 minutes what used to take 4 hours…"
Flex Internal Change Example 3: Mobile Engineering Team — "Individuals Using AI Well" vs. "Team Working AI-Native" Are Different Problems
The mobile team had dissolved boundaries through Kotlin Multiplatform and Mobile BFF, but also ran into the traps mentioned earlier. They initially mistook output metrics (file count, lines of code) for results, and realized in hindsight that they should have prioritized scenario specification and validation environments before code generation.
"We initially mistook output metrics for AI performance…" "We should have prioritized scenario specification and validation environments over code generation…"
And as a concluding point: "an individual using Claude well" and "a whole team working AI-native" are completely different problems. At the team level, skill-sharing structures, project-level management of sub-agent design, and workflow improvement are what matter.
"The individual using it well and the whole team working AI-native are completely different problems."
Here the presenter adds one more conclusion that echoes the talk's title:
"Engineers, too, ultimately need to work like PMs… focusing on What and Why rather than How, with code as merely a means."
5-4. Strategy 4) "Start with Public Knowledge, Then Expand Step by Step as You Solve Authorization and Security" 🪜
The fourth strategy is: trying to expand org-wide all at once will inevitably hit a wall of authorization and security problems, so start with public knowledge and expand in phases.
"If you try to expand AI adoption across the entire organization at once, you'll inevitably hit authorization and security problems." "Start with public knowledge… and once that's solved, design a phased expansion to the full organization."
Flex started by building agents around near-public knowledge — billing/payroll specs, labor law and case search, company policies, benefits, and IT security questions that "anyone can access."
"Build agents starting from information that anyone in the organization can access…"
As the next phase, they expanded into information that requires role-, title-, and level-based access, making the connection between HR data and the AI platform critical. They organized HR categories into 15 types, auto-classified documents to build the knowledge base, and applied a multi-agent system org-wide. A labor law assistant agent for internal partners was also launched into closed beta.
"Created a knowledge base across 15 categories… applied a multi-agent system and deployed it org-wide first…" "Labor law assistant agent… launched into closed beta…"
Information like meeting logs — "visible only to attendees" — was expanded with authorization controls and used internally for over six months before being productized and incorporated into Flex AI.
The final phase is org-wide agentic AI: member, organization, and goal data connected in real time so AI can understand the state of the organization and support decision-making. She says this requires all HR data — from appointments to compensation — to be grounded in an AI platform.
"Three data types — members, organization, and goals — connected in real time… a stage where AI grasps the organizational state and supports decision-making." "All HR data, from appointment information to compensation information, must be grounded in the AI platform."
6. "Things You Can Start Tomorrow" — Checklist
Near the end, the presenter offers a brief summary of actionable tips. The key themes are last mile, bottlenecks, measurement metrics, and evaluation environments.
- If you're in the early stages of AI adoption, start by identifying the last mile where AI should be applied.
- Before asking about output volume, find the collaboration problems and bottleneck points first.
- Set measurable performance metrics tied to the last mile before anything else.
- Check whether what you're currently measuring is usage volume (tokens/frequency) or outcome metrics.
- Before pursuing SSOT, confirm whether you have an environment to evaluate and measure agents.
"I recommend finding the last mile… first find existing collaboration problems and bottleneck points." "Confirm whether what you're measuring is usage volume or metrics that connect to outcomes…" "Without evaluating agents, SSOT cannot be sustained continuously…"
7. HR-Based AX and Flex AI Introduction (Sponsored) + Talent Profile Proposal
In the closing segment, the presenter introduces Flex AI — with the premise "if you don't want to do all of this alone" — as an AI platform that builds the organizational brain grounded in HR data.
"You don't have to do organizational AI adoption, measurement, and SSOT building alone."
She also proposes a definition of "a person who handles AI well in an organization," centered on: multi-context fluency, understanding the start and end of work (the last mile), finding and resolving bottlenecks, extending context across role boundaries, thriving in an AI-native environment, and the ability to scale personal knowledge into organizational assets.
"Someone who can understand the beginning and end of work as it connects all the way to the last mile, rather than just a single task." "Someone who can transform personal knowledge into organizational-scale assets…"
She goes on to emphasize that HR data is foundational to AX success. The questions an organization needs to answer — Are we getting closer to our goals? What are the organization's current status and goal completion rates? What is the state of our people? What changed since last week or last month? — all require an HR data foundation.
"Are we getting closer to our goals? What is the organization's current status and goal completion rate?… What changed since last week and last month… These are the problems organizations need to solve."
With Flex, she explains, you can create briefings based on attendance data and answer questions like "How's it going?" with goal completion rates, collaboration network insights, and 1-on-1 topic predictions. She closes by stating that data transformation (DX) in the HR domain must come before AX is possible, announces Flex's upcoming official Flex AI launch, and thanks the audience.
"I think data transformation in the HR domain is critically important before AI." "Flex AI's official launch should be coming soon. I hope to see you all again then…" "Thank you so much for staying with us to the end."
Closing
This webinar breaks down the simplistic equation of AI adoption = productivity improvement = results, and argues that creating organizational outcomes requires first resolving bottlenecks at the last mile, then designing the "organizational systems" — validation, evaluation, SSOT, and access/security — before anything else. Ultimately, the presenter repeatedly emphasizes that the real battleground of AX is not individual prompt skills, but designing for measurable impact at the point where results connect, and structurally reducing the cost of collaboration and decision-making.
