This video is the first session in the PyWhy community's 'Causality in Practice' series, in which Microsoft's Emre Kiciman surveys current efforts to integrate large language models (LLMs) with causal inference, centered on the pywhy-llm library. In tutorial format, the video shows why domain knowledge is the core of causal inference and how difficult it is to acquire, how LLMs can help close that gap, and the practical protocols and features that pywhy-llm provides. It closes by discussing the library's early, incomplete state and future directions, and by inviting community input and participation.


1. Series Overview and Introduction

This session launched a new lecture series called 'Causality in Practice'. The series aims to cover in depth not just how to use the tools, but also the underlying principles of causality and the real-world problems encountered in practice.

"Community feedback included many requests to cover not just the tools we build but also the causal theory behind them. That's why we prepared this series, which covers the latest causal research, hands-on workshops, and presentations by external experts."

The featured speaker for this first lecture, Emre Kiciman, is a long-time collaborator who has been active in PyWhy and various causality projects, focusing on how large language models and causal inference can be integrated. This online lecture, held via Teams, was followed by a live Q&A session.


2. Where LLMs and Causal Inference Meet -- The Problem Statement

The main theme of the video is the most critical yet most challenging part of causal inference: acquiring domain knowledge.

"What actually makes causal inference difficult is 'domain knowledge.' In other words, where does the causal graph itself come from? That's the fundamental question."

He presents real-world examples of incorrect causal interpretations to illustrate common confounders and the inherent limits of observational data.

  • Example 1: The claim that nightlights cause myopia. In 1999, a study claiming that 'children who sleep under nightlights develop more myopia' was widely reported, but follow-up research revealed that parental myopia was a confounder influencing both nightlight use and children's myopia, invalidating the claimed causal relationship.

"The conclusion that a small light causes children's myopia was actually the result of overlooking a hidden variable -- family history."

  • Example 2: The claim that vitamins C and E reduce heart disease. The observed association reflected overlooked confounders such as socioeconomic status, and the effect disappeared in randomized controlled trials (RCTs).

"Many health claims failed to replicate in actual RCTs or even showed opposite results. This was because confounders were left out."

  • Example 3: The failure of early COVID-19 CT-scan AI. The models were actually responding only to incidental cues, such as whether patients were scanned lying down or sitting up, or whether they were children. Such flawed data structures and hidden shortcuts rendered the AI useless in the field.

All these cases show that when 'domain knowledge' is not properly reflected in causal inference, causality can be misjudged from observational data. Emre sees this as precisely where LLMs can help.

"LLMs have the great strength of being able to naturally leverage associations and background knowledge that we would struggle to find and verify one by one, based on the vast text and expert knowledge available on the internet."


3. pywhy-llm: Assisting Causal Analysis with LLM-Embedded Protocols

The talk then introduces the pywhy-llm library: what it is, and how it uses LLMs as 'assistants' at each stage of causal inference.

Core Design Philosophy and Supported Stages

  • Measurement/Modeling Stage

    • The LLM identifies causal relationships among variables relevant to the problem and suggests potential confounders
    • Supports work that previously required data scientists to hold repeated discussions with multiple domain experts
  • Identification Stage

    • Recommends appropriate identification strategies (backdoor, front-door, instrumental variables, etc.)
    • Assists where graph analysis is complex or expert intuition is needed
  • Verification Stage

    • Critiques and suggests modifications to LLM-generated causal graphs; automatically suggests negative controls and potential confounders
  • Effect Estimation Stage

    • Focuses on automation and assistance, such as integration with existing libraries (EconML, etc.) and code generation

"pywhy-llm is experimental and designed so that simple interfaces and complex interfaces can coexist. It has an open architecture that allows for various extensions and deeper implementations going forward."
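The four stages above could be mirrored in code roughly as follows. This is a hedged sketch, not the actual pywhy-llm API: the class and method names (`ModelingSuggester`, `suggest_confounders`, and so on) are illustrative assumptions, and the LLM itself is stubbed out with a plain callable.

```python
# Illustrative sketch of per-stage "suggester" protocols, mirroring the stages
# described above. All names here are hypothetical, not the real pywhy-llm API.
from abc import ABC, abstractmethod
from typing import Callable, List

LLM = Callable[[str], str]  # any text-in/text-out language model


class ModelingSuggester(ABC):
    """Modeling stage: propose variable relationships and confounders."""

    @abstractmethod
    def suggest_confounders(self, treatment: str, outcome: str) -> List[str]: ...


class IdentificationSuggester(ABC):
    """Identification stage: recommend a strategy (backdoor, front-door, IV)."""

    @abstractmethod
    def suggest_strategy(self, treatment: str, outcome: str) -> str: ...


class StubModelingSuggester(ModelingSuggester):
    """Minimal stub that delegates to an LLM and splits its comma-separated reply."""

    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    def suggest_confounders(self, treatment: str, outcome: str) -> List[str]:
        prompt = (
            f"List plausible confounders of the effect of {treatment} "
            f"on {outcome}, as a comma-separated list."
        )
        return [c.strip() for c in self.llm(prompt).split(",") if c.strip()]


# Usage with a canned "LLM" standing in for a real model:
fake_llm = lambda prompt: "beach visitors, season, water temperature"
suggester = StubModelingSuggester(fake_llm)
print(suggester.suggest_confounders("ice cream sales", "shark attacks"))
# → ['beach visitors', 'season', 'water temperature']
```

The key design point, consistent with the "simple and complex interfaces coexist" philosophy quoted above, is that each stage is a narrow interface, so deeper implementations can be swapped in behind the same method names.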


4. pywhy-llm Code/Notebook Hands-on: How Does It Work?

This part walks through how pywhy-llm actually works, alongside demo notebooks.

  • Protocol Structure

    • The 'Suggesters' folder contains protocols defined for each causal analysis stage (modeling, identification, verification, effect estimation)
    • For example, in the modeling stage:
      • Suggesting causal directions between variable pairs (pairwise relationship)
      • Suggesting an entire causal graph with multiple variables
      • Presenting potential confounders for specific treatment-outcome pairs
  • Example Runs

    1. Ice cream sales and temperature relationship

      • Input two variables ('ice cream sales', 'temperature')
      • LLM infers "temperature influences ice cream sales"

        "The final answer was B, meaning 'temperature causes ice cream sales.'"

    2. Ice cream sales and shark attacks

      • LLM determines "the two do not influence each other"

        "The conclusion was C, meaning neither is a cause of the other."

    3. Potential confounder suggestion example

      • Confounders for "ice cream sales -> shark attacks"?

        "Beach visitor count, season, water temperature, holidays, shark population, swimming conditions, tourist season..." and various other confounders are automatically suggested

    4. Causal analysis method suggestion

      • For example, when wanting to know 'the effect of smoking on newborn weight,' it identifies the "cigarette tax" variable as a potential instrumental variable
  • Flexibility of Prompt Engineering

    • Variable descriptions can be lengthened, and the LLM can be assigned a specific domain-expert role
    • To handle LLM errors (variable misinterpretation, ambiguity, etc.), the design also includes follow-up queries and user-confirmation loops

"For instance, when a data column was named 'sex,' the LLM misinterpreted it as 'sexual activity' rather than 'gender' -- interpretation errors like this actually happen. Structures for making such ambiguities explicit are also an important point in LLM utilization."
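The pairwise demos above (ice cream vs. temperature, ice cream vs. shark attacks) follow a multiple-choice pattern: the LLM reasons in free text and ends with a final answer letter, which the library then parses. The sketch below reproduces that pattern with a canned reply; the exact prompt wording and parsing are assumptions for illustration, not pywhy-llm's actual code.

```python
# Sketch of the pairwise causal-direction query shown in the demo: the LLM is
# asked a multiple-choice question and the final answer letter is parsed out.
import re
from typing import Callable


def suggest_pairwise_direction(llm: Callable[[str], str],
                               var_a: str, var_b: str) -> str:
    """Return 'A', 'B', or 'C' for the relationship between var_a and var_b."""
    prompt = (
        f"Which causal statement is most plausible?\n"
        f"A. {var_a} causes {var_b}\n"
        f"B. {var_b} causes {var_a}\n"
        f"C. Neither causes the other\n"
        f"Think step by step, then end with 'Final answer: <letter>'."
    )
    reply = llm(prompt)
    match = re.search(r"Final answer:\s*([ABC])", reply)
    if match is None:
        raise ValueError(f"Could not parse an answer from: {reply!r}")
    return match.group(1)


# A canned reply reproduces the demo's temperature / ice-cream example:
canned = lambda _: "Warm weather drives demand for ice cream. Final answer: B"
print(suggest_pairwise_direction(canned, "ice cream sales", "temperature"))
# → B
```

Parsing a constrained final token, rather than the free-form reasoning, is what makes the LLM's answer machine-usable; the explicit error on an unparseable reply is one place where the user-confirmation loops mentioned above would hook in.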


5. Community Discussion and Future Directions

The speaker emphasizes that pywhy-llm is at a highly experimental, early stage.

"There are errors, and many improvements are needed. Some advanced implementation versions still have bugs remaining, but the interface was designed to be as simple as possible so anyone can easily experiment and contribute."

Various community opinions were exchanged during the Q&A:

  • Cyclic causal relationships (feedback loops)
    • Full support for temporal data and cyclic relationships is urgently needed
  • Integration with various LLM frameworks including Python/OpenAI/LangChain
    • "Rather than limiting to a specific library, a structure that easily extends and connects different backends (e.g., LangChain, babyAGI, AutoGen, etc.) would be more useful"
  • Automatic graph parsing, automatic result report generation, various field-specific template applicability
    • Discussion of ways to smoothly support actual user workflows
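The backend-extensibility point raised in the Q&A can be sketched as follows: if suggesters depend only on a text-in/text-out callable, then OpenAI, LangChain, AutoGen, or a local model can each be wrapped behind the same seam. This is a design illustration under stated assumptions, including a hypothetical `client.chat` method, not pywhy-llm's actual integration code.

```python
# Sketch of a backend-agnostic design: suggesters depend only on a
# text-in/text-out callable, so any LLM client can be adapted to fit.
from typing import Protocol


class TextLLM(Protocol):
    """Structural type: anything that maps a prompt string to a reply string."""
    def __call__(self, prompt: str) -> str: ...


def make_chat_backend(client, model: str = "gpt-4") -> TextLLM:
    """Adapt a hypothetical client exposing client.chat(model, prompt) -> str."""
    def backend(prompt: str) -> str:
        return client.chat(model, prompt)
    return backend


def make_echo_backend(prefix: str = "stub: ") -> TextLLM:
    """Trivial offline backend, useful for tests and demos."""
    return lambda prompt: prefix + prompt


# Any code written against TextLLM works with either backend unchanged:
def suggest(llm: TextLLM, question: str) -> str:
    return llm(question)


print(suggest(make_echo_backend(), "Does temperature cause ice cream sales?"))
# → stub: Does temperature cause ice cream sales?
```

Keeping the seam this narrow is what would let the library "easily extend and connect different backends" as the quote below suggests, without committing to any single framework's abstractions.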

"We welcome many contributions, opinions, and experimental code going forward. Since this is a rapidly evolving field, the speed of trying methods first and replacing them with better approaches later is what matters!"


Closing

At the end of the video, it is again emphasized that pywhy-llm is still an experimental, low-maturity library, while the great potential of LLM-assisted causal inference is highlighted alongside an active invitation for community-driven participation. The session concludes by noting that the biweekly seminar schedule and the latest development news will be shared through the PyWhy Discord and GitHub.

"We look forward to your experiments, feedback, and code contributions! We hope for diverse collaboration and innovation beyond expectations. See you every two weeks!"


Key Keywords:

  • Causal inference
  • Importance of domain knowledge
  • Causal inference support through LLMs (Large Language Models)
  • pywhy-llm library
  • Experimental protocols, workflow automation
  • Community contribution and growth potential

Final Thoughts

pywhy-llm demonstrates a new, experimental approach that could dramatically ease the chronic challenges of causal inference (domain-knowledge acquisition, the burden of expert collaboration) by leveraging LLMs. While still immature, its open architecture and practical examples are impressive, and the video strongly emphasizes that active experimentation and contribution from the community matter more than ever.
