Automating Scientific Discovery: Dr. Jessica Rumbelow of Leap Labs

This video features Dr. Jessica Rumbelow, CEO of Leap Labs, discussing her company's "Discovery Engine" on the Lightning Pod podcast. The Discovery Engine uses deep learning and interpretability to automatically uncover complex patterns and novel findings in scientific data that humans easily miss. Dr. Rumbelow explains how the technology gets past the limits of traditional hypothesis-driven research and how it could greatly increase the speed and scale of scientific discovery. She also discusses the limitations of large language models (LLMs) used on their own and how they complement the Discovery Engine.

1. The Journey to Scientific Discovery: A Fascination with Interpretability

Dr. Rumbelow begins by explaining how her career led her to the field of scientific discovery. In academia, she built neural networks in the field of histology, developing various cancer diagnostic and immune cell classification models. During this process, she created a model that could identify specific immune cells in ways that human pathologists could neither understand nor reproduce.

She recalls:

"This was a really interesting piece of work. The model was able to identify certain kinds of immune cells that human pathologists couldn't understand and couldn't reproduce. Using just very cheap biopsy stains."

What interested her was not the capability itself but how the model arrived at its results. The core question became: what signal had the model found in the data, and what did it know that humans didn't?

"It wasn't the capability that was interesting -- it was how the model was doing it. What signal had the model found in the data that gave it this ability? What does it know that we don't?"

These questions sparked her deep interest in interpretability, leading her to view neural networks not merely as automation tools but as lenses for discovering hidden patterns. She realized that while humans are poor at finding patterns in data, neural networks are remarkably good at it.

2. The Birth of Leap Labs and the Evolution of the Discovery Engine

Initially, Leap Labs focused on building interpretability tools so that anyone could interpret their neural networks. This was useful for identifying model biases (e.g., whether a model was sexist or racist), predicting failure modes before deployment, and intervening to improve models in various ways.

However, about a year later, Dr. Rumbelow realized that scientific discovery was the most exciting application of interpretability. She explains:

"There are a lot of applications for interpretability, but there was this big idea of scientific discovery. You train a neural network to do something that we don't know how to do, then you interpret the neural network to learn something new. Nobody was doing this. So I thought, I have to do it."

Ultimately, Leap Labs developed its proprietary system, the "Discovery Engine." This engine takes arbitrary scientific data as input, automatically trains multiple neural networks, and uses Leap Labs' own interpretability methodology to systematically extract patterns the models have learned. The extracted patterns are compared against existing literature, ranked by novelty and prevalence within the data, and visualized in human-understandable forms.

"The Discovery Engine takes an arbitrary scientific dataset, automatically trains multiple neural networks, systematically extracts learned patterns using our interpretability methods -- that's the real secret sauce -- contextualizes them against existing literature, ranks them by novelty and prevalence in the data, and then makes them human-understandable. We've made numerous new scientific discoveries through this work."
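The final ranking step described above can be pictured with a small sketch. This is not Leap Labs' implementation: the `Pattern` class, the `novelty` and `prevalence` scores, and the choice to sort by novelty first and prevalence second are all assumptions made for illustration, since the talk does not say how the two criteria are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    description: str
    novelty: float     # hypothetical score in [0, 1]; higher = less covered by literature
    prevalence: float  # fraction of dataset rows the pattern applies to, in [0, 1]


def rank_patterns(patterns):
    """Order patterns so that novel, widely supported ones come first (assumed rule)."""
    return sorted(patterns, key=lambda p: (p.novelty, p.prevalence), reverse=True)


# Invented example patterns, not real Discovery Engine output
patterns = [
    Pattern("known: strength rises with age", novelty=0.1, prevalence=0.60),
    Pattern("candidate: three-way interaction", novelty=0.9, prevalence=0.15),
    Pattern("candidate: rare two-way interaction", novelty=0.9, prevalence=0.05),
]
ranked = rank_patterns(patterns)
for p in ranked:
    print(f"{p.novelty:.1f}  {p.prevalence:.2f}  {p.description}")
```

A weighted score would be an equally plausible way to combine the two criteria; the lexicographic sort here is only the simplest option.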

3. Real-World Scientific Discoveries Through the Discovery Engine

Dr. Rumbelow illustrates the groundbreaking discoveries the Discovery Engine has enabled with concrete examples.

3.1. T-Cell Receptor Research: Discovering New Markers for Immunotherapy

The first case involves research on T-cell receptors. In this study, Leap Labs discovered important new markers for understanding and predicting which T cells respond to tumors and which do not.

"Identifying T-cell receptors is really important. We want to understand why certain T cells respond to tumors, why they become activated and help fight the tumor, and why some T cells don't."

Dr. Rumbelow emphasizes that the newly discovered markers were completely different from what researchers in the field had expected, which showcases the strength of the Discovery Engine's hypothesis-free exploration. These findings could be highly useful in fields such as personalized immunotherapy.

3.2. Crop Root Architecture Research: Discovering Synergy Effects Contributing to Food Security

The second case involved collaborative research with a plant biologist to understand the factors determining crop root architecture. This is crucial for developing crops resistant to drought and flooding, directly tied to food security.

"Root architecture determines whether a crop will succeed or fail. So having levers like putting certain nutrients in the soil or planting certain mutations is incredibly important for food security in these regions."

The research team worked with a small dataset of about 700 samples and initially had low expectations. However, the Discovery Engine discovered a new combination beyond the patterns the plant biologist already knew -- a synergy effect between manganese content and a specific genotype. This combination was found to have an enormous impact on root architecture.
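To make "synergy effect" concrete, one common way to quantify it in a two-factor setting is an interaction contrast: how much larger the effect of one factor is when the other factor is present. The sketch below uses invented numbers, not data from the study, and the function name `interaction_contrast` is hypothetical.

```python
from statistics import mean

# Invented toy measurements (not from the study): a root trait value,
# keyed by (high_manganese, has_genotype).
samples = {
    (False, False): [10.0, 11.0, 9.0],
    (True, False): [11.0, 12.0, 10.0],
    (False, True): [12.0, 11.0, 13.0],
    (True, True): [20.0, 21.0, 19.0],
}


def interaction_contrast(groups):
    """Manganese effect with the genotype minus manganese effect without it."""
    m = {key: mean(values) for key, values in groups.items()}
    effect_with_genotype = m[(True, True)] - m[(False, True)]
    effect_without_genotype = m[(True, False)] - m[(False, False)]
    return effect_with_genotype - effect_without_genotype


synergy = interaction_contrast(samples)
print(synergy)
```

A contrast near zero would mean the two factors simply add up; a large positive value, as in this toy data, means the combination does more than either factor would suggest on its own.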

3.3. Meteorology Research: Discovering That a Fundamental Assumption Was Wrong 20% of the Time

One of the most impressive cases involved meteorological modeling research. Weather modeling underpins many fields, including offshore wind farm site selection and hurricane path prediction. This modeling relies on a fundamental assumption called "surface layer theory," which assumes that various fluxes remain constant with height within approximately 200 meters of the ground or water surface.

Remarkably, the Discovery Engine found that this assumption did not hold in about 20% of the samples, most notably in coastal regions.

"Amazingly, what the Discovery Engine found was that in about 20% of the samples we worked with, mainly in coastal areas, this assumption doesn't hold at all."

This finding held tremendous significance for the scientists at the National Center for Atmospheric Research (NCAR) who conducted the study. They had never even thought to question such a fundamental assumption. Dr. Rumbelow noted that even a few percentage points of improvement in weather modeling can be worth billions of dollars, emphasizing the economic and scientific importance of this discovery.

"This is a huge deal for them. They would never have found this on their own, and they wouldn't have even thought to question an assumption this fundamental. Improving weather modeling by even a few percentage points is said to be worth billions of dollars. So this is probably my favorite example."
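The kind of check behind a finding like this can be pictured as follows. This is a minimal sketch, not the analysis used by NCAR or Leap Labs: the profiles are invented, the function name `violates_constant_assumption` is hypothetical, and the 10% tolerance is an assumption chosen for illustration. Each profile lists the quantity that the theory treats as constant, from the lowest measurement level to the highest.

```python
def violates_constant_assumption(profile, tolerance=0.10):
    """True if any level differs from the lowest level by more than `tolerance` (relative)."""
    surface = profile[0]
    return any(abs(value - surface) > tolerance * abs(surface) for value in profile)


# Invented profiles, built so that exactly one of the five changes strongly with height
profiles = [
    [1.00, 0.98, 0.97],
    [1.00, 1.02, 0.99],
    [1.00, 0.80, 0.60],  # strong change with height
    [1.00, 0.96, 0.95],
    [1.00, 1.01, 1.03],
]
violation_rate = sum(violates_constant_assumption(p) for p in profiles) / len(profiles)
print(violation_rate)
```

In a hypothesis-driven workflow nobody would think to compute this rate, because the assumption is taken for granted; the point of the example in the talk is that a data-driven search surfaced it without being asked.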

4. Collaboration Between Claude and the Discovery Engine: LLM Limitations and Synergy

Dr. Rumbelow also shares interesting experimental results about the limitations of Large Language Models (LLMs) and their synergy with the Discovery Engine.

4.1. Limitations of Using Language Models (Claude) Alone

She points out that language models are fundamentally language models, not specifically suited for understanding arbitrary large numerical datasets. When she used Claude Opus 4.1 to analyze an open-source dataset for materials scientists looking to improve catalytic efficiency, Claude exhibited the following problems:

Hallucination: Fabricated information that didn't exist.
Overgeneralization: Drew overly broad conclusions.
Outlier fixation: Disproportionately focused on small portions of the data.

Dr. Rumbelow explains that language models, like humans, work through a hypothesis-driven, path-dependent, iterative process. Because that process carries existing assumptions and biases into the analysis, it severely limits what can be discovered.

"Language models are language models. They're not specifically suited for understanding arbitrary, large numerical datasets... They're still doing the same kind of thing humans do. Very path-dependent and hypothesis-driven... This brings too many assumptions and biases. We're only looking for what we already think will be there, which enormously limits possible discoveries."

She also expressed concern that scientific papers generated by language models could be filled with plausible but inaccurate content, undermining the reliability of scientific literature.

"You can't have them write papers. They're amazingly good at saying things that are plausible but not true. And when you're doing cutting-edge science, by definition there's no ground truth. So it's very hard to tell if the model is making things up."

4.2. Claude and the Discovery Engine Working Together

However, when Claude had access to the Discovery Engine, the situation changed completely. Dr. Rumbelow describes the results as "amazing."

"Claude is amazingly good at synthesis, at pulling things together, and has a really good sense of what matters to us -- what matters to humans. Giving them access to our data-driven discovery tools seems to be a real step change in the ability to contribute to cutting-edge science."

Language models excel at synthesis and pulling information together, and they have a good sense of what matters to humans. Dr. Rumbelow argues that combining these strengths with the Discovery Engine's systematic, unbiased data analysis can bring a step change in the ability to contribute to cutting-edge science. She emphasizes the importance of tool-use models and agents, noting that tool use is essential in "verifiable domains" like science.

5. Discovery Engine Dashboard Demo: Concrete Compressive Strength Data Analysis

Dr. Rumbelow demonstrates the Discovery Engine dashboard using a Concrete Compressive Strength (CCS) dataset. Because this dataset is old and has been analyzed countless times, no new patterns were discovered, but it was well-suited for showing how the system works.

5.1. Dashboard Overview and Discovered Patterns

The dashboard includes a range of visualization tools and lists the 17 discovered patterns in a table at the bottom right. Of these, 2 were speculative and 15 were validated and confirmatory. "Validated" means that a specific subset of the data demonstrates the pattern.

"These aren't speculative hypotheses. There's a real pattern here in the data, and we can prove it."

The left-side menu displays various features including the key characteristic CCS, the number of patterns discovered for each feature, and mutual information and correlation matrices.
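One reason to look at mutual information alongside correlation is that correlation only measures linear association, while mutual information also picks up nonlinear dependence. The sketch below illustrates the difference on a tiny invented example; the helper functions are written from scratch for illustration and say nothing about how the dashboard computes its matrices.

```python
from collections import Counter
from math import log2, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (count / n) * log2((count / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), count in pxy.items()
    )


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # y is fully determined by x, but not linearly
r = pearson(xs, ys)
mi = mutual_information(xs, ys)
print(f"correlation = {r:.3f}, mutual information = {mi:.3f} bits")
```

Here the correlation is zero even though y is completely determined by x, while the mutual information is clearly positive. Real features are usually continuous and would need binning or a dedicated estimator before a calculation like this applies.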

5.2. The Importance of Discovering Compound Patterns

Pattern #2 is particularly notable. It demonstrates the Discovery Engine's ability to find the kind of pattern humans struggle to detect: a complex interaction among multiple conditions.

Individual conditions: Shows the CCS distribution when individual conditions like ash-to-cement ratio, aggregate-to-cement ratio, and concrete age fall within specific ranges. Each has its own correlation (e.g., older age increases CCS).
Combinatorial effect: The rightmost violin plot, which shows the CCS distribution when all conditions hold at once, reveals a much stronger effect than any individual condition alone. With all conditions combined, average CCS rises from 35 to 53.

"What's really hard to find as a human is exactly this combination of patterns, this combination of conditions. Put it all together and look -- the average concrete compressive strength went from 35 to 53."

The dashboard also clearly marks whether each pattern is a novel discovery or not, providing citation information for existing findings. Novel discoveries are marked with an exclamation point (!) so researchers can easily identify them.
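The compound pattern described in this section boils down to comparing the average target value under each condition alone with the average when all conditions hold together. The sketch below shows that comparison on invented rows; the values, thresholds, and the helper name `mean_strength` are assumptions for illustration and do not reproduce the real dataset or the 35 to 53 result.

```python
from statistics import mean

# Invented rows (not the real dataset):
# (ash_to_cement, aggregate_to_cement, age_days, strength)
rows = [
    (0.1, 5.0, 7, 20.0),
    (0.1, 3.0, 90, 40.0),
    (0.5, 5.0, 90, 38.0),
    (0.5, 3.0, 7, 30.0),
    (0.5, 3.0, 90, 60.0),
    (0.5, 3.0, 120, 64.0),
    (0.1, 5.0, 28, 28.0),
    (0.1, 3.0, 7, 24.0),
]

# Hypothetical condition ranges, one per feature
conditions = {
    "ash_to_cement >= 0.4": lambda r: r[0] >= 0.4,
    "aggregate_to_cement <= 4": lambda r: r[1] <= 4.0,
    "age_days >= 56": lambda r: r[2] >= 56,
}


def mean_strength(rows, predicates):
    """Average strength over the rows that satisfy every predicate."""
    return mean(r[3] for r in rows if all(p(r) for p in predicates))


baseline = mean_strength(rows, [])  # no predicates: all rows
single = {name: mean_strength(rows, [p]) for name, p in conditions.items()}
combined = mean_strength(rows, list(conditions.values()))

print("baseline:", baseline)
for name, value in single.items():
    print(f"{name}: {value}")
print("all conditions:", combined)
```

The difficulty for a human analyst is not this final comparison but the search: with many features and many possible ranges, the number of candidate combinations grows far too quickly to check by hand.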

6. Leap Labs' Future Plans and Roadmap

Dr. Rumbelow mentions several important points about Leap Labs' future plans.

Expanding tool-use models: The goal is to provide data analysis tools like the Discovery Engine to agents such as language models, dramatically enhancing the capabilities of scientific agents.
Multimodality expansion: While currently focused primarily on structured data, they plan to expand to multimodal datasets combining vision data and tabular data. This is possible because deep learning and interpretability allow models to learn and extract features from images without detailed labeling.
Launching industrial pilot programs: While currently engaged primarily in academic collaborations, they plan to partner with companies that have large R&D departments to apply the Discovery Engine in industrial settings. Leap Labs says the tool is 100 times faster than manual analysis. The gain comes from replacing hypothesis-driven, iterative manual analysis, which takes months, with automatic and systematic extraction of the valuable patterns in a dataset, which takes hours.

"We're moving from these iterative turnaround times that take months to automatically, systematically extracting all the valuable meaning from data in a matter of hours."

Conclusion: Leap Labs Opening the Future of Science

Leap Labs' Discovery Engine applies AI interpretability to change how scientific research is done. By letting the data speak for itself rather than relying on the intuition and bias built into hypothesis-driven research, it enables the discovery of previously unknown patterns. Dr. Rumbelow's examples show this potential across diverse fields, including T-cell research, crop cultivation, and meteorological modeling.

When the synthesis capabilities of language models are combined with the Discovery Engine's systematic pattern exploration, the speed and depth of scientific discovery could increase substantially. Dr. Rumbelow noted that Leap Labs targets "anyone with data" and plans to offer a free dashboard service for academia so that more researchers can use the tool. The Discovery Engine represents an important step toward making scientific research more efficient and opening up more of what remains unknown.
