How Deep Research Works - Mukund Sridhar & Aarush Selvan, Google DeepMind

Video Overview

This video features Google DeepMind's Mukund Sridhar and Aarush Selvan explaining how Deep Research works, the challenges they encountered during development, and where the feature is headed. Deep Research is a feature of Gemini, Google's AI assistant, in which a personalized research agent browses the web, collects information, and writes a comprehensive report on the user's behalf, so the user doesn't have to do the browsing themselves. It is designed to give deeper answers to complex questions.


The Origins of Deep Research

1. Why Was Deep Research Created?

  • Goal: To help people get smart quickly.
  • Research and learning queries ranked among the top Gemini use cases, but existing chatbots gave only surface-level answers to complex questions.
  • Example question: "How do I get a shot put scholarship?"
    • Existing chatbot answer: "Talk to a coach," "Check what distance you can throw," "Maintain your grades."
    • Problem: Users wanted specific information (e.g., required distances, GPA thresholds), but chatbots only provided general guidelines.

"What we wanted was not just a blueprint, but truly comprehensive and specific answers."

2. Deep Research's Approach

  • Removing constraints: The usual compute and latency limits are relaxed so Gemini can browse the web as much as it needs.
  • Time limit: In practice, however, results still have to be delivered within 5 minutes.
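
The trade-off above (spend as much browsing effort as needed, but inside a fixed wall-clock window) can be sketched as a deadline-bounded loop. This is a minimal illustration, not Gemini's implementation: `research_with_budget`, the `explore` callback, and the injectable `clock` are hypothetical, and the 5-minute default only mirrors the target mentioned in the talk.

```python
import time
from typing import Callable, List

TIME_BUDGET_S = 5 * 60  # mirrors the 5-minute target from the talk


def research_with_budget(
    queries: List[str],
    explore: Callable[[str], str],
    budget_s: float = TIME_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Explore queries in order until they run out or the time budget is spent.

    `explore` is a hypothetical browsing step that returns notes for a query.
    """
    deadline = clock() + budget_s
    notes: List[str] = []
    for query in queries:
        if clock() >= deadline:
            break  # out of time: hand whatever was gathered to the report writer
        notes.append(explore(query))
    return notes


# Usage with a fake clock: the third check happens after the deadline has passed.
ticks = iter([0, 10, 20, 400])
print(research_with_budget(["a", "b", "c"], str.upper, budget_s=300, clock=lambda: next(ticks)))
# ['A', 'B']
```

Passing the clock in as a parameter keeps the loop testable with a fake timer; a real agent would likely also check the deadline between individual model and tool calls.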

Development Challenges

1. Designing the Asynchronous Experience

  • Gemini was originally a synchronous chatbot that answers instantly, but Deep Research had to work asynchronously because its tasks take minutes to complete.
  • Setting user expectations was important:
    • Deep Research is suited for complex research but not for simple questions (e.g., "What's the weather today?").

"Deep Research is not a simple chatbot; it's a personalized research agent for users."

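One common way to build an asynchronous experience like this is a job pattern: the request returns a job ID immediately, the work continues in the background, and the client checks back for the result. The sketch below uses Python's `asyncio`; the `ResearchJobs` class and its methods are hypothetical names for illustration, and `asyncio.sleep` stands in for the minutes of browsing and writing.

```python
import asyncio
import uuid
from typing import Dict


class ResearchJobs:
    """Hypothetical job registry: submit() returns at once, work runs in the background."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}

    def submit(self, question: str) -> str:
        job_id = uuid.uuid4().hex
        self._tasks[job_id] = asyncio.create_task(self._run(question))
        return job_id  # the chat UI can reply immediately with this ID

    def status(self, job_id: str) -> str:
        return "done" if self._tasks[job_id].done() else "running"

    async def result(self, job_id: str) -> str:
        return await self._tasks[job_id]

    async def _run(self, question: str) -> str:
        await asyncio.sleep(0.01)  # stand-in for minutes of browsing and writing
        return f"Report on: {question}"


async def main() -> None:
    jobs = ResearchJobs()
    job_id = jobs.submit("How do I get a shot put scholarship?")
    print(jobs.status(job_id))        # running
    print(await jobs.result(job_id))  # Report on: How do I get a shot put scholarship?
    print(jobs.status(job_id))        # done


if __name__ == "__main__":
    asyncio.run(main())
```
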
2. Long Output and User Experience

  • Deep Research reports can run to thousands of words, so they had to be designed for easy reading and interaction.
  • Solutions:
    • Research plan card: Lets users review and modify the plan before research begins.
    • Real-time progress display: Shows which websites Gemini is browsing as it works.
    • Artifact pin feature: Lets users pin specific information and ask related questions directly.

How Deep Research Works

1. Research Plan Generation

  • When a user enters a question, Gemini first generates a research plan and presents it to the user.
  • Users can review and modify this plan.

    "Like a good analyst, Gemini doesn't jump straight into work -- it shows you the plan first."

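A research plan card can be modeled as an editable list of steps that is locked once the user approves it. The sketch below is an assumption about the shape of such a structure, not Gemini's actual data model; `ResearchPlan`, `draft_plan`, and the example steps are hypothetical, and `draft_plan` stands in for a model call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchPlan:
    """Hypothetical plan card: editable until the user approves it."""

    question: str
    steps: List[str] = field(default_factory=list)
    approved: bool = False

    def _check_editable(self) -> None:
        if self.approved:
            raise RuntimeError("plan is locked once research has started")

    def edit_step(self, index: int, text: str) -> None:
        self._check_editable()
        self.steps[index] = text

    def add_step(self, text: str) -> None:
        self._check_editable()
        self.steps.append(text)

    def remove_step(self, index: int) -> None:
        self._check_editable()
        del self.steps[index]

    def approve(self) -> None:
        self.approved = True  # research starts only after this point


def draft_plan(question: str) -> ResearchPlan:
    # Stand-in for the model call that drafts a plan; the steps are illustrative.
    return ResearchPlan(
        question,
        [
            "Find D1 shot put distance standards",
            "Find academic requirements such as GPA thresholds",
            "Summarize how athletes get noticed by coaches",
        ],
    )


plan = draft_plan("How do I get a shot put scholarship?")
plan.add_step("Find D2 and D3 distance standards")  # the user edits the card
plan.approve()
```
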
2. Web Exploration and Information Collection

  • Gemini explores websites, collects information, and shows its progress to the user in real time.
  • Users can click any explored website to check its content.
  • Some users have experimented with having the system explore thousands of websites for broader coverage.
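
Exploring many sites while reporting each one as it finishes can be sketched with a thread pool and `concurrent.futures.as_completed`. This is an illustrative assumption, not how Deep Research is built: `explore_sites` is hypothetical, `fetch` stands in for a real browsing tool, and `on_progress` stands in for whatever pushes updates to the progress display.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def explore_sites(
    urls: List[str],
    fetch: Callable[[str], str],
    on_progress: Callable[[str], None],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch pages concurrently and report each site as soon as it finishes.

    `fetch` and `on_progress` are hypothetical hooks for a browsing tool and a UI.
    """
    pages: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):  # yields in completion order
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception:
                continue  # one unreachable site should not sink the whole run
            on_progress(url)  # the user can now see (and click) this site
    return pages
```

Callbacks run on the calling thread inside the `as_completed` loop, so the progress hook does not need to be thread-safe.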

3. Comprehensive Report Generation

  • Gemini generates a comprehensive report from all the collected information.
  • Reports are flexible: users can change the style or add and remove sections.
  • Source transparency: Every source used is cited, and the citations are kept when the report is exported to Google Docs.
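
The citation bookkeeping behind a source-transparent report can be as simple as giving each distinct source a number on first use and reusing it afterwards. A minimal sketch, assuming findings arrive as (sentence, URL) pairs; `render_report` is a hypothetical helper, and its output format is not the one Gemini or Google Docs actually uses.

```python
from typing import Dict, List, Tuple


def render_report(findings: List[Tuple[str, str]]) -> str:
    """Turn (sentence, source_url) pairs into text with numbered citations.

    Each distinct URL gets one number, assigned in order of first use.
    """
    numbers: Dict[str, int] = {}
    body: List[str] = []
    for sentence, url in findings:
        if url not in numbers:
            numbers[url] = len(numbers) + 1
        body.append(f"{sentence} [{numbers[url]}]")
    # dicts keep insertion order, so the source list comes out as [1], [2], ...
    sources = [f"[{n}] {url}" for url, n in numbers.items()]
    return " ".join(body) + "\n\nSources:\n" + "\n".join(sources)


print(render_report([
    ("Claim one.", "https://example.com/a"),
    ("Claim two.", "https://example.com/b"),
    ("Claim three.", "https://example.com/a"),  # reuses citation [1]
]))
```
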

Technical Challenges

1. Stability of Long-Running Tasks

  • Deep Research interacts with many distributed services over a long run, so it needs mechanisms to recover from failures partway through.
  • State management and error recovery are critical.
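
A standard recipe for this kind of stability is to checkpoint each completed step and retry transient failures with exponential backoff, so a crash partway through resumes instead of starting over. The sketch below assumes a JSON file as the state store and treats `ConnectionError` as the transient failure; `run_with_checkpoints` and `run_step` are hypothetical, and a production system would use a durable store rather than a local file.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, List


def run_with_checkpoints(
    steps: List[str],
    run_step: Callable[[str], str],
    checkpoint: Path,
    max_retries: int = 3,
    base_delay_s: float = 0.5,
) -> Dict[str, str]:
    """Run steps in order, saving each result so a restart resumes where it stopped.

    `run_step` is a hypothetical call into a downstream service that may fail transiently.
    """
    state: Dict[str, str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for step in steps:
        if step in state:
            continue  # finished before the last failure; don't redo it
        for attempt in range(max_retries):
            try:
                state[step] = run_step(step)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # give up; the checkpoint still holds earlier progress
                time.sleep(base_delay_s * 2 ** attempt)  # exponential backoff
        checkpoint.write_text(json.dumps(state))  # persist after every step
    return state
```
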

2. Planning and Information Processing

  • The model must distinguish tasks that can run in parallel from tasks that must run sequentially.
  • Handling partial information:
    • Example: Having found the D1 shot put standards, it still needs to find the D2 and D3 standards.
    • It combines information from multiple sources into a complete answer.

"The model must plan next steps based on discovered information and fill in incomplete information."

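Both ideas can be sketched in a few lines: a dependency graph groups tasks into waves that can run in parallel (using the standard library's `graphlib.TopologicalSorter`, Python 3.9+), and a gap check turns partial findings into follow-up queries. The task names, `plan_waves`, and `follow_up_queries` are hypothetical; the talk describes the model making these decisions itself, and the fixed graph here only illustrates the parallel-versus-sequential distinction.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves: every task in a wave can run in parallel, and a
    wave starts only after all tasks it depends on have finished."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def follow_up_queries(needed: Set[str], found: Dict[str, str], template: str) -> List[str]:
    """Turn gaps in partial findings into new search queries."""
    return [template.format(item) for item in sorted(needed.difference(found))]


tasks = {
    "D1 standards": set(),
    "D2 standards": set(),
    "D3 standards": set(),
    "GPA rules": set(),
    "compare divisions": {"D1 standards", "D2 standards", "D3 standards"},
    "write report": {"compare divisions", "GPA rules"},
}
print(plan_waves(tasks))
# [['D1 standards', 'D2 standards', 'D3 standards', 'GPA rules'], ['compare divisions'], ['write report']]
print(follow_up_queries({"D1", "D2", "D3"}, {"D1": "notes on D1"}, "{} shot put standards"))
# ['D2 shot put standards', 'D3 shot put standards']
```
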
3. Handling Web Noise

  • Information on the web is scattered across many sources with widely varying layouts, which makes exploration difficult.
  • A robust browsing mechanism is essential.
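
One small piece of a robust browsing mechanism is stripping page chrome (navigation, scripts, footers) so that only content text reaches the model. A minimal sketch using the standard library's `html.parser.HTMLParser`; the `SKIP_TAGS` set and the `extract_text` helper are assumptions, and real pages usually need far more than a tag filter.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical list of tags whose contents are treated as page chrome.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class MainTextExtractor(HTMLParser):
    """Keep visible text while dropping layout elements that vary from site to site."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag) -> None:
        if tag in SKIP_TAGS and self._skip_depth > 0:  # tolerate stray closing tags
            self._skip_depth -= 1

    def handle_data(self, data) -> None:
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```
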

4. Context Management

  • When users ask follow-up questions or add new topics, the context grows rapidly.
  • Solutions:
    • Selectively keep information from the current and previous tasks in context.
    • Store older information as research notes that can be consulted when needed.
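
The strategy above can be sketched as a context that keeps recent material verbatim up to a budget and moves older items into condensed notes that are recalled only when relevant. Everything here is an assumption for illustration: `ResearchContext` is hypothetical, word count stands in for a real tokenizer, `summarize` stands in for a model call, and keyword matching stands in for proper retrieval.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def approx_tokens(text: str) -> int:
    # Crude proxy (an assumption): one token per whitespace-separated word.
    return len(text.split())


@dataclass
class ResearchContext:
    budget: int                      # max tokens of verbatim material kept in context
    summarize: Callable[[str], str]  # hypothetical summarizer, e.g. another model call
    recent: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.recent.append(item)
        # Move the oldest verbatim items into notes until the budget fits.
        # The newest item always stays verbatim, even if it alone exceeds the budget.
        while len(self.recent) > 1 and sum(map(approx_tokens, self.recent)) > self.budget:
            self.notes.append(self.summarize(self.recent.pop(0)))

    def recall(self, keyword: str) -> List[str]:
        """Pull archived notes back in only when they are relevant."""
        return [n for n in self.notes if keyword.lower() in n.lower()]


ctx = ResearchContext(budget=6, summarize=lambda text: " ".join(text.split()[:3]))
ctx.add("D1 shot put standards found on three coaching sites")
ctx.add("Now searching for D2 standards")
print(ctx.notes)               # ['D1 shot put']
print(ctx.recall("shot put"))  # ['D1 shot put']
```
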

The Future of Deep Research

1. Enhanced Expertise

  • Today, Deep Research delivers information at roughly the level of a McKinsey analyst; in the future it could work at the level of a McKinsey partner.
  • This means moving beyond information collection toward extracting meaningful insights and patterns.

2. Expansion to Various Domains

  • Applicable to specialized fields like science, finance, and law.
  • Examples: reading papers and formulating hypotheses, or building financial models.

3. Multimodal Feature Integration

  • Combining text-based research with coding, data analysis, video generation, and other capabilities.
  • Example: Automatically performing statistical analysis and financial modeling during corporate due diligence.

"We hope Deep Research becomes more than just a tool -- a true companion for users."


Conclusion

Deep Research is designed to give comprehensive, specific answers to complex questions. Through it, the Google DeepMind team aims to help users get smart faster, and its future development is worth watching.

"We almost called this feature 'Gemini Deep Dive,' but we think the current name fits much better. Thank you!"