Midjourney Full-Body Ultrasound: From Images to Outcomes

This piece critically examines Midjourney's full-body ultrasound concept. However captivating the images and however promising the technology, it asks what the medical evidence says about whether full-body ultrasound can deliver meaningful improvements for patients in real clinical settings. It works through the problems full-body ultrasound would face as a screening test, including overdiagnosis, false positives, and cost, and stresses the need for careful evaluation before adopting any new technology. ✨

1. The Appealing Vision of Midjourney Full-Body Ultrasound

The full-body ultrasound proposed by Midjourney is genuinely a wonderful idea. 😮 It reportedly reconstructs a 3D image of the entire body much like an MRI, and even enables AI-based segmentation and body-composition mapping. The promotional copy goes so far as to claim that this kind of early imaging could reduce many deaths and a great deal of medical cost. 🚀 The approach of collecting whole-body signals through a ring-shaped ultrasound scanner while submerged in water is scientifically interesting, and it could prove useful in certain clinical fields. But no matter how impressive the images it can produce, the question of whether this actually delivers real benefit to patients still remains.

2. The Rigorous Standards for Adopting New Technology

Being new does not exempt a technology from the general evidence-based principles of medicine. No matter how new, visually convincing, AI-driven, and convenient it may be, if its effectiveness as a screening intervention has not been proven, the case for it is incomplete. A machine that produces beautiful images has not, by that fact alone, shown that it helps anyone. 😉

A good segmentation model is not a diagnostic test.

A diagnostic test is not a screening program.

A screening program is not beneficial only because it finds more abnormalities.

As this shows, test accuracy is only an intermediate step. A test has real value only when it changes the diagnosis, alters management, and leads to effective treatment of the newly detected disease spectrum that improves overall health. 🤔

3. The Danger of Screening: The Trap of Low Prevalence and False Positives

One of the biggest problems inherent in whole-body screening is the issue of low prevalence and the resulting false positives. For example, suppose a disease occurs in 1% of an asymptomatic population. If you screen 10,000 people with a test that has 90% sensitivity and 95% specificity, you would identify 90 of the 100 actual patients, but among the 9,900 healthy people, 5% — that is, 495 people — would receive a false-positive result. 😱 As a result, out of 585 total positive results, only 90 are true positives, which means the Positive Predictive Value is a mere 15% or so.

[Figure: Positive Predictive Value]

This does not mean the test is of poor quality; rather, it is a statistical phenomenon that occurs when prevalence is very low. Because there are far more healthy people, even a small false-positive rate ends up accounting for a large share of all positive results. 😥
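The arithmetic in the worked example can be reproduced in a few lines. The following is a minimal sketch: the function name `positive_predictive_value` and the additional prevalence values in the loop are illustrative assumptions, not figures from any study or from Midjourney.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return PPV = TP / (TP + FP) for a test applied to a population."""
    true_positive_share = prevalence * sensitivity
    false_positive_share = (1 - prevalence) * (1 - specificity)
    return true_positive_share / (true_positive_share + false_positive_share)


# The worked example from the text: 1% prevalence, 90% sensitivity, 95% specificity.
ppv = positive_predictive_value(0.01, 0.90, 0.95)
print(f"PPV at 1% prevalence: {ppv:.1%}")

# Same test, hypothetical prevalences, to show how strongly PPV depends on prevalence.
for prevalence in (0.001, 0.01, 0.10, 0.30):
    value = positive_predictive_value(prevalence, 0.90, 0.95)
    print(f"prevalence {prevalence:.1%} -> PPV {value:.1%}")
```

With identical sensitivity and specificity, the PPV is about 2% at 0.1% prevalence and about 67% at 10% prevalence. The test has not changed; only the population has.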

This problem intensifies even further when, as with a whole-body scan, you simultaneously test for multiple low-prevalence conditions across the thyroid, liver, kidneys, lymph nodes, and more. That is because each question brings its own underlying false-positive problem.

4. The Risk of Repeated Testing: Accumulating False Positives

The Midjourney medical vision also includes casual and frequent scanning. But repeated scans can change the harm profile. Suppose a single scan carries a 5% probability of producing at least one false alarm in a healthy person. Under a simple independence assumption, the chance of getting through n scans with no false alarm is 0.95^n, so with monthly scanning the probability of at least one false alarm after 12 scans is 1 - 0.95^12, a striking 46%! 😲 After 60 monthly scans, this probability climbs to about 95%.
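A short sketch of that calculation follows. It assumes, as the text does, that scans are independent and that the per-scan false-alarm probability stays fixed at 5%; the function name `cumulative_false_alarm` and the 24-scan row are illustrative additions.

```python
def cumulative_false_alarm(per_scan_probability, number_of_scans):
    """Probability of at least one false alarm, assuming independent scans."""
    return 1 - (1 - per_scan_probability) ** number_of_scans


# 12 and 60 scans are the cases discussed in the text; 1 and 24 are added for context.
for scans in (1, 12, 24, 60):
    chance = cumulative_false_alarm(0.05, scans)
    print(f"{scans:>2} scans -> {chance:.1%}")
```

Real scans on the same person are correlated (a benign finding tends to reappear), so the independence assumption probably overstates the curve, but the direction of the effect is the point.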

[Figure: Cumulative chance of false alarm]

Of course, the real situation would not follow this curve exactly, but it teaches us that a small per-scan problem can grow into a serious problem over the long term. In addition, repeated imaging can cause measurement noise to be mistaken for disease. A cyst measured at 9 mm, then 10 mm, then 9 mm again may not be growing or shrinking at all; the differences could simply be measurement error. 📊
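One way to make the cyst example concrete is to compare each change with the smallest difference that could be distinguished from noise. The sketch below uses the standard result that the difference of two independent measurements has standard deviation sqrt(2) times the single-measurement SD. The 1 mm repeatability figure is a hypothetical assumption chosen for illustration, not a measured property of any scanner.

```python
import math


def smallest_detectable_change(measurement_sd_mm, z=1.96):
    """Smallest difference between two measurements that exceeds noise.

    The difference of two independent measurements has standard deviation
    sqrt(2) * SD, so changes below z * sqrt(2) * SD are consistent with
    measurement error alone at the chosen confidence level.
    """
    return z * math.sqrt(2) * measurement_sd_mm


ASSUMED_SD_MM = 1.0  # hypothetical repeatability, not a measured value
threshold = smallest_detectable_change(ASSUMED_SD_MM)

readings_mm = [9, 10, 9]  # the cyst readings from the text
for earlier, later in zip(readings_mm, readings_mm[1:]):
    change = abs(later - earlier)
    verdict = "within noise" if change < threshold else "possible real change"
    print(f"{earlier} mm -> {later} mm: change {change} mm, {verdict}")
```

Under this assumed repeatability, a change would need to exceed roughly 2.8 mm before it could be separated from measurement error, so both 1 mm swings are uninformative.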

5. The Value and Pitfalls of Data: The Problem of Overdiagnosis

Medical data is useful when it improves decisions, but it can actually be harmful when it is uncertain, poorly calibrated, not actionable, or disconnected from evidence-based responses. Small anatomical variations can be treated as pathology, and risk markers can be mistaken for diagnoses. 🤦‍♀️

Overdiagnosis is a key concept. It refers to detecting real abnormalities that will never cause symptoms, harm, or death. They are not false positives, but classifying them as disease can harm the patient. More sensitive imaging technology and more frequent scanning raise the likelihood of discovering these "silent reservoirs."

An AI model may accurately outline the liver, muscles, vessel walls, or small lesions, but that does not by itself prove a diagnostic, prognostic, or screening benefit. 📉

6. The Hidden Pitfalls of AI-Based Imaging: A Variety of Biases

There are several biases to watch out for when evaluating AI-based imaging technology.

Spectrum bias: This occurs when you evaluate a test using obvious disease and obviously normal controls. That is different from detecting clinically important disease at an early stage.
Verification bias: This occurs when only positive scans are confirmed with MRI, CT, biopsy, and the like, while negative scans go unverified, so missed cases are never counted and sensitivity appears better than it really is.
Lead-time bias: A phenomenon in which the time of diagnosis is moved earlier but the time of death is not delayed, making post-diagnosis survival appear longer. The truth is that actual lifespan is unchanged! ⏳
Length-time bias: Screening preferentially finds slowly progressing disease, so screen-detected cases have a better prognosis because of their biology, creating the illusion that screening itself improved outcomes.
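Of the biases above, verification bias is the easiest to show with plain arithmetic. The sketch below uses an entirely hypothetical cohort (100 diseased people, a test that truly catches 60 of them, and a study that verifies only 10% of negative scans) to show how sensitivity is inflated when missed cases are rarely checked.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of diseased people that the test flags."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical cohort: 100 people with disease, of whom the test flags 60.
true_positives, false_negatives = 60, 40

# Complete verification: every scan is checked against the reference standard.
true_sensitivity = sensitivity(true_positives, false_negatives)

# Partial verification: all positive scans are confirmed, but only 10% of
# negative scans are, so most missed cases never enter the calculation.
verified_fraction_of_negatives = 0.10
verified_false_negatives = false_negatives * verified_fraction_of_negatives
apparent_sensitivity = sensitivity(true_positives, verified_false_negatives)

print(f"true sensitivity:     {true_sensitivity:.1%}")
print(f"apparent sensitivity: {apparent_sensitivity:.1%}")
```

In this made-up cohort a test with 60% sensitivity appears to have about 94%, purely because of which scans were followed up.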

Accuracy can be misleading in rare diseases. For example, when prevalence is 1%, a system that judges everyone to be negative is 99% accurate but clinically useless. 🤷‍♀️
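The 99% figure is easy to check. This minimal sketch builds the 1% prevalence population from the example and scores a hypothetical "system" that labels everyone negative; the helper name `confusion_metrics` is an illustrative assumption.

```python
def confusion_metrics(labels, predictions):
    """Return (accuracy, sensitivity) for binary labels where 1 = diseased."""
    pairs = list(zip(labels, predictions))
    true_positives = sum(1 for y, p in pairs if y == 1 and p == 1)
    true_negatives = sum(1 for y, p in pairs if y == 0 and p == 0)
    diseased = sum(labels)
    accuracy = (true_positives + true_negatives) / len(labels)
    sensitivity = true_positives / diseased if diseased else float("nan")
    return accuracy, sensitivity


# 10,000 people at 1% prevalence, and a "system" that calls everyone negative.
labels = [1] * 100 + [0] * 9_900
predictions = [0] * len(labels)

accuracy, sensitivity = confusion_metrics(labels, predictions)
print(f"accuracy:    {accuracy:.1%}")
print(f"sensitivity: {sensitivity:.1%}")
```

Accuracy comes out at 99% while sensitivity is zero, which is why reports on rare conditions need sensitivity, specificity, and predictive values rather than a single accuracy number.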

7. A Reasonable Evidence Pathway and a Cautious Approach

There is certainly potential for systems like Midjourney to become useful. They could help with body composition, anatomical mapping, specific diagnostic imaging, monitoring, or high-risk pathways. But broad, repeated scanning of asymptomatic, low-risk people is one of the hardest claims to support in medicine. 😥

A reasonable evidence pathway should proceed through the following stages.

Engineering and phantom studies: Evaluate image quality, spatial resolution, artifacts, repeatability, failure rates, and so on.
Anatomical validation: Compare measurements and segmentation against established methods.
Disease-specific diagnostic accuracy studies: Use representative populations, blinded interpretation, and reliable reference standards.

Only after that can screening studies in asymptomatic populations begin, at which point you must report positivity rates, false negatives/false positives, incidental findings, follow-up testing, complications, anxiety, evidence of overdiagnosis, and more.

Ultimately, clinical impact studies must determine whether the scan changes management in the right direction, and randomized controlled trials must evaluate real benefits — patient quality of life, mortality, cost, and so on — compared against usual care. 🧐

Conclusion

Beautiful images can start a scientific conversation, but they do not end the whole discussion. New technologies like full-body ultrasound hold enormous potential, but rigorous, multifaceted evaluation is essential to deliver real benefit to patients while minimizing harm. We must continually ask "what, with what accuracy, in whom, compared to what, leading to what action, with what benefit, and at what harm and cost." Only a cautious approach and thorough validation will open the way for technology to become genuine medical innovation. 💡
