'LifeClock' is a biological clock model built on large-scale, long-term clinical data to predict age, health, and disease risk together. The study demonstrates biological age prediction across the entire lifespan, from growth to aging, and shows that the gap between predicted age and actual age is closely linked to future disease risk. The result is presented as meaningful progress toward personalized health management and early intervention in real clinical settings.


1. Research Background and Necessity

Aging is a highly complex process that occurs at the molecular, cellular, and organ levels, and these changes ultimately affect overall health and survival. The scientific community has long focused on the concept of 'Biological Age (BA).' BA captures the fact that, even at the same chronological age (CA), the degree of accumulated physical damage or functional decline varies between individuals. In other words, of two people of the same age, one may have a 'more aged heart' and the other a 'younger brain.'

"Biological age, unlike chronological age, is an indicator that shows how aging progresses differently for each person depending on genetic and environmental influences."

Initially, BA was measured at the molecular level through DNA methylation, gene expression, and similar methods. More recently, by integrating diverse clinical data including protein analysis, metabolomics, medical imaging, and electronic health records (EHR), more accurate and granular BA predictions have become possible. The discovery that different organs age at different rates has opened the door to individually evaluating the impact of lifestyle habits and medications.


2. The Emergence of 'LifeClock' and Study Design

This paper proposes 'LifeClock,' a new biological clock model. LifeClock utilizes a massive dataset of 24.63 million EHR records to estimate BA across the entire lifespan, from infancy to old age, and analyzes its association with disease risk.

EHRFormer model architecture diagram

  • To overcome challenges in EHR data such as variability, missing values, and inter-institutional differences, a Transformer-based 'EHRFormer' model was developed.
  • Based on 184 clinical indicators, digital patient representations suitable for all age groups were created, with specialized models trained separately as 'growth clock (under 18)' and 'adult aging clock (18 and over).'
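As a rough illustration of the two points in the list above, the sketch below routes a visit to the growth clock or the adult clock by age and encodes missing indicators with an explicit mask. The 18-year cutoff comes from the text. The function names and the zero-fill-plus-mask scheme are assumptions made for illustration, since this summary does not describe how EHRFormer actually handles missing values.

```python
from typing import List, Optional, Sequence, Tuple

# From the text: growth clock for under 18, adult aging clock for 18 and over.
ADULT_AGE_THRESHOLD = 18


def select_clock(chronological_age: float) -> str:
    """Return which clock a visit would be routed to (hypothetical helper)."""
    if chronological_age < 0:
        raise ValueError("age must be non-negative")
    return "growth" if chronological_age < ADULT_AGE_THRESHOLD else "adult"


def encode_visit(values: Sequence[Optional[float]]) -> Tuple[List[float], List[int]]:
    """Fill missing indicators with 0.0 and return a 1/0 observed mask.

    Assumption: a mask lets a downstream model tell "measured as 0"
    apart from "not measured". The real model may do this differently.
    """
    filled = [0.0 if v is None else float(v) for v in values]
    mask = [0 if v is None else 1 for v in values]
    return filled, mask


if __name__ == "__main__":
    print(select_clock(7))                  # child visit
    print(select_clock(42))                 # adult visit
    print(encode_visit([1.5, None, 3]))     # second indicator was not measured
```

In the study the input would have 184 indicators per visit; three are used here only to keep the example readable.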

"In practice, doctors only look at test values outside the normal range, but values within the normal range also contain important information about individual health status and long-term organ aging."


3. How the BA Prediction Model Works and Key Achievements

Tailored Clocks for Growth and Adulthood

EHRFormer analyzes clinical indicators at each visit to predict current and future health status. SHAP (Shapley additive explanations) was also applied to enhance model interpretability.

BA prediction and clinical indicator contributions

  • Growth period (under 18): Key indicators include AST (liver enzyme) decrease, creatinine increase, and total protein increase.
  • Adulthood (18 and over): Urea increase, albumin decrease, and red cell distribution width (RDW) increase play important roles.
  • The clinical indicators that matter most differed substantially between the two clocks.
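SHAP attributes a prediction to individual inputs using Shapley values. The following is a minimal sketch that computes exact Shapley values by averaging each feature's marginal contribution over all feature orderings, applied to a hypothetical linear stand-in for the adult clock. The coefficients, the baseline values, and `toy_adult_clock` are invented for illustration and are not taken from the study; only the directions (urea up, albumin down, RDW up raising predicted age) follow the text.

```python
from itertools import permutations
from typing import Callable, Dict


def toy_adult_clock(f: Dict[str, float]) -> float:
    """Hypothetical linear model; coefficients are made up for this example."""
    return 40.0 + 2.0 * f["urea"] - 0.5 * f["albumin"] + 1.5 * f["rdw"]


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values relative to a baseline input.

    Features are switched from baseline to patient value one at a time,
    and the change in model output is averaged over every ordering.
    Cost grows factorially, so this is only usable for a few features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


if __name__ == "__main__":
    baseline = {"urea": 5.0, "albumin": 45.0, "rdw": 13.0}   # hypothetical reference
    patient = {"urea": 7.0, "albumin": 41.0, "rdw": 14.0}    # hypothetical patient
    for name, value in shapley_values(toy_adult_clock, patient, baseline).items():
        print(f"{name}: {value:+.2f}")
```

For a linear model each Shapley value reduces to the coefficient times the deviation from baseline, and the values sum to the difference between the patient's prediction and the baseline prediction. Libraries such as `shap` approximate the same quantity for models where enumerating orderings is infeasible.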

High accuracy was also validated in an external cohort (UK Biobank), confirming the model's generalizability and reliability.


4. BA and Future Disease Risk Prediction: Cluster Analysis and Clinical Application

Patient Data Clustering and Risk Prediction

The EHR data were reduced in dimensionality (PCA, UMAP) and then grouped with Leiden clustering, yielding 64 patient clusters. Within each cluster, the 'accelerated aging' (over-aged) group, in which BA exceeds CA by a large margin, clearly had higher disease risk.

Patient clusters and disease incidence risk

  • Example: Adult cluster 20 showed significantly higher incidence of renal failure and diabetes.
  • Pediatric cluster 8 had elevated precocious puberty risk, cluster 10 growth delay, and cluster 5 immune-related disease risk.
  • A detailed look at each cluster's clinical profile shows that inflammatory markers, cardiac markers, and other clinical features align well with the diseases elevated in that cluster.
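The comparison described in this section, between an accelerated-aging group and the rest of each cluster, can be sketched as follows. This assumes cluster labels and BA predictions are already available; the 2-year gap threshold, the record layout, and the sample patients are hypothetical, as the summary does not state how the study defined the accelerated-aging group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def incidence_by_gap_group(
    patients: Iterable[dict], gap_threshold: float = 2.0
) -> Dict[Tuple[int, str], float]:
    """Disease incidence per (cluster, aging group).

    Each patient dict needs: "cluster", "ba", "ca", "event" (bool).
    A patient is "accelerated" when BA - CA exceeds gap_threshold
    (an illustrative cutoff, not the study's definition).
    """
    events = defaultdict(list)
    for p in patients:
        gap = p["ba"] - p["ca"]
        group = "accelerated" if gap > gap_threshold else "other"
        events[(p["cluster"], group)].append(1 if p["event"] else 0)
    return {key: sum(flags) / len(flags) for key, flags in events.items()}


if __name__ == "__main__":
    # Made-up records for one adult cluster.
    cohort = [
        {"cluster": 20, "ba": 60.0, "ca": 50.0, "event": True},
        {"cluster": 20, "ba": 58.0, "ca": 50.0, "event": True},
        {"cluster": 20, "ba": 51.0, "ca": 50.0, "event": False},
        {"cluster": 20, "ba": 49.0, "ca": 50.0, "event": True},
        {"cluster": 20, "ba": 50.0, "ca": 50.0, "event": False},
    ]
    for key, rate in sorted(incidence_by_gap_group(cohort).items()):
        print(key, round(rate, 3))
```

Comparing within a cluster rather than across the whole cohort keeps the comparison between patients with similar clinical profiles, which is the framing the text uses.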

"The 'accelerated aging' group where BA exceeds CA shows continuously rising future disease incidence within their cluster, which can serve as a traffic light for disease-specific tailored management."


5. Refinement of the Prediction Model: Individual Disease Prediction Capability

EHRFormer goes beyond BA prediction to warn of disease onset in several ways, including first diagnosis, future disease prediction, and 5-year and 10-year predictions. Risk levels were calculated for each disease and patients were divided into high-risk, medium-risk, and low-risk groups; cumulative risk curves plotted by age showed clear differences between the groups.
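A minimal sketch of this stratify-then-plot procedure is given below. It assumes tertile cutoffs for the three risk groups and a naive cumulative incidence that ignores censoring; neither choice is confirmed by the summary, and the function names and sample numbers are hypothetical.

```python
from typing import List, Optional, Sequence


def stratify_by_tertile(scores: Sequence[float]) -> List[str]:
    """Label each patient low / medium / high by rank of predicted risk."""
    names = ("low", "medium", "high")
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        labels[idx] = names[rank * 3 // n]  # rank 0..n-1 mapped to 0, 1, 2
    return labels


def cumulative_incidence(
    onset_ages: Sequence[Optional[float]], age_grid: Sequence[float]
) -> List[float]:
    """Fraction of a group diagnosed at or before each age in age_grid.

    None means the disease was never observed. Censoring is ignored here;
    a Kaplan-Meier style estimator would be the usual choice on real data.
    """
    n = len(onset_ages)
    return [
        sum(1 for a in onset_ages if a is not None and a <= age) / n
        for age in age_grid
    ]


if __name__ == "__main__":
    risk = [0.9, 0.1, 0.5, 0.2, 0.8, 0.4]          # hypothetical model outputs
    print(stratify_by_tertile(risk))
    high_group_onsets = [55, None, 62, None]       # hypothetical onset ages
    print(cumulative_incidence(high_group_onsets, [50, 60, 70]))
```

Computing one curve per risk group and plotting them against age gives the kind of separated cumulative risk curves the text describes.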

Disease risk prediction and cumulative risk by group

  • Predictive power was excellent, with AUC values of 0.8 to 0.98 for cardiovascular disease, stroke, diabetes, Parkinson's disease, and more.
  • Performance was superior to existing machine learning approaches (XGBoost, RNN), as shown in heatmaps, cumulative risk curves, and other comparisons.
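For readers unfamiliar with the AUC figures quoted above, the sketch below computes ROC AUC directly from its rank interpretation: the probability that a randomly chosen patient who developed the disease received a higher predicted risk than one who did not, with ties counted as half. The function name and the sample labels and scores are illustrative only and are unrelated to the study's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: 1 for patients who developed the disease, 0 otherwise.
    Runs in O(P * N) time, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_score))  # 3 of 4 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported range of 0.8 to 0.98 indicates that the model orders patients by risk well for those diseases.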

"Thanks to this model, future growth disorder risk could be accurately predicted from EHR data alone even in infancy, and personalized risk management can become a reality for middle-aged and older adults as well."


6. Discussion: Clinical and Scientific Significance and Limitations

The strength of this study is its integration of large-scale, long-term, multi-institutional clinical data into a deep learning-based clock, enabling individual tracking and prediction of not only BA but also actual disease risk 'trajectories.'

  • In pediatric patients, clusters were confirmed to align with clear physiological tendencies such as growth disorders and precocious puberty.
  • In adults, renal and cardiac abnormalities, immune system diseases, and other conditions aligned with cluster characteristics and clinical markers.

"EHRFormer can serve as a digital patient avatar that goes beyond simple classification and is applicable in actual clinical settings."

However, limitations include the observational nature of the study and data bias risks. In the long term, with the addition of wearables and environmental sensors, there is potential for development into a 'real-time evolving' health and aging management tool.

EHRFormer modeling the entire lifespan


Conclusion

The LifeClock study presents an individualized biological clock that spans the entire lifespan, from growth to aging, and uses it to propose a new paradigm for managing health and disease risk. Using routine clinical data alone, it demonstrated predictive capability strong enough to be applied in clinical settings and showed the potential for disease-specific prevention and intervention strategies. As the transition to the digital health era continues, this approach is expected to expand across broader clinical domains.

"Don't just look at the numbers on your age. The era has arrived where you can predict and manage your true 'biological age' and health trajectory with routine checkup data alone."
