'LifeClock' is a model built on large-scale, long-term clinical data to predict biological age, health status, and disease risk. The study demonstrates biological age prediction across the entire lifespan, from growth to aging, and shows that the gap between predicted age and actual age is closely linked to future disease risk. The authors present this as significant progress toward personalized health management and early intervention in real clinical settings.
1. Research Background and Necessity
Aging is a highly complex process occurring at the molecular, cellular, and organ levels. These changes ultimately affect overall health and survival. The scientific community has long focused on the concept of 'Biological Age (BA).' BA captures the fact that, even at the same chronological age (CA), the degree of accumulated physical damage or functional decline varies between individuals. In other words, of two people of the same age, one may have a 'more aged heart' and the other a 'younger brain.'
"Biological age, unlike chronological age, is an indicator that shows how aging progresses differently for each person depending on genetic and environmental influences."
Initially, BA was measured at the molecular level through DNA methylation, gene expression, and similar methods. More recently, by integrating diverse clinical data including protein analysis, metabolomics, medical imaging, and electronic health records (EHR), more accurate and granular BA predictions have become possible. The discovery that different organs age at different rates has opened the door to individually evaluating the impact of lifestyle habits and medications.
2. The Emergence of 'LifeClock' and Study Design
This paper proposes 'LifeClock,' a new biological clock model. LifeClock utilizes a massive dataset of 24.63 million EHR records to estimate BA across the entire lifespan, from infancy to old age, and analyzes its association with disease risk.

- To overcome challenges in EHR data such as variability, missing values, and inter-institutional differences, a Transformer-based 'EHRFormer' model was developed.
- Based on 184 clinical indicators, digital patient representations suitable for all age groups were created, and two specialized models were trained separately: a 'growth clock' (under 18) and an 'adult aging clock' (18 and over).
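The two-clock split and the BA-CA gap that the rest of the article relies on can be sketched as follows. This is a minimal illustration, not the study's code: the function names `select_clock` and `age_gap` are hypothetical, and only the age-18 boundary comes from the text.

```python
# Boundary taken from the text: growth clock under 18, adult aging clock 18 and over.
ADULT_AGE_THRESHOLD = 18


def select_clock(chronological_age: float) -> str:
    """Return which of the two clocks applies to a patient of this age (hypothetical helper)."""
    if chronological_age < 0:
        raise ValueError("chronological age must be non-negative")
    return "growth" if chronological_age < ADULT_AGE_THRESHOLD else "adult"


def age_gap(biological_age: float, chronological_age: float) -> float:
    """BA minus CA: a positive value means the predicted age exceeds the actual age."""
    return biological_age - chronological_age
```

A patient aged 50 with a predicted BA of 52.5 would be routed to the adult clock and have an age gap of 2.5 years.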
"In practice, doctors only look at test values outside the normal range, but values within the normal range also contain important information about individual health status and long-term organ aging."
3. How the BA Prediction Model Works and Key Achievements
Tailored Clocks for Growth and Adulthood
EHRFormer analyzes clinical indicators at each visit to predict current and future health status. SHAP (Shapley additive explanations) was also applied to enhance model interpretability.

- Growth period (under 18): Key indicators include AST (liver enzyme) decrease, creatinine increase, and total protein increase.
- Adulthood (18 and over): Urea increase, albumin decrease, and red cell distribution width (RDW) increase play important roles.
- The clinical indicators that matter most differed substantially between the two clocks.
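To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a single patient by enumerating every feature subset. It is not the SHAP library and not the EHRFormer model: `toy_adult_clock`, its weights, and the patient and reference values are all made up for illustration, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a subset are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        mixed = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical toy "adult clock": a linear score with invented weights.
def toy_adult_clock(f):
    return 40.0 + 2.0 * f["urea"] - 5.0 * f["albumin"] + 3.0 * f["rdw"]


patient = {"urea": 8.0, "albumin": 3.5, "rdw": 15.0}
reference = {"urea": 5.0, "albumin": 4.5, "rdw": 13.0}
contributions = shapley_values(toy_adult_clock, patient, reference)
```

For a linear model like this toy, each contribution equals the feature's weight times its deviation from the reference, and the contributions sum to the difference between the patient's prediction and the reference prediction.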
High accuracy was also validated in an external cohort (UK Biobank), confirming the model's generalizability and reliability.
4. BA and Future Disease Risk Prediction: Cluster Analysis and Clinical Application
Patient Data Clustering and Risk Prediction
Using dimensionality reduction (PCA, UMAP) followed by Leiden clustering, 64 patient clusters were created from EHR data. Within each cluster, the 'accelerated aging' (over-aged) group, defined by a large BA-CA difference, had clearly higher disease risk.

- Example: Adult cluster 20 showed significantly higher incidence of renal failure and diabetes.
- Pediatric cluster 8 had elevated precocious puberty risk, cluster 10 growth delay, and cluster 5 immune-related disease risk.
- Detailed examination of each cluster's clinical profile shows that inflammatory markers, cardiac markers, and other clinical features align well with the diseases that characterize the cluster.
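The within-cluster comparison described above can be sketched as follows, assuming cluster labels have already been assigned. The function name `incidence_by_cluster`, the record layout, and the `gap_threshold` cutoff are illustrative assumptions; the article does not state how large a BA-CA difference counts as over-aged.

```python
from collections import defaultdict


def incidence_by_cluster(records, gap_threshold=0.0):
    """For each cluster, compare disease incidence in over-aged vs. other patients.

    records: iterable of (cluster_id, biological_age, chronological_age, developed_disease)
    gap_threshold: hypothetical cutoff; a patient is "over-aged" when BA - CA exceeds it.
    """
    # Per cluster and group: [number of cases, number of patients]
    counts = defaultdict(lambda: {"over_aged": [0, 0], "other": [0, 0]})
    for cluster_id, ba, ca, diseased in records:
        group = "over_aged" if (ba - ca) > gap_threshold else "other"
        cell = counts[cluster_id][group]
        cell[0] += int(diseased)
        cell[1] += 1

    result = {}
    for cluster_id, groups in counts.items():
        result[cluster_id] = {
            g: (cases / total if total else None)
            for g, (cases, total) in groups.items()
        }
    return result


# Invented toy records for a single cluster, for illustration only.
toy_records = [
    (20, 58, 50, True),
    (20, 61, 55, True),
    (20, 66, 60, False),
    (20, 49, 50, False),
    (20, 47, 52, False),
]
toy_result = incidence_by_cluster(toy_records)
```

In this toy data, two of the three over-aged patients develop the disease and neither of the other two does. A real analysis would also need follow-up time and confidence intervals, which this sketch omits.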
"The 'accelerated aging' group where BA exceeds CA shows continuously rising future disease incidence within their cluster, which can serve as a traffic light for disease-specific tailored management."
5. Refinement of the Prediction Model: Individual Disease Prediction Capability
EHRFormer goes beyond simple BA prediction to warn of disease onset through multiple approaches, including first diagnosis, future disease prediction, and 5-year/10-year predictions. When risk levels were calculated for each disease and patients were divided into high-risk, medium-risk, and low-risk groups, the cumulative risk curves plotted by age showed clear differences between the groups.
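A simplified sketch of the risk grouping and cumulative curves described above is shown below. The cutoffs, function names, and data layout are assumptions made for illustration, and the cumulative incidence here ignores censoring (patients lost to follow-up), which a real analysis would have to handle.

```python
def stratify(risk_scores, low_cut, high_cut):
    """Assign each patient to a low/medium/high group from a risk score.

    low_cut and high_cut are hypothetical cutoffs chosen by the caller.
    """
    groups = {}
    for patient_id, score in risk_scores.items():
        if score >= high_cut:
            groups[patient_id] = "high"
        elif score >= low_cut:
            groups[patient_id] = "medium"
        else:
            groups[patient_id] = "low"
    return groups


def cumulative_incidence(onset_ages, n_patients, ages):
    """Fraction of a group diagnosed at or before each age (no censoring handled)."""
    return [sum(1 for a in onset_ages if a <= age) / n_patients for age in ages]
```

Calling `cumulative_incidence` once per risk group over the same age grid gives one curve per group, and the separation between those curves is what the study reports.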

- Excellent predictive power with AUC 0.8-0.98 for cardiovascular disease, stroke, diabetes, Parkinson's disease, and more.
- Performance was superior to existing machine learning approaches (XGBoost, RNN), as shown in heatmap comparisons, cumulative risk curves, and other evaluations.
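For readers unfamiliar with AUC, the sketch below computes it from its pairwise definition: the probability that a randomly chosen patient who develops the disease receives a higher risk score than a randomly chosen patient who does not. The function name `roc_auc` and the example inputs are illustrative and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg), which is
    fine for illustration but slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.8-0.98 range in context.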
"Thanks to this model, future growth disorder risk could be accurately predicted from EHR data alone even in infancy, and personalized risk management can become a reality for middle-aged and older adults as well."
6. Discussion: Clinical and Scientific Significance and Limitations
The strength of this study is its integration of large-scale, long-term, multi-institutional clinical data into a deep learning-based clock, enabling individual tracking and prediction of not only BA but also actual disease risk 'trajectories.'
- In pediatric patients, clusters were confirmed to align with clear physiological tendencies such as growth disorders and precocious puberty.
- In adults, renal and cardiac abnormalities, immune system diseases, and other conditions aligned with cluster characteristics and clinical markers.
"EHRFormer can serve as a digital patient avatar that goes beyond simple classification and is applicable in actual clinical settings."
However, limitations include the observational nature of the study and data bias risks. In the long term, with the addition of wearables and environmental sensors, there is potential for development into a 'real-time evolving' health and aging management tool.

Conclusion
The LifeClock study presents an individualized biological clock that spans the entire lifespan, from growth to aging, and through it proposes a new paradigm for health and disease risk management. Using routine clinical data alone, it demonstrated predictive capability strong enough to be applied in clinical settings, and showed the potential for disease-specific tailored prevention and intervention strategies. As the transition to the digital health era continues, this approach is expected to expand across broader clinical domains.
"Don't just look at the numbers on your age. The era has arrived where you can predict and manage your true 'biological age' and health trajectory with routine checkup data alone."
