This study built and publicly released a real-world dataset of Heart Rate Variability (HRV) recordings and sleep diaries, collected continuously over 4 weeks from healthy adults wearing smartwatches. The dataset supports combined analysis of psychological states (depression, anxiety, insomnia, etc.), daily physical activity, and sleep patterns, and is expected to be a useful resource for future wearable-based healthcare and AI-driven analytical research.


1. Background and Purpose of the Study

Wearable devices have recently attracted significant attention as tools for everyday monitoring of health conditions such as blood pressure, metabolism, and sleep. In particular, wrist-worn devices have shown promise not only for continuous observation of post-traumatic stress symptoms and blood pressure but also for mental health monitoring (depression and anxiety assessment).

"Sleep duration and regularity tracked by wearable devices show strong correlations with self-reported mood and depression scores." -- Cited from reference 4 of the paper

Heart Rate Variability (HRV) is a biomarker reflecting the balance of the autonomic nervous system, and it can be derived from signals recorded by wearable devices. People with insomnia or depression have been reported to show insufficient cardiac slowing at night and lower HRV, which may be associated with cardiovascular risk.

Existing HRV research has largely relied on short-term clinical measurements (5 minutes to 24 hours) in hospital settings, which limits the ability to track fine-grained variation in real daily life. According to the research team, the core goal was to construct and publicly release a dataset that records long-term HRV, sleep diaries, and psychological questionnaires concurrently in real-world (in-situ) settings, giving researchers across fields a broad foundation for analysis.


2. Data Collection Design and Process

The study recruited 49 men and women aged 21-43 and had them wear a Samsung Galaxy Watch Active2 for 4 weeks. Clinical questionnaires (insomnia, depression, anxiety) were administered at the beginning, middle, and end of the study.

"Participants wore the smartwatch on their non-dominant wrist to collect heart rate, movement, and environmental sensor data during activities. They recorded bedtime, sleep onset, wake time, and nocturnal awakening times in daily sleep diaries. Clinical questionnaires were administered three times at two-week intervals."

Study design overview

Key Collection Items and Methods

  • Custom wearable app (Heart+): Collected movement (accelerometer, gyroscope), physiological (heart rate, PPG), and ambient light data every 0.1 seconds (10 Hz)
  • Sleep diaries: Daily entries for bedtime/sleep onset/wake time/WASO (wake after sleep onset)
  • Clinical questionnaires: ISI (insomnia), PHQ-9 (depression), GAD-7 (anxiety) administered 3 times
  • Data storage and transfer: Transferred to server whenever Wi-Fi was available, managed as individual files
  • Participant monitoring: Daily checks for data gaps with alerts, communication via group chats and messengers

"Individual warnings were sent if the smartwatch was not worn 3 or more times or for more than 2 hours per day."
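
The warning rule quoted above can be read in more than one way. As a minimal sketch, assuming it means "three or more non-wear episodes in a day, or more than 2 hours of total non-wear time", a daily check might look like the following. The `NonWearInterval` type, the `needs_warning` function, and the sample timestamps are hypothetical and are not part of the study's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class NonWearInterval:
    """One stretch of time during which the watch was not worn (hypothetical type)."""
    start: datetime
    end: datetime


def needs_warning(intervals, max_count=3, max_total=timedelta(hours=2)):
    """Return True if one day's non-wear intervals would trigger a warning.

    Assumed reading of the rule: warn on max_count or more episodes,
    or on more than max_total of non-wear time in total.
    """
    total = sum((iv.end - iv.start for iv in intervals), timedelta())
    return len(intervals) >= max_count or total > max_total


# Illustrative day: two episodes (40 min + 105 min = 145 min in total).
day = [
    NonWearInterval(datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 40)),
    NonWearInterval(datetime(2024, 1, 8, 18, 0), datetime(2024, 1, 8, 19, 45)),
]
print(needs_warning(day))  # only two episodes, but more than 2 hours in total
```

Under this reading, either condition alone is enough to trigger a warning, so the example day is flagged on total duration even though it has only two episodes.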

Participants (Demographics and Lifestyle)

  • 49 participants, ages 21-43, roughly balanced by gender (25 women)
  • Office workers (35%), undergraduates (30%), graduate students (35%)
  • Most were non-smokers, drank alcohol once a week or less, and maintained regular routines
  • Exercise and caffeine consumption frequency were evenly distributed

Lifestyle distribution

  • Clinical questionnaire scores: Changes in insomnia (ISI), depression (PHQ-9), and anxiety (GAD-7) scores between the start and end of the study were tracked

Clinical questionnaire distribution
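
When reading the questionnaire totals, it can help to map each score to a severity band. The sketch below uses the commonly published cutoffs for the three instruments. These cutoffs come from the instruments' standard scoring guides, not from the dataset, and the `severity` function and the `BANDS` and `MAX_SCORE` tables are hypothetical names.

```python
from bisect import bisect_right

# Lower bound of each band above the first, with matching labels.
# Assumption: standard published cutoffs, not values taken from the dataset.
BANDS = {
    "PHQ-9": ([5, 10, 15, 20],
              ["minimal", "mild", "moderate", "moderately severe", "severe"]),
    "GAD-7": ([5, 10, 15],
              ["minimal", "mild", "moderate", "severe"]),
    "ISI": ([8, 15, 22],
            ["none", "subthreshold", "moderate", "severe"]),
}
MAX_SCORE = {"PHQ-9": 27, "GAD-7": 21, "ISI": 28}


def severity(scale, score):
    """Map a total questionnaire score to its severity label."""
    if not 0 <= score <= MAX_SCORE[scale]:
        raise ValueError(f"{scale} score out of range: {score}")
    cutoffs, labels = BANDS[scale]
    # bisect_right counts how many cutoffs the score has reached.
    return labels[bisect_right(cutoffs, score)]


for scale, score in [("PHQ-9", 9), ("GAD-7", 10), ("ISI", 15)]:
    print(scale, score, severity(scale, score))
```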


3. Dataset Structure and Processing

The dataset systematically records and manages diverse types of information as follows:

  • Participant profiles and lifestyle habits (survey.csv)
  • Smartwatch sensor data (sensor_hrv.csv, sensor_hrv_filtered.csv)
    • Processed in 5-minute intervals; filtered versions by missing rate and noise scores included
  • Sleep diary data (sleep_diary.csv)
    • Self-reported data plus various quantified sleep metrics (sleep efficiency, etc.)
  • Raw sensor data (raw_data/ folder: PPG, heart rate, accelerometer, etc.)
  • Clinical questionnaire scores
  • README documentation with variable descriptions
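
As an illustration of the quantified sleep metrics, sleep efficiency can be derived from the four diary fields. This is a minimal sketch that assumes time in bed ends at the reported wake time (the diary items listed here include no separate get-up time) and that times are given as "HH:MM" strings. The function and field names are hypothetical and may not match the columns in sleep_diary.csv.

```python
from datetime import datetime, timedelta


def _clock(hhmm, anchor_day):
    """Place an 'HH:MM' string on the anchor day."""
    hours, minutes = map(int, hhmm.split(":"))
    return anchor_day.replace(hour=hours, minute=minutes)


def sleep_metrics(bedtime, sleep_onset, wake_time, waso_min):
    """Compute basic sleep metrics (in minutes) from one diary entry.

    Assumption: time in bed runs from bedtime to wake time.
    """
    day = datetime(2000, 1, 1)  # arbitrary anchor date; only clock times matter
    bed = _clock(bedtime, day)
    onset = _clock(sleep_onset, day)
    wake = _clock(wake_time, day)
    # Roll events past midnight forward by one day.
    if onset < bed:
        onset += timedelta(days=1)
    if wake < onset:
        wake += timedelta(days=1)

    time_in_bed = (wake - bed).total_seconds() / 60
    onset_latency = (onset - bed).total_seconds() / 60
    total_sleep = (wake - onset).total_seconds() / 60 - waso_min
    return {
        "time_in_bed_min": time_in_bed,
        "sleep_onset_latency_min": onset_latency,
        "total_sleep_time_min": total_sleep,
        "sleep_efficiency_pct": 100 * total_sleep / time_in_bed,
    }


metrics = sleep_metrics("23:30", "23:50", "07:00", waso_min=20)
print(metrics)
```

For this entry, time in bed is 450 minutes and total sleep time is 410 minutes, giving a sleep efficiency of about 91%.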

"A total of 33,600 hours of data were collected, averaging 672 hours per person."

During data preprocessing, values outside sensor reference ranges were removed, non-wearing intervals were filtered both automatically and manually, and sleep diary errors (AM/PM, etc.) were manually inspected and corrected by the researchers.
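
The range filtering and missing-rate screening described above can be sketched as follows for one 5-minute heart-rate window sampled at 10 Hz. The 30-220 bpm reference range and the `clean_window` function are assumptions for illustration. The study's actual reference ranges and thresholds are not given here.

```python
SAMPLE_RATE_HZ = 10          # from the collection design (0.1-second intervals)
WINDOW_SECONDS = 5 * 60      # 5-minute processing interval
EXPECTED_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 3000 samples per window


def clean_window(heart_rates, low=30.0, high=220.0):
    """Drop out-of-range samples and report the window's missing rate.

    low/high are assumed plausibility bounds in bpm, not the study's values.
    """
    valid = [hr for hr in heart_rates if hr is not None and low <= hr <= high]
    missing_rate = max(0.0, 1.0 - len(valid) / EXPECTED_SAMPLES)
    return valid, missing_rate


# Illustrative window: 2700 plausible samples, 200 zero readings,
# and 100 samples that never arrived.
window = [72.0] * 2700 + [0.0] * 200
valid, missing_rate = clean_window(window)
print(len(valid), round(missing_rate, 3))
```

A window whose missing rate exceeds some chosen threshold could then be excluded, which is one plausible way to arrive at a filtered file such as sensor_hrv_filtered.csv.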


4. Dataset Validation

A. Overall Data Trends and Reliability

  • The time distribution of data points shows the densest data collection during waking hours (09:00-23:00)

5-minute interval data frequency

  • Average physical activity (step count, heart rate) increased during lunch and dinner hours, confirming typical daily patterns

Physical activity and heart rate by time of day
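
A time-of-day profile like the ones described above can be built by grouping the 5-minute records by hour. The sketch below assumes each record is an (ISO timestamp, heart rate) pair. The record layout and the `hourly_profile` function are hypothetical, and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(records):
    """Return {hour: (record count, mean heart rate)} for the given records."""
    by_hour = defaultdict(list)
    for timestamp, heart_rate in records:
        by_hour[datetime.fromisoformat(timestamp).hour].append(heart_rate)
    return {hour: (len(values), mean(values))
            for hour, values in sorted(by_hour.items())}


# Illustrative records, not taken from the dataset.
records = [
    ("2024-01-08T03:00:00", 55.0),
    ("2024-01-08T12:00:00", 88.0),
    ("2024-01-08T12:05:00", 92.0),
]
profile = hourly_profile(records)
print(profile)
```

The count per hour corresponds to the data-density view, and the mean per hour corresponds to the activity and heart-rate view.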

  • Individual daily activity and heart rate data are also available for comparison
    • Example: the same user's active day (min-max HR 50-150 bpm) vs. sedentary day (50-100 bpm)

Individual daily pattern differences

  • Correlations between sensor-derived HRV features
    • Strong positive correlations among time-domain HRV metrics; negative correlations for LF/HF, etc.

Sensor-HRV correlations
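
The correlation pattern described above can be checked with a plain Pearson coefficient. The sketch below uses made-up values for three features (SDNN, RMSSD, LF/HF) purely to show the calculation. The numbers are not from the dataset, and the `pearson` helper is written out by hand rather than taken from the authors' code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Illustrative per-window feature values (hypothetical).
sdnn = [40.0, 55.0, 62.0, 48.0, 70.0]
rmssd = [30.0, 44.0, 52.0, 35.0, 60.0]
lf_hf = [2.1, 1.4, 1.1, 1.8, 0.9]

print(round(pearson(sdnn, rmssd), 3))  # positive: both are time-domain metrics
print(round(pearson(sdnn, lf_hf), 3))  # negative in this made-up example
```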

B. HRV Feature Distributions and Trends

  • HRV distributions by occupation and gender (e.g., SDNN, LF/HF)
    • Undergraduates showed higher SDNN (variability) than office workers -- consistent with existing research showing HRV decreases with age
    • Women generally showed lower HRV magnitudes -- consistent with prior large-scale meta-analyses

HRV distribution by occupation and gender

"SDNN ranges are consistent with existing research, and LF/HF averages fall within ranges similar to recent long-term measurements."

  • Although values may differ depending on measurement method (real-world setting, PPG vs. hospital ECG, etc.), they fall within standard ranges
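
For readers unfamiliar with the time-domain metrics, SDNN is the standard deviation of the inter-beat intervals and RMSSD is the root mean square of successive differences between them. The sketch below computes both from a short list of intervals in milliseconds. It uses the population standard deviation, which is an assumption (some tools use the sample form), and the interval values are made up.

```python
from math import sqrt
from statistics import pstdev


def sdnn(ibis_ms):
    """Standard deviation of inter-beat intervals (population form assumed)."""
    return pstdev(ibis_ms)


def rmssd(ibis_ms):
    """Root mean square of successive inter-beat interval differences."""
    diffs = [later - earlier for earlier, later in zip(ibis_ms, ibis_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Illustrative inter-beat intervals in milliseconds (hypothetical).
ibis = [800.0, 810.0, 790.0, 805.0, 795.0]
print(round(sdnn(ibis), 2), round(rmssd(ibis), 2))
```

In practice these would be computed per 5-minute window from intervals detected in the PPG signal, so the values depend on the peak-detection step as well as on the formulas.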

C. Sleep Diary Reliability

  • The self-reported bedtimes, wake times, and other key sleep metrics are similar to, or slightly earlier than, those reported in existing wearable and sleep-band studies

5. Applications and Limitations

Potential Uses of the Dataset

  • Mental health prediction modeling:
    • Long-term continuous sensor + questionnaire data can be used for predicting insomnia/depression/anxiety
    • AI prediction using multivariable combinations of HRV, activity, and sleep data is possible
  • Sleep research:
    • Clustering analysis to identify specific sleep pattern groups and evaluate lifestyle/HRV differences
  • Daily-life HRV variation research:
    • Supports in-depth analysis of HRV changes by circadian rhythm, occupation, gender, and age group
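
One way to set up the prediction task is to pair each questionnaire score with a summary of the sensor data from the preceding two weeks. The sketch below is a minimal illustration, assuming questionnaires fall on study days 0, 14, and 28 and that a single daily feature (here a made-up daily SDNN value) is available per day. The day indexing, the `build_examples` function, and all values are hypothetical.

```python
from statistics import mean


def build_examples(daily_feature, questionnaire, window_days=14):
    """Pair each questionnaire score with the mean feature of the prior window.

    daily_feature: {study day: value}; questionnaire: {study day: score}.
    Questionnaires with no preceding data (e.g. day 0) are skipped.
    """
    examples = []
    for q_day, score in sorted(questionnaire.items()):
        window = [daily_feature[day]
                  for day in range(q_day - window_days, q_day)
                  if day in daily_feature]
        if window:
            examples.append((mean(window), score))
    return examples


# Illustrative inputs, not taken from the dataset.
daily_sdnn = {day: 50.0 + day for day in range(28)}
phq9 = {0: 6, 14: 4, 28: 3}
examples = build_examples(daily_sdnn, phq9)
print(examples)
```

With three questionnaire administrations per participant, this yields at most two labeled windows each, so any model built this way would need to account for the small number of labels.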

Limitations of the Dataset

  • Sensor signal noise: Possible noise from varying wearing habits and non-wearing intervals
  • Some missing data: Gaps due to non-wearing/charging periods exist; additional interpolation and cleaning needed
  • Recall-based sleep diaries: Self-reported entries may contain subjective errors such as inaccurate recall
  • Limited population: Primarily young university/research institute participants, centered on certain occupations/age groups in Korea
  • HRV calculation bias: Values were computed with HeartPy and may differ under other algorithms or parameter settings

"HRV values may be missing in some 5-minute intervals depending on PPG signal quality. Using other algorithms (e.g., pyPPG, NeuroKit2) may produce different results."
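
Because some 5-minute intervals have no HRV value, downstream analyses often need a policy for gaps. The sketch below linearly interpolates only short interior gaps and leaves longer or edge gaps as missing. The two-window limit and the `interpolate_short_gaps` function are assumptions for illustration, not the authors' cleaning procedure.

```python
def interpolate_short_gaps(values, max_gap=2):
    """Fill interior runs of None up to max_gap long by linear interpolation.

    values: per-window HRV values in time order, with None for missing windows.
    Longer gaps and gaps touching either end of the series are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing windows.
        j = i
        while j < n and filled[j] is None:
            j += 1
        gap = j - i
        if i > 0 and j < n and gap <= max_gap:
            left, right = filled[i - 1], filled[j]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[i + k] = left + step * (k + 1)
        i = j
    return filled


# Illustrative series (hypothetical values).
series = [50.0, None, 56.0, None, None, None, 60.0, None]
print(interpolate_short_gaps(series))
```

Here the single missing window between 50.0 and 56.0 is filled, while the three-window gap and the trailing gap stay missing.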


6. Code and Dataset Access Information


Closing Thoughts

This publicly released dataset combines long-term, real-life observations of heart rate variability (relevant to both psychological and physical health assessment) with sleep habits, and it is freely available for anyone to use. It is expected to support wearable-based personalized health management, AI-based mental health analysis, and research on the relationship between sleep and physical activity, spanning broad healthcare and biosignal AI research domains. Researchers are encouraged to use it as an empirical resource and to build further studies on it.
