Comparison of Deep Learning Models for Non-Invasive Interstitial Glucose Prediction Using Wearable Sensors in Healthy Cohorts

This paper is a pilot study comparing deep learning models for predicting blood glucose (specifically interstitial glucose, IG) using only biosignals collected from wearable devices in healthy individuals. The researchers found that the BiLSTM model achieved the best prediction performance (RMSE 13.42 mg/dL) using purely sensor data, without requiring cumbersome additional data like food logs or exercise information. This opens possibilities for convenient non-invasive blood glucose monitoring technology in everyday life.


1. Introduction: Why Measure Blood Glucose Without Drawing Blood?

Measuring blood glucose is crucial for diabetes management and general health care. However, current methods are painful or inconvenient. Finger-prick methods (FSGM) and continuous glucose monitors (CGMs), which place a needle sensor under the skin, are both invasive and require periodic device replacement.

Researchers have therefore turned to predicting blood glucose with wearable devices such as smartwatches, without any invasive procedure. Raw sensor data alone does not directly reveal blood glucose levels, so machine learning (ML) and deep learning (DL) techniques, which learn patterns from data, are needed to map biosignals to glucose values.

Previous studies required significant additional information like meal logs and sleep data, which is quite burdensome for users to record every time. In contrast, this study explored the potential for healthy individuals to manage blood glucose more conveniently in daily life using a Feature Learning (FL) strategy that autonomously learns features from sensor data alone.

Specifically, it compares feature learning (FL) strategies that could offer healthy individuals non-invasive glucose monitoring in place of invasive methods.

2. Methods: From Data Collection to Model Design

2.1 Data Collection Process

The researchers recruited healthy participants, excluding pregnant individuals or those with chronic conditions. After initially collecting data from 34 participants, they selected 5 (3 women, 2 men) for the secondary study. These 5 collected data over 10 days while living their normal daily lives, with no special dietary restrictions — just basic rules about exercise timing and meal intervals.

2.2 Sensors and Measured Metrics

Two key devices were used for data collection:

  • Abbott Freestyle Libre: Attached to the upper arm to measure actual interstitial glucose (IG) levels. (Serves as the answer key)
  • Empatica E4: A research-grade wearable worn on the wrist that collects various biosignals. (Serves as the question sheet)

Figure 1

The four key signals collected by the Empatica E4:

  1. BVP (Blood Volume Pulse): Measures blood volume changes in vessels using an optical sensor.
  2. HR (Heart Rate): Heart rate and heart rate variability (HRV) estimated from BVP data; both are closely related to blood glucose changes.
  3. TEMP (Skin Temperature): Measures skin temperature that varies with metabolic activity or ambient environment.
  4. EDA (Electrodermal Activity): Measures skin electrical conductance that changes with stress or emotional responses.

2.3 Deep Learning Model Lineup

The researchers designed and compared four different deep learning models for blood glucose prediction:

  1. CNN (Convolutional Neural Network): Primarily used for image processing, here applied to extract features from time-series data.
  2. LTF (Lightweight Transformer): Uses 'attention mechanisms' to identify relationships between data points, optimized for wearable environments with reduced computational load.
  3. LSTM-A (LSTM with Attention): Adds attention functionality to LSTM, which excels at time-series data processing.
  4. BiLSTM (Bidirectional LSTM): Scans data both forward and backward in time to better capture context.
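To make the "forward and backward" idea concrete, here is a minimal NumPy sketch of a bidirectional LSTM forward pass. It is not the authors' architecture: the hidden size, the random weights, the choice to concatenate the final state of each direction, and the helper names (lstm_pass, init_params, bilstm_predict) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_features, hidden, rng):
    """Random (untrained) weights for one LSTM direction; illustrative only."""
    W = rng.normal(0.0, 0.1, (4 * hidden, n_features))  # input weights
    U = rng.normal(0.0, 0.1, (4 * hidden, hidden))      # recurrent weights
    b = np.zeros(4 * hidden)
    return W, U, b


def lstm_pass(x, params, reverse=False):
    """Run one LSTM over x of shape (T, n_features).

    Returns hidden states of shape (T, hidden), stored at the time index
    of the sample that produced them, regardless of scan direction.
    """
    W, U, b = params
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    out = np.zeros((len(x), hidden))
    steps = range(len(x) - 1, -1, -1) if reverse else range(len(x))
    for t in steps:
        gates = W @ x[t] + U @ h + b
        i, f, g, o = np.split(gates, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[t] = h
    return out


def bilstm_predict(x, fwd, bwd, w_out, b_out):
    """Scan the window in both directions and map the result to one value."""
    h_f = lstm_pass(x, fwd)                # past -> future
    h_b = lstm_pass(x, bwd, reverse=True)  # future -> past
    # The forward scan ends at the last sample, the backward scan at the first.
    summary = np.concatenate([h_f[-1], h_b[0]])
    return float(w_out @ summary + b_out)
```

With four input channels (for example EDA, TEMP, BVP, HR) and a hidden size of 8, `bilstm_predict` takes a window of shape (T, 4) and an output weight vector of length 16. The output is meaningless until the weights are trained; the sketch only shows how the two scan directions are combined.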

3. Experiments and Results Analysis

The researchers aligned the collected sensor data to 4 Hz, segmented it into 5-minute windows, and trained the models on those windows. Performance was evaluated with RMSE (Root Mean Square Error) and CEGA (Clarke Error Grid Analysis). CEGA assesses how clinically safe predictions are: the more predictions fall in Zone A, the more accurate and safe the model.
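The alignment and windowing step can be sketched as follows. This is a rough illustration, not the authors' pipeline: the resampling method (block averaging to downsample, sample repetition to upsample), the non-overlapping windows, and the function names are assumptions, and the code assumes each source rate is an integer multiple or divisor of 4 Hz.

```python
import numpy as np

TARGET_HZ = 4
WINDOW_SECONDS = 5 * 60
WINDOW_SAMPLES = TARGET_HZ * WINDOW_SECONDS  # 1200 samples per 5-minute window


def to_target_rate(signal, source_hz):
    """Resample a 1-D signal to TARGET_HZ (assumed integer rate ratio)."""
    signal = np.asarray(signal, dtype=float)
    if source_hz == TARGET_HZ:
        return signal
    if source_hz > TARGET_HZ:
        # Downsample by averaging consecutive blocks of samples.
        factor = source_hz // TARGET_HZ
        usable = len(signal) - len(signal) % factor
        return signal[:usable].reshape(-1, factor).mean(axis=1)
    # Upsample by repeating each sample.
    factor = TARGET_HZ // source_hz
    return np.repeat(signal, factor)


def make_windows(channels):
    """Stack aligned channels and cut them into non-overlapping windows.

    channels: list of 1-D arrays already at TARGET_HZ.
    Returns an array of shape (n_windows, WINDOW_SAMPLES, n_channels).
    """
    n = min(len(c) for c in channels)
    stacked = np.stack([c[:n] for c in channels], axis=1)
    n_windows = n // WINDOW_SAMPLES
    trimmed = stacked[: n_windows * WINDOW_SAMPLES]
    return trimmed.reshape(n_windows, WINDOW_SAMPLES, stacked.shape[1])
```

For instance, ten minutes of recordings from four channels would yield an array of shape (2, 1200, 4). The source rate passed for each channel is up to the caller; the summary above only states the 4 Hz target.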

3.1 Model Report Card

Results showed clear performance differences between models:

  • CNN: Showed the poorest performance, with the highest RMSE of 18.54 mg/dL and 13.23% of predictions in the potentially dangerous Zone D.
  • LTF & LSTM-A: Achieved middle-range performance. Better than CNN thanks to attention mechanisms, but not the best.
  • BiLSTM (Winner!): Demonstrated the best prediction accuracy with an RMSE of 13.42 mg/dL. In the CEGA analysis, it had the highest combined share of predictions in the safe Zones A and B, with only 3.01% in the dangerous Zone D.
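For readers who want to see how such numbers are produced, the sketch below computes RMSE and the share of predictions in Clarke Zone A. It is a simplified illustration rather than the authors' evaluation code: only Zone A is implemented (a prediction within 20% of the reference, or both values below 70 mg/dL), the full grid with Zones B to E is omitted, and the function names are assumptions.

```python
import numpy as np


def rmse(reference, predicted):
    """Root mean square error, in the same unit as the inputs (mg/dL)."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


def zone_a_fraction(reference, predicted):
    """Fraction of predictions that fall in Clarke Error Grid Zone A."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    within_20_percent = np.abs(predicted - reference) <= 0.2 * reference
    both_low = (reference < 70) & (predicted < 70)
    return float(np.mean(within_20_percent | both_low))
```

With made-up reference values [100, 100, 60, 200] and predictions [110, 130, 50, 150] (toy numbers, not data from the study), the errors are 10, 30, -10, and -50 mg/dL, so the RMSE is 30.0 mg/dL, and two of the four predictions land in Zone A.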

Figure 4

3.2 Which Sensor Combination Is Best? (Ablation Test)

The researchers ran tests that removed or combined sensors one at a time. The results were interesting: any single sensor on its own yielded poor prediction performance. But when all sensors (EDA, TEMP, BVP, HR) were used together, the signals complemented one another, letting the model exploit the correlations among them, and accuracy was highest.
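An ablation of this kind can be organized as a loop over sensor subsets. The sketch below is an assumption about how such a loop might look, not the study's procedure: `score_fn` is a hypothetical callback that would train and evaluate a model on the given sensors and return its RMSE, and the exhaustive enumeration of all 15 non-empty subsets may be broader than what the authors actually tested.

```python
from itertools import combinations

SENSORS = ("EDA", "TEMP", "BVP", "HR")


def sensor_subsets(sensors=SENSORS):
    """All non-empty combinations of sensors, smallest first."""
    subsets = []
    for size in range(1, len(sensors) + 1):
        subsets.extend(combinations(sensors, size))
    return subsets


def run_ablation(score_fn, sensors=SENSORS):
    """Score every subset and return the one with the lowest RMSE.

    score_fn(subset) -> float is a hypothetical callback supplied by the
    caller; it stands in for training and evaluating a model.
    """
    results = {subset: score_fn(subset) for subset in sensor_subsets(sensors)}
    best = min(results, key=results.get)
    return best, results
```

Keeping the scoring function separate means the same loop works for any of the four models; only the callback changes.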

4. Discussion: What This Study Means

What is particularly impressive about this study is that it predicted blood glucose "without food logs." Previous studies required users to input when they ate and exercised to achieve good results. However, this study's BiLSTM model read blood glucose trends using only biosignals obtained from the wrist.

As the authors note, their proposed IG model does not require food logs or exercise information and uses only wearable sensor data.

4.1 Why Did BiLSTM Win?

Because BiLSTM analyzes time-series data in both directions, it caught blood glucose 'spike' zones, where levels rise or fall sharply, much better than the other models. CNN matched average values but couldn't track extreme changes, and the Transformer (LTF) struggled to demonstrate its full potential in a data-scarce environment.

Figure 5

4.2 Limitations and Future Work

Of course, nothing is perfect. The dataset size was small, and hardware limitations of the training equipment required a short 5-minute prediction window. Additionally, since only healthy participants were studied, further validation is needed for predicting extreme blood glucose fluctuations in actual diabetes patients.


5. Conclusion

This study demonstrated that sensor data from a wrist-worn wearable, much like the smartwatches we wear daily, can predict blood glucose changes in healthy individuals with reasonable accuracy. The combination of the BiLSTM model and multi-sensor data proved most effective.

As this technology advances further, whether in 2026 or the near future, we may reach a day when, instead of painful needle pricks or tedious food diaries, the watch on our wrist tells us: "Your blood glucose is a bit high right now. How about a walk?" The future of non-invasive blood glucose management feels much closer.
