A Comparative Study of Deep Learning Architectures for Multi-Horizon Behavioural Forecasting for Mobile Health
English summary
This paper benchmarks six deep learning architectures, two zero-shot foundation models (TimesFM), and statistical baselines for forecasting step counts, screen time, and sleep duration from wearable data. Using three public datasets with over 800 participants, the study evaluates performance across 1–8 day horizons. Among trained models, PatchTST performs best, while TCN, MLP, and Transformer do not differ significantly from one another. Used zero-shot, the foundation model TimesFM performs on par with or better than the trained models, especially in low-data settings. Participant-level fine-tuning reduces RMSE by 16–60%, with sleep benefiting most. This is the first study to jointly compare deep learning, foundation models, and personalization for multi-horizon mobile health forecasting.
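To make the multi-horizon setup concrete, the following is a minimal sketch of how a daily behavioural series could be sliced into input windows and 1–8 day targets. The function name `make_windows` and the 14-day lookback are illustrative assumptions, not details taken from the paper; only the 8-day maximum horizon comes from the summary.

```python
import numpy as np


def make_windows(series, lookback=14, max_horizon=8):
    """Slice one daily series into (history, future) training pairs.

    lookback=14 is an assumed value chosen for illustration only;
    max_horizon=8 mirrors the 1-8 day horizons named in the summary.
    """
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - lookback - max_horizon + 1
    if n_windows <= 0:
        # Series too short to form even one complete window.
        return np.empty((0, lookback)), np.empty((0, max_horizon))
    # Each input row is `lookback` consecutive days.
    X = np.stack([series[i:i + lookback] for i in range(n_windows)])
    # Each target row is the `max_horizon` days that follow the input row.
    Y = np.stack([
        series[i + lookback:i + lookback + max_horizon]
        for i in range(n_windows)
    ])
    return X, Y
```

Each row of `Y` holds the eight days that follow the corresponding row of `X`, so column `h - 1` is the target for horizon `h`.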
Key points
Among trained models, PatchTST performs best, while TCN, MLP, and Transformer show no significant performance differences among themselves.
The foundation model TimesFM matches or exceeds trained models in zero-shot forecasting, especially in low-data regimes.
Participant-level fine-tuning reduces per-feature RMSE by 16–60%, with sleep forecasting improving the most and step counts the least.
The study uses three public datasets covering over 800 participants and forecasts step counts, screen time, and sleep duration across 1–8 day horizons.
This is the first work to jointly evaluate deep learning, foundation models, and personalization for multi-horizon behavioral forecasting from wearables.
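The fine-tuning gain in the key points is a relative RMSE reduction. As a hedged illustration, it could be computed as below. `rmse` and `relative_reduction` are hypothetical helper names, and the paper's exact evaluation protocol (for example, how errors are aggregated across participants and horizons) is not specified in this summary.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error over all supplied values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_reduction(rmse_before, rmse_after):
    """Percentage drop in RMSE going from a baseline to a fine-tuned model.

    Assumption: the reduction is measured relative to the baseline RMSE.
    """
    if rmse_before == 0:
        raise ValueError("baseline RMSE is zero; relative reduction is undefined")
    return 100.0 * (rmse_before - rmse_after) / rmse_before
```

For instance, `relative_reduction(10.0, 8.4)` evaluates to roughly 16. This is pure arithmetic on made-up numbers and not a result from the paper.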