Pre-training and self-supervised learning for speech-based mental health assessment

Dumpala, Sri Harsha

dc.contributor.author	Dumpala, Sri Harsha
dc.contributor.copyright-release	No
dc.contributor.degree	Doctor of Philosophy
dc.contributor.department	Faculty of Computer Science
dc.contributor.ethics-approval	Not Applicable
dc.contributor.external-examiner	Dr. Theodora Chaspari
dc.contributor.manuscripts	No
dc.contributor.thesis-reader	Dr. Rudolf Uher
dc.contributor.thesis-reader	Dr. Frank Rudzicz
dc.contributor.thesis-supervisor	Dr. Sageev Oore
dc.date.accessioned	2025-05-09T14:00:33Z
dc.date.available	2025-05-09T14:00:33Z
dc.date.defence	2025-05-05
dc.date.issued	2025-05-08
dc.description	In this thesis, speech-based self-supervised learning models are employed to detect depression, predict depressive symptoms, and enhance the robustness of depression assessment systems through test-time training.
dc.description.abstract	Major depressive disorder (MDD), commonly known as depression, is a leading cause of disability, absenteeism, and premature death. Automatic depression assessment from speech is a vital step towards improving the diagnosis and treatment of this condition. While previous research has explored conventional acoustic features for speech-based depression assessment, these methods have not yet achieved clinical-level performance, highlighting the need for further advancements. A significant challenge is the non-availability of large training datasets required to train deep learning models from scratch for automated depression assessment. To address these issues, this thesis proposes the use of self-supervised learning (SSL) models based on speech to enhance the performance of automatic depression assessment systems. The pre-training objective function of SSL models determines the types of information encoded, such as semantic, speaker, and prosodic features. I first demonstrate that combining SSL models, which capture different aspects of speech—both local and global information—leads to improved performance in detecting depression. Additionally, I show that SSL-based speech embeddings are more effective at identifying specific symptoms of depression than traditional speech features. Furthermore, I compare various SSL pre-trained models to identify which aspects of speech contribute most to the detection of different symptoms. Finally, I extend test-time training (TTT) for depression detection to improve model robustness under naturally occurring covariate (distributional) shifts. This work underscores the potential of SSL techniques in developing more accurate and resilient models for depression assessment, thereby fostering further research into automated mental health evaluation.
dc.identifier.uri	https://hdl.handle.net/10222/85124
dc.language.iso	en
dc.subject	Mental health assessment
dc.subject	Self-supervised learning
dc.subject	Speech processing
dc.subject	Pre-training
dc.subject	Depression
dc.subject	Test-time training
dc.subject	Depressive symptoms
dc.title	Pre-training and self-supervised learning for speech-based mental health assessment

Files

Original bundle

Name: SriHarshaDumpala2025.pdf
Size: 14.19 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.12 KB
Format: Item-specific license agreed upon to submission

Collections

Faculty of Graduate Studies Online Theses