Pre-training and self-supervised learning for speech-based mental health assessment
dc.contributor.author | Dumpala, Sri Harsha | |
dc.contributor.copyright-release | No | |
dc.contributor.degree | Doctor of Philosophy | |
dc.contributor.department | Faculty of Computer Science | |
dc.contributor.ethics-approval | Not Applicable | |
dc.contributor.external-examiner | Dr. Theodora Chaspari | |
dc.contributor.manuscripts | No | |
dc.contributor.thesis-reader | Dr. Rudolf Uher | |
dc.contributor.thesis-reader | Dr. Frank Rudzicz | |
dc.contributor.thesis-supervisor | Dr. Sageev Oore | |
dc.date.accessioned | 2025-05-09T14:00:33Z | |
dc.date.available | 2025-05-09T14:00:33Z | |
dc.date.defence | 2025-05-05 | |
dc.date.issued | 2025-05-08 | |
dc.description | In this thesis, speech-based self-supervised learning models are employed to detect depression, predict depressive symptoms, and enhance the robustness of depression assessment systems through test-time training. | |
dc.description.abstract | Major depressive disorder (MDD), commonly known as depression, is a leading cause of disability, absenteeism, and premature death. Automatic depression assessment from speech is a vital step towards improving the diagnosis and treatment of this condition. While previous research has explored conventional acoustic features for speech-based depression assessment, these methods have not yet achieved clinical-level performance, highlighting the need for further advancements. A significant challenge is the lack of the large training datasets required to train deep learning models from scratch for automated depression assessment. To address these issues, this thesis proposes the use of self-supervised learning (SSL) models based on speech to enhance the performance of automatic depression assessment systems. The pre-training objective function of an SSL model determines the types of information it encodes, such as semantic, speaker, and prosodic features. I first demonstrate that combining SSL models that capture different aspects of speech (both local and global information) leads to improved performance in detecting depression. Additionally, I show that SSL-based speech embeddings are more effective at identifying specific symptoms of depression than traditional speech features. Furthermore, I compare various SSL pre-trained models to identify which aspects of speech contribute most to the detection of different symptoms. Finally, I extend test-time training (TTT) for depression detection to improve model robustness under naturally occurring covariate (distributional) shifts. This work underscores the potential of SSL techniques in developing more accurate and resilient models for depression assessment, thereby fostering further research into automated mental health evaluation. | |
dc.identifier.uri | https://hdl.handle.net/10222/85124 | |
dc.language.iso | en | |
dc.subject | Mental health assessment | |
dc.subject | Self-supervised learning | |
dc.subject | Speech processing | |
dc.subject | Pre-training | |
dc.subject | Depression | |
dc.subject | Test-time training | |
dc.subject | Depressive symptoms | |
dc.title | Pre-training and self-supervised learning for speech-based mental health assessment |