Pre-training and self-supervised learning for speech-based mental health assessment

dc.contributor.author: Dumpala, Sri Harsha
dc.contributor.copyright-release: No
dc.contributor.degree: Doctor of Philosophy
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: Dr. Theodora Chaspari
dc.contributor.manuscripts: No
dc.contributor.thesis-reader: Dr. Rudolf Uher
dc.contributor.thesis-reader: Dr. Frank Rudzicz
dc.contributor.thesis-supervisor: Dr. Sageev Oore
dc.date.accessioned: 2025-05-09T14:00:33Z
dc.date.available: 2025-05-09T14:00:33Z
dc.date.defence: 2025-05-05
dc.date.issued: 2025-05-08
dc.description: In this thesis, speech-based self-supervised learning models are employed to detect depression, predict depressive symptoms, and enhance the robustness of depression assessment systems through test-time training.
dc.description.abstract: Major depressive disorder (MDD), commonly known as depression, is a leading cause of disability, absenteeism, and premature death. Automatic depression assessment from speech is a vital step towards improving the diagnosis and treatment of this condition. While previous research has explored conventional acoustic features for speech-based depression assessment, these methods have not yet achieved clinical-level performance, highlighting the need for further advancements. A significant challenge is the lack of the large training datasets required to train deep learning models from scratch for automated depression assessment. To address these issues, this thesis proposes the use of speech-based self-supervised learning (SSL) models to enhance the performance of automatic depression assessment systems. The pre-training objective function of an SSL model determines the types of information it encodes, such as semantic, speaker, and prosodic features. I first demonstrate that combining SSL models that capture different aspects of speech (both local and global information) improves performance in detecting depression. Additionally, I show that SSL-based speech embeddings are more effective at identifying specific symptoms of depression than traditional speech features. Furthermore, I compare various SSL pre-trained models to identify which aspects of speech contribute most to the detection of different symptoms. Finally, I extend test-time training (TTT) for depression detection to improve model robustness under naturally occurring covariate (distributional) shifts. This work underscores the potential of SSL techniques in developing more accurate and resilient models for depression assessment, thereby fostering further research into automated mental health evaluation.
dc.identifier.uri: https://hdl.handle.net/10222/85124
dc.language.iso: en
dc.subject: Mental health assessment
dc.subject: Self-supervised learning
dc.subject: Speech processing
dc.subject: Pre-training
dc.subject: Depression
dc.subject: Test-time training
dc.subject: Depressive symptoms
dc.title: Pre-training and self-supervised learning for speech-based mental health assessment

Files

Original bundle

Name: SriHarshaDumpala2025.pdf
Size: 14.19 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.12 KB
Format: Item-specific license agreed upon to submission