Data Exhaust in Voice Assistants: Analysis and Mitigation Approaches
| dc.contributor.author | Mousavi Diva, Mahsa | |
| dc.contributor.copyright-release | Not Applicable | |
| dc.contributor.degree | Master of Computer Science | |
| dc.contributor.department | Faculty of Computer Science | |
| dc.contributor.ethics-approval | Not Applicable | |
| dc.contributor.external-examiner | none | |
| dc.contributor.manuscripts | Not Applicable | |
| dc.contributor.thesis-reader | Dr. Nur Zincir-Heywood | |
| dc.contributor.thesis-reader | Dr. Darshana Upadhyay | |
| dc.contributor.thesis-supervisor | Dr. Srini Sampalli | |
| dc.date.accessioned | 2025-08-12T17:21:14Z | |
| dc.date.available | 2025-08-12T17:21:14Z | |
| dc.date.defence | 2025-08-07 | |
| dc.date.issued | 2025-08-11 | |
| dc.description.abstract | Voice assistants (VAs) such as Siri, Google Assistant, Cortana, and Alexa are increasingly integrated into smartphones, smart home devices, and Internet of Things (IoT) platforms. While offering convenience, these technologies generate significant data exhaust, consisting of background data captured during both active use and passive listening. This passive data generation often occurs without users’ awareness, raising critical privacy, data governance, and security concerns. Despite the ubiquity of voice assistants, a systematic understanding of how, when, and to what extent they transmit data in real-world settings remains limited. The objective of this thesis is to examine voice assistant privacy policies and network traffic, develop a mobile application to notify users of security risks, and propose mitigation methods. First, we conducted a systematic survey of the privacy policies of four major VAs, focusing on data collection, retention, third-party sharing, and transparency, and explored mitigation methods to limit unnecessary data collection. Based on these findings, Google Assistant was selected for detailed analysis due to its deep integration with Google services and extensive data collection. We subsequently developed an Android application to analyze PCAP files and classify network traffic generated by voice assistants, particularly in background or passive modes. The application identifies active background services, extracts Domain Name System (DNS) queries, and detects unexpected third-party communications. A built-in risk assessment system categorizes background activity into high, medium, or low risk, providing users with clear, contextual explanations. We further performed technical traffic analysis using tools such as Wireshark, evaluating encryption patterns and traffic bursts to better understand behavioral signatures. Our findings confirm that voice assistants can transmit user-related data even without explicit interaction, often to external analytics and ad services. This thesis presents a hybrid framework to uncover hidden data behaviors in voice assistants and proposes mitigation strategies to reduce passive data leakage, enabling more privacy-aware and transparent smart environments. | |
| dc.identifier.uri | https://hdl.handle.net/10222/85298 | |
| dc.language.iso | en_US | |
| dc.subject | voice assistants | |
| dc.title | Data Exhaust in Voice Assistants: Analysis and Mitigation Approaches | |
