UTILIZING MACHINE LEARNING TO DETECT TOR TRAFFIC: A REALISTIC DATASET AND A PRELIMINARY ANALYSIS
| dc.contributor.author | Sadik, Md Rafiqul Islam | |
| dc.contributor.copyright-release | Not Applicable | |
| dc.contributor.degree | Master of Computer Science | |
| dc.contributor.department | Faculty of Computer Science | |
| dc.contributor.ethics-approval | Not Applicable | |
| dc.contributor.external-examiner | Dr. Xichen Zhang | |
| dc.contributor.manuscripts | Not Applicable | |
| dc.contributor.thesis-reader | Dr. Samer Lahoud | |
| dc.contributor.thesis-supervisor | Dr. Qiang Ye | |
| dc.date.accessioned | 2025-12-16T19:35:47Z | |
| dc.date.available | 2025-12-16T19:35:47Z | |
| dc.date.defence | 2025-12-02 | |
| dc.date.issued | 2025-12-15 | |
| dc.description.abstract | With the increasing use of anonymization technologies such as the Tor network, the ability to accurately differentiate Tor traffic from conventional Internet traffic has become an important challenge for network analysis and security monitoring. This thesis presents a fully controlled and reproducible framework for generating realistic Tor and Non-Tor traffic datasets to support the evaluation of encrypted traffic detection techniques. The framework integrates a Debian workstation, a Whonix gateway for Tor routing, and a noise-free AWS-based web server, combined with Selenium-driven automation to execute identical user activities over both Tor-based and Non-Tor-based network paths. Using this environment, a comprehensive dataset was generated across six application categories: web browsing, video streaming, file transfer, instant messaging, voice over IP, and video conferencing. Six machine-learning models were evaluated on the generated dataset: Decision Tree, Random Forest, XGBoost, Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). Experimental results demonstrate that traditional tree-based models, particularly Random Forest and XGBoost, consistently outperform deep-learning approaches, achieving high detection accuracy in distinguishing Tor from Non-Tor network flows across all traffic types. These findings highlight both the effectiveness of classical machine-learning approaches and the importance of realistic dataset generation in advancing encrypted traffic classification research. | |
| dc.identifier.uri | https://hdl.handle.net/10222/85575 | |
| dc.language.iso | en | |
| dc.subject | Tor Traffic Detection | |
| dc.subject | Machine Learning | |
| dc.subject | Dataset Privacy | |
| dc.subject | Selenium | |
| dc.subject | Whonix Gateway | |
| dc.subject | Scapy | |
| dc.title | UTILIZING MACHINE LEARNING TO DETECT TOR TRAFFIC: A REALISTIC DATASET AND A PRELIMINARY ANALYSIS |
