Comprehending Software Bugs Leveraging Code Structures with Neural Language Models

Mahbub, Parvez

dc.contributor.author	Mahbub, Parvez
dc.date.accessioned	2023-08-30T14:32:10Z
dc.date.available	2023-08-30T14:32:10Z
dc.date.issued	2023-08-28
dc.identifier.uri	http://hdl.handle.net/10222/82871
dc.description.abstract	Software bugs claim ~50% of development time and cost the global economy billions of dollars every year. Despite the use of many software quality assurance (SQA) practices in software development (e.g., code review, continuous integration), defects may still exist in the official release of a software product. If software defects can be predicted at the line level, developers can prioritize SQA efforts for the vulnerable areas of a codebase and thus achieve a high-quality software release. However, a defect prediction technique could be less helpful without a meaningful explanation of the defect. In this thesis, we propose and evaluate two novel techniques that support developers in identifying software defects at the line level and provide natural language explanations for those defects. In our first study, we propose Bugsplorer, a novel deep-learning technique for line-level defect prediction. It leverages a hierarchical structure of transformer models to represent two types of code elements: code tokens and code lines. Our evaluation with five performance metrics shows that Bugsplorer can predict defective lines with 26-72% better accuracy than the state-of-the-art technique. It can also rank the first 20% of defective lines within the top 1-3% of vulnerable lines. In our second study, we propose Bugsplainer, a transformer-based generative model that generates natural language explanations for software bugs by leveraging structural information and buggy patterns from the source code. Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard and can outperform multiple baselines from the literature. We also conducted a developer study involving 20 participants, in which the explanations from Bugsplainer were found to be more accurate, more precise, more concise and more useful than those from the baselines.
Given the empirical evidence, our techniques have the potential to significantly reduce SQA costs.	en_US
dc.language.iso	en	en_US
dc.subject	software bug	en_US
dc.subject	bug explanation	en_US
dc.subject	software engineering	en_US
dc.subject	software maintenance	en_US
dc.subject	natural language processing	en_US
dc.subject	deep learning	en_US
dc.subject	transformer	en_US
dc.subject	defect prediction	en_US
dc.subject	software quality assurance	en_US
dc.title	Comprehending Software Bugs Leveraging Code Structures with Neural Language Models	en_US
dc.date.defence	2023-08-21
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.degree	Master of Computer Science	en_US
dc.contributor.external-examiner	N/A	en_US
dc.contributor.graduate-coordinator	Michael McAllister	en_US
dc.contributor.thesis-reader	Dr. Srini Sampalli	en_US
dc.contributor.thesis-reader	Dr. Tushar Sharma	en_US
dc.contributor.thesis-supervisor	Masud Rahman	en_US
dc.contributor.ethics-approval	Received	en_US
dc.contributor.manuscripts	Yes	en_US
dc.contributor.copyright-release	Yes	en_US

Files in this item

Name: ParvezMahbub2023.pdf
Size: 17.92Mb
Format: PDF

This item appears in the following Collection(s)

Faculty of Graduate Studies Online Theses

Related items

DEPLOYMENT OF A 27.5 KHZ LINK USING NON-COHERENT SPACE TIME BLOCK CODED FREQUENCY SHIFT KEYING

Implementation of ChekOne Task Verification Software

The Myth of Free: The Hidden Costs of Open Source Software
