Comprehending Software Bugs Leveraging Code Structures with Neural Language Models

dc.contributor.author: Mahbub, Parvez
dc.contributor.copyright-release: Yes
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Received
dc.contributor.external-examiner: N/A
dc.contributor.graduate-coordinator: Michael McAllister
dc.contributor.manuscripts: Yes
dc.contributor.thesis-reader: Dr. Srini Sampalli
dc.contributor.thesis-reader: Dr. Tushar Sharma
dc.contributor.thesis-supervisor: Masud Rahman
dc.date.accessioned: 2023-08-30T14:32:10Z
dc.date.available: 2023-08-30T14:32:10Z
dc.date.defence: 2023-08-21
dc.date.issued: 2023-08-28
dc.description.abstract: Software bugs claim about 50% of development time and cost the global economy billions of dollars every year. Unfortunately, despite the use of many software quality assurance (SQA) practices in software development (e.g., code review, continuous integration), defects may still exist in the official release of a software product. If software defects can be predicted at the line level, developers can prioritize SQA efforts toward the vulnerable areas of a codebase and thus achieve a high-quality software release. However, a defect prediction technique could be less helpful without any meaningful explanation of the defect. In this thesis, we propose and evaluate two novel techniques that support developers in identifying software defects at the line level and provide natural language explanations for those defects. In our first study, we propose Bugsplorer, a novel deep-learning technique for line-level defect prediction. It leverages a hierarchical structure of transformer models to represent two types of code elements: code tokens and code lines. Our evaluation with five performance metrics shows that Bugsplorer predicts defective lines with 26-72% higher accuracy than the state-of-the-art technique. It can also rank the first 20% of defective lines within the top 1-3% of vulnerable lines. In our second study, we propose Bugsplainer, a transformer-based generative model that generates natural language explanations for software bugs by leveraging structural information and buggy patterns from the source code. Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard and can outperform multiple baselines from the literature. We also conducted a developer study involving 20 participants, where the explanations from Bugsplainer were found to be more accurate, more precise, more concise, and more useful than those of the baselines. Given the empirical evidence, our techniques have the potential to significantly reduce SQA costs.
dc.identifier.uri: http://hdl.handle.net/10222/82871
dc.language.iso: en
dc.subject: software bug
dc.subject: bug explanation
dc.subject: software engineering
dc.subject: software maintenance
dc.subject: natural language processing
dc.subject: deep learning
dc.subject: transformer
dc.subject: defect prediction
dc.subject: software quality assurance
dc.title: Comprehending Software Bugs Leveraging Code Structures with Neural Language Models

Files

Original bundle

Name: ParvezMahbub2023.pdf
Size: 17.93 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission