
INTERPRETING THE EFFECT OF QUANTIZATION ON LLMS

dc.contributor.author: Singh, Manpreet
dc.contributor.copyright-release: Not Applicable
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: n/a
dc.contributor.manuscripts: Not Applicable
dc.contributor.thesis-reader: Dr. Frank Rudzicz
dc.contributor.thesis-reader: Dr. Evangelos Milios
dc.contributor.thesis-supervisor: Dr. Hassan Sajjad
dc.date.accessioned: 2024-12-16T15:23:29Z
dc.date.available: 2024-12-16T15:23:29Z
dc.date.defence: 2024-12-05
dc.date.issued: 2024-12-12
dc.description.abstract: Recent advancements in large language models (LLMs) have led to unprecedented model sizes, creating challenges for deployment in resource-constrained environments. Quantization offers a promising solution by reducing weight precision, thereby decreasing memory footprint and computational requirements while potentially maintaining model performance. However, reliable deployment of quantized LLMs requires understanding how quantization affects their internal representations and overall behavior. In this research, we use various interpretation techniques to explore the effects of quantization on model and neuron behavior. We investigate the Phi-2 and Llama-2-7b models under 4-bit and 8-bit quantization, using the BoolQ and Jigsaw Toxicity datasets. Our findings reveal several important insights. First, 4-bit quantized models exhibit slightly better calibration than 8-bit and 16-bit models. Second, our analysis of neuron activations indicates that the number of dead neurons, i.e., those with activation values close to 0 across the dataset, remains consistent regardless of quantization. Regarding salient neurons, we observe that full-precision models have fewer contributing neurons overall. The effect of quantization on neuron redundancy varies across models: Llama-2-7b shows minimal variation in redundancy across quantization levels, whereas Phi-2 exhibits higher redundancy at full precision than in its quantized counterparts. Finally, our investigation into human-level interpretation demonstrates that the learning patterns of salient neurons remain consistent under various quantization conditions. These findings suggest that quantization is a viable approach for the efficient and reliable deployment of LLMs in resource-constrained environments.
dc.identifier.uri: https://hdl.handle.net/10222/84788
dc.language.iso: en_US
dc.subject: Artificial Intelligence
dc.subject: Deep Learning
dc.subject: LLMS
dc.subject: Quantization
dc.title: INTERPRETING THE EFFECT OF QUANTIZATION ON LLMS
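
To make the setup described in the abstract concrete, below is a minimal sketch of how such an experiment might be wired up: loading Llama-2-7b at 4-bit precision via bitsandbytes and counting dead neurons (those whose activations stay near zero across a dataset) with forward hooks. The Hub model id, the mlp.act_fn hook point, the sample inputs, and the 1e-3 threshold are illustrative assumptions, not values taken from the thesis itself.

```python
# Hypothetical sketch of the kind of setup the abstract describes; requires a GPU
# and the transformers + bitsandbytes packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # assumed Hub identifier

# 4-bit quantization; swap load_in_4bit for load_in_8bit to get the 8-bit
# condition, or drop quantization_config for the full-precision baseline.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=bnb_config,
                                             device_map="auto")

# Track each MLP neuron's maximum absolute activation with forward hooks.
max_act = {}

def make_hook(name):
    def hook(module, inputs, output):
        # output: (batch, seq, intermediate_size); reduce over batch and sequence
        cur = output.detach().abs().amax(dim=(0, 1)).float().cpu()
        max_act[name] = torch.maximum(max_act.get(name, cur), cur)
    return hook

for name, module in model.named_modules():
    if name.endswith("mlp.act_fn"):  # nonlinearity inside each Llama MLP block
        module.register_forward_hook(make_hook(name))

texts = ["Is the sky blue?", "Large language models can be quantized."]  # stand-in for BoolQ inputs
with torch.no_grad():
    for t in texts:
        model(**tokenizer(t, return_tensors="pt").to(model.device))

threshold = 1e-3  # assumed cutoff for "activation values close to 0"
dead = sum(int((v < threshold).sum()) for v in max_act.values())
total = sum(v.numel() for v in max_act.values())
print(f"dead neurons: {dead}/{total}")
```

Running the same loop per quantization level (4-bit, 8-bit, full precision) would then allow comparing dead-neuron counts across conditions, as the abstract reports.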

Files

Original bundle

Name: ManpreetSingh2024.pdf
Size: 1.91 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.03 KB
Description: Item-specific license agreed upon to submission