
INTERPRETING THE EFFECT OF QUANTIZATION ON LLMS

dc.contributor.author: Singh, Manpreet
dc.contributor.copyright-release: Not Applicable
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: n/a
dc.contributor.manuscripts: Not Applicable
dc.contributor.thesis-reader: Dr. Frank Rudzicz
dc.contributor.thesis-reader: Dr. Evangelos Milios
dc.contributor.thesis-supervisor: Dr. Hassan Sajjad
dc.date.accessioned: 2024-12-16T15:23:29Z
dc.date.available: 2024-12-16T15:23:29Z
dc.date.defence: 2024-12-05
dc.date.issued: 2024-12-12
dc.description.abstract: Recent advancements in large language models (LLMs) have led to unprecedented model sizes, creating challenges for deployment in resource-constrained environments. Quantization offers a promising solution by reducing weight precision, thereby decreasing memory footprint and computational requirements while potentially maintaining model performance. However, reliable deployment of quantized LLMs requires understanding how quantization affects their internal representations and overall behavior. In this research, we use various interpretation techniques to explore the effects of quantization on model and neuron behavior. We investigate the Phi-2 and Llama-2-7b models under 4-bit and 8-bit quantization, using the BoolQ and Jigsaw Toxicity datasets. Our findings reveal several important insights. First, 4-bit quantized models exhibit slightly better calibration than 8-bit and 16-bit models. Second, our analysis of neuron activations indicates that the number of dead neurons, i.e., those with activation values close to 0 across the dataset, remains consistent regardless of quantization. Regarding salient neurons, we observe that full-precision models have fewer contributing neurons overall. The effect of quantization on neuron redundancy varies across models: Llama-2-7b shows minimal variation in redundancy across quantization levels, whereas Phi-2 exhibits higher redundancy at full precision than in its quantized counterparts. Finally, our investigation into human-level interpretation demonstrates that the learning patterns of salient neurons remain consistent under various quantization conditions. These findings suggest that quantization is a viable approach for the efficient and reliable deployment of LLMs in resource-constrained environments.
dc.identifier.uri: https://hdl.handle.net/10222/84788
dc.language.iso: en_US
dc.subject: Artificial Intelligence
dc.subject: Deep Learning
dc.subject: LLMS
dc.subject: Quantization
dc.title: INTERPRETING THE EFFECT OF QUANTIZATION ON LLMS
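
To make the setup described in the abstract concrete, below is a minimal sketch of how such an experiment might be wired up: loading Llama-2-7b at 4-bit precision via bitsandbytes and counting dead neurons (those whose activations stay near zero across a dataset) with forward hooks. The Hub model id, the mlp.act_fn hook point, the sample inputs, and the 1e-3 threshold are illustrative assumptions, not values taken from the thesis itself.

```python
# Hypothetical sketch of the kind of setup the abstract describes; requires a GPU
# and the transformers + bitsandbytes packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # assumed Hub identifier

# 4-bit quantization; swap load_in_4bit for load_in_8bit to get the 8-bit
# condition, or drop quantization_config for the full-precision baseline.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=bnb_config,
                                             device_map="auto")

# Track each MLP neuron's maximum absolute activation with forward hooks.
max_act = {}

def make_hook(name):
    def hook(module, inputs, output):
        # output: (batch, seq, intermediate_size); reduce over batch and sequence
        cur = output.detach().abs().amax(dim=(0, 1)).float().cpu()
        max_act[name] = torch.maximum(max_act.get(name, cur), cur)
    return hook

for name, module in model.named_modules():
    if name.endswith("mlp.act_fn"):  # nonlinearity inside each Llama MLP block
        module.register_forward_hook(make_hook(name))

texts = ["Is the sky blue?", "Large language models can be quantized."]  # stand-in for BoolQ inputs
with torch.no_grad():
    for t in texts:
        model(**tokenizer(t, return_tensors="pt").to(model.device))

threshold = 1e-3  # assumed cutoff for "activation values close to 0"
dead = sum(int((v < threshold).sum()) for v in max_act.values())
total = sum(v.numel() for v in max_act.values())
print(f"dead neurons: {dead}/{total}")
```

Running the same loop per quantization level (4-bit, 8-bit, full precision) would then allow comparing dead-neuron counts across conditions, as the abstract reports.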

Files

Original bundle

Name: ManpreetSingh2024.pdf
Size: 1.91 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.03 KB
Description: Item-specific license agreed upon to submission