INTERPRETING THE EFFECT OF QUANTIZATION ON LLMS
Date
2024-12-12
Abstract
Recent advancements in large language models (LLMs) have led to unprecedented model sizes, creating challenges for deployment in resource-constrained environments. Quantization offers a promising solution by reducing weight precision, thereby decreasing memory footprint and computational requirements while potentially maintaining model performance. However, the reliable deployment of quantized LLMs requires an understanding of how quantization affects their internal representations and overall behavior.
In this research, we use a range of interpretation techniques to explore the effects of quantization on model and neuron behavior. We investigate the Phi-2 and Llama-2-7b models under 4-bit and 8-bit quantization, using the BoolQ and Jigsaw Toxicity datasets.
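For orientation, a setup of this kind can be reproduced with Hugging Face Transformers and bitsandbytes; the snippet below is a minimal sketch only, and the model identifier and quantization options shown are assumptions rather than the exact configuration used in this work.

# Minimal sketch (assumed setup): load a causal LM in 4-bit precision.
# Swap load_in_4bit for load_in_8bit to obtain the 8-bit variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/phi-2"  # or "meta-llama/Llama-2-7b-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)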
Our findings reveal several important insights. First, 4-bit quantized models exhibit slightly better calibration than their 8-bit and 16-bit counterparts. Second, our analysis of neuron activations indicates that the number of dead neurons, i.e., those with activation values close to 0 across the dataset, remains consistent regardless of quantization. Regarding salient neurons, we observe that full-precision models have fewer contributing neurons overall. The effect of quantization on neuron redundancy varies across models: Llama-2-7b shows minimal variation in redundancy across quantization levels, whereas Phi-2 exhibits higher redundancy at full precision than in its quantized variants. Finally, our investigation into human-level interpretation demonstrates that the learning pattern of salient neurons remains consistent under various quantization conditions.
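The dead-neuron criterion above (activations staying close to 0 across the dataset) can be operationalized with forward hooks in PyTorch. The sketch below is illustrative rather than the thesis code; the threshold value, the choice of hooked modules, and the helper names are assumptions.

# Minimal sketch (assumptions noted above): count neurons whose activation
# magnitude never exceeds a small threshold over an entire dataset.
import torch

THRESHOLD = 1e-3  # assumed cutoff for "close to 0"

@torch.no_grad()
def count_dead_neurons(model, dataloader, layer_modules):
    """layer_modules: iterable of (name, module) pairs, e.g. MLP activation
    layers selected from model.named_modules()."""
    max_abs = {}   # layer name -> running max |activation| per neuron
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # output: (batch, seq_len, hidden); reduce over batch and sequence
            cur = output.detach().abs().amax(dim=(0, 1))
            prev = max_abs.get(name)
            max_abs[name] = cur if prev is None else torch.maximum(prev, cur)
        return hook

    for name, module in layer_modules:
        handles.append(module.register_forward_hook(make_hook(name)))

    for batch in dataloader:
        model(**batch)  # batch assumed to be a dict of tensors on the model's device

    for h in handles:
        h.remove()

    # A neuron is "dead" if its activation never exceeded the threshold.
    return {name: int((m < THRESHOLD).sum()) for name, m in max_abs.items()}

Running this on the full-precision and quantized variants of the same model gives per-layer dead-neuron counts that can be compared directly across precision levels.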
These findings suggest that quantization is a viable approach for the efficient and reliable deployment of LLMs in resource-constrained environments.
Keywords
Artificial Intelligence, Deep Learning, LLMs, Quantization