Beyond Monocular Vision: Assessing LLaVA's Performance on an Augmented CLEVR-like Dataset with Binocular Images
Date
2025-07-07
Abstract
This thesis investigates how binocular vision affects the spatial reasoning capabilities of Large Language and Vision Assistant (LLaVA) models in visual question answering tasks. We develop BiCLEVR, an augmented CLEVR-like dataset featuring stereoscopic image pairs and expanded visual attributes, and use it to systematically evaluate the effect of different visual inputs across model sizes. Our experiments compare two LLaVA variants (7B and 13B parameters) on three dataset configurations: standard CLEVR, monocular BiCLEVR, and binocular BiCLEVR. The results reveal a nuanced relationship between model capacity and the ability to exploit stereoscopic information: the larger model showed significant performance gains with binocular input, while the smaller model's performance degraded, suggesting insufficient capacity to process the additional visual information effectively. The larger model's improvements were particularly pronounced on numerical comparison and counting tasks, indicating that stereoscopic cues enhance object individuation. These findings advance our understanding of how vision-language models process spatial information and point toward more robust visual reasoning systems capable of understanding 3D relationships in complex environments.
Keywords
Visual Question Answering, Multimodal Models, Stereo Vision