A Generic Gesture Recognition Approach based on Visual Perception
Recent advances in hardware have allowed computer vision technologies to analyze complex human activities in real time. High-quality algorithms for human activity interpretation are required by many emerging applications, such as patient behavior analysis, surveillance, gesture-controlled video games, and other human-computer interface systems. Despite the great efforts made over the past decades, providing a generic gesture recognition solution that can facilitate the development of different gesture-based applications remains a challenging task. Human vision is able to perceive scenes continuously, recognize objects, and grasp motion semantics effortlessly. Neuroscientists and psychologists have tried to understand and explain exactly how the visual system works. Some theories and hypotheses on visual perception, such as visual attention and the Gestalt laws of perceptual organization (PO), have been established and shed some light on the fundamental mechanisms of human visual perception. In this dissertation, inspired by visual attention models, we attempt to model and integrate important visual perception discoveries into a generic gesture recognition framework, which is the fundamental component of full-tier human activity understanding tasks.
Our approach handles challenging tasks by: (1) organizing the complex visual information into a hierarchical structure comprising low-level feature, object (human body), and 4D spatiotemporal layers; (2) extracting bottom-up, shape-based visual salience entities at each layer according to PO grouping laws; (3) building shape-based hierarchical salience maps that support visual feature selection for high-level tasks by manipulating the attention conditions given by top-down knowledge about gestures and body structures; and (4) modeling gesture representations as a set of perceptual gesture salience entities (PGSEs) that provide qualitative gesture descriptions in 4D space for recognition tasks. Unlike existing approaches, our gesture representation method encodes both extrinsic and intrinsic properties and reflects the way humans perceive the visual world, thereby reducing the semantic gap. Experimental results show that our approach outperforms the other approaches evaluated and has great potential for real-time applications.
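
As a rough illustration of steps (1) to (3), the following minimal Python sketch shows one way bottom-up salience scores could be combined with top-down attention weights within a three-layer hierarchy. It is not the dissertation's implementation: the names `SalienceEntity` and `select_salient`, the entity labels, the numeric scores, and the multiplicative weighting rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass

# The three layers named in the abstract; the string identifiers are hypothetical.
LAYERS = ("feature", "object", "spatiotemporal")


@dataclass(frozen=True)
class SalienceEntity:
    layer: str        # one of LAYERS
    label: str        # hypothetical entity name
    bottom_up: float  # assumed bottom-up salience score in [0, 1]


def select_salient(entities, top_down_weights, keep_per_layer=1):
    """Keep the highest-scoring entities in each layer.

    Assumption: top-down knowledge acts as a multiplicative weight on the
    bottom-up score; labels without a weight default to 1.0.
    """
    selected = {}
    for layer in LAYERS:
        scored = [
            (e.bottom_up * top_down_weights.get(e.label, 1.0), e)
            for e in entities
            if e.layer == layer
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected[layer] = [e for _, e in scored[:keep_per_layer]]
    return selected


# Made-up example data, not taken from any experiment.
entities = [
    SalienceEntity("feature", "edge_cluster", 0.6),
    SalienceEntity("feature", "background_texture", 0.8),
    SalienceEntity("object", "right_arm", 0.5),
    SalienceEntity("object", "torso", 0.7),
    SalienceEntity("spatiotemporal", "arm_sweep", 0.4),
]
# Hypothetical top-down knowledge: suppress background, favor the gesturing arm.
weights = {"background_texture": 0.1, "right_arm": 2.0}

selected = select_salient(entities, weights)
for layer in LAYERS:
    print(layer, [e.label for e in selected[layer]])
```

In this toy example the top-down weights change which entity is kept in the feature and object layers, which is the kind of task-driven feature selection that step (3) describes; the entities that survive selection would then be candidates for the gesture representation in step (4).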