GramML: Leveraging Grammar-Driven Machine Learning for Sign Comprehension and Gesture Sequence Compliance
Description
The significance of visual gesture recognition is growing in our digital era, particularly in human-computer interaction (HCI) based on hand gestures. It plays a vital role in ubiquitous HCI applications such as sign language recognition, monitoring of hand hygiene practices, and gesture-based smart home interfaces. These applications often rely on supervised machine learning algorithms, trained on labeled data, to continuously recognize hand gestures. However, accurately segmenting static or dynamic gestures and reliably detecting hand gestures within a continuous stream remain challenging, especially under real-world testing conditions. Challenges include complex background noise, varying hand gesture speeds, and co-articulation, all of which can hinder continuous hand gesture recognition.

This dissertation presents a novel approach for enhancing cross-domain gesture recognition performance in deep learning architectures through the Grammar-Driven Machine Learning (GramML) framework. The focus is on precisely identifying the frames corresponding to specific gestures within continuous signing streams, based on key characteristics such as hand morphology, spatial positioning, and dynamic movement patterns. The GramML method uses a predefined syntactic structure of tokens to capture spatio-temporal features that closely align with the semantic meaning of individual hand gestures. The effectiveness of this approach is evaluated through an analysis of performance degradation in an Inflated 3D ConvNet (I3D) model under varying data distributions. Furthermore, the study underscores the importance of robust classification methodologies in practical scenarios, exemplified by the validation of gesture sequence compliance in tasks such as hand-washing routines.
By integrating Grammar-Driven Machine Learning (GramML) into deep learning architectures, this research aims to enhance the reliability, adaptability, and compliance of gesture recognition systems across diverse sign language contexts.
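To make the grammar-driven compliance idea concrete, the sketch below shows one way a recognized stream of gesture tokens could be checked against a predefined step grammar, using a hypothetical hand-washing routine. This is an illustrative assumption, not the dissertation's implementation: the token names, the `REQUIRED_STEPS` list, and the ordered-subsequence rule are all invented for the example.

```python
# Hypothetical step grammar for a hand-washing routine (assumed token names,
# not taken from the dissertation).
REQUIRED_STEPS = ["wet_hands", "apply_soap", "rub_palms", "rub_backs", "rinse", "dry"]

def collapse_repeats(tokens):
    """Merge consecutive duplicate tokens, since a frame-level recognizer
    typically emits the same gesture label over many consecutive frames."""
    out = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out

def is_compliant(tokens):
    """A sequence complies if the required steps appear in order;
    extra or repeated gestures in between are tolerated."""
    seq = collapse_repeats(tokens)
    i = 0  # index of the next required step to match
    for t in seq:
        if i < len(REQUIRED_STEPS) and t == REQUIRED_STEPS[i]:
            i += 1
    return i == len(REQUIRED_STEPS)
```

Under this simple rule, a stream that contains every required step in order (even with frame-level repeats) passes, while one that skips a step, such as omitting `apply_soap`, fails. A fuller treatment could replace the ordered-step check with a finite-state or context-free grammar over the token alphabet.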