Invariant human pose feature extraction for movement recognition and pose estimation

Description
Reliable extraction of human pose features that are invariant to changes in view angle and body shape is critical for advancing human movement analysis. In this dissertation, multifactor analysis techniques, including multilinear analysis and multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing the key factors that contribute to the generation of the image observations, such as pose, view angle, and body shape. Experimental results have shown that the pose features extracted using the proposed methods exhibit excellent invariance to changes in view angle and body shape. Furthermore, using the proposed invariant multifactor pose features, a suite of simple yet effective algorithms has been developed to solve the movement recognition and pose estimation problems. Using these algorithms, excellent human movement analysis results have been obtained, and most of them are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed to automatically detect and learn non-gesture movement patterns, improving gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, data fusion schemes have been investigated for multi-camera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled. A measure to quantify the complementary strength across cameras has been proposed. Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.
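The decision-level product rule mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `product_rule_fusion`, the input layout (one list of class posteriors per camera), the `eps` floor, and the assumption of equal class priors are all hypothetical choices made for illustration.

```python
import math


def product_rule_fusion(camera_posteriors, eps=1e-12):
    """Fuse per-camera class posteriors with the product rule.

    camera_posteriors[k][c] is the posterior of gesture class c reported by
    the classifier attached to camera k (hypothetical input layout).
    Equal class priors are assumed, so the fused score of a class is simply
    the product of its per-camera posteriors.
    Returns (index of the winning class, fused posterior distribution).
    """
    num_classes = len(camera_posteriors[0])
    log_scores = [0.0] * num_classes
    for posteriors in camera_posteriors:
        if len(posteriors) != num_classes:
            raise ValueError("every camera must score the same classes")
        for c, p in enumerate(posteriors):
            # Sum logs instead of multiplying to avoid numerical underflow;
            # eps keeps log() defined when a camera reports a zero posterior.
            log_scores[c] += math.log(max(p, eps))

    # Convert the log scores back into a normalized distribution.
    peak = max(log_scores)
    unnormalized = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalized)
    fused = [u / total for u in unnormalized]
    best = max(range(num_classes), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Two cameras, three gesture classes (made-up numbers).
    best, fused = product_rule_fusion([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
    print(best, [round(p, 3) for p in fused])
```

Because each camera contributes only a vector of class scores, a scheme of this kind needs no geometric calibration between cameras, which is consistent with the uncalibrated setting described above.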
Date Created
2011

Mining semantics from low-level features in multimedia computing

Description
Bridging the semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signals with their high-level semantic interpretation is mainly due to the fact that semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. Also, additional domain knowledge may often be indispensable for uncovering the underlying semantics, but in most cases such domain knowledge is not readily available from the acquired media streams. Thus, making use of various types of contextual information and leveraging the corresponding domain knowledge are vital for accurately associating high-level semantics with low-level signals in multimedia computing problems. In this work, novel computational methods are explored and developed for incorporating contextual information and domain knowledge in different forms for multimedia computing and pattern recognition problems. Specifically, a novel Bayesian approach with statistical-sampling-based inference is proposed for incorporating a special type of domain knowledge, a spatial prior for the underlying shapes; cross-modality correlations are explored via Kernel Canonical Correlation Analysis, and the learnt space is then used for associating multimedia contents in different forms; and contextual information modeled as a graph is leveraged for regulating interactions between high-level semantic concepts (e.g., category labels) and low-level input signals (e.g., spatial/temporal structure). Four real-world applications, including visual-to-tactile face conversion, photo tag recommendation, wild web video classification and unconstrained consumer video summarization, are selected to demonstrate the effectiveness of the approaches. These applications range from classic research challenges to emerging tasks in multimedia computing. Results from experiments on large-scale real-world data, with comparisons to other state-of-the-art methods and subjective evaluations with end users, confirmed that the developed approaches exhibit salient advantages, suggesting that they are promising for leveraging contextual information and domain knowledge in a wide range of multimedia computing and pattern recognition problems.
Date Created
2011

Interactive laboratory for digital signal processing in iOS devices

Description
The demand for handheld portable computing in education, business and research has resulted in advanced mobile devices with powerful processors and large multi-touch screens. Such devices are capable of handling tasks of moderate computational complexity such as word processing, complex Internet transactions, and even human motion analysis. Apple's iOS devices, including the iPhone, the iPod touch and the latest in the family, the iPad, are among the well-known and widely used mobile devices today. Their advanced multi-touch interface and improved processing power can be exploited for engineering and STEM demonstrations. Moreover, these devices have become a part of everyday student life. Hence, the design of exciting mobile applications and software represents a great opportunity to build student interest and enthusiasm in science and engineering. This thesis presents the design and implementation of portable interactive signal processing simulation software on the iOS platform. The iOS-based object-oriented application is called i-JDSP and is based on the award-winning Java-DSP concept. It is implemented in Objective-C and C as a native Cocoa Touch application that can be run on any iOS device. i-JDSP offers basic signal processing simulation functions such as the Fast Fourier Transform, filtering and spectral analysis in a compact and convenient graphical user interface, and provides a very compelling multi-touch programming experience. Built-in modules also demonstrate concepts such as pole-zero placement. i-JDSP also incorporates sound capture and playback options that can be used in near real-time analysis of speech and audio signals. All simulations can be visually established by forming interactive block diagrams through multi-touch and drag-and-drop. Computations are performed on the mobile device when necessary, making the block diagram execution fast. Furthermore, the extensive support for user interactivity provides scope for improved learning. The results of the i-JDSP assessment among senior undergraduate and first-year graduate students revealed that the software had a significant positive impact and increased the students' interest and motivation in understanding basic DSP concepts.
Date Created
2011

Mediated social interpersonal communication: evidence-based understanding of multimedia solutions for enriching social situational awareness

Description
Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors, is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding their social interactions. These interactions are formed out of the elements of human interpersonal communication, including both verbal and non-verbal cues. While the verbal cues are instructive and delivered through speech, the non-verbal cues are mostly interpretive and require the full attention of the participants to understand and respond to them appropriately. Unfortunately, certain situations are not conducive to a person having complete access to their social surroundings, especially the non-verbal cues. For example, a person who is blind or visually impaired may find that non-verbal cues such as smiles, head nods, eye contact, body gestures and facial expressions of their interaction partners are not accessible due to their sensory deprivation. The same could be said of people who are remotely engaged in a conversation and too physically separated to have visual access to one another's body and facial mannerisms. This dissertation describes novel multimedia technologies to aid situations where it is necessary to mediate social situational information between interacting participants. As an example of the proposed system, an evidence-based model for understanding the accessibility problem faced by people who are blind or visually impaired is described in detail. From the derived model, a slew of sensing and delivery technologies that use state-of-the-art computer vision algorithms in combination with novel haptic interfaces are developed towards a) a Dyadic Interaction Assistant, capable of helping individuals who are blind to access important head- and face-based non-verbal communicative cues during one-on-one dyadic interactions, and b) a Group Interaction Assistant, capable of providing situational awareness about the interaction partners and their dynamics to a user who is blind, while also providing important social feedback about their own body mannerisms. The goal is to increase the effective social situational information that one has access to, with the conjecture that a good awareness of one's social surroundings gives one the ability to better understand and empathize with one's interaction partners. Extending the work beyond social interaction assistive technology, the need for enriched social situational awareness in everyday professional situations is also discussed, including a) enriched remote interactions between physically separated interaction partners, and b) enriched communication between medical professionals during critical care procedures, towards enhanced patient safety. In the concluding remarks, this dissertation engages the reader in a science and technology policy discussion on the potential effect on society of a new technology like the social interaction assistant. Along these policy lines, social disability is highlighted as an important area that requires special attention from researchers and policy makers. Given that the proposed technology relies on inconspicuous wearable cameras, the discussion of privacy policies is extended to encompass newly evolving interpersonal interaction recorders, like the one presented in this dissertation.
Date Created
2011

Designing tools to increase group awareness in the work place

Description
This thesis investigates the role of activity visualization tools in increasing group awareness in the workplace. Today, electronic calendaring tools are widely used in the workplace. Their primary function is to enable each person to maintain a work schedule. They are also used to schedule meetings and share work details when appropriate. However, a key limitation of current tools is that they do not enable people in the workplace to understand the activity of the group as a whole. A tool that increases group awareness would promote reflection; it would enable thoughtful engagement with one's co-workers. I have developed two tools: the first enables a worker to examine detailed information about his/her own tasks within the context of his/her peers' anonymized task data. The second is a public display to promote group reflection. I have used an iterative design methodology to refine the tools. I developed the ActivityStream desktop tool, which enables users to examine the detailed information of their own activities and the aggregate information of their peers' activities. ActivityStream uses a client-server architecture. The server collects activity data from each user on a daily basis by parsing RSS feeds associated with their preferred online calendaring and task management tool. The client software displays personalized aggregate data and user-specific tasks, including task types. The client display visualizes the activity data at multiple time scales. The activity data for each user is represented through discrete blocks; interacting with a block reveals task details. The activity of the rest of the group is anonymized and aggregated. ActivityStream visualizes the aggregated data via Bezier curves. I also developed the ActivityStream public display, which shows how a group's activity levels change over time to promote group reflection. In particular, the public display shows the anonymized task activity data over the course of one year. The public display visualizes data for each user using a Bezier curve. The display shows data from all users simultaneously. This representation enables users to reflect on the relationships across the group members over the course of one year. The survey results revealed that users are more aware of their peers' activities in the workplace.
Date Created
2010

Multimodal movement sensing using motion capture and inertial sensors for mixed-reality rehabilitation

Description
This thesis presents a multi-modal motion tracking system for stroke patient rehabilitation. The system deploys two sensor modules: a marker-based motion capture system and an inertial measurement unit (IMU). The integrated system provides real-time measurement of right arm and trunk movement, even in the presence of marker occlusion. The information from the two sensors is fused through quaternion-based recursive filters to provide robust detection of torso compensation (undesired body motion). Since this algorithm allows flexible sensor configurations, it presents a framework for fusing IMU data and vision data that can adapt to various sensor selection scenarios. The proposed system consequently has the potential to improve both the robustness and flexibility of the sensing process. The experiments evaluate the performance of the quaternion-based complementary filter in 10 sensor combination scenarios, comparing it with the extended Kalman filter (EKF), the unscented Kalman filter (UKF) and the particle filter (PF). Experimental results demonstrate the favorable performance of the proposed system under occlusion. This investigation also provides valuable information for selecting filtering algorithms and strategies in specific sensor applications.
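A quaternion-based complementary filter of the general kind named in the abstract can be sketched as follows. This is an illustrative sketch only, not the thesis's algorithm: it assumes quaternions in (w, x, y, z) order, body-frame gyroscope rates in rad/s, a vision-derived orientation that is simply absent (`None`) when markers are occluded, and a fixed blending `gain`. All function and parameter names are hypothetical.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def quat_normalize(q):
    """Scale a quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def gyro_propagate(q, omega, dt):
    """Rotate q by the body-frame angular rate omega (rad/s) over dt seconds."""
    wx, wy, wz = omega
    rate = math.sqrt(wx * wx + wy * wy + wz * wz)
    angle = rate * dt
    if angle < 1e-12:
        return q  # no measurable rotation during this step
    scale = math.sin(angle / 2.0) / rate
    delta = (math.cos(angle / 2.0), wx * scale, wy * scale, wz * scale)
    return quat_normalize(quat_multiply(q, delta))


def complementary_update(q_prev, omega, dt, q_vision=None, gain=0.02):
    """One filter step: gyro prediction, then a small pull toward vision.

    q_vision is the orientation from the marker-based system, or None when
    the markers are occluded, in which case the IMU prediction is used alone.
    """
    q_pred = gyro_propagate(q_prev, omega, dt)
    if q_vision is None:
        return q_pred
    # q and -q encode the same rotation; keep both in the same hemisphere
    # so that the linear blend takes the short path.
    if sum(p * v for p, v in zip(q_pred, q_vision)) < 0.0:
        q_vision = tuple(-c for c in q_vision)
    blended = tuple((1.0 - gain) * p + gain * v for p, v in zip(q_pred, q_vision))
    return quat_normalize(blended)
```

The gyroscope term tracks fast motion while the vision term slowly corrects drift. Skipping the correction when `q_vision` is `None` is one simple way to keep producing estimates during marker occlusion.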
Date Created
2010