Li, Baoxin

Perceiving, Planning, Acting, and Self-Explaining: A Cognitive Quartet with Four Neural Networks

Description

Learning to accomplish complex tasks may require tight coupling among cognitive functions or components at different levels, such as perception, acting, planning, and self-explaining. Coupling perception with acting may be needed to make decisions automatically, especially in emergency situations. Collaboration between perception and planning may be needed to follow plans that are optimal in the long run while also driving task-oriented perception. Self-explaining components may also be needed to monitor and improve the overall learning. In my research, I explore how cognitive functions or components at different levels, modeled by deep neural networks, can learn and adapt simultaneously. The first question I address is: can an intelligent agent leverage recognized plans or human demonstrations to improve its perception, which may in turn allow better acting? To answer this question, I explore novel ways of learning to couple perception with acting or with planning. As a cornerstone, I explore how to learn shallow domain models for planning. Beyond these, more advanced cognitive learning agents may also reflect on what they have experienced so far, whether first-hand or by observing others. Humans likewise frequently monitor their own learning and draw lessons from their failures and from others' successes. To this end, I explore the possibility of motivating cognitive agents to learn to self-explain their experiences, accomplishments, and failures in order to gain useful insights. By internally making sense of past experiences, an agent could have its learning of other cognitive functions guided and improved.

Date Created

2022

Agent

Author (aut): Zha, Yantian
Thesis advisor (ths): Kambhampati, Subbarao
Committee member: Li, Baoxin
Committee member: Srivastava, Siddharth
Committee member: Wang, Jianjun
Publisher (pbl): Arizona State University

Exploring Deep Learning for Video Understanding

Description

Video analysis and understanding have attracted increasing attention in recent years. The research community has devoted considerable effort to, and made progress in, many related visual tasks, such as video action/event recognition, thumbnail frame or video index retrieval, and zero-shot learning. Finding good representative features of videos is an important objective for these tasks.

Given the success of deep neural networks in recent vision tasks, deep learning methods are a natural choice for extracting better global representations of images and videos. In general, a Convolutional Neural Network (CNN) is used to obtain spatial information, and a Recurrent Neural Network (RNN) is used to capture temporal information.

This dissertation offers a perspective on the challenging problems posed by different kinds of videos, which may require different solutions. It therefore outlines several novel deep learning-based approaches to obtaining representative features for different visual tasks, such as zero-shot learning, video retrieval, and video event recognition. To better understand and obtain the spatial and temporal information in videos, most of these approaches use a Convolutional Neural Network and a Recurrent Neural Network jointly. Experiments are conducted to demonstrate the importance and effectiveness of good representative features for gaining a better understanding of video clips in computer vision. The dissertation concludes with a discussion of possible future work on obtaining better representative features for more challenging video clips.

Date Created

2020

Agent

Author (aut): Li, Yikang
Thesis advisor (ths): Li, Baoxin
Committee member: Karam, Lina
Committee member: LiKamWa, Robert
Committee member: Yang, Yezhou
Publisher (pbl): Arizona State University