Towards Reliable Semantic Vision

Description
Models that learn from data are being deployed widely and rapidly for real-world use and have become an integral, embedded part of human lives. While these technological advances are exciting and impactful, such data-driven computer vision systems often fail in inscrutable ways. This dissertation seeks to study and improve the reliability of machine learning models from several perspectives, including the development of robust training algorithms to mitigate the risks of such failures, the construction of new datasets that provide a new perspective on the capabilities of vision models, and the design of evaluation metrics for re-calibrating the perception of performance improvements. I will first address distribution shift in image classification with the following contributions: (1) two methods for improving the robustness of image classifiers to distribution shift by incorporating the classifier's failures into an adversarial data transformation pipeline guided by domain knowledge, (2) an interpolation-based technique for flagging out-of-distribution samples, and (3) an intriguing trade-off between distributional and adversarial robustness resulting from data modification strategies. I will then explore reliability considerations for "semantic vision" models that learn from both visual and natural language data; I will discuss how logical and semantic sentence transformations affect the performance of vision-language models, along with my contributions towards developing knowledge-guided learning algorithms to mitigate these failures. Finally, I will describe the effort towards building and evaluating complex reasoning capabilities of vision-language models, in service of the long-term goal of robust and reliable computer vision models that can communicate, collaborate, and reason with humans.
Date Created
2023

Diversity Promoting Online Sampling for Streaming Video Summarization

Description
Video summarization is gaining popularity in the technological culture, where positioning the mouse pointer on top of a video results in a quick overview of what the video is about. The algorithm usually selects frames in a time sequence through systematic sampling. Other applications, such as video surveillance, web-based video surfing, and video archival, can also benefit from efficient and concise video summaries. In this project, we explored several clustering algorithms and how they can be combined and deconstructed to make the summarization algorithm more efficient and relevant. We focused on two objectives for the summary: reducing error and reducing redundancy. To reduce the error, an online k-means clustering algorithm was used; to reduce redundancy, we applied two different methods: the volume of convex hulls and the true diversity measure that is usually used in biological disciplines. The algorithm was efficient and computationally cost-effective due to its online nature. Diversity maximization (or redundancy reduction) using the volume of convex hulls showed better results than other conventional methods on 50 different videos. For the true diversity measure, not much work has been done on the nature of the measure in the context of video summarization. When we applied it, the algorithm stalled because the true diversity saturated as a result of the inherent initialization present in the algorithm. We explored the nature of this measure to gain a better understanding of how it can help make summarization more intuitive and give the user a handle to customize the summary.
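The two ingredients named in this abstract, online k-means and the true diversity measure, can be illustrated with a minimal sketch. This is not the project's implementation: it assumes each frame is already reduced to a numeric feature vector, uses the textbook sequential k-means update, and, purely for illustration, computes true diversity (the Hill number of order q) over cluster sizes. The function names `online_kmeans` and `true_diversity` are hypothetical.

```python
import math


def online_kmeans(stream, k):
    """Sequential (online) k-means over a stream of feature vectors.

    Assumption: the first k points seed the centroids; each later point
    moves its nearest centroid toward itself by a step of 1/count.
    """
    centroids, counts = [], []
    for x in stream:
        x = list(x)
        if len(centroids) < k:
            centroids.append(x)
            counts.append(1)
            continue
        # Index of the nearest centroid in Euclidean distance.
        j = min(range(k), key=lambda i: math.dist(centroids[i], x))
        counts[j] += 1
        eta = 1.0 / counts[j]
        centroids[j] = [c + eta * (xi - c) for c, xi in zip(centroids[j], x)]
    return centroids, counts


def true_diversity(counts, q=1.0):
    """Hill number of order q: the 'effective number' of categories."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1 is the exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


if __name__ == "__main__":
    frames = [(0, 0), (10, 10), (0, 2), (10, 12)]  # toy 2-D "frame features"
    centroids, counts = online_kmeans(frames, k=2)
    print(centroids, counts, true_diversity(counts))
```

Because each point is processed once and only one centroid is updated, the memory and per-frame cost stay constant, which is the property that makes the approach suitable for streaming video.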
Date Created
2017-05

Statistical and dynamical modeling of Riemannian trajectories with application to human movement analysis

Description
The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated because the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
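To make the gap between the Euclidean assumption and manifold geometry concrete, here is a minimal sketch on the simplest Riemannian manifold, the unit sphere. The sphere is chosen only as an illustration and is not claimed to be the feature manifold used in the dissertation; the function names are hypothetical.

```python
import math


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def geodesic_distance(p, q):
    """Great-circle (intrinsic) distance between two points on the unit sphere."""
    p, q = normalize(p), normalize(q)
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))


def chordal_distance(p, q):
    """Straight-line (Euclidean) distance that cuts through the sphere."""
    return math.dist(normalize(p), normalize(q))


def trajectory_length(points):
    """Length of a sampled trajectory, summing geodesic steps between samples."""
    return sum(geodesic_distance(a, b) for a, b in zip(points, points[1:]))


if __name__ == "__main__":
    a, b, c = [1, 0, 0], [0, 1, 0], [-1, 0, 0]
    print(geodesic_distance(a, b), chordal_distance(a, b))
    print(trajectory_length([a, b, c]))
```

For a quarter turn, the geodesic distance is pi/2 while the Euclidean chord is sqrt(2); the discrepancy grows with separation, which is why comparing trajectories with the L2 norm can misrepresent movement on a curved feature space.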
Date Created
2016

Low complexity differential geometric computations with applications to human activity analysis

Description
In this thesis, we consider the problem of fast and efficient indexing techniques for time sequences which evolve on manifold-valued spaces. Using manifolds is a convenient way to work with complex features that often do not live in Euclidean spaces. However, computing standard notions such as geodesic distance and mean can get very involved due to the underlying non-linearity associated with the space. As a result, a complex task such as manifold sequence matching would require a very large number of computations, making it hard to use in practice. We believe that one can devise smart approximation algorithms for several classes of such problems which take into account the geometry of the manifold and maintain the favorable properties of the exact approach. This problem has several applications in areas of human activity discovery and recognition, where several features and representations are naturally studied in a non-Euclidean setting. We propose a novel solution to the problem of indexing manifold-valued sequences by introducing an intrinsic approach to map sequences to a symbolic representation. This is shown to enable the deployment of fast and accurate algorithms for activity recognition, motif discovery, and anomaly detection. Toward this end, we present generalizations of the key concepts of piece-wise aggregation and symbolic approximation to the case of non-Euclidean manifolds. Experiments show that one can replace expensive geodesic computations with much faster symbolic computations with little loss of accuracy in activity recognition and discovery applications. The proposed methods are ideally suited for real-time systems and resource-constrained scenarios.
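For readers unfamiliar with piece-wise aggregation and symbolic approximation, the following is a minimal sketch of the standard Euclidean versions (commonly known as PAA and SAX) for a scalar time series. It is only the baseline that the thesis generalizes; the manifold-valued extension is not reproduced here, and the function names and the default four-letter alphabet are assumptions for illustration.

```python
from statistics import NormalDist, mean, pstdev


def paa(series, n_segments):
    """Piece-wise aggregate approximation: the mean of each of n_segments chunks.

    Assumes n_segments <= len(series) so that no chunk is empty.
    """
    n = len(series)
    return [
        mean(series[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]


def sax(series, n_segments, alphabet="abcd"):
    """Symbolic approximation: z-normalize, aggregate, then quantize to letters."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    a = len(alphabet)
    # Breakpoints that split the standard normal into a equiprobable bins.
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    word = []
    for v in paa(z, n_segments):
        # The bin index is the number of breakpoints lying below the value.
        word.append(alphabet[sum(v > b for b in breakpoints)])
    return "".join(word)


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))
```

Once sequences are reduced to short words like this, matching can be done with cheap string comparisons instead of repeated distance computations, which is the kind of saving the abstract describes for the non-Euclidean case.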
Date Created
2012