On Feature Saliency and Deep Neural Networks

Description

Technological advances have allowed for the assimilation of a variety of data, driving a shift away from the use of simpler, constrained patterns toward more complex and diverse patterns in the retrieval and analysis of such data. This shift has overwhelmed conventional techniques and has stressed the need for intelligent mechanisms that can model the complex patterns in the data. Deep neural networks have shown some success at capturing complex patterns, but even the so-called attention networks have significant shortcomings in distinguishing what is important in the data from what is noise. This dissertation observes that traditional neural networks rely solely on gradient-based learning to model deep feature maps, ignoring key insights in the data that can be leveraged as complementary information to help learn an accurate model. In particular, this dissertation shows that localized multi-scale features (captured implicitly or explicitly) can be leveraged to help improve model performance, as these features capture salient, informative points in the data.
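To make the idea of non-neural localized multi-scale features concrete, the following is a minimal sketch of scale-space extrema detection in the spirit of difference-of-Gaussians keypoints; the function name, scales, and threshold are illustrative assumptions, not the dissertation's exact method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def multiscale_saliency_points(image, sigmas=(1.0, 2.0, 4.0, 8.0), thresh=0.02):
    """Detect localized multi-scale salient points via difference-of-Gaussians.

    Returns (row, col, scale) triples where the DoG response is a spatial
    local maximum and exceeds `thresh` in magnitude. (Illustrative sketch.)
    """
    image = image.astype(np.float64)
    # Scale space: progressively smoothed copies of the image.
    pyramid = [gaussian_filter(image, s) for s in sigmas]
    keypoints = []
    for i in range(len(sigmas) - 1):
        dog = pyramid[i] - pyramid[i + 1]        # band-pass response at this scale
        local_max = maximum_filter(dog, size=3)  # 3x3 spatial neighborhood
        peaks = (dog == local_max) & (np.abs(dog) > thresh)
        for r, c in zip(*np.nonzero(peaks)):
            keypoints.append((r, c, sigmas[i]))
    return keypoints
```

Such points are cheap to compute and could serve as the rough overview of salient structure that the abstract describes, to be consumed alongside the network's learned features.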

This dissertation focuses on “working with the data, not just on data”, i.e., leveraging feature saliency through pre-training, in-training, and post-training analysis of the data. In particular, non-neural localized multi-scale features, in images and time series, are relatively cheap to obtain and can provide a rough overview of the patterns in the data. Furthermore, localized features coupled with deep features can help learn a high-performing network. A pre-training analysis of the sizes, complexities, and distribution of these localized features can help intelligently allocate a user-provided kernel budget across the network as a single-shot hyper-parameter search. Additionally, these localized features can be used as a secondary input modality to the network for cross-attention. Retraining pre-trained networks can be a costly process; yet a post-training analysis of model inferences can reveal the importance of individual network parameters to those inferences, thus facilitating retraining-free network sparsification with minimal impact on model performance. Furthermore, effective in-training analysis of the intermediate features in the network helps learn the importance of individual intermediate features (neural attention), and this analysis can be achieved by simulating local-extrema detection or by learning features simultaneously and understanding their co-occurrences. In summary, this dissertation argues and establishes that, if appropriately leveraged, localized features and their saliency can help learn highly accurate, yet cheaper, networks.
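As a concrete illustration of retraining-free sparsification, here is a minimal sketch assuming a trained PyTorch model and a single calibration batch; the first-order (weight × gradient) importance score and the per-tensor pruning ratio are assumptions for illustration, not the dissertation's exact scoring.

```python
import torch

def sparsify_without_retraining(model, loss_fn, calib_x, calib_y, keep_ratio=0.5):
    """Zero out low-importance weights with no retraining.

    Per-parameter importance is approximated by a first-order Taylor
    score, |w * dL/dw|, computed on one calibration batch; the bottom
    (1 - keep_ratio) fraction of each tensor is pruned.
    """
    model.zero_grad()
    loss = loss_fn(model(calib_x), calib_y)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            importance = (p * p.grad).abs()
            threshold = torch.quantile(importance.flatten(), 1.0 - keep_ratio)
            p.mul_(importance >= threshold)  # keep only high-importance weights
    return model
```

Because the importance estimate comes from inference-time behavior rather than a new training run, the network can be sparsified in a single pass, matching the retraining-free setting described above.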
Date Created
2020
Agent

Multi-variate time series similarity measures and their robustness against temporal asynchrony

Description

The amount of time series data generated is increasing due to the integration of sensor technologies with everyday applications, such as gesture recognition, energy optimization, health care, and video surveillance. The simultaneous use of multiple sensors for capturing different aspects of real-world attributes has also led to an increase in dimensionality, from uni-variate to multi-variate time series. This has facilitated richer data representation, but it has also necessitated algorithms for determining the similarity between two multi-variate time series for search and analysis.

Various algorithms have been extended from the uni-variate to the multi-variate case, such as multi-variate versions of Euclidean distance, edit distance, and dynamic time warping. However, it has not been studied how these algorithms account for asynchrony in time series. Human gestures, for example, exhibit asynchrony in their patterns, as different subjects perform the same gesture with varying movements and at different speeds. In this thesis, we propose several techniques that leverage the contextual relationships among the variates when measuring multi-variate time series similarities, some of which also rely on metadata describing those relationships. Based on the way correlation is leveraged, various weighting mechanisms are proposed that determine the importance of each dimension for discriminating between time series, since giving the same weight to every dimension can lead to misclassification. We then study the robustness of the considered techniques against different temporal asynchronies, including shifts and stretching.
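As one concrete form such a weighted measure could take, below is a minimal sketch of a dimension-weighted multi-variate dynamic time warping distance; the uniform interface and the idea of supplying weights from variate correlations or external metadata are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def weighted_mdtw(x, y, weights):
    """Dimension-weighted multi-variate DTW distance.

    x: (n, d) series; y: (m, d) series; weights: (d,) per-dimension
    importance (e.g., derived from variate correlations or metadata).
    The warping alignment tolerates temporal shifts and stretching.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Weighted Euclidean ground distance between aligned frames.
            cost = np.sqrt(np.sum(weights * (x[i - 1] - y[j - 1]) ** 2))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

With `weights = np.ones(d)` this reduces to plain multi-variate DTW; non-uniform weights let a discriminative dimension dominate the matching, which is the role the weighting mechanisms above are meant to play.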

Exhaustive experiments were carried out on datasets with multiple types and amounts of temporal asynchrony. We observed that the accuracy of algorithms that rely on the data itself to discover variate relationships can be low in the presence of temporal asynchrony, whereas algorithms that rely on external metadata tend to be more robust against asynchronous distortions. Specifically, algorithms using external metadata achieve better classification accuracy and cluster separation than existing state-of-the-art techniques, such as EROS, PCA, and naive dynamic time warping.
Date Created
2015
Agent