Compressed-Domain Deep Learning with Application to Image Recognition and Universal Adversarial Attack

Description
Researchers have shown that the predictions of a deep neural network (DNN) for an image set can be severely distorted by a single image-agnostic perturbation, or universal perturbation, usually with an empirically fixed threshold in the spatial domain to restrict its perceivability. However, current universal perturbations have limited attack ability and, more importantly, limiting the perturbation's norm in the spatial domain may not be a suitable way to restrict the perceptibility of universal adversarial perturbations. Moreover, the effects of such attacks on DNN-based texture recognition have yet to be explored. Separately, learning-based image compression has been shown to achieve performance competitive with state-of-the-art transform-based codecs. This has motivated the development of learning-based image compression systems targeting both humans and machines, whose compressed-domain representations can be utilized to perform computer vision tasks directly in the compressed domain.

In the context of universal attacks, a novel method is proposed to compute more effective universal perturbations via enhanced projected gradient descent on targeted classifiers. The perturbation is optimized by accumulating small updates on consecutively perturbed images. Performance results show that the proposed adversarial attack method achieves much higher fooling rates than state-of-the-art universal attack methods. To reduce the perceptibility of universal attacks without compromising their effectiveness, a frequency-tuned universal attack framework is proposed that adopts just-noticeable-difference (JND) thresholds to guide the perceptibility of universal adversarial perturbations. The proposed frequency-tuned attack method achieves state-of-the-art quantitative results and realizes a good balance between perceptibility and effectiveness, in terms of fooling rate, on both natural and texture image datasets.

In the context of compressed-domain image recognition, a novel feature adaptation module integrating a lightweight attention model is proposed to adaptively emphasize and enhance the key features within the extracted channel-wise information, and an adaptation training strategy is designed to utilize the pretrained pixel-domain weights. The obtained performance results show that the proposed compressed-domain classification model distinctly outperforms existing compressed-domain classifiers, and that it yields accuracy comparable to pixel-domain models trained on decoded images at a much higher computational efficiency.
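
As a rough illustration of the accumulated-update idea behind the proposed universal attack, the sketch below builds a single image-agnostic perturbation with small signed-gradient steps that are re-projected onto an L-infinity ball; the PyTorch model, input size, and hyperparameters are assumed placeholders rather than the thesis's exact algorithm.

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, loader, eps=10/255, step=1/255, epochs=5):
    """Accumulate small updates into one image-agnostic perturbation,
    projecting back onto the L-inf ball of radius eps after every step."""
    delta = torch.zeros(1, 3, 224, 224)  # assumed input size
    model.eval()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = (x + delta).clamp(0, 1).detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)  # push predictions off y
            loss.backward()
            grad = x_adv.grad.sign().mean(dim=0, keepdim=True)
            delta = (delta + step * grad).clamp(-eps, eps)  # ascent + project
    return delta
```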
Date Created
2023

Robust Deep Learning Through Selective Feature Regeneration.

Description
In recent years, the widespread use of deep neural networks (DNNs) has facilitated great improvements in performance for computer vision tasks like image classification and object recognition. In most realistic computer vision applications, an input image undergoes some form of image distortion such as blur and additive noise during image acquisition or transmission. Deep networks trained on pristine images perform poorly when tested on such distortions. DNN predictions have also been shown to be vulnerable to carefully crafted adversarial perturbations. Specifically, so-called universal adversarial perturbations are image-agnostic perturbations that can be added to any image and can fool a target network into making erroneous predictions. This work proposes selective DNN feature regeneration to improve the robustness of existing DNNs to image distortions and universal adversarial perturbations.

In the context of common naturally occurring image distortions, a metric is proposed to identify the most susceptible DNN convolutional filters and rank them in order of the highest gain in classification accuracy upon correction. The proposed approach, called DeepCorrect, applies small stacks of convolutional layers with residual connections at the output of these ranked filters and trains them to correct the most distortion-affected filter activations, while leaving the rest of the pre-trained filter outputs in the network unchanged. Performance results show that applying DeepCorrect models to common vision tasks significantly improves the robustness of DNNs against distorted images and outperforms alternative approaches.
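
A minimal PyTorch sketch of such a correction unit is given below, assuming the indices of the most susceptible filters have already been ranked; the two-convolution residual stack and layer widths are illustrative choices, not DeepCorrect's published configuration.

```python
import torch.nn as nn

class CorrectionUnit(nn.Module):
    """Residual stack that corrects a subset of distortion-susceptible
    channels while all remaining channels pass through untouched."""
    def __init__(self, n_corrected, hidden=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(n_corrected, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, n_corrected, 3, padding=1),
        )

    def forward(self, feats, idx):
        # idx: indices of the ranked, most distortion-affected filters
        out = feats.clone()
        out[:, idx] = feats[:, idx] + self.block(feats[:, idx])  # residual fix
        return out
```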

In the context of universal adversarial perturbations, departing from existing defense strategies that work mostly in the image domain, a novel and effective defense is presented that operates only in the DNN feature domain. This approach identifies the pre-trained convolutional features that are most vulnerable to adversarial perturbations and deploys trainable feature regeneration units that transform these DNN filter activations into resilient features robust to universal perturbations. Regenerating only the top 50% most adversarially susceptible activations in at most six DNN layers, while leaving all remaining DNN activations unchanged, outperforms existing defense strategies across different network architectures and across various universal attacks.
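
The vulnerability ranking could be approximated as in the sketch below, which uses the mean activation displacement under a universal perturbation as the susceptibility proxy; this proxy is an assumption for illustration rather than the defense's exact criterion.

```python
import torch

def rank_susceptible_filters(acts_clean, acts_adv, k):
    """Rank convolutional filters of one layer by how far a universal
    perturbation displaces their activations (assumed L1 proxy)."""
    # acts_*: (batch, channels, H, W) activations from the same layer
    change = (acts_adv - acts_clean).abs().mean(dim=(0, 2, 3))
    return torch.argsort(change, descending=True)[:k]
```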
Date Created
2020

Driver Assistance System and Feedback for Hybrid Electric Vehicles Using Sensor Fusion

Description
Transportation plays a significant role in every human's life. Numerous factors, such as cost of living, available amenities, and work style, play a vital role in determining the amount of time spent traveling. Such factors, among others, led in part to an increased need for private transportation and, consequently, to an increase in the purchase of private cars. Road safety has also been impacted by numerous factors, such as driving under the influence (DUI) and driver distraction due to the increased use of mobile devices while driving. These factors led to a growing need for an Advanced Driver Assistance System (ADAS) to help the driver stay aware of the environment and to improve road safety.

EcoCAR3 is one of the Advanced Vehicle Technology Competitions, sponsored by the United States Department of Energy (DoE) and managed by Argonne National Laboratory in partnership with the North American automotive industry. Students are challenged beyond the traditional classroom environment in these competitions, where they redesign a donated production vehicle to improve energy efficiency and to meet emission standards while maintaining the features that are attractive to the customer, including but not limited to performance, consumer acceptability, safety, and cost.

This thesis presents a driver assistance system interface that was implemented as part of EcoCAR3, including the adopted sensors, hardware and software components, system implementation, validation, and testing. The implemented driver assistance system uses a combination of range measurement sensors to determine the distance, relative location, and relative velocity of obstacles and surrounding objects, together with a computer vision algorithm for obstacle detection and classification. The sensor system and vision system were tested individually and then combined within the overall system. Also, a visual and audio feedback system was designed and implemented to provide timely feedback for the driver in an attempt to enhance situational awareness and improve safety.
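
One generic way such a feedback system can turn fused range data into timely warnings is a time-to-collision rule, sketched below; both the rule and its thresholds are illustrative assumptions, not the calibration used on the competition vehicle.

```python
def time_to_collision(range_m, closing_speed_mps):
    """Simple TTC estimate from fused range-sensor readings."""
    if closing_speed_mps <= 0:
        return float("inf")  # object holding distance or receding
    return range_m / closing_speed_mps

def feedback_level(ttc_s, warn_s=3.0, critical_s=1.5):
    """Map TTC onto escalating driver feedback (assumed thresholds)."""
    if ttc_s < critical_s:
        return "audio+visual"
    if ttc_s < warn_s:
        return "visual"
    return "none"
```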

Since the driver assistance system was designed and developed as part of a DoE sponsored competition, the system needed to satisfy competition requirements and rules. This work attempted to optimize the system in terms of performance, robustness, and cost while satisfying these constraints.
Date Created
2019

Performance Evaluation of Object Proposal Generators for Salient Object Detection

Description
The detection and segmentation of objects appearing in a natural scene, often referred to as Object Detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these methods are capable of detecting the salient objects in the scene when the number of proposals that can be generated is limited by timing or computational constraints during execution. Salient objects are objects that tend to attract more human fixations. The detection of salient objects is important in applications such as image collection browsing, image display on small devices, and perceptual compression.

This thesis proposes a novel evaluation framework that analyses the performance of popular existing object proposal generators in detecting the most salient objects. This work also shows that, by incorporating saliency constraints, the number of generated object proposals and thus the computational cost can be decreased significantly for a target true positive detection rate (TPR).

As part of the proposed framework, salient object location ground-truth data, referred to here as salient ground-truth data for short, is generated from the original ground-truth masks of a given object detection dataset; it denotes only the locations of salient objects. It is obtained by first computing a saliency map for the input image and then using this map to assign a saliency score to each object in the image. Objects whose saliency scores are sufficiently high are referred to as salient objects. The detection rates of existing object proposal generators are then analyzed with respect to both the original ground-truth masks and the generated salient ground-truth masks.
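
A minimal sketch of this scoring step, assuming a precomputed saliency map and per-object binary masks; the mean-saliency rule and the threshold value are illustrative assumptions.

```python
import numpy as np

def salient_ground_truth(saliency_map, gt_masks, thresh=0.5):
    """Score each ground-truth object by the mean saliency inside its mask
    and keep the objects whose score clears the threshold."""
    scores = [float(saliency_map[m > 0].mean()) for m in gt_masks]
    salient = [m for m, s in zip(gt_masks, scores) if s > thresh]
    return scores, salient
```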

As part of this work, a salient object detection database with salient ground-truth masks was constructed from the PASCAL VOC 2007 dataset. Not only does this dataset aid in analyzing the performance of existing object detectors for salient object detection, but it also helps in developing new object detection methods and in evaluating their performance in terms of successful detection of salient objects.
Date Created
2019

Subjective and objective evaluation of visual attention models

Description
Visual attention (VA) is the study of mechanisms that allow the human visual system (HVS) to selectively process relevant visual information. This work focuses on the subjective and objective evaluation of computational VA models for the distortion-free case as well as in the presence of image distortions.

Existing VA models are traditionally evaluated by using VA metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Although a considerable number of objective VA metrics exist, no study has validated that these metrics are adequate for the evaluation of VA models. This work constructs a VA Quality (VAQ) Database by subjectively assessing the prediction performance of VA models on distortion-free images. Additionally, shortcomings in existing metrics are discussed through illustrative examples, and a new metric is proposed that uses local weights based on fixation density and overcomes these flaws. The proposed VA metric outperforms all other popular existing metrics in terms of correlation with subjective ratings.
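
As one concrete form such local weighting can take, the sketch below computes a fixation-density-weighted correlation between a predicted saliency map and a fixation density map; this weighted-correlation formulation is an illustrative assumption, not the exact metric proposed in this work.

```python
import numpy as np

def weighted_cc(sal_pred, fix_density, eps=1e-8):
    """Correlation between prediction and fixation density, with each
    pixel weighted by the local fixation density."""
    w = fix_density / (fix_density.sum() + eps)
    mp = (w * sal_pred).sum()
    mf = (w * fix_density).sum()
    cov = (w * (sal_pred - mp) * (fix_density - mf)).sum()
    sp = np.sqrt((w * (sal_pred - mp) ** 2).sum())
    sf = np.sqrt((w * (fix_density - mf) ** 2).sum())
    return float(cov / (sp * sf + eps))
```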

In practice, image quality is affected by a host of factors at several stages of the image processing pipeline, such as acquisition, compression, and transmission. However, no existing study has discussed the subjective and objective evaluation of visual saliency models in the presence of distortion. In this work, a Distortion-based Visual Attention Quality (DVAQ) subjective database is constructed to evaluate the quality of VA maps for images in the presence of distortions. For creating this database, saliency maps obtained from images subjected to various types of distortions, including blur, noise, and compression, at varying levels of severity are rated by human observers in terms of their visual resemblance to the corresponding ground-truth fixation density maps. The performance of traditionally used as well as recently proposed VA metrics is evaluated by correlating their scores with the human subjective ratings. In addition, an objective evaluation of 20 state-of-the-art VA models is performed using the top-performing VA metrics, together with a study of how the VA models' prediction performance changes with different types and levels of distortions.
Date Created
2016

The feasibility of domain specific compilation for spatially programmable architectures

Description
Integrated circuits must be energy efficient. This efficiency affects all aspects of chip design, from the battery life of embedded devices to thermal heating on high-performance servers. As technology scaling slows, future generations of transistors will lack the energy-efficiency gains seen in previous generations, so other sources of energy efficiency will become much more important. Many computations could be executed with extreme energy efficiency but are not, because the platforms they run on are not optimized for efficient execution. ASICs improve energy efficiency by reducing flexibility and leveraging the properties of a specific computation, but they are fixed in function and therefore carry an enormous opportunity cost. FPGAs offer a reconfigurable solution but are about 25x less energy efficient than ASIC implementations. Spatially programmable architectures (SPAs) are similar in design and structure to ASICs and FPGAs but are able to bridge the ASIC-FPGA energy-efficiency gap by trading flexibility for efficiency. However, SPAs are difficult to program because they do not share the same programming model as conventional architectures that execute in time.

This work addresses compiler challenges for a coarse-grained, locally interconnected SPA for domain efficiency (SPADE). A novel SPADE topology, called the wave pipeline, is introduced; it is designed for the image signal processing domain and is both efficient and simple to compile to. A compiler for the wave pipeline is created that solves for maximum energy and area efficiency using low-complexity greedy methods. Together, the wave pipeline topology and compiler allow image signal processing applications to be investigated and experimented with, demonstrating the feasibility of SPADE compilers.
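
In the spirit of the compiler's low-complexity greedy approach, the toy sketch below places a topologically ordered operation list onto successive pipeline stages, opening a new stage whenever the current one fills; the data structures and capacity rule are assumptions for illustration, not the actual SPADE compiler.

```python
def greedy_place(ops, stage_capacities):
    """Map ops (in dataflow order) onto wave-pipeline stages greedily.
    Assumes the pipeline has enough total capacity for all ops."""
    placement, stage, used = {}, 0, 0
    for op in ops:
        if used >= stage_capacities[stage]:
            stage += 1  # advance the wave to the next stage
            used = 0
        placement[op] = stage
        used += 1
    return placement
```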
Date Created
2016

Spatial and multi-temporal visual change detection with application to SAR image analysis

Description
Thousands of high-resolution images are generated each day. Detecting and analyzing variations in these images are key steps in image understanding. This work focuses on spatial and multi-temporal visual change detection and its applications in multi-temporal synthetic aperture radar (SAR) images.

The Canny edge detector is one of the most widely used edge detection algorithms due to its superior performance in terms of signal-to-noise ratio and edge localization, and because it produces only one response to a single edge. In this work, we propose a mechanism to implement the Canny algorithm at the block level without any loss in edge detection performance as compared to the original frame-level Canny algorithm. The resulting block-based algorithm has significantly reduced memory requirements and can achieve significantly reduced latency. Furthermore, the proposed algorithm can be easily integrated with other block-based image processing systems. In addition, quantitative evaluations and subjective tests show that the edge detection performance of the proposed algorithm is better than that of the original frame-based algorithm, especially when noise is present in the images.
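
A bare-bones sketch of block-wise Canny with an overlapping halo so edges near block borders are preserved; it uses fixed global thresholds for simplicity, whereas the proposed method reproduces the frame-level algorithm's behavior, so treat this as an assumption-laden illustration.

```python
import numpy as np
import cv2

def block_canny(img, block=64, halo=8, lo=50, hi=150):
    """Run Canny per block on a grayscale uint8 image, padding each block
    with a halo of neighboring pixels before keeping the inner result."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(0, h, block):
        for x in range(0, w, block):
            y0, x0 = max(y - halo, 0), max(x - halo, 0)
            y1, x1 = min(y + block + halo, h), min(x + block + halo, w)
            edges = cv2.Canny(img[y0:y1, x0:x1], lo, hi)
            out[y:y + block, x:x + block] = edges[y - y0:y - y0 + block,
                                                  x - x0:x - x0 + block]
    return out
```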

In the context of multi-temporal SAR images for earth monitoring applications, one critical issue is the detection of changes occurring after a natural or anthropic disaster. In this work, we propose a novel similarity measure for automatic change detection using a pair of SAR images acquired at different times, and apply it in both the spatial and wavelet domains. This measure is based on the evolution of the local statistics of the image between the two dates. The local statistics are modeled as a Gaussian mixture model (GMM), which is more suitable and flexible for approximating the local distribution of SAR images with distinct land-cover typologies. Tests on real datasets show that the proposed detectors outperform existing methods in terms of the quality of the similarity maps, assessed using receiver operating characteristic (ROC) curves, and in terms of the total error rates of the final change detection maps. Furthermore, we propose a new similarity measure for automatic change detection based on a divisive normalization transform (DNT) in order to reduce computational complexity. Tests show that the proposed DNT-based change detector exhibits competitive detection performance while achieving lower computational complexity than previously suggested methods.
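
An illustrative form of the local-statistics comparison is sketched below: a GMM is fitted to each date's neighborhood around every pixel, and the two models are compared through a symmetric cross-likelihood. The window size, component count, and divergence are assumptions rather than the exact proposed measure, and the pixel-wise loop is written for clarity, not speed.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_change_map(img1, img2, win=7, n_components=2):
    """Local change score between two co-registered SAR images based on
    the evolution of local statistics modeled as GMMs."""
    h, w = img1.shape
    r = win // 2
    score = np.zeros((h, w))
    for y in range(r, h - r):
        for x in range(r, w - r):
            p1 = img1[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, 1)
            p2 = img2[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, 1)
            g1 = GaussianMixture(n_components).fit(p1)
            g2 = GaussianMixture(n_components).fit(p2)
            # symmetric cross-likelihood: high when statistics diverge
            score[y, x] = -0.5 * (g1.score(p2) + g2.score(p1))
    return score
```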
Date Created
2014

Texture structure analysis

Description
Texture analysis plays an important role in applications like automated pattern inspection, image and video compression, content-based image retrieval, remote sensing, medical imaging, and document processing, to name a few. Texture structure analysis is the process of studying the structure present in textures, which can be expressed in terms of perceived regularity. The human visual system (HVS) uses perceived regularity as one of the important pre-attentive cues in low-level image understanding. Similar to the HVS, image processing and computer vision systems can make fast and efficient decisions if they can quantify this regularity automatically. In this work, the problem of quantifying the degree of perceived regularity when looking at an arbitrary texture is introduced and addressed. One key contribution of this work is an objective no-reference perceptual texture regularity metric based on visual saliency. Other key contributions include an adaptive texture synthesis method based on texture regularity, and a low-complexity reduced-reference visual quality metric for assessing the quality of synthesized textures.

In order to use the best-performing visual attention model on textures, the most popular visual attention models are evaluated on their ability to predict visual saliency on textures. Since there is no publicly available database with ground-truth saliency maps on images with exclusively texture content, a new eye-tracking database is systematically built. Using the Visual Saliency Map (VSM) generated by the best visual attention model, the proposed texture regularity metric is computed. The metric is based on the observation that VSM characteristics differ between textures of differing regularity, and it combines two texture regularity scores, namely a textural similarity score and a spatial distribution score. In order to evaluate the performance of the proposed regularity metric, a texture regularity database called RegTEX is built as part of this work. It is shown through subjective testing that the proposed metric has a strong correlation with the Mean Opinion Score (MOS) for the perceived regularity of textures. The proposed method is also shown to be robust to geometric and photometric transformations, and it outperforms some of the popular texture regularity metrics in predicting perceived regularity.

The impact of the proposed metric on improving the performance of many image-processing applications is also presented. The influence of perceived texture regularity on the perceptual quality of synthesized textures is demonstrated by building a synthesized-textures database named SynTEX. It is shown through subjective testing that textures with different degrees of perceived regularity exhibit different degrees of vulnerability to artifacts resulting from different texture synthesis approaches. This work also proposes an algorithm for adaptively selecting the appropriate texture synthesis method based on the perceived regularity of the original texture.

A reduced-reference texture quality metric for texture synthesis is also proposed. The metric is based on the change in perceived regularity and the change in perceived granularity between the original and the synthesized textures, where perceived granularity is quantified through a new granularity metric proposed in this work. It is shown through subjective testing that the proposed quality metric, using just two parameters, has a strong correlation with the MOS for the fidelity of synthesized textures and outperforms state-of-the-art full-reference quality metrics on three different texture databases. Finally, the ability of the proposed regularity metric to predict the perceived degradation of textures due to compression and blur artifacts is also established.
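
A toy sketch of how a saliency-map statistic can be turned into a regularity score: regular textures tend to spread salience evenly over the image, so low dispersion of per-cell saliency mass is mapped to high regularity. This single statistic is an illustrative stand-in; the proposed metric actually combines a textural similarity score and a spatial distribution score.

```python
import numpy as np

def regularity_score(vsm, grid=8):
    """Map the dispersion of saliency mass over a grid of cells to (0, 1];
    vsm is a 2D visual saliency map larger than grid x grid."""
    h, w = vsm.shape
    ch, cw = h // grid, w // grid
    trimmed = vsm[:ch * grid, :cw * grid]
    cells = trimmed.reshape(grid, ch, grid, cw).sum(axis=(1, 3))
    mass = cells / (cells.sum() + 1e-8)  # fraction of salience per cell
    # even spread (low dispersion) -> score near 1; concentrated -> near 0
    return float(1.0 / (1.0 + grid * grid * mass.std()))
```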
Date Created
2014

Automated animal coloration quantification in digital images using dominant colors and skin classification

Description
The origin and function of color in animals has been a subject of great interest for taxonomists and ecologists in recent years. Coloration in animals is useful for many important functions like species identification, camouflage, and understanding evolutionary relationships. Quantitative measurements of color signal and patch size in mammals, birds, and reptiles, to name a few, are strong indicators of sexual selection cues and individual health. These measurements provide valuable insights into the impact of environmental conditions on habitat and breeding. Recent advances in digital cameras and sensors have led to a significant increase in the use of digital photography as a means of color quantification in animals. Although a significant amount of research has been conducted on ways to standardize image acquisition conditions and calibrate cameras for use in animal color quantification, almost no work has been done on designing automated methods for animal color quantification.

This thesis presents a novel perceptual-based framework for the automated extraction and quantification of animal coloration from digital images with slowly varying (almost homogeneous) background colors. The implemented framework uses a combination of several techniques, including color space quantization using a few dominant colors, foreground-background identification, Bayesian classification with Gaussian mixture modeling of the conditional densities, edge-enhanced model-based classification, and Saturation-Brightness quantization, to extract the colored patch. This approach assumes no prior information about the color of either the subject or the background, nor about the position of the subject in the image. The performance of the proposed method is evaluated on the plumage color of wild house finches. Segmentation results obtained using the implemented framework are compared with manually scored results to illustrate the performance of the system, and they show a high correlation with manually scored images. This framework also eliminates common problems in manual scoring of digital images, such as low repeatability and inter-observer error.
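
A compact sketch of the Bayesian classification stage with GMM class-conditional densities, assuming labeled foreground and background pixel samples are available; the component count and equal class priors are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_bayes_color_classifier(fg_pixels, bg_pixels, k=3):
    """Fit GMM class-conditional densities over color values and return a
    pixel classifier; fg_pixels/bg_pixels are (n, 3) arrays of color rows."""
    fg = GaussianMixture(k).fit(fg_pixels)
    bg = GaussianMixture(k).fit(bg_pixels)

    def classify(pixels):
        # with equal priors, the Bayes rule reduces to a likelihood test
        return fg.score_samples(pixels) > bg.score_samples(pixels)

    return classify
```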
Date Created
2013

Three-dimensional morphometric biosignatures of cancer by automated analysis of transmission-mode optical cell CT images

Description
Despite significant advances in digital pathology and automation sciences, current diagnostic practice for cancer detection relies primarily on qualitative manual inspection of tissue architecture and of cell and nuclear morphology in stained biopsies using low-magnification, two-dimensional (2D) brightfield microscopy. The efficacy of this process is limited by inter-operator variations in sample preparation and imaging, and by inter-observer variability in assessment. Over the past few decades, the predictive value of quantitative morphology measurements derived from computerized analysis of micrographs has been compromised by the inability of 2D microscopy to capture information in the third dimension, and by the anisotropic spatial resolution inherent to conventional microscopy techniques that generate volumetric images by stacking 2D optical sections to approximate 3D.

To gain insight into the 3D nature of cells, this dissertation explores the application of a new technology for single-cell optical computed tomography (optical cell CT), a promising 3D tomographic imaging technique that uses visible light absorption to image stained cells individually with sub-micron, isotropic spatial resolution. This dissertation provides a scalable analytical framework to perform fully automated 3D morphological analysis of transmission-mode optical cell CT images of hematoxylin-stained cells. The developed framework performs rapid and accurate quantification of 3D cell and nuclear morphology, facilitates assessment of morphological heterogeneity, and generates shape- and texture-based biosignatures predictive of the cell state. Custom 3D image segmentation methods were developed to precisely delineate volumes of interest (VOIs) from reconstructed cell images; comparison with user-defined ground-truth assessments yielded an average agreement (DICE coefficient) of 94% for the cell and its nucleus. Seventy-nine biologically relevant morphological descriptors (features) were computed from the segmented VOIs, and statistical classification methods were implemented to determine the subset of features that best predicted cell health.

The efficacy of the proposed framework was demonstrated on an in vitro model of multistep carcinogenesis in human Barrett's esophagus (BE), and classifier performance using the 3D morphometric analysis was compared against computerized analysis of 2D image slices reflecting conventional cytological observation. The results enable sensitive and specific nuclear grade classification for early cancer diagnosis and underline the value of the approach as an objective adjunctive tool for better understanding morphological changes associated with malignant transformation.
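
For reference, the DICE agreement reported above is a standard overlap measure and can be computed over binary volumes as follows:

```python
import numpy as np

def dice_coefficient(seg, gt):
    """DICE overlap between a binary segmentation and its ground truth;
    the dissertation reports an average of 94% for cell and nucleus."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(seg, gt).sum()
    return float(2.0 * inter / (seg.sum() + gt.sum() + 1e-8))
```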
Date Created
2013