Femiani, John

Minimizing Dataset Size Requirements for Machine Learning

Description

Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification is mostly labeled, which is difficult to obtain. The dataset requires both high costs and effort to accurately label the data into different classes. With abundance of data, it becomes necessary that all the data should be labeled for its proper utilization and this work focuses on reducing the labeling effort for large dataset. The thesis presents a comparison of different classifiers performance to test if small set of labeled data can be utilized to build accurate models for high prediction rate. The use of small dataset for classification is then extended to active machine learning methodology where, first a one class classifier will predict the outliers in the data and then the outlier samples are added to a training set for support vector machine classifier for labeling the unlabeled data. The labeling of dataset can be scaled up to avoid manual labeling and building more robust machine learning methodologies.

Date Created

2017

Agent

Author (aut): Batra, Salil
Thesis advisor (ths): Femiani, John
Thesis advisor (ths): Amresh, Ashish
Committee member: Bansal, Ajay
Publisher (pbl): Arizona State University

Feature Adaptive Ray Tracing of Subdivision Surfaces

Description

Subdivision surfaces have gained more and more traction since it became the standard surface representation in the movie industry for many years. And Catmull-Clark subdivision scheme is the most popular one for handling polygonal meshes. After its introduction, Catmull-Clark surfaces have been extended to several eminent ways, including the handling of boundaries, infinitely sharp creases, semi-sharp creases, and hierarchically defined detail. For ray tracing of subdivision surfaces, a common way is to construct spatial bounding volume hierarchies on top of input control mesh. However, a high-level refined subdivision surface not only requires a substantial amount of memory storage, but also causes slow and inefficient ray tracing. In this thesis, it presents a new way to improve the efficiency of ray tracing of subdivision surfaces, while the quality is not as good as general methods.

Date Created

2017

Agent

Author (aut): Ke, Shujian
Thesis advisor (ths): Amresh, Ashish
Committee member: Femiani, John
Committee member: Gonzalez-Sanchez, Javier
Publisher (pbl): Arizona State University

Land use and land cover classification using deep learning techniques

Description

Large datasets of sub-meter aerial imagery represented as orthophoto mosaics are widely available today, and these data sets may hold a great deal of untapped information. This imagery has a potential to locate several types of features; for example, forests, parking lots, airports, residential areas, or freeways in the imagery. However, the appearances of these things vary based on many things including the time that the image is captured, the sensor settings, processing done to rectify the image, and the geographical and cultural context of the region captured by the image. This thesis explores the use of deep convolutional neural networks to classify land use from very high spatial resolution (VHR), orthorectified, visible band multispectral imagery. Recent technological and commercial applications have driven the collection a massive amount of VHR images in the visible red, green, blue (RGB) spectral bands, this work explores the potential for deep learning algorithms to exploit this imagery for automatic land use/ land cover (LULC) classification. The benefits of automatic visible band VHR LULC classifications may include applications such as automatic change detection or mapping. Recent work has shown the potential of Deep Learning approaches for land use classification; however, this thesis improves on the state-of-the-art by applying additional dataset augmenting approaches that are well suited for geospatial data. Furthermore, the generalizability of the classifiers is tested by extensively evaluating the classifiers on unseen datasets and we present the accuracy levels of the classifier in order to show that the results actually generalize beyond the small benchmarks used in training. Deep networks have many parameters, and therefore they are often built with very large sets of labeled data. Suitably large datasets for LULC are not easy to come by, but techniques such as refinement learning allow networks trained for one task to be retrained to perform another recognition task. Contributions of this thesis include demonstrating that deep networks trained for image recognition in one task (ImageNet) can be efficiently transferred to remote sensing applications and perform as well or better than manually crafted classifiers without requiring massive training data sets. This is demonstrated on the UC Merced dataset, where 96% mean accuracy is achieved using a CNN (Convolutional Neural Network) and 5-fold cross validation. These results are further tested on unrelated VHR images at the same resolution as the training set.

Date Created

2016

Agent

Author (aut): Uba, Nagesh Kumar
Thesis advisor (ths): Femiani, John
Committee member: Razdan, Anshuman
Committee member: Amresh, Ashish
Publisher (pbl): Arizona State University

An adaptive time reduction technique for video lectures

Description

Lecture videos are a widely used resource for learning. A simple way to create

videos is to record live lectures, but these videos end up being lengthy, include long

pauses and repetitive words making the viewing experience time consuming. While

pauses are useful in live learning environments where students take notes, I question

the value of pauses in video lectures. Techniques and algorithms that can shorten such

videos can have a huge impact in saving students’ time and reducing storage space.

I study this problem of shortening videos by removing long pauses and adaptively

modifying the playback rate by emphasizing the most important sections of the video

and its effect on the student community. The playback rate is designed in such a

way to play uneventful sections faster and significant sections slower. Important and

unimportant sections of a video are identified using textual analysis. I use an existing

speech-to-text algorithm to extract the transcript and apply latent semantic analysis

and standard information retrieval techniques to identify the relevant segments of

the video. I compute relevance scores of different segments and propose a variable

playback rate for each of these segments. The aim is to reduce the amount of time

students spend on passive learning while watching videos without harming their ability

to follow the lecture. I validate the approach by conducting a user study among

computer science students and measuring their engagement. The results indicate

no significant difference in their engagement when this method is compared to the

original unedited video.

Date Created

2016

Agent

Author (aut): Purushothama Shenoy, Sreenivas
Thesis advisor (ths): Amresh, Ashish
Committee member: Femiani, John
Committee member: Walker, Erin
Publisher (pbl): Arizona State University

Vectorization in analyzing 2D/3D data

Description

Vectorization is an important process in the fields of graphics and image processing. In computer-aided design (CAD), drawings are scanned, vectorized and written as CAD files in a process called paper-to-CAD conversion or drawing conversion. In geographic information systems (GIS), satellite or aerial images are vectorized to create maps. In graphic design and photography, raster graphics can be vectorized for easier usage and resizing. Vector arts are popular as online contents. Vectorization takes raster images, point clouds, or a series of scattered data samples in space, outputs graphic elements of various types including points, lines, curves, polygons, parametric curves and surface patches. The vectorized representations consist of a different set of components and elements from that of the inputs. The change of representation is the key difference between vectorization and practices such as smoothing and filtering. Compared to the inputs, the vector outputs provide higher order of control and attributes such as smoothness. Their curvatures or gradients at the points are scale invariant and they are more robust data sources for downstream applications and analysis. This dissertation explores and broadens the scope of vectorization in various contexts. I propose a novel vectorization algorithm on raster images along with several new applications for vectorization mechanism in processing and analysing both 2D and 3D data sets. The main components of the research are: using vectorization in generating 3D models from 2D floor plans; a novel raster image vectorization methods and its applications in computer vision, image processing, and animation; and vectorization in visualizing and information extraction in 3D laser scan data. I also apply vectorization analysis towards human body scans and rock surface scans to show insights otherwise difficult to obtain.

Date Created

2016

Agent

Author (aut): Yin, Xuetao
Thesis advisor (ths): Razdan, Anshuman
Committee member: Wonka, Peter
Committee member: Femiani, John
Committee member: Maciejewski, Ross
Publisher (pbl): Arizona State University

In-vehicle multimodal interaction: an approach to mitigate driver distraction

Description

Despite the various driver assistance systems and electronics, the threat to life of driver, passengers and other people on the road still persists. With the growth in technology, the use of in-vehicle devices with a plethora of buttons and features is increasing resulting in increased distraction. Recently, speech recognition has emerged as an alternative to distraction and has the potential to be beneficial. However, considering the fact that automotive environment is dynamic and noisy in nature, distraction may not arise from the manual interaction, but due to the cognitive load. Hence, speech recognition certainly cannot be a reliable mode of communication.

The thesis is focused on proposing a simultaneous multimodal approach for designing interface between driver and vehicle with a goal to enable the driver to be more attentive to the driving tasks and spend less time fiddling with distractive tasks. By analyzing the human-human multimodal interaction techniques, new modes have been identified and experimented, especially suitable for the automotive context. The identified modes are touch, speech, graphics, voice-tip and text-tip. The multiple modes are intended to work collectively to make the interaction more intuitive and natural. In order to obtain a minimalist user-centered design for the center stack, various design principles such as 80/20 rule, contour bias, affordance, flexibility-usability trade-off etc. have been implemented on the prototypes. The prototype was developed using the Dragon software development kit on android platform for speech recognition.

In the present study, the driver behavior was investigated in an experiment conducted on the DriveSafety driving simulator DS-600s. Twelve volunteers drove the simulator under two conditions: (1) accessing the center stack applications using touch only and (2) accessing the applications using speech with offered text-tip. The duration for which user looked away from the road (eyes-off-road) was measured manually for each scenario. Comparison of results proved that eyes-off-road time is less for the second scenario. The minimalist design with 8-10 icons per screen proved to be effective as all the readings were within the driver distraction recommendations (eyes-off-road time < 2sec per screen) defined by NHTSA.

Date Created

2015

Agent

Author (aut): Mittal, Richa
Thesis advisor (ths): Gaffar, Ashraf
Committee member: Femiani, John
Committee member: Gray, Robert
Publisher (pbl): Arizona State University

3D rooftop detection and modeling using orthographic aerial images

Description

Detection of extruded features like rooftops and trees in aerial images automatically is a very active area of research. Elevated features identified from aerial imagery have potential applications in urban planning, identifying cover in military training or flight training. Detection of such features using commonly available geospatial data like orthographic aerial imagery is very challenging because rooftop and tree textures are often camouflaged by similar looking features like roads, ground and grass. So, additonal data such as LIDAR, multispectral imagery and multiple viewpoints are exploited for more accurate detection. However, such data is often not available, or may be improperly registered or inacurate. In this thesis, we discuss a novel framework that only uses orthographic images for detection and modeling of rooftops. A segmentation scheme that initializes by assigning either foreground (rooftop) or background labels to certain pixels in the image based on shadows is proposed. Then it employs grabcut to assign one of those two labels to the rest of the pixels based on initial labeling. Parametric model fitting is performed on the segmented results in order to create a 3D scene and to facilitate roof-shape and height estimation. The framework can also benefit from additional geospatial data such as streetmaps and LIDAR, if available.

Date Created

2013

Agent

Author (aut): Khanna, Kunal
Thesis advisor (ths): Femiani, John
Thesis advisor (ths): Wonka, Peter
Committee member: Razdan, Anshuman
Committee member: Maciejewski, Ross
Publisher (pbl): Arizona State University

Lighting prediction and simulation in large nighttime urban scenes

Description

Night vision goggles (NVGs) are widely used by helicopter pilots for flight missions at night, but the equipment can present visually confusing images especially in urban areas. A simulation tool with realistic nighttime urban images would help pilots practice and train for flight with NVGs. However, there is a lack of tools for visualizing urban areas at night. This is mainly due to difficulties in gathering the light system data, placing the light systems at suitable locations, and rendering millions of lights with complex light intensity distributions (LID). Unlike daytime images, a city can have millions of light sources at night, including street lights, illuminated signs, and light shed from building interiors through windows. In this paper, a Procedural Lighting tool (PL), which predicts the positions and properties of street lights, is presented. The PL tool is used to accomplish three aims: (1) to generate vector data layers for geographic information systems (GIS) with statistically estimated information on lighting designs for streets, as well as the locations, orientations, and models for millions of streetlights; (2) to generate geo-referenced raster data to suitable for use as light maps that cover a large scale urban area so that the effect of millions of street light can be accurately rendered at real time, and (3) to extend existing 3D models by generating detailed light-maps that can be used as UV-mapped textures to render the model. An interactive graphical user interface (GUI) for configuring and previewing lights from a Light System Database (LDB) is also presented. The GUI includes physically accurate information about LID and also the lights' spectral power distributions (SPDs) so that a light-map can be generated for use with any sensor if the sensors luminosity function is known. Finally, for areas where more detail is required, a tool has been developed for editing and visualizing light effects over a 3D building from many light sources including area lights and windows. The above components are integrated in the PL tool to produce a night time urban view for not only a large-scale area but also a detail of a city building.

Date Created

2011

Agent

Author (aut): Chuang, Chia-Yuan
Thesis advisor (ths): Femiani, John
Committee member: Razdan, Anshuman
Committee member: Amresh, Ashish
Publisher (pbl): Arizona State University

CyberCog a synthetic task environment for measuring cyber situation awareness

Description

This thesis describes a synthetic task environment, CyberCog, created for the purposes of 1) understanding and measuring individual and team situation awareness in the context of a cyber security defense task and 2) providing a context for evaluating algorithms, visualizations, and other interventions that are intended to improve cyber situation awareness. CyberCog provides an interactive environment for conducting human-in-loop experiments in which the participants of the experiment perform the tasks of a cyber security defense analyst in response to a cyber-attack scenario. CyberCog generates the necessary performance measures and interaction logs needed for measuring individual and team cyber situation awareness. Moreover, the CyberCog environment provides good experimental control for conducting effective situation awareness studies while retaining realism in the scenario and in the tasks performed.

Date Created

2011

Agent

Author (aut): Rajivan, Prashanth
Thesis advisor (ths): Femiani, John
Thesis advisor (ths): Cooke, Nancy J.
Committee member: Lindquist, Timothy
Committee member: Gary, Kevin
Publisher (pbl): Arizona State University

Subscribe to Femiani, John