Understanding legacy workflows through runtime trace analysis

153583-Thumbnail Image.png
Description
When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due

When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack a structured description as provided by specialized management systems, making ad-hoc workflow maintenance and reuse difficult, and motivating the need for analysis methods. The analysis of ad-hoc workflows using compiler techniques does not address dynamic languages - a program has so few constrains that its behavior cannot be predicted. In contrast, workflow provenance tracking has had success using run-time techniques to record data. The aim of this work is to develop a new analysis method for extracting workflow structure at run-time, thus avoiding issues with dynamics.

The method captures the dataflow of an ad-hoc workflow through its execution and abstracts it with a process for simplifying repetition. An instrumentation system first processes the workflow to produce an instrumented version, capable of logging events, which is then executed on an input to produce a trace. The trace undergoes dataflow construction to produce a provenance graph. The dataflow is examined for equivalent regions, which are collected into a single unit. The workflow is thus characterized in terms of its treatment of an input. Unlike other methods, a run-time approach characterizes the workflow's actual behavior; including elements which static analysis cannot predict (for example, code dynamically evaluated based on input parameters). This also enables the characterization of dataflow through external tools.

The contributions of this work are: a run-time method for recording a provenance graph from an ad-hoc Python workflow, and a method to analyze the structure of a workflow from provenance. Methods are implemented in Python and are demonstrated on real world Python workflows. These contributions enable users to derive graph structure from workflows. Empowered by a graphical view, users can better understand a legacy workflow. This makes the wealth of legacy ad-hoc workflows accessible, enabling workflow reuse instead of investing time and resources into creating a workflow.
Date Created
2015
Agent

Using contextual information to improve phishing warning effectiveness

153487-Thumbnail Image.png
Description
Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator

Internet browsers are today capable of warning internet users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations like Google, Phishtank etc. On identifying a Unified Resource Locator (URL) requested by a user as a reported phishing URL, browsers like Mozilla Firefox and Google Chrome display an 'active' warning message in an attempt to stop the user from making a potentially dangerous decision of visiting the website and sharing confidential information like username-password, credit card information, social security number etc.

However, these warnings are not always successful at safeguarding the user from a phishing attack. On several occasions, users ignore these warnings and 'click through' them, eventually landing at the potentially dangerous website and giving away confidential information. Failure to understand the warning, failure to differentiate different types of browser warnings, diminishing trust on browser warnings due to repeated encounter are some of the reasons that make users ignore these warnings. It is important to address these factors in order to eventually improve a user’s reaction to these warnings.

In this thesis, I propose a novel design to improve the effectiveness and reliability of phishing warning messages. This design utilizes the name of the target website that a fake website is mimicking, to display a simple, easy to understand and interactive warning message with the primary objective of keeping the user away from a potentially spoof website.
Date Created
2015
Agent

The design and analysis of hash families for use in broadcast encryption

150895-Thumbnail Image.png
Description
Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption

Broadcast Encryption is the task of cryptographically securing communication in a broadcast environment so that only a dynamically specified subset of subscribers, called the privileged subset, may decrypt the communication. In practical applications, it is desirable for a Broadcast Encryption Scheme (BES) to demonstrate resilience against attacks by colluding, unprivileged subscribers. Minimal Perfect Hash Families (PHFs) have been shown to provide a basis for the construction of memory-efficient t-resilient Key Pre-distribution Schemes (KPSs) from multiple instances of 1-resilient KPSs. Using this technique, the task of constructing a large t-resilient BES is reduced to finding a near-minimal PHF of appropriate parameters. While combinatorial and probabilistic constructions exist for minimal PHFs with certain parameters, the complexity of constructing them in general is currently unknown. This thesis introduces a new type of hash family, called a Scattering Hash Family (ScHF), which is designed to allow for the scalable and ingredient-independent design of memory-efficient BESs for large parameters, specifically resilience and total number of subscribers. A general BES construction using ScHFs is shown, which constructs t-resilient KPSs from other KPSs of any resilience ≤w≤t. In addition to demonstrating how ScHFs can be used to produce BESs , this thesis explores several ScHF construction techniques. The initial technique demonstrates a probabilistic, non-constructive proof of existence for ScHFs . This construction is then derandomized into a direct, polynomial time construction of near-minimal ScHFs using the method of conditional expectations. As an alternative approach to direct construction, representing ScHFs as a k-restriction problem allows for the indirect construction of ScHFs via randomized post-optimization. Using the methods defined, ScHFs are constructed and the parameters' effects on solution size are analyzed. For large strengths, constructive techniques lose significant performance, and as such, asymptotic analysis is performed using the non-constructive existential results. This work concludes with an analysis of the benefits and disadvantages of BESs based on the constructed ScHFs. Due to the novel nature of ScHFs, the results of this analysis are used as the foundation for an empirical comparison between ScHF-based and PHF-based BESs . The primary bases of comparison are construction efficiency, key material requirements, and message transmission overhead.
Date Created
2012
Agent

Client-driven dynamic database updates

150212-Thumbnail Image.png
Description
This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference and intersection. The update process automatically generates a runtime update client from a mapping between the old the new schemas. The solution has been validated by testing it on a relatively small database of around 300,000 records per table and less than 1 Gb, but with limited memory buffer size of 24 Mb. This thesis presents the study of the overhead of the update process as a function of the transaction rates and the batch size used to copy data from the old to the new schema. It shows that the overhead introduced is minimal for medium size applications and that the update can be achieved with no more than one minute of downtime.
Date Created
2011
Agent

UnSync: a soft error resilient redundant CMP architecture

149560-Thumbnail Image.png
Description
Reducing device dimensions, increasing transistor densities, and smaller timing windows, expose the vulnerability of processors to soft errors induced by charge carrying particles. Since these factors are inevitable in the advancement of processor technology, the industry has been forced to

Reducing device dimensions, increasing transistor densities, and smaller timing windows, expose the vulnerability of processors to soft errors induced by charge carrying particles. Since these factors are inevitable in the advancement of processor technology, the industry has been forced to improve reliability on general purpose Chip Multiprocessors (CMPs). With the availability of increased hardware resources, redundancy based techniques are the most promising methods to eradicate soft error failures in CMP systems. This work proposes a novel customizable and redundant CMP architecture (UnSync) that utilizes hardware based detection mechanisms (most of which are readily available in the processor), to reduce overheads during error free executions. In the presence of errors (which are infrequent), the always forward execution enabled recovery mechanism provides for resilience in the system. The inherent nature of UnSync architecture framework supports customization of the redundancy, and thereby provides means to achieve possible performance-reliability trade-offs in many-core systems. This work designs a detailed RTL model of UnSync architecture and performs hardware synthesis to compare the hardware (power/area) overheads incurred. It then compares the same with those of the Reunion technique, a state-of-the-art redundant multi-core architecture. This work also performs cycle-accurate simulations over a wide range of SPEC2000, and MiBench benchmarks to evaluate the performance efficiency achieved over that of the Reunion architecture. Experimental results show that, UnSync architecture reduces power consumption by 34.5% and improves performance by up to 20% with 13.3% less area overhead, when compared to Reunion architecture for the same level of reliability achieved.
Date Created
2011
Agent