Categorization of Phishing Detection Features And Using the Feature Vectors to Classify Phishing Websites

155726-Thumbnail Image.png
Description
Phishing is a form of online fraud where a spoofed website tries to gain access to user's sensitive information by tricking the user into believing that it is a benign website. There are several solutions to detect phishing attacks such

Phishing is a form of online fraud where a spoofed website tries to gain access to user's sensitive information by tricking the user into believing that it is a benign website. There are several solutions to detect phishing attacks such as educating users, using blacklists or extracting phishing characteristics found to exist in phishing attacks. In this thesis, we analyze approaches that extract features from phishing websites and train classification models with extracted feature set to classify phishing websites. We create an exhaustive list of all features used in these approaches and categorize them into 6 broader categories and 33 finer categories. We extract 59 features from the URL, URL redirects, hosting domain (WHOIS and DNS records) and popularity of the website and analyze their robustness in classifying a phishing website. Our emphasis is on determining the predictive performance of robust features. We evaluate the classification accuracy when using the entire feature set and when URL features or site popularity features are excluded from the feature set and show how our approach can be used to effectively predict specific types of phishing attacks such as shortened URLs and randomized URLs. Using both decision table classifiers and neural network classifiers, our results indicate that robust features seem to have enough predictive power to be used in practice.
Date Created
2017
Agent

A Study of Online Security Practices

Description
Data from a total of 282 online web applications was collected, and accounts for 230 of those web applications were created in order to gather data about authentication practices, multistep authentication practices, security question practices, fallback authentication practices, and other

Data from a total of 282 online web applications was collected, and accounts for 230 of those web applications were created in order to gather data about authentication practices, multistep authentication practices, security question practices, fallback authentication practices, and other security practices for online accounts. The account creation and data collection was done between June 2016 and April 2017. The password strengths for online accounts were analyzed and password strength data was compared to existing data. Security questions used by online accounts were evaluated for security and usability, and fallback authentication practices were assessed based on their adherence to best practices. Alternative authentication schemes were examined, and other security considerations such as use of HTTPS and CAPTCHAs were explored. Based on existing data, password policies require stronger passwords in for web applications in 2017 compared to the requirements in 2010. Nevertheless, password policies for many accounts are still not adequate. About a quarter of online web applications examined use security questions, and many of the questions have usability and security concerns. Security mechanisms such as HTTPS and continuous authentication are in general not used in conjunction with security questions for most web applications, which reduces the overall security of the web application. A majority of web applications use email addresses as the login credential and the password recovery credential and do not follow best practices. About a quarter of accounts use multistep authentication and a quarter of accounts employ continuous authentication, yet most accounts fail to combine security measures for defense in depth. The overall conclusion is that some online web applications are using secure practices; however, a majority of online web applications fail to properly implement and utilize secure practices.
Date Created
2017
Agent

Pingo: A Framework for the Management of Storage of Intermediate Outputs of Computational Workflows

155634-Thumbnail Image.png
Description
Scientific workflows allow scientists to easily model and express the entire data processing steps, typically as a directed acyclic graph (DAG). These scientific workflows are made of a collection of tasks that usually take a long time to compute and

Scientific workflows allow scientists to easily model and express the entire data processing steps, typically as a directed acyclic graph (DAG). These scientific workflows are made of a collection of tasks that usually take a long time to compute and that produce a considerable amount of intermediate datasets. Because of the nature of scientific exploration, a scientific workflow can be modified and re-run multiple times, or new scientific workflows are created that might make use of past intermediate datasets. Storing intermediate datasets has the potential to save time in computations. Since storage is limited, one main problem that needs a solution is determining which intermediate datasets need to be saved at creation time in order to minimize the computational time of the workflows to be run in the future. This research thesis proposes the design and implementation of Pingo, a system that is capable of managing the computations of scientific workflows as well as the storage, provenance and deletion of intermediate datasets. Pingo uses the history of workflows submitted to the system to predict the most likely datasets to be needed in the future, and subjects the decision of dataset deletion to the optimization of the computational time of future workflows.
Date Created
2017
Agent

A non-consensus based decentralized financial transaction processing model with support for efficient auditing

154792-Thumbnail Image.png
Description
The success of Bitcoin has generated significant interest in the financial community to understand whether the technological underpinnings of the cryptocurrency paradigm can be leveraged to improve the efficiency of financial processes in the existing infrastructure. Various alternative proposals, most

The success of Bitcoin has generated significant interest in the financial community to understand whether the technological underpinnings of the cryptocurrency paradigm can be leveraged to improve the efficiency of financial processes in the existing infrastructure. Various alternative proposals, most notably, Ripple and Ethereum, aim to provide solutions to the financial community in different ways. These proposals derive their security guarantees from either the computational hardness of proof-of-work or voting based distributed consensus mechanism, both of which can be computationally expensive. Furthermore, the financial audit requirements for a participating financial institutions have not been suitably addressed.

This thesis presents a novel approach of constructing a non-consensus based decentralized financial transaction processing model with a built-in efficient audit structure. The problem of decentralized inter-bank payment processing is used for the model design. The two key insights used in this work are (1) to utilize a majority signature based replicated storage protocol for transaction authorization, and (2) to construct individual self-verifiable audit trails for each node as opposed to a common Blockchain. Theoretical analysis shows that the model provides cryptographic security for transaction processing and the presented audit structure facilitates financial auditing of individual nodes in time independent of the number of transactions.
Date Created
2016
Agent

Fixed verse generation using neural word embeddings

154765-Thumbnail Image.png
Description
For the past three decades, the design of an effective strategy for generating poetry that matches that of a human’s creative capabilities and complexities has been an elusive goal in artificial intelligence (AI) and natural language generation (NLG) research, and

For the past three decades, the design of an effective strategy for generating poetry that matches that of a human’s creative capabilities and complexities has been an elusive goal in artificial intelligence (AI) and natural language generation (NLG) research, and among linguistic creativity researchers in particular. This thesis presents a novel approach to fixed verse poetry generation using neural word embeddings. During the course of generation, a two layered poetry classifier is developed. The first layer uses a lexicon based method to classify poems into types based on form and structure, and the second layer uses a supervised classification method to classify poems into subtypes based on content with an accuracy of 92%. The system then uses a two-layer neural network to generate poetry based on word similarities and word movements in a 50-dimensional vector space.

The verses generated by the system are evaluated using rhyme, rhythm, syllable counts and stress patterns. These computational features of language are considered for generating haikus, limericks and iambic pentameter verses. The generated poems are evaluated using a Turing test on both experts and non-experts. The user study finds that only 38% computer generated poems were correctly identified by nonexperts while 65% of the computer generated poems were correctly identified by experts. Although the system does not pass the Turing test, the results from the Turing test suggest an improvement of over 17% when compared to previous methods which use Turing tests to evaluate poetry generators.
Date Created
2016
Agent

GemV a validated micro-architecture vulnerability estimation tool

154657-Thumbnail Image.png
Description
Several decades of transistor technology scaling has brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without

Several decades of transistor technology scaling has brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without accurate and quantitative estimation of system reliability. Vulnerability -- a metric that defines the probability of system-failure (reliability) through analytical models -- is the most effective mechanism for our current estimation and early design space exploration needs. Previous vulnerability estimation tools are based around the Sim-Alpha simulator which has been to shown to have several limitations. In this thesis, I present gemV: an accurate and comprehensive vulnerability estimation tool based on gem5. Gem5 is a popular cycle-accurate micro-architectural simulator that can model several different processor models in close to real hardware form. GemV can be used for fast and early design space exploration and also evaluate the protection afforded by commodity processors. gemV is comprehensive, since it models almost all sequential components of the processor. gemV is accurate because of fine-grain vulnerability tracking, accurate vulnerability modeling of squashed instructions, and accurate vulnerability modeling of shared data structures in gem5. gemV has been thoroughly validated against extensive fault injection experiments and achieves a 97\% accuracy with 95\% confidence. A micro-architect can use gemV to discover micro-architectural variants of a processor that minimize vulnerability for allowed performance penalty. A software developer can use gemV to explore the performance-vulnerability trade-off by choosing different algorithms and compiler optimizations, while the system designer can use gemV to explore the performance-vulnerability trade-offs of choosing different Insruction Set Architectures (ISA).
Date Created
2016
Agent

Answer set programming modulo theories

154648-Thumbnail Image.png
Description
Knowledge representation and reasoning is a prominent subject of study within the field of artificial intelligence that is concerned with the symbolic representation of knowledge in such a way to facilitate automated reasoning about this knowledge. Often in real-world domains,

Knowledge representation and reasoning is a prominent subject of study within the field of artificial intelligence that is concerned with the symbolic representation of knowledge in such a way to facilitate automated reasoning about this knowledge. Often in real-world domains, it is necessary to perform defeasible reasoning when representing default behaviors of systems. Answer Set Programming is a widely-used knowledge representation framework that is well-suited for such reasoning tasks and has been successfully applied to practical domains due to efficient computation through grounding--a process that replaces variables with variable-free terms--and propositional solvers similar to SAT solvers. However, some domains provide a challenge for grounding-based methods such as domains requiring reasoning about continuous time or resources.

To address these domains, there have been several proposals to achieve efficiency through loose integrations with efficient declarative solvers such as constraint solvers or satisfiability modulo theories solvers. While these approaches successfully avoid substantial grounding, due to the loose integration, they are not suitable for performing defeasible reasoning on functions. As a result, this expressive reasoning on functions must either be performed using predicates to simulate the functions or in a way that is not elaboration tolerant. Neither compromise is reasonable; the former suffers from the grounding bottleneck when domains are large as is often the case in real-world domains while the latter necessitates encodings to be non-trivially modified for elaborations.

This dissertation presents a novel framework called Answer Set Programming Modulo Theories (ASPMT) that is a tight integration of the stable model semantics and satisfiability modulo theories. This framework both supports defeasible reasoning about functions and alleviates the grounding bottleneck. Combining the strengths of Answer Set Programming and satisfiability modulo theories enables efficient continuous reasoning while still supporting rich reasoning features such as reasoning about defaults and reasoning in domains with incomplete knowledge. This framework is realized in two prototype implementations called MVSM and ASPMT2SMT, and the latter was recently incorporated into a non-monotonic spatial reasoning system. To define the semantics of this framework, we extend the first-order stable model semantics by Ferraris, Lee and Lifschitz to allow "intensional functions" and provide analyses of the theoretical properties of this new formalism and on the relationships between this and existing approaches.
Date Created
2016
Agent

Development and analysis of stochastic boundary coverage strategies for multi-robot systems

154497-Thumbnail Image.png
Description
Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require

Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions or features of interest and achieve target objectives that derive from their resulting spatial configurations, such as forming a connected communication network or acquiring sensor data around the entire boundary. We refer to this spatial allocation problem as boundary coverage. Possible swarm tasks that will involve boundary coverage include cooperative load manipulation for applications in construction, manufacturing, and disaster response.

In this work, I address the challenges of controlling a swarm of resource-constrained robots to achieve boundary coverage, which I refer to as the problem of stochastic boundary coverage. I first examined an instance of this behavior in the biological phenomenon of group food retrieval by desert ants, and developed a hybrid dynamical system model of this process from experimental data. Subsequently, with the aid of collaborators, I used a continuum abstraction of swarm population dynamics, adapted from a modeling framework used in chemical kinetics, to derive stochastic robot control policies that drive a swarm to target steady-state allocations around multiple boundaries in a way that is robust to environmental variations.

Next, I determined the statistical properties of the random graph that is formed by a group of robots, each with the same capabilities, that have attached to a boundary at random locations. I also computed the probability density functions (pdfs) of the robot positions and inter-robot distances for this case.

I then extended this analysis to cases in which the robots have heterogeneous communication/sensing radii and attach to a boundary according to non-uniform, non-identical pdfs. I proved that these more general coverage strategies generate random graphs whose probability of connectivity is Sharp-P Hard to compute. Finally, I investigated possible approaches to validating our boundary coverage strategies in multi-robot simulations with realistic Wi-fi communication.
Date Created
2016
Agent

A network function virtualization based load balancer for TCP

154142-Thumbnail Image.png
Description
A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified

A load balancer is an essential part of many network systems. A load balancer is capable of dividing and redistributing incoming network traffic to different back end servers, thus improving reliability and performance. Existing load balancing solutions can be classified into two categories: hardware-based or software-based. Hardware-based load balancing systems are hard to manage and force network administrators to scale up (replacing with more powerful but expensive hardware) when their system can not handle the growing traffic. Software-based solutions have a limitation when dealing with a single large TCP flow. In recent years, with the fast developments of virtualization technology, a new trend of network function virtualization (NFV) is being adopted. Instead of using proprietary hardware, an NFV network infrastructure uses virtual machines running to implement network functions such as load balancers, firewalls, etc. In this thesis, a new load balancing system is designed and evaluated. This system is high performance and flexible. It can fully utilize the bandwidth between a load balancer and back end servers compared to traditional load balancers such as HAProxy. The experimental results show that using this NFV load balancer could have $n$ ($n$ is the number of back end servers) times better performance than HAProxy. Also, an extract, transform and load (ETL) application was implemented to demonstrate that this load balancer can shorten data load time. The experiment shows that when loading a large data set (18.3GB), our load balancer needs only 28\% less time than traditional load balancer.
Date Created
2015
Agent

Graphical representations of security settings in Android

Description
On Android, existing security procedures require apps to request permissions for access to sensitive resources.

Only when the user approves the requested permissions will the app be installed.

However, permissions are an incomplete security mechanism.

In addition to a user's limited understanding of

On Android, existing security procedures require apps to request permissions for access to sensitive resources.

Only when the user approves the requested permissions will the app be installed.

However, permissions are an incomplete security mechanism.

In addition to a user's limited understanding of permissions, the mechanism does not account for the possibility that different permissions used together have the ability to be more dangerous than any single permission alone.

Even if users did understand the nature of an app's requested permissions, this mechanism is still not enough to guarantee that a user's information is protected.

Applications can potentially send or receive sensitive information from other applications without the required permissions by using intents.

In other words, applications can potentially collaborate in ways unforeseen by the user, even if the user understands the permissions of each app independently.

In this thesis, we present several graph-based approaches to address these issues.

We determine the permissions of an app and generate scores based on our assigned value of certain resources.

We analyze these scores overall, as well as in the context of the app's category as determined by Google Play.

We show that these scores can be used to identify overzealous apps, as well as apps that do not properly fit within their category.

We analyze potential interactions between different applications using intents, and identify several promiscuous apps with low permission scores, showing that permissions alone are not sufficient to evaluate the security risks of an app.

Our analyses can form the basis of a system to assist users in identifying apps that can potentially compromise user privacy.
Date Created
2015
Agent