Analysis and Management of Security State for Large-Scale Data Center Networks

Description
With the increasing complexity of computing systems and the rise in the number of risks and vulnerabilities, it is necessary to provide a scalable security situation awareness tool to assist the system administrator in protecting critical assets and managing the security state of the system. There are many methods for analyzing and managing security state: for instance, a firewall can be used to manage the security state, and graphical analysis tools such as attack graphs can be used for analysis.

Attack graphs are powerful graphical security analysis tools because they provide a visual representation of all possible attack scenarios that an attacker may take to exploit system vulnerabilities. Scalability, however, is a major concern, since enumerating all possible attack scenarios is considered an NP-complete problem. Many research efforts have tried to come up with a scalable solution for the attack graph. Nevertheless, no practical attack-graph-based solution has been used in practice for real-time security analysis.

In this thesis, a new framework, namely the 3S (Scalable Security States) analysis framework, is proposed. It presents a new approach that uses Software-Defined Networking (SDN)-based distributed firewall capabilities and the concept of a stateful data plane to construct scalable attack graphs in near real time, which makes attack graphs practical for real-time security decisions. The goal of the proposed work is to control reachability information between different data center segments in order to reduce the dependencies among vulnerabilities and restrict attack graph analysis to a relatively small scope. The proposed framework relies on SDN's programmable capabilities to adjust the distributed firewall policies dynamically according to the security situation at run time. It applies whitelist-based security policies that limit the attacker's ability to move between or exploit different segments by allowing only uni-directional vulnerability dependency links between segments. Specifically, several test cases with various attack scenarios are presented, along with an analysis of how the distributed firewall and the stateful SDN data plane can significantly reduce the effort of constructing and analyzing security states. The proposed approach achieved an improvement of over 61% in comparison with prior modules where SDN and a distributed firewall are not in use.
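
The scoping effect described in this abstract can be illustrated with a toy model. The sketch below is not the 3S framework itself: the segment names, host names, and whitelist are hypothetical, and each ordered pair of reachable hosts is simply assumed to stand for one potential vulnerability dependency link. It compares a flat network with a whitelist that permits only one-directional links between segments.

```python
from itertools import permutations

# Hypothetical data center: segment name -> hosts in that segment.
segments = {
    "web": ["w1", "w2"],
    "app": ["a1", "a2"],
    "db": ["d1", "d2"],
}

# Hypothetical whitelist of allowed inter-segment directions (uni-directional).
whitelist = {("web", "app"), ("app", "db")}


def dependency_links(segments, allowed=None):
    """Return ordered host pairs (src, dst) where src can reach dst.

    allowed=None models a flat network with full reachability.
    Otherwise a host reaches peers in its own segment plus hosts in
    segments that the whitelist permits as a destination.
    """
    seg_of = {h: s for s, hosts in segments.items() for h in hosts}
    links = set()
    for src, dst in permutations(seg_of, 2):
        same_segment = seg_of[src] == seg_of[dst]
        if allowed is None or same_segment or (seg_of[src], seg_of[dst]) in allowed:
            links.add((src, dst))
    return links


flat = dependency_links(segments)
scoped = dependency_links(segments, whitelist)
print(len(flat), len(scoped))
```

In this toy setup the whitelist removes every link that points back toward an earlier segment, so an attack graph built over the scoped links has fewer edges to enumerate.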
Date Created
2018

Understanding Hacking-as-a-Service Markets

Description
An examination of 12 darkweb sites involved in selling hacking services, often referred to as "Hacking-as-a-Service" (HaaS) sites, is performed. Data is gathered and analyzed for 7 months via weekly site crawling and parsing. In this empirical study, after examining over 200 forum threads, common categories of services available on HaaS sites are identified, as well as their associated topics of conversation. Some of the most common hacking service categories in the HaaS market include Social Media, Database, and Phone hacking. These types of services are the most commonly advertised, found on over 50% of all HaaS sites, while services related to Malware and Ransomware are advertised on less than 30% of these sites. Additionally, an analysis is performed on the prices of these services along with their volume of demand, and the prices listed in posts seeking services are compared with those on sites selling services. It is observed that individuals looking to hire hackers for these services are offering to pay premium prices: on average, 73% more than what the individual hackers are requesting on their own sites. Overall, this study provides insights into illicit markets for contract-based hacking, especially with regard to services such as social media hacking, email breaches, and website defacement.
Date Created
2018

Reasoning about Cyber Threat Actors

Description
Reasoning about the activities of cyber threat actors is critical to defending against cyber attacks. However, this task is difficult for a variety of reasons. In simple terms, it is difficult to determine who the attacker is, what the attacker's goals are, and how they will carry out their attacks. These three questions essentially entail understanding the attacker's use of deception, the capabilities available, and the intent of launching the attack. These three issues are highly inter-related. If an adversary can hide their intent, they can better deceive a defender. If an adversary's capabilities are not well understood, then determining what their goals are becomes difficult, as the defender is uncertain whether they have the necessary tools to accomplish them. However, the understanding of these aspects is also mutually supportive. If we have a clear picture of capabilities, intent can be better deciphered. If we understand intent and capabilities, a defender may be able to see through deception schemes.

In this dissertation, I present three pieces of work that tackle these questions to obtain a better understanding of cyber threats. First, we introduce a new reasoning framework to address deception. We evaluate the framework by building a dataset from a DEFCON capture-the-flag exercise to identify the person or group responsible for a cyber attack. We demonstrate that the framework not only handles cases of deception but also provides transparent decision making in identifying the threat actor. The second task uses a cognitive learning model to determine the intent, that is, the goals of the threat actor on the target system. The third task looks at understanding the capabilities of threat actors to target systems by identifying at-risk systems from hacker discussions on darkweb websites. To achieve this task, we gather discussions from more than 300 darkweb websites relating to malicious hacking.
Date Created
2018

Identifying Financial Frauds on Darkweb

Description
Data breaches have been on the rise, and the financial sector is among the most targeted. It can take from a few months up to a few years to identify the occurrence of a data breach. A major motivation behind data breaches is financial gain; hence most of the data ends up on sale on darkweb websites. It is important to identify the sale of such stolen information in a timely and relevant manner. In this research, we present a system for timely identification of the sale of stolen data on darkweb websites. We frame identifying the sale of stolen data as a multi-label classification problem and leverage several machine learning approaches based on the thread content (textual) and social network analysis of the user communication seen on darkweb websites. The system generates alerts about trends based on popularity among the users of such websites. We evaluate our system using K-fold cross-validation as well as manual evaluation of blind (unseen) data. The method of combining social network and textual features outperforms the baseline method, i.e., using only textual features, by 15 to 20% in precision. The alerts provide good insight, and we illustrate our findings with case studies of the results.
Date Created
2018

Multi-class and Multi-label Classification of Darkweb Data

Description
In this research, I try to solve a multi-class, multi-label classification problem, where the goal is to automatically assign one or more labels (tags) to discussion topics seen on the deepweb. I observed a natural hierarchy in our dataset, and I used different techniques to enforce a hierarchical integrity constraint on the predicted tag list. To address the class imbalance and scarcity of labeled data problems, I developed a semi-supervised model based on the Elasticsearch (ES) document relevance score. I evaluate our models using the standard K-fold cross-validation method. Enforcing hierarchical integrity constraints improved the F1 score by 11.9% over standard supervised learning, while our ES-based semi-supervised learning model outperformed other models in terms of precision (78.4%) while maintaining a comparable recall (21%).
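
One plausible reading of a hierarchical integrity constraint is that a predicted tag should never appear without its ancestors. The sketch below illustrates that reading only; the tag names and the `PARENT` mapping are hypothetical, and the abstract does not say which of the author's techniques works this way.

```python
# Hypothetical tag hierarchy: child tag -> parent tag (None for roots).
PARENT = {
    "malware": None,
    "ransomware": "malware",
    "keylogger": "malware",
    "carding": None,
    "dumps": "carding",
}


def enforce_hierarchy(predicted):
    """Return the predicted tags closed under the ancestor relation."""
    closed = set(predicted)
    for tag in predicted:
        parent = PARENT.get(tag)
        while parent is not None:
            closed.add(parent)
            parent = PARENT.get(parent)
    return closed


print(sorted(enforce_hierarchy({"ransomware", "dumps"})))
```

An alternative policy, equally consistent with the abstract, would drop a child tag whose parent was not predicted instead of adding the parent.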
Date Created
2018

Malicious IP Address Prediction

Description
IP blacklisting is a popular technique to bolster an enterprise's security, where access to and from designated IP addresses is explicitly restricted. The fundamental idea behind blacklists is to continually add IP addresses that reputable entities, such as security researchers, have labeled as malicious to the list. Currently, IP blacklisting is a reactive method, where malicious IP addresses are identified after their engagement in malicious activities is detected (e.g., hosting malware samples or sending spam emails). This thesis project aims to address this issue by laying the groundwork for a machine learning tool that proactively identifies malicious IP addresses. The ground truth data derives from VirusTotal, a company that synthesizes security knowledge from prominent sources, such as Symantec, Fortinet, and ESET. I passed 307,621 IP addresses found in posts on the D2web (deep and dark web) through VirusTotal. If at least one detected URL is associated with the IP address and VirusTotal deems it positive, I label the IP address as positive (malicious), and as negative (non-malicious) otherwise. To give some insight into the ground truth, 6,147 IP addresses were identified as positive from the original 307,621. Furthermore, in order to quantify the prediction capabilities of our models, I introduce a metric called lead time. Lead time represents the difference between the date an IP address was first seen on the D2web and its earliest date on VirusTotal. For example, if an IP address was mentioned on the D2web on 1/5/2017 and mentioned on VirusTotal on 1/25/2017, then its lead time is 20 days. After feature selection, where I handpicked features from the data mined from the D2web, I attempted various combinations of classifiers and feature sets in order to create the best model.
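
The lead time metric defined above reduces to a date subtraction. A minimal sketch follows, assuming both dates are available as `datetime.date` values; the function and parameter names are illustrative and do not come from the thesis.

```python
from datetime import date


def lead_time_days(first_seen_d2web, first_seen_virustotal):
    """Days by which the D2web mention precedes the VirusTotal record."""
    return (first_seen_virustotal - first_seen_d2web).days


# The example from the abstract: 1/5/2017 on the D2web, 1/25/2017 on VirusTotal.
example = lead_time_days(date(2017, 1, 5), date(2017, 1, 25))
print(example)  # 20
```

A negative value would mean the address appeared on VirusTotal before it was seen on the D2web, so it offers no lead.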
The final machine learning models implement temporal cross-validation with a Random Forest classifier: I train a model on data from 1/1/2016 up until a testing month in 2017, and test on data from the testing month. The following are results from a model that was tested on January 2017, which exhibits median performance among the final models. The true positive rate is 0.2558, the false positive rate is 0.3612, and the average lead time (for leading true positives) is 193 days, where the model picks up 33.33% of all leading true positives. Although the model finds a respectable number of true positives, it picks up too many false positives. Thus, my approach is ineffective at predicting malicious IP addresses in its current state, meaning additional efforts will be required to transform the current work into a viable tool.
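
The temporal split and the two reported rates can be sketched in plain Python. This is an illustration under assumptions, not the thesis code: the record layout `(day, features, label)`, the function names, and the sample data are all hypothetical, and the classifier itself is omitted.

```python
from datetime import date


def temporal_split(records, test_year, test_month, start=date(2016, 1, 1)):
    """Split (day, features, label) records into train and test sets.

    Train: records from `start` up to, but not including, the test month.
    Test: records that fall inside the test month.
    """
    month_start = date(test_year, test_month, 1)
    train = [r for r in records if start <= r[0] < month_start]
    test = [r for r in records
            if (r[0].year, r[0].month) == (test_year, test_month)]
    return train, test


def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)


# Hypothetical records; features are left as None for brevity.
records = [
    (date(2016, 3, 1), None, 0),
    (date(2016, 12, 31), None, 1),
    (date(2017, 1, 15), None, 1),
    (date(2017, 2, 1), None, 0),
]
train, test = temporal_split(records, 2017, 1)
tpr, fpr = rates([1, 1, 0, 0], [1, 0, 1, 0])
print(len(train), len(test), tpr, fpr)
```

Splitting by date rather than at random keeps future observations out of the training set, which matters when the goal is to predict ahead of time.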
Date Created
2018-05

An Algorithm for Merging Identities

Description
In online social networks, the identities of users are concealed, often by design. This anonymity makes it possible for a single person to have multiple accounts and to engage in malicious activity such as defrauding service providers, leveraging social influence, or hiding activities that would otherwise be detected. There are various methods for detecting whether two online users in a network are the same person in reality, and the simplest way to use this information is to merge their identities and treat the two users as a single user. However, this raises the issue of how to deal with these composite identities. To solve this problem, we introduce a mathematical abstraction for representing users and their identities as partitions on a set. We then define a similarity function, SIM, between two partitions, a set of properties that SIM must have, and a threshold that SIM must exceed for two users to be considered the same person. The main theoretical result of our work is a proof that for any given partition and similarity threshold, there is only a single unique way to merge the identities of similar users such that no two identities are similar. We also present two algorithms, COLLAPSE and SIM_MERGE, that merge the identities of users to find this unique set of identities. We prove that both algorithms execute in polynomial time, and we also perform an experiment on dark web social network data from over 6000 users that demonstrates the runtime of SIM_MERGE.
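
The general idea of merging until no two identities remain similar can be sketched as a fixed-point loop. The code below is neither COLLAPSE nor SIM_MERGE, whose details are not given in this abstract. It assumes, purely for illustration, that each identity carries a set of features and that Jaccard overlap stands in for SIM; Jaccard is not claimed to satisfy the properties the thesis requires, so the uniqueness result need not hold for it.

```python
from itertools import combinations


def jaccard(a, b):
    """Hypothetical stand-in for SIM: Jaccard overlap of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar(identities, threshold):
    """Repeatedly merge any two identities whose similarity exceeds threshold.

    identities: dict mapping a frozenset of account names to a feature set.
    Returns a dict of the same shape in which no two identities are similar.
    """
    merged = dict(identities)
    changed = True
    while changed:
        changed = False
        for x, y in combinations(list(merged), 2):
            if jaccard(merged[x], merged[y]) > threshold:
                features = merged.pop(x) | merged.pop(y)
                merged[x | y] = features
                changed = True
                break  # restart the scan on the updated collection
    return merged


# Hypothetical accounts and features.
users = {
    frozenset({"u1"}): {"a", "b", "c"},
    frozenset({"u2"}): {"a", "b", "c", "d"},
    frozenset({"u3"}): {"x", "y"},
}
result = merge_similar(users, threshold=0.5)
print(sorted(sorted(k) for k in result))
```

Each merge reduces the number of identities by one, so the loop terminates after at most one merge per input identity.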
Date Created
2018-05

Data Driven Game Theoretic Cyber Threat Mitigation

Description
Penetration testing is regarded as the gold standard for understanding how well an organization can withstand sophisticated cyber-attacks. However, the recent prevalence of markets specializing in zero-day exploits on the darknet makes exploits widely available to potential attackers. The cost associated with these sophisticated kits generally precludes penetration testers from simply obtaining such exploits, so an alternative approach is needed to understand what exploits an attacker will most likely purchase and how to defend against them. In this paper, we introduce a data-driven security game framework to model an attacker and provide policy recommendations to the defender. In addition to providing a formal framework and algorithms to develop strategies, we present experimental results from applying our framework, for various system configurations, on real-world exploit market data actively mined from the darknet.
Date Created
2016-05

Darkweb Cyber Threat Intelligence Mining through the I2P Protocol

Description
This thesis project focused on malicious hacking community activities accessible through the I2P protocol. We visited 315 distinct I2P sites to identify those with malicious hacking content. We also wrote software to scrape and parse data from relevant I2P sites. The data was integrated into the CySIS databases for further analysis to contribute to the larger CySIS Lab Darkweb Cyber Threat Intelligence Mining research. We found that the I2P cryptonet was slow and had only a small amount of malicious hacking community activity. However, we also found evidence of a growing perception that Tor anonymity could be compromised. This work will contribute to understanding the malicious hacker community as some Tor users, seeking assured anonymity, transition to I2P.
Date Created
2016-12

Blurring safety between online and offline worlds: archival, correlational, and experimental evidence of generalized threat in the digital age

Description
Decades of research in cyberpsychology and human-computer interaction has pointed to a strong distinction between the online and offline worlds, suggesting that attitudes and behaviors in one domain do not necessarily generalize to the other. However, as humans spend increasing amounts of time in the digital world, psychological understandings of safety may begin to influence human perceptions of threat while online. This dissertation therefore examines whether perceived threat generalizes between domains across archival, correlational, and experimental research methods. Four studies offer insight into the relationship between objective indicators of physical and online safety on the levels of nation and state; the relationship between perceptions of these forms of safety on the individual level; and whether experimental manipulations of one form of threat influence perceptions of threat in the opposite domain. In addition, this work explores the impact of threat perception-related personal and situational factors, as well as the impact of threat type (i.e., self-protection, resource), on this hypothesized relationship.

Collectively, these studies evince a positive relationship between physical and online safety in macro-level actuality and individual-level perception. Among individuals, objective indicators of community safety—as measured by zip code crime data—were a positive reflection of perceptions of physical safety; these perceptions, in turn, mapped onto perceived online safety. The generalization between perceived physical threat and online threat was stronger after being exposed to self-protection threat manipulations, possibly underscoring the more dire nature of threats to bodily safety than those to valuable resources. Most notably, experimental findings suggest that it is not the physical that informs the digital, but rather the opposite: Online threats blur more readily into physical domains, possibly speaking to the concern that dangers specific to the digital world will bleed into the physical one. This generalization of threat may function as a strategy to prepare oneself for future dangers wherever they might appear; and indeed, perceived threat in either world positively influenced desires to act on recommended safety practices. Taken together, this research suggests that in the realm of threat perception, the boundaries between physical and digital are less rigid than may have been previously believed.
Date Created
2017