A Hacker-Centric Perspective to Empower Cyber Defense

158434-Thumbnail Image.png
Description
Malicious hackers utilize the World Wide Web to share knowledge. Previous work has demonstrated that information mined from online hacking communities can be used as precursors to cyber-attacks. In a threatening scenario, where security alert systems are facing high false

Malicious hackers utilize the World Wide Web to share knowledge. Previous work has demonstrated that information mined from online hacking communities can be used as precursors to cyber-attacks. In a threatening scenario, where security alert systems are facing high false positive rates, understanding the people behind cyber incidents can help reduce the risk of attacks. However, the rapidly evolving nature of those communities leads to limitations still largely unexplored, such as: who are the skilled and influential individuals forming those groups, how they self-organize along the lines of technical expertise, how ideas propagate within them, and which internal patterns can signal imminent cyber offensives? In this dissertation, I have studied four key parts of this complex problem set. Initially, I leverage content, social network, and seniority analysis to mine key-hackers on darkweb forums, identifying skilled and influential individuals who are likely to succeed in their cybercriminal goals. Next, as hackers often use Web platforms to advertise and recruit collaborators, I analyze how social influence contributes to user engagement online. On social media, two time constraints are proposed to extend standard influence measures, which increases their correlation with adoption probability and consequently improves hashtag adoption prediction. On darkweb forums, the prediction of where and when hackers will post a message in the near future is accomplished by analyzing their recurrent interactions with other hackers. After that, I demonstrate how vendors of malware and malicious exploits organically form hidden organizations on darkweb marketplaces, obtaining significant consistency across the vendors’ communities extracted using the similarity of their products in different networks. Finally, I predict imminent cyber-attacks correlating malicious hacking activity on darkweb forums with real-world cyber incidents, evidencing how social indicators are crucial for the performance of the proposed model. This research is a hybrid of social network analysis (SNA), machine learning (ML), evolutionary computation (EC), and temporal logic (TL), presenting expressive contributions to empower cyber defense.
Date Created
2020
Agent

Everything You Ever Wanted to Know About Bitcoin Mixers (But Were Afraid to Ask)

158251-Thumbnail Image.png
Description
The lack of fungibility in Bitcoin has forced its userbase to seek out tools that can heighten their anonymity. Third-party Bitcoin mixers utilize obfuscation techniques to protect participants from blockchain analysis. In recent years, various centralized and decentralized Bitcoin mixing

The lack of fungibility in Bitcoin has forced its userbase to seek out tools that can heighten their anonymity. Third-party Bitcoin mixers utilize obfuscation techniques to protect participants from blockchain analysis. In recent years, various centralized and decentralized Bitcoin mixing implementations have been proposed in academic literature. Although these methods depict a threat-free environment for users to preserve their anonymity, public Bitcoin mixers continue to be associated with theft and poor implementation.

This research explores the public Bitcoin mixer ecosystem to identify if today's mixing services have adopted academically proposed solutions. This is done through real-world interactions with publicly available mixers to analyze both implementation and resistance to common threats in the mixing landscape. First, proposed decentralized and centralized mixing protocols found in literature are outlined. Then, data is presented from 19 publicly announced mixing services available on the deep web and clearnet. The services are categorized based on popularity with the Bitcoin community and experiments are conducted on five public mixing services: ChipMixer, MixTum, Bitcoin Mixer, CryptoMixer, and Sudoku Wallet.

The results of the experiments highlight a clear gap between public and proposed Bitcoin mixers in both implementation and security. Today's mixing services focus on presenting users with a false sense of control to gain their trust rather then employing secure mixing techniques. As a result, the five selected services lack implementation of academically proposed techniques and display poor resistance to common mixer-related threats.
Date Created
2020
Agent

HoneyPLC: A Next-Generation Honeypot for Industrial Control Systems

158121-Thumbnail Image.png
Description
Utilities infrastructure like the electric grid have been the target of more sophisticated cyberattacks designed to disrupt their operation and create social unrest and economical losses. Just in 2016, a cyberattack targeted the Ukrainian power grid and successfully caused

Utilities infrastructure like the electric grid have been the target of more sophisticated cyberattacks designed to disrupt their operation and create social unrest and economical losses. Just in 2016, a cyberattack targeted the Ukrainian power grid and successfully caused a blackout that affected 225,000 customers.

Industrial Control Systems (ICS) are a critical part of this infrastructure. Honeypots are one of the tools that help us capture attack data to better understand new and existing attack methods and strategies. Honeypots are computer systems purposefully left exposed to be broken into. They do not have any inherent value, instead, their value comes when attackers interact with them. However, state-of-the-art honeypots lack sophisticated service simulations required to obtain valuable data.

Worst, they cannot adapt while ICS malware keeps evolving and attacks patterns are increasingly more sophisticated.

This work presents HoneyPLC: A Next-Generation Honeypot for ICS. HoneyPLC is, the very first medium-interaction ICS honeypot, and includes advanced service simulation modeled after S7-300 and S7-1200 Siemens PLCs, which are widely used in real-life ICS infrastructures.

Additionally, HoneyPLC provides much needed extensibility features to prepare for new attack tactics, e.g., exploiting a new vulnerability found in a new PLC model.

HoneyPLC was deployed both in local and public environments, and tested against well-known reconnaissance tools used by attackers such as Nmap and Shodan's Honeyscore. Results show that HoneyPLC is in fact able to fool both tools with a high level of confidence. Also, HoneyPLC recorded high amounts of interesting ICS interactions from all around the globe, proving not only that attackers are in fact targeting ICS systems, but that HoneyPLC provides a higher level of interaction that effectively deceives them.
Date Created
2020
Agent

Sequencing Behavior in an Intelligent Pro-active Co-Driver System

158101-Thumbnail Image.png
Description
Driving is the coordinated operation of mind and body for movement of a vehicle, such as a car, or a bus. Driving, being considered an everyday activity for many people, still has an issue of safety. Driver distraction is becoming

Driving is the coordinated operation of mind and body for movement of a vehicle, such as a car, or a bus. Driving, being considered an everyday activity for many people, still has an issue of safety. Driver distraction is becoming a critical safety problem. Speed, drunk driving as well as distracted driving are the three leading factors in the fatal car crashes. Distraction, which is defined as an excessive workload and limited attention, is the main paradigm that guides this research area. Driver behavior analysis can be used to address the distraction problem and provide an intelligent adaptive agent to work closely with the driver, fay beyond traditional algorithmic computational models. A variety of machine learning approaches has been proposed to estimate or predict drivers’ fatigue level using car data, driver status or a combination of them.

Three important features of intelligence and cognition are perception, attention and sensory memory. In this thesis, I focused on memory and attention as essential parts of highly intelligent systems. Without memory, systems will only show limited intelligence since their response would be exclusively based on spontaneous decision without considering the effect of previous events. I proposed a memory-based sequence to predict the driver behavior and distraction level using neural network. The work started with a large-scale experiment to collect data and make an artificial intelligence-friendly dataset. After that, the data was used to train a deep neural network to estimate the driver behavior. With a focus on memory by using Long Short Term Memory (LSTM) network to increase the level of intelligence in two dimensions: Forgiveness of minor glitches, and accumulation of anomalous behavior., I reduced the model error and computational expense by adding attention mechanism on the top of LSTM models. This system can be generalized to build and train highly intelligent agents in other domains.
Date Created
2020
Agent

Leveraging Scalable Data Analysis to Proactively Bolster the Anti-Phishing Ecosystem

158081-Thumbnail Image.png
Description
Despite an abundance of defenses that work to protect Internet users from online threats, malicious actors continue deploying relentless large-scale phishing attacks that target these users. Effectively mitigating phishing attacks remains a challenge for the security community due to

Despite an abundance of defenses that work to protect Internet users from online threats, malicious actors continue deploying relentless large-scale phishing attacks that target these users. Effectively mitigating phishing attacks remains a challenge for the security community due to attackers' ability to evolve and adapt to defenses, the cross-organizational nature of the infrastructure abused for phishing, and discrepancies between theoretical and realistic anti-phishing systems. Although technical countermeasures cannot always compensate for the human weakness exploited by social engineers, maintaining a clear and up-to-date understanding of the motivation behind---and execution of---modern phishing attacks is essential to optimizing such countermeasures.

In this dissertation, I analyze the state of the anti-phishing ecosystem and show that phishers use evasion techniques, including cloaking, to bypass anti-phishing mitigations in hopes of maximizing the return-on-investment of their attacks. I develop three novel, scalable data-collection and analysis frameworks to pinpoint the ecosystem vulnerabilities that sophisticated phishing websites exploit. The frameworks, which operate on real-world data and are designed for continuous deployment by anti-phishing organizations, empirically measure the robustness of industry-standard anti-phishing blacklists (PhishFarm and PhishTime) and proactively detect and map phishing attacks prior to launch (Golden Hour). Using these frameworks, I conduct a longitudinal study of blacklist performance and the first large-scale end-to-end analysis of phishing attacks (from spamming through monetization). As a result, I thoroughly characterize modern phishing websites and identify desirable characteristics for enhanced anti-phishing systems, such as more reliable methods for the ecosystem to collectively detect phishing websites and meaningfully share the corresponding intelligence. In addition, findings from these studies led to actionable security recommendations that were implemented by key organizations within the ecosystem to help improve the security of Internet users worldwide.
Date Created
2020
Agent

Rule-Based Home Automation

131337-Thumbnail Image.png
Description
Apple’s HomeKit framework centralizes control of smart home devices and allows users to create home automations based on predefined rules. For example, a user can add a rule to turn off all the lights in their house whenever they leave.

Apple’s HomeKit framework centralizes control of smart home devices and allows users to create home automations based on predefined rules. For example, a user can add a rule to turn off all the lights in their house whenever they leave. Currently, these rules must be added through a graphical user interface provided by Apple or a third-party app on iOS. This thesis describes how a text-based language provides users with a more expressive means of creating complex home automations and successfully implements such a language. Rules created using this text-based format are parsed and interpreted into rules that can be added directly into HomeKit. This thesis also explores how security features should be implemented with this text-based approach. Since automations are run by the system without user interaction, it is important to consider how the system itself can provide functionality to address the unintended consequences that may result from running an automation. This is especially important for the text-based approach since its increase in expressiveness makes it easier for a user to make a mistake in programming that leads to a security concern. The proposed method for preventing unintended side effects is using a simulation to run every automation prior to actually running the automation on real-world devices. This approach allows users to code some conditions that must be satisfied in order for the automation to run on devices in the home. This thesis describes the creation of such a program that successfully simulates every device in the home. There were limitations, however, with Apple's HomeKit framework, which made it impractical to match the state of simulated devices to real devices in the home. Without being able to match the current state of the home to the current state of the simulation, this method cannot satisfy the goal of ensuring that certain adverse effects will not occur as a result of automations. Other smart home control platforms that provide more extensibility could be used to create this simulation-based security approach. Perhaps as Apple continues to open up their HomeKit platform to developers, this approach may be feasible within Apple's ecosystem at some point in the future.
Date Created
2020-05
Agent

Attribute-Based Encryption for Fine-Grained Access Control over Sensitive Data

158005-Thumbnail Image.png
Description
The traditional access control system suffers from the problem of separation of data ownership and management. It poses data security issues in application scenarios such as cloud computing and blockchain where the data owners either do not trust the data

The traditional access control system suffers from the problem of separation of data ownership and management. It poses data security issues in application scenarios such as cloud computing and blockchain where the data owners either do not trust the data storage provider or even do not know who would have access to their data once they are appended to the chain. In these scenarios, the data owner actually loses control of the data once they are uploaded to the outside storage. Encryption-before-uploading is the way to solve this issue, however traditional encryption schemes such as AES, RSA, ECC, bring about great overheads in key management on the data owner end and could not provide fine-grained access control as well.

Attribute-Based Encryption (ABE) is a cryptographic way to implement attribute-based access control, which is a fine-grained access control model, thus solving all aforementioned issues. With ABE, the data owner would encrypt the data by a self-defined access control policy before uploading the data. The access control policy is an AND-OR boolean formula over attributes. Only users with attributes that satisfy the access control policy could decrypt the ciphertext. However the existing ABE schemes do not provide some important features in practical applications, e.g., user revocation and attribute expiration. Furthermore, most existing work focus on how to use ABE to protect cloud stored data, while not the blockchain applications.

The main objective of this thesis is to provide solutions to add two important features of the ABE schemes, i.e., user revocation and attribute expiration, and also provide a practical trust framework for using ABE to protect blockchain data. To add the feature of user revocation, I propose to add user's hierarchical identity into the private attribute key. In this way, only users whose identity is not revoked and attributes satisfy the access control policy could decrypt the ciphertext. To add the feature of attribute expiration, I propose to add the attribute valid time period into the private attribute key. The data would be encrypted by access control policy where all attributes have a temporal value. In this way, only users whose attributes both satisfy the access policy and at the same time these attributes do not expire,

are allowed to decrypt the ciphertext. To use ABE in the blockchain applications, I propose an ABE-enabled trust framework in a very popular blockchain platform, Hyperledger Fabric. Based on the design, I implement a light-weight attribute certificate authority for attribute distribution and validation; I implement the proposed ABE schemes and provide a toolkit which supports system setup, key generation,

data encryption and data decryption. All these modules were integrated into a demo system for protecting sensitive les in a blockchain application.
Date Created
2020
Agent

Automated reflection of CTF hostile exploits (ARCHES): inductive programming techniques for network traffic comprehension and reflection

157598-Thumbnail Image.png
Description
As the gap widens between the number of security threats and the number of security professionals, the need for automated security tools becomes increasingly important. These automated systems assist security professionals by identifying and/or fixing potential vulnerabilities before they can

As the gap widens between the number of security threats and the number of security professionals, the need for automated security tools becomes increasingly important. These automated systems assist security professionals by identifying and/or fixing potential vulnerabilities before they can be exploited. One such category of tools is exploit generators, which craft exploits to demonstrate a vulnerability and provide guidance on how to repair it. Existing exploit generators largely use the application code, either through static or dynamic analysis, to locate crashes and craft a payload.

This thesis proposes the Automated Reflection of CTF Hostile Exploits (ARCHES), an exploit generator that learns by example. ARCHES uses an inductive programming library named IRE to generate exploits from exploit examples. In doing so, ARCHES can create an exploit only from example exploit payloads without interacting with the service. By representing each component of the exploit interaction as a collection of theories for how that component occurs, ARCHES can identify critical state information and replicate an executable exploit. This methodology learns rapidly and works with only a few examples. The ARCHES exploit generator is targeted towards Capture the Flag (CTF) events as a suitable environment for initial research.

The effectiveness of this methodology was evaluated on four exploits with features that demonstrate the capabilities and limitations of this methodology. ARCHES is capable of reproducing exploits that require an understanding of state dependent input, such as a flag id. Additionally, ARCHES can handle basic utilization of state information that is revealed through service output. However, limitations in this methodology result in failure to replicate exploits that require a loop, intricate mathematics, or multiple TCP connections.

Inductive programming has potential as a security tool to augment existing automated security tools. Future research into these techniques will provide more capabilities for security professionals in academia and in industry.
Date Created
2019
Agent

Protecting Visual Information in Augmented Reality from Malicious Application Developers

157518-Thumbnail Image.png
Description
Visual applications – those that use camera frames as part of the application – provide a rich, context-aware experience. The continued development of mixed and augmented reality (MR/AR) computing environments furthers the richness of this experience by providing applications a

Visual applications – those that use camera frames as part of the application – provide a rich, context-aware experience. The continued development of mixed and augmented reality (MR/AR) computing environments furthers the richness of this experience by providing applications a continuous vision experience, where visual information continuously provides context for applications and the real world is augmented by the virtual. To understand user privacy concerns in continuous vision computing environments, this work studies three MR/AR applications (augmented markers, augmented faces, and text capture) to show that in a modern mobile system, the typical user is exposed to potential mass collection of sensitive information, posing privacy and security deficiencies to be addressed in future systems.

To address such deficiencies, a development framework is proposed that provides resource isolation between user information contained in camera frames and application access to the network. The design is implemented using existing system utilities as a proof of concept on the Android operating system and demonstrates its viability with a modern state-of-the-art augmented reality library and several augmented reality applications. Evaluation is conducted on the design on a Samsung Galaxy S8 phone by comparing the applications from the case study with modified versions which better protect user privacy. Early results show that the new design efficiently protects users against data collection in MR/AR applications with less than 0.7% performance overhead.
Date Created
2019
Agent

IRE: A Framework For Inductive Reverse Engineering

157515-Thumbnail Image.png
Description
Reverse engineering is critical to reasoning about how a system behaves. While complete access to a system inherently allows for perfect analysis, partial access is inherently uncertain. This is the case foran individual agent in a distributed system. Inductive Reverse

Reverse engineering is critical to reasoning about how a system behaves. While complete access to a system inherently allows for perfect analysis, partial access is inherently uncertain. This is the case foran individual agent in a distributed system. Inductive Reverse Engineering (IRE) enables analysis under

such circumstances. IRE does this by producing program spaces consistent with individual input-output examples for a given domain-specific language. Then, IRE intersects those program spaces to produce a generalized program consistent with all examples. IRE, an easy to use framework, allows this domain-specific language to be specified in the form of Theorist s, which produce Theory s, a succinct way of representing the program space.

Programs are often much more complex than simple string transformations. One of the ways in which they are more complex is in the way that they follow a conversation-like behavior, potentially following some underlying protocol. As a result, IRE represents program interactions as Conversations in order to

more correctly model a distributed system. This, for instance, enables IRE to model dynamically captured inputs received from other agents in the distributed system.

While domain-specific knowledge provided by a user is extremely valuable, such information is not always possible. IRE mitigates this by automatically inferring program grammars, allowing it to still perform efficient searches of the program space. It does this by intersecting conversations prior to synthesis in order to understand what portions of conversations are constant.

IRE exists to be a tool that can aid in automatic reverse engineering across numerous domains. Further, IRE aspires to be a centralized location and interface for implementing program synthesis and automatic black box analysis techniques.
Date Created
2019
Agent