Extracting Semantic Information from Online Conversations to Enhance Cyber Defense
Description
Recent advances in techniques allow the extraction of Cyber Threat Information (CTI) from online content, such as social media, blog articles, and posts in discussion forums. Most research work focuses on social media and blog posts since their content is often contributed by cybersecurity experts and is usually of cleaner formats. While posts in online forums are noisier and less structured, online forums attract more users than other sources and contain much valuable information that may help predict cyber threats. Therefore, effectively extracting CTI from online forum posts is an important task in today's data-driven cybersecurity defenses. Many Natural Language Processing (NLP) techniques are applied to the cybersecurity domains to extract the useful information, however, there is still space to improve. In this dissertation, a new Named Entity Recognition framework for cybersecurity domains and thread structure construction methods for unstructured forums are proposed to support the extraction of CTI. Then, extend them to filter the posts in the forums to eliminate non cybersecurity related topics with Cyber Attack Relevance Scale (CARS), extract the cybersecurity knowledgeable users to enhance more information for enhancing cybersecurity, and extract trending topic phrases related to cyber attacks in the hackers forums to find the clues for potential future attacks to predict them.
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
2022
Agent
- Author (aut): Kashihara, Kazuaki
- Thesis advisor (ths): Baral, Chitta
- Committee member: Doupe, Adam
- Committee member: Blanco, Eduardo
- Committee member: Wang, Ruoyu
- Publisher (pbl): Arizona State University