Full metadata
Title
Topic chains for determining risk of unauthorized information transfer
Description
Corporations invest considerable resources to create, preserve, and analyze
their data; yet while organizations are interested in protecting against
unauthorized data transfer, there is no comprehensive metric for determining
which data are at risk of leaking.
This thesis motivates the need for a quantitative leakage risk metric, and
provides a risk assessment system, called Whispers, for computing it. Using
unsupervised machine learning techniques, Whispers uncovers themes in an
organization's document corpus, including previously unknown or unclassified
data. Then, by correlating each document with its authors, Whispers can
identify which data are easier to contain and, conversely, which are at risk.
Using the Enron email database, Whispers constructs a social network segmented
by topic themes. This graph uncovers communication channels within the
organization. Using this social network, Whispers determines the risk of each
topic by measuring the rate at which simulated leaks are not detected. For the
Enron set, Whispers identified 18 separate topic themes between January 1999
and December 2000. The highest-risk data emanated from the legal department,
with a leakage risk as high as 60%.
Date Created
2014
Contributors
- Wright, Jeremy (Author)
- Syrotiuk, Violet (Thesis advisor)
- Davulcu, Hasan (Committee member)
- Yau, Stephen (Committee member)
- Arizona State University (Publisher)
Resource Type
Extent
vii, 46 p. : ill. (some col.)
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.27506
Statement of Responsibility
by Jeremy Wright
Description Source
Viewed on March 6, 2015
Level of coding
full
Note
thesis
Partial requirement for: M.S., Arizona State University, 2014
bibliography
Includes bibliographical references (p. 43-46)
Field of study: Computer science
System Created
- 2015-02-01 07:09:10
System Modified
- 2021-08-30 01:31:06