Title
Unsupervised Bayesian data cleaning techniques for structured data
Description
Recent efforts in data cleaning have focused mostly on problems such as data deduplication, record matching, and data standardization; few of them address fixing incorrect attribute values in tuples. Correcting values in tuples is typically done by computing a minimum-cost repair of the tuples that violate static constraints such as conditional functional dependencies (CFDs), which must be provided by domain experts or learned from a clean sample of the database. In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model, both learned directly from the noisy database. This removes the need for a domain expert or master data. I also show how to efficiently perform consistent query answering over a dirty database using this model, for cases where write permissions to the database are unavailable. I further present a MapReduce architecture that performs this computation in a distributed manner. I evaluate these methods on both synthetic and real data.
Date Created
2014
Contributors
- De, Sushovan (Author)
- Kambhampati, Subbarao (Thesis advisor)
- Chen, Yi (Committee member)
- Candan, K. Selcuk (Committee member)
- Liu, Huan (Committee member)
- Arizona State University (Publisher)
Extent
viii, 90 p. : ill. (some col.)
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.25942
Statement of Responsibility
by Sushovan De
Description Source
Viewed on December 19, 2014
Level of coding
full
Note
thesis
Partial requirement for: Ph.D., Arizona State University, 2014
bibliography
Includes bibliographical references (p. 87-90)
Field of study: Computer science
System Created
- 2014-10-01 08:03:36
System Modified
- 2021-08-30 01:32:43