Full metadata
Title
Automatic text summarization using importance of sentences for email corpus
Description
With the advent of the Internet, the amount of data added online is increasing at an enormous rate. Although search engines use IR techniques to serve users' search requests, the results often do not match the user's query effectively, and the user has to go through several webpages before reaching the one he or she wanted. This problem of information overload can be addressed with automatic text summarization. Summarization is the process of obtaining an abridged version of a document so that the user can quickly understand what the document is about. This system uses email threads from W3C. In addition to common IR features such as Term Frequency (TF) and Inverse Document Frequency (IDF), it implements Term Rank, a variation of PageRank based on a graph model that can cluster words with respect to word ambiguity. Term Rank also considers the co-occurrence of words within the corpus and ranks each word accordingly. The sentences of the email threads are ranked by these features, and summaries are generated. The system applies the concept of pyramid evaluation to content selection. It can be regarded as a framework for unsupervised learning in text summarization.
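The feature-based sentence ranking described above can be illustrated with a minimal sketch. This is not the thesis's implementation. It approximates Term Rank with plain PageRank over a sentence-level word co-occurrence graph and does not model the word-ambiguity clustering. The function names (`term_rank`, `score_sentences`, `summarize`), the tokenizer, and the `alpha` weighting between TF-IDF and centrality are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple (hypothetical) tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def term_rank(sentences, damping=0.85, iterations=30):
    """Hypothetical stand-in for Term Rank: PageRank on an undirected graph
    where two words are linked if they co-occur in the same sentence."""
    neighbors = defaultdict(set)
    for sent in sentences:
        words = set(tokenize(sent))
        for w in words:
            neighbors[w] |= words - {w}  # creates the node even if it has no links
    nodes = list(neighbors)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        # Each word receives rank from its neighbors, split by their degree.
        rank = {
            w: (1 - damping) / n
            + damping * sum(rank[v] / len(neighbors[v]) for v in neighbors[w])
            for w in nodes
        }
    return rank


def score_sentences(sentences, alpha=0.5):
    """Score each sentence as a weighted mix of summed TF-IDF and mean word
    centrality. Each sentence is treated as a 'document' for IDF (an assumption)."""
    token_lists = [tokenize(s) for s in sentences]
    n_docs = len(token_lists)
    df = Counter(w for toks in token_lists for w in set(toks))
    tr = term_rank(sentences)
    scores = []
    for toks in token_lists:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        tfidf = sum(c / len(toks) * math.log(n_docs / df[w]) for w, c in tf.items())
        centrality = sum(tr[w] for w in tf) / len(tf)
        scores.append(alpha * tfidf + (1 - alpha) * centrality)
    return scores


def summarize(sentences, k=2, alpha=0.5):
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = score_sentences(sentences, alpha)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

As a usage example, `summarize(thread_sentences, k=3)` would pick three sentences from a list of sentences split out of one email thread. Returning the selected sentences in their original order keeps the extractive summary readable as a thread; how the real system weights its features is not stated in this record.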
Date Created
2015
Contributors
- Nadella, Sravan (Author)
- Davulcu, Hasan (Thesis advisor)
- Li, Baoxin (Committee member)
- Sen, Arunabha (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
: illustrations (some color)
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.34929
Statement of Responsibility
by Sravan Nadella
Description Source
Viewed on October 5, 2015
Level of coding
full
Note
thesis
Partial requirement for: M.S., Arizona State University, 2015
bibliography
Includes bibliographical references (pages 31-32)
Field of study: Computer science
System Created
- 2015-08-17 11:57:38
System Modified
- 2021-08-30 01:26:59