Full metadata
Title
Sequence-based web page template detection
Description
Templates are widely used in Web site development. Finding the template shared by a given set of Web pages is useful for many applications, such as Web page classification and monitoring changes in the content and structure of Web pages. In this thesis, two novel sequence-based Web page template detection algorithms are presented. Unlike tree-mapping algorithms, which are based on tree edit distance, sequence-based template detection algorithms operate on the Prüfer/Consolidated Prüfer sequences of trees. Since there is a one-to-one correspondence between Prüfer/Consolidated Prüfer sequences and trees, sequence-based template detection algorithms identify the template by finding a common subsequence of two Prüfer/Consolidated Prüfer sequences. This subsequence should be a sequential representation of a common subtree of the input trees. Experiments on real-world Web pages showed that our approaches detect templates effectively and efficiently.
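The core idea in the description can be illustrated with a minimal Python sketch. The idea is to serialize each page's DOM tree into a Prüfer sequence and then compare two pages through a common subsequence. This is not the thesis's algorithm. The `Node` class, the functions `prufer_sequence` and `lcs`, and the two example pages are hypothetical. The sketch numbers nodes in postorder and builds numbered and labeled Prüfer sequences. It then applies the textbook longest-common-subsequence dynamic program. Consolidated Prüfer sequences are not modeled. A plain LCS also does not guarantee that its result represents a common subtree, which the thesis's algorithms are designed to ensure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DOM node: a tag name plus ordered children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def prufer_sequence(root):
    """Return (numbered, labeled) Prüfer sequences of a rooted tree.

    Nodes are numbered 1..n in postorder. Deleting nodes in increasing
    number always deletes the smallest leaf (its children have smaller
    numbers and are already gone), so entry i is the parent of node i.
    This rooted variant stops when only the root remains (n - 1 entries);
    the classic unrooted construction stops at two nodes (n - 2 entries).
    """
    order = []  # (node, parent) pairs in postorder

    def visit(node, parent):
        for child in node.children:
            visit(child, node)
        order.append((node, parent))

    visit(root, None)
    # Dataclass instances with eq=True are unhashable, so key by id().
    number = {id(node): i + 1 for i, (node, _) in enumerate(order)}
    numbered = [number[id(parent)] for _, parent in order[:-1]]
    labeled = [parent.tag for _, parent in order[:-1]]
    return numbered, labeled


def lcs(a, b):
    """Textbook O(len(a) * len(b)) longest common subsequence."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Two hypothetical pages that share an html/body/div skeleton.
page_a = Node("html", [Node("head"),
                       Node("body", [Node("div", [Node("h1"), Node("p")]),
                                     Node("footer")])])
page_b = Node("html", [Node("body", [Node("div", [Node("h1"), Node("p"), Node("p")]),
                                     Node("aside")])])

nums_a, labels_a = prufer_sequence(page_a)
nums_b, labels_b = prufer_sequence(page_b)
print(nums_a)                     # → [7, 4, 4, 6, 6, 7]
print(labels_a)                   # → ['html', 'div', 'div', 'body', 'body', 'html']
print(lcs(labels_a, labels_b))    # → ['div', 'div', 'body', 'body', 'html']
```

The postorder numbering keeps the construction linear in the number of nodes, because no priority queue is needed to find the smallest leaf. The quadratic LCS step is the obvious place where a template-aware algorithm would add constraints so that the matched entries still form a valid subtree.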
Date Created
2011
Contributors
- Huang, Wei (Author)
- Candan, Kasim Selcuk (Thesis advisor)
- Sundaram, Hari (Committee member)
- Davulcu, Hasan (Committee member)
- Arizona State University (Publisher)
Extent
vii, 62 p. : ill. (some col.)
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.9268
Statement of Responsibility
by Wei Huang
Description Source
Viewed on March 9, 2012
Level of coding
full
Note
thesis
Partial requirement for: M.S., Arizona State University, 2011
bibliography
Includes bibliographical references (p. 60-62)
Field of study: Computer science
System Created
- 2011-08-12 04:47:38
System Modified
- 2021-08-30 01:52:23