149410-Thumbnail Image.png
Description
Text search is a very useful way of retrieving document information from a particular website. The public generally use internet search engines over the local enterprise search engines, because the enterprise content is not cross linked and does not follow

Text search is a very useful way of retrieving document information from a particular website. The public generally use internet search engines over the local enterprise search engines, because the enterprise content is not cross linked and does not follow a page rank algorithm. On the other hand the enterprise search engine uses metadata information, which allows the user to specify the conditions that any retrieved document should meet. Therefore, using metadata information for searching will also be very useful. My thesis aims on developing an enterprise search engine using metadata information by providing advanced features like faceted navigation. The search engine data was extracted from various Indonesian web sources. Metadata information like person, organization, location, and sentiment analytic keyword entities should be tagged in each document to provide facet search capability. A shallow parsing technique like named entity recognizer is used for this purpose. There are more than 1500 entities that have been tagged in this process. These documents have been successfully converted into XML format and are indexed with "Apache Solr". It is an open source enterprise search engine with full text search and faceted search capabilities. The entities will be helpful for users to specify conditions and search faster through the large collection of documents. The user is assured results by clicking on a metadata condition. Since the sentiment analytic keywords are tagged with positive and negative values, social scientists can use these results to check for overlapping or conflicting organizations and ideologies. In addition, this tool is the first of its kind for the Indonesian language. The results are fetched much faster and with better accuracy.
Reuse Permissions


  • Download restricted.

    Details

    Title
    • Faceted search and browsing of Indonesian text collection using shallow parsing techniques
    Contributors
    Date Created
    2010
    Resource Type
  • Text
  • Collections this item is in
    Note
    • thesis
      Partial requirement for: M.S., Arizona State University, 2010
    • bibliography
      Includes bibliographical references (p. 40-42)
    • Field of study: Computer science

    Citation and reuse

    Statement of Responsibility

    by Srinivasa Raviteja Sanaka

    Machine-readable links