Context-aware rank-oriented recommender systems

151407-Thumbnail Image.png
Description
Recommender systems are a type of information filtering system that suggests items that may be of interest to a user. Most information retrieval systems have an overwhelmingly large number of entries. Most users would experience information overload if they were

Recommender systems are a type of information filtering system that suggests items that may be of interest to a user. Most information retrieval systems have an overwhelmingly large number of entries. Most users would experience information overload if they were forced to explore the full set of results. The goal of recommender systems is to overcome this limitation by predicting how users will value certain items and returning the items that should be of the highest interest to the user. Most recommender systems collect explicit user feedback, such as a rating, and attempt to optimize their model to this rating value. However, there is potential for a system to collect implicit user feedback, such as user purchases and clicks, to learn user preferences. Additionally with implicit user feedback, it is possible for the system to remember the context of user feedback in terms of which other items a user was considering when making their decisions. When considering implicit user feedback, only a subset of all evaluation techniques can be used. Currently, sufficient evaluation techniques for evaluating implicit user feedback do not exist. In this thesis, I introduce a new model for recommendation that borrows the idea of opportunity cost from economics. There are two variations of the model, one considering context and one that does not. Additionally, I propose a new evaluation measure that works specifically for the case of implicit user feedback.
Date Created
2012
Agent

Application of a temporal database framework for processing event queries

151371-Thumbnail Image.png
Description
This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to support temporal queries. The language features and formal semantic extensions to extend the relational framework are identified. The extended framework supports continuous, step-wise evaluation of temporal queries. The incremental evaluation of TEQL operators is formalized to avoid re-computation of previous results. The research includes the development of a prototype that supports the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results. TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. Through the integration of temporal database operators with event languages, a new class of temporal queries is made possible for querying event streams. New features include semantic aggregation, extraction of temporal patterns using set operators, and a more accurate specification of event co-occurrence.
Date Created
2012
Agent

Trust and profit sensitive ranking for the deep web and on-line advertisements

151129-Thumbnail Image.png
Description
Ranking is of definitive importance to both usability and profitability of web information systems. While ranking of results is crucial for the accessibility of information to the user, the ranking of online ads increases the profitability of the search provider.

Ranking is of definitive importance to both usability and profitability of web information systems. While ranking of results is crucial for the accessibility of information to the user, the ranking of online ads increases the profitability of the search provider. The scope of my thesis includes both search and ad ranking. I consider the emerging problem of ranking the deep web data considering trustworthiness and relevance. I address the end-to-end deep web ranking by focusing on: (i) ranking and selection of the deep web databases (ii) topic sensitive ranking of the sources (iii) ranking the result tuples from the selected databases. Especially, assessing the trustworthiness and relevances of results for ranking is hard since the currently used link analysis is inapplicable (since deep web records do not have links). I formulated a method---namely SourceRank---to assess the trustworthiness and relevance of the sources based on the inter-source agreement. Secondly, I extend the SourceRank to consider the topic of the agreeing sources in multi-topic environments. Further, I formulate a ranking sensitive to trustworthiness and relevance for the individual results returned by the selected sources. For ad ranking, I formulate a generalized ranking function---namely Click Efficiency (CE)---based on a realistic user click model of ads and documents. The CE ranking considers hitherto ignored parameters of perceived relevance and user dissatisfaction. CE ranking guaranteeing optimal utilities for the click model. Interestingly, I show that the existing ad and document ranking functions are reduced forms of the CE ranking under restrictive assumptions. Subsequently, I extend the CE ranking to include a pricing mechanism, designing a complete auction mechanism. My analysis proves several desirable properties including revenue dominance over popular Vickery-Clarke-Groves (VCG) auctions for the same bid vector and existence of a Nash equilibrium in pure strategies. The equilibrium is socially optimal, and revenue equivalent to the truthful VCG equilibrium. Further, I relax the independence assumption in CE ranking and analyze the diversity ranking problem. I show that optimal diversity ranking is NP-Hard in general, and that a constant time approximation algorithm is not likely.
Date Created
2012
Agent

Enhancing the usability of complex structured data by supporting keyword searches

150026-Thumbnail Image.png
Description
As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.
Date Created
2011
Agent