Sentimental bi-partite graph of political blogs

150969-Thumbnail Image.png
Description
Analysis of political texts, which contains a huge amount of personal political opinions, sentiments, and emotions towards powerful individuals, leaders, organizations, and a large number of people, is an interesting task, which can lead to discover interesting interactions between the

Analysis of political texts, which contains a huge amount of personal political opinions, sentiments, and emotions towards powerful individuals, leaders, organizations, and a large number of people, is an interesting task, which can lead to discover interesting interactions between the political parties and people. Recently, political blogosphere plays an increasingly important role in politics, as a forum for debating political issues. Most of the political weblogs are biased towards their political parties, and they generally express their sentiments towards their issues (i.e. leaders, topics etc.,) and also towards issues of the opposing parties. In this thesis, I have modeled the above interactions/debate as a sentimental bi-partite graph, a bi-partite graph with Blogs forming vertices of a disjoint set, and the issues (i.e. leaders, topics etc.,) forming the other disjoint set,and the edges between the two sets representing the sentiment of the blogs towards the issues. I have used American Political blog data to model the sentimental bi- partite graph, in particular, a set of popular political liberal and conservative blogs that have clearly declared positions. These blogs contain discussion about social, political, economic issues and related key individuals in their conservative/liberal view. To be more focused and more polarized, 22 most popular liberal/conservative blogs of a particular time period, May 2008 - October 2008(because of high intensity of debate and discussions), just before the presidential elections, was considered, involving around 23,800 articles. This thesis involves solving the questions: a) which is the most liberal/conservative blogs on the web? b) Who is on which side of debate and what are the issues? c) Who are the important leaders? d) How do you model the relationship between the participants of the debate and the underlying issues?
Date Created
2012
Agent

A visualization dashboard for Muslim social movements

150836-Thumbnail Image.png
Description
Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a

Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies and behavior of Radical and Counter-Radical organizations and how they shape up over a period of time. Recognizing and supporting counter-radical organizations is one of the most important steps towards impeding radical organizations. A lot of research has already been done to categorize and recognize organizations, to understand their behavior, their interactions with other organizations, their target demographics and the area of influence. We have a huge amount of information which is a result of the research done over these topics. This thesis provides a powerful and interactive way to navigate through all this information, using a Visualization Dashboard. The dashboard makes it easier for Social Scientists, Policy Analysts, Military and other personnel to visualize an organization's propensity towards violence and radicalism. It also tracks the peaking religious, political and socio-economic markers, their target demographics and locations. A powerful search interface with parametric search helps in narrowing down to specific scenarios and view the corresponding information related to the organizations. This tool helps to identify moderate Counter-Radical organizations and also has the potential of predicting the orientation of various organizations based on the current information.
Date Created
2012
Agent

Expanding data mining theory for industrial applications

150537-Thumbnail Image.png
Description
The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times, the scope of these problems has swollen and the methods are under

The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times, the scope of these problems has swollen and the methods are under scrutiny for applicability and relevance to real-world circumstances. At the crossroads of innovation and standards, it is important to examine and understand whether the current theoretical methods for industrial applications (which include KDD, SEMMA and CRISP-DM) encompass all possible scenarios that could arise in practical situations. Do the methods require changes or enhancements? As part of the thesis I study the current methods and delineate the ideas of these methods and illuminate their shortcomings which posed challenges during practical implementation. Based on the experiments conducted and the research carried out, I propose an approach which illustrates the business problems with higher accuracy and provides a broader view of the process. It is then applied to different case studies highlighting the different aspects to this approach.
Date Created
2012
Agent

SystemC TLM2.0 modeling of network-on-chip architecture

150524-Thumbnail Image.png
Description
Network-on-Chip (NoC) architectures have emerged as the solution to the on-chip communication challenges of multi-core embedded processor architectures. Design space exploration and performance evaluation of a NoC design requires fast simulation infrastructure. Simulation of register transfer level model of NoC

Network-on-Chip (NoC) architectures have emerged as the solution to the on-chip communication challenges of multi-core embedded processor architectures. Design space exploration and performance evaluation of a NoC design requires fast simulation infrastructure. Simulation of register transfer level model of NoC is too slow for any meaningful design space exploration. One of the solutions to reduce the speed of simulation is to increase the level of abstraction. SystemC TLM2.0 provides the capability to model hardware design at higher levels of abstraction with trade-off of simulation speed and accuracy. In this thesis, SystemC TLM2.0 models of NoC routers are developed at three levels of abstraction namely loosely-timed, approximately-timed, and cycle accurate. Simulation speed and accuracy of these three models are evaluated by a case study of a 4x4 mesh NoC.
Date Created
2012
Agent

Statistical monitoring and control of locally proactive routing protocols in MANETs

150488-Thumbnail Image.png
Description
Mobile ad hoc networks (MANETs) have attracted attention for mission critical applications. This dissertation investigates techniques of statistical monitoring and control for overhead reduction in a proactive MANET routing protocol. Proactive protocols transmit overhead periodically. Instead, we propose that the

Mobile ad hoc networks (MANETs) have attracted attention for mission critical applications. This dissertation investigates techniques of statistical monitoring and control for overhead reduction in a proactive MANET routing protocol. Proactive protocols transmit overhead periodically. Instead, we propose that the local conditions of a node should determine this transmission decision. While the goal is to minimize overhead, a balance in the amount of overhead transmitted and the performance achieved is required. Statistical monitoring consists of techniques to determine if a characteristic has shifted away from an in-control state. A basic tool for monitoring is a control chart, a time-oriented representation of the characteristic. When a sample deviates outside control limits, a significant change has occurred and corrective actions are required to return to the in-control state. We investigate the use of statistical monitoring of local conditions in the Optimized Link State Routing (OLSR) protocol. Three versions are developed. In A-OLSR, each node uses a Shewhart chart to monitor betweenness of its two-hop neighbourhood. Betweenness is a social network metric that measures a node's influence; betweenness is larger when a node has more influence. Changes in topology are associated with changes in betweenness. We incorporate additional local node conditions including speed, density, packet arrival rate, and number of flows it forwards in A+-OLSR. Response Surface Methodology (RSM) is used to optimize timer values. As well, the Shewhart chart is replaced by an Exponentially Weighted Moving Average (EWMA) chart, which is more sensitive to small changes in the characteristic. It is known that control charts do not work as well in the presence of correlation. Hence, in A*-OLSR the autocorrelation in the time series is removed and an Auto-Regressive Integrated Moving Average (ARIMA) model found; this removes the dependence on node speed. A*-OLSR also extends monitoring to two characteristics concurrently using multivariate cumulative sum (MCUSUM) charts. The protocols are evaluated in simulation, and compared to OLSR and its variants. The techniques for statistical monitoring and control are general and have great potential to be applied to the adaptive control of many network protocols.
Date Created
2012
Agent

Energy management in solar powered wireless sensor networks

150486-Thumbnail Image.png
Description
The use of energy-harvesting in a wireless sensor network (WSN) is essential for situations where it is either difficult or not cost effective to access the network's nodes to replace the batteries. In this paper, the problems involved in controlling

The use of energy-harvesting in a wireless sensor network (WSN) is essential for situations where it is either difficult or not cost effective to access the network's nodes to replace the batteries. In this paper, the problems involved in controlling an active sensor network that is powered both by batteries and solar energy are investigated. The objective is to develop control strategies to maximize the quality of coverage (QoC), which is defined as the minimum number of targets that must be covered and reported over a 24 hour period. Assuming a time varying solar profile, the problem is to optimally control the sensing range of each sensor so as to maximize the QoC while maintaining connectivity throughout the network. Implicit in the solution is the dynamic allocation of solar energy during the day to sensing and to recharging the battery so that a minimum coverage is guaranteed even during the night, when only the batteries can supply energy to the sensors. This problem turns out to be a non-linear optimal control problem of high complexity. Based on novel and useful observations, a method is presented to solve it as a series of quasiconvex (unimodal) optimization problems which not only ensures a maximum QoC, but also maintains connectivity throughout the network. The runtime of the proposed solution is 60X less than a naive but optimal method which is based on dynamic programming, while the peak error of the solution is less than 8%. Unlike the dynamic programming method, the proposed method is scalable to large networks consisting of hundreds of sensors and targets. The solution method enables a designer to explore the optimal configuration of network design. This paper offers many insights in the design of energy-harvesting networks, which result in minimum network setup cost through determination of optimal configuration of number of sensors, sensing beam width, and the sampling time.
Date Created
2012
Agent

Adaptive cross layer design and implementation for gigabit multimedia applications using 60 GHz wireless links

150348-Thumbnail Image.png
Description
Demands in file size and transfer rates for consumer-orientated products have escalated in recent times. This is primarily due to the emergence of high definition video content. Now factor in the consumer desire for convenience, and we find that wireless

Demands in file size and transfer rates for consumer-orientated products have escalated in recent times. This is primarily due to the emergence of high definition video content. Now factor in the consumer desire for convenience, and we find that wireless service is the most desired approach for inter-connectivity. Consumers expect wireless service to emulate wired service with little to virtually no difference in quality of service (QoS). The background section of this document examines the QoS requirements for wireless connectivity of high definition video applications. I then proceed to look at proposed solutions at the physical (PHY) and the media access control (MAC) layers as well as cross-layer schemes. These schemes are subsequently are evaluated in terms of usefulness in a multi-gigabit, 60 GHz wireless multimedia system targeting the average consumer. It is determined that a substantial gap in published literature exists pertinent to this application. Specifically, little or no work has been found that shows how an adaptive PHYMAC cross-layer solution that provides real-time compensation for varying channel conditions might be actually implemented. Further, no work has been found that shows results of such a model. This research proposes, develops and implements in Matlab code an alternate cross-layer solution that will provide acceptable QoS service for multimedia applications. Simulations using actual high definition video sequences are used to test the proposed solution. Results based on the average PSNR metric show that a quasi-adaptive algorithm provides greater than 7 dB of improvement over a non-adaptive approach while a fully-adaptive alogrithm provides over18 dB of improvement. The fully adaptive implementation has been conclusively shown to be superior to non-adaptive techniques and sufficiently superior to even quasi-adaptive algorithms.
Date Created
2011
Agent

Finding provenance data in social media

150244-Thumbnail Image.png
Description
A statement appearing in social media provides a very significant challenge for determining the provenance of the statement. Provenance describes the origin, custody, and ownership of something. Most statements appearing in social media are not published with corresponding provenance data.

A statement appearing in social media provides a very significant challenge for determining the provenance of the statement. Provenance describes the origin, custody, and ownership of something. Most statements appearing in social media are not published with corresponding provenance data. However, the same characteristics that make the social media environment challenging, including the massive amounts of data available, large numbers of users, and a highly dynamic environment, provide unique and untapped opportunities for solving the provenance problem for social media. Current approaches for tracking provenance data do not scale for online social media and consequently there is a gap in provenance methodologies and technologies providing exciting research opportunities. The guiding vision is the use of social media information itself to realize a useful amount of provenance data for information in social media. This departs from traditional approaches for data provenance which rely on a central store of provenance information. The contemporary online social media environment is an enormous and constantly updated "central store" that can be mined for provenance information that is not readily made available to the average social media user. This research introduces an approach and builds a foundation aimed at realizing a provenance data capability for social media users that is not accessible today.
Date Created
2011
Agent

Sparse learning package with stability selection and application to alzheimer's disease

150190-Thumbnail Image.png
Description
Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of focus. In supervised learning like regression, the data consists of many features and only a subset of the features may be responsible for the result. Also, the features might require special structural requirements, which introduces additional complexity for feature selection. The sparse learning package, provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features which introduce additional requirements are also provided as part of the package. The features may be grouped together, and there may exist hierarchies and over- lapping groups among these, and there may be requirements for selecting the most relevant groups among them. In spite of getting sparse solutions, the solutions are not guaranteed to be robust. For the selection to be robust, there are certain techniques which provide theoretical justification of why certain features are selected. The stability selection, is a method for feature selection which allows the use of existing sparse learning methods to select the stable set of features for a given training sample. This is done by assigning probabilities for the features: by sub-sampling the training data and using a specific sparse learning technique to learn the relevant features, and repeating this a large number of times, and counting the probability as the number of times a feature is selected. Cross-validation which is used to determine the best parameter value over a range of values, further allows to select the best parameter value. This is done by selecting the parameter value which gives the maximum accuracy score. With such a combination of algorithms, with good convergence guarantees, stable feature selection properties and the inclusion of various structural dependencies among features, the sparse learning package will be a powerful tool for machine learning research. Modular structure, C implementation, ATLAS integration for fast linear algebraic subroutines, make it one of the best tool for a large sparse setting. The varied collection of algorithms, support for group sparsity, batch algorithms, are a few of the notable functionality of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. The Alzheimer Disease(AD) is a neurodegenerative disease, which gradually leads to dementia. The SLEP package is used for feature selection for getting the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features are required to gain valuable insights.
Date Created
2011
Agent

Post-optimization: necessity analysis for combinatorial arrays

150111-Thumbnail Image.png
Description
Finding the optimal solution to a problem with an enormous search space can be challenging. Unless a combinatorial construction technique is found that also guarantees the optimality of the resulting solution, this could be an infeasible task. If such a

Finding the optimal solution to a problem with an enormous search space can be challenging. Unless a combinatorial construction technique is found that also guarantees the optimality of the resulting solution, this could be an infeasible task. If such a technique is unavailable, different heuristic methods are generally used to improve the upper bound on the size of the optimal solution. This dissertation presents an alternative method which can be used to improve a solution to a problem rather than construct a solution from scratch. Necessity analysis, which is the key to this approach, is the process of analyzing the necessity of each element in a solution. The post-optimization algorithm presented here utilizes the result of the necessity analysis to improve the quality of the solution by eliminating unnecessary objects from the solution. While this technique could potentially be applied to different domains, this dissertation focuses on k-restriction problems, where a solution to the problem can be presented as an array. A scalable post-optimization algorithm for covering arrays is described, which starts from a valid solution and performs necessity analysis to iteratively improve the quality of the solution. It is shown that not only can this technique improve upon the previously best known results, it can also be added as a refinement step to any construction technique and in most cases further improvements are expected. The post-optimization algorithm is then modified to accommodate every k-restriction problem; and this generic algorithm can be used as a starting point to create a reasonable sized solution for any such problem. This generic algorithm is then further refined for hash family problems, by adding a conflict graph analysis to the necessity analysis phase. By recoloring the conflict graphs a new degree of flexibility is explored, which can further improve the quality of the solution.
Date Created
2011
Agent