Kitsune: Structurally-Aware and Adaptable Plagiarism Detection

Description

Plagiarism is a serious problem in a learning environment. In programming classes especially, plagiarism can be hard to detect because the appearance of source code can easily be modified, without changing its intent, through simple formatting changes or refactoring. A number of plagiarism detection tools attempt to encode knowledge about the programming languages they support in order to better detect obscured duplicates. Many such tools do not support a large number of languages because doing so requires too much code and therefore too much maintenance. It is also difficult to add support for new languages because languages differ widely in syntax. Tools that are more extensible typically achieve that extensibility by encoding fewer features of each language, which leaves them closer to text comparison tools than to structurally-aware program analysis tools.

Kitsune attempts to remedy these issues by tying itself to Antlr, a pre-existing language recognition tool with over 200 currently supported languages. In addition, it provides an interface through which generic manipulations can be applied to the parse tree that Antlr generates. Because Kitsune relies on language-agnostic structural modifications, it can be adapted with minimal effort to provide plagiarism detection for new languages. Kitsune has been evaluated successfully on 10 of the languages in the Antlr grammar repository and could easily be extended to support all of the grammars currently maintained by Antlr, as well as future grammars developed as new languages are created.
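
Kitsune's own source is not reproduced here. As a rough, hypothetical sketch of the underlying idea, the following Python fragment flattens ANTLR-style parse trees into normalized sequences of rule indices and token types and compares those sequences, so renamed identifiers and reformatted code score as similar. Only the generic ANTLR runtime tree methods (getChildCount, getChild, getRuleIndex, getSymbol) are assumed; everything else is illustrative rather than Kitsune's actual implementation.

```python
# Hypothetical sketch of language-agnostic structural comparison over
# ANTLR-style parse trees. Not Kitsune's implementation; it only illustrates
# flattening parse trees into normalized sequences and comparing them while
# ignoring identifiers and formatting.
from difflib import SequenceMatcher

def flatten(tree):
    """Depth-first flatten of a parse tree into a list of fingerprints.

    Rule nodes are reduced to their rule index and terminal nodes to their
    token type, so renamed identifiers and reformatted code produce the same
    sequence. Relies only on the generic ANTLR tree API.
    """
    sequence = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.getChildCount() == 0:            # terminal (leaf) node
            sequence.append(("tok", node.getSymbol().type))
        else:                                    # parser rule node
            sequence.append(("rule", node.getRuleIndex()))
            # push children in reverse so they are visited left-to-right
            stack.extend(node.getChild(i)
                         for i in reversed(range(node.getChildCount())))
    return sequence

def structural_similarity(tree_a, tree_b):
    """Return a 0..1 similarity score between two parse trees."""
    return SequenceMatcher(None, flatten(tree_a), flatten(tree_b)).ratio()
```

Because the flattening never inspects language-specific node types, the same comparison works unchanged for any grammar Antlr can parse, which mirrors the language-agnostic adaptability described above.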
Date Created
2020

Adaptive mHealth interventions for improving youth responsiveness and clinical outcomes

Description

Mobile health (mHealth) applications (apps) hold tremendous potential for addressing chronic health conditions. Smartphones are now the most popular form of computing, and the ubiquitous “always with us, always on” nature of mobile technology makes them amenable to interventions aimed at managing chronic disease. Several challenges exist, however: it is difficult to determine mHealth effects because the technology changes rapidly and strains existing methods of evaluation, and it is difficult to ensure that end users use the technology consistently enough to achieve the desired effects. The latter challenge is one of adherence, defined as the extent to which a patient conducts the activities defined in a clinical protocol (i.e., an intervention plan). Higher levels of adherence should lead to greater effects of the intervention (the greater the fidelity to the protocol, the more benefit one should receive from it). mHealth has limitations in these areas: the ability to have patients sustainably adhere to a protocol, and the ability to drive intervention effect sizes. My research considers personalized interventions, a new approach of study in the mHealth community, as a potential remedy to these limitations. Specifically, in the context of a pediatric preventative anxiety protocol, I introduce algorithms that drive greater levels of adherence and greater effect sizes by incorporating per-patient (personalized) information. These algorithms have been implemented within an existing mHealth app for middle school students that has been successfully deployed in a school in the Phoenix, Arizona metropolitan area. Because the number of users is small (n=3), a case-by-case analysis of app usage is presented. In addition, simulated user behaviors based on models of adherence and effect sizes over time are presented to demonstrate the potential impact of personalized deployments at a larger scale.
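
The abstract does not specify the personalization algorithms, so the following Python sketch is purely hypothetical: it illustrates one way per-patient adherence data could adjust an intervention schedule. The Patient fields, thresholds, and prompt counts are all invented for illustration.

```python
# Hypothetical sketch (not the thesis's actual algorithm) of a personalized
# scheduler: each patient's recent adherence adjusts how many intervention
# prompts the app sends, backing off for struggling users and increasing the
# dose for highly engaged ones.
from dataclasses import dataclass

@dataclass
class Patient:
    name: str
    completed: int = 0        # protocol activities completed this week
    scheduled: int = 0        # protocol activities scheduled this week
    prompts_per_day: int = 2  # current prompting rate

    @property
    def adherence(self) -> float:
        return self.completed / self.scheduled if self.scheduled else 0.0

def personalize(patient: Patient, min_prompts: int = 1, max_prompts: int = 4) -> int:
    """Return next week's prompting rate based on observed adherence."""
    if patient.adherence < 0.5:
        # struggling: reduce load so the protocol stays sustainable
        patient.prompts_per_day = max(min_prompts, patient.prompts_per_day - 1)
    elif patient.adherence > 0.8:
        # engaged: increase dose to drive a larger intervention effect
        patient.prompts_per_day = min(max_prompts, patient.prompts_per_day + 1)
    return patient.prompts_per_day

p = Patient("student-1", completed=2, scheduled=6)
print(personalize(p))   # -> 1: back off after a low-adherence week
```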
Date Created
2019

Graph Search as a Feature in Imperative/Procedural Programming Languages

Description

Graph theory is a critical component of computer science and software engineering, with algorithms for graph traversal and comprehension powering many of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph; however, they struggle to implement a correct and efficient search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph-based solutions, we propose a new programming language construct: the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is put into a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as a foundation and proof of concept of the search statement, and a Go compiler is provided which implements this construct.
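
The search statement itself is new Go syntax handled by the modified compiler, and its concrete syntax is not reproduced here. As an illustration of the described semantics only (a supra-root node, a children procedure, and optional fail-fast accept/reject predicates), the following Python sketch expresses the same kind of search as an ordinary function:

```python
# Illustrative sketch only: the thesis proposes the search statement as new
# Go syntax. This plain function mimics the described semantics (supra-root
# node, children procedure, optional fail-fast accept/reject predicates)
# without the dedicated syntax.
from collections import deque

def search(supra_root, children, accept=None, reject=None):
    """Breadth-first search over any graph reachable from supra_root.

    children(node) -> iterable of child nodes
    accept(node)   -> True to stop and return node as the solution (optional)
    reject(node)   -> True to prune node and its descendants (optional)
    """
    seen = {supra_root}
    queue = deque([supra_root])
    while queue:
        node = queue.popleft()
        if reject is not None and reject(node):
            continue                      # fail-fast rejection prunes this branch
        if accept is not None and accept(node):
            return node                   # fail-fast acceptance ends the search
        for child in children(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return None

# Example: find the first multiple of 7 in a small implicit graph of integers.
print(search(1, children=lambda n: [2 * n, 2 * n + 1],
             accept=lambda n: n % 7 == 0,
             reject=lambda n: n > 100))   # -> 7
```

In the proposed construct, these same ingredients are expressed through a switch-like statement rather than as function parameters.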
Date Created
2018

Learning about Android Development through Application of the Software Curriculum to Solve Real World Problems

Description

This mobile application development project sought to accomplish three goals: learning mobile development, addressing a real-world problem, and applying four years of schooling to a structured project. These goals were each chosen for individual reasons. Firstly, the mobile platform is the most effective way to reach the maximum number of users. The majority of the first-world populace owns a smartphone and spends a considerable amount of time using it for a multitude of purposes such as scheduling, conversation, and entertainment. Coupled with my lack of personal experience in this development area, it became evident that creating a mobile application was the most desirable choice for this project. Secondly, after hearing stories and reflections from my peers about their own humanitarian endeavors, their experiences sparked a desire to use this project as an opportunity to create a similar impact. Thus, this project began with the desire to solve a real-world problem. Lastly, the first three years of the software engineering curriculum focused primarily on theoretical environments and projects, most of which seemed to have little real-world transferability. Therefore, having spent the time and effort to learn proper methods of software development, it would be remiss not to use these skills to prepare for future employment.

This thesis began with a motivation to solve the time-consuming problem of finding health facilities that satisfy financial, insurance, and health needs. Two personal accounts of delayed access to proper medical services will be expanded upon later in this document. These experiences served as inspiration to delve further into the problem and determine whether it could be solved. After months of exploring and planning, the project hit an impassable roadblock and could not be continued; within the remaining time constraints, changing the development area was not a viable way to accomplish the intended goal. Thus, a new idea was conceived: to assist those trying to cope with anxiety in today's world. One common recommendation for people suffering from anxiety is to write down their troubles with the intention of reflecting on them at a later point. This serves as a method to reason through the irrationality and enables individuals to identify repetitive patterns over long periods of time. Physically writing down these reflections with pen and paper is no longer sufficient in this technological era, especially for those who want to retain their privacy or who lack the drive to use this technique consistently. The remaining months of this thesis were directed at planning and creating a prototype to address this need.
Date Created
2018-05

SMART SCHEDULING FOR INSTRUCTIONAL MODULE DEVELOPMENT SYSTEM

Description

Many organizational course design methodologies feature general guidelines for the chronological and time-management aspects of course design and development. Proper course structure and instructional strategy pacing have been shown to facilitate student knowledge acquisition of novel material. Because these course-scheduling details influence student learning outcomes, they need to be an effective and tightly coupled component of an instructional module. The Instructional Module Development System, or IMODS, seeks to improve STEM (science, technology, engineering, and math) education by equipping educators with a powerful informational tool that guides course design with information drawn from contemporary research on pedagogical methodology and assessment practices. This is particularly salient within the higher-education STEM fields because many instructors come from more technical backgrounds, and science Ph.D. programs have traditionally not focused on preparing doctoral candidates to teach. This thesis project applies a multidisciplinary approach, blending educational psychology and computer science, to help improve STEM education. By developing an instructional module-scheduling feature for the web-based IMODS system, we can help instructors plan and organize their course work inside and outside of the classroom while providing them with relevant research that will help them improve their courses. This thesis illustrates the iterative design process: gathering background research on the pacing of workload and learning activities and their influence on student knowledge acquisition, constructively critiquing and analyzing pre-existing information technology (IT) scheduling tools, synthesizing graphical user interface (GUI) mockups based on that research, and then implementing a working prototype using the IMODS framework.
Date Created
2016-05

Maroon and Gold: Mobile Application

Description

Currently, students at Arizona State University are restricted to physical cards when using their college's local currency. This currency, Maroon and Gold dollars (M&G), is a primary source of meal plans for many students. When relying on card readers, students sacrifice both security and convenience. Security is at risk because a student's identification number never changes and is printed on each card; if students lose their card, their account information is permanently compromised. Convenience is an issue because students currently must make a purchase in order to see their account balance. Another major issue is that businesses must purchase external hardware in order to use the M&G system. An online or mobile system would eliminate the need for a physical card and allow businesses to function without external card readers. Such a system would have access to financial information of businesses and students at ASU, so it would require rigorous scrutiny by a well-trusted team of professionals before being implemented. My objective was to help bring such a system to life. To do this, I decided to build a mobile application prototype to serve as a baseline and to demonstrate the features of such a system. As a baseline, it needed to have a realistic, professional appearance and the ability to accurately demonstrate feature functionality. Before developing the app, I set out to determine the user interaction and user experience (UI/UX) designs by conducting a series of informal interviews with local students and businesses. After the designs were finalized, I implemented the application in Android Studio. This creative project consists of a mobile application, a contained database, a GUI (graphical user interface) prototype, and a technical document.
Date Created
2016-05

New methodology of automatic design collaboration

Description

When software design teams attempt to collaborate on different design documents, they suffer from a serious collaboration problem. Designers collaborate either in person or remotely. In-person collaboration is expensive but effective; remote collaboration is inexpensive but inefficient. To gain the most benefit from collaboration, remote collaboration needs to be not only cheap but also as efficient as physical collaboration.

Remote collaboration on software design relies on general tools such as Word and Excel. These tools are then shared inefficiently by email, cloud-based file-locking tools, or services like Google Docs. Because these tools either increase the number of design building blocks or limit the times at which one can work on a specific document, they drastically decrease productivity.

This thesis outlines a new methodology to increase design productivity by providing design-specific collaboration. Using version control systems, this methodology allows for effective project collaboration between remotely located design teams. The methodology encompasses role management, policy management, and design artifact management, including nonfunctional requirements. Version control can be used for different design products, improving communication and productivity among design teams. This thesis outlines the methodology and then presents a proof-of-concept tool that embodies the core of these principles.
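
The abstract does not describe the proof-of-concept tool's data model. Purely as a hypothetical illustration of the idea, the following Python sketch stores design artifacts, together with their owning role, review policy, and nonfunctional requirements, as plain line-oriented files that an ordinary version control system such as Git can track, diff, and merge; none of the field names come from the thesis.

```python
# Hypothetical illustration only (not the thesis's actual tool): representing
# design artifacts, roles, and policies as plain JSON files so that a standard
# version control system can track, diff, and merge them like source code.
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class DesignArtifact:
    name: str                              # e.g. "login-sequence"
    owner_role: str                        # role management: who may edit this artifact
    review_policy: str                     # policy management: e.g. "two-reviewer approval"
    nonfunctional_requirements: list[str]  # NFRs attached to the design
    content: str                           # the design text/diagram source itself

def save(artifact: DesignArtifact, repo_dir: str = "design-repo") -> Path:
    """Write the artifact as a stable, line-oriented JSON file inside a
    version-controlled directory, so diffs and merges show meaningful changes."""
    path = Path(repo_dir) / f"{artifact.name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(artifact), indent=2, sort_keys=True) + "\n")
    return path

save(DesignArtifact(
    name="login-sequence",
    owner_role="architect",
    review_policy="two-reviewer approval",
    nonfunctional_requirements=["authenticate within 300 ms"],
    content="User -> AuthService: credentials ...",
))
```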
Date Created
2016

A semantic framework for integrating and publishing linked data on the Web

Description

The semantic web is the web of data; it provides a common framework and technologies for sharing and reusing data in various applications. In semantic web terminology, linked data is the term used to describe a method of exposing and connecting data on the web from different sources. The purpose of linked data and the semantic web is to publish data in an open and standard format and to link this data with existing data on the Linked Open Data Cloud. The goal of this thesis is to develop a semantic framework for integrating and publishing linked data on the web. Traditionally, integrating data from multiple sources involves an Extract-Transform-Load (ETL) framework that generates datasets for analytics and visualization. This thesis proposes introducing a semantic component into the ETL framework to semi-automate the generation and publishing of linked data. Various existing ETL tools and data integration techniques have been analyzed and their deficiencies identified. The thesis proposes a set of requirements for the semantic ETL framework by conducting a manual process to integrate data from various sources such as weather, holidays, airports, and flight arrivals, departures, and delays. The research questions addressed are: (i) to what extent can the integration, generation, and publishing of linked data to the cloud using a semantic ETL framework be automated; and (ii) does the use of semantic technologies produce a richer data model and better-integrated data? Details of the methodology, the data collection, and an application that uses the generated linked data are presented. Evaluation is done by comparing the traditional data integration approach with the semantic ETL approach in terms of the effort involved in integration, the data model generated, and querying of the generated data.
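
As a minimal sketch of the proposed semantic component, the following Python fragment (using the rdflib library) transforms one hypothetical flight-delay record into RDF triples ready for linking and publishing. The example.org vocabulary and field names are invented for illustration and are not the thesis's actual data model.

```python
# A minimal sketch, assuming rdflib, of the "semantic component" idea: taking
# an ordinary extracted record (here, a made-up flight-delay row) and
# transforming it into RDF triples that can be linked and published.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/flights/")   # illustrative vocabulary only

def row_to_rdf(row: dict) -> Graph:
    """Transform step of a semantic ETL pipeline: one tabular row -> triples."""
    g = Graph()
    g.bind("ex", EX)
    flight = EX[f"flight/{row['flight_id']}"]
    g.add((flight, RDF.type, EX.Flight))
    g.add((flight, EX.departureAirport, EX[f"airport/{row['origin']}"]))
    g.add((flight, EX.arrivalAirport, EX[f"airport/{row['destination']}"]))
    g.add((flight, EX.delayMinutes, Literal(row["delay"], datatype=XSD.integer)))
    return g

graph = row_to_rdf({"flight_id": "AA100-2016-01-15", "origin": "PHX",
                    "destination": "JFK", "delay": 42})
print(graph.serialize(format="turtle"))   # the load step would publish this output
```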
Date Created
2016

A composite natural language processing and information retrieval approach to question answering against a structured knowledge base

Description

With the inception of the World Wide Web, the amount of data present on the internet has become tremendous, which makes navigating through it quite difficult for the user. As users struggle to find their way through this wealth of information, the need for an automated system that can extract the required information becomes urgent. The aim of this thesis is to develop a Question Answering system to ease the process of information retrieval.

Question Answering systems have been around for quite some time and form a sub-field of information retrieval and natural language processing. The task of any Question Answering system is to seek an answer to a free-form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than the simple information retrieval done by search engines. The Text REtrieval Conference (TREC) is a yearly conference that provides large-scale infrastructure and resources to support research in the information retrieval domain. TREC has included a question answering track since 1999, whose questions dataset contains a list of factual questions (Voorhees & Tice, 1999). DBpedia (Bizer et al., 2009) is a community-driven effort to extract and structure the data present in Wikipedia.

The research objective of this thesis is to develop a novel approach to Question Answering based on a composition of conventional Information Retrieval and Natural Language Processing approaches. The focus is also on exploring the use of a structured and annotated knowledge base, as opposed to an unstructured one. The knowledge base used here is DBpedia, and the final system is evaluated on the TREC 2004 questions dataset.
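
As a minimal sketch of the structured-knowledge-base side of such a system (not the thesis's actual pipeline), the following Python fragment uses the SPARQLWrapper library to query the public DBpedia endpoint once natural language processing has reduced a factual question to a resource and a property; the dbo:author mapping used in the example is an assumption.

```python
# Illustrative sketch only: once a factual question such as
# "Who wrote The Jungle Book?" has been mapped to a DBpedia resource and
# property, the answer can be retrieved with a single SPARQL query.
from SPARQLWrapper import SPARQLWrapper, JSON

def ask_dbpedia(resource: str, prop: str) -> list[str]:
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX dbr: <http://dbpedia.org/resource/>
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?answer WHERE {{
            dbr:{resource} dbo:{prop} ?answer .
        }}
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [b["answer"]["value"] for b in bindings]

# Structured lookup for "Who wrote The Jungle Book?" (property is an assumption)
print(ask_dbpedia("The_Jungle_Book", "author"))
```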
Date Created
2016

Distributed SPARQL over big RDF data: a comparative analysis using Presto and MapReduce

Description

The processing of large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. The initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters installed on the Microsoft Windows Azure platform with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto has much higher performance than Hive and can be used to process big RDF data. The thesis also proposes an architecture based on Presto, Presto-RDF, that can be used to process big RDF data.
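
The thesis's compiler is built with Flex and Bison; purely as an illustration of the translation idea, the following Python sketch maps a SPARQL basic graph pattern onto SQL over a single triples(subject, predicate, object) table, a schema assumed here rather than taken from Presto-RDF.

```python
# Illustration only: shows the idea of translating a SPARQL basic graph
# pattern into SQL over a triples(subject, predicate, object) table. This is
# not the thesis's Flex/Bison compiler and the schema is an assumption.
def bgp_to_sql(patterns):
    """patterns: list of (subject, predicate, object); items starting with '?'
    are variables, everything else is a constant term."""
    selects, froms, wheres = [], [], []
    bound = {}                                   # variable -> first column binding it
    for i, triple in enumerate(patterns):
        froms.append(f"triples t{i}")
        for col, term in zip(("subject", "predicate", "object"), triple):
            ref = f"t{i}.{col}"
            if term.startswith("?"):             # variable: join or project it
                if term in bound:
                    wheres.append(f"{ref} = {bound[term]}")
                else:
                    bound[term] = ref
                    selects.append(f"{ref} AS {term[1:]}")
            else:                                # constant: filter on it
                wheres.append(f"{ref} = '{term}'")
    sql = f"SELECT {', '.join(selects)} FROM {', '.join(froms)}"
    return sql + (f" WHERE {' AND '.join(wheres)}" if wheres else "")

# People who know ex:Alice, together with their names:
print(bgp_to_sql([("?p", "foaf:knows", "ex:Alice"),
                  ("?p", "foaf:name", "?name")]))
# SELECT t0.subject AS p, t1.object AS name FROM triples t0, triples t1
# WHERE t0.predicate = 'foaf:knows' AND t0.object = 'ex:Alice'
#   AND t1.subject = t0.subject AND t1.predicate = 'foaf:name'
```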
Date Created
2014