Data-Driven Sustainability: A Machine Learning Approach to Assessing ESG Performance in B Corporations

Description
The purpose of this research is to create predictive models for a leading sustainability certification: the B Corporation certification, which the non-profit B Lab issues on the basis of its B Impact Assessment. This certification is one of many currently used to assess sustainability in the corporate world, and this research seeks to understand the relationships between a corporation's characteristics (e.g., market, size, country) and B Corporation certification. The data used for the analysis come from a B Lab upload to data.world, which provides descriptive information on each company, its current certification status, and its B Impact Assessment scores. Further data engineering added attributes for publicly traded status and years certified. A comparison of Logistic Regression and Random Forest Classification methods produced a predictive model that distinguishes certified from de-certified B Corporations with 87.58% accuracy.
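As a rough illustration of how two classifiers can be compared on accuracy, the sketch below scores two sets of predictions against held-out labels. The labels, the predictions, and the names `model_a` and `model_b` are made up for the example; they are not the study's data or results, and the abstract does not say which method performed better.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of labels that were predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical held-out labels: 1 = certified, 0 = de-certified.
y_true = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]

# Hypothetical predictions from two already-fitted classifiers.
pred_model_a = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1]
pred_model_b = [1, 0, 0, 1, 0, 0, 1, 1, 0, 1]

scores = {
    "model_a": accuracy(y_true, pred_model_a),
    "model_b": accuracy(y_true, pred_model_b),
}

# Keep the model with the higher held-out accuracy.
best = max(scores, key=scores.get)
print(scores, best)
```

Accuracy alone can mislead when one class is much more common than the other, so a comparison like this is usually read alongside a confusion matrix.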
Date Created
2024-05
Agent

An Analysis of Significant Factors Affecting Grades Using Student Data

Description

This project examines a real-world college classroom to discover which factors affect a student’s outcome in the class and to investigate when and why a student who started the semester well may end it poorly. First, the project performs a statistical analysis to confirm that a student's total score depends on the factors given in the dataset rather than on random chance. Second, it identifies the factors that most significantly affect scores on zyBook assignments. Third, it visualizes performance over time, both for the student body as a whole and for students who started well but trailed off toward the end of the semester. Lastly, the project gives insight into the failure metrics for students who started well but did not perform as well later in the course.
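One way to check that a relationship between a factor and the total score is not due to chance is a permutation test. The sketch below is an assumption about how such a check could look, not the project's actual method; the completion percentages and scores are invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled scores correlate at least as
    strongly with the factor as the observed scores do."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: assignment completion (%) and total course score.
completion = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
total_score = [58, 61, 70, 68, 77, 79, 88, 86, 93, 97]

p_value = permutation_p_value(completion, total_score)
print(p_value)
```

A small p-value here means that random reshuffling of the scores rarely reproduces a correlation as strong as the observed one.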

Date Created
2023-05
Agent

Using Stepwise Logistic Regression to Determine Substitutions in Baseball

Description
In baseball, a starting pitcher has historically been a more durable pitcher capable of lasting long into games without tiring. For the entire history of Major League Baseball, these pitchers have been expected to last 6 innings or more into a game before being replaced. However, with advances in statistics and sabermetrics and their gradual acceptance by professional coaches, the role of the starting pitcher is beginning to change. Teams are experimenting with replacing starters sooner, challenging the traditional role of the starting pitcher. The goal of this study is to use statistical analyses to determine whether there is an exact point at which a team would benefit from replacing a starting or relief pitcher with another pitcher. We will use stepwise logistic regression to predict the likelihood of a team scoring a run, given the current game situation, if a substitution is or is not made.
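To make the prediction step concrete, the sketch below evaluates a logistic model for the same game situation with and without a substitution. The feature names and every coefficient are hypothetical placeholders, not estimates from the study; in a stepwise procedure, the set of retained features would be chosen by the selection step, which is not shown here.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for illustration only (NOT from the study).
COEFFS = {
    "intercept": -1.2,
    "substitution": -0.4,  # 1 if the pitcher is replaced, else 0
    "runners_on": 0.6,     # number of runners on base
    "outs": -0.5,          # outs recorded in the inning
    "inning": 0.05,        # current inning
}


def run_probability(substitution, runners_on, outs, inning):
    """Predicted probability that a run is scored in this situation."""
    z = (
        COEFFS["intercept"]
        + COEFFS["substitution"] * substitution
        + COEFFS["runners_on"] * runners_on
        + COEFFS["outs"] * outs
        + COEFFS["inning"] * inning
    )
    return sigmoid(z)


# Same game situation, evaluated with and without a substitution.
situation = {"runners_on": 2, "outs": 1, "inning": 6}
p_stay = run_probability(substitution=0, **situation)
p_sub = run_probability(substitution=1, **situation)
print(p_stay, p_sub)
```

Comparing the two probabilities across many game situations is one way to look for a point at which a substitution starts to pay off.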
Date Created
2019-05
Agent

FastStat: Online Statistics Calculator

Description
FastStat is a responsive website designed to work on any handheld, laptop, or desktop device. It serves as a first step into statistical calculations, educating the user on the basics of statistical analysis and guiding them as they perform analyses of their own using built-in calculators. The calculators can perform z tests, t tests, chi-square tests, and analysis of variance tests to determine significant characteristics of the user's data. Output includes means, standard deviations, significance levels, applicable statistics, and worded results indicating the outcome of the performed test. With its clean design, FastStat intuitively directs the user to fill in the information needed, clearly indicating which types of values are needed where and displaying descriptive error messages if any input values are incorrect. FastStat also halts calculations if any errors are found, which saves time by avoiding impossible calculations. Once complete, FastStat presents its results to the user in a clearly labeled manner. The calculators are designed so that the user knows what information they will get out of a calculator before performing any calculations at all. Aside from the calculators, FastStat includes introductory pages designed to familiarize users with common statistical terms and the associated tests, solidifying its purpose as an introductory tool. All tests are described by their typical uses, necessary inputs, calculated outputs, and extra notes of importance. Many statistical terms are defined, complete with examples to help educate the user on the concepts. With the information available, even the newest statistician can learn and begin performing tests almost immediately.
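The validate-then-calculate behavior described above can be sketched with a one-sample z test. This is not FastStat's code; the function name, messages, and result fields are assumptions chosen for illustration. Invalid input stops the calculation and returns descriptive errors, while valid input returns the statistic, the p-value, and a worded result.

```python
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-sided one-sample z test that halts on invalid input."""
    errors = []
    if pop_sd <= 0:
        errors.append("Population standard deviation must be positive.")
    if not (isinstance(n, int) and n >= 1):
        errors.append("Sample size must be a positive integer.")
    if not 0 < alpha < 1:
        errors.append("Significance level must be between 0 and 1.")
    if errors:
        # Halt before calculating anything that would be impossible.
        return {"errors": errors}

    standard_error = pop_sd / n ** 0.5
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    significant = p_value < alpha
    verdict = "Reject" if significant else "Fail to reject"
    return {
        "z": z,
        "p_value": p_value,
        "significant": significant,
        "result": f"{verdict} the null hypothesis at alpha = {alpha}.",
    }


good = one_sample_z_test(sample_mean=105, pop_mean=100, pop_sd=15, n=36)
bad = one_sample_z_test(sample_mean=105, pop_mean=100, pop_sd=0, n=36)
print(good)
print(bad)
```

Collecting all validation errors before returning lets a form report every problem at once instead of one per submission.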
Date Created
2018-12
Agent

Differences in Body Mass Index (BMI) Trends Across American Ethnicities

Description
This study aims to determine whether there are differences in body mass index (BMI) across ethnic groups in the United States. Modern medicine is increasingly moving toward personalized medicine, and existing literature has begun to suggest that cultural differences may have an effect on physical health. Initially, this study was to explore anorexia nervosa prevalence, but sufficient data were not available; this led to a shift in focus to exploring health differences in terms of BMI. The data analyzed are from the National Health and Nutrition Examination Survey (NHANES), collected by the Centers for Disease Control and Prevention (CDC) from 1999 to 2013. The subjects were aged 13-25, and the ethnicities compared were African American, Caucasian American, Mexican American, Other Hispanic American, Asian American, and Other (including multiracial). Statistical tests were run in SAS and included ANOVA tests, t-tests, and z-tests. It was found that there are differences across ethnicities, and that there are far more differences among females than among males. Asian American males and Mexican American males appear to be the groups driving the significant differences among males. Asian Americans were also found to have the lowest average BMI by far. On the other hand, African Americans and Mexican Americans appeared to have the highest average BMIs. Although these findings and others detailed in the paper are intriguing, the BMI data are not strictly normal and remain non-normal even after a log transformation of BMI. The data are still right-skewed and must be addressed in the future with different transformations and non-parametric tests to increase the accuracy and strength of these findings.
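For readers unfamiliar with the ANOVA step, the sketch below computes a one-way ANOVA F statistic on log-transformed BMI values. The study ran its tests in SAS; this Python version is only an illustration, and the three groups and their BMI values are invented, not NHANES data.

```python
import math


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, means)
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical BMI values for three groups (illustration only).
bmi_groups = [
    [21.5, 23.0, 24.8, 22.1, 30.2],
    [19.8, 21.1, 20.5, 22.9, 26.4],
    [24.0, 27.5, 25.1, 23.3, 33.8],
]

# Log-transform to reduce right skew before comparing group means.
log_groups = [[math.log(x) for x in g] for g in bmi_groups]
f_log = one_way_anova_f(log_groups)
print(f_log)
```

If the log-transformed data remain skewed, as the abstract reports, a rank-based alternative such as the Kruskal-Wallis test avoids the normality assumption altogether.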
Date Created
2018-05
Agent