SIURO | Volume 17 | SIAM


SIAM Undergraduate Research Online

Volume 17


Identifying Priority Areas for Expanding Mental Health Facilities with Mixed Integer Linear Programming

Published electronically January 25, 2024
DOI: 10.1137/23S1547755

Author: Junyuan Quan (Corresponding author – Denison University)
Project Advisor: Dr. Anthony Bonifonte (Denison University)

Abstract: This research attempts to estimate the unmet mental health services demand at a census tract level and identify new mental health facility locations in Ohio to maximize the number of new individuals with serious mental illness (SMI) who receive treatment. We find that among the 765,304 individuals with SMI in Ohio, 469,549 (61.4%) perceive an unmet need for mental health services due to the lack of geographic access and limited service capacity. Using estimates of the capacity of existing facilities, unmet demand in each census tract, and the distance between each census tract center, we formulated a mixed integer linear program to maximize coverage of newly opened facilities. The model suggests 10 new potential mental health facilities could provide geographic access to 418,228 new patients, comprising 89.1% of the total SMI population that currently has unmet mental health services demand in Ohio. The findings of this research could inform recommendations for identifying hot spots for individuals with SMI and priority areas for expanding mental health facilities.
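As a rough illustration of the kind of model the abstract describes, a generic capacitated maximal-covering formulation might look like the following. This is a textbook-style sketch, not the article's actual model. All symbols are assumptions introduced here: $I$ is the set of census tracts, $J$ the set of candidate sites, $u_i$ the unmet demand in tract $i$, $d_{ij}$ the distance from tract $i$ to site $j$, $R$ a service radius, $C$ the capacity of one new facility, and $p$ the number of facilities to open (10 in the abstract).

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{i \in I} \sum_{j \in J} x_{ij}
  && \text{(newly served individuals)} \\
\text{s.t.}\quad
  & \sum_{j \in J} x_{ij} \le u_i && \forall i \in I
  && \text{(cannot serve more than the unmet demand)} \\
  & \sum_{i \in I} x_{ij} \le C\, y_j && \forall j \in J
  && \text{(capacity, only if site } j \text{ is opened)} \\
  & x_{ij} = 0 && \text{whenever } d_{ij} > R
  && \text{(geographic access)} \\
  & \sum_{j \in J} y_j \le p
  && && \text{(number of new facilities)} \\
  & y_j \in \{0,1\},\quad x_{ij} \ge 0 && \forall i \in I,\ j \in J
\end{aligned}
```

Here $y_j$ is the binary decision to open site $j$ and $x_{ij}$ is the number of individuals from tract $i$ assigned to it; the mix of binary and continuous variables is what makes the program "mixed integer".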

From Pole to Podium: Adjusting Elo Method to Separate Car and Driver in Formula One Racing

Published electronically February 06, 2024
DOI: 10.1137/22S1522899

Author: Zijian Xun (Corresponding author – Carleton College)
Project Advisor: Timothy P. Chartier (Davidson College)

Abstract: This article presents a novel approach to separating and quantifying the effect of a car’s performance and a driver’s skill on Formula 1 (F1) race outcomes. By analyzing data from the past decade, we propose a formula to measure F1 drivers’ ability. This approach could be used to predict race outcomes for a given driver in cars with different performance levels, thereby aiding teams in optimizing resource allocation for car development.
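For readers unfamiliar with the baseline that the title refers to, the textbook two-player Elo update can be sketched as below. This is the standard head-to-head rule only, not the article's adjusted formula; the function names and the choice of K = 32 are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated (rating_a, rating_b) after one head-to-head result.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the update step size (an assumed, commonly used value).
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated competitors; A finishes ahead of B.
    print(elo_update(1500, 1500, 1.0))
```

The update is zero-sum: whatever one competitor gains, the other loses, and an upset against a higher-rated opponent moves the ratings more than an expected result does.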

How to be #1 in the IOI? A Study on Rating Nations Participating in the International Informatics Olympiad

Published electronically February 09, 2024
DOI: 10.1137/23S1586951

Author: Mohamed Mahmoud (Corresponding author – STEM High School for Boys, Giza, Egypt)
Project Advisor: Timothy P. Chartier (Davidson College)

Abstract: This paper investigates the reliability of using Elo, TrueSkill, and Top Coder rating methods in analyzing the performance of nations participating in the International Informatics Olympiad from 2011-2022. This investigation aims to utilize the ratings to assist nations in improving and achieving more medals in future IOI contests. Based on ratings for whole contests and each problem category, including but not limited to graph theory, ad hoc, and data structures, we prove and compare the reliability of the rating methods by measuring their predictive accuracies. By taking Egypt as a case study, we show how to extract useful information from rating changes over time to assist in improvement. In addition, we use standardization and percentiles to locate Egypt, or any nation, in each category among other nations and to find which categories weaken Egypt’s whole-contest ratings. Thus, Egypt can focus on these categories for improvement. Moreover, we relate each specific range of whole-contest percentiles to medal achievements, showing that nations in each range have nearly the same number and types of medals, which means that a country needs to reach a higher range of percentiles to win more medals and medals of a better type. Ultimately, we set recommendations for future work, encompassing a sensitivity analysis of which category is easier to improve and the usage of a modified Elo version.

Predator-Prey Oscillations in a Cellular Automaton of Huffaker's Mite Experiment

Published electronically February 13, 2024
DOI: 10.1137/22S1529452
Supplementary materials

Authors: Haley Zsoldos (Corresponding author – Pennsylvania State University) and Isabelle Stepler (Pennsylvania State University)
Project Advisors: Jessica M. Conway (Pennsylvania State University) and Timothy Reluga (Pennsylvania State University)

Abstract: Predator-prey interactions are commonly modeled using the Lotka-Volterra ordinary differential equations, producing intertwined predator and prey population oscillations. Scientists have attempted to reproduce these oscillations experimentally, as Carl Huffaker did in his 1958 experiment with mites and oranges. However, Huffaker was only able to produce sustained oscillations after adjusting his system’s spatial factors. In particular, increased space per orange and increased mite dispersal have a significant impact on achieving predator-prey oscillations. To address and confirm this result, we developed a cellular automaton model of Huffaker’s mite experiment. We simplified his system to fit automata criteria, created rules to govern mite dynamics, tested model parameters relating to mite lifetime and fertility, and increased patches per orange and mite dispersal by wooden posts to determine the conditions for successful oscillations. The results of our simulations show that increasing prey dispersal and the number of patches available per orange is sufficient for producing lasting oscillations in our model. Secondarily, we concluded that a certain disparity between reproduction and lifetime parameters for the predators and prey is sufficient for oscillations as well. In conclusion, spatial complexity must be considered when attempting to achieve predator-prey oscillations experimentally.
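For reference, the classical Lotka-Volterra system mentioned at the start of the abstract is usually written as follows. The symbol names are the conventional ones and are an assumption here, since the abstract does not state the notation used in the article: $x(t)$ is the prey population, $y(t)$ the predator population, and $\alpha, \beta, \delta, \gamma > 0$ are rate constants.

```latex
\begin{aligned}
\frac{dx}{dt} &= \alpha x - \beta x y, \\
\frac{dy}{dt} &= \delta x y - \gamma y .
\end{aligned}
```

This system has no spatial variable at all, which is why a spatially explicit model such as a cellular automaton is a natural tool for studying the role of patch count and dispersal.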

An Infectious Disease Model with Asymptomatic Transmission and Waning Immunity

Published electronically March 29, 2024
DOI: 10.1137/23S1606411

Authors: Sophia Y. Rong (Corresponding author – Buchholz High School, Gainesville, FL) and Alice X. Li (Eastside High School, Gainesville, FL)
Project Advisors: Shasha Gao (Jiangxi Normal University, China, and University of Florida) and Chunmei Wang (University of Florida)

Abstract: Infectious diseases present persistent challenges to global public health, demanding a comprehensive understanding of their dynamics to develop effective prevention and control strategies. The presence of asymptomatic carriers, individuals capable of transmitting pathogens without displaying symptoms, challenges conventional containment approaches focused on symptomatic cases. Waning immunity, the decline in protective response following natural recovery or vaccination, introduces further complexity to disease dynamics. In this paper, we developed a mathematical model to investigate the interplay between these factors, aiming to inform strategies for the management of infectious diseases. We derived the basic reproduction number for the model and showed that the disease would die out when this number falls below 1. We obtained a formula to estimate the relative contributions of asymptomatic and symptomatic transmission to the basic reproduction number, which remains unchanged when vaccination is included in the model. Through computer simulations with parameter values tailored for COVID-19 and sensitivity analysis, we demonstrated that population susceptibility significantly impacts the timing and magnitude of infection peaks. Populations with lower susceptibility experience delayed and less severe outbreaks. Vaccination was shown to play a crucial role in disease control, with an increased vaccination rate, extended immunity, and heightened vaccine efficacy proving pivotal. However, the effectiveness of these strategies hinges on maintaining a low vaccine escape proportion. Taken together, this study underscores the need for multifaceted, adaptable approaches to infectious disease management, highlighting the central role of vaccination in mitigating disease spread. Further research and validation with disease-specific data will enhance parameter estimates, improve model predictions, and inform evidence-based disease control strategies.

Modeling Renewable Electricity Purchasing for Sustainable Management of Clarkson University's Energy Portfolio

Published electronically April 09, 2024
DOI: 10.1137/23S1560756

Author: Sara Peter (Corresponding author – Clarkson University)
Project Advisor: Marko Budišić (Clarkson University)

Abstract: We present a dynamical system model to show the Potsdam, NY campus of Clarkson University is 100% renewable with the university’s new contract as of 2019 with Brookfield Renewable. The model creates periodic functions simulating energy inputs which can be used to generate alternative past and future scenarios. Clarkson University changed their source of electrical energy to Brookfield Renewable in July 2019 as they moved towards their 2025 goal of being 100% renewable. To claim that a MWh of electricity used by campus is renewably generated, a Renewable Energy Credit (REC) has to be purchased or generated and applied to it. The new contract with Brookfield Renewable provides each supplied MWh with its own REC. Combined with Clarkson University’s other renewable energy sources, 95% of electricity consumed by campus can be certified as renewable. To model the remaining portion of electricity consumed by off-campus properties, we rely on data that accounts for consumed and delivered electricity, prices for that electricity, and monetary credits generated by local energy sources. Our developed dynamical system models the monetary credit generation, debt accumulation, and REC accrual over 34 months using real university data. As not all data parameters were explicitly available, we explore estimating parameters in three ways: directly from the data as time-varying functions, as constants, and stochastically, as random variables with distributions consistent with the provided data. We validate the alternative models against the data and estimate sensitivity to parameters.

Quantification of the Effects of Voter Protocols on the Outcome of Approval Voting

Published electronically April 16, 2024
DOI: 10.1137/23S1574269

Authors: Zhuorong Mao (Corresponding author – College of William & Mary) and Sarah Kunkler (College of William & Mary)
Project Advisors: Susana Furtado (Universidade do Porto) and Charles R. Johnson (College of William & Mary)

Abstract: Approval Voting over several alternatives asks each voter to choose a subset of the alternatives of which they “approve”. Then, the alternative (or, perhaps, alternatives) approved by the most voters is selected. The outcome is then a function not only of the profile of individual voter preferences, but also of the protocols (number of alternatives approved) chosen by each voter. We quantify the differences in outcome that result from differences in protocols in several ways, regarding hypotheses about individual preferences. Considered are the two natural protocols when there are three alternatives, for both small and large numbers of voters. For more alternatives, we consider all possible pure protocols. Mixed protocols are also considered to quantify protocol effects. Several methods were employed for varying numbers of voters. Consistent numbers result from all methods. We find that differences in outcome, due to differences in protocol alone, can be quite frequent and often rival preferences as a determinant of the outcome.
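The dependence of the outcome on the protocol can be seen in a very small example. The sketch below is illustrative only and is not taken from the article: it assumes each voter has a strict ranking and approves their top k alternatives, where k is that voter's protocol. The function name and the five-voter profile are hypothetical.

```python
from collections import Counter


def approval_winners(rankings, protocols):
    """Tally approval votes.

    rankings  : one strict ranking (best first) per voter
    protocols : how many top alternatives each voter approves
    Returns (sorted list of winners, approval counts).
    """
    tally = Counter()
    for ranking, k in zip(rankings, protocols):
        tally.update(ranking[:k])  # approve the top-k alternatives
    best = max(tally.values())
    winners = sorted(alt for alt, votes in tally.items() if votes == best)
    return winners, dict(tally)


if __name__ == "__main__":
    # Hypothetical profile: three voters rank A > B > C, two rank C > B > A.
    rankings = [("A", "B", "C")] * 3 + [("C", "B", "A")] * 2

    # Same preferences, different protocols, different outcomes.
    print(approval_winners(rankings, [1] * 5))  # everyone approves one
    print(approval_winners(rankings, [2] * 5))  # everyone approves two
```

With identical preferences, A is selected when every voter approves one alternative and B is selected when every voter approves two, so the protocol alone changes the result.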

Predictive Modeling of H5N1 Bird Flu in United States of America: A 2022-2023 Analysis

Published electronically May 17, 2024
DOI: 10.1137/23S1591980

Authors: Li Yuan (Corresponding author – University of Michigan), Weilin Cheng (University of California, Davis), Hengyuan Liu (University of California, Davis), Kathy Mo (University of California, Davis), and Sida Tian (University of Michigan)
Project Advisor: Ambuj Tewari (University of Michigan)

Abstract: This research uniquely focuses on predicting the likelihood of H5N1 outbreaks in the United States at the county level. Unlike previous studies, which either excluded the United States or used outdated data, we utilized diverse statistical techniques and publicly available H5N1-related data from January 2022 to March 2023. Employing logistic regression, regularization methods, cross-validation, and eXtreme Gradient Boosting (XGBoost), our models demonstrated remarkable predictive efficacy. Notably, the XGBoost model, trained with 10-fold cross-validation, outperformed others in terms of ROC-AUC. This research provides valuable epidemiological insights, proposes intervention strategies for H5N1 in the United States, and suggests future research directions.

Pooling Matrix Designs for Group Testing

Published electronically May 30, 2024
DOI: 10.1137/23S1577055
Supplementary materials

Author: Yong Hong Ivan Tan (Corresponding author – National University of Singapore)
Project Advisors: Delin Chu (National University of Singapore), Timo Sprekeler (National University of Singapore) and Johannes J. Brust (Arizona State University)

Abstract: The main objective of this article is to find the group testing strategy that minimizes the number of groups to test while identifying all positives. This manuscript explores the Hypercube Approach and the Kirkman Triple and Polynomial Pools Algorithms, which are used to design group testing strategies. This work enhances the Polynomial Pools Algorithm with the Projective Geometry Design and proposes an algorithm that returns an effective group testing strategy when compared to other well-known algorithms.

Comparative Evaluation and Refinement of Linear Algebra-Based Camera Calibration Algorithms

Published electronically June 07, 2024
DOI: 10.1137/23S1612032

Authors: Eunkyu Kim (Corresponding author – The Cooper Union, New York) and Lucia Rhode (The Cooper Union, New York)
Project Advisor: Mili Shah (The Cooper Union, New York)

Abstract: This paper introduces a linear algebra-based formulation of camera calibration algorithms, focusing on two methods: the regular method and the linearized method. The regular method is computationally more efficient while the linearized method can be adapted to an iterative process to enhance calibration accuracy. To demonstrate the effectiveness of these methods, a marker-based motion tracking experiment utilizing real-life data is conducted. The results showcase the regular method’s computational efficiency, while the linearized method’s ability to remove outliers is demonstrated through reverse coordinate computation. With a 2.00 mm upper limit threshold, reverse coordinate computation achieved accuracy up to a Frobenius norm of 0.18 mm.

Fast & Fair: Efficient Second-Order Robust Optimization for Fairness in Machine Learning

Published electronically June 11, 2024
DOI: 10.1137/24S1636083

Authors: Allen Joseph Minch (Corresponding author – Brandeis University), Hung Anh Dinh Vu (University of Maryland), and Anne Marie Warren (University of Minnesota)
Project Advisor: Elizabeth Newman (Emory University)

Abstract: This project explores adversarial training techniques to develop fairer Deep Neural Networks (DNNs) to mitigate the inherent bias they are known to exhibit. DNNs are susceptible to inheriting bias with respect to sensitive attributes such as race and gender, which can lead to life-altering outcomes (e.g., demographic bias in facial recognition software used to arrest a suspect). We propose a robust optimization problem, which we demonstrate can improve fairness in several datasets, both synthetic and real-world, using an affine linear model. Leveraging second order information, we are able to find a solution to our optimization problem more efficiently than with a purely first order method.

Using a Smartphone Accelerometer to Classify Longboarding Motion

Published electronically June 20, 2024
DOI: 10.1137/23S1615838

Authors: Tuan M. Le (Corresponding author – DePauw University) and Evan Sajtar (DePauw University)
Project Advisor: McKenzie Lamb, PhD (DePauw University)

Abstract: The objective of this study is to explore the feasibility and effectiveness of using smartphone accelerometer data, combined with advanced machine learning techniques, to accurately classify three distinct motions (pushing, pumping, and coasting) that a longboarder might engage in over the course of a ride. The final goal is to integrate these concepts into a closed system of classification in the form of a mobile application. Utilizing a dataset collected from a smartphone carried by a longboarder, we apply a series of data processing techniques, including rotation matrices for orientation normalization, Verlet integration for handling unevenly-spaced time series data, and the discrete Fourier transform for feature extraction. We experiment with various machine learning algorithms, with a particular focus on the Random Forest classifier, which achieves an F1 Score of 96.8% in classifying the three primary longboarding actions. The F1 Score, a harmonic mean of precision and recall, serves as a critical metric in evaluating the model’s accuracy, particularly in the context of imbalanced datasets. This study not only demonstrates a novel application of portable technology in sports analytics but also contributes to the broader field of machine learning by presenting an efficient approach to processing and classifying telemetric data from motion sensors. The implications of this research extend beyond longboarding, offering potential applications in enhancing athletic performance across a range of sports through accessible and real-time motion analysis.
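The F1 Score mentioned in the abstract can be computed for a single class from its confusion counts, as in the minimal sketch below. The function name and the example counts are hypothetical, and how the article averages the score over its three classes is not stated in the abstract, so no averaging is shown.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 for one class: the harmonic mean of precision and recall."""
    predicted = true_pos + false_pos   # everything labelled as this class
    actual = true_pos + false_neg      # everything truly in this class
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for one motion class, e.g. "pushing".
    print(f1_score(true_pos=9, false_pos=1, false_neg=3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot score well by favouring precision at the expense of recall or the reverse, which is why the metric is often preferred over plain accuracy on imbalanced data.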

Statistical Methods Applied to Gene Expression Data to Explore Cancer Features

Published electronically June 27, 2024
DOI: 10.1137/23S161793X
Supplementary materials

Author: Hannah Simpson-Clancy (Corresponding author – University of Huddersfield)
Project Advisor: Dr. Ann Smith (University of Huddersfield)

Abstract: A small number of epithelial ovarian cancer cases are deemed preventable and its overall survival rates are low. Developments in omics data analysis pave the way for biomarker discovery for epithelial ovarian cancer in order to improve survival rates and prevent its development. This report provides analysis of gene expression data for epithelial ovarian cancer to compare the gene expression of 99 epithelial ovarian cancer samples and 4 non-cancerous ovary samples. Serous and endometrioid epithelial ovarian cancer subtypes were most similar based on hierarchical clustering. Serous was the subtype with the most differentially expressed genes when compared with normal ovary samples whereas mucinous had the least by the Wilcoxon Rank-Sum test (Benjamini-Hochberg, p < 0.05). The number of down-regulated genes exceeded the number of up-regulated genes when comparing each cancer subtype with normal ovary samples. In this case, the clear cell subtype had the greatest number of dysregulated genes when compared to normal ovaries whereas endometrioid had the least. The dysregulated genes were found by fold change analysis (FC > 2 or FC < 0.5). Differences in gene expression levels between epithelial ovarian cancer subtypes were suggested by the 11,181 differentially expressed genes identified when comparing expression levels in all sample groups by the Kruskal-Wallis test (Benjamini-Hochberg, p < 0.05). Genes proposed as biomarkers for (1) epithelial ovarian cancer and (2) individual epithelial ovarian cancer subtypes, when compared with normal ovaries, included MUC1, SCNN1A, CD24, ITM2A, AGR2 and WFDC2.
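Two of the steps named in the abstract, the fold change thresholds and the Benjamini-Hochberg adjustment, can be sketched in a few lines. This is a generic illustration, not the report's code. It assumes fold change is the ratio of mean expression in cancer samples to mean expression in normal samples on a linear scale; the function names and example numbers are hypothetical.

```python
def classify_fold_change(mean_cancer, mean_normal):
    """Label a gene using the thresholds FC > 2 and FC < 0.5."""
    fc = mean_cancer / mean_normal  # assumed definition of fold change
    if fc > 2:
        return "up-regulated"
    if fc < 0.5:
        return "down-regulated"
    return "unchanged"


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(classify_fold_change(12.0, 4.0))
    raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
    adj = benjamini_hochberg(raw)
    print([p < 0.05 for p in adj])  # which genes pass at the 0.05 level
```

Adjusting before thresholding matters when thousands of genes are tested at once, since an unadjusted 0.05 cutoff would be expected to flag many genes by chance alone.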