SIURO | Volume 16 | SIAM


SIAM Undergraduate Research Online

Volume 16

Symmetry and Free Boundary Points in a Class of Linear Ordinary Differential Equations

Published electronically January 10, 2023
DOI: 10.1137/21S1454110

Authors: Feng Jiang (University of Nottingham Ningbo China), Zhengyang Guo (University of Nottingham Ningbo China), Dong’ang Liu (University of Nottingham Ningbo China), and Yanghao Wang (University of Nottingham Ningbo China)
Project Advisors: Behrouz Emamizadeh (University of Nottingham Ningbo China) and Amin Farjudian (University of Nottingham Ningbo China)

Abstract: This note is concerned with the qualitative properties of the solutions of a class of linear ordinary differential equations. The existence and uniqueness of solutions are addressed, and properties of the graph of the solution when imposing some restrictions are derived. A new notion of derivative, called the force derivative, is introduced and an orthogonality result, between the force derivative of the solution and the force function, is obtained. All the important results are verified by numerical examples using MATLAB. Finally, an inequality result reminiscent of the famous G. Talenti's inequality is proved.

Food Deserts and k-Means Clustering

Published electronically March 9, 2023
DOI: 10.1137/22S1504445

Authors: Garrett Kepler (California State University, East Bay), Maria Palomino (California State University, East Bay)
Project Advisor: Andrea Arauza Rivera (California State University, East Bay)

Abstract: Food deserts are regions where people lack access to healthy foods. In this article we use k-means clustering to cluster the food deserts in two Bay Area counties. The centroids (means) of these clusters are optimal locations for intervention sites (such as food pantries) since they minimize the distance that a person within a food desert cluster would need to travel to reach the resources they require. We present the results of both a standard and a weighted k-means clustering algorithm. The weighted algorithm takes into account the poverty levels in each food desert when determining the placement of a centroid. We find that this weighting can make significant changes to the proposed locations of intervention sites.
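A minimal sketch of how weighting can move a centroid, assuming a plain Lloyd-style k-means in which each centroid is the weighted mean of its cluster. The coordinates, weights, and starting centroids below are hypothetical illustration values, not data from the article, and the article's own weighting scheme may differ.

```python
from math import dist


def weighted_centroid(points, weights):
    """Weighted mean of a list of coordinate tuples."""
    total = sum(weights)
    dims = range(len(points[0]))
    return tuple(sum(w * p[i] for p, w in zip(points, weights)) / total for i in dims)


def weighted_kmeans(points, weights, centroids, iterations=20):
    """Lloyd iterations where each centroid is the weighted mean of its cluster."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for idx, p in enumerate(points):
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = weighted_centroid(
                    [points[i] for i in members], [weights[i] for i in members]
                )
    return centroids


# Hypothetical food-desert locations on a line, with hypothetical poverty weights.
locations = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
start = [(0.0, 0.0), (12.0, 0.0)]
standard = weighted_kmeans(locations, [1, 1, 1, 1], start)
weighted = weighted_kmeans(locations, [1, 3, 1, 1], start)
```

With equal weights the first centroid sits midway between the first two locations; tripling the weight of the second location pulls that centroid toward it, which is the kind of shift in proposed intervention sites the abstract describes.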

Multi-scale Hybridized Topic Modeling: A Pipeline for Analyzing Unstructured Text Datasets via Topic Modeling

Published electronically March 27, 2023
DOI: 10.1137/22S1536832

Authors: Keyi Cheng (University of California, Los Angeles), Stefan Inzer (University of California, Berkeley), Adrian Leung (University of California, Los Angeles), and Xiaoxian Shen (University of California, Los Angeles)
Project Advisors: Deanna Needell (University of California, Los Angeles), Todd Presner (University of California, Los Angeles), Michael Perlmutter (University of California, Los Angeles), Michael R. Lindstrom (University of California, Los Angeles), and Joyce Chew (University of California, Los Angeles)

Abstract: We propose a multi-scale hybridized topic modeling method to find hidden topics from transcribed interviews more accurately and more efficiently than traditional topic modeling methods. Our multiscale hybridized topic modeling method (MSHTM) approaches data at different scales and performs topic modeling in a hierarchical way utilizing first a classical method, Nonnegative Matrix Factorization, and then a transformer-based method, BERTopic. It harnesses the strengths of both NMF and BERTopic. Our method can help researchers and the public better extract and interpret the interview information. Additionally, it provides insights for new indexing systems based on the topic level. We then deploy our method on real-world interview transcripts and find promising results.

Numerical Analysis of Crowding Effects in Competing Species

Published electronically March 10, 2023
DOI: 10.1137/22S151042X

Author: Braden Carlson (Southern Utah University)
Project Advisors: Jianlong Han (Southern Utah University), Seth Armstrong (Southern Utah University), and Sarah Duffin (Southern Utah University)

Abstract: In recent decades, scientists have observed that the mortality rate of some competing species increases superlinearly as populations grow to unsustainable levels. This is modeled by terms representing crowding effects in a system of nonlinear differential equations that describes population growth of two species competing for resources under the effects of crowding. After applying nondimensionalization to reduce parameters in the system, the stability of the steady state solutions of the system is examined. A semi-implicit numerical scheme is proposed which guarantees the positivity of the solutions. The long term behavior of the numerical solutions is studied. The error estimate between the numerical solution and the true solution is given.

Quantifying Uncertainty in Ensemble Deep Learning

Published electronically April 17, 2023
DOI: 10.1137/22S1531816

Authors: Emily Diegel (Embry Riddle Aeronautical University), Rhiannon Hicks (Embry Riddle Aeronautical University), Max Prilutsky (San Diego State University), and Rachel Swan (Embry Riddle Aeronautical University)
Project Advisor: Mihhail Berezovski (Embry Riddle Aeronautical University)

Abstract: Neural networks are an emerging topic in the data science industry due to their high versatility and efficiency with large data sets. Past research has utilized machine learning on experimental data in the material sciences and chemistry field to predict properties of metal oxides. Neural networks can determine underlying optical properties in complex images of metal oxides and capture essential features which are unrecognizable by observation. However, neural networks are often referred to as a “black box algorithm” due to the underlying process during the training of the model. This raises a concern about how robust and reliable the prediction model actually is. To address this, ensemble neural networks were created. By utilizing multiple networks instead of one, the robustness of the model was increased and points of uncertainty were identified. Overall, ensemble neural networks outperform singular networks and demonstrate areas of uncertainty and robustness in the model.

A Quasi-Optimal Spectral Solver for the Heat and Poisson Equations in a Closed Cylinder

Published electronically April 26, 2023
DOI: 10.1137/22S1502070

Author: David Darrow (Massachusetts Institute of Technology)
Project Advisors: Alex Townsend (Cornell University) and Grady Wright (Boise State University)

Abstract: We develop a spectral method to solve the heat equation in a closed cylinder, achieving a quasi-optimal O(N log N) complexity and high-order, spectral accuracy. The algorithm relies on a Chebyshev–Chebyshev–Fourier (CCF) discretization of the cylinder, which is easily implemented and decouples the heat equation into a collection of smaller, sparse Sylvester equations. In turn, each of these equations is solved using the alternating direction implicit (ADI) method in quasi-optimal time; overall, this represents an improvement in the heat equation solver from O(N^{4/3}) (in previous Chebyshev-based methods) to O(N log N). While Legendre-based methods have recently been developed to achieve similar computation times, our Chebyshev discretization allows for far faster coefficient transforms; this is critical in applications with non-linear forcing, which we discuss in the context of the incompressible Navier–Stokes equations. Lastly, we provide numerical simulations of the heat equation, demonstrating significant speed-ups over traditional spectral collocation methods and finite difference methods.

Opinion Dynamics with Slowly Evolving Zealot Populations

Published electronically May 4, 2023
DOI: 10.1137/22S1515306

Authors: Ashlyn DeGroot (Calvin University) and Emma Schmidt (Calvin University)
Project Advisor: Todd Kapitula (Calvin University)

Abstract: We introduce and analyze a model for opinion dynamics comprised of nonlinear ODEs. The variables are the proportion of moderates in the population who hold opinion A, the proportion of zealots who hold opinion A, and the proportion of zealots who hold opinion B (not A). The zealots are willing to change their opinion at a much slower rate than the moderates. Our model takes into account such things as the inherent attractiveness of one opinion over the other, the indoctrination of moderates by the zealots, and deradicalization of the zealots by the moderates. A combination of theoretical and numerical analysis shows there are many different types of asymptotic configurations of the population. Many of these correspond to critical points of the system. The most intriguing finding is that if both A and B are roughly equally attractive, and the rate of indoctrination is roughly equal to the rate of deradicalization, then there will be a stable periodic orbit. The dynamics of this orbit show that a precursor to an opinion being dominant is that the proportion of zealots for the opinion must first grow to some critical value. Moreover, when the periodic orbit exists, there are no other solutions which allow for coexistence between the two opinions.

Predicting Molecular Phenotypes with Single Cell RNA Sequencing Data: An Assessment of Unsupervised Machine Learning Models

Published electronically May 26, 2023
DOI: 10.1137/21S1439985

Author: Anastasia Dunca (Massachusetts Institute of Technology)
Project Advisor: Frederick R. Adler (University of Utah)

Abstract: According to the National Cancer Institute, there were 9.5 million cancer-related deaths in 2018. A challenge in improving treatment is resistance in genetically unstable cells. The purpose of this study is to evaluate unsupervised machine learning on classifying treatment-resistant phenotypes in heterogeneous tumors. This is done with analysis of single cell RNA sequencing (scRNAseq) data using a pipeline and evaluation metrics. scRNAseq quantifies mRNA in cells and characterizes cell phenotypes. One scRNAseq dataset is used (tumor/non-tumor cells of different molecular subtypes and patient identifications). The pipeline consists of data filtering, dimensionality reduction with Principal Component Analysis, projection with Uniform Manifold Approximation and Projection, clustering, and evaluation. Nine approaches for clustering (Ward, BIRCH, Gaussian Mixture Model, DBSCAN, Spectral Clustering, Affinity Propagation, Agglomerative Clustering, Mean Shift, and K-Means) are evaluated. Seven models separated tumor from non-tumor cells and distinguished molecular subtypes, while six models distinguished patient identifications (13 of which were present in the dataset); K-Means, Ward, and BIRCH often ranked highest with ∼80% accuracy on the tumor versus non-tumor task and ∼60% for molecular subtype and patient ID. An optimized classification pipeline using K-Means, Ward, and BIRCH models was evaluated to be most effective for further analysis. In clinical research where there is currently no standard protocol for scRNAseq analysis, clusters generated from this optimized pipeline can be used to understand cancer cell behavior and malignant growth, directly affecting the success of treatment.

Modelling the Evolutionary Dynamics of an Infectious Disease with an Initial Asymptomatic Infection Stage with Recovery

Published electronically June 16, 2023
DOI: 10.1137/22S151755X

Author: Swathi Nachiar Manivannan (University of Cambridge)
Project Advisor: Simon A. Levin (Princeton University)

Abstract: The study of infectious disease dynamics is an ongoing challenge, particularly due to the varied life-history strategies that pathogens exhibit. The ongoing COVID-19 pandemic has emphasised the importance of studying the dynamics of pathogens that allow for an asymptomatic stage (termed latency in this paper) and direct recovery from said asymptomatic stage. Here, we expand on a simple epidemiological model, introduced by Saad-Roy et al. (2020), in order to understand the evolutionary dynamics of allowing for direct recovery of infected, latent individuals. In this model, there are two infectious stages; in the first infectious stage, hosts are fully or partially asymptomatic, and there is a trade-off between transmission and progression. We consider arbitrary trade-offs and the specific case of power-law trade-offs. Through introducing the added parameter of direct recovery from latent infection (hence termed r), we show that there are 4 possible evolutionary stable strategies (ESSs) a pathogen can adopt, depending on the values of other parameters. However, when direct recovery is fast (i.e. at high values of r), the ESSs eventually collapse into one where there is zero latency (i.e. no asymptomatic stage). Overall, our findings suggest that more importance should be given to studying the role of asymptomatic individuals in infectious disease outbreaks and the rate at which they can recover without developing any symptoms.

Adapting Zeroth Order Algorithms for Comparison-Based Optimization

Published electronically June 26, 2023
DOI: 10.1137/22S1530951

Author: Isha Slavin (New York University)
Project Advisor: Daniel McKenzie (Colorado School of Mines)

Abstract: Comparison-Based Optimization (CBO) is an optimization paradigm that assumes only very limited access to the objective function f(x). Despite the growing relevance of CBO to real-world applications, this field has received little attention as compared to the adjacent field of Zeroth-Order Optimization (ZOO). In this work we propose a relatively simple method for converting ZOO algorithms to CBO algorithms, thus greatly enlarging the pool of known algorithms for CBO. Via PyCUTEst, we benchmarked these algorithms against a suite of unconstrained problems. We then used hyperparameter tuning to determine optimal values of the parameters of certain algorithms, and utilized visualization tools such as heat maps and line graphs for purposes of interpretation. All our code is available at

Linear Stability Analysis of Solitons Governed by the 2D Complex Cubic-Quintic Ginzburg-Landau Equation

Published electronically July 7, 2023
DOI: 10.1137/23S1548116

Author: Emily Gottry (Azusa Pacific University)
Project Advisor: Edwin Ding (Azusa Pacific University)

Abstract: We used the singular value decomposition to construct a low-dimensional model that qualitatively describes the behavior and dynamics of optical solitons governed by the complex cubic-quintic Ginzburg-Landau equation in two spatial dimensions. With this model, it was found that a single soliton destabilizes and transitions into a double-soliton configuration through an intermediate periodic phase as the gain increases. Linear stability analysis then revealed that a Hopf bifurcation occurs at several critical gain values corresponding to the destabilization of the single and double solitons.

A Comparative Study of Penalized Regression and Machine Learning Algorithms in High Dimensional Scenarios

Published electronically July 17, 2023
DOI: 10.1137/22S1538302

Authors: Gabriel Ackall (Georgia Institute of Technology) and Connor Shrader (University of Central Florida)
Project Advisor: Seongtae Kim (North Carolina A&T State University)

Abstract: With the prevalence of big data in recent years, the importance of modeling high dimensional data and selecting important features has increased greatly. High dimensional data is common in many fields such as genome decoding, rare disease identification, and environmental modeling. However, most traditional regression machine learning models are not designed to handle high dimensional data or conduct variable selection. In this paper, we investigate the use of penalized regression methods such as ridge, least absolute shrinkage and selection operator, elastic net, smoothly clipped absolute deviation, and minimax concave penalty compared to traditional machine learning models such as random forest, XGBoost, and support vector machines. We compare these models using factorial design methods for Monte Carlo simulations in 540 environments, with factors being the response variable, number of predictors, number of samples, signal to noise ratio, covariance matrix, and correlation strength. We also compare different models using empirical data to evaluate their viability in real-world scenarios. We evaluate the models using the training and test mean squared error, variable selection accuracy, β-sensitivity, and β-specificity. We found that the performance of penalized regression models is comparable with traditional machine learning algorithms in most high-dimensional situations. The analysis helps to create a greater understanding of the strengths and weaknesses of each model type and provides a reference for other researchers on which machine learning techniques they should use, depending on a range of factors and data environments. Our study shows that penalized regression techniques should be included in the predictive modeler’s toolbox.

Iterative Methods at Lower Precision

Published electronically August 1, 2023
DOI: 10.1137/22S152637X

Authors: Yizhou Chen (Emory University), Xiaoyun Gong (Emory University), and Xiang Ji (Emory University)
Project Advisor: James G. Nagy (Emory University)

Abstract: Since numbers in the computer are represented with a fixed number of bits, loss of accuracy during calculation is unavoidable. At high precision where more bits (e.g. 64) are allocated to each number, round-off errors are typically small. On the other hand, calculating at lower precision, such as half (16 bits), has the advantage of being much faster. This research focuses on experimenting with arithmetic at different precision levels for large-scale inverse problems, which are represented by linear systems with ill-conditioned matrices. We modified the Conjugate Gradient Method for Least Squares (CGLS) and the Chebyshev Semi-Iterative Method (CS) with Tikhonov regularization to do arithmetic at lower precision using the MATLAB chop function, and we ran experiments on applications from image processing and compared their performance at different precision levels. We concluded that CGLS is a more stable algorithm, but overflows easily due to the computation of inner products, while CS is less likely to overflow but it has more erratic convergence behavior. When the noise level is high, CS outperforms CGLS by being able to run more iterations before overflow occurs; when the noise level is close to zero, CS appears to be more susceptible to accumulation of round-off errors.
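A minimal sketch of why inner products overflow at half precision, assuming every intermediate result is rounded to IEEE 754 half precision (largest finite value 65504). The article uses the MATLAB chop function; the Python helpers `to_half` and `half_dot` below are hypothetical stand-ins built on the standard `struct` module, not a port of chop.

```python
import struct


def to_half(x):
    """Round a float to the nearest IEEE 754 half-precision value.

    Values too large for half precision become +/- infinity,
    mimicking overflow in low-precision arithmetic.
    """
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


def half_dot(u, v):
    """Inner product with every product and partial sum rounded to half precision."""
    acc = 0.0
    for a, b in zip(u, v):
        product = to_half(to_half(a) * to_half(b))
        acc = to_half(acc + product)
    return acc


# Rounding error: 0.1 is not exactly representable in half precision.
rounded_tenth = to_half(0.1)

# Overflow: 300 * 300 = 90000 exceeds the half-precision maximum of 65504.
overflowed = half_dot([300.0], [300.0])
exact = 300.0 * 300.0
```

Here `overflowed` is infinite while `exact` is 90000 in double precision, which illustrates how a method that relies on inner products can overflow at 16 bits even for moderately sized entries.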

Ride Like the Wind Without Getting Winded: The Growth of E-Bike Use

Published electronically August 15, 2023
DOI: 10.1137/23S1577213

Authors: Jerry Sheng (Thomas Jefferson High School for Science and Technology), Rishabh Chhabra (Thomas Jefferson High School for Science and Technology), Om Gole (Thomas Jefferson High School for Science and Technology), Rishabh Prabhu (Thomas Jefferson High School for Science and Technology), and Laura Zhang (Thomas Jefferson High School for Science and Technology)
Project Advisor: Quinn McFee (Thomas Jefferson High School for Science and Technology)

Abstract: As climate change becomes an increasingly pressing issue, policymakers are looking towards alternate forms of transportation to gas-powered cars. Since August 2022, states such as California, Massachusetts, and New York have passed laws that will ban the sale of gas-engine vehicles by 2035 [1]. As a result, motorists are looking to electric vehicles, which do not require the use of gasoline, as an alternative form of transportation. Although electric cars are a viable EV option, electric bikes (e-bikes) offer a more affordable, flexible, and enjoyable form of transportation [2]. Along with these consumer benefits, e-bikes are over 20 times more efficient than electric cars at combating climate change [3]. Thus, it is imperative to comprehensively understand the growing role e-bikes will play in the future of transportation.

Comparison of Vector Voting Rules and Their Relation to Simple Majority Voting

Published electronically August 23, 2023
DOI: 10.1137/22S1536418

Author: Zhuorong Mao (College of William & Mary)
Project Advisor: Charles R. Johnson (College of William & Mary)

Abstract: Introduced here are examples of what we call “vector voting rules”: social preference orderings deduced from vectors naturally associated with the group preference matrix. These include higher-order Borda Rules, Bp, p = 1, 2, ..., and the Perron Rule (P). We study the properties of these transitive rules and compare them with Simple Majority Voting (SMV). Even when SMV is transitive, it can yield results different from B1, B2, ... and P, and through simulation, we compile statistics about how often these differ. We also give a new condition (2/3+ majorities) that is (just) sufficient for SMV to be transitive and then quantify the frequency of transitivity for graded failures of this hypothesis.
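A minimal sketch of how a score-vector rule can disagree with simple majority voting even when the latter is transitive. It assumes the classical Borda count, computed as row sums of the pairwise preference matrix; whether this coincides with the article's B1 is an assumption, and the higher-order rules Bp and the Perron Rule are not implemented. The ballots are hypothetical.

```python
def pairwise_matrix(ballots, candidates):
    """m[i][j] = number of voters who rank candidate i above candidate j."""
    index = {c: i for i, c in enumerate(candidates)}
    n = len(candidates)
    m = [[0] * n for _ in range(n)]
    for ranking, count in ballots:
        for pos, winner in enumerate(ranking):
            for loser in ranking[pos + 1:]:
                m[index[winner]][index[loser]] += count
    return m


def borda_order(m, candidates):
    """Order candidates by row sums of the pairwise matrix (classical Borda score)."""
    scores = {c: sum(m[i]) for i, c in enumerate(candidates)}
    return sorted(candidates, key=lambda c: -scores[c])


def majority_order(m, candidates):
    """Simple-majority ordering, or None if it is not a strict transitive order."""
    n = len(candidates)
    wins = {
        c: sum(1 for j in range(n) if m[i][j] > m[j][i])
        for i, c in enumerate(candidates)
    }
    # A strict majority relation is transitive exactly when the win counts
    # are n-1, n-2, ..., 0.
    if sorted(wins.values()) != list(range(n)):
        return None
    return sorted(candidates, key=lambda c: -wins[c])


candidates = ["A", "B", "C"]

# Hypothetical profile: 3 voters rank A > B > C, 2 voters rank B > C > A.
profile = [(("A", "B", "C"), 3), (("B", "C", "A"), 2)]
m = pairwise_matrix(profile, candidates)
smv = majority_order(m, candidates)
borda = borda_order(m, candidates)

# Hypothetical Condorcet cycle: simple majority voting is not transitive.
cycle = [(("A", "B", "C"), 1), (("B", "C", "A"), 1), (("C", "A", "B"), 1)]
cycle_smv = majority_order(pairwise_matrix(cycle, candidates), candidates)
```

In the first profile simple majority voting gives A, B, C while the Borda scores (6, 7, 2) give B, A, C; in the second profile the majority relation cycles and no transitive ordering exists.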

Implementation of the Boneh-Franklin IBE Scheme

Published electronically September 1, 2023
DOI: 10.1137/22S1532901

Author: Florence Lam (University of California, Berkeley)
Project Advisor: Gabriel Dorfsman-Hopkins (St. Lawrence University)

Abstract: In this paper and accompanying software, we give a fully functional implementation of the Boneh-Franklin Identity-Based Encryption (IBE) scheme using the Weil pairing, which runs efficiently even with primes of cryptographic size. We describe the conceptual framework of the IBE and give background on the Weil pairing. Further, we discuss the challenges in the process of creating a functional implementation, and how we overcame them. The reader is encouraged to experiment with the accompanying software, which is written in SageMath.

Long-time L2 Stability for an IMEX Discretization of the 1D Fujita Equation

Published electronically September 13, 2023
DOI: 10.1137/23S1556940

Author: Victoria Luongo (Clemson University)
Project Advisors: Leo Rebholz (Clemson University) and Irina Viktorova (Clemson University)

Abstract: We study an efficient time-stepping scheme for the 1D Fujita equation that is implicit for the linear terms but explicit for the nonlinear terms. We analyze the long-time stability of the scheme for varying parameter values, which reveals parameter regimes in which the method is stable. We provide numerical results that illustrate the theory and show that the analytically derived stability conditions are sufficient to achieve long-time stability.

Maximizing Harvest Yields in a Three-Species System

Published electronically October 6, 2023
DOI: 10.1137/23S1546737

Author: Jacob Kahn (United States Air Force Academy)
Project Advisor: Maila Hallare (United States Air Force Academy)

Abstract: Management decisions on sustainable harvesting of any species in our marine ecosystems benefit from mathematical modeling and simulations due to the underlying complex ecological interactions between species. Using basic mathematical analysis and numerical simulation tools, we consider the problem of investigating the maximum sustainable yield (MSY) and the maximum economic yield (MEY) when harvesting in a fishery system consisting of one predator and two competing prey species. Results show that the harvesting effort required to achieve MEY is less than what is needed to achieve MSY. This implies that increasing harvesting effort beyond what is needed to reach MEY will not necessarily deliver more profits but may run the risk of driving some of the species of the system into extinction. Furthermore, results show that under the MEY management policy, a predator-oriented harvesting approach is recommended when harvesting single-species only. For double-species harvesting in a system with weak interspecific competition and weak predation, a prey-oriented harvesting approach is recommended, but when there is strong interspecific competition and strong predation, a predator-oriented harvesting approach is recommended.
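A minimal worked example of why the effort for MEY falls below the effort for MSY, assuming the single-species Gordon-Schaefer model dx/dt = r x (1 - x/K) - q E x with constant price and cost per unit effort. This is a textbook simplification chosen for illustration, not the article's three-species system, and all parameter values are hypothetical.

```python
def effort_msy(r, q):
    """Effort that maximizes the equilibrium yield q*E*K*(1 - q*E/r)."""
    return r / (2 * q)


def effort_mey(r, q, carrying_capacity, price, cost):
    """Effort that maximizes equilibrium profit: price * yield - cost * effort."""
    return (r / (2 * q)) * (1 - cost / (price * q * carrying_capacity))


def profit(effort, r, q, carrying_capacity, price, cost):
    """Equilibrium profit at a given constant harvesting effort."""
    equilibrium_stock = carrying_capacity * (1 - q * effort / r)
    return price * q * effort * equilibrium_stock - cost * effort


# Hypothetical parameters: growth rate, catchability, carrying capacity, price, cost.
params = dict(r=1.0, q=0.1, carrying_capacity=100.0, price=2.0, cost=5.0)

e_msy = effort_msy(params["r"], params["q"])
e_mey = effort_mey(**params)
profit_at_msy = profit(e_msy, **params)
profit_at_mey = profit(e_mey, **params)
```

Under these assumptions the MEY effort is the MSY effort scaled by the factor 1 - c/(pqK), so it is strictly smaller whenever harvesting has a positive cost, and pushing effort up to the MSY level lowers profit.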

In Pursuit of Higher Power Through Integrated Multivariate Regression

Published electronically October 19, 2023
DOI: 10.1137/23S1584344

Author: Ryan Shahbaba (Sage Hill School, Newport Beach, CA)
Project Advisor: Annie Qu (University of California, Irvine)

Abstract: Univariate regression models are commonly used in statistics and machine learning to examine the relationship between an outcome variable and a set of explanatory variables, and possibly use this relationship to predict the unknown values of the outcome variable. However, when dealing with multiple outcome variables that are interrelated, multivariate regression models are preferred. These models simultaneously capture the dependencies between outcome variables and their collective relationships with explanatory variables. While multivariate regression models provide a rigorous and comprehensive understanding of factors associated with outcomes of interest, they have several limitations including: increased model complexity, larger sample size requirements, and lack of interpretability. To address these issues, we propose an alternative approach, called Integrated Multivariate Regression (IMR) that reduces the dimensionality of the outcome variables by transforming them into one or more derived outcome variables that retain important information. Using simulated and real data, we demonstrate that IMR simplifies the analysis and increases statistical power by reducing the number of parameters, while simultaneously maintaining interpretability and accounting for interdependencies among the outcome variables.

Understanding a Measure for Synchrony: Spike Time Tiling Coefficient Method

Published electronically October 31, 2023
DOI: 10.1137/23S1576104

Authors: Kevin Li (Corresponding author – Texas Academy of Mathematics and Science), Evan Huang (Seven Lakes High School), and Bill Sun (Seven Lakes High School)
Project Advisors: Yunjiao Wang (Texas Southern University) and Maria Leite (University of South Florida)

Abstract: Synchrony is an important feature of brain activities for the coordination of neural information and is also related to some neuronal disorders. Around 40 different measures have been proposed in the literature for quantifying the synchrony of spike trains and the list is still growing. The main issue is that it is not clear to users which one to use and how measurements correspond to different features of synchrony. In this work, instead of looking at all methods at once, we focus on investigating one of the popular measures in the field of neuroscience: Spike Time Tiling Coefficient (STTC) proposed by Cutts and Eglen in 2014. We simulate three scenarios of neural spike trains and study how STTC values depend on distributions and phase shifts of spike trains. Firstly, we study pairs of simple periodic binary time series. We derive an analytical formula showing that the dependence of the STTC value on the phase shift is symmetric and has a general trend where the maximum value of STTC occurs when the phase shift is zero and the minimum value occurs when the phase shift is half of the period. Secondly, we investigate pairs of “periodic” normally distributed spike trains. While we observe trends similar to those in the first scenario, we notice an exception. We also observe a general trend where the STTC value decreases as the standard deviation of the normal distribution increases. Thirdly, we study pairs of Poisson distributed spike trains. Using properties of the Poisson distribution, we generate pairs of Poisson distributed spike trains with certain overlap ratios and study the relationship between STTC and the overlap ratio. In general, this relationship is nonlinear. We observe that as the synchronicity window decreases towards zero, this nonlinear relationship tends toward a linear relationship. We derive analytical formulas to describe this nonlinear relationship and quantitatively evaluate its closeness to a linear relationship as the synchronicity window decreases towards zero. Through studying STTC, we notice that when the synchronicity window is too large, the problem of dividing by zero occurs in the calculation of STTC. To avoid such a problem, we derive an upper bound for the synchronicity window. We also argue that STTC can only approach −1, and show a case to demonstrate this argument.
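A minimal sketch of the STTC calculation, assuming the definition of Cutts and Eglen (2014): STTC = 1/2 [ (P_A - T_B)/(1 - P_A T_B) + (P_B - T_A)/(1 - P_B T_A) ], where T_A is the fraction of the recording within the synchronicity window dt of any spike in train A and P_A is the fraction of spikes in A within dt of any spike in B. The function names and the spike times are hypothetical; the article's own simulations are not reproduced here.

```python
def tiled_fraction(spikes, dt, t_start, t_end):
    """Fraction of [t_start, t_end] lying within dt of at least one spike."""
    intervals = sorted((max(s - dt, t_start), min(s + dt, t_end)) for s in spikes)
    covered = 0.0
    cur_lo, cur_hi = intervals[0]
    for lo, hi in intervals[1:]:
        if lo <= cur_hi:
            # Overlapping windows are merged so time is not counted twice.
            cur_hi = max(cur_hi, hi)
        else:
            covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
    covered += cur_hi - cur_lo
    return covered / (t_end - t_start)


def proportion_near(spikes, other, dt):
    """Fraction of `spikes` lying within dt of at least one spike in `other`."""
    near = sum(1 for s in spikes if any(abs(s - t) <= dt for t in other))
    return near / len(spikes)


def sttc(a, b, dt, t_start, t_end):
    """Spike Time Tiling Coefficient of two non-empty spike trains."""
    t_a = tiled_fraction(a, dt, t_start, t_end)
    t_b = tiled_fraction(b, dt, t_start, t_end)
    p_a = proportion_near(a, b, dt)
    p_b = proportion_near(b, a, dt)
    # Raises ZeroDivisionError when p_a * t_b == 1 or p_b * t_a == 1,
    # which can happen when dt is large enough to tile the whole recording.
    return 0.5 * ((p_a - t_b) / (1 - p_a * t_b) + (p_b - t_a) / (1 - p_b * t_a))


# Hypothetical spike trains on a recording from t = 0 to t = 10.
train = [1.0, 3.0, 5.0]
shifted = [2.0, 4.0, 6.0]
identical_value = sttc(train, train, 0.1, 0.0, 10.0)
shifted_value = sttc(train, shifted, 0.1, 0.0, 10.0)
```

Identical trains give an STTC of 1, and the half-period shift gives a slightly negative value here. The denominators show where the division by zero mentioned in the abstract arises: a window wide enough to tile the entire recording drives T to 1.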

Malaria Early Warning Application for Individual Risk Assessment

Published electronically November 7, 2023
DOI: 10.1137/23S154875X

Authors: Janiah Kyle (Corresponding author – Spelman College), Sagar Sadak (Georgia Institute of Technology), and Cayden Goeringer (Arizona State University)
Project Advisor: Abba Gumel (University of Maryland)

Abstract: As one of the oldest known diseases to afflict humanity (since the Agricultural Revolution about 12,000 years ago), malaria has proven to be a significant global challenge. Many intervention strategies have been undertaken in the last few decades, such as widespread insecticide-treated bed nets (ITN), long-lasting insecticidal nets (LLIN), and indoor residual spraying (IRS). Yet even with great success, malaria continues to be a ravaging disease requiring inventive solutions. In this study, we develop a malaria early warning system, which utilizes an adapted Ross-MacDonald model to assess individual risk and disease epidemiology. Strategies for achieving a disease-free equilibrium state are also shown by performing local asymptotic stability analysis. It is important to note that the stages of the mosquito life-cycle are highly influenced by weather conditions, both in the aquatic and adult stages, as well as by the use of insecticides (either through ITN/LLIN use or via IRS). Therefore, we consider regional data parameters, such as weather conditions, parasite rate and resistance, to estimate deviated risk from the baseline, with the final product being a progressive web application (i.e. a web and mobile app). Such a product has widespread application primarily in holoendemic areas in Africa to inform both native and tourist populations of their relative risk of contracting malaria.