Data-Centric Creative Problem Solver // Learner, Evaluative Thinker, Cold Coffee Brewer
Recent alumnus from Jarvis College of Computing & Digital Media, DePaul University. // M.S. in Data Science with Distinction.
Analyzed with Python for DSC540: Advanced Machine Learning at DPU |
November 2022
Analysis explores classification models for predicting a multi-class categorical target from hospital-related predictor features, with no patient demographics other than age range. Extensive data preprocessing was explored, including data encoding, outlier removal, and feature transformations to offset data imbalance. Tree, boosting, and histogram models were most effective for classification, and optimized models were implemented using multi-class ensembles and voting ensembles.
Analyzed with Python for DSKUS Global Lab: Global Data Science Exchange Program |
June 2022
Investigated the impact of the COVID-19 pandemic on the supply chain by examining the trade, production, and manufacturing capacity of the automotive and semiconductor industries in the United States and South Korea. Collected and merged existing data on the two industries from government sources in both countries. Feature selection identified the salient COVID-19 features that affect trade, production, and manufacturing; these were then used to build a Gated Recurrent Unit (GRU) forecasting model that predicts how the target variables respond to changes in the selected features.
Team Role: Lead, Preprocessing, Report Writing, Presentation and Visuals
Analyzed with Python for DSC478: Machine Learning Applications at DPU |
November 2021
Cluster analysis was used for data exploration and classification, as well as feature selection, in an analysis of obesity levels among adults from Mexico, Peru, and Colombia to determine which specific eating habits or daily activities best predict obesity levels.
Team Role: Lead, Data Preprocessing, Cluster Analysis, Feature Selection, Report Writing, and Presentation
Analyzed with MATLAB for CSC481: Introduction to Image Processing at DPU |
November 2021
Analysis explores the extraction of texture features using a histogram of oriented gradients (HOG) for image classification of wild edible flowers using support vector machines (SVM). The preprocessing involves color segmentation, and the analysis compares different HOG parameter sizes as well as different flower class sizes to determine the optimal parameters for model performance and classification.
Visualized with Python for GEO448: Spatial Data Science at DPU |
November 2022
Analysis explores the spatial relationship between HIV and COVID-19 in California at the county level. The analysis focuses on new infection rates for the two epidemics and their impact on ethnic and minority groups. Through spatial clustering and outlier detection techniques, specific areas in California were determined to be more vulnerable to HIV and/or COVID-19. An interactive LISA map was created to show significant clusters affected by both infection rates and social vulnerabilities. Agglomerative clustering was performed, showing areas affected by higher social vulnerability related to ethnic and minority status.
Visualized with R for DSC465: Data Visualization at DPU |
March 2022
Analysis compares the impact of COVID-19 on airport travel in the United States and Canada. Ridgeline and violin plots with custom color schemes were used to display dynamic statistical and geographical data.
Team Role: Lead, Preprocessing, Ridgeline and Violin Plots
Analyzed with R for DSC424: Advanced Data Analysis at DPU |
June 2021
Analysis determines whether breast cancer clinical features and gene expression features predict patient survival rates. Data was reduced using principal component analysis, factor analysis, and cluster analysis on gene expression features to better understand the predictive value of the model.
Team Role: Lead, Preprocessing, PCA/FA Analysis, Cluster Analysis