Cody Le //


Data Centric Creative Problem Solver // Learner, Evaluative Thinker, Cold Coffee Brewer

Recent alumnus from Jarvis College of Computing & Digital Media, DePaul University. // M.S. in Data Science with Distinction.

Portfolio


Machine Learning Projects //

Predicting Patient Length of Stay (LOS): Classification and Optimization Models for Multi-class Target Variable

Analyzed with Python for DSC540: Advanced Machine Learning at DPU | November 2022
View in GitHub Open Notebook

Analysis explores classification models for predicting a multi-class categorical target using various hospital-related predictor features, with no patient demographics other than age range. Extensive data preprocessing was explored, including data encoding, outlier removal, and feature transformations to offset data imbalance. Tree, boosting, and histogram models were the most effective classifiers, and optimized models were then implemented using multi-class ensembles and voting ensembles.
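As a rough illustration of the voting-ensemble idea, the sketch below combines the class labels of several models by majority vote. It is not the project's code: hard (majority) voting is assumed, the function name `hard_vote` is hypothetical, and the LOS buckets and per-model predictions are made-up placeholders.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model class labels by majority vote, one sample at a time.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties go to the tied label voted for by the earliest model in the list.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        # First label (in model order) that reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical LOS buckets predicted by three hypothetical models.
tree_preds = ["0-10", "11-20", "21-30", "0-10"]
boost_preds = ["0-10", "21-30", "21-30", "11-20"]
hist_preds = ["11-20", "21-30", "0-10", "21-30"]

ensemble_preds = hard_vote([tree_preds, boost_preds, hist_preds])
print(ensemble_preds)
```

A soft-voting variant would average predicted class probabilities instead of counting labels, which tends to help when the base models are well calibrated.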




Automotive and Semiconductor Supply Chain Analysis in the United States and South Korea

Analyzed with Python for DSKUS Global Lab: Global Data Science Exchange Program | June 2022
Read Report View Slide Deck

Investigated the impact of the COVID-19 pandemic on the supply chain by examining trade, production, and manufacturing capacity of the automotive and semiconductor industries in the United States and South Korea. Collected and merged existing data on the two industries from various government sources in both countries. Feature selection was performed to identify the salient COVID-19 features that affect trade, production, and manufacturing. These features were then used to build a Gated Recurrent Unit (GRU) forecasting model that predicts how the target variables respond to changes in the selected features.

Team Role: Lead, Preprocessing, Report Writing, Presentation and Visuals




Obesity Level Analysis of Adult Population in Latin America

Analyzed with Python for DSC478: Machine Learning Applications at DPU | November 2021
View in GitHub Read Report
Open Notebook Open Notebook

Cluster analysis was used for data exploration, classification, and feature selection in an analysis of obesity levels among adults from Mexico, Peru, and Colombia, to determine which eating habits or daily activities best predict obesity levels.

Team Role: Lead, Data Preprocessing, Cluster Analysis, Feature Selection, Report Writing, and Presentation




Classifying Wild Edible Flowers by Color Segmentation with a Histogram of Oriented Gradients

Analyzed with MATLAB for CSC481: Introduction to Image Processing at DPU | November 2021
Read Report View Slide Deck

Analysis explores the extraction of texture features using a histogram of oriented gradients (HOG) to classify images of wild edible flowers with support vector machines (SVM). The preprocessing involves color segmentation, and the analysis compares different HOG parameter sizes and different flower class sizes to determine the optimal parameters for model performance and classification.
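To make the HOG step concrete, here is a simplified Python sketch of the orientation histogram for a single cell. It is not the project's MATLAB code. The function name `cell_orientation_histogram` and the toy 4x4 cell are hypothetical, and a full HOG descriptor would also interpolate votes between neighboring bins and normalize histograms over blocks of cells.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Unsigned-gradient orientation histogram for one grayscale cell.

    cell: 2-D list of pixel intensities. Gradients use central differences
    on interior pixels only; each pixel adds its gradient magnitude to the
    bin containing its orientation (0 to 180 degrees).
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    return hist


# Toy cell with a vertical edge: intensity jumps from 0 to 1 left to right.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
edge_hist = cell_orientation_histogram(vertical_edge)
print(edge_hist)
```

The cell size and number of bins are the kind of HOG parameters the comparison above varies. Concatenating the histograms of all cells gives the feature vector passed to the classifier.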





Data Exploration and Visualization Projects //

Spatial Analysis of the Intersection of HIV and COVID-19 in California

Visualized with Python for GEO448: Spatial Data Science at DPU | November 2022
View in GitHub Read Report Open Notebook

Analysis explores the spatial relationship between HIV and COVID-19 in California at the county level. The analysis focuses on new infection rates for the two epidemics and their impact on ethnic and minority groups. Through spatial clustering and outlier detection techniques, specific areas in California were determined to be more vulnerable to HIV and/or COVID-19. An interactive LISA map was created to show significant clusters affected by both infection rates and social vulnerabilities. Agglomerative clustering was also performed, showing areas affected by higher social vulnerability related to ethnic and minority status.
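The statistic behind a LISA map is the local Moran's I, which compares each area's deviation from the mean with the average deviation of its neighbors. The sketch below uses one common formulation with row-standardized weights. It is not taken from the project notebook: the function name `local_morans_i`, the rates, and the four-county adjacency are all hypothetical, and the permutation test normally used to judge significance is omitted.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each area, using row-standardized weights.

    values: list of rates, one per area.
    neighbors: dict mapping each area index to a list of neighbor indices.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    if m2 == 0:
        raise ValueError("values are constant; Moran's I is undefined")

    result = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: mean deviation of the neighbors; 0 for isolated areas.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Four hypothetical counties arranged in a line: 0 - 1 - 2 - 3.
rates = [10.0, 9.0, 2.0, 1.0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

local_i = local_morans_i(rates, adjacency)
print(local_i)
```

Positive values mark areas that resemble their neighbors (high-high or low-low clusters), while negative values mark spatial outliers.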




Visualizing the Impact of COVID-19 on Airport Travel in the United States and Canada

Visualized with R for DSC465: Data Visualization at DPU | March 2022
View in GitHub Read Report View Slide Deck

Analysis compares the impact of COVID-19 on airport travel in the United States and Canada. Ridgeline and violin plots with custom color schemes were used to display dynamic statistical and geographical data.

Team Role: Lead, Preprocessing, Ridgeline and Violin Plots




Exploring Gene Expression Features through PCA and Factor Analysis in Breast Cancer Patients

Analyzed with R for DSC424: Advanced Data Analysis at DPU | June 2021
View in GitHub Read Report

Analysis determines whether breast cancer clinical features and gene expression features predict patient survival rates. Data was reduced using principal component analysis, factor analysis, and cluster analysis on gene expression features to better understand the predictive value of the model.

Team Role: Lead, Preprocessing, PCA/FA Analysis, Cluster Analysis


© 2022 Cody Le. Powered by Jekyll and hosted on GitHub.