
My Projects

  • Developed and trained machine learning models to predict math scores from student data, using feature engineering, data preprocessing, and hyperparameter tuning

  • Designed and implemented a user-friendly web application using Flask to facilitate custom predictions of math scores

  • Deployed the machine learning model and web application on AWS Elastic Beanstalk and established a CodePipeline for continuous delivery. This ensured a seamless and automated process for deploying updates and maintaining the application's functionality

1.jpg
  • Conducted comprehensive analysis and visualization of Postsecondary Student Information System (PSIS) data from Statistics Canada using Python

  • Identified and analyzed enrollment trends from the academic year 1992/1993 to 2021/2022

  • Noted a significant upward shift in admissions post-1998/1999, providing valuable insights into enrollment dynamics

  • Discovered and highlighted consistent gender disparities in admissions, with women having higher enrollment rates throughout the period

  • Conducted a detailed analysis of enrollment patterns in British Columbia, noting a sudden rise in university admissions and a sudden fall in college admissions in 2008/2009
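
A sudden rise or fall like the 2008/2009 shift can be located by computing year-over-year percentage changes. The following is a minimal sketch of that idea, not the project's actual code; the function names and the sample counts are hypothetical placeholders, not PSIS figures.

```python
def year_over_year_changes(enrollment):
    """Return {year: percent change versus the previous year}.

    `enrollment` maps academic-year labels to counts, in chronological order.
    """
    years = list(enrollment)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        delta = enrollment[curr] - enrollment[prev]
        changes[curr] = delta / enrollment[prev] * 100
    return changes


def largest_shift(changes):
    """Return the year whose change has the largest absolute size."""
    return max(changes, key=lambda year: abs(changes[year]))


# Hypothetical placeholder counts for illustration only (not real PSIS data).
sample = {
    "2006/2007": 1000,
    "2007/2008": 1020,
    "2008/2009": 1300,
    "2009/2010": 1320,
}
changes = year_over_year_changes(sample)
shift_year = largest_shift(changes)
```

Running the same calculation separately on university and college series would surface a rise in one and a fall in the other in the same year.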

Canada-Education-System.jpg
  • Conducted data cleaning and exploratory data analysis on the Spotify dataset containing the top 1000 songs of 2023 using Python

  • Identified the Top 10 Singers based on the highest number of songs on the Spotify Top 1000 Songs list

  • Developed an interactive Tableau dashboard to visually represent insights about the Top 10 Singers

  • Included visualizations for the number of songs per singer (highest: Taylor Swift with 37 songs), total streams, playlist counts on Spotify and Apple, and average danceability and energy percentages

  • Applied statistical analysis techniques to uncover patterns and trends within the Spotify dataset
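
Ranking singers by song count amounts to a frequency count over the artist column. A minimal sketch follows, assuming each row is a dictionary with a hypothetical `artist` key; the real dataset's column name and the handling of multi-artist tracks may differ.

```python
from collections import Counter


def top_artists(rows, n=10, key="artist"):
    """Return the n most frequent artists as (name, song_count) pairs.

    Rows with a missing or empty artist field are skipped, and surrounding
    whitespace is stripped so that " Artist A " and "Artist A" count together.
    """
    counts = Counter(row[key].strip() for row in rows if row.get(key))
    return counts.most_common(n)


# Hypothetical placeholder rows for illustration only.
rows = [
    {"artist": "Artist A"},
    {"artist": "Artist B"},
    {"artist": "Artist A"},
    {"artist": " Artist A "},
    {"artist": ""},
]
ranking = top_artists(rows, n=2)
```

The resulting list of pairs can be exported to a CSV file and used as the data source for a dashboard.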

spotify-logo-1920x1080.jpg
  • Implemented binary classification techniques to predict a patient's smoking status based on various health indicators including height, weight, fasting blood sugar level, cholesterol, and triglyceride

  • Explored and utilized machine learning algorithms, including logistic regression, decision trees, random forests, and XGBoost

  • Achieved a Kaggle score of 0.77448, demonstrating proficiency in model development and optimization

  • Derived key findings from the analysis, including insights such as the impact of high weight, fasting blood sugar, triglyceride, and haemoglobin on smoking likelihood
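
As an illustration of the simplest of the listed algorithms, logistic regression can be fitted with plain gradient descent. This is a standard-library sketch under stated assumptions, not the project's implementation (which may well have used a library): the features are assumed to be already standardized, and the toy values below are hypothetical, not real patient data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (smoker) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical standardized features (e.g. weight, triglyceride); toy labels.
X = [[-1.0, -0.8], [-0.6, -1.1], [-0.9, -0.3], [0.8, 1.0], [1.1, 0.5], [0.7, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

With standardized inputs, the sign and size of each fitted weight give a rough indication of how a feature moves the predicted smoking likelihood.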

smoking-cigarette.jpg
  • Developed and fine-tuned a deep neural network model for COVID-19 detection from chest X-ray images using TensorFlow, achieving competitive performance on a Kaggle competition dataset

  • Evaluated the neural network model's performance against an SVM-based model, achieving a Kaggle private leaderboard score of 0.85, demonstrating strong machine-learning expertise and problem-solving skills

2.jpg
  • Conducted an extensive analysis of the top 2000 global companies, exploring sales, profits, market value, and assets, and the impact of geographical location on these values

  • Expertly visualized findings using advanced charts and graphs to provide clear insights including the

  • Leveraged data analysis tools of Python (Seaborn, Pandas, and Matplotlib) to manipulate and analyze the dataset effectively

3.jpg
  • Conducted an in-depth analysis of non-medical drug exposure among people in Canada as part of ASA DataFest, a prestigious data analysis competition

  • Utilized Python for data cleaning, statistical analysis, and visualization, uncovering key patterns and trends in drug abuse based on demographics like gender, income, and province of residence

4.jpg
  • Demonstrated strong data analysis skills by conducting an in-depth analysis of a comprehensive movie dataset, revealing high correlations among variables such as budget, revenue, and IMDb ratings

  • Cleaned and preprocessed the dataset, handling missing values and outliers to ensure data quality. Utilized Python, Pandas, Matplotlib, and Seaborn to create informative data visualizations

  • Applied hypothesis testing and statistical analysis techniques to validate findings, making data-driven recommendations for stakeholders in the movie industry
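
One way to check whether a correlation such as budget versus revenue is stronger than chance is a permutation test on the Pearson coefficient. The sketch below shows that technique as an assumption about what such a test could look like; it is not necessarily the test used in the project, and the sample figures are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided p-value: how often shuffled data correlates as strongly."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical budget and revenue values (in millions) for illustration only.
budget = [10, 20, 30, 40, 50, 60, 70, 80]
revenue = [25, 38, 70, 85, 110, 118, 150, 170]
r = pearson_r(budget, revenue)
p = permutation_p_value(budget, revenue)
```

A small p-value here indicates only that the association is unlikely under random pairing; it does not by itself show that a larger budget causes higher revenue.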

5.jpg