Andy Yang

Data Scientist

About

Hi! I'm Andy, a Data Scientist with 5+ years of experience in the financial industry.

My work has ranged from building credit risk models and automated reporting pipelines to developing AI-powered tools and automation solutions. I'm comfortable working with both technical and business teams, and I enjoy finding practical solutions to complex problems.


Some of my core competencies include:


Automation & AI: I have implemented automation solutions across teams, from extracting data from documents for quality control to generating reports automatically from model performance results.
Machine Learning: I have developed and validated many production-grade credit risk models, including regression, survival, ensemble, and deep learning models.
Presentation: I have led town halls to communicate model performance results and given code demos to both technical and non-technical audiences.

My key technical skills include:

  • Python (Pandas, Scikit-learn, PyTorch)
  • R (Shiny, dplyr, ggplot2, tidymodels)
  • Machine Learning (Regression, Classification, Survival, Deep Learning)
  • Databases (SQL, NoSQL, Spark)
  • AWS (EC2, S3, Lambda)
  • Azure (Azure SQL, Azure ML)
  • Tableau (Tableau Flows, Dashboards)
  • SAS (Base, Enterprise)

Professional Summary

Data Scientist & Analytics Engineer with 5 years of experience designing automated ML pipelines and building high-impact AI models in financial services and tech.

Skills

Coding Languages

Python (Plotly, Pandas, PyTorch), R (Shiny), Jupyter Notebooks, PySpark, SAS (certified in 9.4), SQL

Machine Learning

Predictive Modeling, Classification, Feature Engineering, A/B Testing, Hyperparameter Tuning, LLMs

Platforms & Tools

Azure (Databricks), AWS (S3, Bedrock, Lambda, SageMaker), Tableau, Power BI, Docker

Education

Master of Science in Data Science

2021 - 2022

University of British Columbia, Vancouver, BC

Bachelor of Applied Science in Chemical Engineering

2016 - 2021

University of Waterloo, Waterloo, ON

Professional Experience

Data Scientist

Jan 2026 - Jun 2026

TFG Financial, Vancouver, BC

  • Engineered data quality ETL pipelines (SQL, Python, Tableau) to detect Scorecard and PD anomalies, flagging ~10% of the portfolio in a weekly dashboard used by 10+ stakeholders across US and Canadian credit leadership.
  • Developed Cox Proportional-Hazards PD survival models (scikit-learn, lifelines) with engineered macroeconomic features to generate early default warnings at the account level for multiple portfolios, achieving a C-index of 0.81.

Senior Quantitative Analyst - Model Validation

Apr 2023 - Dec 2025

TD Bank Group, Toronto, ON

  • Developed a custom Python GUI application to automate data ingestion for model performance reporting across $5B+ portfolios, reducing quarterly reporting cycle time by 2+ weeks for senior stakeholders.
  • Built PySpark pipelines to supersede SAS scripts on Azure Databricks that evaluate model performance (AUROC, WAPE, calibration, stress testing), expanding testing coverage and cutting runtime by up to 2 hrs per validation.
  • Trained benchmark supervised learning models (XGBoost, logistic regression) on Azure ML to forecast default risk in multiple credit card portfolios, achieving MAPE under 10% and supporting OSFI audit compliance.

Quantitative Analyst - Model Validation

Sep 2022 - Apr 2023

TD Bank Group, Toronto, ON

  • Validated over 15 production-grade ML models on millions of records, uncovering behavioral insights critical for real-time risk response systems, and collaborated with data scientists to implement adjustments to PD models.
  • Standardized model analytics toolkits using Python, RShiny, and SAS, and implemented CI/CD workflows via GitHub Actions, reducing QA and deployment lead time by 50% and enhancing onboarding for new users.
  • Led department town halls, aligning model performance with business decisions for non-technical audiences.

API Implementation Specialist (Contract)

Apr 2021 - Sep 2021

Royal Distributing, Guelph, ON

  • Deployed cloud-based internal REST APIs on Microsoft Azure to automate manual data sync tasks, enabling automated supply chain risk dashboards and data migration from legacy FoxPro databases.
  • Refactored the schema of SQL Server databases and revised query criteria, cutting query latency in half (~30 mins).

Projects

Data Science Projects

Twitter Sentiment Analysis

February 2024 - March 2024

Implemented deep learning models in PyTorch using CNN, LSTM, and RNN architectures to perform sentiment analysis

Achieved a maximum test accuracy of 94.1%

Github Link

LiDAR Object Detection for Cities (Capstone)

May 2022 - June 2022

Trained and optimized a deep neural network using PyTorch to perform 3D object detection within point clouds

Achieved a validation precision score of 0.85

Recognized as one of the top 3 capstone projects of 2022

Australian Rainfall Prediction

March 2022 - April 2022

Used PyArrow, Hadoop, and S3 to clean and store rainfall data on AWS

Optimized and deployed a random forest regression model to predict rainfall levels using Spark

Github Link

US Salary Prediction

November 2021 - December 2021

Created an end-to-end data pipeline with a Docker image to predict US salary values based on survey data

Conducted data preprocessing and EDA, and deployed ridge and random forest regression models

Github Link

Olympic Dash

February 2021 - March 2021

Created custom dashboards using Dash (Python) and R Shiny to display historical data from the Olympics

Deployed the dashboards on Heroku with CI/CD functionality

Python Github Link
R Github Link

LRASM Package

January 2022 - February 2022

Developed Python and R packages to assess regression assumptions and goodness of fit for linear regression models

Deployed the package and documentation on PyPI and Read the Docs

Python Github Link
R Github Link

Contact

Feel free to reach out. I'd love to hear from you!