Andy Yang
Data Scientist
About
Hi! I'm Andy, a Data Scientist with 5+ years of experience in the financial industry.
My work has ranged from building credit risk models and automated reporting pipelines to developing AI-powered tools and automation solutions. I'm comfortable working with both technical and business teams, and I enjoy finding practical solutions to complex problems.
My key technical skills include:
- Python (Pandas, Scikit-learn, PyTorch)
- R (Shiny, dplyr, ggplot2, tidymodels)
- Machine Learning (Regression, Classification, Survival, Deep Learning)
- Databases (SQL, NoSQL, Spark)
- AWS (EC2, S3, Lambda)
- Azure (Azure SQL, Azure ML)
- Tableau (Tableau Flows, Dashboards)
- SAS (Base, Enterprise)
Resume
Professional Summary
Andy Yang
Data Scientist & Analytics Engineer with 5 years of experience designing automated ML pipelines and building high-impact AI models in financial services and tech.
- (647) 461-6466
- syyyang003@gmail.com
- www.linkedin.com/in/andyang80
- Vancouver, BC
Skills
Coding Languages
Python (Plotly, Pandas, PyTorch), R (Shiny), Jupyter Notebooks, PySpark, SAS (certified in 9.4), SQL
Machine Learning
Predictive Modeling, Classification, Feature Engineering, A/B Testing, Hyperparameter Tuning, LLMs
Platforms & Tools
Azure (Databricks), AWS (S3, Bedrock, Lambda, SageMaker), Tableau, Power BI, Docker
Education
Master of Science in Data Science
2021 - 2022
University of British Columbia, Vancouver, BC
Bachelor of Applied Science in Chemical Engineering
2016 - 2021
University of Waterloo, Waterloo, ON
Professional Experience
Data Scientist
Jan 2026 - Jun 2026
TFG Financial, Vancouver, BC
- Engineered data quality ETL pipelines (SQL, Python, Tableau) to detect Scorecard and PD anomalies, flagging ~10% of the portfolio in a weekly dashboard used by 10+ stakeholders across US and Canadian credit leadership.
- Developed Cox Proportional-Hazards PD survival models (scikit-learn, lifelines) with engineered macroeconomic features to generate early default warnings at the account level for multiple portfolios, achieving a C-index of 0.81.
Senior Quantitative Analyst - Model Validation
Apr 2023 - Dec 2025
TD Bank Group, Toronto, ON
- Developed a custom Python GUI application to automate data ingestion for model performance reporting across $5B+ portfolios, reducing quarterly reporting cycle time by 2+ weeks for senior stakeholders.
- Built PySpark pipelines on Azure Databricks to supersede SAS scripts that evaluate model performance (AUROC, WAPE, calibration, stress testing), expanding testing coverage and cutting runtime by up to 2 hrs per validation.
- Trained benchmark supervised learning models (XGBoost, logistic regression) on Azure ML to forecast default risk in multiple credit card portfolios, achieving MAPE metrics under 10% and supporting OSFI audit compliance.
Quantitative Analyst - Model Validation
Sep 2022 - Apr 2023
TD Bank Group, Toronto, ON
- Validated over 15 production-grade ML models on millions of records, uncovering behavioral insights critical for real-time risk response systems, and collaborated with data scientists to implement adjustments to PD models.
- Standardized model analytics toolkits using Python, R Shiny, and SAS, and implemented CI/CD workflows via GitHub Actions, reducing QA and deployment lead time by 50% and enhancing onboarding for new users.
- Led department town halls, aligning model performance with business decisions for non-technical audiences.
API Implementation Specialist (Contract)
Apr 2021 - Sep 2021
Royal Distributing, Guelph, ON
- Deployed cloud-based internal REST APIs on Microsoft Azure to automate manual data sync tasks, enabling automated supply chain risk dashboards and data migration from legacy FoxPro databases.
- Refactored the schema of SQL Server databases and revised query criteria, cutting query latency in half (~30 mins).
Projects
Data Science Projects
Twitter Sentiment Analysis
February 2024 - March 2024
Implemented deep learning models in PyTorch using CNN, LSTM, and RNN architectures to perform sentiment analysis
Achieved a maximum test accuracy of 94.1%
GitHub Link
LiDAR Object Detection for Cities (Capstone)
May 2022 - June 2022
Trained and optimized deep neural network using PyTorch to perform 3D object detection within point clouds
Achieved a validation precision score of 0.85
Recognized as one of the top 3 capstone projects of 2022
Australian Rainfall Prediction
March 2022 - April 2022
Utilized PyArrow, Hadoop, and S3 to conduct data cleaning and store rainfall data on AWS
Optimized and deployed random forest regression model to predict rainfall levels using Spark
GitHub Link
US Salary Prediction
November 2021 - December 2021
Created end-to-end data pipeline with a Docker image to predict US salary values based on survey data
Conducted data preprocessing and EDA, and deployed ridge and random forest regression models
GitHub Link
Olympic Dash
February 2021 - March 2021
Created custom dashboard using Dash (Python) and R Shiny to display historic data from the Olympics
Deployed dashboard on Heroku with CI/CD functionality
Python GitHub Link
R GitHub Link
LRASM Package
January 2022 - February 2022
Developed Python and R packages to assess regression assumptions and goodness of fit for linear regression models
Deployed package and documentation on PyPI and Read the Docs
Python GitHub Link
R GitHub Link
Contact
Feel free to reach out. I'd love to hear from you!