Hi, I'm Cassie Guo

Promote the well-being of people via data and models

About

I dissect data, distill insights, develop solutions and deploy applications for problems of all sizes and shapes. My data science philosophy includes:

Data Centric

No Free Lunch

Occam's Razor

The best way to come up with the ultimate idea is to have lots of ideas

Projects

Medical Research Interpreter

A multi-stage reasoning application to summarize and interpret medical literature

Accomplishments
  • Tech stack: Python, LangChain, Hugging Face, LLMs, prompt engineering
  • Uses large language models (LLMs) to summarize medical literature and translate it into plain language
  • Stage 1: A large language model summarizes the literature in short sentences
  • Stage 2: A different model uses text2text generation to translate the summary into plain language for non-professionals
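The two stages above can be sketched as a small pipeline in which each stage is a pluggable callable. This is a minimal illustration, not the project's actual code: `interpret_paper`, `toy_summarize`, `toy_simplify`, and `GLOSSARY` are hypothetical names, and the toy stages are plain string functions standing in for the LLM and text2text model calls.

```python
from typing import Callable, Dict

def interpret_paper(
    text: str,
    summarize: Callable[[str], str],
    simplify: Callable[[str], str],
) -> Dict[str, str]:
    """Run stage 1 (summarize), then stage 2 (rewrite in plain language)."""
    summary = summarize(text)
    return {"summary": summary, "plain_language": simplify(summary)}

# Hypothetical stand-ins: in a real application each callable would wrap a model call.
def toy_summarize(text: str) -> str:
    # Keep only the first sentence as a crude "summary".
    return text.split(". ")[0].rstrip(".") + "."

GLOSSARY = {
    "hypertension": "high blood pressure",
    "myocardial infarction": "heart attack",
}

def toy_simplify(summary: str) -> str:
    # Swap jargon for everyday terms using the small glossary above.
    for term, plain in GLOSSARY.items():
        summary = summary.replace(term, plain)
    return summary

result = interpret_paper(
    "Patients with hypertension had more myocardial infarction events. "
    "Follow-up lasted five years.",
    toy_summarize,
    toy_simplify,
)
print(result["plain_language"])
```

Keeping the stages as separate callables means either model can be swapped without touching the other.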
Explainable API to Detect Public Health Misinformation On Twitter

An API that flags misinformation on social media using an ensemble of BERT models

Accomplishments
  • Tools: Python, HuggingFace
  • API endpoint that uses an ensemble of BERT models fine-tuned on COVID-19-related tweets to classify new tweets
  • Augmented the dataset to incorporate domain expert knowledge
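One common way to combine several fine-tuned classifiers is soft voting, which averages the class probabilities of each model. The sketch below assumes that approach; the project description does not say how its ensemble is combined. The label names, the 0.5 threshold, and the functions `soft_vote` and `classify` are illustrative assumptions.

```python
from typing import Dict, List, Sequence

LABELS = ["reliable", "misinformation"]  # assumed label set, index 1 = misinformation

def soft_vote(probs_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities across models (soft voting)."""
    n_models = len(probs_per_model)
    n_classes = len(probs_per_model[0])
    return [sum(p[c] for p in probs_per_model) / n_models for c in range(n_classes)]

def classify(probs_per_model: Sequence[Sequence[float]], threshold: float = 0.5) -> Dict:
    """Flag a tweet when the averaged misinformation probability reaches the threshold."""
    avg = soft_vote(probs_per_model)
    label = LABELS[1] if avg[1] >= threshold else LABELS[0]
    # Returning each model's own score makes the decision easier to inspect.
    return {"label": label, "score": avg[1], "per_model": [p[1] for p in probs_per_model]}

# Hypothetical outputs of three fine-tuned models for one tweet.
prediction = classify([[0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])
print(prediction["label"], round(prediction["score"], 3))
```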
Exploring Impact Factors for BTC Fee using Regression Models

Using open-source datasets to understand the factors that influence BTC transaction fees

Accomplishments
  • Tools: Python, regression models
X-Ray Image Classification

Detect COVID-19 from Chest X-ray Images using Transfer Learning

Accomplishments
  • Achieved 98% sensitivity and 99% specificity scores
  • Implemented a one-cycle learning rate policy and a reduced-learning-rate policy on ResNet50, which improved performance with shorter training time
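For readers unfamiliar with the one-cycle policy, the learning rate ramps up from a small value to a peak and then anneals down to a much smaller value over a single run. The sketch below is a standalone illustration with cosine ramps. The function name `one_cycle_lr` and all default values are assumptions, not the settings used in this project.

```python
import math

def one_cycle_lr(
    step: int,
    total_steps: int,
    max_lr: float = 1e-3,
    pct_start: float = 0.3,
    div_factor: float = 25.0,
    final_div_factor: float = 1e4,
) -> float:
    """Learning rate at `step` (0-based) for a one-cycle schedule with cosine ramps."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = max(1, round(pct_start * total_steps))

    def cos_anneal(start: float, end: float, pct: float) -> float:
        # Moves smoothly from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step < warmup_steps:
        return cos_anneal(initial_lr, max_lr, step / warmup_steps)
    decay_steps = max(1, total_steps - 1 - warmup_steps)
    return cos_anneal(max_lr, min_lr, (step - warmup_steps) / decay_steps)

schedule = [one_cycle_lr(s, total_steps=100) for s in range(100)]
print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```

In a training loop, the value for the current step would be assigned to the optimizer before each update.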
Kaggle competition - M5 Forecasting Accuracy

Use the daily sales of each item at Walmart from 2011 to 2016 to predict daily sales for the next 28 days

Accomplishments
  • Tech stack: time series forecasting, GBM, feature engineering
  • Selected features and incorporated lags and moving windows on the time series data
  • The final solution is a CatBoost model that achieved a 16% score on the public leaderboard
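Lag and moving-window features of the kind mentioned above can be illustrated with plain Python lists. This is a simplified sketch rather than the competition code: `lag_feature` and `rolling_mean_feature` are hypothetical helpers, and the toy series and window sizes are made up. The rolling window is shifted so that each value uses only past observations.

```python
from typing import List, Optional, Sequence

def lag_feature(series: Sequence[float], k: int) -> List[Optional[float]]:
    """Value observed k steps earlier; None where no history exists."""
    n = len(series)
    return [None] * min(k, n) + list(series[: max(0, n - k)])

def rolling_mean_feature(
    series: Sequence[float], window: int, shift: int = 1
) -> List[Optional[float]]:
    """Mean of the `window` values ending `shift` steps before each position."""
    out: List[Optional[float]] = []
    for t in range(len(series)):
        end = t - shift + 1      # exclusive end of the window
        start = end - window
        if start < 0:
            out.append(None)     # not enough history yet
        else:
            out.append(sum(series[start:end]) / window)
    return out

daily_sales = [3, 0, 2, 5, 4, 6]  # toy data for a single item
lag_1 = lag_feature(daily_sales, 1)
roll_3 = rolling_mean_feature(daily_sales, window=3)
print(lag_1)
print(roll_3)
```

For a 28-day forecast horizon, lags shorter than 28 days are not directly available at prediction time, so the shift usually needs to be at least as long as the horizon or the forecast has to be produced recursively.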
A Draft Implementation of a Pricing Agent in DRL

Using OpenAI Gym to model a pricing agent with deep reinforcement learning (DRL)

Accomplishments
  • Experimented with PPO, A2C, and several other policies using the Stable Baselines library
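A pricing problem can be framed as an environment with the classic Gym-style `reset`/`step` interface. The sketch below mirrors that interface without importing Gym, so it runs on its own. The class `ToyPricingEnv`, its price grid, and its linear demand model are all hypothetical and chosen only for illustration.

```python
import random

class ToyPricingEnv:
    """Each step the agent picks a price; one customer buys with a price-dependent probability."""

    PRICES = [5.0, 10.0, 15.0, 20.0]  # discrete action space (assumed)

    def __init__(self, horizon: int = 30, seed: int = 0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t  # the observation is just the time step in this toy setup

    def step(self, action: int):
        price = self.PRICES[action]
        buy_prob = max(0.0, 1.0 - price / 25.0)  # assumed linear demand curve
        sold = self.rng.random() < buy_prob
        reward = price if sold else 0.0          # revenue earned this step
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done, {"sold": sold}

# Roll out one episode with a random policy as a baseline.
env = ToyPricingEnv(horizon=30, seed=42)
policy_rng = random.Random(0)
obs, done, rewards = env.reset(), False, []
while not done:
    obs, reward, done, info = env.step(policy_rng.randrange(len(env.PRICES)))
    rewards.append(reward)
print(f"steps={len(rewards)} revenue={sum(rewards):.1f}")
```

An environment with the same methods, subclassed from Gym's `Env` and given declared action and observation spaces, is the kind of object that PPO or A2C implementations train against.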
Conversion Rate Prediction by Two-layered Stacking Model + GPU

Supercharge conversion rate prediction on large volumes of itineraries

Accomplishments
  • Tech stack: Python, parallel computing, GPU, PySpark
  • Two-layered ensemble model to predict itinerary conversion rates for search ranking

Contact