Data Scientist, Machine Learning Engineer
Welcome! This portfolio showcases real-world projects across machine learning, data engineering, computer vision, cybersecurity, and generative AI. These projects are built with production-level practices, using public or synthetic datasets to simulate real-world behavior while protecting sensitive data.
Each project demonstrates applied experience in developing end-to-end solutions, ranging from model development to deployment and data orchestration.
All code and pipelines were created by Bianca Yeseo Kim, a Data Scientist and ML Engineer based in NYC.
A modular ML platform simulating user behavior across entertainment, fitness, music, and ride-sharing apps.
Features mood classification, churn modeling, trip prediction, and content recommendations.
Tech stack: Spark, Airflow, PyCaret, GCP, Tableau
Streaming pipeline for session modeling, churn detection, and real-time API delivery across multiple digital services.
Tech stack: Spark Structured Streaming, Airflow, Streamlit
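As an illustration of the session modeling step, here is a minimal batch sketch of gap-based sessionization in plain Python. It is not the project's Spark Structured Streaming code: the `Event` type, the `sessionize` helper, and the 30-minute inactivity gap are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions."""
    user_id: str
    timestamp: float  # seconds since epoch


def sessionize(events: Iterable[Event],
               gap_seconds: float = 1800.0) -> Dict[str, List[List[Event]]]:
    """Group each user's events into sessions split by inactivity gaps.

    A new session starts whenever the time since the user's previous
    event exceeds gap_seconds (assumed 30 minutes here).
    """
    sessions: Dict[str, List[List[Event]]] = {}
    # Sort so each user's events are processed in time order.
    for event in sorted(events, key=lambda e: (e.user_id, e.timestamp)):
        user_sessions = sessions.setdefault(event.user_id, [])
        if user_sessions and (
            event.timestamp - user_sessions[-1][-1].timestamp <= gap_seconds
        ):
            user_sessions[-1].append(event)  # continue the current session
        else:
            user_sessions.append([event])  # start a new session
    return sessions


if __name__ == "__main__":
    demo = [Event("a", 0), Event("a", 600), Event("a", 4000), Event("b", 100)]
    for user, user_sessions in sessionize(demo).items():
        print(user, [len(s) for s in user_sessions])
```

Session counts and lengths derived this way could then feed a churn model as features. In a streaming setting the same idea is usually expressed with the engine's own session windows rather than an in-memory sort.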
Trip prediction and demand clustering using real-world NYC TLC datasets. Built to explore surge pricing prediction and geo-temporal user behavior.
Tech stack: Pandas, GeoPandas, Scikit-learn, Pydeck
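A common first step before clustering demand is to aggregate pickups into geo-temporal buckets. The sketch below does this in plain Python under stated assumptions: pickups arrive as `(latitude, longitude, datetime)` tuples, and space is cut into a fixed grid of `cell_deg` degrees. The function name, the input shape, and the grid size are hypothetical and not taken from the project, which may key on taxi zones instead of coordinates.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

Pickup = Tuple[float, float, datetime]  # (lat, lon, pickup time); assumed shape


def demand_by_cell_and_hour(pickups: Iterable[Pickup],
                            cell_deg: float = 0.01) -> Counter:
    """Count pickups per (grid cell, hour of day).

    Each cell is a cell_deg x cell_deg square identified by the floored
    integer indices of its latitude and longitude.
    """
    counts: Counter = Counter()
    for lat, lon, ts in pickups:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[(cell, ts.hour)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        (40.755, -73.985, datetime(2023, 1, 2, 8, 5)),
        (40.756, -73.984, datetime(2023, 1, 2, 8, 40)),
        (40.775, -73.955, datetime(2023, 1, 2, 18, 15)),
    ]
    for key, n in demand_by_cell_and_hour(sample).items():
        print(key, n)
```

The resulting counts form a cell-by-hour demand table that a clustering method such as k-means could then group into zones with similar demand profiles.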
Churn modeling and engagement simulation using wearable fitness data. Recommender systems for class suggestions and personalized fitness routines.
Tech stack: Scikit-learn, PyTorch, Streamlit
Inspired by real cybersecurity work. This project simulates internal access logs, using LSTM-based anomaly detection to flag credential misuse in bioinformatics systems.
Tech stack: TensorFlow (LSTM), PyCaret, OpenShift, AWS
A framework combining supervised and unsupervised ensemble models for cybersecurity event classification in enterprise infrastructure.
Tech stack: XGBoost, Isolation Forest, PyCaret
LLM-powered chatbot that predicts likely agent responses using prompt-tuned transformers and customer intent modeling.
Tech stack: HuggingFace Transformers, Prompt Engineering
Hackathon Winner: Best UX & Most App Store Ready
AI-powered research and writing assistant that automates synthesis and content generation.
Tech stack: OpenAI API, Next.js, Prompt Engineering
Predictive modeling of NYC urban heat zones using spatial zoning data, land temperature data, and regression.
Tech stack: GeoPandas, Python, Tableau
Computer vision challenge in collaboration with Break Through Tech AI, Kaggle, and New York Botanical Garden.
Trained CNNs on a large botanical image dataset.
Tech stack: PyTorch, TensorFlow, FastAI
Made by Bianca Yeseo Kim
LinkedIn | Portfolio Website
Disclaimer: All datasets used are either synthetic or publicly available and are not derived from proprietary or sensitive sources.