Vancouver, BC, Canada / Data Scientist | Machine Learning Engineer

Mandeep

I build practical AI systems for operations, recommendations, scientific data, and language-heavy workflows. My work spans mathematical optimization, time series forecasting, LLM/RAG products, privacy-preserving synthetic data, and applied research.

  • Optimization
  • Forecasting
  • LLM Systems
  • Recommenders
  • Applied Research

$1.5M projected annual savings

Optimization work translating supply chain rules, capacity limits, routing choices, and demand priorities into solver-ready formulations.

7,000 hours to 52 minutes labeling time reduction

Active learning pipeline for satellite imagery selection with SpaceML and NASA IMPACT.

Top 5 / 79 NASA SMD proposal

Ranking of a NASA Science Mission Directorate grant proposal for AI-assisted Earth observation workflows.

4 published research works

Explainable AI, satellite data curation, mobile ML for agriculture, and machine-learning forecasting.

Selected Work

Operational ML with measurable constraints.

July 2025 - Present / Vancouver, BC

Data Scientist / Canfor

Optimization, forecasting, and enterprise ML workflows for forestry supply chain and manufacturing operations.

  • Leading a large-scale logistics optimization program that translates operational constraints into mathematical formulations to reduce shipping and routing costs.
  • Built solver-agnostic optimization pipelines across SciPy HiGHS, SCIP, CPLEX, and Gurobi to compare solution quality, runtime, and scalability.
  • Developing production forecasting pipelines with Chronos-2, TSMixer, XGBoost, LightGBM, and Darts, using rigorous backtesting and error analysis.
  • Deploying and orchestrating ML workflows with Azure Machine Learning and Microsoft Fabric for reproducible enterprise experimentation.
Python, Linear programming, Mixed-integer optimization, Azure ML, Microsoft Fabric, Darts

April 2025 - July 2025 / Remote

Machine Learning Engineer / CoffeeSpace

Recommendation systems, feature engineering, analytics pipelines, and cloud-native deployment.

  • Enhanced a people-to-people recommender system with LLM-driven feature engineering and retrieval-augmented generation.
  • Built analytics pipelines to evaluate user behavior and recommendation quality with BigQuery and Looker Studio.
  • Developed Python, JavaScript, Firebase, and GCP Cloud Run services to support recommendation and analytics workflows.
  • Integrated Qdrant vector search to improve similarity retrieval and personalization.
LLMs, RAG, Qdrant, BigQuery, Looker Studio, GCP

February 2022 - July 2023 / Remote

Machine Learning Engineer / Betterdata

Privacy-preserving synthetic tabular data generation and evaluation.

  • Researched and implemented generative model architectures for synthetic data diversity, fidelity, and privacy.
  • Integrated differential privacy into PyTorch-based generative models to support stronger data protection standards.
  • Optimized training and evaluation pipelines with Numba for cloud and on-premises deployment.
  • Engineered evaluation systems for relevance, statistical fidelity, accuracy, and privacy of generated data.
PyTorch, Differential privacy, Numba, TensorFlow, Scikit-learn, Docker

February 2021 - February 2022 / Remote

AI Researcher / SpaceML

Self-supervised and active learning for petabyte-scale satellite imagery.

  • Collaborated with NASA IMPACT to identify relevant Earth observation imagery across massive unlabeled datasets.
  • Built scalable active learning pipelines that reduced manual image labeling from 7,000 hours to 52 minutes for a five-million-image climate dataset.
  • Open-sourced labeling tooling used in NASA's Phenomenon portal.
  • Contributed to a NASA Science Mission Directorate grant proposal ranked in the top 5 of 79 initiatives.
Active learning, Self-supervised learning, PyTorch, TensorFlow, AWS, GCP

January 2020 - July 2022 / India

Research Assistant / Thapar University

Explainable AI, medical image synthesis, and applied forecasting research.

  • Studied CNN architecture choices and hyperparameter effects on explainability methods including LIME and SHAP.
  • Explored GAN-based generation of synthetic COVID-19 CT and MRI images to improve deep learning detection systems.
  • Built machine-learning workflows to forecast social media traction with NumPy, Pandas, and SQL, including model selection.
CNNs, GANs, LIME, SHAP, OpenCV, SQL

Projects

Systems, research prototypes, and deployed ML workflows.

Current applied ML

Supply Chain Optimization Engine

A solver-agnostic logistics optimization program for shipping, capacity, routing, and demand-priority decisions in forestry operations.

Product ML

People-to-People Recommendation System

A matching and personalization workflow using LLM-derived features, RAG context, vector search, and behavioral analytics.

NLP research

Biomedical Lay Summarization with LLMs

Generated accessible biomedical research summaries for non-expert audiences using domain-specific LLMs, prompt tuning, RAG, and representation engineering.

LLM reasoning

Annotated Corpus for Reasoning Explanations

Created structured natural-language explanations for AI2 Reasoning Challenge questions and used Tree of Thought prompting to guide step-by-step problem solving.

Earth observation

SpaceML Worldview Search

Active learning system for identifying relevant satellite imagery from petabyte-scale unlabeled Earth observation datasets.

Mobile ML

Grape Leaf Disease Diagnosis

Real-time iOS app for grape leaf disease diagnosis with on-device deep learning, remedy suggestions, and model optimization.

Publications

Peer-reviewed work across explainable AI, Earth science, mobile ML, and forecasting.

International Conference on Computer Communication and the Internet / 2022

Machine Learning Based Explainable Financial Forecasting

Mandeep, Abhishek Agarwal, Amrita Bhatia, Avleen Malhi, Priyal Kaler, Husanbir Singh Pannu

Explainable forecasting research focused on making model signals more transparent for financial prediction workflows.

Capabilities

A practical toolkit for turning models into working systems.

Optimization and forecasting

  • Linear programming and mixed-integer optimization
  • SciPy HiGHS, SCIP, CPLEX, and Gurobi
  • Chronos-2, TSMixer, XGBoost, LightGBM, and Darts
  • Backtesting, model comparison, and error analysis

Machine learning systems

  • PyTorch, TensorFlow, Keras, Hugging Face, and Scikit-learn
  • Recommendation systems and vector search
  • Synthetic data generation and differential privacy
  • Active learning and self-supervised learning

Language and data products

  • LLMs, RAG, prompt engineering, and control vectors
  • Natural language processing and computational linguistics
  • BigQuery, PostgreSQL, MongoDB, Firebase, and SQL
  • Looker Studio, Plotly, Altair, Matplotlib, and Seaborn

Engineering and delivery

  • Python, R, JavaScript, Bash, and MATLAB
  • Docker, GitHub Actions, Git, GitHub, and GitLab
  • Azure ML, Microsoft Fabric, GCP Cloud Run, AWS, and Kubernetes
  • Weights & Biases, experiment tracking, and reproducible workflows

Education

Data science, computational linguistics, and computer engineering.

September 2023 - July 2024

Master of Data Science in Computational Linguistics

University of British Columbia / Vancouver, BC

GPA 93.4. Focused on advanced NLP, transformer models, machine learning and optimization, computational linguistics, and interactive data visualization.

August 2019 - July 2023

Bachelor of Technology in Computer Engineering

Thapar Institute of Engineering and Technology / Patiala, Punjab, India

GPA 84.5. Covered data structures and algorithms, AI and machine learning, database systems, and software engineering.

Contact

Open to practical AI, optimization, and applied research conversations.