Vancouver, BC, Canada / Data Scientist | Machine Learning Engineer

Mandeep

I build practical AI systems for operations, recommendations, scientific data, and language-heavy workflows. My work spans mathematical optimization, time series forecasting, LLM/RAG products, privacy-preserving synthetic data, and applied research.

  • Optimization
  • Forecasting
  • LLM Systems
  • Recommenders
  • Applied Research

Current Focus

Practical AI systems that connect models to decisions.

Best fit

Data scientist or machine learning engineer roles focused on practical AI systems.

Strongest domains

Optimization, forecasting, recommender systems, LLM/RAG products, and applied research.

Working style

Translate ambiguous operational problems into measurable models, pipelines, and decision support.

$1.5M projected annual savings

Optimization work translating supply chain rules, capacity limits, routing choices, and demand priorities into solver-ready formulations.

7,000 hours to 52 minutes labeling time reduction

Active learning pipeline for satellite imagery selection with SpaceML and NASA IMPACT.

Top 5 of 79: NASA SMD proposal

Research initiative ranking for AI-assisted Earth observation workflows.

4 published research works

Explainable AI, satellite data curation, mobile ML for agriculture, and machine-learning forecasting.

Selected Work

Operational ML with measurable constraints.

July 2025 - Present / Vancouver, BC

Data Scientist / Canfor

Optimization, forecasting, and enterprise ML workflows for forestry supply chain and manufacturing operations.

  • Leading a large-scale logistics optimization program that turns operational constraints into mathematical formulations to lower shipping and routing costs.
  • Built solver-agnostic optimization pipelines across SciPy HiGHS, SCIP, CPLEX, and Gurobi to compare solution quality, runtime, and scalability.
  • Developing production forecasting pipelines with Chronos-2, TSMixer, XGBoost, LightGBM, and Darts, using rigorous backtesting and error analysis.
  • Deploying and orchestrating ML workflows with Azure Machine Learning and Microsoft Fabric for reproducible enterprise experimentation.

  • Python
  • Linear programming
  • Mixed-integer optimization
  • Azure ML
  • Microsoft Fabric
  • Darts

April 2025 - July 2025 / Remote

Machine Learning Engineer / CoffeeSpace

Recommendation systems, feature engineering, analytics pipelines, and cloud-native deployment.

  • Enhanced a people-to-people recommender system with LLM-driven feature engineering and retrieval-augmented generation.
  • Built analytics pipelines to evaluate user behavior and recommendation quality with BigQuery and Looker Studio.
  • Developed Python, JavaScript, Firebase, and GCP Cloud Run services to support recommendation and analytics workflows.
  • Integrated Qdrant vector search to improve similarity retrieval and personalization.

  • LLMs
  • RAG
  • Qdrant
  • BigQuery
  • Looker Studio
  • GCP

February 2022 - July 2023 / Remote

Machine Learning Engineer / Betterdata

Privacy-preserving synthetic tabular data generation and evaluation.

  • Researched and implemented generative model architectures for synthetic data diversity, fidelity, and privacy.
  • Integrated differential privacy into PyTorch-based generative models to support stronger data protection standards.
  • Optimized training and evaluation pipelines with Numba for cloud and on-premises deployment.
  • Engineered evaluation systems for relevance, statistical fidelity, accuracy, and privacy of generated data.

  • PyTorch
  • Differential privacy
  • Numba
  • TensorFlow
  • Scikit-learn
  • Docker

February 2021 - February 2022 / Remote

AI Researcher / SpaceML

Self-supervised and active learning for petabyte-scale satellite imagery.

  • Collaborated with NASA IMPACT to identify relevant Earth observation imagery across massive unlabeled datasets.
  • Built scalable active learning pipelines that reduced manual image labeling from 7,000 hours to 52 minutes for a five-million-image climate dataset.
  • Open-sourced labeling tooling used in NASA's Phenomenon portal.
  • Contributed to a NASA Science Mission Directorate grant proposal ranked in the top 5 of 79 initiatives.

  • Active learning
  • Self-supervised learning
  • PyTorch
  • TensorFlow
  • AWS
  • GCP

January 2020 - July 2022 / India

Research Assistant / Thapar University

Explainable AI, medical image synthesis, and applied forecasting research.

  • Studied CNN architecture choices and hyperparameter effects on explainability methods including LIME and SHAP.
  • Explored GAN-based synthetic COVID-19 CT and MRI generation to improve deep learning detection systems.
  • Built machine-learning forecasting workflows for social media traction using NumPy, Pandas, and SQL, including model selection.

  • CNNs
  • GANs
  • LIME
  • SHAP
  • OpenCV
  • SQL

Projects

Optimization models, research prototypes, and deployed AI workflows.

Applied mathematics / Current

Supply Chain Optimization Engine

A large-scale linear programming and operations research project for modeling shipping, capacity, routing, and demand-priority decisions in forestry operations.

Context
Forestry supply chain planning with capacity limits, demand priorities, and routing tradeoffs.
Role
Own mathematical modeling, solver comparison, and translation of operational rules into solver-ready formulations.
Outcome
Projected annual logistics savings of approximately $1.5M.

Product ML / Product

People-to-People Recommendation System

A matching and personalization workflow using LLM-derived features, RAG context, vector search, and behavioral analytics.

Context
Cloud-native recommendation and analytics workflows for a people-matching product.
Role
Enhanced feature engineering, retrieval, analytics, and deployment workflows.
Outcome
Improved recommendation evaluation and similarity search for product workflows.

NLP research / Research

Biomedical Lay Summarization with LLMs

Generated accessible biomedical research summaries for non-expert audiences using domain-specific LLMs, prompt tuning, RAG, and representation engineering.

Context
Biomedical NLP workflow for making technical research easier for non-expert readers.
Role
Designed summarization prompts, retrieval context, and representation-control experiments.
Outcome
Improved factuality, readability, and accessibility of complex biomedical summaries.

LLM reasoning / Research

Annotated Corpus for Reasoning Explanations

Created structured natural-language explanations for AI2 Reasoning Challenge questions and used Tree of Thought prompting to guide step-by-step problem solving.

Context
Science-question reasoning workflow for improving transparency in model outputs.
Role
Built explanation annotations and experimented with structured reasoning prompts.
Outcome
Improved transparency and reasoning quality for complex science-question workflows.

Earth observation / Open source

SpaceML Worldview Search

Active learning system for identifying relevant satellite imagery from petabyte-scale unlabeled Earth observation datasets.

Context
Citizen-science and NASA IMPACT collaboration for large-scale Earth observation data curation.
Role
Built active learning and labeling workflows for surfacing useful imagery from massive unlabeled datasets.
Outcome
Reduced labeling time from 7,000 hours to 52 minutes for a five-million-image climate dataset.

Mobile ML / Published

Grape Leaf Disease Diagnosis

Real-time iOS app for grape leaf disease diagnosis with on-device deep learning, remedy suggestions, and model optimization.

Context
Mobile agricultural diagnosis system designed for real-time field use.
Role
Led mobile ML implementation, model optimization, and on-device inference integration.
Outcome
Optimized models with quantization and pruning for iPhone 8 and newer devices.

Publications

Peer-reviewed work across explainable AI, Earth science, mobile ML, and forecasting.

International Conference on Computer Communication and the Internet / 2022

Machine Learning Based Explainable Financial Forecasting

Mandeep, Abhishek Agarwal, Amrita Bhatia, Avleen Malhi, Priyal Kaler, Husanbir Singh Pannu

Explainable forecasting research focused on making model signals more transparent for financial prediction workflows.

Capabilities

A practical toolkit for turning models into working systems.

Optimization and forecasting

  • Linear programming and mixed-integer optimization
  • SciPy HiGHS, SCIP, CPLEX, and Gurobi
  • Chronos-2, TSMixer, XGBoost, LightGBM, and Darts
  • Backtesting, model comparison, and error analysis

Machine learning systems

  • PyTorch, TensorFlow, Keras, Hugging Face, and Scikit-learn
  • Recommendation systems and vector search
  • Synthetic data generation and differential privacy
  • Active learning and self-supervised learning

Language and data products

  • LLMs, RAG, prompt engineering, and control vectors
  • Natural language processing and computational linguistics
  • BigQuery, PostgreSQL, MongoDB, Firebase, and SQL
  • Looker Studio, Plotly, Altair, Matplotlib, and Seaborn

Engineering and delivery

  • Python, R, JavaScript, Bash, and MATLAB
  • Docker, GitHub Actions, Git, GitHub, and GitLab
  • Azure ML, Microsoft Fabric, GCP Cloud Run, AWS, and Kubernetes
  • Weights & Biases, experiment tracking, and reproducible workflows

Education

Data science, computational linguistics, and computer engineering.

September 2023 - July 2024

Master of Data Science in Computational Linguistics

University of British Columbia / Vancouver, BC

GPA 93.4. Focused on advanced NLP, transformer models, machine learning and optimization, computational linguistics, and interactive data visualization.

August 2019 - July 2023

Bachelor of Technology in Computer Engineering

Thapar Institute of Engineering and Technology / Patiala, Punjab, India

GPA 84.5. Covered data structures and algorithms, AI and machine learning, database systems, and software engineering.

Contact

Open to practical AI, optimization, and applied research conversations.