Optimization work translating supply chain rules, capacity limits, routing choices, and demand priorities into solver-ready formulations.
Vancouver, BC, Canada / Data Scientist | Machine Learning Engineer
Mandeep
I build practical AI systems for operations, recommendations, scientific data, and language-heavy workflows. My work spans mathematical optimization, time series forecasting, LLM/RAG products, privacy-preserving synthetic data, and applied research.
Active learning pipeline for satellite imagery selection with SpaceML and NASA IMPACT.
Grant proposal for AI-assisted Earth observation workflows, ranked in the top 5 of 79 initiatives.
Explainable AI, satellite data curation, mobile ML for agriculture, and machine-learning forecasting.
Selected Work
Operational ML with measurable constraints.
Data Scientist / Canfor
Optimization, forecasting, and enterprise ML workflows for forestry supply chain and manufacturing operations.
- Leading a large-scale logistics optimization program that turns operational constraints into mathematical formulations to lower shipping and routing costs.
- Built solver-agnostic optimization pipelines across SciPy HiGHS, SCIP, CPLEX, and Gurobi to compare solution quality, runtime, and scalability.
- Developing production forecasting pipelines with Chronos-2, TSMixer, XGBoost, LightGBM, and Darts (Unit8), using rigorous backtesting and error analysis.
- Deploying and orchestrating ML workflows with Azure Machine Learning and Microsoft Fabric for reproducible enterprise experimentation.
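As a rough illustration of what "solver-agnostic" can mean in practice, the sketch below keeps the formulation step separate from the solver call. Everything in it is hypothetical: the `TransportProblem` class, the `to_lp` and `solve_with_highs` helpers, and the toy costs, capacities, and demands are illustrative assumptions, not the production pipeline. The only real library call is SciPy's `linprog` with `method="highs"`.

```python
from dataclasses import dataclass


@dataclass
class TransportProblem:
    """Hypothetical toy instance: ship from mills to customers at minimum cost."""
    cost: list      # cost[i][j]: cost per unit shipped from mill i to customer j
    capacity: list  # capacity[i]: maximum units mill i can ship
    demand: list    # demand[j]: units customer j must receive


def to_lp(p):
    """Flatten the problem into (c, A_ub, b_ub, A_eq, b_eq) for any LP solver.

    Variable x[i][j] is stored at flat index i * n_customers + j.
    """
    m, n = len(p.capacity), len(p.demand)
    c = [p.cost[i][j] for i in range(m) for j in range(n)]
    # Capacity rows: sum over j of x[i][j] <= capacity[i]
    A_ub = [[1.0 if k // n == i else 0.0 for k in range(m * n)] for i in range(m)]
    # Demand rows: sum over i of x[i][j] == demand[j]
    A_eq = [[1.0 if k % n == j else 0.0 for k in range(m * n)] for j in range(n)]
    return c, A_ub, list(p.capacity), A_eq, list(p.demand)


def solve_with_highs(p):
    """One possible backend: SciPy's linprog using the HiGHS method."""
    # Imported lazily so the formulation step has no solver dependency.
    from scipy.optimize import linprog

    c, A_ub, b_ub, A_eq, b_eq = to_lp(p)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x


# Assumed toy data: two mills, two customers.
problem = TransportProblem(
    cost=[[4.0, 6.0], [5.0, 3.0]],
    capacity=[30.0, 40.0],
    demand=[20.0, 25.0],
)
c, A_ub, b_ub, A_eq, b_eq = to_lp(problem)
```

Because `to_lp` returns plain lists, the same matrices could be handed to another backend (SCIP, CPLEX, or Gurobi) through a thin adapter, which is one way to compare solvers on identical inputs.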
Machine Learning Engineer / CoffeeSpace
Recommendation systems, feature engineering, analytics pipelines, and cloud-native deployment.
- Enhanced a people-to-people recommender system with LLM-driven feature engineering and retrieval-augmented generation.
- Built analytics pipelines to evaluate user behavior and recommendation quality with BigQuery and Looker Studio.
- Developed Python, JavaScript, Firebase, and GCP Cloud Run services to support recommendation and analytics workflows.
- Integrated Qdrant vector search to improve similarity retrieval and personalization.
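For readers unfamiliar with similarity retrieval, the idea behind vector search can be sketched in a few lines. This is a minimal pure-Python illustration, not Qdrant's API and not the production recommender: the `cosine` and `top_k` helpers, the profile names, and the three-dimensional embeddings are all made-up assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, profiles, k=2):
    """Return the k profile names whose embeddings are most similar to the query."""
    ranked = sorted(profiles, key=lambda name: cosine(query, profiles[name]), reverse=True)
    return ranked[:k]


# Hypothetical profile embeddings; real ones would come from an embedding model.
profiles = {
    "ana": [1.0, 0.0, 0.2],
    "ben": [0.9, 0.1, 0.3],
    "cy": [0.0, 1.0, 0.0],
}
matches = top_k([1.0, 0.0, 0.25], profiles, k=2)
```

A dedicated vector database replaces the exhaustive `sorted` scan with an approximate nearest-neighbor index, which is what makes this kind of lookup practical at scale.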
Machine Learning Engineer / Betterdata
Privacy-preserving synthetic tabular data generation and evaluation.
- Researched and implemented generative model architectures for synthetic data diversity, fidelity, and privacy.
- Integrated differential privacy into PyTorch-based generative models to support stronger data protection standards.
- Optimized training and evaluation pipelines with Numba for cloud and on-premises deployment.
- Engineered evaluation systems for relevance, statistical fidelity, accuracy, and privacy of generated data.
AI Researcher / SpaceML
Self-supervised and active learning for petabyte-scale satellite imagery.
- Collaborated with NASA IMPACT to identify relevant Earth observation imagery across massive unlabeled datasets.
- Built scalable active learning pipelines that reduced manual image labeling from 7,000 hours to 52 minutes for a five-million-image climate dataset.
- Open-sourced labeling tooling used in NASA's Phenomenon portal.
- Contributed to a NASA Science Mission Directorate grant proposal ranked in the top 5 of 79 initiatives.
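The core loop of active learning is choosing which unlabeled items are worth a human's time. The sketch below shows one common query strategy, uncertainty sampling by binary entropy. It is an assumption for illustration only: the strategy actually used in the SpaceML pipeline is not stated here, and the tile names, scores, and the `select_for_labeling` helper are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary prediction with positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_for_labeling(scores, budget):
    """Pick the `budget` items whose predictions the model is least sure about."""
    ranked = sorted(scores, key=lambda item: binary_entropy(scores[item]), reverse=True)
    return ranked[:budget]


# Hypothetical model scores: probability that each image tile is relevant.
scores = {"tile_a": 0.97, "tile_b": 0.52, "tile_c": 0.08, "tile_d": 0.40}
queue = select_for_labeling(scores, budget=2)
```

Here the two tiles with scores closest to 0.5 are sent for labeling first, while confidently scored tiles are skipped; after each labeling round the model is retrained and the scores are refreshed.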
Research Assistant / Thapar University
Explainable AI, medical image synthesis, and applied forecasting research.
- Studied how CNN architecture choices and hyperparameters affect explainability methods, including LIME and SHAP.
- Explored GAN-based synthetic COVID-19 CT and MRI generation to improve deep learning detection systems.
- Built machine-learning forecasting workflows for social media traction using NumPy, Pandas, and SQL, with model selection.
Projects
Systems, research prototypes, and deployed ML workflows.
Current applied ML
Supply Chain Optimization Engine
A solver-agnostic logistics optimization program for shipping, capacity, routing, and demand-priority decisions in forestry operations.
Product ML
People-to-People Recommendation System
A matching and personalization workflow using LLM-derived features, RAG context, vector search, and behavioral analytics.
NLP research
Biomedical Lay Summarization with LLMs
Generated accessible biomedical research summaries for non-expert audiences using domain-specific LLMs, prompt tuning, RAG, and representation engineering.
LLM reasoning
Annotated Corpus for Reasoning Explanations
Created structured natural-language explanations for AI2 Reasoning Challenge questions and used Tree of Thought prompting to guide step-by-step problem solving.
Earth observation
SpaceML Worldview Search
Active learning system for identifying relevant satellite imagery from petabyte-scale unlabeled Earth observation datasets.
Mobile ML
Grape Leaf Disease Diagnosis
Real-time iOS app for grape leaf disease diagnosis with on-device deep learning, remedy suggestions, and model optimization.
Publications
Peer-reviewed work across explainable AI, Earth science, mobile ML, and forecasting.
Deep learning-based explainable target classification for synthetic aperture radar images
Explainable deep learning for SAR target classification, combining model performance with interpretable evidence for high-stakes imagery workflows.
SpaceML Worldview Search - Learnings from an AI citizen scientist team building a NoCode Data Curator from Unlabeled Petabyte Scale Imagery
Lessons from citizen-science tooling for surfacing useful imagery from petabyte-scale unlabeled Earth observation data.
Smartphone Based Grape Leaf Disease Diagnosis and Remedial System Assisted with Explanations
Mobile explainable AI for grape leaf disease diagnosis, built for real-time on-device use and practical field support.
Machine Learning Based Explainable Financial Forecasting
Explainable forecasting research focused on making model signals more transparent for financial prediction workflows.
Capabilities
A practical toolkit for turning models into working systems.
Optimization and forecasting
- Linear programming and mixed-integer optimization
- SciPy HiGHS, SCIP, CPLEX, and Gurobi
- Chronos-2, TSMixer, XGBoost, LightGBM, and Darts (Unit8)
- Backtesting, model comparison, and error analysis
Machine learning systems
- PyTorch, TensorFlow, Keras, Hugging Face, and Scikit-learn
- Recommendation systems and vector search
- Synthetic data generation and differential privacy
- Active learning and self-supervised learning
Language and data products
- LLMs, RAG, prompt engineering, and control vectors
- Natural language processing and computational linguistics
- BigQuery, PostgreSQL, MongoDB, Firebase, and SQL
- Looker Studio, Plotly, Altair, Matplotlib, and Seaborn
Engineering and delivery
- Python, R, JavaScript, Bash, and MATLAB
- Docker, GitHub Actions, Git, GitHub, and GitLab
- Azure ML, Microsoft Fabric, GCP Cloud Run, AWS, and Kubernetes
- Weights & Biases, experiment tracking, and reproducible workflows
Education
Data science, computational linguistics, and computer engineering.
Master of Data Science in Computational Linguistics
University of British Columbia / Vancouver, BC
GPA 93.4. Focused on advanced NLP, transformer models, machine learning and optimization, computational linguistics, and interactive data visualization.
Bachelor of Technology in Computer Engineering
Thapar Institute of Engineering and Technology / Patiala, Punjab, India
GPA 84.5. Covered data structures and algorithms, AI and machine learning, database systems, and software engineering.
Contact