MSCS @ Northeastern University
Former Researcher @ Brigham and Women's Hospital / Harvard Medical School
I build machine learning systems, data infrastructure, and research pipelines for complex real-world problems.
I currently focus on:
- AI systems and evaluation
- machine learning robustness under domain shift
- multimodal and imperfect-data learning
- scalable backend and data infrastructure
- human-in-the-loop AI systems
- reproducible ML workflows
Biomedical and neuroimaging applications are currently my primary problem domains, while my long-term focus is AI/ML systems and research engineering.
Research interests include:
- domain shift
- imperfect and heterogeneous data
- annotation-efficient learning
- self-supervised representation learning
- robustness across scanners/sites/distributions
Particularly interested in:
- representation learning
- sample selection
- generalization under distribution shift
- scalable biomedical ML pipelines
Working on:
- multi-agent AI systems
- evaluation pipelines
- semantic normalization workflows
- backend services and access-control systems
- scalable research tooling
Interested in how AI systems behave in noisy, real-world environments rather than benchmark-only settings.
- representation learning
- multimodal ML
- self-supervised learning
- medical imaging
- segmentation
- model evaluation
- robustness and generalization
- backend APIs
- authentication systems
- RBAC
- databases
- reproducible pipelines
- data versioning
- scalable tooling
- neuroimaging workflows
- MRI pipelines
- BIDS infrastructure
- large-scale biomedical datasets
- scientific Python ecosystem
- Python
- PyTorch
- Flask
- REST APIs
- MongoDB
- JWT / RBAC
- Docker
- Git/GitHub
- Machine Learning
- Deep Learning
- Medical Imaging
- Data Pipelines
- Research Infrastructure
- algorithms and data structures
- ML systems engineering
- robust machine learning
- scalable backend systems
- software engineering practices
- deep learning for real-world deployment
- ML systems
- robust AI pipelines
- multimodal learning
- representation learning
- data-centric AI
- medical imaging AI
- AI infrastructure
- backend engineering for ML systems
I am particularly interested in research problems involving:
- distribution shift
- limited-label learning
- imperfect real-world datasets
- trustworthy AI systems
- evaluation and benchmarking
- scalable research tooling
- human-centered AI systems
Long-term goal: build AI systems that remain reliable under noisy, heterogeneous, real-world conditions.
I prefer building systems that are:
- reproducible
- scalable
- inspectable
- modular
- deployable
- robust to imperfect data
I care less about leaderboard optimization and more about whether systems continue to work under realistic constraints.
- machine learning under domain shift
- medical imaging pipelines
- AI evaluation systems
- backend engineering for ML workflows
- transitioning from neuroscience/medicine into AI systems engineering
- reproducible research infrastructure
This GitHub profile is gradually being organized around:
- ML systems
- research engineering
- data infrastructure
- robust machine learning
- biomedical AI pipelines
- reproducible scientific workflows
Public repositories will mainly focus on:
- tooling
- pipelines
- infrastructure
- methods
- reproducible engineering workflows
- LinkedIn: https://www.linkedin.com/in/mengyuan-ding
- GitHub: https://github.com/Jessy-Ding
- Email: ding.mengyu@northeastern.edu

