Welcome to your MLOps project! You'll build a complete machine learning system that predicts the age of abalone (a type of sea snail) from physical measurements, replacing the traditional, time-consuming method of counting shell rings under a microscope.
Your Mission: Transform a simple ML model into a production-ready system with automated training, deployment, and prediction capabilities.
Traditionally, determining an abalone's age requires:
- Cutting the shell through the cone
- Staining it
- Counting rings under a microscope (very time-consuming!)
Your Goal: Use easier-to-obtain physical measurements (shell weight, diameter, etc.) to predict the age automatically.
📥 Download: Get the dataset from the Kaggle page
- GitHub account
- Kaggle account (for dataset download)
- Python 3.10 or 3.11
- Fork this repository
  ⚠️ Important: Uncheck "Copy the main branch only" to get all project branches
- Add your team members as admins to your forked repository
- Set up your development environment:
# Create the virtual environment (uv places it in .venv by default) and activate it
uv sync
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
# Install pre-commit hooks for code quality
uv pip install pre-commit
uv run pre-commit install
By the end of this project, you'll have created:
- Training workflows using Prefect
- Automatic model retraining on schedule
- Reproducible model and data processing
- REST API for real-time predictions
- Input validation with Pydantic
- Docker containerization
- Clean, well-documented code
- Automated testing and formatting
- Proper error handling
This project is organized into numbered branches, each representing a step in building your MLOps system. Think of it like a guided tutorial where each branch teaches you something new!
Here's how it works:
- Each branch = One pull request with specific tasks
- Follow the numbers (branch_0, branch_1, etc.) in order
- Read the PR instructions (PR_0.md, PR_1.md, etc.) before starting
- Complete all TODOs in that branch's code
- Create a pull request when done
- Merge and move to the next branch
For each numbered branch:
# Switch to the branch
git checkout branch_i  # replace i with the branch number, e.g. branch_1
# Get latest changes (except for branch_0)
git pull origin main
# Note: A VIM window might open - just type ":wq" to close it
# Push your branch
git push
Then:
- 📖 Read the PR_i.md file carefully
- 💻 Complete all the TODOs in the code
- 🔧 Test your changes
- 📤 Open ONE pull request to your main branch
- ✅ Merge the pull request
- 🔄 Move to the next branch
💡 Pro Tip: Always integrate your previous work when starting a new branch (except branch_0)!
Pull Requests (PRs) are how you propose and review changes before merging them into your main codebase. They're essential for team collaboration!
Important: When creating a PR, make sure you're merging into YOUR forked repository, not the original:
❌ Wrong: the pull request targets the original repository.
✅ Correct: the pull request targets your fork.
Use uv to manage dependencies. Install or update packages with:
uv add <package>==<version>
Then sync the environment and regenerate the dependency files:
uv sync
- The pre-commit hooks will automatically format your code
- Remove all TODOs and unused code before final submission
- Use clear variable names and add docstrings
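As an illustration of the last point, here is one possible shape for a small, documented, type-hinted function. The function name and the 1.5-year offset are assumptions for the example (the offset follows the convention documented with the UCI Abalone dataset), so check them against the dataset you actually download.

```python
def rings_to_age(rings: int, offset_years: float = 1.5) -> float:
    """Convert a shell ring count to an estimated age in years.

    Args:
        rings: Number of rings counted on the shell; must be non-negative.
        offset_years: Years added to the ring count. The default of 1.5 is
            an assumption taken from the UCI Abalone dataset description.

    Returns:
        Estimated age in years.

    Raises:
        ValueError: If ``rings`` is negative.
    """
    if rings < 0:
        raise ValueError(f"rings must be non-negative, got {rings}")
    return rings + offset_years
```

The descriptive names, the docstring sections, and the explicit `ValueError` cover three of the habits this project asks for: clear naming, documentation, and proper error handling.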
Your project will be evaluated on:
- Clean, readable code structure
- Proper naming conventions
- Good use of docstrings and type hints
- Consistent style (automated with pre-commit)
- Professional presentation
- Code runs without errors
- All requirements implemented correctly
- Clear README with setup instructions
- Team member names and GitHub usernames
- Step-by-step instructions to run everything
- Effective use of Pull Requests
- Good teamwork and communication
When you're done, your repository should contain:
✅ Automated Training Pipeline
- Prefect workflows for model training
- Separate modules for training and inference
- Reproducible model and encoder generation
✅ Automated Deployment
- Prefect deployment for regular retraining
✅ Production API
- Working REST API for predictions
- Pydantic input validation
- Docker containerization
✅ Professional Documentation
- Updated README with team info
- Clear setup and run instructions
- All TODOs removed from code
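For the "Pydantic input validation" item in the checklist above, a minimal sketch could look like the following. The model name, the field names, and the `validate_payload` helper are hypothetical and are not the dataset's exact columns; only `BaseModel`, `Field`, and `ValidationError` come from Pydantic itself.

```python
from pydantic import BaseModel, Field, ValidationError


class AbaloneFeatures(BaseModel):
    """Hypothetical request body; field names are assumptions for illustration."""

    length: float = Field(gt=0)
    diameter: float = Field(gt=0)
    height: float = Field(gt=0)
    shell_weight: float = Field(gt=0)


def validate_payload(payload: dict) -> tuple[bool, list[str]]:
    """Return (is_valid, sorted names of offending fields) without raising."""
    try:
        AbaloneFeatures(**payload)
    except ValidationError as exc:
        # Each error carries a "loc" tuple whose first entry is the field name.
        return False, sorted({str(error["loc"][0]) for error in exc.errors()})
    return True, []
```

A payload with a negative length, a missing height, and a non-numeric shell weight would be rejected with all three fields reported, so an API built on this model can return a precise error instead of passing bad input to the model.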
Ready to start? Head to branch_0 and read PR_0.md for your first task! 🚀