This repository contains an end-to-end MLOps project with a detailed tutorial and code walkthroughs.
- Uses Yellow Taxi Trip data from NYC in parquet format
- Data loading implemented in the `load_data` function
- Loads trip data including pickup/dropoff locations, distances, and timestamps
- Feature engineering and preprocessing in `preprocess_data`
- Data standardization using scikit-learn's StandardScaler
- Linear Regression model to predict trip duration
- Model evaluation using the mean squared error (MSE) metric
- Model and scaler artifacts saved using joblib
- FastAPI web service for model serving
- REST API endpoint for trip duration predictions
- Takes pickup location, dropoff location, and trip distance as inputs
- Returns predicted trip duration
- Experiment tracking and model versioning
- Logs model parameters, metrics, and artifacts
- Model registry for managing different versions
- UI available at http://127.0.0.1:5000
- Uses Prefect for pipeline orchestration
- Task dependencies and retries configured
- Handles data loading, preprocessing, training and evaluation
- Automated model artifact management
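
The training steps listed above (standardization, linear regression, MSE evaluation, joblib persistence) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it uses synthetic data instead of the parquet file, and the helper names `make_synthetic_trips` and `train_and_save`, the feature order, and the artifact file names are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic_trips(n_rows=500, seed=0):
    """Return (X, y): a synthetic stand-in for trip features and durations.

    Hypothetical helper; the real project loads Yellow Taxi parquet data.
    """
    rng = np.random.default_rng(seed)
    pickup = rng.integers(1, 264, n_rows)    # arbitrary location ID range
    dropoff = rng.integers(1, 264, n_rows)
    distance = rng.uniform(0.5, 15.0, n_rows)
    X = np.column_stack([pickup, dropoff, distance]).astype(float)
    # Duration grows with distance, plus unit-variance noise (illustrative only).
    y = 3.0 + 2.5 * distance + rng.normal(0.0, 1.0, n_rows)
    return X, y


def train_and_save(X, y, out_dir):
    """Scale features, fit a linear model, and persist both artifacts.

    Returns the MSE on a held-out 20% split.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit the scaler on training data only, then reuse it for the test split.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    model = LinearRegression()
    model.fit(X_train_scaled, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test_scaled))

    # Assumed artifact names; the repository may use different ones.
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out_dir / "model.joblib")
    joblib.dump(scaler, out_dir / "scaler.joblib")
    return mse


if __name__ == "__main__":
    X, y = make_synthetic_trips()
    with tempfile.TemporaryDirectory() as tmp:
        print("test MSE:", train_and_save(X, y, tmp))
```

Saving the scaler alongside the model matters because the serving side has to apply exactly the same standardization to incoming requests before calling `predict`.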
