This is a beginner-friendly project for learning how to create experiments and test versions of a simple model locally with MLflow. The data for this project is available at https://www.kaggle.com/datasets/anshtanwar/global-data-on-sustainable-energy/data.
- [Project Overview]
- [Installation]
- [Project Structure]
Our objectives are:
- Testing a bagging regressor model with MLflow.
- Creating a streamlined process for data ingestion, data preparation, model training and model evaluation.
- The project mainly uses MLflow, scikit-learn, NumPy and pandas, along with a few ancillary libraries.
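The four stages named above (ingestion, preparation, training, evaluation) can be sketched as a chain of small functions. This is only an illustration and not the code in this repository: the function names, the column names and the synthetic data are all assumptions, and the real project reads the Kaggle dataset instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def load_data(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Stand-in for data ingestion: builds a synthetic frame (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "feature_a": rng.normal(size=n_rows),
        "feature_b": rng.normal(size=n_rows),
    })
    noise = rng.normal(scale=0.1, size=n_rows)
    df["target"] = 3.0 * df["feature_a"] - 2.0 * df["feature_b"] + noise
    # Blank out a few values so the preparation step has something to clean.
    df.loc[df.index[:5], "feature_a"] = np.nan
    return df


def prepare_data(df: pd.DataFrame) -> pd.DataFrame:
    """Data preparation: here simply drop rows with missing values."""
    return df.dropna().reset_index(drop=True)


def train_model(X_train, y_train, n_estimators: int = 10, seed: int = 0) -> BaggingRegressor:
    """Model training: fit a bagging ensemble (decision trees by default)."""
    model = BaggingRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X_train, y_train)
    return model


def evaluate_model(model, X_test, y_test) -> dict:
    """Model evaluation: return RMSE and R^2 on the held-out split."""
    preds = model.predict(X_test)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, preds))),
        "r2": float(r2_score(y_test, preds)),
    }


def run_pipeline(seed: int = 0) -> dict:
    """Run ingestion, preparation, training and evaluation in order."""
    df = prepare_data(load_data(seed=seed))
    X, y = df.drop(columns="target"), df["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = train_model(X_train, y_train, seed=seed)
    return evaluate_model(model, X_test, y_test)


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage as its own function makes it easy to swap in a different cleaning rule or model and compare the resulting metrics.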
Below is a screenshot of the logged model after the pre-processing, training and testing steps were completed with MLflow.
The following libraries need to be installed in the virtual environment:
- MLflow
- Pandas
After cloning the repository, follow the steps below:
- Install all the libraries mentioned above into the virtual environment.
- Afterwards, run 'python tracker.py'
- Finally, run 'mlflow ui --port 5000' to open the MLflow dashboard and view the logged model.
- If you improve your model, you can re-run these steps and compare the new run with the earlier ones.
├── loading_data.py # Ingests the data from the Kaggle website.
├── cleaning_data.py # Cleans the data and prepares it.
├── model.py # The model architecture, training and evaluation.
├── tracker.py # Logs all details of the training run and inference to MLflow.
└── README.md # Project documentation