<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>experiment tracking | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/experiment-tracking/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/experiment-tracking/index.xml" rel="self" type="application/rss+xml"/><description>experiment tracking</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Thu, 18 Sep 2025 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>experiment tracking</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/experiment-tracking/</link></image><item><title>Final Report : Streamlining Reproducible Machine Learning Research with Automated MLOps Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/</link><pubDate>Thu, 18 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/</guid><description>&lt;h1 id="final-report-applying-mlops-to-overcome-reproducibility-barriers-in-ml">Final Report: Applying MLOps to Overcome Reproducibility Barriers in ML&lt;/h1>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Generating project" srcset="
/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_8793561656181f829e3597ae957831b0.webp 400w,
/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_c1f605866d28e52418a2120d1e90b899.webp 760w,
/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_8793561656181f829e3597ae957831b0.webp"
width="760"
height="447"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>Hello! I’m Ahmed Alghali, and this is my final report for the project &lt;a href="https://ucsc-ospo.github.io/project/osre25/nyu/mlops/" target="_blank" rel="noopener">&lt;strong>Applying MLOps to Overcome Reproducibility Barriers in ML&lt;/strong>&lt;/a>, under the mentorship of Professor &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a>.&lt;/p>
&lt;p>This project aims to address the &lt;strong>reproducibility problem&lt;/strong> in machine learning—both in core ML research and in applications to other areas of science.&lt;/p>
&lt;p>The focus is on making large-scale ML experiments &lt;strong>reproducible on &lt;a href="https://www.chameleoncloud.org/" target="_blank" rel="noopener">Chameleon Cloud&lt;/a>&lt;/strong>. To do this, we developed &lt;a href="https://github.com/A7med7x7/ReproGen" target="_blank" rel="noopener">&lt;strong>ReproGen&lt;/strong>&lt;/a>, a template generator that produces ready-to-use, reproducible ML training workflows. The goal is to make the cloud easy for researchers setting up experiments, without the worry of stitching all the pieces together themselves.&lt;/p>
&lt;hr>
&lt;h2 id="progress-since-mid-report">Progress Since Mid-Report&lt;/h2>
&lt;h3 id="migration-from-cookiecutter-to-copier">Migration from Cookiecutter to Copier&lt;/h3>
&lt;p>We initially used &lt;a href="https://www.cookiecutter.io/" target="_blank" rel="noopener">Cookiecutter&lt;/a> as our templating engine, but it lacked features we needed (e.g., conditional questions). We switched to &lt;a href="https://copier.readthedocs.io/en/stable/" target="_blank" rel="noopener">Copier&lt;/a>, which provides more flexibility and better matches our use case.&lt;/p>
&lt;h3 id="support-for-multiple-setup-modes">Support for Multiple Setup Modes&lt;/h3>
&lt;p>We now offer &lt;strong>two setup modes&lt;/strong>, designed to serve both beginners and users who want advanced options/customization:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Basic Mode&lt;/strong> – minimal prompts (project name, repository link, framework).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Advanced Mode&lt;/strong> – detailed control (compute site, GPU type, CUDA version, storage site, etc.).&lt;/p>
&lt;/li>
&lt;/ul>
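&lt;p>Copier&amp;rsquo;s conditional questions let the advanced prompts be skipped entirely in Basic Mode. A hypothetical &lt;code>copier.yml&lt;/code> fragment (the variable names here are ours for illustration, not necessarily ReproGen&amp;rsquo;s actual ones):&lt;/p>

```yaml
# copier.yml — illustrative fragment
setup_mode:
  type: str
  help: Choose a setup mode
  choices:
    - basic
    - advanced
  default: basic

# Advanced-only prompt: Copier skips it entirely when setup_mode is basic
gpu_type:
  type: str
  help: GPU vendor for the training image
  choices:
    - nvidia
    - amd
  default: nvidia
  when: "{{ setup_mode == 'advanced' }}"
```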
&lt;p>This ensures accessibility for new users while still enabling fine-grained control for advanced users.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="prompting" srcset="
/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_c0169673360dadfbcd30a72263676479.webp 400w,
/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_416b0bbcc859df3cd794d760ce0308c8.webp 760w,
/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_c0169673360dadfbcd30a72263676479.webp"
width="760"
height="448"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="automated-credential-generation">Automated Credential Generation&lt;/h3>
&lt;p>Previously, users had to manually generate application credentials via the Horizon OpenStack UI. Now, we provide scripts that generate two types of credentials programmatically—&lt;strong>Swift&lt;/strong> and &lt;strong>EC2&lt;/strong>—using &lt;strong>Chameleon JupyterHub credentials&lt;/strong> with &lt;code>python-chi&lt;/code> and the &lt;code>openstack-sdk&lt;/code> client.&lt;/p>
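&lt;p>A minimal sketch of the application-credential path with the OpenStack SDK (the function name and return shape are our illustration and assume a configured &lt;code>clouds.yaml&lt;/code> entry; ReproGen&amp;rsquo;s actual scripts may differ):&lt;/p>

```python
def make_app_credential(cloud_name, name):
    """Create an OpenStack application credential programmatically.

    Sketch only: requires the openstacksdk package and credentials for
    the target cloud (e.g. a clouds.yaml entry named `cloud_name`).
    """
    import openstack  # deferred import so the sketch reads standalone

    conn = openstack.connect(cloud=cloud_name)
    cred = conn.identity.create_application_credential(
        user=conn.current_user_id, name=name
    )
    # The secret is only returned once, at creation time; save it now.
    return {"id": cred.id, "secret": cred.secret}
```

&lt;p>The returned id/secret pair can then feed other tools (e.g. rclone or the OpenStack CLI) without ever opening Horizon.&lt;/p>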
&lt;h3 id="automatic-readmemd-generation">Automatic README.md Generation&lt;/h3>
&lt;p>Each generated project includes a &lt;strong>customized README.md&lt;/strong> containing setup guidance and commands tailored to the user’s configuration.&lt;/p>
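&lt;p>Since Copier renders templates with Jinja, the README can branch on the user&amp;rsquo;s answers. A hypothetical fragment (the variable names are ours, not necessarily ReproGen&amp;rsquo;s):&lt;/p>

```jinja
# README.md.jinja — hypothetical fragment
# {{ project_name }}

## Launch the virtual lab
{% if gpu_type == "nvidia" %}
docker compose --profile nvidia up -d
{% else %}
docker compose --profile amd up -d
{% endif %}
```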
&lt;h3 id="bug-fixes-and-ux-enhancements">Bug Fixes and UX Enhancements&lt;/h3>
&lt;p>Alongside major features, we implemented numerous smaller changes and fixes to improve the reliability and user experience of the tool.&lt;/p>
&lt;hr>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://github.com/A7med7x7/ReproGen" target="_blank" rel="noopener">&lt;strong>ReproGen GitHub Repository&lt;/strong>&lt;/a>: source code for the template generator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/A7med7x7/ReproGen/tree/mlflow-replay" target="_blank" rel="noopener">&lt;strong>mlflow-replay branch&lt;/strong>&lt;/a>: explore a past experiment, artifacts, and logged insights.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/A7med7x7/ReproGen/tree/training-demo" target="_blank" rel="noopener">&lt;strong>LLM-Demo branch&lt;/strong>&lt;/a>: hands-on demo to track fine-tuning of an LLM using infrastructure generated by ReproGen.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Compatibility Matrix&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>The tool and the generated setup both depend on software whose compatibility must be tracked at every level: hardware, OS, drivers, computing platforms, and core and third-party libraries. As a first step, we are writing documentation of these constraints to help future debugging and to allow adding pieces without breaking what is already there.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Maintain Docker Images&lt;/strong>&lt;/p>
&lt;p>So far we have CPU and GPU Docker images for the most frequently used frameworks:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>CPU image&lt;/strong>: for data science workloads (Scikit-Learn)&lt;/li>
&lt;li>&lt;strong>GPU NVIDIA variant&lt;/strong>: for deep learning workloads on NVIDIA machines (PyTorch, Lightning, TensorFlow)&lt;/li>
&lt;li>&lt;strong>GPU AMD variant&lt;/strong>: for deep learning workloads on AMD machines (PyTorch, Lightning, TensorFlow)&lt;/li>
&lt;li>Adding more variants for additional frameworks and enhancing the experience of the existing images is recommended.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="reflection">Reflection&lt;/h2>
&lt;p>When I first joined SoR 2025, I had trouble crystallizing how I could practically achieve reproducibility and package a tool that would maximize the chance of reproducing an experiment built with it. Throughout the journey my mentors took me under their wings and helped me understand the &lt;strong>reproducibility challenges in ML&lt;/strong>. My mentor Professor &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a> wrote materials that saved me a lot of time in familiarizing myself with the &lt;a href="chameleoncloud.org">testbed&lt;/a> and important Linux tools and commands, and even gave me hands-on practice with how &lt;a href="https://teaching-on-testbeds.github.io/mltrain-chi/" target="_blank" rel="noopener">large model training&lt;/a> with an MLflow tracking server is done in the cloud. &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a> took the time to review my presentation and pushed me to do my best. I&amp;rsquo;m forever thankful for the way they shaped the project and my personal growth. This hands-on experience helped me view &lt;strong>MLOps, cloud APIs, and workflow design&lt;/strong> through different lenses, and I’m proud to have contributed a tool that can help simplify reproducible research for others.&lt;/p></description></item><item><title>Midterm Report : Streamlining Reproducible Machine Learning Research with Automated MLOps Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/</guid><description>&lt;h3 id="refresher-about-the-project">Refresher about the Project&lt;/h3>
&lt;p>Hi everyone! For the last month I have been working with my mentors, Professor &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a>, on our project &lt;a href="https://ucsc-ospo.github.io/project/osre25/nyu/mlops/" target="_blank" rel="noopener">Applying MLOps to overcome reproducibility barriers in machine learning research&lt;/a>. As a refresher, our goal is to build a template generator for reproducible machine learning training workflows on the Chameleon testbed. We want to provide our users with the necessary environment configuration in a handy way, so they won&amp;rsquo;t be overwhelmed by all the intricate details of setting up the environment. This will allow for validation and further development of their setup.&lt;/p>
&lt;hr>
&lt;h3 id="what-we-have-done-so-far">What we have done so far&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="userflow" srcset="
/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_d0aee31c44beeded617d15565a3078b7.webp 400w,
/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_23aab3e41951725ceb2ba1683e8a5455.webp 760w,
/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_d0aee31c44beeded617d15565a3078b7.webp"
width="760"
height="307"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The current workflow begins in JupyterHub, where the user provides basic details such as project name, site, and node type. The notebooks handle key setup tasks, like creating storage buckets, provisioning and configuring a server with GPU support, and mounting buckets locally via rclone. Once the host environment is ready, the user SSHes into that machine, generates the necessary variables via a script, and launches a containerized virtual lab that integrates Jupyter and MLflow. Inside the container, users authenticate with GitHub, connect or initialize their repositories, and can immediately begin training models, with all metrics, artifacts, and environment details logged for reproducibility.&lt;/p>
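&lt;p>For the bucket-mounting step, rclone can talk to the object store through a Swift remote backed by an application credential. A hedged sketch (the remote name, auth URL, and bucket below are placeholders for illustration, not the project&amp;rsquo;s actual values):&lt;/p>

```ini
# ~/.config/rclone/rclone.conf — illustrative fragment
[chi_object_store]
type = swift
auth_version = 3
# Keystone auth URL for the chosen site (placeholder)
auth = https://SITE_AUTH_URL/v3
# Replace with the generated application credential
application_credential_id = APP_CRED_ID
application_credential_secret = APP_CRED_SECRET
```

&lt;p>followed by something like &lt;code>rclone mount chi_object_store:my-bucket /mnt/data --daemon --allow-other&lt;/code> on the host.&lt;/p>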
&lt;p>The progress on the project so far is as follows:&lt;/p>
&lt;h4 id="we-finalized-the-selection-of-frameworks-and-storage-options">We finalized the selection of frameworks and storage options.&lt;/h4>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="results" srcset="
/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_cc94f8d2983a972d5d551a1fd1b51c86.webp 400w,
/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_bd2f06761e3836b650d87a84b3ed4d00.webp 760w,
/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_cc94f8d2983a972d5d551a1fd1b51c86.webp"
width="760"
height="346"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Artifacts are now logged directly from the MLflow server to the Chameleon object store, without relying on a database backend or an intermediate MinIO S3 layer.&lt;/p>
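&lt;p>One plausible shape for the server launch, assuming the object store is reached through its S3-compatible interface (the endpoint, credentials, paths, and bucket name here are placeholders, not the project&amp;rsquo;s actual values):&lt;/p>

```shell
# Illustrative only — replace the placeholders with real values.
export MLFLOW_S3_ENDPOINT_URL="https://OBJECT_STORE_S3_ENDPOINT"
export AWS_ACCESS_KEY_ID="EC2_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="EC2_SECRET_KEY"

# --artifacts-destination makes the server proxy artifacts straight to the
# object store, with a plain file store for run metadata (no database).
mlflow server \
  --host 0.0.0.0 --port 8000 \
  --backend-store-uri file:///home/cc/mlflow-runs \
  --artifacts-destination s3://my-experiment-artifacts
```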
&lt;h4 id="different-jupyter-lab-images-for-each-framework">Different jupyter lab images for each framework.&lt;/h4>
&lt;p>We’ve started with the top ML frameworks — PyTorch Lightning, Keras/TensorFlow, and Scikit-Learn. Each framework now has its own image, which will later be tailored to the user’s selection.&lt;/p>
&lt;h4 id="github-cli-and-hugging-face-integration-inside-the-container">Github CLI and Hugging Face integration inside the container.&lt;/h4>
&lt;p>The Jupyter container now integrates both the GitHub CLI and Hugging Face authentication. Users can manage their code repositories via GitHub CLI commands and authenticate with Hugging Face tokens to download/upload models and datasets. This eliminates the need for manual credential setup and streamlines ML experimentation within the environment.&lt;/p>
&lt;h4 id="custom-logging-utility">Custom Logging Utility&lt;/h4>
&lt;p>To ensure robust tracking of code versioning and environment details, we added a custom logging utility.&lt;br>
These logs are stored alongside metrics and model artifacts in MLflow, ensuring every experiment is fully documented and reproducible. A summary of the functionalities:&lt;/p>
&lt;hr>
&lt;h5 id="log_git--captures-code-versioning">&lt;code>log_git()&lt;/code> — Captures Code Versioning&lt;/h5>
&lt;p>Uses Git commands (via subprocess) to log:&lt;/p>
&lt;ul>
&lt;li>Current branch name&lt;/li>
&lt;li>Commit hash&lt;/li>
&lt;li>Repository status (clean or dirty)&lt;/li>
&lt;/ul>
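&lt;p>A minimal sketch of what such a helper might look like (the function names and the logging callback are illustrative, not necessarily the project&amp;rsquo;s actual API):&lt;/p>

```python
import subprocess

def _git(args):
    """Run a git command; return stripped stdout, or None if it fails."""
    try:
        out = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None

def summarize_status(porcelain):
    """Turn `git status --porcelain` output into a clean/dirty summary."""
    changed = [line for line in porcelain.splitlines() if line.strip()]
    if not changed:
        return "clean"
    plural = "s" if len(changed) > 1 else ""
    return f"dirty ({len(changed)} file{plural} modified)"

def log_git(log_param):
    """Log branch, commit, and status via a callback such as mlflow.log_param."""
    branch = _git(["rev-parse", "--abbrev-ref", "HEAD"])
    commit = _git(["rev-parse", "--short", "HEAD"])
    porcelain = _git(["status", "--porcelain"])
    if branch is not None:
        log_param("branch", branch)
    if commit is not None:
        log_param("commit", commit)
    if porcelain is not None:
        log_param("status", summarize_status(porcelain))
```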
&lt;p>&lt;strong>Example Output:&lt;/strong>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">commit: a7c3e9d
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">branch: main
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">status: dirty (1 file modified)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># and git diff output
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h5 id="log_python-tracks-the-python-environment">&lt;code>log_python()&lt;/code>— Tracks the Python Environment&lt;/h5>
&lt;ul>
&lt;li>
&lt;p>Platform information + Python environment info (version)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Exports a full pip freeze list to a .txt file&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Saved as an MLflow artifact to guarantee exact package version reproducibility&lt;/p>
&lt;/li>
&lt;/ul>
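&lt;p>This step can be sketched with the standard library alone (names and file layout are illustrative; the actual utility hands the file to &lt;code>mlflow.log_artifact&lt;/code>):&lt;/p>

```python
import platform
from importlib import metadata

def platform_info():
    """Platform + Python interpreter details for the run record."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }

def freeze():
    """A pip-freeze-style listing built from installed distributions."""
    lines = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    )
    return "\n".join(lines)

def log_python(out_path="requirements_frozen.txt"):
    """Write the environment snapshot to a text file; the file can then
    be attached to the run (e.g. via mlflow.log_artifact(out_path))."""
    with open(out_path, "w") as f:
        for key, value in platform_info().items():
            f.write(f"# {key}: {value}\n")
        f.write(freeze() + "\n")
    return out_path
```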
&lt;p>Example Output (pip freeze extract):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">numpy==1.26.4
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pandas==2.2.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">scikit-learn==1.4.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">torch==2.2.0
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h5 id="log_gpu---records-gpu-information">&lt;code>log_gpu()&lt;/code> - Records GPU Information&lt;/h5>
&lt;ul>
&lt;li>
&lt;p>Detects available GPU devices&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Collects details using NVIDIA’s pynvml or AMD’s ROCm tools&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Logs GPU name, driver version, and CUDA/ROCm version&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Captures the matching &lt;code>nvidia-smi&lt;/code> or &lt;code>rocm-smi&lt;/code> output for deeper inspection&lt;/p>
&lt;/li>
&lt;/ul>
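&lt;p>On the NVIDIA side, the detection step can be sketched with pynvml (a hedged illustration; the actual utility also covers the AMD/ROCm path):&lt;/p>

```python
def gpu_info():
    """Best-effort NVIDIA GPU inventory via pynvml.

    Returns an empty list when pynvml or an NVIDIA driver is unavailable,
    so the logger degrades gracefully on CPU-only or AMD nodes.
    """
    try:
        import pynvml
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []
    try:
        driver = pynvml.nvmlSystemGetDriverVersion()
        if isinstance(driver, bytes):  # older pynvml releases return bytes
            driver = driver.decode()
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):
                name = name.decode()
            gpus.append({"index": i, "name": name, "driver": driver})
        return gpus
    except pynvml.NVMLError:
        return []
    finally:
        pynvml.nvmlShutdown()
```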
&lt;hr>
&lt;p>These utilities ensure that each run can be traced back with:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The exact code version&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The full Python environment&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The hardware details used&lt;/p>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h3 id="initial-customizable-template">Initial customizable template&lt;/h3>
&lt;p>We’ve prototyped an initial customizable template using Cookiecutter. It provides an interactive CLI where users supply key project details (e.g., project name, frameworks, GPU type, and integrations if any). Cookiecutter then generates a ready-to-use project structure with pre-configured integrations, reducing manual setup and ensuring consistency across environments.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="template generator"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/generator.gif"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The user will have notebooks to communicate with Chameleon testbed resources, a containerized environment, and custom training scripts to plug their code into.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="emelents" srcset="
/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_d71a4a6bed166f1ba25e0480abe6d891.webp 400w,
/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_0451200eb97ac154443b7261da58399a.webp 760w,
/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_d71a4a6bed166f1ba25e0480abe6d891.webp"
width="760"
height="262"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="whats-next">What’s Next&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Template Generation via Config + interactive widgets&lt;/strong>&lt;br>
We are exploring different ways to generate experiment templates using configuration files and interactive widgets in Jupyter notebooks. This would let users quickly customize logging setups and is considered more user-friendly.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>AMD-Compatible Images&lt;/strong>&lt;br>
Extend support by building and testing Docker images optimized for AMD GPUs. Up to now, our development efforts have focused on NVIDIA GPUs using CUDA-based images.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>End-to-End Lifecycle Example&lt;/strong>&lt;br>
Provide a larger example demonstrating the entire ML workflow:&lt;/p>
&lt;ul>
&lt;li>Data preparation&lt;/li>
&lt;li>Training with GPU logging&lt;/li>
&lt;li>Tracking metrics, artifacts, and environment info in MLflow&lt;/li>
&lt;li>Model evaluation and logging&lt;/li>
&lt;li>Reproducing results on different hardware backends&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>Working on this project so far has been both challenging and eye-opening. I’ve seen how many moving parts need to come together for a smooth workflow. The support from my mentors has been key in helping me turn challenges into real progress.&lt;/p>
&lt;p>Thank you for following along — I’m looking forward to sharing more concrete results soon.&lt;/p></description></item><item><title>Applying MLOps to overcome reproducibility barriers in machine learning research</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/</link><pubDate>Sun, 22 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/</guid><description>&lt;h3 id="about-the-project">About the Project&lt;/h3>
&lt;p>Hello! I&amp;rsquo;m Ahmed, an undergraduate Computer Science student at the University of Khartoum. I&amp;rsquo;m working on making machine learning research more reproducible for open-access research facilities like the &lt;a href="chameleoncloud.org">Chameleon testbed&lt;/a>, under the project &lt;a href="https://ucsc-ospo.github.io/project/osre25/nyu/mlops/" target="_blank" rel="noopener">Applying MLOps to overcome reproducibility barriers in machine learning research&lt;/a>, mentored by Prof. &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a>. As part of this project, my &lt;a href="https://docs.google.com/document/d/146PutdVy7cWSf_Gn8qcn0Ba2llMHjNtHIQzZ5a-xRvQ/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> aims to build a template generator that generates repositories for reproducible model training on the Chameleon testbed.&lt;/p>
&lt;h3 id="reproducibility">Reproducibility&lt;/h3>
&lt;blockquote>
&lt;p>&lt;em>We argue that unless reproducing research becomes as vital and mainstream part of scientific exploration as reading papers is today, reproducibility will be hard to sustain in the long term because the incentives to make research results reproducible won’t outweigh the still considerable costs&lt;/em>&lt;/p>
&lt;p>— &lt;a href="https://www.chameleoncloud.org/media/filer_public/25/18/25189b96-c3a2-4a55-b99b-c25322fe6682/reproducibility_on_chameleon-3.pdf" target="_blank" rel="noopener">Three Pillars of Practical Reproducibility Paper&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Acadamic code quality" srcset="
/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_1883a01619446991471adb625dc1a04c.webp 400w,
/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_a0629a8267968adb7dca83065a454987.webp 760w,
/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_1883a01619446991471adb625dc1a04c.webp"
width="733"
height="646"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>By reproducibility in science we refer to the ability to obtain consistent results using the same methods and conditions as a previous study. In simple words: if I use the same data and methodology that were used before, I should obtain the same results. This principle applies to almost every scientific field, including both machine learning applied to science and core machine learning research.&lt;/p>
&lt;h3 id="challenges-in-reproducibility">Challenges in Reproducibility&lt;/h3>
&lt;p>Just as the famous paper on the &lt;a href="https://www.nature.com/articles/d41586-019-00067-3" target="_blank" rel="noopener">reproducibility crisis in science&lt;/a> was published in 2016, similar discussions have appeared in the machine learning research setting. The paper &lt;a href="https://ojs.aaai.org/index.php/AAAI/article/view/11503" target="_blank" rel="noopener">State of the Art: Reproducibility in Artificial Intelligence&lt;/a>, after analyzing 400 papers from top AI conferences, found that only around 6% shared code and approximately 33% shared test data; in contrast, 54% shared only pseudocode (a summary of the algorithm).&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Percentage of papers documenting each variable for the three factors" srcset="
/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_98f72f91d5f4040ac93d46a70ece1f4c.webp 400w,
/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_af62a4672817798441065a29b632ce1d.webp 760w,
/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_98f72f91d5f4040ac93d46a70ece1f4c.webp"
width="760"
height="312"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The lack of software dependency management, proper version control, log tracking, and effective artifact sharing has made it very difficult to reproduce machine learning research.&lt;/p>
&lt;p>Reproducibility in machine learning is largely supported by MLOps practices. This is the case in industry, where most researchers are backed by software engineers who set up experimental environments or develop tools that streamline the workflow. In academic settings, however, reproducibility remains a great challenge: researchers prefer to focus on coding and worry little about the complexities involved in configuring their experimental environment. As a result, the adoption and standardization of MLOps practices in academia progress slowly. The best way to ensure a seamless experience with MLOps is to make these capabilities easily accessible within the researchers&amp;rsquo; workflow, by developing a tool that streamlines provisioning resources, environment setup, model training, and artifact tracking, and that ensures reproducible results.&lt;/p>
&lt;h3 id="proposed-solution">Proposed Solution&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Solution Architecture" srcset="
/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_eca83abc0b11e0d295efffaa464eaf53.webp 400w,
/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_4abd128ad260ffc60e4a7ebd623e4e32.webp 760w,
/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_eca83abc0b11e0d295efffaa464eaf53.webp"
width="760"
height="547"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>We want researchers to spin up ML research instances/bare metal on the Chameleon testbed while keeping the technical complexity of configuring and stitching everything together abstracted away. Users simply answer a few questions about their project info, frameworks, tools, features, and integrations (if any), and receive a fully generated, reproducible project. It contains a provisioning/infrastructure configuration layer for provisioning resources on the cloud; a Dockerfile to spin up services and persistent storage for data; and an ML tracking server system that logs artifacts, metadata, environment configuration, system specifications (GPU type), and Git status using MLflow, powered by PostgreSQL for storing metadata and a MinIO S3 bucket for storing artifacts. At its core, the ML code runs in a containerized training environment backed by persistent storage for the datasets and the artifacts generated from experiments; containerizing all of these ensures reproducibility. We aim to make the cloud experience easier by handling the configuration needed to set up the environment with a third-party framework, so that benchmark datasets and other necessary components from services like Hugging Face and GitHub are easily accessible from the container. For more technical details about the solution, you can read my proposal &lt;a href="https://docs.google.com/document/d/1ilm-yMEq-UTiJPGMl8tQc3Anl5cKM5RD2sUGInLjLbU" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
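&lt;p>The tracking stack described above could be wired together with a Compose file along these lines (a sketch with illustrative service names and throwaway credentials; the stock MLflow image may additionally need the psycopg2 driver installed):&lt;/p>

```yaml
# docker-compose.yml — illustrative sketch
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: mlflow
      POSTGRES_DB: mlflow
    volumes:
      - pgdata:/var/lib/postgresql/data

  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minio
      MINIO_ROOT_PASSWORD: minio123
    volumes:
      - miniodata:/data

  mlflow:
    image: ghcr.io/mlflow/mlflow
    command: mlflow server --host 0.0.0.0
      --backend-store-uri postgresql://mlflow:mlflow@postgres/mlflow
      --artifacts-destination s3://mlflow-artifacts
    environment:
      MLFLOW_S3_ENDPOINT_URL: http://minio:9000
      AWS_ACCESS_KEY_ID: minio
      AWS_SECRET_ACCESS_KEY: minio123
    ports:
      - "5000:5000"
    depends_on: [postgres, minio]

volumes:
  pgdata:
  miniodata:
```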
&lt;p>By addressing these challenges we can accelerate scientific discovery. This benefits not only those conducting the research but also the ones building on top of it in the future. I look forward to sharing more updates as the project progresses, and I welcome feedback from others interested in advancing reproducibility in ML research.&lt;/p></description></item></channel></rss>