cc-snapshot | UCSC OSPO

Final Blog: Improving Usability and Performance in cc-snapshot

Sun, 24 Aug 2025 00:00:00 +0000

My name is Zahra Temori, and I’m thrilled to be collaborating with my mentor, Paul Marshall, this summer on the cc-snapshot project.

Introduction

Reproducibility is an important concept in high performance computing and research. It ensures that experiments can be repeated, validated, and extended with confidence. Achieving a reproducible environment requires identical software stacks, with exactly the same dependencies and configuration. The Chameleon Cloud testbed provides the cc-snapshot tool to support reproducibility by capturing the complete state of a running system. This allows researchers to rerun experiments exactly as before, share setups with one another, and avoid environment issues such as missing dependencies or version mismatches. In this work, we explore how to enhance snapshotting as a reproducibility method and make it an effective strategy for HPC research.

Key Achievements

The project was divided into two phases. The first phase focused on usability: reorganizing the tool and expanding its capabilities. The second phase focused on benchmarking alternative image formats and compression methods to improve snapshotting performance.

Usability Enhancements: The original snapshotting tool had a limited command-line interface, tightly coupled logic, and minimal testing support, which made it difficult for users to interact with and for developers to maintain. To enhance the command-line interface, we added a flag to disable automatic updates, giving users more control over when to pull the latest version. We also added a dry-run flag that simulates actions before a snapshot is taken, allowing developers to test changes safely. Moreover, we implemented support for a custom source path, enabling snapshots of specific directories. This lets developers test against small directories rather than full snapshots, which are more cumbersome when testing individual features. To improve maintainability, we refactored the codebase into five modular functions, making future changes easier. In addition, we added automated tests with GitHub Actions to validate new and existing features and ensure that changes work as expected.
Performance Optimization: By default, snapshots were created in the QCOW2 format with zlib compression, which often resulted in long snapshot creation times. To address this performance issue, we benchmarked alternatives: QCOW2 with zstd compression and RAW with no compression. We ran the benchmarks on three images of varying sizes: small (4.47 GiB), medium (7.62 GiB), and large (12.7 GiB). The medium-sized image was user-created, to demonstrate that snapshotting and compression work for both Chameleon-supported images and user-created images.
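The three configurations compared above can be expressed as `qemu-img convert` invocations. The sketch below only builds the argument lists; it does not run them. It assumes a QEMU version whose qcow2 driver supports the `compression_type` option, and the function name, configuration labels, and file paths are hypothetical. The post does not state how cc-snapshot itself invokes the conversion, so this is an illustration of the idea rather than the tool's code.

```python
from typing import List

# Hypothetical labels for the three configurations compared in the benchmark.
CONFIGS = ("qcow2-zlib", "qcow2-zstd", "raw")


def build_convert_command(source: str, dest: str, config: str) -> List[str]:
    """Return a qemu-img argument list for one benchmark configuration."""
    if config == "raw":
        # RAW output: no compression flag at all.
        return ["qemu-img", "convert", "-O", "raw", source, dest]
    if config in ("qcow2-zlib", "qcow2-zstd"):
        compression = config.split("-")[1]  # "zlib" or "zstd"
        # -c turns compression on; compression_type selects the algorithm.
        return [
            "qemu-img", "convert", "-c", "-O", "qcow2",
            "-o", f"compression_type={compression}",
            source, dest,
        ]
    raise ValueError(f"unknown configuration: {config}")


if __name__ == "__main__":
    # Placeholder paths, used only to show the generated commands.
    for name in CONFIGS:
        print(" ".join(build_convert_command("disk.img", f"snapshot.{name}", name)))
```

Keeping command construction separate from execution makes it possible to check each configuration without touching a real disk image.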

Results: We ran each image with each compression method and recorded four key metrics: creation time, upload time, boot time, and final image size. To evaluate which method performed better, we calculated the overall time for each compression method across the experiments on the three image sizes. The results showed that zstd compression reduced creation time by about 80.6% across the three image sizes. Upload time for zstd was nearly equal to that of zlib, while RAW images, being uncompressed and therefore larger, uploaded much more slowly than images compressed with zlib or zstd. Boot time was nearly the same for zlib and zstd images, indicating that the two take about the same time to decompress, while RAW images took longer to boot because of their larger size. Our results suggest that QCOW2 with zstd compression should be used instead of QCOW2 with zlib compression when creating a snapshot. This enables researchers to generate and share reproducible environments faster.
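As a rough illustration of how an overall reduction figure like the one above can be computed, the sketch below sums the creation times over the three image sizes for each method and compares the totals. This aggregation is one plausible reading of "overall time" and is an assumption, as is the function name. The timings in the example are placeholders chosen so the arithmetic is easy to check; they are not the benchmark measurements.

```python
from typing import Sequence


def overall_reduction(baseline_times: Sequence[float],
                      candidate_times: Sequence[float]) -> float:
    """Percent reduction in total time of the candidate relative to the baseline."""
    baseline_total = sum(baseline_times)
    candidate_total = sum(candidate_times)
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive")
    return (baseline_total - candidate_total) / baseline_total * 100.0


if __name__ == "__main__":
    # Placeholder timings in seconds (small, medium, large); NOT real results.
    zlib_times = [100.0, 200.0, 300.0]
    zstd_times = [25.0, 50.0, 75.0]
    print(overall_reduction(zlib_times, zstd_times))  # 75.0
```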

Conclusion and Future Work

Snapshotting is a practical way to support reproducibility in HPC, but to be effective, it should be easy to use and fast enough for real research workflows. Our results show that zstd compression can cut snapshot creation time by over 80% compared to the common default, zlib compression, without hurting upload or boot performance. Looking ahead, we plan to integrate zstd, try it on more workloads and image types, and explore further ways to make snapshotting faster and more reliable.

Deliverables

Repository: The full analysis code and source code can be found in the CC-SNAPSHOT GitHub Repository.

Assessing and Enhancing CC-Snapshot for Reproducible Experiment Environments

Sun, 15 Jun 2025 00:00:00 +0000

Hello, my name is Zahra Temori. I am a rising senior in Computer Science at the University of Delaware. I’m excited to be working with the Summer of Reproducibility and the Chameleon Cloud community. My project, cc-snapshot, focuses on enhancing features that help researchers capture and share reproducible experimental environments within the Chameleon Cloud testbed.

Detailed information about my project and my plans for the summer can be found in my proposal.

June 10 – June 14, 2025

Getting started with the first milestone and beginning to explore the Chameleon Cloud and the project:

I began familiarizing myself with the Chameleon Cloud platform. I created an account and successfully accessed a project.
I learned how to launch an instance and create a lease for using computing resources.
I met with my mentor to discuss the project goals and outline the next steps.
I experimented with the environment and captured a snapshot to understand the process.

It has been less than a week, and I have already learned a lot, especially about Chameleon Cloud and how it differs from other clouds like AWS. I am excited to learn more and make progress.

Thanks for reading! I will keep you updated as I work :)

Improving Usability and Performance in cc-snapshot: My Midterm Update

Wed, 24 Jul 2024 00:00:00 +0000

Hi! I’m Zahra Temori, a rising junior studying Computer Science at the University of Delaware. This summer, I’ve had the exciting opportunity to participate in the Chameleon Summer Reproducibility Program, where I’ve been working under the mentorship of Paul Marshall. In this blog post, I’d love to share a midterm update on my project cc-snapshot and highlight what I’ve accomplished so far, what I’ve learned, and what’s coming next. It’s been a challenging but rewarding experience diving into real-world research and contributing to tools that help make science more reproducible!

Project Overview

CC-Snapshot is a powerful tool on the Chameleon testbed that enables users to package their customized environments for reproducibility and experiment replication. In research, reproducibility is essential. It allows scientists to run experiments consistently, share complete setups with others, and avoid environment-related errors. However, the current snapshotting mechanism has limitations that make it unreliable and inefficient, particularly in terms of usability and performance. These issues can slow down workflows and create barriers for users trying to reproduce results. Our goal is to improve both the usability and performance of the cc-snapshot tool. A more user-friendly and optimized system means that users can create and restore snapshots more quickly and easily, without needing to manually rebuild environments, ultimately saving time and improving reliability in scientific computing.

Progress So Far

To structure the work, we divided the project into two main phases:

Improving usability, and
Optimizing performance.

I’ve nearly completed the first phase and have just started working on the second.

Phase One – Usability Improvements

The original version of the cc-snapshot tool had several usability challenges that made it difficult for users to interact with and for developers to maintain. These issues included a rigid interface, lack of flexibility, and limited testing support, all of which made the tool harder to use and extend. To address these, I worked on the following improvements:

Problem: The command-line interface was limited and inflexible. Users couldn’t easily control features or customize behavior, which limited their ability to create snapshots in different scenarios.

Solution: I enhanced the CLI by adding:

A flag to disable automatic updates, giving users more control.
A --dry-run flag to simulate actions before actually running them, which is useful for testing and safety.
Support for a custom source path, allowing snapshots of specific directories. This makes the tool much more useful for testing smaller environments.
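The three CLI additions above can be sketched as follows. Only `--dry-run` is named in this post; the `--no-update` and `--source-path` flag names, the program name, and the step descriptions are hypothetical stand-ins, and the sketch does not reflect how cc-snapshot is actually implemented. It only plans and prints steps, so running it performs no snapshot.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="snapshot-sketch",  # hypothetical name, not the real tool
        description="Illustrative CLI with update, dry-run, and source-path options.",
    )
    parser.add_argument("--no-update", action="store_true",
                        help="skip the automatic update check (hypothetical flag name)")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the planned steps instead of running them")
    parser.add_argument("--source-path", default="/",
                        help="directory to snapshot (hypothetical flag name)")
    return parser


def plan_actions(args: argparse.Namespace) -> List[str]:
    """List the steps a run would take, given the parsed options."""
    steps = []
    if not args.no_update:
        steps.append("check for a newer version of the tool")
    steps.append(f"archive {args.source_path}")
    steps.append("upload image")
    return steps


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    for step in plan_actions(args):
        # Real execution is intentionally left out of this sketch.
        label = "[dry-run] would" if args.dry_run else "[not implemented] would"
        print(f"{label} {step}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Separating the planning step from execution is what makes a dry-run mode cheap to support: the same plan can be printed or carried out.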

Problem: The code lacked automated tests. Without tests, developers have to manually verify everything, which is time-consuming and error-prone.

Solution: I implemented a basic test suite and integrated it with GitHub Actions, so the tool is automatically tested on every pull request.

Problem: The tool didn’t follow a modular design. The logic was tightly coupled, making it hard to isolate or extend parts of the code.

Solution: I refactored the code by extracting key functions. This makes the code cleaner, easier to understand, and more maintainable in the long term.

Next Steps – Phase Two: Performance Optimization

After improving the usability of the cc-snapshot tool, the next phase of the project focuses on addressing key performance bottlenecks. Currently, the snapshotting process can be slow and resource-intensive, which makes it less practical for frequent use, especially with large environments.

Problem 1: Slow Image Compression. The current implementation uses the qcow2 image format with zlib compression, which is single-threaded and often inefficient for large disk images. This leads to long snapshot creation times and high CPU usage.

Solution: I will benchmark and compare different compression strategies, specifically:

qcow2 with no compression
qcow2 with zstd compression, which is faster and multi-threaded
raw image format, which has no compression but may benefit from simpler processing

These tests will help determine which method provides the best tradeoff between speed, size, and resource usage.

Problem 2: Suboptimal Storage Backend. Snapshots are currently uploaded to Glance, which can be slow and unreliable. Uploading large images can take several minutes, and this slows down the user workflow.

Solution: I will compare Glance with a potentially faster alternative, the Object Store. Smaller, compressed images may upload significantly faster to the Object Store (e.g., 30 seconds vs. 2 minutes). By measuring upload speeds and reliability, I can recommend a better default or optional backend for users.

How I Will Measure Performance

To understand the impact of different strategies, I will try to collect detailed metrics across three stages:

Image creation: How long it takes to build the image, depending on compression and format
Image upload: How quickly the snapshot can be transferred to Glance or Object Store
Instance boot time: How fast a new instance can start from that image (compressed formats must be decompressed)

I will run multiple tests for each scenario and record performance metrics like CPU usage, memory usage, disk throughput, and total time for each step. This will help identify the most efficient and practical configuration for real-world use.
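A minimal sketch of the repeated-trial timing described above, assuming each stage (image creation, upload, boot) can be wrapped in a Python callable. The helper name and the returned keys are hypothetical, and the sketch records wall-clock time only; CPU usage, memory usage, and disk throughput would need separate tooling.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_stage(stage: Callable[[], None], trials: int = 3) -> Dict[str, float]:
    """Run a stage several times and summarize its wall-clock duration in seconds."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    durations: List[float] = []
    for _ in range(trials):
        start = time.perf_counter()
        stage()
        durations.append(time.perf_counter() - start)
    return {
        "trials": trials,
        "mean": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would call the snapshot stage here.
    summary = time_stage(lambda: sum(range(100_000)), trials=5)
    print(summary)
```

Reporting the minimum and maximum alongside the mean helps show how much run-to-run variation there is in each stage.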

Conclusion

Addressing the current usability and performance issues in cc-snapshot is essential to improving the overall user experience. By making the tool easier to use, faster, and more flexible, we can support researchers and developers who depend on reproducible computing for their work. So far, I’ve worked on enhancing the tool’s interface, adding testing support, and refactoring the codebase for better maintainability. In the next phase, I’ll be focusing on benchmarking different compression methods, image formats, and storage backends to improve speed and efficiency. These improvements will help make cc-snapshot a more powerful and user-friendly tool for the scientific community.

Stay tuned for the next update and thank you for following my journey!