<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>SoR | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/category/sor/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/category/sor/index.xml" rel="self" type="application/rss+xml"/><description>SoR</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 01 Sep 2025 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>SoR</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/category/sor/</link></image><item><title>Final Report: MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250901-rohan-babbar/</link><pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250901-rohan-babbar/</guid><description>&lt;p>Hi Everyone, This is my final report for the project I completed during my summer as a &lt;a href="https://ucsc-ospo.github.io/sor/" target="_blank" rel="noopener">Summer of Reproducibility (SOR)&lt;/a> student.
The project, titled &amp;ldquo;&lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">MPI Appliance for HPC Research in Chameleon&lt;/a>,&amp;rdquo; was undertaken in collaboration with Argonne National Laboratory
and the Chameleon Cloud community, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ken-raffenetti/">Ken Raffenetti&lt;/a>.
This blog details the work and outcomes of the project.&lt;/p>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>Message Passing Interface (MPI) is the backbone of high-performance computing (HPC), enabling efficient scaling across thousands of
processing cores. However, reproducing MPI-based experiments remains challenging due to dependencies on specific library versions,
network configurations, and multi-node setups.&lt;/p>
&lt;p>To address this, we introduce a reproducibility initiative that provides standardized MPI environments on the Chameleon testbed.
This is set up as a master–worker MPI cluster. The master node manages tasks and communication, while the worker nodes perform the computations.
All nodes have the same MPI libraries, software, and network settings, making experiments easier to scale and reproduce.&lt;/p>
&lt;h2 id="objectives">Objectives&lt;/h2>
&lt;p>The aim of this project was to create an MPI cluster that is reproducible, easily deployable, and efficiently configurable.&lt;/p>
&lt;p>The key objectives of this project were:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Pre-built MPI Images: Create ready-to-use images with MPI and all dependencies installed.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Automated Cluster Configuration: Develop Ansible playbooks to configure master–worker communication, including host setup, SSH key distribution, and MPI configuration across nodes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Cluster Orchestration: Develop an orchestration template to provision resources and invoke the Ansible playbooks for automated cluster setup.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="implementation-strategy-and-deliverables">Implementation Strategy and Deliverables&lt;/h2>
&lt;h3 id="openstack-image-creation">Openstack Image Creation&lt;/h3>
&lt;p>The first step was to create a standardized pre-built image, which serves as the base image for all nodes in the cluster.&lt;/p>
&lt;p>Some important features of the image include:&lt;/p>
&lt;ol>
&lt;li>Built on Ubuntu 22.04 for a stable base environment.&lt;/li>
&lt;li>&lt;a href="https://spack.io/" target="_blank" rel="noopener">Spack&lt;/a> + Lmod integration:
&lt;ul>
&lt;li>Spack handles reproducible, version-controlled installations of software packages.&lt;/li>
&lt;li>Lmod (Lua Modules) provides a user-friendly way to load/unload software environments dynamically.&lt;/li>
&lt;li>Together, they allow users to easily switch between MPI versions, libraries, and GPU toolkits.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://github.com/pmodels/mpich" target="_blank" rel="noopener">MPICH&lt;/a> and &lt;a href="https://github.com/open-mpi/ompi" target="_blank" rel="noopener">OpenMPI&lt;/a> pre-installed for standard MPI support and can be loaded/unloaded.&lt;/li>
&lt;li>Three image variants for various HPC workloads: CPU-only, NVIDIA GPU (CUDA 12.8), and AMD GPU (ROCm 6.4.2).&lt;/li>
&lt;/ol>
&lt;p>These images have been published and are available in the Chameleon Cloud Appliance Catalog:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://chameleoncloud.org/appliances/127/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04)&lt;/a> - CPU Only&lt;/li>
&lt;li>&lt;a href="https://chameleoncloud.org/appliances/130/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - CUDA)&lt;/a> - NVIDIA GPU (CUDA 12.8)&lt;/li>
&lt;li>&lt;a href="https://chameleoncloud.org/appliances/131/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - ROCm)&lt;/a> - AMD GPU (ROCm 6.4.2)&lt;/li>
&lt;/ul>
&lt;h3 id="cluster-configuration-using-ansible">Cluster Configuration using Ansible&lt;/h3>
&lt;p>The next step was to create scripts/playbooks to configure these nodes and set up an HPC cluster.
We assigned specific roles to different nodes in the cluster and combined them into a single playbook to configure the entire cluster automatically.&lt;/p>
&lt;p>Some key steps the playbook performs:&lt;/p>
&lt;ol>
&lt;li>Configure /etc/hosts entries for all nodes.&lt;/li>
&lt;li>Mount Manila NFS shares on each node.&lt;/li>
&lt;li>Generate an SSH key pair on the master node and add the master’s public key to the workers’ authorized_keys.&lt;/li>
&lt;li>Scan worker node keys and update known_hosts on the master.&lt;/li>
&lt;li>(Optional) Manage software:
&lt;ul>
&lt;li>Install new compilers with Spack&lt;/li>
&lt;li>Add new Spack packages&lt;/li>
&lt;li>Update environment modules to recognize them&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Create a hostfile at /etc/mpi/hostfile.&lt;/li>
&lt;/ol>
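&lt;p>To give a flavor of the automation, here is a minimal, hypothetical playbook excerpt covering the /etc/hosts, key-generation, and key-distribution steps. Group names, users, and variables are invented for illustration and do not match the repository exactly:&lt;/p>

```yaml
# Hypothetical excerpt of the cluster-configuration playbook.
- hosts: all
  become: true
  tasks:
    - name: Add every cluster node to /etc/hosts
      ansible.builtin.lineinfile:
        path: /etc/hosts
        line: "{{ hostvars[item]['ansible_host'] }} {{ item }}"
      loop: "{{ groups['all'] }}"

- hosts: master
  tasks:
    - name: Generate an SSH key pair on the master node
      community.crypto.openssh_keypair:
        path: ~/.ssh/id_rsa
      register: master_key

- hosts: workers
  tasks:
    - name: Authorize the master's public key on each worker
      ansible.posix.authorized_key:
        user: "{{ ansible_user }}"
        key: "{{ hostvars[groups['master'][0]]['master_key']['public_key'] }}"
```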
&lt;p>The code is publicly available and can be found on the GitHub repository: &lt;a href="https://github.com/rohanbabbar04/MPI-Spack-Experiment-Artifact" target="_blank" rel="noopener">https://github.com/rohanbabbar04/MPI-Spack-Experiment-Artifact&lt;/a>&lt;/p>
&lt;h3 id="orchestration">Orchestration&lt;/h3>
&lt;p>With the image now created and deployed, and the Ansible scripts ready for cluster configuration, we put everything
together to orchestrate the cluster deployment.&lt;/p>
&lt;p>This can be done in two primary ways:&lt;/p>
&lt;h4 id="python-chijupyter--ansible">Python CHI(Jupyter) + Ansible&lt;/h4>
&lt;p>&lt;a href="https://github.com/ChameleonCloud/python-chi" target="_blank" rel="noopener">Python-CHI&lt;/a> is a python library designed to facilitate interaction with the Chameleon testbed. Often used within environments like Jupyter notebooks.&lt;/p>
&lt;p>This setup can be put up as:&lt;/p>
&lt;ol>
&lt;li>Create leases, launch instances, and set up shared storage using python-chi commands.&lt;/li>
&lt;li>Automatically generate inventory.ini for Ansible based on launched instances.&lt;/li>
&lt;li>Run Ansible playbook programmatically using &lt;code>ansible_runner&lt;/code>.&lt;/li>
&lt;li>Outcome: fully configured, ready-to-use HPC cluster; SSH into master to run examples.&lt;/li>
&lt;/ol>
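&lt;p>Steps 2 and 3 can be sketched in a few lines of Python. Everything here is illustrative (the notebook in the Trovi artifact is the authoritative version); the default user &lt;code>cc&lt;/code> is the one Chameleon images typically use:&lt;/p>

```python
# Sketch of steps 2-3: build an Ansible inventory from launched instances,
# then run the playbook with ansible_runner. All names are illustrative.

def build_inventory(master_ip, worker_ips, user="cc"):
    """Render an inventory.ini with a master group and a workers group."""
    lines = ["[master]", f"{master_ip} ansible_user={user}", "", "[workers]"]
    lines += [f"{ip} ansible_user={user}" for ip in worker_ips]
    return "\n".join(lines) + "\n"

def run_playbook(inventory_path, playbook_path):
    """Invoke the playbook programmatically (requires the ansible-runner package)."""
    import ansible_runner  # imported lazily so the sketch runs without it installed
    result = ansible_runner.run(playbook=playbook_path, inventory=inventory_path)
    return result.status  # e.g. "successful" or "failed"

if __name__ == "__main__":
    print(build_inventory("192.5.87.10", ["192.5.87.11", "192.5.87.12"]))
```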
&lt;p>If you would like to see a working example, you can view it in the &lt;a href="https://chameleoncloud.org/experiment/share/7424a8dc-0688-4383-9d67-1e40ff37de17" target="_blank" rel="noopener">Trovi example&lt;/a>.&lt;/p>
&lt;h4 id="heat-orchestration-template">Heat Orchestration Template&lt;/h4>
&lt;p>A Heat Orchestration Template (HOT) is a YAML-based configuration file that defines a stack to automate
the deployment and configuration of OpenStack cloud resources.&lt;/p>
&lt;p>&lt;strong>Challenges&lt;/strong>&lt;/p>
&lt;p>We faced some challenges while working with Heat templates and stacks, particularly on Chameleon Cloud:&lt;/p>
&lt;ol>
&lt;li>&lt;code>OS::Nova::KeyPair&lt;/code> (new version): In the latest OpenStack version, the stack fails to launch if the &lt;code>public_key&lt;/code> parameter is not provided for the keypair,
as auto-generation is no longer supported.&lt;/li>
&lt;li>&lt;code>OS::Heat::SoftwareConfig&lt;/code>: Deployment scripts often fail, hang, or time out, preventing proper configuration of nodes and causing unreliable deployments.&lt;/li>
&lt;/ol>
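&lt;p>The first issue can be worked around by accepting the user's public key as a stack parameter and passing it through to the keypair resource. A minimal sketch (parameter names are illustrative):&lt;/p>

```yaml
# Sketch of the workaround for issue 1: require the user's public key
# as a template parameter instead of relying on auto-generation.
heat_template_version: 2018-08-31

parameters:
  key_name:
    type: string
  public_key:
    type: string
    description: Contents of the user's SSH public key (e.g. id_rsa.pub)

resources:
  cluster_keypair:
    type: OS::Nova::KeyPair
    properties:
      name: { get_param: key_name }
      public_key: { get_param: public_key }
```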
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Heat Approach" srcset="
/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_05fca9fb65271d31e3fd79f2e7b58a53.webp 400w,
/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_19399eb0dbf598de84852723f8d60783.webp 760w,
/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_05fca9fb65271d31e3fd79f2e7b58a53.webp"
width="760"
height="235"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To tackle these challenges, we designed an approach that is both easy to implement and reproducible. First, we launch instances
by provisioning master and worker nodes using the HOT template in OpenStack. Next, we set up a bootstrap node, install Git and Ansible,
and run an Ansible playbook from the bootstrap node to configure the master and worker nodes, including SSH, host communication, and
MPI setup. The outcome is a fully configured, ready-to-use HPC cluster, where users can simply SSH into the master node to run examples.&lt;/p>
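&lt;p>Under the same assumptions, the bootstrap node can be expressed as a server resource whose cloud-init &lt;code>user_data&lt;/code> installs Git and Ansible and then runs the playbook. The image name, flavor, and file names below are placeholders, not the exact values in the published template:&lt;/p>

```yaml
# Hypothetical bootstrap-node resource for the HOT template.
bootstrap_node:
  type: OS::Nova::Server
  properties:
    image: CC-Ubuntu22.04        # placeholder image name
    flavor: baremetal
    key_name: { get_param: key_name }
    user_data: |
      #!/bin/bash
      apt-get update
      apt-get install -y git ansible
      git clone https://github.com/rohanbabbar04/MPI-Spack-Experiment-Artifact.git
      # inventory.ini and site.yml are placeholder names
      ansible-playbook -i inventory.ini MPI-Spack-Experiment-Artifact/site.yml
```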
&lt;p>Users can view/use the template published in the Appliance Catalog: &lt;a href="https://chameleoncloud.org/appliances/132/" target="_blank" rel="noopener">MPI+Spack Bare Metal Cluster&lt;/a>.
For example, a demonstration of how to pass parameters is available on &lt;a href="https://chameleoncloud.org/experiment/share/7424a8dc-0688-4383-9d67-1e40ff37de17" target="_blank" rel="noopener">Trovi&lt;/a>.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>In conclusion, this work demonstrates a reproducible approach to building and configuring MPI clusters on the Chameleon testbed. By using standardized images,
Ansible automation, and orchestration templates, we ensure that every node is consistently set up, reducing manual effort and errors. The artifact, published on Trovi,
makes the entire process transparent, reusable, and easy to implement, enabling users/researchers to reliably recreate and extend the cluster environment for their own
experiments.&lt;/p>
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;p>Maintaining these images and possibly creating a script to reproduce MPI and Spack on a different image base environment.&lt;/p></description></item><item><title>Final Update(Mid-Term -> Final): MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250831-rohan-babbar/</link><pubDate>Sun, 31 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250831-rohan-babbar/</guid><description>&lt;p>Hi everyone! This is my final update, covering the progress made every two weeks from the midterm to the end of the
project &lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">MPI Appliance for HPC Research on Chameleon&lt;/a>, developed
in collaboration with Argonne National Laboratory and the Chameleon Cloud community.
This blog follows up on my earlier post, which you can find &lt;a href="https://ucsc-ospo.github.io/report/osre25/uchicago/mpi/20250803-rohan-babbar/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;h3 id="-july-29--august-11-2025">🔧 July 29 – August 11, 2025&lt;/h3>
&lt;p>With the CUDA- and MPI-Spack–based appliances published, we considered releasing another image variant (ROCm-based) for AMD GPUs.
It is primarily intended for CHI@TACC, which provides AMD GPUs. We have successfully published a new image on Chameleon titled &lt;a href="https://chameleoncloud.org/appliances/131/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - ROCm)&lt;/a>,
and we also added an example to demonstrate its usage.&lt;/p>
&lt;h3 id="-august-12--august-25-2025">🔧 August 12 – August 25, 2025&lt;/h3>
&lt;p>With the examples now available on Trovi for creating an MPI cluster using Ansible and Python-CHI, my next step was to experiment with stack orchestration using Heat Orchestration Templates (HOT) on OpenStack Chameleon Cloud.
This turned out to be more challenging due to a few restrictions:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>OS::Nova::KeyPair (new version)&lt;/strong>: In the latest OpenStack version, the stack fails to launch if the public_key parameter is not provided for the keypair, as auto-generation is no longer supported.&lt;/li>
&lt;li>&lt;strong>OS::Heat::SoftwareConfig&lt;/strong>: Deployment scripts often fail, hang, or time out, preventing proper configuration of nodes and causing unreliable deployments.&lt;/li>
&lt;/ol>
&lt;p>To address these issues, we adopted a new strategy for configuring and creating the MPI cluster: using a temporary bootstrap node.&lt;/p>
&lt;p>In simple terms, the workflow of the Heat template is:&lt;/p>
&lt;ol>
&lt;li>Provision master and worker nodes via the HOT template on OpenStack.&lt;/li>
&lt;li>Launch a bootstrap node, install Git and Ansible on it, and then run an Ansible playbook from the bootstrap node to configure the master and worker nodes. This includes setting up SSH, host communication, and the MPI environment.&lt;/li>
&lt;/ol>
&lt;p>This provides an alternative method for creating an MPI cluster.&lt;/p>
&lt;p>We presented this work on August 26, 2025, to the Chameleon Team and the Argonne MPICH Team. The project was very well received.&lt;/p>
&lt;p>Stay tuned for my final report on this work, which I’ll be sharing in my next blog post.&lt;/p></description></item><item><title>[Final Blog] Distrobench: Distributed Protocol Benchmark</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/</link><pubDate>Sat, 30 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>This is the final blog for our contribution to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a> for the OSRE program.&lt;/p>
&lt;p>&lt;a href="https://github.com/fadhilkurnia/distro" target="_blank" rel="noopener">Distrobench&lt;/a> is a framework to evaluate the performance of replication/coordination protocols for distributed systems. This framework standardizes benchmarking by allowing different protocols to be tested under an identical workload, and supports both local and remote deployment of the protocols. The frameworks tested are restricted under a key-value store application and are categorized under different &lt;a href="https://jepsen.io/consistency/models" target="_blank" rel="noopener">consistency models&lt;/a>, programming languages, and persistency (whether the framework stores its data in-memory or on-disk).&lt;/p>
&lt;p>All the benchmark results are stored in a &lt;code>data.json&lt;/code> file which can be viewed through a webpage we have provided. A user can clone the Git repository, benchmark different protocols on their own machine or on a cluster of remote machines, then view the results locally. We also provide a &lt;a href="https://distrobench.org" target="_blank" rel="noopener">webpage&lt;/a> showing our own benchmark results, which ran on three Amazon EC2 t2.micro instances (a fourth instance acted as the YCSB client).&lt;br>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_2eb41220c4287bdc730b38c76a5643f8.webp 400w,
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_789a9a55850eed73f3a681f8423873cf.webp 760w,
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_2eb41220c4287bdc730b38c76a5643f8.webp"
width="760"
height="381"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="how-to-run-a-benchmark-on-distrobench">How to run a benchmark on Distrobench&lt;/h2>
&lt;p>Before running a benchmark with Distrobench, the protocol to be benchmarked must first be built. This allows the script to initialize the protocol instance for a local benchmark, or to send the binaries to the remote machines. The remote machine running the protocol does not need to store the code for the protocol implementations, but it does require the dependencies for running that specific protocol, such as Java, Docker, or rsync. The following commands build the &lt;a href="https://github.com/ailidani/paxi" target="_blank" rel="noopener">ailidani/paxi&lt;/a> project, which needs no additional dependencies to run on a remote machine:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sh" data-lang="sh">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Clone the Distrobench repository &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git clone git@github.com:fadhilkurnia/distro.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Clone the Paxi repository and build the binary &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> distro/sut/ailidani.paxi
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git clone git@github.com:ailidani/paxi.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> paxi/bin/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">./build.sh
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Go back to the Distrobench root directory &amp;amp; run python script &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ../../../..
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">python main.py
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>By default, the script will start 3 local instances of a Paxi protocol implementation that the user chooses through the CLI. The user can modify the number of running instances, and whether they are deployed locally or on remote machines, by changing the contents of the &lt;code>.env&lt;/code> file inside the root directory. The following is the content of the default &lt;code>.env&lt;/code> file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">NUM_OF_NODES=3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">SSH_KEY=ssh-key.pem
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">REMOTE_USERNAME=ubuntu
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP1=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP2=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP3=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP1=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP2=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP3=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">CLIENT_IP=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">OUTPUT=data.json
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When running a remote benchmark, an SSH key should also be added to the root directory to allow the use of ssh and rsync from within the Python script. All machines must also allow TCP connections on ports 2000-2300 and 3000-3300, the port ranges used for communication between the running instances and for the YCSB benchmark. Running the benchmark requires at least 3 nodes, the minimum needed to support most protocols (5 nodes are recommended).&lt;/p>
&lt;p>To view the benchmark result in the web page locally, move &lt;code>data.json&lt;/code> into the &lt;code>docs/&lt;/code> directory and run &lt;code>python -m http.server 8000&lt;/code>. The page is then accessible through &lt;code>http://localhost:8000&lt;/code>.&lt;/p>
&lt;h2 id="deep-dive-on-how-distrobench-works">Deep dive on how Distrobench works&lt;/h2>
&lt;p>The following is the project structure of the Distrobench repository:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">distro/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── main.py // Main python script for running benchmark
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── data.json // Output file for main.py
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── README.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── .env // Config for running the benchmark
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── docs/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── index.html // Web page to show benchmark results
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── data.json // Output file displayed by web page
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── README.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── src/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── utils/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ └── ycsb/ // Submodule for YCSB
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">└── sut/ // Systems under test
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── ailidani.paxi/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> └── run.py // Protocol-specific benchmark script called by main.py
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── apache.zookeeper/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── etcd-io.etcd/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── fadhilkurnia.xdn/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── holipaxos-artifect.holipaxos/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── otoolep.hraftd/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> └── tikv.tikv/
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>main.py&lt;/code> will automatically detect directories inside &lt;code>sut/&lt;/code> and will call the main function inside &lt;code>run.py&lt;/code>. The following is the structure of &lt;code>run.py&lt;/code> written in pseudocode style:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">FUNCTION main(run_ycsb: Function, nodes: List of Nodes, ssh: Dictionary)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> node_data = map_ip_port(nodes)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> SWITCH user\_input
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 0:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> start()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 1:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> stop()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 2:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> client_data = []
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> FOR EACH item IN node_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ADD item.client_addr TO client_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> END FOR
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> run_ycsb(client_data)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> END SWITCH
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION start()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Start the protocol instance (local or remote)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION stop()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Stop the protocol instance (local or remote)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION map_ip_port(nodes: List of Nodes) -&amp;gt; List of Dictionary
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Generate port numbers based on the protocol requirements
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>.env&lt;/code> file provides both public and private IP addresses for versatility when running a remote benchmark. The private IP is used for communication between remote machines when they are in the same network group. For our own benchmark, four t2.micro EC2 instances were deployed in the same network group: three of them ran the protocol, and the fourth acted as the YCSB client. It is also possible to use your local machine as the YCSB client instead of another remote machine by setting &lt;code>CLIENT_IP&lt;/code> in the &lt;code>.env&lt;/code> file to &lt;code>127.0.0.1&lt;/code>. We chose a remote machine as the YCSB client to keep the impact of network latency between the client and the protocol servers to a minimum.&lt;/p>
&lt;p>The main tasks of the &lt;code>start()&lt;/code> function can be broken down into the following:&lt;/p>
&lt;ol>
&lt;li>Generate custom configuration files for each remote machine instance (this may differ between implementations: some do not require a config file because they support flag parameters out of the box, while others require multiple configuration files per instance)&lt;/li>
&lt;li>rsync the binaries to the remote machines (if running a remote benchmark)&lt;/li>
&lt;li>Start the instances&lt;/li>
&lt;/ol>
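&lt;p>As a rough Python sketch of these tasks (the flags, paths, and launch logic are illustrative, not the exact code in &lt;code>run.py&lt;/code>; the default user matches the &lt;code>.env&lt;/code> file shown earlier):&lt;/p>

```python
import subprocess

def rsync_cmd(binary, node_ip, user="ubuntu", ssh_key="ssh-key.pem"):
    """Build the rsync command that copies a built binary to a remote node."""
    return [
        "rsync", "-az", "-e", f"ssh -i {ssh_key}",
        binary, f"{user}@{node_ip}:~/",
    ]

def start(nodes, binary="paxi/bin/server", remote=True):
    """Copy binaries to each node (remote runs only), then launch the instances."""
    for node in nodes:
        if remote:
            subprocess.run(rsync_cmd(binary, node["public_ip"]), check=True)
        # Launching the protocol instance (over SSH, or locally when remote=False)
        # is protocol-specific and omitted from this sketch.
```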
&lt;p>The &lt;code>stop()&lt;/code> function is a lot simpler since it only kills the process running the protocol and optionally removes the copied binary files in the remote machine. The &lt;code>run_ycsb()&lt;/code> function passed onto &lt;code>run.py&lt;/code> is defined in &lt;code>main.py&lt;/code> and currently supports two types of workload:&lt;/p>
&lt;ol>
&lt;li>Read-heavy: A single-client workload with 95% read and 5% update (write) operations&lt;/li>
&lt;li>Update-heavy: A single-client workload with 50% read and 50% update (write) operations&lt;/li>
&lt;/ol>
&lt;p>A new workload can be added inside the &lt;code>src/ycsb/workloads&lt;/code> directory. Both workloads above run only 1000 operations, which may not be enough to properly evaluate the performance of the protocols. It should also be noted that while YCSB supports a &lt;code>scan&lt;/code> operation, we never use it in our benchmark because none of our tested protocols implement it.&lt;/p>
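&lt;p>For reference, a YCSB workload is just a small properties file; the read-heavy workload described above corresponds roughly to the following (the 1000-operation count comes from the text, the record count is illustrative):&lt;/p>

```properties
# Read-heavy workload: 95% reads, 5% updates, 1000 operations.
workload=site.ycsb.workloads.CoreWorkload
recordcount=1000
operationcount=1000
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
requestdistribution=zipfian
```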
&lt;h3 id="how-to-implement-a-new-protocol-in-distrobench">How to implement a new protocol in Distrobench&lt;/h3>
&lt;p>Adding a new protocol to Distrobench requires implementing two main components: a Python integration script (&lt;code>run.py&lt;/code>) and a YCSB database binding for benchmarking.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Create the protocol directory structure&lt;/p>
&lt;ul>
&lt;li>Create a new directory under &lt;code>sut/&lt;/code> using the format &lt;code>yourrepo.yourprotocol/&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Write &lt;code>run.py&lt;/code> integration&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Put the script inside the &lt;code>yourrepo.yourprotocol/&lt;/code> directory&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Must have the &lt;code>main(run_ycsb, nodes, ssh)&lt;/code> function.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add start/stop/benchmark menu options&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Handle local (127.0.0.1) and remote deployment&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Create YCSB client&lt;/p>
&lt;ul>
&lt;li>Make a Java class extending YCSB&amp;rsquo;s DB class&lt;/li>
&lt;li>Put inside &lt;code>src/ycsb/yourprotocol/src/main/java/site/ycsb/yourprotocol&lt;/code>&lt;/li>
&lt;li>Implement &lt;code>read()&lt;/code>, &lt;code>insert()&lt;/code>, &lt;code>update()&lt;/code>, &lt;code>delete()&lt;/code> methods&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Register your client&lt;/p>
&lt;ul>
&lt;li>Register your client to &lt;code>src/pom.xml&lt;/code>, &lt;code>src/ycsb/bin/binding.properties&lt;/code>, and &lt;code>src/ycsb/bin/ycsb&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Build and test&lt;/p>
&lt;ul>
&lt;li>Run &lt;code>cd src/ycsb &amp;amp;&amp;amp; mvn clean package&lt;/code>&lt;/li>
&lt;li>Run &lt;code>python main.py&lt;/code>&lt;/li>
&lt;li>Select your protocol and test it&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
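&lt;p>To make step 2 concrete, a minimal &lt;code>run.py&lt;/code> skeleton might look like the following. The menu indices mirror the pseudocode shown earlier; the port scheme and field names are assumptions for illustration only:&lt;/p>

```python
# Hypothetical skeleton for sut/yourrepo.yourprotocol/run.py.
# main(run_ycsb, nodes, ssh) is the entry point called by main.py.

def map_ip_port(nodes, base_port=2000):
    """Assign protocol and client ports to each node (protocol-specific)."""
    return [
        {"addr": f"{n['private_ip']}:{base_port + i}",
         "client_addr": f"{n['private_ip']}:{3000 + i}"}
        for i, n in enumerate(nodes)
    ]

def start(node_data, ssh):
    """Generate configs, rsync binaries if remote, and launch the instances."""

def stop(node_data, ssh):
    """Kill the running protocol processes."""

def main(run_ycsb, nodes, ssh, choice=None):
    node_data = map_ip_port(nodes)
    if choice is None:
        choice = int(input("0) start  1) stop  2) benchmark: "))
    if choice == 0:
        start(node_data, ssh)
    elif choice == 1:
        stop(node_data, ssh)
    elif choice == 2:
        run_ycsb([item["client_addr"] for item in node_data])
```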
&lt;h2 id="protocols-which-have-been-tested">Protocols which have been tested&lt;/h2>
&lt;p>Distrobench has tested 20 different distributed replication/coordination protocols across 7 different implementation projects.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;a href="https://github.com/ailidani/paxi" target="_blank" rel="noopener">ailidani/paxi&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability, Eventual&lt;/li>
&lt;li>Protocol : Paxos, EPaxos, SDpaxos, WPaxos, ABD, chain, VPaxos, WanKeeper, KPaxos, Paxos_groups, Dynamo, Blockchain, M2Paxos, HPaxos.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/apache/zookeeper" target="_blank" rel="noopener">apache/zookeeper&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Java&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability + Primary Integrity&lt;/li>
&lt;li>Protocol : ZooKeeper implements ZAB (ZooKeeper Atomic Broadcast)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/etcd-io/etcd" target="_blank" rel="noopener">etcd-io/etcd&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/fadhilkurnia/xdn" target="_blank" rel="noopener">fadhilkurnia/xdn&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Java, Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability, Linearizability + Primary Integrity&lt;/li>
&lt;li>Protocol : Gigapaxos&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/Zhiying12/holipaxos-artifect" target="_blank" rel="noopener">Zhiying12/holipaxos-artifect&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go, Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Holipaxos, Omnipaxos, Multipaxos&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/otoolep/hraftd" target="_blank" rel="noopener">otoolep/hraftd&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/tikv/tikv" target="_blank" rel="noopener">tikv/tikv&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;ul>
&lt;li>When benchmarking HoliPaxos, the main challenge was handling versions that rely on persistent storage with RocksDB. Since some implementations are written in Go, it was necessary to find compatible versions of RocksDB and gRocksDB (for example, RocksDB 10.5.1 works with gRocksDB 1.10.2). Another difficulty was that RocksDB is resource-intensive to compile, and we did not have sufficient CPU capacity on the remote machine to build RocksDB and run remote benchmarks.&lt;/li>
&lt;li>Some projects did not compile successfully at first and required minor modifications to run.&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion-and-future-improvements">Conclusion and future improvements&lt;/h2>
&lt;p>The current benchmark results show the performance of all the mentioned protocols in terms of throughput and benchmark runtime. The results are subject to revision because they may not reflect the best performance of each protocol due to an unoptimized deployment script. We are also planning to switch to a more powerful EC2 instance because t2.micro does not have enough resources to support RocksDB or TiKV.&lt;/p>
&lt;p>In the near future, additional features will be added to Distrobench such as:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Multi-Client Support:&lt;/strong> The YCSB client will start multiple clients which will send requests in parallel to different servers in the group.&lt;/li>
&lt;li>&lt;strong>Commit Versioning:&lt;/strong> Allows the labelling of all benchmark results with the commit hash of the protocol&amp;rsquo;s repository version. This allows comparing different versions of the same project.&lt;/li>
&lt;li>&lt;strong>Adding more Primary-Backup, Sequential, Causal, and Eventual consistency protocols:&lt;/strong> Implementations that support a consistency model other than linearizability and provide an existing key-value store application are notoriously difficult to find.&lt;/li>
&lt;li>&lt;strong>Benchmark on node failure&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Benchmark on the addition of a new node&lt;/strong>&lt;/li>
&lt;/ul></description></item><item><title>End-term Blog: StatWrap: Cross-Project Searching and Classification using Local Indexing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Heading" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_f9e5e16b2001b9950ad995b2c786abc9.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_27bc4379277ab462935158b3db96d992.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_f9e5e16b2001b9950ad995b2c786abc9.webp"
width="760"
height="392"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="introduction">&lt;strong>Introduction&lt;/strong>&lt;/h1>
&lt;p>Hello everyone!&lt;br>
I am Debangi Ghosh from India, an undergraduate student at the Indian Institute of Technology (IIT) BHU, Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/northwestern/statwrap/">StatWrap: Cross-Project Searching and Classification using Local Indexing&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1dxyBP2oMJwYDCKyIWzr465zNmm6UWtnI/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, developed under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>, focuses on developing a full-text search service within the StatWrap user interface. This involves evaluating different search libraries and implementing a classification system to distinguish between active and past projects.&lt;/p>
&lt;h1 id="about-the-project">&lt;strong>About the Project&lt;/strong>&lt;/h1>
&lt;p>As part of the project, I am working on enhancing the usability of StatWrap by enabling efficient cross-project search capabilities. The goal is to make it easier for researchers to discover relevant projects, notes, and assets across both current and archived work, using information that is either user-entered or passively collected by StatWrap.&lt;/p>
&lt;p>Given the sensitivity of the data involved, one of the key requirements is that all indexing and search operations must be performed locally. To address this, my responsibilities include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Evaluating open-source search libraries&lt;/strong> suitable for local indexing and retrieval&lt;/li>
&lt;li>&lt;strong>Building the full-text search functionality&lt;/strong> directly into the StatWrap UI to allow seamless querying across projects&lt;/li>
&lt;li>&lt;strong>Ensuring reliability&lt;/strong> through the development of unit tests and comprehensive system testing&lt;/li>
&lt;li>&lt;strong>Implementing a classification system&lt;/strong> to label projects as “Active,” “Pinned,” or “Past” within the user interface&lt;/li>
&lt;/ul>
&lt;p>This project offers a great opportunity to work at the intersection of software development, information retrieval, and user-centric design—while contributing to research reproducibility and collaboration within scientific workflows.&lt;/p>
&lt;h1 id="deliverables">&lt;strong>Deliverables&lt;/strong>&lt;/h1>
&lt;p>The project has reached the end of its scope after 12 weeks of work. Here&amp;rsquo;s a breakdown:&lt;/p>
&lt;h2 id="1-descriptive-comparison-of-open-source-libraries">&lt;strong>1. Descriptive Comparison of Open-Source Libraries&lt;/strong>&lt;/h2>
&lt;p>We compared various open-source search libraries based on evaluation criteria such as &lt;strong>indexing speed, search speed, memory usage, typo tolerance, fuzzy searching, partial matching, full-text queries, contextual search, Boolean support, exact word match, installation ease, maintenance, documentation&lt;/strong>, and &lt;strong>developer experience&lt;/strong>. We decided on the weights to assign to each feature and identified the best library to use. According to the weights we assigned,
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Evaluation" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_4b5e863d88146124b333878508147eff.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_c2220a56c480048842e8b750cc2ca56f.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_4b5e863d88146124b333878508147eff.webp"
width="760"
height="603"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>These results were obtained after tuning the hyperparameters to give the best set of results.
For large datasets, FlexSearch has the lowest memory usage, followed by MiniSearch. Because the examples we used were limited, MiniSearch showed the better memory usage results.
Along with this research and evaluation, I also reviewed the Performance Benchmark of Full-Text-Search Libraries (Stress Test), available &lt;a href="https://nextapps-de.github.io/flexsearch/" target="_blank" rel="noopener">here&lt;/a>&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Stress Test" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_407cb964e7e05c64834433b6a84182ff.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_167223f62fbaf30991601d7745fad9f5.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_407cb964e7e05c64834433b6a84182ff.webp"
width="760"
height="384"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The benchmark is measured in operations per second; higher values are better (except for the &amp;ldquo;Memory&amp;rdquo; test). The memory value refers to the amount of memory additionally allocated during search.&lt;/p>
&lt;p>FlexSearch performs queries up to 1,000,000 times faster than other libraries while also providing powerful search capabilities such as multi-field (document) search, phonetic transformations, partial matching, tag search, result highlighting, and suggestions.
Larger workloads scale through workers, which perform updates and queries on the index in parallel across dedicated, balanced threads.&lt;/p>
&lt;h2 id="2-the-search-user-interface">&lt;strong>2. The Search User Interface&lt;/strong>&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ui" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_5c88d9d2587c54c50da97d6c489519dc.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_82065ca30e98bced61362bca45765215.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_5c88d9d2587c54c50da97d6c489519dc.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ui2" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_7a3499ad0fc3cd06919fcdd17194742a.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_5840b85d48a6e608855c8e0d96b4fe49.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_7a3499ad0fc3cd06919fcdd17194742a.webp"
width="760"
height="652"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="3-complete-search-execution-pipeline">&lt;strong>3. Complete Search Execution Pipeline&lt;/strong>&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ui2" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_bd4ac2fa5efb17e2b237cf8d78278398.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_a0e8f31fdbdc656a2886def3dca3410b.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_bd4ac2fa5efb17e2b237cf8d78278398.webp"
width="513"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="4-flexsearch-features">&lt;strong>4. FlexSearch Features&lt;/strong>&lt;/h2>
&lt;h4 id="1-persistent-indexing-with-automatic-loading">1. &lt;strong>Persistent Indexing with Automatic Loading&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Index persistence&lt;/strong>: Search index automatically saves to disk and loads on startup&lt;/li>
&lt;li>&lt;strong>Fast restoration&lt;/strong>: Rebuilds FlexSearch indices from saved document store without re-scanning files&lt;/li>
&lt;li>&lt;strong>Incremental updates&lt;/strong>: Detects project changes and updates only modified content&lt;/li>
&lt;li>&lt;strong>Background processing&lt;/strong>: Index updates happen asynchronously without blocking the User Interface.&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="indexing" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_23074ee37edbb0f6abbd289ef211f756.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_993d6a1363d2cddf66632c4102acb8f5.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_23074ee37edbb0f6abbd289ef211f756.webp"
width="494"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h4 id="2-multi-document-type-support">2. &lt;strong>Multi-Document Type Support&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Unified search&lt;/strong>: Single search interface for projects, files, people, notes, and assets&lt;/li>
&lt;li>&lt;strong>Type-specific indices&lt;/strong>: Separate FlexSearch indices optimized for each document type&lt;/li>
&lt;li>&lt;strong>Cross-reference capabilities&lt;/strong>: Documents can reference and link to each other&lt;/li>
&lt;li>&lt;strong>Flexible schema&lt;/strong>: Each document type has tailored fields for optimal search performance&lt;/li>
&lt;/ul>
&lt;h4 id="3-intelligent-file-content-indexing">3. &lt;strong>Intelligent File Content Indexing&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Configurable file size limits&lt;/strong>: Admin-controlled maximum file size for content indexing&lt;/li>
&lt;li>&lt;strong>Smart file detection&lt;/strong>: Automatically identifies text files by extension and filename patterns&lt;/li>
&lt;li>&lt;strong>Content extraction&lt;/strong>: Full-text indexing with snippet generation for search results&lt;/li>
&lt;li>&lt;strong>Performance optimization&lt;/strong>: Skips binary files and respects size constraints to maintain speed&lt;/li>
&lt;/ul>
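&lt;p>As a rough illustration of the size limit and text-file detection described above (the extension list, filename list, and default limit below are assumptions, not StatWrap&amp;rsquo;s actual configuration):&lt;/p>

```python
# Illustrative sketch only; the extension list, filename list, and default
# size limit are assumptions, not StatWrap's actual configuration.

TEXT_EXTENSIONS = {".txt", ".md", ".py", ".r", ".csv", ".json"}
TEXT_FILENAMES = {"readme", "license", "makefile"}

def should_index(name, size_bytes, max_bytes=1_000_000):
    """Decide whether a file's content should be indexed."""
    if size_bytes > max_bytes:        # respect the configured size limit
        return False
    lower = name.lower()
    if any(lower.endswith(ext) for ext in TEXT_EXTENSIONS):
        return True
    return lower in TEXT_FILENAMES    # extensionless text files like README
```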
&lt;h4 id="4-advanced-query-processing">4. &lt;strong>Advanced Query Processing&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Multi-strategy search&lt;/strong>: Combines exact matches, fuzzy search, partial matches, and contextual search&lt;/li>
&lt;li>&lt;strong>Query preprocessing&lt;/strong>: Removes stop words and applies linguistic filters&lt;/li>
&lt;li>&lt;strong>Relevance scoring&lt;/strong>: Custom scoring algorithm considering multiple factors:
&lt;ul>
&lt;li>Exact phrase matches (highest weight)&lt;/li>
&lt;li>Individual word matches&lt;/li>
&lt;li>Term frequency with logarithmic capping&lt;/li>
&lt;li>Position-based scoring (earlier matches rank higher)&lt;/li>
&lt;li>Proximity bonuses for terms appearing near each other&lt;/li>
&lt;li>Completeness penalties for missing query terms&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
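&lt;p>As a toy illustration of how these scoring factors can combine (the weights and formula below are invented for illustration, not the actual StatWrap scoring code):&lt;/p>

```python
import math

# Invented weights and formula, purely to illustrate the listed factors.

def score(query, text):
    q_terms = query.lower().split()
    words = text.lower().split()
    s = 0.0
    # Exact phrase match carries the highest weight.
    if query.lower() in text.lower():
        s += 10.0
    positions = {}
    for t in q_terms:
        idx = [i for i, w in enumerate(words) if w == t]
        if idx:
            # Term frequency with logarithmic capping.
            s += 1.0 + math.log(len(idx))
            # Earlier matches rank higher.
            s += 1.0 / (1 + idx[0])
            positions[t] = idx[0]
    # Proximity bonus when query terms appear near each other.
    if len(positions) >= 2:
        span = max(positions.values()) - min(positions.values())
        s += 2.0 / (1 + span)
    # Completeness penalty for missing query terms.
    s -= 0.5 * (len(q_terms) - len(positions))
    return s
```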
&lt;h4 id="5-real-time-search-suggestions">5. &lt;strong>Real-Time Search Suggestions&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Autocomplete support&lt;/strong>: Dynamic suggestions based on indexed document titles&lt;/li>
&lt;li>&lt;strong>Search history&lt;/strong>: Maintains recent searches for quick re-execution&lt;/li>
&lt;li>&lt;strong>Debounced input&lt;/strong>: Prevents excessive API calls during typing&lt;/li>
&lt;li>&lt;strong>Contextual suggestions&lt;/strong>: Suggestions adapt based on current filters and context&lt;/li>
&lt;/ul>
&lt;h4 id="6-comprehensive-filtering-system">6. &lt;strong>Comprehensive Filtering System&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Type filtering&lt;/strong>: Filter by document type (projects, files, people, etc.)&lt;/li>
&lt;li>&lt;strong>Project scoping&lt;/strong>: Limit searches to specific projects&lt;/li>
&lt;li>&lt;strong>File type filtering&lt;/strong>: Filter files by extension&lt;/li>
&lt;li>&lt;strong>Advanced search panel&lt;/strong>: Collapsible interface for power users&lt;/li>
&lt;li>&lt;strong>Filter persistence&lt;/strong>: Maintains filter state across searches&lt;/li>
&lt;/ul>
&lt;h4 id="7-performance-monitoring--analytics">7. &lt;strong>Performance Monitoring &amp;amp; Analytics&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Real-time metrics&lt;/strong>: Track search times, cache hit rates, and index statistics&lt;/li>
&lt;li>&lt;strong>Performance dashboard&lt;/strong>: Visual indicators for system health&lt;/li>
&lt;li>&lt;strong>Cache management&lt;/strong>: LRU cache with configurable size and TTL&lt;/li>
&lt;li>&lt;strong>Search analytics&lt;/strong>: Historical data on search patterns and performance&lt;/li>
&lt;/ul>
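&lt;p>The cache described above can be sketched roughly as follows; the class and its defaults are illustrative, not StatWrap&amp;rsquo;s implementation:&lt;/p>

```python
import time
from collections import OrderedDict

# Illustrative LRU cache with configurable size and TTL, sketching the
# caching approach described above (not StatWrap's actual code).

class LRUCache:
    def __init__(self, max_size=100, ttl_seconds=60.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._store = OrderedDict()  # key -> (inserted_at, value)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        inserted_at, value = entry
        if now - inserted_at > self.ttl:      # expired: evict and miss
            del self._store[key]
            return None
        self._store.move_to_end(key)          # mark as most recently used
        return value

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._store[key] = (now, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_size:
            self._store.popitem(last=False)   # evict least recently used
```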
&lt;h4 id="8-index-management-tools">8. &lt;strong>Index Management Tools&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Export/Import functionality&lt;/strong>: Backup and restore search indices&lt;/li>
&lt;li>&lt;strong>Full reindexing&lt;/strong>: Complete index rebuild with progress tracking&lt;/li>
&lt;li>&lt;strong>Index deletion&lt;/strong>: Clean slate functionality for troubleshooting&lt;/li>
&lt;li>&lt;strong>File size adjustment&lt;/strong>: Modify indexing constraints and rebuild affected content&lt;/li>
&lt;li>&lt;strong>Index statistics&lt;/strong>: Detailed breakdown of indexed content by type and project&lt;/li>
&lt;/ul>
&lt;h4 id="9-robust-error-handling--resilience">9. &lt;strong>Robust Error Handling &amp;amp; Resilience&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Graceful degradation&lt;/strong>: System continues operating even with partial index corruption&lt;/li>
&lt;li>&lt;strong>File system error handling&lt;/strong>: Handles missing files, permission issues, and path changes&lt;/li>
&lt;li>&lt;strong>Memory management&lt;/strong>: Prevents memory leaks during large indexing operations&lt;/li>
&lt;li>&lt;strong>Recovery mechanisms&lt;/strong>: Automatic fallback to basic search if advanced features fail&lt;/li>
&lt;/ul>
&lt;h4 id="10-user-experience-enhancements">10. &lt;strong>User Experience Enhancements&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Keyboard shortcuts&lt;/strong>: Ctrl+K to focus search, Escape to clear&lt;/li>
&lt;li>&lt;strong>Result highlighting&lt;/strong>: Visual emphasis on matching terms in results&lt;/li>
&lt;li>&lt;strong>Expandable results&lt;/strong>: Drill down into detailed information for each result&lt;/li>
&lt;li>&lt;strong>Loading states&lt;/strong>: Clear feedback during indexing and search operations&lt;/li>
&lt;li>&lt;strong>Responsive tabs&lt;/strong>: Organized results by type with badge counts&lt;/li>
&lt;/ul>
&lt;h2 id="5-classification-of-active-and-past-projects">&lt;strong>5. Classification of Active and Past Projects&lt;/strong>&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Active Pinned" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_1d3344ebb95180438d54893a9b5683e4.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_a0f8ee7f62445c2f5f806022268d0821.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_1d3344ebb95180438d54893a9b5683e4.webp"
width="733"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Past" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_76660a0dce9ac0ba1fa91c959db2773c.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_cc2abd1a6a3019f703ca3e656e55f920.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_76660a0dce9ac0ba1fa91c959db2773c.webp"
width="740"
height="542"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>A classification system was added to the user interface, similar to the &lt;strong>&amp;ldquo;Add to Favorites&amp;rdquo;&lt;/strong> option. A newly added project moves to the &lt;strong>&amp;ldquo;Active&amp;rdquo;&lt;/strong> section by default, unless explicitly marked as &lt;strong>&amp;ldquo;Past&amp;rdquo;&lt;/strong>. Similarly, when a project is unpinned from Favorites, it returns to the &amp;ldquo;Active&amp;rdquo; section.&lt;/p>
&lt;h1 id="conclusion-and-future-scope">&lt;strong>Conclusion and future Scope&lt;/strong>&lt;/h1>
&lt;p>Building a comprehensive search system requires careful attention to performance, user experience, and maintainability. FlexSearch provided the foundation, but the real value came from thoughtful implementation of persistent indexing, advanced scoring, and robust error handling. The result is a search system that feels instant to users while handling complex queries across diverse document types.&lt;/p>
&lt;p>The key to success was treating search not as a single feature, but as a complete subsystem with its own data management, performance monitoring, and user interface considerations. By investing in these supporting systems, the search functionality became a central, reliable part of the application that users can depend on.&lt;/p>
&lt;p>The future scope would include:&lt;/p>
&lt;ol>
&lt;li>Using a database (for example, SQLite) instead of JSON, which would provide more efficient query performance and atomic (CRUD) operations.&lt;/li>
&lt;li>Integrating any suggestions from my mentors, as well as improvements we feel are necessary.&lt;/li>
&lt;li>Developing unit tests for further functionalities and improvements.&lt;/li>
&lt;/ol>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Thank You!" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_f70985a589ad6b79f8c95b36c5279852.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_b28b9dbb6c70c33ca845fda461a64fcf.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_f70985a589ad6b79f8c95b36c5279852.webp"
width="760"
height="235"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p></description></item><item><title>[Final]Reproducibility of Interactive Notebooks in Distributed Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/depaul/notebook-rep/08202025-rahmad/</link><pubDate>Wed, 20 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/depaul/notebook-rep/08202025-rahmad/</guid><description>&lt;p>I am sharing a overview of my project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/06122025-rahmad">Reproducibility of Interactive Notebooks in Distributed Environments&lt;/a> and the work that I did this summer.&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>This project aims to improve the reproducibility of interactive notebooks executed in a distributed environment. Notebooks such as those in the &lt;a href="https://jupyter.org/" target="_blank" rel="noopener">Jupyter&lt;/a> environment have become increasingly popular and are widely used in the scientific community due to their ease of use and portability. Reproducing these notebooks is a challenging task, especially in a distributed cluster environment.&lt;/p>
&lt;p>In the distributed environments we consider, the notebook code is divided into manager and worker code. The manager code is the main entry point of the program, which divides the task at hand into one or more worker codes that run in a parallel, distributed fashion. We utilize several open source tools to package and containerize the application code so that it can be reproduced across different machines and environments. They include &lt;a href="https://github.com/radiant-systems-lab/sciunit" target="_blank" rel="noopener">Sciunit&lt;/a>, &lt;a href="https://github.com/radiant-systems-lab/Flinc" target="_blank" rel="noopener">FLINC&lt;/a>, and &lt;a href="https://cctools.readthedocs.io/en/stable/taskvine/" target="_blank" rel="noopener">TaskVine&lt;/a>. These are the high-level goals of this project:&lt;/p>
&lt;ol>
&lt;li>Generate execution logs for a notebook program.&lt;/li>
&lt;li>Generate code and data dependencies for notebook programs in an automated manner.&lt;/li>
&lt;li>Utilize the generated dependencies at various granularities to automate the deployment and execution of notebooks in a parallel and distributed environment.&lt;/li>
&lt;li>Audit and package the notebook code running in a distributed environment.&lt;/li>
&lt;li>Overall, support efficient reproducibility of programs in a notebook program.&lt;/li>
&lt;/ol>
&lt;h1 id="progress-highlights">Progress Highlights&lt;/h1>
&lt;p>Here are the details of the work that I did during this summer.&lt;/p>
&lt;h2 id="generation-of-execution-logs">Generation of Execution Logs&lt;/h2>
&lt;p>We generate execution logs for the notebook programs in a distributed environment using the Linux utility &lt;a href="https://man7.org/linux/man-pages/man1/strace.1.html" target="_blank" rel="noopener">strace&lt;/a>, which records every system call made by the notebook, including all files accessed during its execution. We collect separate logs for the manager and worker code since they are executed on different machines and have different dependencies. By recording the entire notebook execution, we capture all libraries, packages, and data files referenced during execution in the form of execution logs. These logs are then utilized for further analyses.&lt;/p>
&lt;h2 id="extracting-software-dependencies">Extracting Software Dependencies&lt;/h2>
&lt;p>When a library such as a Python package like &lt;em>Numpy&lt;/em> is used by the notebook program, an entry is made in the execution log containing the complete path of the accessed library file(s) along with additional information. We analyze the execution logs for both the manager and the workers to find and list all dependencies. So far, we are limited to Python packages, though this methodology is general and can be used to find dependencies for any programming language. For Python packages, version numbers are also obtained by querying package managers like &lt;em>pip&lt;/em> or &lt;em>Conda&lt;/em> on the local system.&lt;/p>
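&lt;p>As a rough sketch, package names can be recovered from such logs by matching &lt;code>openat&lt;/code> entries whose paths pass through &lt;code>site-packages&lt;/code>; the log lines shown and the parsing details are illustrative, not the project&amp;rsquo;s exact implementation:&lt;/p>

```python
import re

# Sketch of extracting Python package names from strace output, assuming
# log lines of the form produced by strace's openat tracing; the exact
# parsing in the project may differ.

_OPEN_RE = re.compile(r'open(?:at)?\(.*?"([^"]+)"')
_PKG_RE = re.compile(r"site-packages/([A-Za-z0-9_]+)")

def packages_from_log(lines):
    pkgs = set()
    for line in lines:
        m = _OPEN_RE.search(line)
        if not m:
            continue
        p = _PKG_RE.search(m.group(1))
        if p:
            pkgs.add(p.group(1))
    return sorted(pkgs)
```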
&lt;h2 id="extracting-data-dependencies">Extracting Data Dependencies&lt;/h2>
&lt;p>We utilize similar execution logs to identify which data files were used by the notebook program. The list of logged files also contains various configuration or settings files used by certain packages and libraries. These files are removed from the list of data dependencies through post-processing that analyzes file paths.&lt;/p>
&lt;h2 id="testing-the-pipeline">Testing the Pipeline&lt;/h2>
&lt;p>We have conducted our experiments on three use cases obtained from different domains using between 5 and 10 workers. They include distributed image convolution, climate trend analysis, and high energy physics experiment analysis. The results so far are promising with good accuracy and with a slight running time overhead.&lt;/p>
&lt;h2 id="processing-at-cell-level">Processing at Cell-level&lt;/h2>
&lt;p>We perform the same steps of log generation and data and software dependency extraction at the level of individual cells in a notebook instead of once for the whole notebook. As a result, we generate software and data dependencies at the level of individual notebook cells. This is achieved by interrupting control flow before and after execution of each cell to write special instructions to the execution log for marking boundaries of cell execution. We then analyze the intervals between these instructions to identify which files and Python packages are accessed by each specific cell. We use this information to generate the list of software dependencies used by that cell only.&lt;/p>
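&lt;p>One simple way to realize such boundary markers, sketched below with an invented sentinel-path convention (the real instrumentation writes its own special instructions to the log), is to open a recognizable file before and after each cell so that strace records it, and then group the logged accesses between a cell&amp;rsquo;s begin and end markers:&lt;/p>

```python
# Invented sentinel-path convention for cell boundary markers.

def mark(cell_id, kind):
    # Opening this path leaves an openat entry in the strace log.
    sentinel = "/tmp/.cell-" + kind + "-" + str(cell_id)
    try:
        open(sentinel, "w").close()
    except OSError:
        pass

def attribute_to_cells(paths):
    """Group logged file paths by the cell markers that surround them."""
    cells, current = {}, None
    for path in paths:
        if path.startswith("/tmp/.cell-begin-"):
            current = path.rsplit("-", 1)[-1]
            cells[current] = []
        elif path.startswith("/tmp/.cell-end-"):
            current = None
        elif current is not None:
            cells[current].append(path)
    return cells
```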
&lt;p>We also capture data dependencies at the cell level by analyzing the execution logs generated by overriding the &lt;em>open&lt;/em> function call used to access various files.&lt;/p>
&lt;h2 id="distributed-notebook-auditing">Distributed Notebook Auditing&lt;/h2>
&lt;p>In order to execute and audit workloads in parallel, we use &lt;a href="https://github.com/radiant-systems-lab/parallel-sciunit" target="_blank" rel="noopener">Sciunit Parallel&lt;/a>, which builds on GNU Parallel for efficient parallel execution of tasks. The user specifies the number of tasks or machines, and the workload is then distributed across them. Once execution completes, the containerized executions are gathered back at the host location.&lt;/p>
&lt;h2 id="efficient-reproducibility-with-checkpointing">Efficient Reproducibility with Checkpointing&lt;/h2>
&lt;p>An important challenge with Jupyter notebooks is that re-executing them is often unnecessarily time-consuming and resource-intensive, especially when most cells remain unchanged. We worked on &lt;a href="https://github.com/talha129/NBRewind/tree/master" target="_blank" rel="noopener">NBRewind&lt;/a>, a lightweight tool that accelerates notebook re-execution by avoiding redundant computation. It integrates checkpointing, application virtualization, and content-based deduplication, and supports two kinds of checkpoints: incremental and full-state. Incremental checkpoints store notebook states and dependencies once and subsequently record only their deltas; full-state checkpoints store the complete state after each cell. During restore, NBRewind recovers the outputs of unchanged cells, enabling efficient re-execution. Our empirical evaluation demonstrates that NBRewind can significantly reduce both notebook audit and repeat times with incremental checkpoints.&lt;/p>
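&lt;p>The content-based deduplication idea behind skipping unchanged cells can be sketched as a content-addressed cache (a minimal illustration, not NBRewind&amp;rsquo;s actual implementation):&lt;/p>

```python
import hashlib

checkpoints = {}  # cell content hash -> cached output (content-addressed)

def run_or_restore(cell_source, execute):
    """Skip re-execution when a cell's content hash is unchanged:
    a minimal sketch of checkpoint-based notebook re-execution."""
    key = hashlib.sha256(cell_source.encode()).hexdigest()
    if key not in checkpoints:
        checkpoints[key] = execute(cell_source)
    return checkpoints[key]

calls = []
def execute(src):
    calls.append(src)
    return eval(src)

print(run_or_restore("2 + 3", execute))  # 5 (executed)
print(run_or_restore("2 + 3", execute))  # 5 (restored, not re-executed)
print(len(calls))  # 1
```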
&lt;p>I am very happy about the experience I have had in this project, and I would encourage other students to join this program in the future.&lt;/p></description></item><item><title>Mid-Term Update: MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250803-rohan-babbar/</link><pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250803-rohan-babbar/</guid><description>&lt;p>Hi everyone! This is my mid-term blog update for the project &lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">MPI Appliance for HPC Research on Chameleon&lt;/a>, developed in collaboration with Argonne National Laboratory and the Chameleon Cloud community.
This blog follows up on my earlier post, which you can find &lt;a href="https://ucsc-ospo.github.io/report/osre25/uchicago/mpi/20250614-rohan-babbar/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;h3 id="-june-15--june-29-2025">🔧 June 15 – June 29, 2025&lt;/h3>
&lt;p>Worked on creating and configuring images on Chameleon Cloud for the following three sites:
CHI@UC, CHI@TACC, and KVM@TACC.&lt;/p>
&lt;p>Key features of the images:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Spack&lt;/strong>: Pre-installed and configured for easy package management of HPC software.&lt;/li>
&lt;li>&lt;strong>Lua Modules (LMod)&lt;/strong>: Installed and configured for environment module management.&lt;/li>
&lt;li>&lt;strong>MPI Support&lt;/strong>: Both MPICH and Open MPI are pre-installed, enabling users to run distributed applications out-of-the-box.&lt;/li>
&lt;/ul>
&lt;p>These images are now publicly available and can be seen directly on the Chameleon Appliance Catalog, titled &lt;a href="https://chameleoncloud.org/appliances/127/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04)&lt;/a>.&lt;/p>
&lt;p>I also worked on example Jupyter notebooks showing how to get started with these images.&lt;/p>
&lt;h3 id="-june-30--july-13-2025">🔧 June 30 – July 13, 2025&lt;/h3>
&lt;p>With the MPI Appliance now published on Chameleon Cloud, the next step was to automate the setup of an MPI-Spack cluster.&lt;/p>
&lt;p>To achieve this, I developed a set of Ansible playbooks that:&lt;/p>
&lt;ol>
&lt;li>Configure both master and worker nodes with site-specific settings&lt;/li>
&lt;li>Set up seamless access to Chameleon NFS shares&lt;/li>
&lt;li>Allow users to easily install Spack packages, compilers, and dependencies across all nodes&lt;/li>
&lt;/ol>
&lt;p>These playbooks aim to simplify the deployment of reproducible HPC environments and reduce the time required to get a working cluster up and running.&lt;/p>
&lt;h3 id="-july-14--july-28-2025">🔧 July 14 – July 28, 2025&lt;/h3>
&lt;p>I began this period by fixing some issues in python-chi, the official Python client for the Chameleon testbed.
We also discussed adding support for CUDA-based packages, which would make it easier to work with NVIDIA GPUs.
We successfully published a new image on Chameleon, titled &lt;a href="https://chameleoncloud.org/appliances/130/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - CUDA)&lt;/a>, and added an example to demonstrate its usage.&lt;/p>
&lt;p>We compiled the artifact containing the Jupyter notebooks and Ansible playbooks and published it on Chameleon Trovi.
Feel free to check it out &lt;a href="https://chameleoncloud.org/experiment/share/7424a8dc-0688-4383-9d67-1e40ff37de17" target="_blank" rel="noopener">here&lt;/a>. The documentation still needs some work.&lt;/p>
&lt;p>📌 That’s it for now! I’m currently working on the documentation, a ROCm-based image for AMD GPUs, and some container-based examples.
Stay tuned for more updates in the next blog.&lt;/p></description></item><item><title>Halfway Blog - WildBerryEye: Mechanical Design &amp; Weather-Resistant Enclosure</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/wildberryeye/20250725-teolangan/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/wildberryeye/20250725-teolangan/</guid><description>&lt;p>Hi everyone! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/content/authors/teolangan">Teodor Langan&lt;/a>, and I am an undergraduate studying Robotics Engineering at the University of California, Santa Cruz. I’m happy to share the progress I have made over the last six weeks on my GSoC 2025 project, developing the hardware for the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/wildberryeye/">WildBerryEye&lt;/a> project, mentored by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/content/authors/caiespin">Carlos Isaac Espinosa&lt;/a>.&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>The WildBerryEye project enables AI-powered ecological monitoring using Raspberry Pi cameras and computer vision models. However, achieving this requires a reliable enclosure that can support long-term deployment in the wild. The goal for my project is to address this need by designing a modular, 3D-printable camera casing that protects WildBerryEye’s electronics from outside factors such as rain, dust, and bugs, while remaining easy to print and assemble. To achieve this, my main responsibilities for this project include:&lt;/p>
&lt;ul>
&lt;li>Implementing a modular design and development-friendly features for ease of assembly and flexible use across hardware setups&lt;/li>
&lt;li>Prototyping and testing enclosures outdoors to assess durability, water resistance, and ventilation—then iterating based on results&lt;/li>
&lt;li>Developing clear documentation, assembly instructions, and designing with open-source tools&lt;/li>
&lt;li>Exploring material options and print techniques to improve outdoor lifespan and environmental resilience&lt;/li>
&lt;/ul>
&lt;p>Designed largely with FreeCAD and tested in real outdoor conditions, the open-source enclosure will ensure WildBerryEye hardware can be deployed in natural environments for continuous, low-maintenance data collection.&lt;/p>
&lt;h2 id="progress-so-far">Progress So Far&lt;/h2>
&lt;p>Over the past six weeks, I have made great progress on the design of the WildBerryEye camera enclosure. Some key accomplishments include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Full 3D Assembly Model of Electronics:&lt;/strong> Modeled all core components used in the WildBerryEye system to serve as a reference for enclosure design. For parts without existing CAD models, accurate measurements were taken and custom models were created in FreeCAD.&lt;/li>
&lt;li>&lt;strong>Initial Enclosure Prototype:&lt;/strong> Designed and 3D-printed a first full prototype featuring a hinge-latch mechanism to allow tool-free easy access to internal electronics for development and maintenance.&lt;/li>
&lt;li>&lt;strong>Design Iteration Based on Testing:&lt;/strong> Based on the results of the first print, created an improved version with better electronics integration, port alignment, and more functionality.&lt;/li>
&lt;/ul>
&lt;h2 id="challenges--next-steps">Challenges &amp;amp; Next Steps&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Field-Ready Integration:&lt;/strong> Preparing for field testing with upcoming prototypes by making sure that all internal electronics are securely mounted and fully accessible within the enclosure.&lt;/li>
&lt;li>&lt;strong>Latch Mechanism Refinement:&lt;/strong> Finalizing a reliable hinge-latch design that can keep the enclosure sealed during outdoor use while remaining easy to open for maintenance.&lt;/li>
&lt;li>&lt;strong>Balancing Modularity, Size, and Weatherproofing:&lt;/strong> Maintaining a compact form factor without compromising on modularity or weather resistance—especially when routing cables and mounting components.&lt;/li>
&lt;li>&lt;strong>Material Experimentation:&lt;/strong> Beginning test prints with TPU, a flexible filament that may provide improved seals or gaskets for added protection.&lt;/li>
&lt;li>&lt;strong>Ventilation Without Exposure:&lt;/strong> Exploring airflow solutions such as labyrinth-style vents to enable heat dissipation without letting in moisture or debris.&lt;/li>
&lt;/ul>
&lt;h2 id="final-thoughts">Final Thoughts&lt;/h2>
&lt;p>These past six weeks have helped me immensely in growing my skills in mechanical design, CAD modeling, and field-focused prototyping. The WildBerryEye system can help researchers monitor pollinators and other wildlife in their natural habitats without requiring constant in-person observation or high-maintenance setups. By enabling long-term, autonomous data collection in outdoor environments, it opens new possibilities for low-cost, scalable ecological monitoring.&lt;/p>
&lt;p>I’m especially grateful to my mentor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/content/authors/caiespin">Carlos Isaac Espinosa&lt;/a> and the WildBerryEye team for their ongoing support. Excited for the second half, where the design will face real-world testing and help bring this impactful system one step closer to field deployment!&lt;/p></description></item><item><title>Mid-term Blog: Building a Simulator for Benchmarking Replicated Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-mchan/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-mchan/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello there, I&amp;rsquo;m Michael. In this report, I&amp;rsquo;ll be sharing my progress as part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a>.&lt;/p>
&lt;h2 id="about-the-project">About the Project&lt;/h2>
&lt;p>The goal of the project is to build a &lt;em>language-agnostic&lt;/em> interface that enables communication between clients and any consensus protocol, such as MultiPaxos, Raft, ZooKeeper Atomic Broadcast (ZAB), and others. Currently, many of these protocols implement their own custom mechanisms for the client to communicate with the group of peers in the network. An implementation of MultiPaxos from the &lt;a href="https://arxiv.org/abs/2405.11183" target="_blank" rel="noopener">MultiPaxos Made Complete&lt;/a> paper, for example, uses a custom Protobuf definition for the packets clients send to the MultiPaxos system. With the support of a generalized interface, different consensus protocols can be tested under the same workload to compare their performance objectively.&lt;/p>
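&lt;p>One way to picture such a generalized interface is as an adapter layer that every protocol implements, so a single workload driver can exercise any of them. The sketch below is hypothetical (class and method names are not from the project):&lt;/p>

```python
from abc import ABC, abstractmethod

class ReplicaClient(ABC):
    """Hypothetical protocol-agnostic client interface: each consensus
    implementation (MultiPaxos, Raft, ZAB, ...) provides an adapter."""
    @abstractmethod
    def submit(self, command: bytes) -> bytes: ...

class InMemoryAdapter(ReplicaClient):
    """Toy stand-in for a real protocol backend."""
    def __init__(self):
        self.log = []
    def submit(self, command):
        self.log.append(command)  # a real adapter would replicate here
        return b"OK:" + command

def benchmark(client, workload):
    """The same workload can now drive any protocol for fair comparison."""
    return [client.submit(cmd) for cmd in workload]

replies = benchmark(InMemoryAdapter(), [b"set x=1", b"get x"])
print(replies)
```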
&lt;h2 id="progress">Progress&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Literature Study:&lt;/strong>
Reviewed papers and implementations of various protocols including GigaPaxos, Raft, Viewstamped Replication (VSR), and ZAB. Analysis focused on their log replication strategies, fault handling, and performance implications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Development of Custom Protocol:&lt;/strong>
Two custom protocols are currently under development and will serve as initial test subjects for the testbed:&lt;/p>
&lt;ul>
&lt;li>A modified GigaPaxos protocol&lt;/li>
&lt;li>A Primary-Backup Replication protocol with strict log ordering similar to ZAB (logs are ordered based on the sequence proposed by the primary)&lt;/li>
&lt;/ul>
&lt;p>Most of my time has been spent working on the two protocols, particularly on snapshotting and state transfer functionality in the Primary-Backup protocol. Ideally, the testbed should be able to evaluate protocol performance in scenarios involving node failure or a new node being added. In these scenarios, different protocol implementations often vary in their decision of whether to take periodic snapshots or to roll forward whenever possible and generate a snapshot only when necessary.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>Early in the project, the initial goal was to benchmark different consensus protocols using arbitrary full-stack web applications as their workload. Each protocol would replicate a full-stack application running inside Docker containers across multiple nodes, and the testbed would send requests for the nodes to coordinate on. In fact, the two custom protocols being developed are specifically made to fit these constraints.&lt;/p>
&lt;p>Developing a custom protocol that supports the replication of a Docker container is in itself already a difficult task. Abstracting away the functionality for communicating with the Docker containers, as well as handling log entries and snapshotting the state, is an order of magnitude more complicated.&lt;/p>
&lt;p>As mentioned in the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250613-mchan/">first blog&lt;/a>, applications can be categorized into two types: deterministic and non-deterministic. The coordination of these two types of applications is handled in very different ways. Most consensus protocols support only deterministic systems, such as key-value stores, and cannot easily handle the coordination of complex services or external side effects. Supporting non-deterministic applications would require abstracting over protocol-specific log structures, which effectively restricts the interface to protocols that conform to the abstraction, defeating the goal of making the interface broadly usable and protocol-agnostic.&lt;/p>
&lt;p>Furthermore, allowing &lt;strong>any&lt;/strong> existing protocol to run something as complex as a stateful Docker container, without the protocol itself even knowing, adds yet another layer of complexity to the system.&lt;/p>
&lt;h2 id="future-goals">Future Goals&lt;/h2>
&lt;p>Given these challenges, I decided to pivot to using only key-value stores as the benchmark application. This aligns with most existing protocol implementations, which typically use key-value stores. The main focus is now to implement an interface that accepts HTTP requests from clients and forwards them to any arbitrary protocol.&lt;/p></description></item><item><title>Midterm Blog: Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/</guid><description>&lt;p>Hello! I&amp;rsquo;m Panji Sri Kuncara Wisma and I want to share my midterm progress on the &amp;ldquo;Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&amp;rdquo; project under the mentorship of Fadhil I. Kurnia.&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>The goal of our project is to create an open testbed that enables fair, reproducible evaluation of different consensus protocols (Paxos variants, EPaxos, Raft, etc.) when deployed at network edges. Currently, researchers struggle to compare these systems because they lack standardized evaluation environments and often rely on mock implementations of proprietary systems.&lt;/p>
&lt;p>XDN (eXtensible Distributed Network) is one of the important consensus systems we plan to evaluate in our benchmarking testbed. Built on GigaPaxos, it allows deployment of replicated stateful services across edge locations. As part of preparing our benchmarking framework, we need to ensure that the systems we evaluate, including XDN, are robust for fair comparison.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>As part of preparing our benchmarking tool, I have been working on refactoring XDN&amp;rsquo;s FUSE filesystem from C++ to Rust. This work is essential for creating a stable and reliable XDN platform.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="System Architecture" srcset="
/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_5600401ae6570bf38b96fa89a080f4f7.webp 400w,
/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_6d3b555dbec3bdb305839eda9b227acf.webp 760w,
/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_5600401ae6570bf38b96fa89a080f4f7.webp"
width="760"
height="439"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The diagram above illustrates how the FUSE filesystem integrates with XDN&amp;rsquo;s distributed architecture. On the left, we see the standard FUSE setup where applications interact with the filesystem through the kernel&amp;rsquo;s VFS layer. On the right, the distributed replication flow is shown: Node 1 runs &lt;code>fuselog_core&lt;/code> which captures filesystem operations and generates statediffs, while Nodes 2 and 3 run &lt;code>fuselog_apply&lt;/code> to receive and apply these statediffs, maintaining replica consistency across the distributed system.&lt;/p>
&lt;p>This FUSE component is critical to XDN&amp;rsquo;s operation, as it enables transparent state capture and replication across edge nodes. By refactoring this core component from C++ to Rust, we aim to strengthen the foundation for fair benchmarking comparisons in our testbed.&lt;/p>
&lt;h3 id="core-work-c-to-rust-fuse-filesystem-migration">Core Work: C++ to Rust FUSE Filesystem Migration&lt;/h3>
&lt;p>XDN relies on a FUSE (Filesystem in Userspace) component to capture filesystem operations and generate &amp;ldquo;statediffs&amp;rdquo; - records of changes that get replicated across edge nodes. The original C++ implementation worked but had memory safety concerns and limited optimization capabilities.&lt;/p>
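&lt;p>The statediff capture/apply cycle can be sketched in a few lines (Python is used purely for illustration; the actual component is written in Rust, and real statediffs carry far richer operation metadata):&lt;/p>

```python
# Sketch: the primary records filesystem mutations as a statediff,
# and replicas replay the statediff to stay consistent.
def capture(statediff, op, path, data=None):
    """Append one filesystem mutation to the pending statediff."""
    statediff.append((op, path, data))

def apply_statediff(replica_fs, statediff):
    """Replay a statediff against a replica's (dict-modeled) filesystem."""
    for op, path, data in statediff:
        if op == "write":
            replica_fs[path] = data
        elif op == "delete":
            replica_fs.pop(path, None)
    return replica_fs

diff = []
capture(diff, "write", "/db/wal", b"txn-1")
capture(diff, "write", "/db/data", b"row")
replica = apply_statediff({}, diff)
print(replica)  # {'/db/wal': b'txn-1', '/db/data': b'row'}
```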
&lt;p>I worked on refactoring from C++ to Rust, implementing several improvements:&lt;/p>
&lt;p>&lt;strong>New Features Added:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Zstd Compression&lt;/strong>: Reduces statediff payload sizes&lt;/li>
&lt;li>&lt;strong>Adaptive Compression&lt;/strong>: Intelligently chooses compression strategies&lt;/li>
&lt;li>&lt;strong>Advanced Pruning&lt;/strong>: Removes redundant operations (duplicate chmod/chown, created-then-deleted files)&lt;/li>
&lt;li>&lt;strong>Bincode Serialization&lt;/strong>: Helps avoid manual serialization code and reduces the risk of related bugs&lt;/li>
&lt;li>&lt;strong>Extended Operations&lt;/strong>: Added support for additional filesystem operations (mkdir, symlink, hardlinks, etc.)&lt;/li>
&lt;/ul>
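&lt;p>The pruning feature above can be illustrated with a small sketch (Python for illustration only; operation tuples and rules here are simplified stand-ins for the Rust implementation):&lt;/p>

```python
def prune(ops):
    """Drop redundant operations from a statediff: superseded
    chmod/chown calls, and files created then deleted within the
    same statediff (so they never need to reach the replicas)."""
    deleted = {path for kind, path, *_ in ops if kind == "delete"}
    created = {path for kind, path, *_ in ops if kind == "create"}
    ephemeral = created.intersection(deleted)
    latest_meta = {}
    for i, (kind, path, *_) in enumerate(ops):
        if kind in ("chmod", "chown"):
            latest_meta[(kind, path)] = i  # keep only the last one
    out = []
    for i, op in enumerate(ops):
        kind, path = op[0], op[1]
        if path in ephemeral:
            continue
        if kind in ("chmod", "chown") and latest_meta[(kind, path)] != i:
            continue
        out.append(op)
    return out

ops = [("create", "/tmp/a"), ("chmod", "/tmp/b", 0o644),
       ("chmod", "/tmp/b", 0o600), ("delete", "/tmp/a")]
print(prune(ops))  # keeps only the final chmod on /tmp/b
```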
&lt;p>&lt;strong>Architectural Improvements:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Memory Safety&lt;/strong>: Rust&amp;rsquo;s ownership system helps prevent common memory management issues&lt;/li>
&lt;li>&lt;strong>Type Safety&lt;/strong>: Using Rust enums instead of integer constants for better type checking&lt;/li>
&lt;/ul>
&lt;h2 id="findings">Findings&lt;/h2>
&lt;p>The optimizations performed as expected:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Database Performance Comparison" srcset="
/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_cb1ea5caaa82d543dfeabd0c97f7c4fe.webp 400w,
/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_d65f44ef3f769dddda7f0211b94ad6b6.webp 760w,
/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_cb1ea5caaa82d543dfeabd0c97f7c4fe.webp"
width="760"
height="433"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;strong>Statediff Size Reductions:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>MySQL workload&lt;/strong>: 572MB → 29.6MB (95% reduction)&lt;/li>
&lt;li>&lt;strong>PostgreSQL workload&lt;/strong>: 76MB → 11.9MB (84% reduction)&lt;/li>
&lt;li>&lt;strong>SQLite workload&lt;/strong>: 4MB → 29KB (99% reduction)&lt;/li>
&lt;/ul>
&lt;p>The combination of write coalescing, pruning, and compression proves especially effective for database workloads, where many operations involve small changes to large files.&lt;/p>
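&lt;p>Write coalescing, in particular, explains why database workloads shrink so dramatically: repeated small writes to the same region of a large file collapse into one range. A minimal sketch of the idea (illustrative only; real coalescing also merges the written bytes):&lt;/p>

```python
def coalesce(writes):
    """Merge overlapping or adjacent (start, end) write ranges to the
    same file, so many small updates ship as one statediff entry."""
    merged = []
    for start, end in sorted(writes):
        if merged and merged[-1][1] >= start:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# 1000 small adjacent writes collapse into a single range:
print(coalesce([(i, i + 4) for i in range(0, 4000, 4)]))  # [(0, 4000)]
```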
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Rust vs C&amp;#43;&amp;#43; Performance Comparison" srcset="
/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_2adee964972897a04e60327dcfe9675e.webp 400w,
/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_dd86a6fc0dabbac3beb17266f1f49002.webp 760w,
/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_2adee964972897a04e60327dcfe9675e.webp"
width="760"
height="470"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;strong>Performance Comparison:&lt;/strong>
Remarkably, the Rust implementation matches or exceeds C++ performance:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>POST operations&lt;/strong>: 30% faster (10.5ms vs 15ms)&lt;/li>
&lt;li>&lt;strong>DELETE operations&lt;/strong>: 33% faster (10ms vs 15ms)&lt;/li>
&lt;li>&lt;strong>Overall latency&lt;/strong>: Consistently better (9ms vs 11ms)&lt;/li>
&lt;/ul>
&lt;h2 id="current-challenges">Current Challenges&lt;/h2>
&lt;p>While the core implementation is complete and functional, I&amp;rsquo;m currently debugging occasional latency spikes that occur under specific workload patterns. These edge cases need to be resolved before moving on to the benchmarking phase, as inconsistent performance could compromise the reliability of the evaluation.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>With the FUSE filesystem foundation nearly complete, next steps include:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Resolve latency spike issues&lt;/strong> and complete XDN stabilization&lt;/li>
&lt;li>&lt;strong>Build benchmarking framework&lt;/strong> - a comparison tool that can systematically evaluate different consensus protocols with standardized metrics.&lt;/li>
&lt;li>&lt;strong>Run systematic evaluation&lt;/strong> across protocols&lt;/li>
&lt;/ol>
&lt;p>The optimized filesystem will hopefully provide a stable base for reproducible performance comparisons between distributed consensus protocols.&lt;/p></description></item><item><title>Mid-term Blog: StatWrap: Cross-Project Searching and Classification using Local Indexing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone!&lt;br>
I am Debangi Ghosh from India, an undergraduate student at the Indian Institute of Technology (IIT) BHU, Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/northwestern/statwrap/">StatWrap: Cross-Project Searching and Classification using Local Indexing&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1dxyBP2oMJwYDCKyIWzr465zNmm6UWtnI/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, developed under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>, focuses on building a full-text search service within the StatWrap user interface. This involves evaluating different search libraries and implementing a classification system to distinguish active from past projects.&lt;/p>
&lt;h2 id="about-the-project">&lt;strong>About the Project&lt;/strong>&lt;/h2>
&lt;p>As part of the project, I am working on enhancing the usability of StatWrap by enabling efficient cross-project search capabilities. The goal is to make it easier for investigators to discover relevant projects, notes, and assets—across both current and archived work—using information that is either user-entered or passively collected by StatWrap.&lt;/p>
&lt;p>Given the sensitivity of the data involved, one of the key requirements is that all indexing and search operations must be performed locally. To address this, my responsibilities include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Evaluating open-source search libraries&lt;/strong> suitable for local indexing and retrieval&lt;/li>
&lt;li>&lt;strong>Building the full-text search functionality&lt;/strong> directly into the StatWrap UI to allow seamless querying across projects&lt;/li>
&lt;li>&lt;strong>Ensuring reliability&lt;/strong> through the development of unit tests and comprehensive system testing&lt;/li>
&lt;li>&lt;strong>Implementing a classification system&lt;/strong> to label projects as “Active,” “Pinned,” or “Past” within the user interface&lt;/li>
&lt;/ul>
&lt;p>This project offers a great opportunity to work at the intersection of software development, information retrieval, and user-centric design—while contributing to research reproducibility and collaboration within scientific workflows.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>It has been more than six weeks since the project began, and significant progress has been made. Here&amp;rsquo;s a breakdown:&lt;/p>
&lt;h3 id="1-descriptive-comparison-of-open-source-libraries">1. &lt;strong>Descriptive Comparison of Open-Source Libraries&lt;/strong>&lt;/h3>
&lt;p>Compared various open-source search libraries based on evaluation criteria such as &lt;strong>indexing speed, search speed, memory usage, typo tolerance, fuzzy searching, partial matching, full-text queries, contextual search, Boolean support, exact word match, installation ease, maintenance, documentation&lt;/strong>, and &lt;strong>developer experience&lt;/strong>.&lt;/p>
&lt;h3 id="2-the-libraries">2. &lt;strong>The Libraries&lt;/strong>&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Lunr.js&lt;/strong>&lt;br>
A small, client-side full-text search engine that mimics Solr capabilities.&lt;/p>
&lt;ul>
&lt;li>Field-based search, boosting&lt;/li>
&lt;li>Supports TF-IDF, inverted index&lt;/li>
&lt;li>No built-in fuzzy search (only basic wildcards)&lt;/li>
&lt;li>Can serialize/deserialize index&lt;/li>
&lt;li>Not designed for large datasets&lt;/li>
&lt;li>Moderate memory usage and indexing speed&lt;/li>
&lt;li>Good documentation&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Static websites or SPAs needing simple in-browser search&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>ElasticLunr.js&lt;/strong>&lt;br>
A lightweight, more flexible alternative to Lunr.js.&lt;/p>
&lt;ul>
&lt;li>Dynamic index (add/remove docs)&lt;/li>
&lt;li>Field-based and weighted search&lt;/li>
&lt;li>No advanced fuzzy matching&lt;/li>
&lt;li>Faster and more customizable than Lunr&lt;/li>
&lt;li>Smaller footprint&lt;/li>
&lt;li>Easy to use and maintain&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Developers wanting Lunr-like features with simpler customization&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Fuse.js&lt;/strong>&lt;br>
A fuzzy search library ideal for small to medium datasets.&lt;/p>
&lt;ul>
&lt;li>Fuzzy search with typo tolerance&lt;/li>
&lt;li>Deep key/path searching&lt;/li>
&lt;li>No need to build index&lt;/li>
&lt;li>Highly configurable (threshold, distance, etc.)&lt;/li>
&lt;li>Linear scan = slower on large datasets&lt;/li>
&lt;li>Not full-text search (scoring-based match)&lt;/li>
&lt;li>Extremely easy to set up and use&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Fuzzy search in small in-memory arrays (e.g., auto-suggest, dropdown filters)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>FlexSearch&lt;/strong>&lt;br>
A blazing-fast, modular search engine with advanced indexing options.&lt;/p>
&lt;ul>
&lt;li>Extremely fast search and indexing&lt;/li>
&lt;li>Supports phonetic, typo-tolerant, and partial matching&lt;/li>
&lt;li>Asynchronous support&lt;/li>
&lt;li>Multi-language + Unicode-friendly&lt;/li>
&lt;li>Low memory footprint&lt;/li>
&lt;li>Configuration can be complex for beginners&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: High-performance search in large/multilingual datasets&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>MiniSearch&lt;/strong>&lt;br>
A small, full-text search engine with balanced performance and simplicity.&lt;/p>
&lt;ul>
&lt;li>Fast indexing and searching&lt;/li>
&lt;li>Fuzzy search, stemming, stop words&lt;/li>
&lt;li>Field boosting and prefix search&lt;/li>
&lt;li>Compact, can serialize index&lt;/li>
&lt;li>Clean and modern API&lt;/li>
&lt;li>Lightweight and easy to maintain&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Balanced, in-browser full-text search for moderate datasets&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Search-Index&lt;/strong>&lt;br>
A persistent, full-featured search engine for Node.js and browsers.&lt;/p>
&lt;ul>
&lt;li>Persistent storage with LevelDB&lt;/li>
&lt;li>Real-time indexing&lt;/li>
&lt;li>Fielded queries, faceting, filtering&lt;/li>
&lt;li>Advanced queries (Boolean, range, etc.)&lt;/li>
&lt;li>Slightly heavier setup&lt;/li>
&lt;li>Good for offline/local-first apps&lt;/li>
&lt;li>Browser usage more complex than others&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Node.js apps, &lt;strong>not directly compatible with the Electron + React environment of StatWrap&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
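The practical difference between Fuse.js's linear scan and index-based engines like MiniSearch or FlexSearch can be sketched in plain JavaScript. This is a toy illustration under stated assumptions, not the libraries' actual code: a linear scan re-examines every document per query, while an inverted index pays an indexing cost once and then answers queries by lookup.

```javascript
// Toy comparison of the two search strategies discussed above
// (illustrative only; not how Fuse.js or MiniSearch are implemented).

const docs = [
  { id: 1, title: "Project notes", content: "regression analysis in R" },
  { id: 2, title: "Data cleaning", content: "pandas scripts for csv files" },
  { id: 3, title: "Analysis plan", content: "mixed models and regression" },
];

// Linear scan (Fuse.js-style): O(documents) work on every query.
function linearScan(query) {
  const q = query.toLowerCase();
  return docs
    .filter(d => (d.title + " " + d.content).toLowerCase().includes(q))
    .map(d => d.id);
}

// Inverted index (MiniSearch/FlexSearch-style): token -> document ids,
// built once up front.
const index = new Map();
for (const d of docs) {
  for (const token of (d.title + " " + d.content).toLowerCase().split(/\W+/)) {
    if (!token) continue;
    if (!index.has(token)) index.set(token, new Set());
    index.get(token).add(d.id);
  }
}

// A query is now a single Map lookup instead of a full scan.
function indexedSearch(token) {
  return [...(index.get(token.toLowerCase()) ?? [])];
}

console.log(linearScan("regression"));    // [1, 3]
console.log(indexedSearch("regression")); // [1, 3]
```

Both strategies return the same matches here; the difference only shows at scale, which is why a linear scan is listed above as a drawback of Fuse.js for large datasets.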
&lt;h3 id="3-developer-experience-and-maintenance">3. Developer Experience and Maintenance&lt;/h3>
&lt;p>We analyzed the download trends of the search libraries using npm trends and reviewed their maintenance statistics to assess how frequently each library is updated.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="DOWNLOADS" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_2981b0e25cc7e6da71dd1af69f1ab499.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_52b5a1c87803e2c8a2f59ad52703cd75.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_2981b0e25cc7e6da71dd1af69f1ab499.webp"
width="760"
height="362"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;br>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Maintenance" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_50f35746c2224661759e3d1f68308f5c.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_1f83a8585ae086eae8ad16a0d18c8fff.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_50f35746c2224661759e3d1f68308f5c.webp"
width="760"
height="261"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="4-comparative-analysis-after-testing">4. Comparative Analysis After Testing&lt;/h3>
&lt;p>Each search library was benchmarked against a predefined set of queries using the same evaluation criteria.&lt;br>
We have yet to finalize the weights for each criterion; this will be done during the end-term evaluation.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="COMPARATIVE ANALYSIS" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_cf08ab4466e54fc0970dac451ab583d2.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_4d08ea843125818ade4b1288b2ed91fd.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_cf08ab4466e54fc0970dac451ab583d2.webp"
width="760"
height="578"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="5-the-user-interface">5. The User Interface&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="User Interface" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_ad72fdc47d934ea42f989055b49d88aa.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_51decc3c2ce6793ca567153dd67113d0.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_ad72fdc47d934ea42f989055b49d88aa.webp"
width="760"
height="475"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;br>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Debug Tools" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_e86edc8fa7aba824f1fd8a90948c619c.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_ba6358e5089040847a0e39704677cc12.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_e86edc8fa7aba824f1fd8a90948c619c.webp"
width="760"
height="482"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The user interface offers three search modes (Basic, Advanced, and Boolean operators) with configurable parameters. Results are sorted by relevance score (highest first) and grouped by category.&lt;/p>
&lt;h3 id="6-overall-functioning">6. Overall Functioning&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Indexing Workflow&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Projects are processed sequentially&lt;/li>
&lt;li>Metadata, files, people, and notes are indexed (larger files are queued for later)&lt;/li>
&lt;li>Uses a &amp;ldquo;brute-force&amp;rdquo; recursive approach to walk through project directories
&lt;ul>
&lt;li>Skips directories like &lt;code>node_modules&lt;/code>, &lt;code>.git&lt;/code>, &lt;code>.statwrap&lt;/code>&lt;/li>
&lt;li>Identifies eligible text files for indexing&lt;/li>
&lt;li>Logs progress every 10 files&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Document Creation Logic&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Reads file content as UTF-8 text&lt;/li>
&lt;li>Builds searchable documents with filename, content, and metadata&lt;/li>
&lt;li>Auto-generates tags based on content and file type&lt;/li>
&lt;li>Adds documents to the search index and document store&lt;/li>
&lt;li>Handles errors gracefully with debug logging&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Search Functionality&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Uses field-weighted search&lt;/li>
&lt;li>Enriches results with document metadata&lt;/li>
&lt;li>Supports filtering by type or project&lt;/li>
&lt;li>Groups results by category (files, projects, people, etc.)&lt;/li>
&lt;li>Implements caching for improved performance&lt;/li>
&lt;li>Search statistics are generated to monitor performance&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="challenges-and-end-term-goals">Challenges and End-Term Goals&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>In-Memory Index and Metadata Storage&lt;/strong>&lt;br>
Most JavaScript search libraries (such as Fuse.js, Lunr, and MiniSearch) store their indexes entirely in memory, which can become problematic for large datasets. A key challenge is designing a scalable solution that allows disk persistence or lazy loading to prevent memory overflows.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Deciding the Weights Accordingly&lt;/strong>&lt;br>
An important challenge is tuning the relevance scoring by assigning appropriate weights to different aspects of the search, such as exact word matches, prefix matches, and typo tolerance. For instance, we prefer exact matches to be ranked higher than fuzzy or partial matches.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Implementing the Selected Library&lt;/strong>&lt;br>
Once a library is selected (based on speed, features, and compatibility with Electron + React), the next challenge is integrating it into StatWrap efficiently—ensuring local indexing, accurate search results, and smooth performance even with large projects.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Classifying Active and Past Projects in the User Interface&lt;/strong>&lt;br>
To improve navigation and search scoping, we plan to introduce three project sections in the interface: &lt;strong>Pinned&lt;/strong>, &lt;strong>Active&lt;/strong>, and &lt;strong>Past&lt;/strong> projects. This classification will help users prioritize relevant content while enabling smarter indexing strategies.&lt;/p>
&lt;/li>
&lt;/ul>
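The weight-tuning challenge above can be made concrete with a small sketch. The weights and the crude fuzzy test here are placeholder assumptions to be tuned during evaluation, not StatWrap's final values; the only requirement taken from the post is that exact matches outrank prefix matches, which outrank fuzzy matches.

```javascript
// Illustrative relevance weighting (placeholder values, not final).
const WEIGHTS = { exact: 3.0, prefix: 1.5, fuzzy: 0.5 };

function scoreToken(queryToken, docToken) {
  if (docToken === queryToken) return WEIGHTS.exact;
  if (docToken.startsWith(queryToken)) return WEIGHTS.prefix;
  // Crude fuzzy test: same first letter, length differs by at most 1.
  if (docToken[0] !== queryToken[0]) return 0;
  if (Math.abs(docToken.length - queryToken.length) > 1) return 0;
  return WEIGHTS.fuzzy;
}

// A document's score is the sum over its tokens.
function score(query, docTokens) {
  let total = 0;
  for (const t of docTokens) total += scoreToken(query, t);
  return total;
}

// exact (3.0) + prefix (1.5) + fuzzy (0.5)
console.log(score("regress", ["regress", "regression", "refresh"])); // 5
```

Adjusting the three weights changes the ranking behavior directly, which is why finalizing them is deferred to the end-term evaluation.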
&lt;p>Stay tuned for the next blog!&lt;/p></description></item><item><title>MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250614-rohan-babbar/</link><pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250614-rohan-babbar/</guid><description>&lt;p>Hi Everyone,&lt;/p>
&lt;p>I’m Rohan Babbar from Delhi, India. This summer, I’m excited to be working with the Argonne National Laboratory and the Chameleon Cloud community. My &lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">project&lt;/a> focuses on developing an MPI Appliance to support reproducible High-Performance Computing (HPC) research on the Chameleon testbed.&lt;/p>
&lt;p>For more details about the project and the planned work for the summer, you can read my proposal &lt;a href="https://docs.google.com/document/d/1iOx95-IcEOSVxpOkL20-jT5SSDOwBiP78ysSUNpRwXs/edit?usp=sharing" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;h3 id="-community-bonding-period">👥 Community Bonding Period&lt;/h3>
&lt;p>Although the project officially started on June 2, 2025, I made good use of the community bonding period beforehand.&lt;/p>
&lt;ul>
&lt;li>I began by getting access to the Chameleon testbed, familiarizing myself with its features and tools.&lt;/li>
&lt;li>I experimented with different configurations to understand the ecosystem.&lt;/li>
&lt;li>My mentor, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ken-raffenetti/">Ken Raffenetti&lt;/a>, and I had regular check-ins to align our vision and finalize our milestones, many of which were laid out in my proposal.&lt;/li>
&lt;/ul>
&lt;h3 id="-june-2--june-14-2025">🔧 June 2 – June 14, 2025&lt;/h3>
&lt;p>Our first milestone was to build a base image with MPI pre-installed. For this:&lt;/p>
&lt;ul>
&lt;li>We decided to use &lt;a href="https://spack.io/" target="_blank" rel="noopener">Spack&lt;/a>, a flexible package manager tailored for HPC environments.&lt;/li>
&lt;li>The image includes multiple MPI implementations, allowing users to choose the one that best suits their needs and switch between them using simple &lt;a href="https://lmod.readthedocs.io/en/latest/" target="_blank" rel="noopener">Lua Module&lt;/a> commands.&lt;/li>
&lt;/ul>
&lt;p>📌 That’s all for now! Stay tuned for more updates in the next blog.&lt;/p>
&lt;p>Thanks for reading!&lt;/p></description></item><item><title>ML-Powered Problem Detection in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/</link><pubDate>Fri, 18 Oct 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/syed-mohammad-qasim/">Syed Mohammad Qasim&lt;/a>, a PhD candidate at the Department of Electrical and Computer Engineering, Boston University.
This summer I worked on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/ml_detect_chameleon/">ML-Powered Problem Detection in Chameleon&lt;/a>
as part of the Summer of Reproducibility (SoR) program with the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ayse-coskun/">Ayse Coskun&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-sherman/">Michael Sherman&lt;/a>.&lt;/p>
&lt;p>Chameleon is an open testbed that has supported over 5,000 users working on more than 500 projects.
It provides access to over 538 bare metal nodes across various sites, offering approximately 15,000 CPU cores and 5 petabytes of storage.
Each site runs independent OpenStack services to deliver its offerings.
Currently, Chameleon Cloud comprehensively monitors the sites at the Texas Advanced Computing Center (TACC) and the University of Chicago.
Metrics are collected using Prometheus at each site and fed into a central Mimir cluster.
All logs are sent to a central Loki, with Grafana used for visualization and alerting.
Chameleon currently collects around 3,000 metrics. Manually reviewing and setting alerts for them is time-consuming and labor-intensive.
This project aims to help Chameleon operators monitor their systems more effectively and improve overall reliability by creating an anomaly detection service to augment the existing alerting framework.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="High level data flow" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_deb097bd50da0d94a76fc0dc7719233e.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_deeb0941e942a319e1cc5a8b743b6993.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_deb097bd50da0d94a76fc0dc7719233e.webp"
width="760"
height="412"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Over the summer, we focused on analyzing the data and, after discussions with Chameleon operators, identified 33 key metrics from the Prometheus Node Exporter that serve as leading indicators of resource usage on the nodes. For example:&lt;/p>
&lt;ul>
&lt;li>CPU usage: Metrics like &lt;code>node_load1&lt;/code>, &lt;code>node_load5&lt;/code>, and &lt;code>node_load15&lt;/code>.&lt;/li>
&lt;li>Memory usage: Including buffer utilization.&lt;/li>
&lt;li>Disk usage: Metrics for I/O time and read/write byte rates.&lt;/li>
&lt;li>Network activity: Rate of bytes received and transmitted.&lt;/li>
&lt;li>Filesystem metrics: Such as &lt;code>inode_utilization_ratio&lt;/code> and &lt;code>node_procs_blocked&lt;/code>.&lt;/li>
&lt;li>System-level metrics: Including node forks, context switches, and interrupts.&lt;/li>
&lt;/ul>
&lt;p>Collected every 5 minutes, these metrics provide a comprehensive view of node performance and resource consumption.
After finalizing the metrics to monitor, we selected the following four anomaly detection methods, primarily for their popularity in academia and their publication at high-impact venues such as SIGKDD and SC.&lt;/p>
&lt;ul>
&lt;li>OmniAnomaly [KDD 2019] (without POT threshold selection, as it requires labels)&lt;/li>
&lt;li>USAD [KDD 2020]&lt;/li>
&lt;li>TranAD [KDD 2022]&lt;/li>
&lt;li>Prodigy [SC 2023] (only the VAE; not using their feature selection, as it requires labels)&lt;/li>
&lt;/ul>
&lt;p>We collected 75 days of healthy data from Chameleon, and after applying min-max scaling, we trained the models.
We then used these models to run inference on the metrics collected during outages, as marked by Chameleon operators.
The goal was to determine whether the outage data revealed something interesting or anomalous.
We can verify our approach by manually reviewing the results generated by these four anomaly detection methods.
Below are the results from the four methods on different outages, followed by an example of how these methods identified the root cause of an anomaly.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Resulsts of different approaches" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_adb242a18524d714dae87d46b29e1612.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_9dcdbbc6bac285c06195f54d49bd5ffe.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_adb242a18524d714dae87d46b29e1612.webp"
width="760"
height="355"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The above figure shows the percentage of outage data that was flagged as anomalous by different models.&lt;/p>
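The preprocessing and flagging pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions: min-max scaling uses the range of the healthy training data, and the anomaly scores and threshold below are placeholders, not outputs of the actual models.

```javascript
// Min-max scale a metric's values into [0, 1], as done before training.
function minMaxScale(values) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  if (hi === lo) return values.map(() => 0); // constant metric: no spread
  return values.map(v => (v - lo) / (hi - lo));
}

// Fraction of points flagged anomalous, mirroring the
// "% of outage data flagged" comparison in the figure above.
function anomalyRate(scores, threshold) {
  const flagged = scores.filter(s => s > threshold).length;
  return flagged / scores.length;
}

const scaled = minMaxScale([1, 3, 5]);                 // [0, 0.5, 1]
const rate = anomalyRate([0.1, 0.9, 0.8, 0.2], 0.5);   // 0.5
```

In the real pipeline, the per-timestep scores would come from the reconstruction error of OmniAnomaly, USAD, TranAD, or the Prodigy VAE rather than from a fixed array.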
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="cause of anomaly according to each model" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_6edd22782678b48ce3a7cebad859b982.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_3da0e020cdc4ddfd508b77b6a0adc3d2.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_6edd22782678b48ce3a7cebad859b982.webp"
width="760"
height="532"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="cause of anomaly according to each model" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_03e6f344d24d9b37a7d615ee3207586b.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_6193a4b7b9107cb2693435514d80d21d.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_03e6f344d24d9b37a7d615ee3207586b.webp"
width="760"
height="532"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The above two plots show two examples of the top 5 metrics that contributed to the anomaly score for each anomaly detection model.&lt;/p>
&lt;p>Although the methods seem to indicate anomalies during outages, they are not able to pinpoint the affected service or the exact cause.
For example, the first partial authentication outage was due to a DNS error, which can manifest in various ways, such as reduced CPU, memory, or network usage.
This work is still in progress, and we are conducting the same analysis on container-level metrics for each service, allowing us to narrow the scope to the affected service and more effectively identify the root cause of anomalies.
We will share the next set of results soon.&lt;/p>
&lt;p>Thanks for your time, please feel free to reach out to me for any details or questions.&lt;/p></description></item><item><title>Data Leakage in Applied ML: model uses features that are not legitimate</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240924-shaivimalik/</link><pubDate>Tue, 24 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240924-shaivimalik/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I have been working on reproducing the results from &lt;strong>Identification of COVID-19 Samples from Chest X-Ray Images Using Deep Learning: A Comparison of Transfer Learning Approaches&lt;/strong>. This study aimed to distinguish COVID-19 cases from normal and pneumonia cases using chest X-ray images. Since my last blog post, we have successfully reproduced the results using the VGG19 model, achieving a 92% accuracy on the test set. However, a significant demographic inconsistency exists: normal and pneumonia chest X-ray images were from pediatric patients, while COVID-19 chest X-ray images were from adults. This allowed the model to achieve high accuracy by learning features that were not clinically relevant.&lt;/p>
&lt;p>In &lt;a href="https://github.com/shaivimalik/covid_illegitimate_features/blob/main/notebooks/Correcting_Original_Result.ipynb" target="_blank" rel="noopener">Reproducing “Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches” without Data Leakage&lt;/a>, we followed the methodology outlined in the paper, but with a key change: we used datasets containing adult chest X-ray images. This time, the model achieved an accuracy of 51%, a 41-percentage-point drop from the earlier results, confirming that the metrics reported in the paper were overly optimistic due to data leakage, where the model learned illegitimate features.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="GradCAM from husky vs wolf example " srcset="
/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_7bc94e0816aa962665434756bf41e27d.webp 400w,
/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_a160058d1708baa257daa63de5fada34.webp 760w,
/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_7bc94e0816aa962665434756bf41e27d.webp"
width="760"
height="329"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To further illustrate this issue, we created a &lt;a href="https://github.com/shaivimalik/covid_illegitimate_features/blob/main/notebooks/Exploring_ConvNet_Activations.ipynb" target="_blank" rel="noopener">toy example&lt;/a> demonstrating how a model can learn illegitimate features. Using a small dataset of wolf and husky images, the model achieved an accuracy of 90%. We then revealed that this performance was due to a data leakage issue: all wolf images had snowy backgrounds, while husky images had grassy backgrounds. When we trained the model on a dataset where both wolf and husky images had white backgrounds, the accuracy dropped to 70%. This shows that the accuracy obtained earlier was an overly optimistic measure due to data leakage.&lt;/p>
&lt;p>You can explore our work on the COVID-19 paper &lt;a href="https://github.com/shaivimalik/covid_illegitimate_features" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Lastly, I would like to thank &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a> for their support and guidance throughout my SoR journey.&lt;/p></description></item><item><title>Final Post: Enhancing Reproducibility and Portability in Network Experiments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240905-warmuth/</link><pubDate>Thu, 05 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240905-warmuth/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>As my project with the Summer of Reproducibility (SoR) 2024 comes to a close, I’d like to reflect on the journey and the outcomes achieved. My project focused on &lt;strong>enhancing the reproducibility and portability of network experiments&lt;/strong> by integrating the &lt;strong>RO-Crate standard&lt;/strong> into the &lt;strong>TUM-internal testbed pos (plain orchestrating service)&lt;/strong>, and deploying this testbed on the &lt;strong>Chameleon cloud infrastructure&lt;/strong>. The aim was to ensure that experiments conducted on one platform could be seamlessly reproduced on another, adhering to the &lt;strong>FAIR principles&lt;/strong> (Findable, Accessible, Interoperable, Reusable) for research data.&lt;/p>
&lt;h2 id="project-recap">Project Recap&lt;/h2>
&lt;p>The core goal was to make the experiments reproducible and portable between different testbeds like TUM’s pos and Chameleon. To achieve this, I integrated the &lt;strong>RO-Crate standard&lt;/strong>, which ensures that all experiment data is automatically documented and stored with metadata, making it easier for others and especially for machines to understand, replicate, and build on the results. Additionally, deploying a lightweight version of pos on the &lt;strong>Chameleon testbed&lt;/strong> enabled cross-testbed execution, allowing experiments to be replicated across both environments without significant modifications.&lt;/p>
&lt;h2 id="key-achievements">Key Achievements&lt;/h2>
&lt;p>Over the course of the project, several key milestones were achieved:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>RO-Crate Integration&lt;/strong>: The first step was restructuring the results folder and automating the generation of metadata using RO-Crate. This ensured that all experiment data was comprehensively documented with details like author information, hardware configurations, and experiment scripts, resulting in a comprehensive &lt;code>ro-crate-metadata.json&lt;/code> file as an important part of each result folder.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Improved Data Management&lt;/strong>: The integration of RO-Crate greatly simplified organizing and retrieving experiment data, with metadata describing both the experiment and its result files. All metadata was generated automatically, making the experiments easier to share and document for other researchers to replicate.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Automatic Upload to Zenodo&lt;/strong>: Another crucial achievement was the implementation of automatic uploading of pos experiment result folders to &lt;strong>Zenodo&lt;/strong>, an open-access repository. This step significantly improved the reproducibility and sharing of experiment results, making them easily accessible to the broader scientific community. By utilizing Zenodo, we ensured that experiment results, along with their RO-Crate metadata, could be archived and referenced, fostering greater transparency and collaboration in scientific research.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chameleon Deployment&lt;/strong>: Deploying the pos testbed within the Chameleon environment required managing various complexities, particularly related to Chameleon’s OpenStack API, networking setup, and hardware configurations. Coordinating the network components and infrastructure to support pos functionality in this testbed environment demanded significant adjustments to ensure smooth integration and operation.&lt;/p>
&lt;/li>
&lt;/ul>
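To make the RO-Crate integration concrete, here is a hand-rolled sketch of a minimal ro-crate-metadata.json for a result folder, following the RO-Crate 1.1 structure (a metadata descriptor plus a root Dataset). In the project this file is generated automatically by the pos integration; the author name, dataset name, and file name below are illustrative placeholders.

```javascript
// Minimal RO-Crate 1.1 metadata document (illustrative values).
const metadata = {
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      // The metadata descriptor, pointing at the root data entity.
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
      "about": { "@id": "./" },
    },
    {
      // The root Dataset: the experiment result folder itself.
      "@id": "./",
      "@type": "Dataset",
      "name": "Example pos experiment results", // illustrative
      "author": { "@id": "#experimenter" },
      "hasPart": [{ "@id": "measurements.csv" }], // illustrative
    },
    { "@id": "#experimenter", "@type": "Person", "name": "Jane Doe" },
    { "@id": "measurements.csv", "@type": "File" },
  ],
};

const json = JSON.stringify(metadata, null, 2);
```

A real crate would add hardware configuration, experiment scripts, and licensing entities to the same `@graph`, which is exactly the information the automated generation captures.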
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>Like any project, this one came with its own set of challenges:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Balancing Automation and Flexibility&lt;/strong>: While automating the generation of RO-Crate metadata, it was crucial to ensure that the flexibility required by researchers for customizing their documentation was not compromised. Finding this balance required in-depth adjustments to the testbed infrastructure.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Complexity of Testbed Systems&lt;/strong>: Integrating RO-Crate into a complex system like pos, and ensuring it works seamlessly with Chameleon, involved understanding and adapting to the complexities of both testbeds.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="future-directions">Future Directions&lt;/h2>
&lt;p>As I continue working on these challenges in my master&amp;rsquo;s thesis, we plan to expand on this work by:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Extending the Chameleon Deployment&lt;/strong>: We aim to deploy the full version of pos on Chameleon, supporting more complex and larger-scale experiments.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Supporting Complex Experiment Workflows&lt;/strong>: Future work will focus on handling more intricate and larger datasets, ensuring reproducibility for complex workflows. Only by executing more complex experiments will we be able to thoroughly analyze and compare the differences between executions in pos and the pos deployed on Chameleon, helping us better understand the impact of different testbed environments on experiment outcomes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Automation&lt;/strong>: The ultimate goal is to fully automate the process of experiment execution, result documentation, and sharing across testbeds, reducing manual intervention and further enhancing reproducibility.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="reflections">Reflections&lt;/h2>
&lt;p>By integrating the RO-Crate standard and deploying pos on the Chameleon testbed, we have made significant steps toward enhancing the reproducibility, accessibility, and portability of network experiments across research platforms. These efforts contribute to more shareable, and replicable research processes in the scientific community.&lt;/p>
&lt;p>I am excited about the future work ahead and am grateful for the mentorship and support I received during this project.&lt;/p>
&lt;h2 id="deliverables-and-availability">Deliverables and Availability&lt;/h2>
&lt;p>Due to the current non-public status of the pos framework, &lt;strong>the code and deliverables are not publicly available&lt;/strong> at the moment.&lt;/p>
&lt;h2 id="previous-blogs">Previous Blogs&lt;/h2>
&lt;p>Make sure to check out my other blogs to see how I started this project and the challenges I faced along the way:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240517-warmuth/">Introduction&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240716-warmuth/">Midterm Blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Servus!&lt;/p></description></item><item><title>AutoAppendix: Towards One-Click reproducibility of high-performance computing experiments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240904-kkrassni/</link><pubDate>Wed, 04 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240904-kkrassni/</guid><description>&lt;p>Hi everyone,&lt;/p>
&lt;p>I&amp;rsquo;m excited to wrap up the AutoAppendix project with our final findings and
insights. Over the course of this initiative, we’ve worked to assess the
reproducibility of artifacts submitted to the SC24 conference and create
guidelines that aim to improve the standard for reproducible experiments in the
future. Here&amp;rsquo;s a summary of the project&amp;rsquo;s final phase and what we’ve learned.&lt;/p>
&lt;h2 id="project-goals-and-progress">Project Goals and Progress&lt;/h2>
&lt;p>The goal of AutoAppendix was to evaluate the computational artifacts provided by
SC24 paper submissions, focusing on reproducibility. These artifacts accompany
papers applying for the &amp;ldquo;Artifact Replicable&amp;rdquo; badge in the conference&amp;rsquo;s
reproducibility initiative. Volunteer members of this initiative assess 1-2 paper appendices each. In this project, we analyzed a larger portion of artifacts to gain a broader perspective on potential improvements to the reproducibility process.&lt;/p>
&lt;p>We selected 18 out of 45 submissions, focusing on experiments that could be
easily replicated on Chameleon Cloud. Our evaluation criteria were based on
simplicity (single-node setups) and availability of resources. The final
analysis expanded on the earlier midterm findings, shedding light on various
challenges and best practices related to artifact reproducibility.&lt;/p>
&lt;h2 id="artifact-evaluation-process">Artifact Evaluation Process&lt;/h2>
&lt;p>During the evaluation process, we focused on examining the completeness and
clarity of the provided artifacts, looking closely at documentation, setup
instructions, and the degree of automation.&lt;/p>
&lt;p>Our first step was to replicate the environments used in the original
experiments as closely as possible using the resources from Chameleon. Many papers included instructions for creating the necessary software environments,
but the clarity of these instructions varied significantly across submissions.
In some cases, we even encountered challenges in reproducing results due to unclear
instructions or missing dependencies, which reinforced the need for
standardized, clear documentation as part of the artifact submission process.&lt;/p>
&lt;p>We observed that &lt;em>containerization&lt;/em> and &lt;em>semi-automated setups&lt;/em> (with scripts
that break down the experiment into smaller steps) were particularly effective
in enhancing the reproducibility of the artifacts. One artifact
particularly caught our attention due to its usage of the Chameleon JupyterHub
platform, making it reproducible with a &lt;em>single click&lt;/em>. This highlighted the
potential for
streamlining the reproducibility process and showcased that, with sufficient
effort and the right tools, experiments can indeed be made replicable by
&lt;em>anyone&lt;/em>.&lt;/p>
&lt;h2 id="results">Results&lt;/h2>
&lt;p>Throughout the evaluation, we observed that reproducibility could vary widely
based on the clarity and completeness of the documentation and the automation of
setup procedures. Artifacts that were structured with clear, detailed steps for
installation and execution tended to perform well in terms of replicability.&lt;/p>
&lt;p>From our evaluation, we derived a set of guidelines (intended as must-haves) and
best practices (recommended) for artifact reproducibility, which can be found below.&lt;/p>
&lt;p>Fascinated by the potential of the Chameleon JupyterHub platform and its adjacent &lt;a href="https://www.chameleoncloud.org/experiment/share/" target="_blank" rel="noopener">Trovi&lt;/a> artifact repository, we decided to create
several templates that can be used as a starting point for authors to make integration
of their artifacts with the platform easier. In the design of these templates,
we made sure that artifacts structured according to our guidelines are
particularly easy to integrate.&lt;/p>
&lt;h3 id="guidelines">Guidelines&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Clear Documentation&lt;/strong>: Provide clear and detailed documentation for the artifact in the corresponding appendix, such that the artifact can be replicated without the need for additional information. For third-party software, it is acceptable to refer to the official documentation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Software Setup&lt;/strong>: Clearly specify the versions of all (necessary) software components used
in the creation of the artifact. This includes the operating system, libraries, and tools.
In particular, state all setup steps required to replicate the software environment.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Hardware Specifications&lt;/strong>: Specify the hardware the experiment was conducted on. Importantly,
state the architecture the experiments are intended to run on, and ensure that
provided software (e.g. Docker images) is compatible with commonly available
architectures.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Expected Results&lt;/strong>: Always provide the expected outputs of the experiment, especially when run on different hardware, to make it easier for reviewers to assess the success of the replication.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Public Data&lt;/strong>: Publish the experiment data to a public repository, and make
sure the data is available for download to reviewers and readers, especially during
the evaluation period. Zenodo is a recommended repository for this purpose.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Automated Reproducibility&lt;/strong>: For long-running experiments, provide
progress output so the reviewer can verify that the experiment is running as expected.
In the documentation, give an idea of:&lt;/p>
&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>how much time long-running steps in the reproduction will take&lt;/li>
&lt;li>what the progress output looks like or how frequently it is emitted&lt;/li>
&lt;/ul>
&lt;ol start="7">
&lt;li>&lt;strong>Sample Execution&lt;/strong>: Conduct a sample evaluation with hardware and software
as similar as possible to the intended reproduction environment.&lt;/li>
&lt;/ol>
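&lt;p>As a minimal, hypothetical sketch of the progress-output guideline (all names here are illustrative, not taken from any submitted artifact), a long-running step can emit timestamped progress lines at a fixed interval so a reviewer can see it is still alive:&lt;/p>

```python
import time

def run_long_step(n_iters, report_every=2):
    # Stand-in for a long-running experiment step; emits a progress line
    # every `report_every` iterations and at completion.
    start = time.time()
    progress = []
    for i in range(1, n_iters + 1):
        time.sleep(0.01)  # placeholder for real work
        if i % report_every == 0 or i == n_iters:
            elapsed = time.time() - start
            progress.append(f"[{elapsed:6.2f}s] step {i}/{n_iters}")
    return progress

for line in run_long_step(6):
    print(line)
```

&lt;p>Documenting the frequency and shape of these lines (as in the two bullet points of guideline 6) lets the reviewer tell a slow experiment apart from a hung one.&lt;/p>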
&lt;h3 id="best-practices">Best Practices&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Reproducible Environment&lt;/strong>:
Use a reproducible environment for the artifact. This can come in several forms:&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;strong>Containerization&lt;/strong>: Provide instructions for building the environment or,
ideally, provide a ready-to-use image. For example, Docker, Singularity, or VirtualBox images can be used for this purpose.&lt;/li>
&lt;li>&lt;strong>Reproducible Builds&lt;/strong>: Package managers like &lt;a href="https://nixos.org/" target="_blank" rel="noopener">Nix&lt;/a> or &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> have recently spiked in popularity and allow their users to create reproducible environments, matching the exact software versions across different systems.&lt;/li>
&lt;/ul>
&lt;ol start="2">
&lt;li>
&lt;p>&lt;strong>Partial Automation&lt;/strong>: It often makes sense to break an experiment down into
smaller, more manageable steps. For Linux-based systems, bash scripts are particularly viable for this purpose. We recommend prefixing the scripts for each step with
a number, such that the order of execution is clear.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>X11 Availability&lt;/strong>: Usually, reviewers will not have access to a graphical user
interface on the system where the artifact is evaluated. If the artifact requires a
graphical user interface, provide a way to run the artifact without it. For example,
save &lt;code>matplotlib&lt;/code> plots to disk instead of showing them with &lt;code>plt.show()&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Experiment output&lt;/strong>: Do not provide output files of the experiment in your artifact,
unless explicitly intended. If provided output files are intended for comparison,
they should be marked as such (e.g. in their filename). Similarly, any output logs
or interactive outputs in Jupyter notebooks should not be part of the artifact, but
rather be generated during the artifact evaluation.&lt;/p>
&lt;/li>
&lt;/ol>
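&lt;p>The partial-automation practice can be sketched as follows; the step scripts and their names are purely illustrative (in a real artifact they would ship with the repository), the point is only that a numeric prefix makes the execution order unambiguous:&lt;/p>

```python
import subprocess
from pathlib import Path

# Create three dummy step scripts standing in for a real artifact's
# setup/run/analysis stages (names are hypothetical).
for name, cmd in [("01_setup.sh", "echo setup"),
                  ("02_run.sh", "echo run"),
                  ("03_analyze.sh", "echo analyze")]:
    Path(name).write_text(cmd + "\n")

# Lexicographic sorting on the numeric prefix fixes the execution order.
step_names = []
for script in sorted(Path(".").glob("[0-9][0-9]_*.sh")):
    subprocess.run(["sh", str(script)], check=True)
    step_names.append(script.name)
print(step_names)
```

&lt;p>A reviewer can then either run the steps one by one or drive them all from a single loop like the one above.&lt;/p>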
&lt;h3 id="trovi-templates">Trovi Templates&lt;/h3>
&lt;p>Our templates share a common base that features
a &lt;em>central configuration file&lt;/em> for modifying the
Chameleon experiment parameters (such as node type). Building on this base, we provide three templates with sample experiments that each use different environments:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Docker template&lt;/strong>: This template is designed for containerized experiments and supports NVIDIA GPUs via the &lt;code>nvidia-container-toolkit&lt;/code> integration.&lt;/li>
&lt;li>&lt;strong>Nix template&lt;/strong>: Sets up the Nix package manager with a &lt;code>shell.nix&lt;/code> file that can be used to configure the environment.&lt;/li>
&lt;li>&lt;strong>Guix template&lt;/strong>: Installs the Guix package manager and executes a sample experiment from an existing reproducible paper that hinges on the reproducibility of the software environment.&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>In summary, the AutoAppendix project has been an insightful journey into the
complexities of artifact reproducibility. Our evaluations highlight both the
challenges and potential solutions for future reproducibility initiatives. By
following these essential guidelines and implementing best practices, we aim for the
research community to achieve higher standards of transparency and reliability
in scientific research and help to ensure that the results of experiments can be replicated by others.&lt;/p>
&lt;p>Thanks for following along with our progress! We’re excited to see the positive
impact these findings will have on the research community.&lt;/p>
&lt;p>If you are interested in the full project report, you can find it &lt;a href="https://drive.google.com/drive/folders/113OsxGAlfyvlJnvpH5zL2XD-8gE3CYyu?usp=sharing" target="_blank" rel="noopener">here&lt;/a>, together with the &lt;em>Trovi&lt;/em> templates.&lt;/p></description></item><item><title>Final Blogpost: Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240828-triveni5/</link><pubDate>Wed, 28 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240828-triveni5/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Triveni, a Master&amp;rsquo;s student in Computer Science at Northern Illinois University (NIU). I&amp;rsquo;m excited to share my progress on the OSRE 2024 project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Categorize Differences in Reproduced Visualizations&lt;/a> focusing on data visualization reproducibility. Working under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>, I&amp;rsquo;ve made some significant strides and faced some interesting challenges.&lt;/p>
&lt;h1 id="reproducibility-in-data-visualization">Reproducibility in data visualization&lt;/h1>
&lt;p>Reproducibility is crucial in data visualization, ensuring that two visualizations accurately convey the same data. This is essential for maintaining transparency and trust in data-driven decision-making. When comparing two visualizations, the challenge is not just spotting differences but determining which differences are meaningful. Tools like OpenCV are often used for image comparison, but they may detect all differences, including those that do not impact the data&amp;rsquo;s interpretation. For example, slight shifts in labels might be flagged as differences even if the underlying data remains unchanged, making it challenging to assess whether the visualizations genuinely differ in terms of the information they convey.&lt;/p>
&lt;h1 id="a-breakthrough-with-chartdetective">A Breakthrough with ChartDetective&lt;/h1>
&lt;p>Among various tools like ChartOCR and ChartReader, ChartDetective proved to be the most effective. This tool enabled me to extract data from a range of visualizations, including bar charts, line charts, box plots, and scatter plots. To enhance its capabilities, I modified the codebase to capture pixel values alongside the extracted data and store both in a CSV file. This enhancement allowed for a direct comparison of data values and their corresponding pixel coordinates between two visualizations, focusing on meaningful differences that truly impact data interpretation.&lt;/p>
&lt;h1 id="example-comparing-two-bar-plots-with-chartdetective">Example: Comparing Two Bar Plots with ChartDetective&lt;/h1>
&lt;p>Consider two bar plots that visually appear similar but have slight differences in their data values. Using ChartDetective, I extracted the data and pixel coordinates from both plots and stored this information in a CSV file. The tool then compared these values to identify any discrepancies.&lt;/p>
&lt;p>For instance, in one bar plot, the height of a specific bar was slightly increased. By comparing the CSV files generated by ChartDetective, I was able to pinpoint these differences precisely. The final step involved highlighting these differences on one of the plots using OpenCV, making it clear where the visualizations diverged. This approach ensures that only meaningful differences—those that reflect changes in the data—are considered when assessing reproducibility.&lt;/p>
&lt;ul>
&lt;li>ChartDetective: SVG or PDF file of the visualization is uploaded to extract data.&lt;/li>
&lt;/ul>
&lt;p align="center">
&lt;img src="./barplot_chartdetective.png" alt="ChartDetective" style="width: 80%; height: auto;">
&lt;/p>
&lt;ul>
&lt;li>Data Extraction: Data values along with pixel details are stored in the CSV files.&lt;/li>
&lt;/ul>
&lt;p align="center">
&lt;img src="./barplots_pixels.png" alt="Data_Extraction" style="width: 80%; height: auto;">
&lt;/p>
&lt;ul>
&lt;li>Highlighting the differences: Differences are highlighted on one of the plots using OpenCV.&lt;/li>
&lt;/ul>
&lt;p align="center">
&lt;img src="./Highlighted_differences.png" alt="Highlighting the differences" style="width: 60%; height: auto;">
&lt;/p>
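&lt;p>A minimal sketch of the comparison step described above (the CSV layout and column names are assumptions for illustration, not ChartDetective&amp;rsquo;s actual export format): only differences in the extracted data values are reported, while pixel-only shifts are deliberately ignored:&lt;/p>

```python
import csv
import io

# Hypothetical CSV exports: one row per bar, with the extracted data
# value and its pixel coordinates.
original = "bar,value,px_x,px_y\nA,10,50,200\nB,20,120,100\n"
reproduced = "bar,value,px_x,px_y\nA,10,50,200\nB,22,120,80\n"

def load_values(text):
    return {row["bar"]: float(row["value"])
            for row in csv.DictReader(io.StringIO(text))}

def meaningful_diffs(a, b, tol=0.0):
    # Report bars whose data values differ by more than `tol`;
    # pixel coordinates are not compared, so a mere label shift
    # or rendering offset does not count as a difference.
    return {k: (a[k], b[k]) for k in a if k in b and abs(a[k] - b[k]) > tol}

diffs = meaningful_diffs(load_values(original), load_values(reproduced))
print(diffs)  # only bar B differs in its underlying data
```

&lt;p>Bar B is flagged because its data value changed, while the pixel shift of bar A&amp;rsquo;s unchanged value would not be reported.&lt;/p>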
&lt;h1 id="understanding-user-perspectives-on-reproducibility">Understanding User Perspectives on Reproducibility&lt;/h1>
&lt;p>To complement the technical analysis, I created a pilot survey to understand how users perceive reproducibility in data visualizations. The survey evaluates user interpretations of two visualizations and explores which visual parameters impact their decision-making. This user-centered approach is crucial because even minor differences in visual representation can significantly affect how data is interpreted and used.&lt;/p>
&lt;p>Pilot Survey Example:&lt;/p>
&lt;p>Pixel Differences: In one scenario, the height of two bars was altered slightly, introducing a noticeable yet subtle change.&lt;/p>
&lt;p>Label Swapping: In another scenario, the labels of two bars were swapped without changing their positions or heights.&lt;/p>
&lt;p align="center">
&lt;img src="./barchart_labels_swap.png" alt="Label Swapping" style="width: 80%; height: auto;">
&lt;/p>
&lt;p>Participants will be asked to evaluate the reproducibility of these visualizations, considering whether the differences impact their interpretation of the data. The goal is to determine which visual parameters—such as bar height or label positioning—users find most critical when assessing the similarity of visualizations.&lt;/p>
&lt;h1 id="future-work-and-conclusion">Future Work and Conclusion&lt;/h1>
&lt;p>Going forward, I plan to develop a proof of concept based on these findings and implement an extensive survey to further explore the impact of visual parameters on users&amp;rsquo; perceptions of reproducibility. Understanding this will help refine tools and methods for comparing visualizations, ensuring they not only look similar but also accurately represent the same underlying data.&lt;/p></description></item><item><title>Final blog: Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240822-architd/</link><pubDate>Thu, 22 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240822-architd/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone,&lt;/p>
&lt;p>I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University (IIT BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic Reproducibility of COMPSs Experiments through the Integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the Project&lt;/h2>
&lt;p>The project proposes to create a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="final-product">Final Product&lt;/h2>
&lt;p align="center">
&lt;img src="./logo.png" alt="Logo" style="width: 60%; height: auto;">
&lt;/p>
&lt;p>The basic workflow of the COMPSs Reproducibility Service can be explained as follows:&lt;/p>
&lt;ol>
&lt;li>The service takes the workflow path or link as the first argument from the user.&lt;/li>
&lt;li>The program shifts the execution to a separate sub-directory, &lt;code>reproducibility_service_{timestamp}&lt;/code>, to store the results from the reproducibility process.&lt;/li>
&lt;li>Two main flags are required:
&lt;ul>
&lt;li>&lt;strong>Provenance flag&lt;/strong>: If you want to generate the provenance of the workflow via the runcompss runtime.&lt;/li>
&lt;li>&lt;strong>New Dataset flag&lt;/strong>: If you want to reproduce the experiment with a new dataset instead of the one originally used.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>If there are any remote datasets, they are fetched into the sub-directory.&lt;/li>
&lt;li>The main work begins with parsing the metadata from &lt;code>ro-crate-metadata.json&lt;/code> and verifying the files present inside the dataset, as well as any files downloaded as remote datasets. This step generates a status table for the user to check if any files are missing or have modified sizes.&lt;/li>
&lt;/ol>
&lt;p align="center">
&lt;img src="./status_table.png" alt="Status Table" style="width: 70%; height: auto;">
&lt;/p>
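&lt;p>The verification in step 5 can be sketched roughly as follows; the metadata layout here is heavily simplified (a real &lt;code>ro-crate-metadata.json&lt;/code> contains much more than a file list with sizes), but it shows the idea of checking every listed file for presence and size:&lt;/p>

```python
import json
from pathlib import Path

# Hypothetical, heavily simplified crate: one data file plus metadata.
Path("dataset.csv").write_text("a,b\n1,2\n")
Path("ro-crate-metadata.json").write_text(json.dumps(
    {"@graph": [{"@id": "dataset.csv", "contentSize": 8}]}))

def verify_crate(meta_path="ro-crate-metadata.json"):
    # Build a status table: OK, MISSING, or SIZE MISMATCH per listed file.
    graph = json.loads(Path(meta_path).read_text())["@graph"]
    status = {}
    for entity in graph:
        f = Path(entity["@id"])
        if not f.exists():
            status[entity["@id"]] = "MISSING"
        elif f.stat().st_size != entity.get("contentSize"):
            status[entity["@id"]] = "SIZE MISMATCH"
        else:
            status[entity["@id"]] = "OK"
    return status

print(verify_crate())
```

&lt;p>The resulting dictionary corresponds to the status table shown above, letting the user spot missing or modified files before the experiment is re-run.&lt;/p>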
&lt;ol start="6">
&lt;li>The final step is to transform the &lt;code>compss-command-line.txt&lt;/code> and all the paths specified inside it to match the local environment where the experiment will be reproduced. This includes:
&lt;ul>
&lt;li>Mapping the paths from the old machine to new paths inside the RO-Crate.&lt;/li>
&lt;li>Changing the runtime to &lt;code>runcompss&lt;/code> or &lt;code>enqueue_compss&lt;/code>, depending on whether the environment is a SLURM cluster.&lt;/li>
&lt;li>Detecting if the paths specified in the command line are for results, and redirecting them to new results inside the &lt;code>reproducibility_service_{timestamp}/Results&lt;/code> directory.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>After this, the service prompts the user to add any additional flags to the final command. Upon final verification, the command is executed via Python&amp;rsquo;s subprocess pipe.&lt;/li>
&lt;/ol>
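&lt;p>Step 6 boils down to path rewriting; a minimal sketch (the paths and directory names here are invented for illustration, not taken from the actual service) might look like:&lt;/p>

```python
def remap_command(command, old_root, new_root):
    # Replace absolute paths recorded on the original machine with the
    # corresponding paths inside the local reproduction directory.
    return command.replace(old_root, new_root)

cmd = "runcompss /home/alice/exp/app.py /home/alice/exp/data/in.txt"
local_cmd = remap_command(cmd, "/home/alice/exp",
                          "reproducibility_service_20240822/Workflow")
print(local_cmd)
```

&lt;p>The real service additionally swaps the runtime between &lt;code>runcompss&lt;/code> and &lt;code>enqueue_compss&lt;/code> and redirects result paths, but the core transformation is this kind of old-root to new-root mapping.&lt;/p>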
&lt;p align="center">
&lt;img src="./end.png" alt="End Image" style="width: 50%; height: auto;">
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Logging System&lt;/strong>: All logs related to the Reproducibility Service are stored inside the &lt;code>reproducibility_service_{timestamp}/log&lt;/code> directory.&lt;/li>
&lt;/ul>
&lt;p>You can view the basic &lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service/blob/main/pseudocode.txt" target="_blank" rel="noopener">pseudocode&lt;/a> of the service.&lt;/p>
&lt;h2 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h2>
&lt;p>It&amp;rsquo;s been a long journey since I started this project, and now it&amp;rsquo;s finally coming to an end. I have learned a lot from this experience, from weekly meetings with my mentor to working towards long-term goals—it has all been thrilling. I would like to thank the OSRE community and my mentor for providing me with this learning opportunity.&lt;/p>
&lt;p>This is only version 1.0.0 of the Reproducibility Service. If I have time from my coursework, I would like to fix any bugs or improve the service further to meet user needs.&lt;/p>
&lt;p>However, the following issues still exist with the service and can be improved upon:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Third-party software dependencies&lt;/strong>: Automatic detection and loading of these dependencies on a SLURM cluster are not yet implemented. Currently, these must be handled manually by the user.&lt;/li>
&lt;li>&lt;strong>Support for workflows with &lt;code>data_persistence = False&lt;/code>&lt;/strong>: There is no support for workflows where all datasets are remote files.&lt;/li>
&lt;/ul>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service" target="_blank" rel="noopener">Reproducibility Service Repository&lt;/a>: This repository contains the main service along with guidelines on how to use it. The service will be integrated with the COMPSs official distribution in its next release.&lt;/li>
&lt;li>&lt;a href="https://www.chameleoncloud.org/appliances/121/" target="_blank" rel="noopener">Chameleon Appliance&lt;/a> : This is a single-node appliance with COMPSs 3.3.1 installed, so that anyone with access to Chameleon can reproduce experiments.&lt;/li>
&lt;/ul>
&lt;h2 id="previous-blogs">Previous Blogs&lt;/h2>
&lt;p>Make sure to check out my other blogs to see how I started this project and the challenges I faced along the way:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/">First blog&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/">Mid-term blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Thank you for reading the blog, have a nice day!!&lt;/p></description></item><item><title>Data Leakage in Applied ML</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I have been working on reproducing the results from &lt;strong>Characterization of Term and Preterm Deliveries using Electrohysterograms Signatures&lt;/strong>. This paper aims to predict preterm birth using a Support Vector Machine with an RBF kernel. However, there is a major flaw in the methodology: &lt;strong>preprocessing performed jointly on the training and test sets&lt;/strong>. This form of data leakage occurs when preprocessing is applied to the entire dataset before it is split into training and test sets.&lt;/p>
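&lt;p>A toy numerical illustration of this kind of leakage (the numbers are made up): standardizing with statistics computed over the entire dataset lets the test point influence the training features, so the model effectively sees test information during training:&lt;/p>

```python
import statistics

data = [1.0, 2.0, 3.0, 100.0]      # the outlier will be the test set
train, test = data[:3], data[3:]

# Leaky: mean/std computed over train + test before splitting
mu_all, sd_all = statistics.mean(data), statistics.pstdev(data)
leaky_train = [(x - mu_all) / sd_all for x in train]

# Correct: statistics fitted on the training set only, then reused
mu_tr, sd_tr = statistics.mean(train), statistics.pstdev(train)
clean_train = [(x - mu_tr) / sd_tr for x in train]
clean_test = [(x - mu_tr) / sd_tr for x in test]

# The training features differ, so any model fit on them differs too.
print(leaky_train != clean_train)
```

&lt;p>The same principle applies to any fitted preprocessing step, such as the oversampling used in the EHG pipeline: fit on the training split only, then apply to the test split.&lt;/p>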
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sample produced from test and training set samples" srcset="
/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_687703a1dee465e80fb3dbe262dd5860.webp 400w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_42051adaf7804083284553c10ca73861.webp 760w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_687703a1dee465e80fb3dbe262dd5860.webp"
width="760"
height="589"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sample produced from training set samples" srcset="
/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_47e6397030251c1681ff92260f687641.webp 400w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_8bad9197813df4344757765d43878a56.webp 760w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_47e6397030251c1681ff92260f687641.webp"
width="760"
height="594"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Reproducing the published results came with its own challenges, including updating EHG-Oversampling to extract meaningful features from EHG signals and finding optimal hyperparameters for the model. Through this work and the accompanying toy example notebooks, we have been able to demonstrate that data leakage leads to overly optimistic measures of model performance and that models trained with data leakage fail to generalize to real-world data. In such cases, performance on the test set doesn&amp;rsquo;t translate to performance in the real world.&lt;/p>
&lt;p>Next, I&amp;rsquo;ll be reproducing the results published in &lt;strong>Identification of COVID-19 Samples from Chest X-Ray Images Using Deep Learning: A Comparison of Transfer Learning Approaches&lt;/strong>.&lt;/p>
&lt;p>You can follow my work on the EHG paper &lt;a href="https://github.com/shaivimalik/medicine_preprocessing-on-entire-dataset" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Stay tuned for more insights on data leakage and updates on our progress!&lt;/p></description></item><item><title>Midterm Check-In: Progress on the AutoAppendix Project</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240803-kkrassni/</link><pubDate>Sat, 03 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240803-kkrassni/</guid><description>&lt;p>Hi all,&lt;/p>
&lt;p>I&amp;rsquo;m happy to share a quick update on the AutoAppendix project as we’re about
halfway through. We’ve made some steady progress on evaluating artifacts from SC24 papers, and we&amp;rsquo;re starting
to think about how we can use what we&amp;rsquo;ve learned to
improve the artifact evaluation process in the future.&lt;/p>
&lt;h2 id="what-weve-been-up-to">What We’ve Been Up To&lt;/h2>
&lt;p>As a quick reminder, the goal of our project is to develop a set of guidelines that
researchers can use to improve the reproducibility of their work. We&amp;rsquo;re focusing
on papers from the Supercomputing Conference 2024 that applied for an &amp;ldquo;Artifact Replicable&amp;rdquo; badge, and we&amp;rsquo;re
evaluating their artifacts to see how well the experiments can be replicated. Since it was difficult to anticipate the exact outcomes of the project beyond detailed experiment recreation, the main goal of this
midterm check-in is to share the insights we have gathered so far and to set the stage for the final outcomes.&lt;/p>
&lt;p>Our main task so far has been selecting submissions with experiments designed
for Chameleon Cloud, or those that could be easily adapted to run on Chameleon. As there were 45 submissions that applied
for an &amp;ldquo;Artifact Replicable&amp;rdquo; badge, it was not easy
to choose which ones to evaluate, but we managed to narrow
it down to 18 papers that we thought would be a good fit for our project.&lt;/p>
&lt;p>We&amp;rsquo;ve chosen to focus on papers that do not require
special hardware (like a specific supercomputer) or
complex network setups, as it would be difficult to
generalize the insights from these kinds of
experiments. Instead, we&amp;rsquo;ve been looking at those
that require only a &lt;em>single computation node&lt;/em>, and
could theoretically be run with the available hardware
on Chameleon.&lt;/p>
&lt;h2 id="observations-and-learning-points">Observations and Learning Points&lt;/h2>
&lt;p>At the moment, we&amp;rsquo;re about halfway through the
evaluation process. So far, we&amp;rsquo;ve noticed a range of
approaches to documenting and setting up computational
experiments. Even without looking at the appendices in
detail, it&amp;rsquo;s clear that there’s a lot of room for
standardization of the documentation format and software setup, which could make life easier for
everyone involved. This particularly applies to
software setups, which are often daunting to replicate,
especially when there are specific version requirements, version
incompatibilities or outright missing dependencies. Since the main goal of this
project is to develop a set of guidelines that
researchers can use to improve the reproducibility of
their work, suggesting a way to deal with software
versions and dependencies will be a key part of our
results.&lt;/p>
&lt;p>We’ve observed that submissions with well-structured and detailed appendices
tend to fare better in reproducibility checks. This includes those that utilized
containerization solutions like Docker, which encapsulate the computing
environment needed to run the experiments and thus
eliminate the need to install specific software
packages. It’s these kinds of practices that we
think could be encouraged more broadly.&lt;/p>
&lt;h2 id="looking-ahead">Looking Ahead&lt;/h2>
&lt;p>The next steps are pretty exciting! We’re planning to use what we’ve learned to draft some
guidelines that could help future SC conference submissions be more consistent.
This might include templates or checklists that ensure all the necessary details
are covered.&lt;/p>
&lt;p>Additionally, we’re thinking about ways to automate some parts of the artifact
evaluation process. The goal here is to make it less labor-intensive and more
objective. A particularly nice way
of reproducible artifact evaluation is
Chameleon&amp;rsquo;s JupyterHub interface, which, in combination with the &lt;em>Trovi&lt;/em>
artifact sharing platform, makes it easy to share artifacts and allows interested
parties to reproduce the experiments with minimal effort. We are thus looking into ways to
utilize and contribute to these tools in a way that could benefit the broader research community.&lt;/p>
&lt;h2 id="wrapping-up">Wrapping Up&lt;/h2>
&lt;p>That’s it for now! We are working towards getting
as many insights as possible from the rest of the
artifact evaluations, and hopefully, by the end of this project, we’ll have some solid
recommendations and tools to show for it. Thanks for keeping up with our
progress, and I’ll be back with more updates as we move into the final stages of
our work.&lt;/p></description></item><item><title>Mid-term Blog: Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/</link><pubDate>Mon, 29 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone!
I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University (IIT BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, developed under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to build a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the project:&lt;/h2>
&lt;p>The project proposes to create a service that will have the capability to take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>It has been more than six weeks since the ReproducibilityService project began, and significant progress has been made. You can test the actual service from my GitHub repository: &lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service" target="_blank" rel="noopener">ReproducibilityService&lt;/a>. Let&amp;rsquo;s break down what the ReproducibilityService is capable of doing now:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Support for Reproducing Basic COMPSs Experiments&lt;/strong>: The RS program is now fully capable of reproducing basic COMPSs experiments with no third-party dependencies on any device with the COMPSs Runtime installed. Here&amp;rsquo;s how it works:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Getting the Crate&lt;/strong>: The RS program can accept the COMPSs workflow from the user either as a path to the crate or as a link from WorkflowHub. In either case, it creates a sub-directory for further execution named &lt;code>reproducibility_service_{timestamp}&lt;/code> and stores the workflow as &lt;code>reproducibility_service_{timestamp}/Workflow&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Address Mapping&lt;/strong>: The ro-crate contains &lt;code>compss_submission_command_line.txt&lt;/code>, which is the command originally used to execute the experiment. This command may include many paths such as &lt;code>runcompss flag1 flag2 ... flagn &amp;lt;main_workflow_file.py&amp;gt; input1 input2 ... inputn output&lt;/code>. The RS program maps all the paths for &lt;code>&amp;lt;main_workflow_file.py&amp;gt; input1 input2 ... inputn output&lt;/code> to paths inside the machine where we want to reproduce the experiment. The flags are dropped as they may be device-specific, and the service asks the user for any new flags they want to add to the COMPSs runtime.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Verifying Files&lt;/strong>: Before reproducing an experiment, it&amp;rsquo;s crucial to check whether the inputs or outputs have been tampered with. The RS program cross-verifies the &lt;code>contentSize&lt;/code> from the &lt;code>ro-crate-metadata.json&lt;/code> and generates warnings in case of any abnormalities.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Error Logging&lt;/strong>: In case of any problems during execution, the &lt;code>std_out&lt;/code> and &lt;code>std_err&lt;/code> are stored inside &lt;code>reproducibility_service_{timestamp}/log&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Results&lt;/strong>: If the experiment generates any results, the RS program stores them inside &lt;code>reproducibility_service_{timestamp}/Results&lt;/code>. If provenance of the workflow is also requested, the resulting ro-crate is stored there as well.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
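&lt;p>To illustrate the address-mapping step, here is a minimal Python sketch (the &lt;code>remap_command&lt;/code> helper and the token layout are illustrative assumptions, not the actual RS implementation):&lt;/p>

```python
import os

def remap_command(command, crate_dir, extra_flags=None):
    """Rebuild the runcompss invocation: drop the original (device-specific)
    flags and point every file argument into the local crate directory."""
    tokens = command.split()
    # tokens[0] is 'runcompss'; flags start with '-'; the rest are file paths.
    files = [t for t in tokens[1:] if not t.startswith("-")]
    remapped = [os.path.join(crate_dir, os.path.basename(f)) for f in files]
    return ["runcompss"] + (extra_flags or []) + remapped

cmd = "runcompss --tracing=true matmul.py in_a.txt in_b.txt out.txt"
print(remap_command(cmd, "/tmp/Workflow"))
```

&lt;p>The original flags are discarded because they may be device-specific; any flags the user supplies interactively would be passed in as &lt;code>extra_flags&lt;/code>.&lt;/p>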
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="REPRODUCIBILITY SERVICE FLOWCHART" srcset="
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_4df9e9a771513277aaf5c7a4d8182666.webp 400w,
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_0b96071409b70d8356241465bf214510.webp 760w,
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_4df9e9a771513277aaf5c7a4d8182666.webp"
width="760"
height="267"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ol start="2">
&lt;li>&lt;strong>Support for Reproducing Remote Datasets&lt;/strong>: If a remote dataset is specified inside the metadata file, the RS program fetches the dataset from the specified link using &lt;code>wget&lt;/code>, stores the remote dataset inside the crate, and updates the path in the new command line it generates.&lt;/li>
&lt;/ol>
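&lt;p>The file-verification step can be sketched in a few lines of Python (a simplified sketch; &lt;code>verify_crate_files&lt;/code> is a hypothetical name, and real RO-Crate metadata contains more entity types than shown here):&lt;/p>

```python
import json
import os

def verify_crate_files(crate_dir):
    """Cross-check each file's on-disk size against the contentSize
    recorded in ro-crate-metadata.json; return a list of warnings."""
    with open(os.path.join(crate_dir, "ro-crate-metadata.json")) as f:
        metadata = json.load(f)
    warnings = []
    for entity in metadata.get("@graph", []):
        if "contentSize" not in entity:
            continue  # only file entities carry a recorded size
        path = os.path.join(crate_dir, entity["@id"])
        if not os.path.exists(path):
            warnings.append(f"missing file: {entity['@id']}")
        elif os.path.getsize(path) != int(entity["contentSize"]):
            warnings.append(f"size mismatch: {entity['@id']}")
    return warnings
```

&lt;p>A size mismatch does not prove tampering, but it is a cheap signal that the inputs or outputs no longer match what the original author packaged.&lt;/p>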
&lt;h2 id="challenges-and-end-term-goals">Challenges and End-Term Goals&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Support for DATA_PERSISTENCE_FALSE&lt;/strong>: The RS program still needs to support crates with &lt;code>dataPersistence&lt;/code> set to false. After weeks of brainstorming implementation ideas, we recently concluded that, since the majority of &lt;code>DATA_PERSISTENCE_FALSE&lt;/code> crates are run on SLURM clusters and the dataset to fetch in such cases resides somewhere inside the cluster, the RS program will support this case on such clusters. Currently, I am working with the Nord3v2 cluster to further enhance the functionality of ReproducibilityService.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chameleon Cluster Setup&lt;/strong>: I have made some progress towards creating a new COMPSs 3.3 Appliance on Chameleon to test the service. However, creating the cluster setup script needed for the service to run on a COMPSs 3.3.1 cluster to execute large experiments has been challenging.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Integrating with COMPSs Repository&lt;/strong>: After completing the support for &lt;code>dataPersistence&lt;/code> false cases, we aim to launch this service as a tool inside the &lt;a href="https://github.com/bsc-wdc/compss" target="_blank" rel="noopener">COMPSs repository&lt;/a>. This will be a significant milestone in my developer journey as it will be the first real-world project I have worked on, and I hope everything goes smoothly.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for the next blog!!&lt;/p></description></item><item><title>Enabling VAA Execution: Environment and VAA Preparation and/or Reproducibility for Dynamic Bandwidth Allocation (CONCIERGE)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/edgerep/20240720-rafaelsw/</link><pubDate>Sat, 20 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/edgerep/20240720-rafaelsw/</guid><description>&lt;p>Hi there!&lt;/p>
&lt;p>I am Rafael Sinjunatha Wulangsih, a Telecommunication Engineering graduate from the Bandung Institute of Technology (ITB), Bandung, Indonesia. I&amp;rsquo;m currently contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/edgerep">&amp;ldquo;EdgeRep: Reproducing and benchmarking edge analytic systems&amp;rdquo;&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a> and Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/junchen-jiang/">Junchen Jiang&lt;/a>. You can find more details about the project proposal &lt;a href="https://drive.google.com/file/d/1GUMiglFqezOqEeQiMaL4QVgsXZOHYoEK/view?usp=drive_link" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>This project addresses the challenges posed by the massive deployment of edge devices, such as traffic or security cameras, in smart cities and other environments. In the previous Edgebench project, the team proposed a solution to dynamically allocate bandwidth and compute resources to video analytic applications (VAAs) running on edge devices. However, that project was limited to a single VAA, which may not represent the diverse applications running on edge devices. Therefore, the main goal of this project, &amp;ldquo;EdgeRep,&amp;rdquo; is to diversify the VAAs running on edge devices while utilizing a solution similar to that of the Edgebench project. EdgeRep aims to reproduce state-of-the-art self-adaptive VAAs (with seven candidates) and maintain self-adaptation in these video analytics pipelines. We will implement it ourselves if the video analytics applications do not support self-adaptation.&lt;/p></description></item><item><title>Halfway Through GSOC: Heterogeneous Graph Neural Networks for I/O Performance Bottleneck Diagnosis</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/</link><pubDate>Sat, 20 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/</guid><description>&lt;p>Hello, I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mahdi-banisharifdehkordi/">Mahdi Banisharifdehkordi&lt;/a>, a Ph.D. student in Computer Science at Iowa State University. I&amp;rsquo;m currently working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/aiio/">AIIO / Graph Neural Network&lt;/a> project under the guidance of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> and Suren Byna. 
Our project focuses on enhancing the AIIO framework to automatically diagnose I/O performance bottlenecks in high-performance computing (HPC) systems using Graph Neural Networks (GNNs).&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>Our primary goal is to tackle the persistent issue of I/O bottlenecks in HPC applications. Identifying these bottlenecks manually is often labor-intensive and prone to errors. By integrating GNNs into the AIIO framework, we aim to create an automated solution that can diagnose these bottlenecks with high accuracy, ultimately improving the efficiency and reliability of HPC systems.&lt;/p>
&lt;h1 id="progress-and-challenges">Progress and Challenges&lt;/h1>
&lt;p>Over the past few weeks, my work has been centered on developing a robust data pre-processing pipeline. This pipeline is crucial for converting raw I/O log data into a graph format suitable for GNN analysis. The data pre-processing involves extracting relevant features from Darshan I/O logs, which include job-related information and performance metrics. One of the main challenges has been dealing with the heterogeneity and sparsity of the data, which can affect the accuracy of our models. To address this, we&amp;rsquo;ve focused on using correlation analysis to identify and select the most relevant features, ensuring that the dataset is well-structured and informative for GNN processing.&lt;/p>
&lt;p>We&amp;rsquo;ve also started constructing the GNN model. The model is designed to capture the complex relationships between different I/O operations and their impact on system performance. This involves defining nodes and edges in the graph that represent job IDs, counter types, and their values. We explored different graph structures, including those that focus on counter types and those that incorporate more detailed information. While more detailed graphs offer better accuracy, they also require more computational resources.&lt;/p>
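&lt;p>Conceptually, the log-to-graph conversion looks like the following sketch: job IDs and counter types become the two node sets of a bipartite graph, and edges carry the counter values. The row schema and names here are illustrative, not the actual Darshan fields used in the project.&lt;/p>

```python
# Toy "flattened log" rows standing in for extracted Darshan features.
rows = [
    {"job_id": "job_1", "counter": "POSIX_READS", "value": 120},
    {"job_id": "job_1", "counter": "POSIX_WRITES", "value": 30},
    {"job_id": "job_2", "counter": "POSIX_READS", "value": 5},
]

def build_graph(rows):
    """Collect the two node sets and the weighted edges between them."""
    nodes = {"job": set(), "counter": set()}
    edges = []  # (job node, counter node, edge weight)
    for row in rows:
        nodes["job"].add(row["job_id"])
        nodes["counter"].add(row["counter"])
        edges.append((row["job_id"], row["counter"], row["value"]))
    return nodes, edges

nodes, edges = build_graph(rows)
```

&lt;p>A GNN then propagates information along these edges, so a job node's representation is shaped by the counters it touches and by other jobs that share those counters.&lt;/p>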
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Overview" srcset="
/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_efa6bf6f7434ca74fff6a35fcb540861.webp 400w,
/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_de1d11a65f3f46dfd75b1bc00e8e6406.webp 760w,
/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_efa6bf6f7434ca74fff6a35fcb540861.webp"
width="760"
height="566"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="current-achievements">Current Achievements&lt;/h1>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Data Pre-processing Pipeline&lt;/strong>: We have successfully developed and tested the pipeline to transform Darshan I/O logs into graph-structured data. This was a significant milestone, as it sets the foundation for all subsequent GNN modeling efforts.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>GNN Model Construction&lt;/strong>: The initial version of our GNN model has been implemented. This model is now capable of learning from the graph data and making predictions about I/O performance bottlenecks.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Correlation Analysis for Graph Structure Design&lt;/strong>: We have used correlation analysis on the dataset to understand the relationships between I/O counters. This analysis has been instrumental in designing a more effective graph structure, helping to better capture the dependencies and interactions critical for accurate performance diagnosis.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Correlation Analysis1" srcset="
/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_211eb50374f4febd5aee688644797792.webp 400w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_fd5992e42a60d6cb85be9cd136a5d93b.webp 760w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_211eb50374f4febd5aee688644797792.webp"
width="760"
height="614"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Correlation Analysis2" srcset="
/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_b05324cc90f73bd1b2ff53c9d2d04ecb.webp 400w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_0115179de349c5834c2b3fc2636ecd23.webp 760w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_b05324cc90f73bd1b2ff53c9d2d04ecb.webp"
width="760"
height="309"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
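&lt;p>The correlation-based feature selection described in item 3 can be sketched as follows (a toy example with hypothetical counter names and values; the project works with full Darshan logs):&lt;/p>

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(features, target, threshold=0.8):
    """Keep counters whose absolute correlation with the target
    performance metric exceeds the threshold."""
    return [name for name, series in features.items()
            if abs(pearson(series, target)) > threshold]

# Toy counters vs. a bandwidth-like metric; values are illustrative only.
features = {
    "POSIX_READS":  [10, 20, 30, 40],
    "RANDOM_NOISE": [3, 1, 4, 1],
}
bandwidth = [100, 210, 290, 405]
print(select_features(features, bandwidth))
```

&lt;p>Counters that barely correlate with the target metric are dropped, keeping the graph compact and the remaining features informative.&lt;/p>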
&lt;ol start="4">
&lt;li>&lt;strong>Training for Different Graph Structures&lt;/strong>: We are currently training our model using various graph structures to determine the most effective configuration for accurate I/O performance diagnosis. This ongoing process aims to refine our approach and improve the model&amp;rsquo;s predictive accuracy.&lt;/li>
&lt;/ol>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Looking ahead, we plan to focus on several key areas:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Refinement and Testing&lt;/strong>: We&amp;rsquo;ll continue refining the GNN model, focusing on improving its accuracy and efficiency. This includes experimenting with different graph structures and training techniques.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>SHAP Analysis&lt;/strong>: To enhance the interpretability of our model, we&amp;rsquo;ll incorporate SHAP (SHapley Additive exPlanations) values. This will help us understand the contribution of each feature to the model&amp;rsquo;s predictions, making it easier to identify critical factors in I/O performance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Documentation and Community Engagement&lt;/strong>: As we make progress, we&amp;rsquo;ll document our methods and findings, sharing them with the broader community. This includes contributing to open-source repositories and engaging with other researchers in the field.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>This journey has been both challenging and rewarding, and I am grateful for the support and guidance from my mentors and the community. I look forward to sharing more updates as we continue to advance this exciting project.&lt;/p></description></item><item><title>Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240719-triveni5/</link><pubDate>Fri, 19 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240719-triveni5/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Triveni, a Master&amp;rsquo;s student in Computer Science at Northern Illinois University (NIU). I&amp;rsquo;m excited to share my progress on the OSRE 2024 project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Categorize Differences in Reproduced Visualizations&lt;/a> focusing on data visualization reproducibility. Working under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>, I&amp;rsquo;ve made some significant strides and faced some interesting challenges.&lt;/p>
&lt;h2 id="initial-approach-and-challenges">Initial Approach and Challenges&lt;/h2>
&lt;p>I began my work by comparing original visualizations with reproduced ones using OpenCV for pixel-level comparison. This method helped highlight structural differences but also brought to light some challenges. Different versions of libraries rendered visualizations slightly differently, causing minor positional changes that didn&amp;rsquo;t affect the overall message but were still flagged as discrepancies.&lt;/p>
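&lt;p>As a simplified stand-in for the OpenCV comparison, a pixel-level diff can be sketched in plain Python (images as same-shaped nested lists of grayscale values; the function name and tolerance are illustrative):&lt;/p>

```python
def pixel_diff_ratio(img_a, img_b, tol=0):
    """Fraction of pixel positions whose values differ by more than tol."""
    total = diff = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > tol:
                diff += 1
    return diff / total

original   = [[0, 0, 255], [0, 255, 0]]
reproduced = [[0, 0, 250], [0, 255, 0]]  # slight rendering difference

print(pixel_diff_ratio(original, reproduced))         # strict: flags the change
print(pixel_diff_ratio(original, reproduced, tol=8))  # tolerant: ignores it
```

&lt;p>A per-pixel value tolerance, however, does not account for small positional shifts caused by library version differences, which is exactly why a purely pixel-based comparison falls short here.&lt;/p>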
&lt;p>To address this, I experimented with machine learning models like VGG16, ResNet, and Detectron2. These models are excellent for general image recognition but fell short for our specific needs with charts and visualizations. The results were not as accurate as I had hoped, primarily because these models aren&amp;rsquo;t tailored to handle the unique characteristics of data visualizations.&lt;/p>
&lt;h2 id="shifting-focus-to-chart-specific-models">Shifting Focus to Chart-Specific Models&lt;/h2>
&lt;p>Recognizing the limitations of general ML models, I shifted my focus to chart-specific models like ChartQA, ChartOCR, and ChartReader. These models are designed to understand and summarize chart data, making them more suitable for our goal of comparing visualizations based on the information they convey.&lt;/p>
&lt;h2 id="generating-visualization-variations-and-understanding-human-perception">Generating Visualization Variations and Understanding Human Perception&lt;/h2>
&lt;p>Another exciting development in my work has been generating different versions of visualizations. This will allow me to create a survey to collect human categorizations of visualizations. By understanding how people perceive differences, whether in outliers, shapes, data points, or colors, we can gain insights into which parameters affect human interpretation of visualizations.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>Moving forward, I&amp;rsquo;ll continue to delve into chart-specific models to refine our comparison techniques. Additionally, the survey will provide valuable data on human perception, which can be used to improve our automated comparison methods. By combining these approaches, I hope to create a robust framework for reliable and reproducible data visualizations.&lt;/p>
&lt;p>I&amp;rsquo;m thrilled about the progress made so far and eager to share more updates with you all. Stay tuned for more insights and developments on this exciting journey!&lt;/p></description></item><item><title>Data leakage in applied ML: reproducing examples from genomics, medicine and radiology</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240701-shaivimalik/</link><pubDate>Mon, 01 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240701-shaivimalik/</guid><description>&lt;p>Hello everyone! I&amp;rsquo;m Shaivi Malik, a computer science and engineering student. I am thrilled to announce that I have been selected as a Summer of Reproducibility Fellow. I will be contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/nyu/data-leakage/">Data leakage in applied ML: reproducing examples of irreproducibility&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a>. You can find my proposal &lt;a href="https://drive.google.com/file/d/1WAsDif61O2fWgtkl75bQAnIcm2hryt8z/view?usp=sharing" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>This summer, we will reproduce studies from medicine, radiology and genomics. Through these studies, we&amp;rsquo;ll explore and demonstrate three types of data leakage:&lt;/p>
&lt;ol>
&lt;li>Pre-processing on train and test sets together&lt;/li>
&lt;li>Model uses features that are not legitimate&lt;/li>
&lt;li>Feature selection on training and test sets&lt;/li>
&lt;/ol>
&lt;p>For each paper, we will replicate the published results with and without the data leakage error, and present performance metrics for comparison. We will also provide explanatory materials and example questions to test understanding. All these resources will be bundled together in a dedicated repository for each paper.&lt;/p>
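&lt;p>The first type of leakage can be demonstrated with a toy example in plain Python (the data values are illustrative, not taken from any of the reproduced papers):&lt;/p>

```python
import statistics

# Toy "train" and "test" splits drawn from different ranges, so the
# test values shift the pooled statistics noticeably.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

def standardize(values, mean, stdev):
    return [(v - mean) / stdev for v in values]

# Leaky: scaling statistics computed over train + test together.
pooled = train + test
leaky_mean, leaky_sd = statistics.mean(pooled), statistics.pstdev(pooled)
leaky_train = standardize(train, leaky_mean, leaky_sd)

# Correct: statistics from the training split only, reused for test.
mean, sd = statistics.mean(train), statistics.pstdev(train)
clean_train = standardize(train, mean, sd)
clean_test = standardize(test, mean, sd)

print(leaky_train[0] != clean_train[0])  # True
```

&lt;p>Because the pooled statistics absorb information from the test split, the leaky pipeline produces different features for the same training data, and any model trained on them has effectively peeked at the test set.&lt;/p>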
&lt;p>This project aims to address the need for accessible educational material on data leakage. These materials will be designed to be readily adopted by instructors teaching machine learning in a wide variety of contexts. They will be presented in a clear and easy-to-follow manner, catering to a broad range of backgrounds and raising awareness about the consequences of data leakage.&lt;/p>
&lt;p>Stay tuned for updates on my progress! You can follow me on &lt;a href="https://github.com/shaivimalik" target="_blank" rel="noopener">GitHub&lt;/a> and watch out for my upcoming blog posts.&lt;/p></description></item><item><title>Assessing the Computational Reproducibility of Jupyter Notebooks</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/depaul/20240618-nbrewer/</link><pubDate>Tue, 18 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/depaul/20240618-nbrewer/</guid><description>&lt;p>Like so many authors before me, my first reproducibility study and very first academic publication started with the age-old platitude, &amp;ldquo;Reproducibility is a cornerstone of the scientific method.&amp;rdquo; My team and I participated in a competition to replicate the performance improvements promised by a paper presented at last year&amp;rsquo;s Supercomputing conference. We weren&amp;rsquo;t simply re-executing the same experiment with the same cluster; instead, we were trying to confirm that we got similar results on a different cluster with an entirely different architecture. From the very beginning, I struggled to wrap my mind around the many reasons for reproducing computational experiments, their significance, and how to prioritize them. All I knew was that there seemed to be a consensus that reproducibility is important to science and that the experience left me with more questions than answers.&lt;/p>
&lt;p>Not long after that, I started a job as a research software engineer at Purdue University, where I worked heavily with Jupyter Notebooks. I used notebooks and interactive components called widgets to create a web application, which I turned into a reusable template. Our team was enthusiastic about using Jupyter Notebooks to quickly develop web applications because the tools were accessible to the laboratory researchers who ultimately needed to maintain them. I was fortunate to receive the &lt;a href="https://bssw.io/fellows/nicole-brewer" target="_blank" rel="noopener">Better Scientific Software Fellowship&lt;/a> to develop tutorials to teach others how to use notebooks to turn their scientific workflows into web apps. I collected those and other resources and established the &lt;a href="https://www.jupyter4.science" target="_blank" rel="noopener">Jupyter4Science&lt;/a> website, a knowledgebase and blog about Jupyter Notebooks in scientific contexts. That site aims to improve the accessibility of research data and software.&lt;/p>
&lt;p>There seemed to be an important relationship between improved accessibility and reuse of research code and data and computational reproducibility, but I still had trouble articulating it. In pursuit of answers, I moved to sunny Arizona to pursue a History and Philosophy of Science degree. My research falls at the confluence of my prior experiences; I&amp;rsquo;m studying the reproducibility of scientific Jupyter Notebooks. I have learned that questions about reproducibility aren&amp;rsquo;t very meaningful without considering specific aspects such as who is doing the experiment and replication, the nature of the experimental artifacts, and the context in which the experiment takes place.&lt;/p>
&lt;p>I was fortunate to have found a mentor for the Summer of Reproducibility, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/">Tanu Malik&lt;/a>, who shares the philosophy that the burden of reproducibility should not rest solely on domain researchers, who would otherwise have to develop expertise outside their field. She and her lab have developed &lt;a href="https://github.com/depaul-dice/Flinc" target="_blank" rel="noopener">FLINC&lt;/a>, an application virtualization tool that improves the portability of computational notebooks. Her prior work demonstrated that FLINC provides efficient reproducibility of notebooks and requires significantly less time and space to execute and re-execute notebooks than Docker containers do for the same notebooks. My work will expand the scope of this original experiment by adding more notebooks to FLINC&amp;rsquo;s test coverage, demonstrating robustness across even more diverse computational tasks. We expect to show that infrastructural tools like FLINC improve the success rate of automated reproducibility.&lt;/p>
&lt;p>I&amp;rsquo;m grateful to both the Summer of Reproducibility program managers and my research mentor for this incredible opportunity to further my dissertation research in the context of meaningful collaboration.&lt;/p></description></item><item><title>Exploring Reproducibility in High-Performance Computing Publications with the Chameleon Cloud</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240615-kkrassni/</link><pubDate>Sat, 15 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240615-kkrassni/</guid><description>&lt;p>Hello everyone,&lt;/p>
&lt;p>I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/klaus-kra%C3%9Fnitzer/">Klaus Kraßnitzer&lt;/a> and am currently finishing up my Master&amp;rsquo;s degree at
the Technical University of Vienna. This summer, under the guidance of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sascha-hunold/">Sascha Hunold&lt;/a>,
I&amp;rsquo;m excited to dive into a project that aims to enhance reproducibility in
high-performance computing research.&lt;/p>
&lt;p>Our project, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/tuwien/autoappendix/">AutoAppendix&lt;/a>, focuses on the rigorous evaluation and potential
automation of Artifact Description (AD) and Artifact Evaluation (AE) appendices
from publications submitted to this year&amp;rsquo;s &lt;a href="https://supercomputing.org/" target="_blank" rel="noopener">Supercomputing Conference (SC)&lt;/a>. Because a sizeable
chunk of SC publications utilize &lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon Cloud&lt;/a>, a
platform known for its robust and scalable experiment setups, the project will
focus on Chameleon-based artifacts, creating guidelines (and,
potentially, software tools) that users of the Chameleon Cloud can follow to
make their research more easily reproducible. You can learn more about the project
and read the full proposal &lt;a href="https://drive.google.com/file/d/1J9-Z0WSIqyJpnmd_uxtEm_m4ZIO87dBH/view?usp=drive_link" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>My fascination with open-source development and research reproducibility was sparked during my undergraduate studies and further nurtured by my role as a teaching assistant. Hands-on projects and academic courses, like those in chemistry emphasizing precise experimental protocols, have deeply influenced my approach to computational science.&lt;/p>
&lt;h2 id="project-objectives">Project Objectives&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Analyze and Automate&lt;/strong>: Assess current AE/AD appendices submitted for SC24, focusing on their potential for automation.&lt;/li>
&lt;li>&lt;strong>Develop Guidelines&lt;/strong>: Create comprehensive guidelines to aid future SC conferences in artifact submission and evaluation.&lt;/li>
&lt;li>&lt;strong>Build Tools (Conditionally)&lt;/strong>: Develop automation tools to streamline the evaluation process.&lt;/li>
&lt;/ol>
&lt;p>The ultimate aim of the project is to work towards a more efficient, transparent, and
reproducible research environment, and I&amp;rsquo;m committed to making it simpler for
researchers to demonstrate and replicate scientific work. I look forward to
sharing insights and progress as we move forward.&lt;/p>
&lt;p>Thanks for reading, and stay tuned for more updates!&lt;/p></description></item><item><title> Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240614-aryas/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240614-aryas/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/">Arya Sarkar&lt;/a> and I will be contributing to the research project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Reproducibility in Data Visualization&lt;/a>, with a focus on investigating and coming up with novel solutions to capture both static and dynamic visualizations from different sources. My project is titled Investigate Solutions for Capturing Visualizations and I am mentored by Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>.&lt;/p>
&lt;p>Open source has always piqued my interest, but as a junior in university I often found it hard to get started. I had spent a lot of time working with data visualizations, yet had never engaged with the problem of reproducibility before this project. When I saw the plethora of unique and interesting projects during the contribution phase of OSRE-2024, I was unsure where to begin. However, the more I dived into this project and understood the significance of research in this domain for ensuring reproducibility, the more I found myself drawn to it. I am glad to have been given this amazing opportunity to work in the open-source space as a researcher in reproducibility.&lt;/p>
&lt;p>This project aims to investigate, augment, and/or develop solutions to capture visualizations that appear in formats including websites and Jupyter notebooks. We have a special interest in capturing the state of interactive visualizations and preserving the user interactions required to reach a certain visualization in an interactive environment, to ensure reproducibility. &lt;a href="https://drive.google.com/file/d/1SGLd37zBjnAU-eYytr7mYzfselHgxvK1/view?usp=sharing" target="_blank" rel="noopener">My proposal can be viewed here!&lt;/a>&lt;/p></description></item><item><title>Heterogeneous Graph Neural Networks for I/O Performance Bottleneck Diagnosis</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240614-mahdi/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240614-mahdi/</guid><description>&lt;p>Hello, I am &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mahdi-banisharifdehkordi/">Mahdi Banisharifdehkordi&lt;/a>, a Ph.D. student in Computer Science at Iowa State University, specializing in Artificial Intelligence. This summer, I will be working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/aiio/">AIIO / Graph Neural Network&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> and Suren Byna.&lt;/p>
&lt;p>High-Performance Computing (HPC) applications often face performance issues due to I/O bottlenecks. Manually identifying these bottlenecks is time-consuming and error-prone. My project aims to enhance the AIIO framework by integrating a Graph Neural Network (GNN) model to automatically diagnose I/O performance bottlenecks at the job level. This involves developing a comprehensive data pre-processing pipeline, constructing and validating a tailored GNN model, and rigorously testing the model&amp;rsquo;s accuracy using test cases from the AIIO dataset.&lt;/p>
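&lt;p>To illustrate the core idea behind GNN-based diagnosis, here is a minimal, pure-Python sketch of one message-passing step over a toy job graph. The node names, features, and mean-aggregation rule are hypothetical simplifications for illustration only, not the AIIO framework&amp;rsquo;s actual model:&lt;/p>

```python
def message_passing_step(features, edges):
    """One mean-aggregation message-passing step: each node's new
    feature is the average of its own and its neighbors' features."""
    neighbors = {n: [n] for n in features}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    return {n: sum(features[m] for m in nbrs) / len(nbrs)
            for n, nbrs in neighbors.items()}

# Toy job graph: an "app" node linked to two "file" nodes; the slow
# file's high-latency feature propagates into the app node.
feats = {"app": 0.0, "file1": 0.0, "file2": 9.0}
print(message_passing_step(feats, [("app", "file1"), ("app", "file2")]))
# {'app': 3.0, 'file1': 0.0, 'file2': 4.5}
```

&lt;p>In a real GNN the plain averages are replaced by learned, trainable transformations, but a bottleneck signal propagates through the job graph in the same way.&lt;/p>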
&lt;p>Through this project, I seek to provide a sophisticated, AI-driven approach to understanding and improving I/O performance in HPC systems, ultimately contributing to more efficient and reliable HPC applications.&lt;/p></description></item><item><title>Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240613-triveni5/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240613-triveni5/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Triveni, a Master&amp;rsquo;s student in Computer Science at Northern Illinois University (NIU). When I came across the OSRE 2024 project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Categorize Differences in Reproduced Visualizations&lt;/a>, which focuses on data visualization reproducibility, I was excited because it aligned with my interest in data visualization. While my initial interest was in geospatial data visualization, the project&amp;rsquo;s goal of ensuring reliable visualizations across all contexts really appealed to me. So, I actively worked on understanding the project&amp;rsquo;s key concepts and, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>, submitted my proposal (&lt;a href="https://drive.google.com/file/d/1R1c23oUC7noZo5NrUzuDbjwo0OqbkrAK/view" target="_blank" rel="noopener">viewable here&lt;/a>) to join the project.&lt;/p>
&lt;h2 id="early-steps-and-challenges">Early Steps and Challenges:&lt;/h2>
&lt;p>I began working on the project three weeks ago, on May 27th. Setting up the local environment initially presented some challenges, but I persevered and completed the setup. The past few weeks have been spent exploring the complexities of reproducibility in visualizations, particularly capturing the discrepancies that arise when different versions of libraries are used to generate visualizations. Working with Dr. David Koop as my mentor has been an incredible experience: our weekly report meetings keep me accountable and focused. While exploring different algorithms and tools to compare visualizations can be challenging at times, it&amp;rsquo;s a fantastic opportunity to learn cutting-edge technologies and refine my problem-solving skills.&lt;/p>
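&lt;p>As a toy illustration of the kind of comparison involved, here is a minimal pixel-difference check in Python. This is only a sketch of the simplest possible metric, not one of the project&amp;rsquo;s actual tools, and the tiny &amp;ldquo;images&amp;rdquo; are hand-made stand-ins for rendered plots:&lt;/p>

```python
def pixel_diff_ratio(img_a, img_b):
    """Fraction of pixels that differ between two same-sized images,
    given as 2-D lists of RGB tuples."""
    total = 0
    differing = 0
    for row_a, row_b in zip(img_a, img_b):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if px_a != px_b:
                differing += 1
    return differing / total

# Two 2x2 "images" that differ in one pixel (e.g. a shifted tick mark).
a = [[(255, 255, 255), (0, 0, 0)], [(255, 255, 255), (255, 255, 255)]]
b = [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (255, 255, 255)]]
print(pixel_diff_ratio(a, b))  # 0.25
```

&lt;p>Real comparisons need to be far more tolerant than this, since anti-aliasing and font rendering differ across library versions even when the visualization is semantically identical; that gap is exactly what makes categorizing differences interesting.&lt;/p>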
&lt;h2 id="looking-ahead">Looking Ahead:&lt;/h2>
&lt;p>I believe this project can make a valuable contribution to the field of reproducible data visualization. By combining automated comparison tools with a user-centric interface, we can empower researchers and data scientists to make informed decisions about the impact of visualization variations. In future blog posts, I&amp;rsquo;ll share more about the specific tools and techniques being used, and how this framework will contribute to a more reliable and trustworthy approach to data visualization reproducibility.&lt;/p>
&lt;p>Stay tuned!&lt;/p>
&lt;p>I&amp;rsquo;m excited to embark on this journey and share my progress with all of you.&lt;/p></description></item><item><title>Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/</guid><description>&lt;p>Hello everyone,
I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University, IIT (BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the project:&lt;/h2>
&lt;p>The project proposes a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, by analyzing the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
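&lt;p>As a rough illustration of the metadata-analysis step, the sketch below parses a hand-written snippet in the RO-Crate &lt;code>@graph&lt;/code> style and looks up a software version. The specific entity fields shown are simplified assumptions for illustration, not the exact layout of a real COMPSs crate:&lt;/p>

```python
import json

# A minimal, hand-written crate snippet; the entity list follows the
# RO-Crate "@graph" convention, but the fields here are hypothetical.
crate = json.loads("""
{
  "@graph": [
    {"@id": "./", "@type": "Dataset", "name": "COMPSs matmul experiment"},
    {"@id": "#compss", "@type": "SoftwareApplication",
     "name": "COMPSs", "softwareVersion": "3.3"}
  ]
}
""")

def find_software(graph, name):
    """Return the version of the named software entity, if present."""
    for entity in graph:
        if entity.get("name") == name and "softwareVersion" in entity:
            return entity["softwareVersion"]
    return None

print(find_software(crate["@graph"], "COMPSs"))  # 3.3
```

&lt;p>A replication service could use lookups like this to decide which software to install in the Chameleon image before re-running the experiment.&lt;/p>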
&lt;h2 id="how-it-all-started">How it all started&lt;/h2>
&lt;p>This journey began amidst our college&amp;rsquo;s cultural fest, in which I was participating, just 15 days before the proposal submission deadline. Many of my friends had been working for months to get selected for GSoC. I didn’t think I could participate this year because I was late, so I thought, &amp;ldquo;Better luck next year.&amp;rdquo; But during the fest, I kept hearing about UC OSPO and that a senior had been selected within a month. So, I was in my room when my friend told me, &amp;ldquo;What&amp;rsquo;s the worst that can happen? Just apply,&amp;rdquo; and so I did. I chose this project and wrote my introduction in Slack without knowing much. After that, it&amp;rsquo;s history. I worked really hard for the next 10 days learning about the project, making the proposal, and got selected.&lt;/p>
&lt;h2 id="first-few-weeks">First few weeks:&lt;/h2>
&lt;p>I started the project a week earlier than the official June 24 start, and it has been two weeks since. The start was a bit challenging, since it required setting up a lot of things on my local machine. For the past few weeks, most of my time has been dedicated to learning about COMPSs, RO-Crate, and Chameleon, the three technologies this project revolves around. The interaction with my mentor has also been great: from the weekly report meetings to my daily bombardment of doubts, he has been really helpful.
It is my first time working with Chameleon or any cloud computing software, so it can be a bit overwhelming at times, but it is getting better with practice.&lt;/p>
&lt;p>Stay tuned for progress in the next blog!&lt;/p></description></item><item><title>FSA: Benchmarking Fail-Slow Algorithms</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240612-xikangsong/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240612-xikangsong/</guid><description>&lt;p>Hi everyone! I&amp;rsquo;m Xikang, a master&amp;rsquo;s CS student at UChicago. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/failslowalgorithms/">FSA Benchmarking Project&lt;/a>, I&amp;rsquo;m thrilled to be a contributor to OSRE 2024, collaborating with Kexin Pei, an assistant professor of Computer Science at UChicago, and Ruidan, a talented PhD student at UChicago.&lt;/p>
&lt;p>This summer, I will focus on integrating some advanced ML into our RAID slowdown analysis. Our aim is to assess whether LLMs can effectively identify RAID slowdown issues and to benchmark their performance against our current machine learning algorithms. We will test the algorithms on Chameleon Cloud and benchmark them.&lt;/p>
&lt;p>Additionally, we will explore optimization techniques to enhance our pipeline and improve response quality. We hope this research will be a starting point for future work, utilizing LLMs to overcome the limitations of existing algorithms and providing a comprehensive analysis that enhances the performance of RAID and other storage systems.&lt;/p>
&lt;p>I&amp;rsquo;m excited to work with all of you and look forward to your suggestions.
If you are interested, here is my &lt;a href="https://docs.google.com/document/d/1KpodnahgQDNf1-05TF2BdYXiV0lT_oYEnC0oaatHRoc/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p></description></item><item><title>ML-Powered Problem Detection in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20240612-syed/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20240612-syed/</guid><description>&lt;p>Hello, I am &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/syed-mohammad-qasim/">Syed Mohammad Qasim&lt;/a>, a PhD candidate in Electrical and Computer Engineering at Boston University. I will be spending my
summer working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/ml_detect_chameleon/">ML-Powered Problem Detection in Chameleon&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ayse-coskun/">Ayse Coskun&lt;/a>
and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-sherman/">Michael Sherman&lt;/a>.&lt;/p>
&lt;p>Currently, Chameleon Cloud monitors sites at the Texas Advanced Computing Center (TACC), University of Chicago,
Northwestern University, and Argonne National Lab. They collect metrics using Prometheus at each site and feed them
all to a central Mimir cluster. All the logs go to a central Loki, and Grafana is used to visualize and set alerts.
Chameleon currently collects around 3000 metrics. Manually reviewing and setting alerts on them is time-consuming
and labor-intensive. This project aims to help Chameleon operators monitor their systems more effectively and improve overall
reliability by creating an anomaly detection service that can augment the existing alerting framework.&lt;/p></description></item><item><title>Improving Video Applications' Accuracy by Enabling The Use of Concierge</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230731-zharfanf/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230731-zharfanf/</guid><description>&lt;style>
p {
text-align: justify;
}
img {
display: block;
margin-left: auto;
margin-right: auto;
}
&lt;/style>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello, it&amp;rsquo;s me again, Faishal, a SoR contributor on the edgebench project. For the past two months, my mentors and I have been working on improving the performance of our system. In this report, I would like to share what we have been working on.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Edgebench is a project that focuses on how to efficiently distribute resources (bandwidth and CPU) across several video applications. Today&amp;rsquo;s video applications process their video data on an edge server; in a WAN setting, bandwidth and compute are the greatest concerns because both are strictly limited.&lt;/p>
&lt;p>Consider the following case: suppose we have 3 video applications located in several areas across a city, and the total bandwidth allocated to them is fixed. Naively, we might divide the bandwidth evenly among the cameras, giving the following graph of allocated bandwidth over time.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/baseline_alloc.png" alt="Baseline Allocation" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>These allocations are fixed and won&amp;rsquo;t change. However, every video application has its own characteristics that determine how much bandwidth it needs to deliver a good result (f1-score). Our task is to maintain a high average f1-score, so we need an accuracy-oriented solution. This is where the accuracy gradient&lt;a href="#acc">[1]&lt;/a> comes in.&lt;/p>
&lt;h2 id="system-design">System Design&lt;/h2>
&lt;p>In our current design, we need a resource allocator, which we call the concierge. The concierge determines how much bandwidth each video application (vap) gets, reallocating at a predetermined time interval. Reallocation relies on a process called profiling: the concierge first asks every vap to compute its f1-score on a video segment when its bandwidth is increased by &lt;code>profile_delta&lt;/code>; the difference from the default f1-score is called &lt;code>f1_diff_high&lt;/code>. The concierge then asks each vap to repeat the measurement with its bandwidth reduced by &lt;code>profile_delta&lt;/code>, yielding &lt;code>f1_diff_low&lt;/code>. Both results are sent back to the concierge, which computes a sensitivity for each vap, where sensitivity is&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://latex.codecogs.com/svg.image?&amp;amp;space;sensitivity[i]=f1%5c_diff%5c_high[i]-%5cSigma_%7bk=1%7d%5enf1%5c_diff%5c_low[k];k%5cneq&amp;amp;space;i&amp;amp;space;" alt="sensitivity[i] = f1_diff_high[i] - \Sigma_{k=1}^nf1_diff_low[k]; k \neq i" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>This equation tells us which video application would give the best f1-score improvement if we added bandwidth to it while reducing the others&amp;rsquo;. Based on it, the concierge gives bandwidth to the vap with the highest sensitivity and takes bandwidth from the vap with the lowest sensitivity.&lt;/p>
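&lt;p>The sensitivity computation and reallocation step above can be sketched as follows. This is a minimal illustration with made-up numbers and variable names, not the project&amp;rsquo;s actual code:&lt;/p>

```python
def sensitivities(f1_diff_high, f1_diff_low):
    """sensitivity[i] = f1_diff_high[i] minus the sum of
    f1_diff_low[k] over all k other than i."""
    total_low = sum(f1_diff_low)
    return [high - (total_low - low)
            for high, low in zip(f1_diff_high, f1_diff_low)]

def reallocate(alloc, f1_diff_high, f1_diff_low, profile_delta):
    """Move profile_delta kbps from the least to the most sensitive vap."""
    s = sensitivities(f1_diff_high, f1_diff_low)
    winner = s.index(max(s))
    loser = s.index(min(s))
    new_alloc = list(alloc)
    new_alloc[winner] += profile_delta
    new_alloc[loser] -= profile_delta
    return new_alloc

# Three vaps starting at an even 400 kbps split of 1200 kbps total.
print(reallocate([400, 400, 400],
                 f1_diff_high=[0.20, 0.05, 0.01],
                 f1_diff_low=[-0.02, -0.01, -0.01],
                 profile_delta=80))
# [480, 400, 320]
```

&lt;p>Here the first vap gains the most from extra bandwidth and the third loses the least from giving some up, so the concierge shifts 80 kbps from the third to the first.&lt;/p>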
&lt;h2 id="results">Results&lt;/h2>
&lt;p>As mentioned, our main objective is to improve accuracy. However, we account for two parameters: the improvement itself and the overhead of achieving it. We first choose 3 dds apps&lt;a href="#dds">[2]&lt;/a> that we consider our ideal case. The following graphs show the profile of this ideal case.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/ideal_case.png" alt="Ideal Case" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>We can see that two of them have high sensitivity, especially at lower bandwidth, while one has low sensitivity. This is a perfect scenario, since we can sacrifice the latter&amp;rsquo;s bandwidth and give it to the app with the highest sensitivity at each iteration. We run the experiment under the following setup.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nv">DATASETS&lt;/span>&lt;span class="o">=(&lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;uav-1&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;coldwater&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;roppongi&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">MAX_BW&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="m">1200&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">PROFILING_DELTA&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="m">80&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">MI&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="m">5&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This setup uses a total bandwidth of 1200 kbps, which is at first distributed evenly (400 kbps per app). The profiling delta is 80 kbps and the profiling interval (&lt;code>MI&lt;/code>) is 5 seconds.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/merged_ideal.png" alt="Merged Ideal" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">&lt;strong>Mode&lt;/strong>&lt;/th>
&lt;th style="text-align:center">&lt;em>DDS&lt;/em> &lt;br> (&lt;span style="color:blue">&lt;em>uav-1&lt;/em>&lt;/span>)&lt;/th>
&lt;th style="text-align:center">&lt;em>DDS&lt;/em> &lt;br> (&lt;span style="color:orange">&lt;em>coldwater&lt;/em>&lt;/span>)&lt;/th>
&lt;th style="text-align:center">&lt;em>DDS&lt;/em> &lt;br> (&lt;span style="color:green">&lt;em>roppongi&lt;/em>&lt;/span>)&lt;/th>
&lt;th style="text-align:center">Average&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">Baseline&lt;/td>
&lt;td style="text-align:center">0.042&lt;/td>
&lt;td style="text-align:center">0.913&lt;/td>
&lt;td style="text-align:center">0.551&lt;/td>
&lt;td style="text-align:center">0.502&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;strong>Concierge&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.542&lt;/td>
&lt;td style="text-align:center">0.854&lt;/td>
&lt;td style="text-align:center">0.495&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.63&lt;/strong> (&lt;span style="color:green">&lt;em>+25.5%&lt;/em>&lt;/span>)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>From the result, we managed to improve the average f1-score by &lt;strong>0.1&lt;/strong>, or &lt;strong>25.5%&lt;/strong>, which is a very good result. There are a total of 10 videos in our dataset; for the next experiment, we generate 6 combinations of dds apps. Note that in each combination one video is uav-1, since we know it has the highest sensitivity. We run the experiment with 4 total-bandwidth scenarios &lt;strong>(1200, 1500, 1800, 2100)&lt;/strong> in kbps.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/only_uav_merged.png" alt="Only Uav-1" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The left figure depicts the average improvement from the concierge. We can see that the improvement decreases as the total bandwidth increases: at higher bandwidth, the sensitivities tend toward 0 and the concierge stops reallocating. Overall, this confirms our previous result that, with the help of uav-1, the concierge can improve the f1-score by up to 0.1. The next experiment randomly picks 3 dds videos out of the 10, repeated 10 times, so that we can see how the concierge performs without any help from uav-1.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/random_merged.png" alt="Random Merged" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>From the result, we still obtain an improvement. However, the average improvement is smaller than in the previous experiment; the reason for this phenomenon is discussed later.&lt;/p>
&lt;h3 id="overhead-measurement">Overhead Measurement&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/overhead_1.png" alt="Overhead Measurement" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>In the figure above, each graph represents a different total bandwidth. The experiment shows that a lower MI leads to higher overhead, since more profiling rounds take place than with a higher MI. The 4 graphs also show a significant trade-off in lowering the MI, since the improvement itself is not large; the highest improvement is at &lt;strong>1200 kbps&lt;/strong>. Hence, at higher bandwidth there is no need to profile too often.&lt;/p>
&lt;h2 id="discussion">Discussion&lt;/h2>
&lt;p>There are some limitations in our current design. Looking at the box plot in figure 5 above, we can see that for some combinations the improvement is negative.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/recovery_failed.png" alt="Failed Recovery" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The figure above depicts the profiling process for segment 6, used to determine the bandwidth for segment 7. Here we can see that the f1-score at that bandwidth for (&lt;span style="color:blue">&lt;em>jakarta&lt;/em>&lt;/span>) drops significantly. Our current design cannot address this issue yet, since it only considers the current video segment; the previous and future segments should be taken into account as well.&lt;/p>
&lt;p>Regarding overhead, we are aware that 50% overhead is still considered bad. We might try a dynamic &lt;code>MI&lt;/code>, or skip profiling for certain videos when it is not necessary.&lt;/p>
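&lt;p>One possible shape for such a dynamic &lt;code>MI&lt;/code>, purely as an illustration of the idea rather than anything we have implemented: lengthen the profiling interval when the last round yielded little gain, and reset it when profiling pays off. All names and thresholds below are made up.&lt;/p>

```python
def next_interval(current_mi, recent_gain, min_mi=5, max_mi=40,
                  gain_threshold=0.02):
    """Double the profiling interval when the last profiling round
    improved average f1 by less than gain_threshold; otherwise reset
    to the minimum so promising apps are re-profiled quickly."""
    if recent_gain >= gain_threshold:
        return min_mi
    return min(current_mi * 2, max_mi)

print(next_interval(5, recent_gain=0.001))   # 10: little gain, back off
print(next_interval(20, recent_gain=0.05))   # 5: big gain, profile often
```

&lt;p>Such a policy would cut profiling overhead exactly in the high-bandwidth regime where, as shown above, frequent profiling buys little improvement.&lt;/p>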
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Regardless of the aforementioned limitations, this report shows that the concierge is generally capable of improving the f1-score. Further updates will appear in the final report.&lt;/p>
&lt;h2 id="references">References&lt;/h2>
&lt;p>&lt;a id="acc">[1]&lt;/a> &lt;a href="https://drive.google.com/file/d/1U_o0IwYcBNF98cb5K_h56Nl-bQJSAtMj/view?usp=sharing" target="_blank" rel="noopener">https://drive.google.com/file/d/1U_o0IwYcBNF98cb5K_h56Nl-bQJSAtMj/view?usp=sharing&lt;/a> &lt;br>
&lt;a id="dds">[2]&lt;/a> Kuntai Du, Ahsan Pervaiz, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, and Junchen Jiang. 2020. Server-driven video streaming for deep learning inference. In Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication. 557–570.&lt;/p></description></item><item><title>Public Artifact Data and Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20230617-zjyhhhhh/</link><pubDate>Sat, 17 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20230617-zjyhhhhh/</guid><description>&lt;p>Hello! As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/intel/artifactviz">Public Artifact Data and Visualization&lt;/a> our proposals (&lt;a href="https://drive.google.com/file/d/1egIQDLMQ5eV7Uc-S55-GTiSXdmrC3_Pj/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jiayuan-zhu/">Jiayuan Zhu&lt;/a> and &lt;a href="https://drive.google.com/file/d/1Gf68Pz8v3YjcQ1sWkS9n2hnl7_lsme2l/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/krishna-madhwani/">Krishna Madhwani&lt;/a>) under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjo-vahldiek-oberwagner/">Anjo Vahldiek-Oberwagner&lt;/a> aims to design a system that allows researchers to conveniently record and compare the environmental information, such as CPU utilization, of different iterations and versions of code during an experiment.&lt;/p>
&lt;p>In academic experiments, there is often a need to compare results and performance between different iterations and versions. This comparative analysis helps researchers evaluate the impact of different experimental parameters and algorithms on the results and enables them to optimize experimental design and algorithm selection. However, to conduct effective comparative analysis, it is essential to record and compare environmental information, alongside the experimental data. This information provides valuable insights into the factors that may influence the observed outcomes.&lt;/p>
&lt;p>Through this summer, we aim to develop a system that offers a streamlined interface, enabling users to effortlessly monitor their running programs using simple command-line commands. Moreover, our system will feature a user-friendly dashboard where researchers can access historical runtime information and visualize comparisons between different iterations. The dashboard will present comprehensive graphs and charts, facilitating the analysis of trends and patterns in the environmental data.&lt;/p></description></item><item><title>Reproduce and benchmark self-adaptive edge applications under dynamic resource management</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230530-zharfanf/</link><pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230530-zharfanf/</guid><description>&lt;p>Hello there!&lt;/p>
&lt;p>I am Faishal Zharfan, a senior-year student studying Telecommunication Engineering at Bandung Institute of Technology (ITB) in Bandung, Indonesia (here is my &lt;a href="https://drive.google.com/file/d/1u3UsCQZ40erpPmyoyn8DEVqH5Txmvvkz/view?usp=drive_link" target="_blank" rel="noopener">proposal&lt;/a>). I&amp;rsquo;m currently part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/edgebench/">Edgebench&lt;/a> project under the mentorship of Yuyang Huang. The main goal of this project is to reproduce and benchmark self-adaptive video applications using the proposed solution.&lt;/p>
&lt;p>The topic I&amp;rsquo;m currently working on, &amp;ldquo;Reproduce and benchmark self-adaptive edge applications under dynamic resource management&amp;rdquo; (edgebench), is led by Prof. Junchen Jiang and Yuyang Huang. Edgebench focuses on how to efficiently distribute resources (bandwidth and CPU) across several video applications. Today&amp;rsquo;s video applications process their video data on an edge server; in a WAN setting, bandwidth and compute are strictly limited and therefore the greatest concerns. We could distribute the bandwidth evenly across the cameras, but each camera&amp;rsquo;s bandwidth and compute needs differ, so another solution is required. The recently proposed solution, the &amp;ldquo;accuracy gradient&amp;rdquo;, tells us how much bandwidth an application needs at a given time to achieve higher accuracy. The goal is to allocate more bandwidth to the apps with the largest f1-score improvement and take it from those whose f1-score would not diminish significantly, yielding a higher total f1-score in the end.&lt;/p>
&lt;p>Throughout this summer, we plan to implement the &amp;ldquo;accuracy gradient&amp;rdquo; and test several baselines to compare against it. As for the implementation, we are currently implementing latency measurement: we are aware that this solution carries overhead, so latency should be taken into account.&lt;/p>