edge | UCSC OSPO

Reconfigurable and Placement-Aware Replication for Edge Systems

Sat, 31 Jan 2026 00:00:00 +0000

Project Description

Topics: Distributed systems
Skills: Rust, Java, Go, Python, Bash scripting, Linux, Docker.
Difficulty: Hard
Size: Large (350 hours)
Mentors: Fadhil I. Kurnia

Modern replicated systems are typically evaluated under static configurations with fixed replica placement. However, real-world edge deployments are highly dynamic: workloads shift geographically, edge nodes join or fail, and latency conditions change over time. Our existing testbed provides reproducible evaluation for replicated systems but lacks support for dynamic reconfiguration and adaptive edge placement policies.

This project extends the existing open testbed to support:

Dynamic Replica Reconfiguration
- Membership changes (add/remove replicas)
- Leader migration and shard movement
- Online reconfiguration cost measurement (latency spikes, recovery overhead, state transfer cost)
Edge-Aware Placement Policies
- Demand-aware placement based on geographic workload skew
- Latency-aware and bandwidth-aware replica selection
- Comparison of static vs. adaptive placement strategies
- Evaluation under real-world latency matrices (e.g., US metro-level or cloud region traces)
What-if Simulation Framework
- Replay workload traces with time-varying demand
- Simulate hundreds of edge sites with realistic network conditions
- Quantify trade-offs between consistency, availability, reconfiguration overhead, and cost

The outcome will be an open-source framework that enables researchers to evaluate not only steady-state replication performance, but also how systems behave under churn, scaling events, and demand shifts. These are central challenges in real edge environments.

Expected Deliverables

Reconfiguration abstraction layer (API for membership & placement changes)
Placement policy plugin framework (k-means, facility-location heuristics, latency-minimizing, cost-aware)
Trace-driven dynamic workload engine
Public benchmark scenarios and reproducible experiment scripts
Artifact-ready documentation and evaluation report
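
To make the placement policy plugin deliverable more concrete, here is a minimal sketch of what a plugin registry with one latency-minimizing policy could look like. It is only an illustration: the names POLICIES, register, weighted_latency, and greedy_latency are hypothetical, and the greedy heuristic is one simple facility-location approach rather than the design the project will necessarily adopt.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical plugin signature: (latency matrix, per-site demand, k) -> chosen sites.
PlacementPolicy = Callable[[Sequence[Sequence[float]], Sequence[float], int], List[int]]

# Hypothetical registry that maps a policy name to its implementation.
POLICIES: Dict[str, PlacementPolicy] = {}


def register(name: str):
    """Decorator that adds a policy function to the registry."""
    def wrap(fn: PlacementPolicy) -> PlacementPolicy:
        POLICIES[name] = fn
        return fn
    return wrap


def weighted_latency(latency, demand, replicas) -> float:
    """Demand-weighted latency when every site talks to its nearest replica."""
    return sum(d * min(latency[site][r] for r in replicas)
               for site, d in enumerate(demand))


@register("greedy-latency")
def greedy_latency(latency, demand, k) -> List[int]:
    """Greedy facility-location heuristic: repeatedly add the site that
    lowers the demand-weighted latency the most."""
    chosen: List[int] = []
    for _ in range(k):
        candidates = [s for s in range(len(latency)) if s not in chosen]
        best = min(candidates,
                   key=lambda s: weighted_latency(latency, demand, chosen + [s]))
        chosen.append(best)
    return chosen
```

A static policy and an adaptive one could share this signature, so the same experiment script could swap between them by name and compare the resulting weighted latency.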

[Final Blog] Distrobench: Distributed Protocol Benchmark

Sat, 30 Aug 2025 00:00:00 +0000

Introduction

This is the final blog for our contribution to the Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges project under the mentorship of Fadhil Kurnia for the OSRE program.

Distrobench is a framework for evaluating the performance of replication and coordination protocols for distributed systems. It standardizes benchmarking by allowing different protocols to be tested under an identical workload, and it supports both local and remote deployment of the protocols. The implementations tested are restricted to key-value store applications and are categorized by consistency model, programming language, and persistency (whether the implementation stores its data in memory or on disk).

All the benchmark results are stored in a data.json file which can be viewed through a webpage we have provided. A user can clone the git repository, benchmark different protocols on their own machine or in a cluster of remote machines, then view the results locally. We also provided a webpage that shows our own benchmark results which ran on 3 Amazon EC2 t2.micro instances.

How to run a benchmark on Distrobench

Before running a benchmark using Distrobench, the protocol to be benchmarked must first be built. This allows the script to initialize the protocol instances for a local benchmark or to send the binaries to the remote machines. A remote machine running the protocol does not need to store the source code of the protocol implementation, but it does need the dependencies for running that specific protocol, such as Java, Docker, or rsync. The following commands build the ailidani/paxi project, which needs no additional dependencies to run on a remote machine:

# Clone the Distrobench repository 
git clone git@github.com:fadhilkurnia/distro.git

# Clone the Paxi repository and build the binary 
cd distro/sut/ailidani.paxi
git clone git@github.com:ailidani/paxi.git
cd paxi/bin/
./build.sh

# Go back to the Distrobench root directory & run python script 
cd ../../../..
python main.py

By default, the script will start 3 local instances of a Paxi protocol implementation that the user chose through the CLI. The user can modify the number of running instances and whether or not it is deployed locally or in a remote machine by changing the contents of the .env file inside the root directory. The following is the contents of the default .env file:

NUM_OF_NODES=3

SSH_KEY=ssh-key.pem
REMOTE_USERNAME=ubuntu

PUBLIC_IP1=127.0.0.1
PUBLIC_IP2=127.0.0.1
PUBLIC_IP3=127.0.0.1

PRIVATE_IP1=127.0.0.1
PRIVATE_IP2=127.0.0.1
PRIVATE_IP3=127.0.0.1

CLIENT_IP=127.0.0.1

OUTPUT=data.json

When running a remote benchmark, an SSH key must also be placed in the root directory so that the Python script can use ssh and rsync. All machines must also allow TCP connections on ports 2000-2300 and 3000-3300, because these ranges are used for communication between the running instances and for the YCSB benchmark. Running the benchmark requires at least 3 nodes, the minimum needed to support most protocols (5 nodes are recommended).
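
As an illustration of how the .env settings could be turned into a node list, here is a small sketch. It is not taken from main.py: the helper names parse_env, load_nodes, and is_local, as well as the dictionary layout of a node, are assumptions made for this example.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def load_nodes(env: Dict[str, str]) -> List[Dict[str, str]]:
    """Build one record per node from the PUBLIC_IPn / PRIVATE_IPn entries."""
    count = int(env["NUM_OF_NODES"])
    if count < 3:
        # The blog states that at least 3 nodes are required.
        raise ValueError("at least 3 nodes are required")
    return [
        {"public_ip": env[f"PUBLIC_IP{i}"], "private_ip": env[f"PRIVATE_IP{i}"]}
        for i in range(1, count + 1)
    ]


def is_local(env: Dict[str, str]) -> bool:
    """Treat the run as local when every node points at 127.0.0.1."""
    return all(n["public_ip"] == "127.0.0.1" for n in load_nodes(env))
```

With the default file shown above, load_nodes yields three loopback nodes and is_local returns True; replacing the PUBLIC_IP entries with remote addresses would switch the run to a remote deployment.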

To view the benchmark result in the web page locally, move data.json into the docs/ directory and run python -m http.server 8000. The page is then accessible through http://localhost:8000.

Deep dive on how Distrobench works

The following is the project structure of the Distrobench repository:

distro/
├── main.py // Main python script for running benchmark
├── data.json // Output file for main.py
├── README.md
├── .env // Config for running the benchmark
├── docs/
│   ├── index.html // Web page to show benchmark results
│   ├── data.json // Output file displayed by web page
│   └── README.md
├── src/
│   ├── utils/
│   └── ycsb/ // Submodule for YCSB
└── sut/ // Systems under test
    ├── ailidani.paxi/
    │   └── run.py // Protocol-specific benchmark script called by main.py
    ├── apache.zookeeper/
    ├── etcd-io.etcd/
    ├── fadhilkurnia.xdn/
    ├── holipaxos-artifect.holipaxos/
    ├── otoolep.hraftd/
    └── tikv.tikv/

main.py will automatically detect directories inside sut/ and will call the main function inside run.py. The following is the structure of run.py written in pseudocode style:

FUNCTION main(run_ycsb: Function, nodes: List of Nodes, ssh: Dictionary)
    node_data = map_ip_port(nodes)

    SWITCH user_input
        CASE 0:
            start()
            RETURN
        CASE 1:
            stop()
            RETURN
        CASE 2:
            client_data = []
            FOR EACH item IN node_data
                ADD item.client_addr TO client_data
            END FOR
            run_ycsb(client_data)
            RETURN
    END SWITCH
END FUNCTION

FUNCTION start()
    // Start the protocol instance (local or remote)
END FUNCTION

FUNCTION stop()
    // Stop the protocol instance (local or remote)
END FUNCTION

FUNCTION map_ip_port(nodes: List of Nodes) -> List of Dictionary
    // Generate port numbers based on the protocol requirements
END FUNCTION
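
For readers who prefer concrete code, the pseudocode above could translate into Python roughly as follows. This is a sketch under several assumptions: nodes are taken to be dictionaries with public_ip and private_ip keys, the port bases 2000 and 3000 are picked from the port ranges mentioned earlier, and the optional choice parameter is added here only so the menu can be driven without typing. None of these details are confirmed by the actual run.py files.

```python
from typing import Callable, Dict, List, Optional

PEER_BASE_PORT = 2000    # assumption: peer traffic uses the 2000-2300 range
CLIENT_BASE_PORT = 3000  # assumption: client traffic uses the 3000-3300 range


def map_ip_port(nodes: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Assign one peer address and one client address to every node."""
    return [
        {
            "peer_addr": f"{node['private_ip']}:{PEER_BASE_PORT + i}",
            "client_addr": f"{node['public_ip']}:{CLIENT_BASE_PORT + i}",
        }
        for i, node in enumerate(nodes)
    ]


def start(node_data: List[Dict[str, str]], ssh: Dict[str, str]) -> None:
    """Start the protocol instances (local or remote). Protocol-specific."""


def stop(node_data: List[Dict[str, str]], ssh: Dict[str, str]) -> None:
    """Stop the protocol instances (local or remote). Protocol-specific."""


def main(run_ycsb: Callable[[List[str]], None],
         nodes: List[Dict[str, str]],
         ssh: Dict[str, str],
         choice: Optional[int] = None) -> None:
    node_data = map_ip_port(nodes)
    if choice is None:
        choice = int(input("0) start  1) stop  2) benchmark: "))
    if choice == 0:
        start(node_data, ssh)
    elif choice == 1:
        stop(node_data, ssh)
    elif choice == 2:
        # The benchmark only needs the client-facing addresses.
        run_ycsb([item["client_addr"] for item in node_data])
    else:
        raise ValueError(f"unknown menu option: {choice}")
```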

The .env file provides both public and private IP addresses to add versatility when running a remote benchmark. Private IP is used for communication between remote machines if they are under the same network group. In the case of our own benchmark, four t2.micro EC2 instances are deployed under the same network group. Three of them are used to run the protocol and the fourth machine acts as the YCSB client. It is possible to use your local machine as the YCSB client instead of through another remote machine by specifying CLIENT_IP in the .env file as 127.0.0.1. The decision to use the remote machine as the YCSB client is made to reduce the impact of network latency between the client and the protocol servers to a minimum.

The main tasks of the start() function can be broken down into the following:

Generate custom configuration files for each remote machine instance (this may differ between implementations: some do not require a config file because they support flag parameters out of the box, while others require multiple configuration files for each instance)
rsync binaries into the remote machine (If running a remote benchmark)
Start the instances
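
The three steps above can be sketched as a function that plans the commands for a remote start. This is an illustration only: the config format produced by render_config, the helper names, and the ssh dictionary keys (key, username) are hypothetical. The rsync options used (-a, -z, -e) and the ssh -i option are standard, but the real scripts may use different ones.

```python
import shlex
from typing import Dict, List


def render_config(node_id: int, peers: List[str]) -> str:
    """Hypothetical per-node config; real formats differ per implementation."""
    lines = [f"id={node_id}"]
    lines += [f"peer.{i}={addr}" for i, addr in enumerate(peers, start=1)]
    return "\n".join(lines) + "\n"


def rsync_command(src: str, host: str, dest: str, ssh: Dict[str, str]) -> List[str]:
    """Copy built binaries to a remote machine over ssh."""
    remote_shell = f"ssh -i {shlex.quote(ssh['key'])}"
    return ["rsync", "-az", "-e", remote_shell, src,
            f"{ssh['username']}@{host}:{dest}"]


def plan_start(hosts: List[str], src: str, dest: str,
               ssh: Dict[str, str]) -> List[List[str]]:
    """Return the copy commands needed before starting; local hosts need none."""
    return [rsync_command(src, host, dest, ssh)
            for host in hosts if host != "127.0.0.1"]
```

Each planned command could then be passed to subprocess.run, followed by a protocol-specific command that launches the instance with its generated configuration.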

The stop() function is much simpler, since it only kills the process running the protocol and optionally removes the copied binary files from the remote machine. The run_ycsb() function passed to run.py is defined in main.py and currently supports two types of workload:

Read-heavy: A single-client workload with 95% read and 5% update (write) operations
Update-heavy: A single-client workload with 50% read and 50% update (write) operations

A new workload can be added inside the src/ycsb/workloads directory. Both workloads above only run 1000 operations for the benchmark which may not be enough operations to properly evaluate the performance of the protocols. It should also be noted that while YCSB does support a scan operation, it is never used for our benchmark because none of our tested protocols implement this operation.
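
To show what such a workload definition might contain, the sketch below renders a YCSB CoreWorkload properties file from a read/update mix. The property names are standard YCSB core properties, but the record count and the zipfian request distribution are assumptions; the blog only specifies the read/update proportions and the 1000 operations.

```python
def render_workload(read: float, update: float,
                    operations: int = 1000, records: int = 1000) -> str:
    """Render a YCSB workload properties file for a read/update mix."""
    if abs(read + update - 1.0) > 1e-9:
        raise ValueError("read and update proportions must sum to 1")
    props = {
        "workload": "site.ycsb.workloads.CoreWorkload",
        "recordcount": records,            # assumption: not stated in the blog
        "operationcount": operations,      # the blog states 1000 operations
        "readproportion": read,
        "updateproportion": update,
        "scanproportion": 0,               # scans are never used in the benchmark
        "insertproportion": 0,
        "requestdistribution": "zipfian",  # assumption: not stated in the blog
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


read_heavy = render_workload(0.95, 0.05)
update_heavy = render_workload(0.5, 0.5)
```

Raising operations well above 1000 would be one way to address the concern that the current runs may be too short to evaluate the protocols properly.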

How to implement a new protocol in Distrobench

Adding a new protocol to Distrobench requires implementing two main components: a Python integration script (run.py) and a YCSB database binding for benchmarking.

Create the protocol directory structure
- Create a new directory under sut/ using format yourrepo.yourprotocol/.
Write run.py integration
- Put script inside yourrepo.yourprotocol/ directory
- Must have the main(run_ycsb, nodes, ssh) function.
- Add start/stop/benchmark menu options
- Handle local (127.0.0.1) and remote deployment
Create YCSB client
- Make Java class extending YCSB’s DB class
- Put inside src/ycsb/yourprotocol/src/main/java/site/ycsb/yourprotocol
- Implement read(), insert(), update(), delete() methods
Register your client
- Register your client to src/pom.xml, src/ycsb/bin/binding.properties, and src/ycsb/bin/ycsb.
Build and test
- Run cd src/ycsb && mvn clean package
- Run python main.py
- Select your protocol and test it
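
The directory naming and the required main() function matter because main.py discovers protocols by scanning sut/. A minimal sketch of how that discovery could work is shown below. The function discover_protocols and the generated module names are hypothetical, and the real main.py may do this differently.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Dict


def discover_protocols(sut_dir: Path) -> Dict[str, ModuleType]:
    """Import run.py from each subdirectory of sut/ that defines main()."""
    found: Dict[str, ModuleType] = {}
    for entry in sorted(sut_dir.iterdir()):
        script = entry / "run.py"
        if not entry.is_dir() or not script.is_file():
            continue  # skip files and directories without an integration script
        # Directory names contain '.' and '-', which are not valid in module names.
        module_name = "sut_" + entry.name.replace(".", "_").replace("-", "_")
        spec = importlib.util.spec_from_file_location(module_name, script)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        if callable(getattr(module, "main", None)):
            found[entry.name] = module
    return found
```

A caller could then list the keys of the returned dictionary as menu entries and invoke found[name].main(run_ycsb, nodes, ssh) for the selected protocol.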

Protocols which have been tested

Distrobench has tested 20 different distributed consensus protocols across 7 different implementation projects.

ailidani/paxi
- Programming Language : Go
- Persistency : On-Disk
- Consistency Model : Linearizability, Eventual
- Protocol : Paxos, EPaxos, SDpaxos, WPaxos, ABD, chain, VPaxos, WanKeeper, KPaxos, Paxos_groups, Dynamo, Blockchain, M2Paxos, HPaxos.
apache/zookeeper
- Programming Language : Java
- Persistency : On-Disk
- Consistency Model : Linearizability + Primary Integrity
- Protocol : ZooKeeper implements ZAB (ZooKeeper Atomic Broadcast)
etcd-io/etcd
- Programming Language : Go
- Persistency : On-Disk
- Consistency Model : Linearizability
- Protocol : Raft
fadhilkurnia/xdn
- Programming Language : Java, Rust
- Persistency : On-Disk
- Consistency Model : Linearizability, Linearizability + Primary Integrity
- Protocol : Gigapaxos
Zhiying12/holipaxos-artifect
- Programming Language : Go, Rust
- Persistency : On-Disk
- Consistency Model : Linearizability
- Protocol : Holipaxos, Omnipaxos, Multipaxos
otoolep/hraftd
- Programming Language : Go
- Persistency : On-Disk
- Consistency Model : Linearizability
- Protocol : Raft
tikv/tikv
- Programming Language : Rust
- Persistency : On-Disk
- Consistency Model : Linearizability
- Protocol : Raft

Challenges

When attempting to benchmark HoliPaxos, the main challenge was handling versions that rely on persistent storage with RocksDB. Since some implementations are written in Go, it was necessary to find compatible versions of RocksDB and gRocksDB (for example, RocksDB 10.5.1 works with gRocksDB 1.10.2). Another difficulty was that RocksDB is resource-intensive to compile, and in our project we did not have sufficient CPU capacity on the remote machine to build RocksDB and run remote benchmarks.
Some projects did not compile successfully at first and required minor modifications to run.

Conclusion and future improvements

The current benchmark results show the performance of all the protocols listed above in terms of throughput and benchmark runtime. The results are subject to revision because they may not reflect the best achievable performance of the protocols, owing to unoptimized deployment scripts. We are also planning to switch to a more powerful EC2 instance type because t2.micro does not have enough resources to support RocksDB or TiKV.

In the near future, additional features will be added to Distrobench such as:

Multi-Client Support: The YCSB client will start multiple clients which will send requests in parallel to different servers in the group.
Commit Versioning: Allows the labelling of all benchmark results with the commit hash of the protocol’s repository version. This allows comparing different versions of the same project.
Adding more Primary-Backup, Sequential, Causal, and Eventual consistency protocols: Implementations with support for a consistency model other than linearizability and one that provides an existing key-value store application are notoriously difficult to find.
Benchmark on node failure
Benchmark on the addition of a new node

Mid-term Blog: Building a Simulator for Benchmarking Replicated Systems

Fri, 25 Jul 2025 00:00:00 +0000

Introduction

Hello there, I’m Michael. In this report, I’ll be sharing my progress as part of the Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges project under the mentorship of Fadhil Kurnia.

About the Project

The goal of the project is to build a language-agnostic interface that enables communication between clients and any consensus protocol, such as MultiPaxos, Raft, ZooKeeper Atomic Broadcast (ZAB), and others. Currently, many of these protocols implement their own custom mechanisms for the client to communicate with the group of peers in the network. For example, an implementation of MultiPaxos from the MultiPaxos Made Complete paper uses a custom Protobuf definition for the packets that clients send to the MultiPaxos system. With a generalized interface, different consensus protocols can be tested under the same workload so that their performance can be compared objectively.

Progress

Literature Study: Reviewed papers and implementations of various protocols including GigaPaxos, Raft, Viewstamped Replication (VSR), and ZAB. Analysis focused on their log replication strategies, fault handling, and performance implications.
Development of Custom Protocol: Two custom protocols are currently under development and will serve as initial test subjects for the testbed:
- A modified GigaPaxos protocol
- A Primary-Backup Replication protocol with strict log ordering similar to ZAB (logs are ordered based on the sequence proposed by the primary)
Most of my time has been spent working on the two protocols, particularly on snapshotting and state transfer functionality in the Primary-Backup protocol. Ideally, the testbed should be able to evaluate protocol performance in scenarios involving node failure or a new node being added. In these scenarios, different protocol implementations often vary in their decision of whether to take periodic snapshots or to roll forward whenever possible and generate a snapshot only when necessary.

Challenges

Early in the project, the initial goal was to benchmark different consensus protocols using arbitrary full-stack web applications as their workload. Different protocols would replicate a full-stack application running inside Docker containers across multiple nodes and the testbed would send requests for them to coordinate between those nodes. In fact, the 2 custom protocols being worked on are specifically made to fit these constraints.

Developing a custom protocol that supports the replication of a Docker container is in itself already a difficult task. Abstracting away the functionality that allows communicating with the docker containers, as well as handling entry logs and snapshotting the state, is an order of magnitude more complicated.

As mentioned in the first blog, an application can be categorized into one of two types: deterministic and non-deterministic. The coordination of these two types of applications is handled in very different ways. Most consensus protocols support only deterministic systems, such as key-value stores, and can't easily handle the coordination of complex services or external side effects. Supporting non-deterministic applications would require abstracting over protocol-specific log structures. This effectively restricts the interface to protocols that conform to the abstraction, defeating the goal of making the interface broadly usable and protocol-agnostic.

Furthermore, allowing any existing protocol to run something as complex as a stateful Docker container, without the protocol itself being aware of it, adds another layer of complexity to the system.

Future Goals

Given these challenges, I decided to pivot to using only key-value stores as the application being used in the benchmark. This aligns with the implementations of most of the existing protocols which typically use key-value stores. In doing so, now the main focus would be to implement an interface that supports HTTP requests from clients to any arbitrary protocols.

Midterm Blog: Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges

Fri, 25 Jul 2025 00:00:00 +0000

Hello! I’m Panji Sri Kuncara Wisma and I want to share my midterm progress on the “Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges” project under the mentorship of Fadhil I. Kurnia.

Project Overview

The goal of our project is to create an open testbed that enables fair, reproducible evaluation of different consensus protocols (Paxos variants, EPaxos, Raft, etc.) when deployed at network edges. Currently, researchers struggle to compare these systems because they lack standardized evaluation environments and often rely on mock implementations of proprietary systems.

XDN (eXtensible Distributed Network) is one of the important consensus systems we plan to evaluate in our benchmarking testbed. Built on GigaPaxos, it allows deployment of replicated stateful services across edge locations. As part of preparing our benchmarking framework, we need to ensure that the systems we evaluate, including XDN, are robust for fair comparison.

Progress

As part of preparing our benchmarking tool, I have been working on refactoring XDN’s FUSE filesystem from C++ to Rust. This work is essential for creating a stable and reliable XDN platform.

The diagram above illustrates how the FUSE filesystem integrates with XDN’s distributed architecture. On the left, we see the standard FUSE setup where applications interact with the filesystem through the kernel’s VFS layer. On the right, the distributed replication flow is shown: Node 1 runs fuselog_core which captures filesystem operations and generates statediffs, while Nodes 2 and 3 run fuselog_apply to receive and apply these statediffs, maintaining replica consistency across the distributed system.

This FUSE component is critical for XDN’s operation as it enables transparent state capture and replication across edge nodes. By refactoring this core component from C++ to Rust, we’re hopefully strengthening the foundation for fair benchmarking comparisons in our testbed.

Core Work: C++ to Rust FUSE Filesystem Migration

XDN relies on a FUSE (Filesystem in Userspace) component to capture filesystem operations and generate “statediffs” - records of changes that get replicated across edge nodes. The original C++ implementation worked but had memory safety concerns and limited optimization capabilities.

I worked on refactoring from C++ to Rust, implementing several improvements:

New Features Added:

Zstd Compression: Reduces statediff payload sizes
Adaptive Compression: Intelligently chooses compression strategies
Advanced Pruning: Removes redundant operations (duplicate chmod/chown, created-then-deleted files)
Bincode Serialization: Helps avoid manual serialization code and reduces the risk of related bugs
Extended Operations: Added support for additional filesystem operations (mkdir, symlink, hardlinks, etc.)

Architectural Improvements:

Memory Safety: Rust’s ownership system helps prevent common memory management issues
Type Safety: Using Rust enums instead of integer constants for better type checking

Findings

The optimizations performed as expected:

Statediff Size Reductions:

MySQL workload: 572MB → 29.6MB (95% reduction)
PostgreSQL workload: 76MB → 11.9MB (84% reduction)
SQLite workload: 4MB → 29KB (99% reduction)

The combination of write coalescing, pruning, and compression proves especially effective for database workloads, where many operations involve small changes to large files.
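
To illustrate the pruning idea, here is a small Python sketch that drops operations on files created and then deleted within the same batch and keeps only the last chmod/chown per path. It is a simplified model for explanation: the actual implementation is written in Rust, and the operation tuple layout and the function name prune are assumptions made here.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, object]  # (kind, path, argument), a hypothetical layout


def prune(ops: List[Op]) -> List[Op]:
    """Remove operations that cannot affect the final replicated state."""
    kept: List[Optional[Op]] = []             # None marks a pruned slot
    created_at: Dict[str, int] = {}           # path -> index of its create op
    last_meta: Dict[Tuple[str, str], int] = {}  # (kind, path) -> index of last chmod/chown

    for op in ops:
        kind, path, _ = op
        if kind == "create":
            created_at[path] = len(kept)
            kept.append(op)
        elif kind == "unlink" and path in created_at:
            # The file lived and died inside this batch: drop all of its ops,
            # including the create and this unlink.
            start = created_at.pop(path)
            for i in range(start, len(kept)):
                if kept[i] is not None and kept[i][1] == path:
                    kept[i] = None
        else:
            if kind in ("chmod", "chown"):
                # A later chmod/chown on the same path overrides the earlier one.
                previous = last_meta.get((kind, path))
                if previous is not None:
                    kept[previous] = None
                last_meta[(kind, path)] = len(kept)
            kept.append(op)
    return [op for op in kept if op is not None]
```

In a batch where a temporary file is created, written, and removed while a database file receives two chmod calls and a write, only the final chmod and the write on the database file survive, which is the kind of reduction that helps with database workloads producing many short-lived files.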

Performance Comparison: Remarkably, the Rust implementation matches or exceeds C++ performance:

POST operations: 30% faster (10.5ms vs 15ms)
DELETE operations: 33% faster (10ms vs 15ms)
Overall latency: Consistently better (9ms vs 11ms)

Current Challenges

While the core implementation is complete and functional, I’m currently debugging occasional latency spikes that occur under specific workload patterns. These edge cases need to be resolved before moving on to the benchmarking phase, as inconsistent performance could compromise the reliability of the evaluation.

Next Steps

With the FUSE filesystem foundation nearly complete, next steps include:

Resolve latency spike issues and complete XDN stabilization
Build benchmarking framework - a comparison tool that can systematically evaluate different consensus protocols with standardized metrics.
Run systematic evaluation across protocols

The optimized filesystem will hopefully provide a stable base for reproducible performance comparisons between distributed consensus protocols.

Developing an Open Testbed for Edge Replication System Evaluation

Sun, 15 Jun 2025 00:00:00 +0000

Hi, I’m Panji. I’m currently contributing to the Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges under the mentorship of Fadhil I. Kurnia. You can find more details on the project proposal here.

The primary challenge we’re addressing is the current difficulty in fairly comparing different edge replication systems. To fix this, we’re trying to build a testing platform with four key parts. We’re collecting real data about how people actually use edge services, creating a tool that can simulate realistic user traffic across many locations, building a system that mimics network delays between hundreds of edge servers, and packaging everything into an open-source toolkit.

This will let researchers test different coordination methods like EPaxos, Raft, and others using the same data and conditions. We hope this will help provide researchers with a more standardized way to evaluate their systems. We’re working with multiple programming languages and focusing on making complex edge computing scenarios accessible to everyone in the research community.

One of the most interesting aspects of this project is tackling the challenge of creating realistic simulations that accurately reflect the performance characteristics different coordination protocols would exhibit in actual edge deployments. The end goal is to provide the research community with a standardized, reproducible environment for edge replication.

Building a Simulator for Benchmarking Replicated Systems

Sat, 14 Jun 2025 00:00:00 +0000

Hi, I’m Michael. I’m currently contributing to the Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges under the mentorship of Fadhil Kurnia. You can find more details on the project proposal here.

Our aim is to create a system that tests and evaluates the performance of different consensus protocols and consistency models under the same application and workload. Both are tested on various replicated black-box applications. Essentially, the testbed can deploy any stateful application on multiple machines (nodes) as long as it is packaged as a Docker image. The consensus protocol synchronizes the stateful part of the application (in most cases, the database) across nodes. The goal is that, by the end of this project, the testbed provides the functionality and abstractions needed to create new consensus protocols and run tests on them.

One major challenge in implementing this is with regards to the handling of replication on the running docker containers. Generally, the services that can be deployed in this system would be of two types:

A Deterministic Application (An application that will always return the same output when given the same input. e.g., a simple CRUD app)
A Non-Deterministic Application (An application that may return different outputs when given the same input. e.g., an LLM which may return a different response to the same prompt)

These two application types require different implementations of consensus protocols. In the case of a deterministic application, since every request always yields the same response (and the same changes inside the database of the application itself), the replication protocol can replicate the request to all nodes. In a non-deterministic application, on the other hand, the replication protocol synchronizes the state of the database directly, since the same request may return a different response.
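
The difference between the two approaches can be sketched in a few lines of Python. This is a toy model, not the testbed's code: the Replica class, the handlers, and the dictionary-based state are all hypothetical, and the state diff ignores deleted keys for brevity.

```python
import random
from typing import Callable, Dict, List, Tuple

Request = Tuple[str, int]
Handler = Callable[[Dict[str, int], Request], None]


class Replica:
    def __init__(self) -> None:
        self.state: Dict[str, int] = {}


def deterministic_handler(state: Dict[str, int], request: Request) -> None:
    """The same request always produces the same state change."""
    key, value = request
    state[key] = value


def nondeterministic_handler(state: Dict[str, int], request: Request) -> None:
    """The same request may produce a different state change on each run."""
    key, _ = request
    state[key] = random.randint(0, 10**9)


def replicate_requests(replicas: List[Replica], handler: Handler,
                       request: Request) -> None:
    """Request replication: every replica executes the request itself."""
    for replica in replicas:
        handler(replica.state, request)


def replicate_state(replicas: List[Replica], handler: Handler,
                    request: Request) -> None:
    """State replication: the primary executes, backups apply the state diff."""
    primary, *backups = replicas
    before = dict(primary.state)
    handler(primary.state, request)
    diff = {k: v for k, v in primary.state.items() if before.get(k) != v}
    for backup in backups:
        backup.state.update(diff)
```

With deterministic_handler, either function leaves all replicas identical. With nondeterministic_handler, only replicate_state is guaranteed to do so, because replicate_requests lets every replica draw its own random value.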

Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges

Sat, 15 Feb 2025 00:00:00 +0000

Project Description

Topics: Distributed systems
Skills: Java, Go, Python, Bash scripting, Linux, Docker.
Difficulty: Hard
Size: Large (350 hours)
Mentors: Fadhil I. Kurnia

Replication is commonly employed to improve system availability and reduce latency. By maintaining multiple copies, the system can continue operating even if some replicas fail, thereby ensuring consistent availability. Placing replicas closer to users further decreases latency by minimizing the distance data must travel. A typical illustration of these advantages is a Content Delivery Network (CDN), where distributing content to edge servers can yield latencies of under 10 milliseconds when users and contents are in the same city.

In recent times, numerous edge datastores have emerged, allowing dynamic data to be served directly from network-edge replicas. Each of these replicated systems may employ different coordination protocols to synchronize replicas, leading to varied performance and consistency characteristics. For instance, Workers KV relies on a push-based coordination mechanism that provides eventual consistency, whereas Cloudflare Durable Objects and Turso deliver stronger consistency guarantees. Additionally, researchers have introduced various coordination protocols—such as SwiftPaxos, EPaxos, OPaxos, WPaxos, Raft, PANDO, and QuePaxa—each exhibiting its own performance profile, especially when being used in geo-distributed deployment.

This project aims to develop an open testbed for evaluating replicated systems and their coordination protocols under edge deployment. Currently, researchers face challenges in fairly comparing different replicated systems, as they often lack control over replica placement. Many previous studies on coordination protocols and replicated systems relied on mock implementations, particularly for well-known systems like Dynamo and Spanner, which are not open source. An open testbed would provide a standardized environment where researchers can compare various replicated systems, classes of coordination protocols, and specific protocol implementations using common benchmarks. Since the performance of replicated systems and coordination protocols varies depending on the application, workload, and replica placement, this testbed would offer a more systematic and fair evaluation framework. Furthermore, by enabling easier testing and validation, the testbed could accelerate the adoption of research prototypes in the industry.

Project Deliverables

Compilation of traces and applications from various open traces and open benchmarks.
Distributed workload generator to run the traces and applications.
Test framework to simulate the latency of hundreds of edge servers for measurement.
Open artifact of the traces, applications, workload generator, and test framework, published on GitHub.