<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Michael Chan | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-chan/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-chan/index.xml" rel="self" type="application/rss+xml"/><description>Michael Chan</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-chan/avatar_hu6dc3f6f2b83723f44d9114f0008d03dc_204096_270x270_fill_q75_lanczos_center.jpg</url><title>Michael Chan</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-chan/</link></image><item><title>[Final Blog] Distrobench: Distributed Protocol Benchmark</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/</link><pubDate>Sat, 30 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>This is the final blog for our contribution to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a> for the OSRE program.&lt;/p>
&lt;p>&lt;a href="https://github.com/fadhilkurnia/distro" target="_blank" rel="noopener">Distrobench&lt;/a> is a framework for evaluating the performance of replication and coordination protocols for distributed systems. It standardizes benchmarking by running different protocols under an identical workload, and it supports both local and remote deployment. The protocols tested are limited to key-value store applications and are categorized by &lt;a href="https://jepsen.io/consistency/models" target="_blank" rel="noopener">consistency model&lt;/a>, programming language, and persistency (whether the implementation stores its data in memory or on disk).&lt;/p>
&lt;p>All benchmark results are stored in a &lt;code>data.json&lt;/code> file, which can be viewed through a web page included in the repository. A user can clone the git repository, benchmark different protocols on their own machine or on a cluster of remote machines, then view the results locally. We also provide a &lt;a href="https://distrobench.org" target="_blank" rel="noopener">webpage&lt;/a> showing our own benchmark results, which were collected on 3 Amazon EC2 t2.micro instances.&lt;br>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_2eb41220c4287bdc730b38c76a5643f8.webp 400w,
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_789a9a55850eed73f3a681f8423873cf.webp 760w,
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_2eb41220c4287bdc730b38c76a5643f8.webp"
width="760"
height="381"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="how-to-run-a-benchmark-on-distrobench">How to run a benchmark on Distrobench&lt;/h2>
&lt;p>Before running a benchmark with Distrobench, the protocol to be benchmarked must first be built. This allows the script to start the protocol instances for a local benchmark or to send the binaries to the remote machines. A remote machine running the protocol does not need the source code of the protocol implementation, but it does need the dependencies required to run that protocol, such as Java, Docker, or rsync. The following commands build the &lt;a href="https://github.com/ailidani/paxi" target="_blank" rel="noopener">ailidani/paxi&lt;/a> project, which needs no additional dependencies to run on a remote machine:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sh" data-lang="sh">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Clone the Distrobench repository &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git clone git@github.com:fadhilkurnia/distro.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Clone the Paxi repository and build the binary &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> distro/sut/ailidani.paxi
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git clone git@github.com:ailidani/paxi.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> paxi/bin/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">./build.sh
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Go back to the Distrobench root directory &amp;amp; run python script &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ../../../..
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">python main.py
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>By default, the script starts 3 local instances of the Paxi protocol implementation that the user chooses through the CLI. The number of running instances, and whether they are deployed locally or on remote machines, can be changed by editing the &lt;code>.env&lt;/code> file in the root directory. The default &lt;code>.env&lt;/code> file contains the following:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">NUM_OF_NODES=3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">SSH_KEY=ssh-key.pem
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">REMOTE_USERNAME=ubuntu
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP1=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP2=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP3=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP1=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP2=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP3=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">CLIENT_IP=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">OUTPUT=data.json
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When running a remote benchmark, an SSH key must also be placed in the root directory so that the Python script can use ssh and rsync. All machines must also allow TCP connections on ports 2000-2300 and 3000-3300, because these ranges are used for communication between the running instances and for the YCSB benchmark. A benchmark requires at least 3 nodes, the minimum that most protocols support (5 nodes are recommended).&lt;/p>
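&lt;p>As a rough illustration of how the settings in the &lt;code>.env&lt;/code> file fit together, the following Python sketch parses text in that format into a list of node records. The function names &lt;code>parse_env&lt;/code> and &lt;code>build_nodes&lt;/code>, the record layout, and the &lt;code>is_local&lt;/code> flag are assumptions made for this example and are not taken from the Distrobench source.&lt;/p>

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def build_nodes(config: dict) -> list:
    """Pair PUBLIC_IPn with PRIVATE_IPn for n = 1..NUM_OF_NODES."""
    count = int(config["NUM_OF_NODES"])
    nodes = []
    for i in range(1, count + 1):
        public_ip = config[f"PUBLIC_IP{i}"]
        nodes.append({
            "id": i,
            "public_ip": public_ip,
            "private_ip": config[f"PRIVATE_IP{i}"],
            # Assumed convention: the loopback address means a local deployment.
            "is_local": public_ip == "127.0.0.1",
        })
    return nodes


# The default .env contents shown above.
EXAMPLE_ENV = """
NUM_OF_NODES=3

SSH_KEY=ssh-key.pem
REMOTE_USERNAME=ubuntu

PUBLIC_IP1=127.0.0.1
PUBLIC_IP2=127.0.0.1
PUBLIC_IP3=127.0.0.1

PRIVATE_IP1=127.0.0.1
PRIVATE_IP2=127.0.0.1
PRIVATE_IP3=127.0.0.1

CLIENT_IP=127.0.0.1

OUTPUT=data.json
"""

if __name__ == "__main__":
    for node in build_nodes(parse_env(EXAMPLE_ENV)):
        print(node)
```

&lt;p>Keeping one public and one private address per node lets a script use the public address for ssh and rsync while the instances talk to each other over the private addresses.&lt;/p>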
&lt;p>To view the benchmark result in the web page locally, move &lt;code>data.json&lt;/code> into the &lt;code>docs/&lt;/code> directory and run &lt;code>python -m http.server 8000&lt;/code>. The page is then accessible through &lt;code>http://localhost:8000&lt;/code>.&lt;/p>
&lt;h2 id="deep-dive-on-how-distrobench-works">Deep dive on how Distrobench works&lt;/h2>
&lt;p>The following is the project structure of the Distrobench repository:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">distro/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── main.py // Main python script for running benchmark
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── data.json // Output file for main.py
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── README.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── .env // Config for running the benchmark
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── docs/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── index.html // Web page to show benchmark results
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── data.json // Output file displayed by web page
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── README.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── src/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── utils/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ └── ycsb/ // Submodule for YCSB
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">└── sut/ // Systems under test
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── ailidani.paxi/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> │ └── run.py // Protocol-specific benchmark script called by main.py
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── apache.zookeeper/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── etcd-io.etcd/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── fadhilkurnia.xdn/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── holipaxos-artifect.holipaxos/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── otoolep.hraftd/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> └── tikv.tikv/
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>main.py&lt;/code> will automatically detect directories inside &lt;code>sut/&lt;/code> and will call the main function inside &lt;code>run.py&lt;/code>. The following is the structure of &lt;code>run.py&lt;/code> written in pseudocode style:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">FUNCTION main(run_ycsb: Function, nodes: List of Nodes, ssh: Dictionary)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> node_data = map_ip_port(nodes)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> SWITCH user_input
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 0:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> start()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 1:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> stop()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 2:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> client_data = []
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> FOR EACH item IN node_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ADD item.client_addr TO client_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> END FOR
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> run_ycsb(client_data)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> END SWITCH
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION start()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Start the protocol instance (local or remote)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION stop()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Stop the protocol instance (local or remote)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION map_ip_port(nodes: List of Nodes) -&amp;gt; List of Dictionary
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Generate port numbers based on the protocol requirements
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>.env&lt;/code> file provides both public and private IP addresses for flexibility when running a remote benchmark. Private IPs are used for communication between remote machines that are in the same network group. In our own benchmark, four t2.micro EC2 instances are deployed in the same network group: three run the protocol and the fourth acts as the YCSB client. It is also possible to use your local machine as the YCSB client instead of another remote machine by setting &lt;code>CLIENT_IP&lt;/code> to &lt;code>127.0.0.1&lt;/code> in the &lt;code>.env&lt;/code> file. We chose a remote machine as the YCSB client to minimize the network latency between the client and the protocol servers.&lt;/p>
&lt;p>The main tasks of the &lt;code>start()&lt;/code> function can be broken down into the following:&lt;/p>
&lt;ol>
&lt;li>Generate custom configuration files for each instance (this differs between implementations: some do not require a config file because they accept command-line flags out of the box, while others require multiple configuration files per instance)&lt;/li>
&lt;li>rsync the binaries to the remote machines (if running a remote benchmark)&lt;/li>
&lt;li>Start the instances&lt;/li>
&lt;/ol>
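&lt;p>The three steps above can be sketched as follows. This is a minimal sketch that only builds the configuration text and the shell commands as argument lists instead of executing them. The helper names, the JSON config layout, the base port, and the launch command template are all hypothetical; the real scripts differ for each protocol.&lt;/p>

```python
import json
import shlex


def render_config(hosts, base_port=2000) -> str:
    """Step 1: render a JSON config with one peer address per node (layout is hypothetical)."""
    peers = {str(i): f"{host}:{base_port + i}" for i, host in enumerate(hosts, start=1)}
    return json.dumps({"peers": peers}, indent=2)


def rsync_command(ssh_key, user, host, local_dir, remote_dir) -> list:
    """Step 2: build an rsync invocation that copies built binaries over SSH."""
    return [
        "rsync", "-az",
        "-e", f"ssh -i {shlex.quote(ssh_key)}",
        local_dir,
        f"{user}@{host}:{remote_dir}",
    ]


def start_command(ssh_key, user, host, launch_cmd) -> list:
    """Step 3: run the launch command directly for loopback, otherwise through ssh."""
    if host == "127.0.0.1":
        return shlex.split(launch_cmd)
    return ["ssh", "-i", ssh_key, f"{user}@{host}", launch_cmd]


def plan_start(hosts, ssh_key, user, local_dir, remote_dir, launch_template) -> list:
    """Return the commands in order: copy binaries to remote hosts, then start every node."""
    commands = []
    for host in hosts:
        if host != "127.0.0.1":
            commands.append(rsync_command(ssh_key, user, host, local_dir, remote_dir))
    for node_id, host in enumerate(hosts, start=1):
        launch_cmd = launch_template.format(node_id=node_id)
        commands.append(start_command(ssh_key, user, host, launch_cmd))
    return commands


if __name__ == "__main__":
    # Hypothetical inputs: one remote host and one local instance.
    example_hosts = ["10.0.0.1", "127.0.0.1"]
    print(render_config(example_hosts))
    for command in plan_start(example_hosts, "ssh-key.pem", "ubuntu",
                              "bin/", "~/sut/", "./server --id {node_id}"):
        print(command)
```

&lt;p>Building the commands as lists before running them makes it easy to log or dry-run a deployment; each list could then be handed to &lt;code>subprocess.run&lt;/code>.&lt;/p>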
&lt;p>The &lt;code>stop()&lt;/code> function is much simpler: it only kills the processes running the protocol and optionally removes the copied binaries from the remote machines. The &lt;code>run_ycsb()&lt;/code> function passed to &lt;code>run.py&lt;/code> is defined in &lt;code>main.py&lt;/code> and currently supports two types of workload:&lt;/p>
&lt;ol>
&lt;li>Read-heavy: A single-client workload with 95% read and 5% update (write) operations&lt;/li>
&lt;li>Update-heavy: A single-client workload with 50% read and 50% update (write) operations&lt;/li>
&lt;/ol>
&lt;p>New workloads can be added to the &lt;code>src/ycsb/workloads&lt;/code> directory. Both workloads above run only 1000 operations, which may not be enough to properly evaluate the performance of the protocols. Note also that while YCSB supports a &lt;code>scan&lt;/code> operation, our benchmark never uses it because none of the tested protocols implement it.&lt;/p>
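&lt;p>For reference, a YCSB workload is a plain properties file. The sketch below renders the read-heavy and update-heavy mixes described above using the standard YCSB CoreWorkload property names. The helper name &lt;code>render_workload&lt;/code> and the &lt;code>recordcount&lt;/code> value are assumptions, since the post does not state them, and the actual workload files in the repository may set further properties.&lt;/p>

```python
def render_workload(read: float, update: float,
                    operations: int = 1000, records: int = 1000) -> str:
    """Render YCSB CoreWorkload properties for a read/update mix."""
    if abs(read + update - 1.0) > 1e-9:
        raise ValueError("read and update proportions must sum to 1")
    props = {
        "workload": "site.ycsb.workloads.CoreWorkload",
        "recordcount": records,        # assumed value, not stated in the post
        "operationcount": operations,  # the post states 1000 operations
        "readproportion": read,
        "updateproportion": update,
        "scanproportion": 0,           # scan is not used in the benchmark
        "insertproportion": 0,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


READ_HEAVY = render_workload(0.95, 0.05)
UPDATE_HEAVY = render_workload(0.5, 0.5)

if __name__ == "__main__":
    print(READ_HEAVY)
    print(UPDATE_HEAVY)
```

&lt;p>Raising &lt;code>operations&lt;/code> is the simplest way to address the concern that 1000 operations may be too few for a stable measurement.&lt;/p>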
&lt;h3 id="how-to-implement-a-new-protocol-in-distrobench">How to implement a new protocol in Distrobench&lt;/h3>
&lt;p>Adding a new protocol to Distrobench requires implementing two main components: a Python integration script (&lt;code>run.py&lt;/code>) and a YCSB database binding for benchmarking.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Create the protocol directory structure&lt;/p>
&lt;ul>
&lt;li>Create a new directory under &lt;code>sut/&lt;/code> using the format &lt;code>yourrepo.yourprotocol/&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Write &lt;code>run.py&lt;/code> integration&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Put the script inside the &lt;code>yourrepo.yourprotocol/&lt;/code> directory&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Must have the &lt;code>main(run_ycsb, nodes, ssh)&lt;/code> function.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add start/stop/benchmark menu options&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Handle local (127.0.0.1) and remote deployment&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Create YCSB client&lt;/p>
&lt;ul>
&lt;li>Write a Java class extending YCSB&amp;rsquo;s &lt;code>DB&lt;/code> class&lt;/li>
&lt;li>Put it inside &lt;code>src/ycsb/yourprotocol/src/main/java/site/ycsb/yourprotocol&lt;/code>&lt;/li>
&lt;li>Implement &lt;code>read()&lt;/code>, &lt;code>insert()&lt;/code>, &lt;code>update()&lt;/code>, &lt;code>delete()&lt;/code> methods&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Register your client&lt;/p>
&lt;ul>
&lt;li>Register your client to &lt;code>src/pom.xml&lt;/code>, &lt;code>src/ycsb/bin/binding.properties&lt;/code>, and &lt;code>src/ycsb/bin/ycsb&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Build and test&lt;/p>
&lt;ul>
&lt;li>Run &lt;code>cd src/ycsb &amp;amp;&amp;amp; mvn clean package&lt;/code>&lt;/li>
&lt;li>Run &lt;code>python main.py&lt;/code>&lt;/li>
&lt;li>Select your protocol and test it&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
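&lt;p>Putting step 2 together, a &lt;code>run.py&lt;/code> for a new protocol might look like the sketch below. Only the &lt;code>main(run_ycsb, nodes, ssh)&lt;/code> signature and the start, stop, and benchmark options come from the description above. The shape of &lt;code>nodes&lt;/code> (a list of IP strings), the extra &lt;code>choice&lt;/code> parameter (which stands in for interactive input so the sketch can be exercised without a terminal), the port numbers, and the placeholder bodies are assumptions.&lt;/p>

```python
def map_ip_port(nodes):
    """Assign each node a peer port and a client port (port choices are hypothetical)."""
    mapped = []
    for index, ip in enumerate(nodes):
        mapped.append({
            "ip": ip,
            "peer_addr": f"{ip}:{2000 + index}",
            "client_addr": f"{ip}:{3000 + index}",
        })
    return mapped


def start(node_data, ssh):
    """Placeholder: generate configs, copy binaries, and launch the instances."""
    print(f"starting {len(node_data)} instance(s)")


def stop(node_data, ssh):
    """Placeholder: kill the protocol processes and clean up copied binaries."""
    print(f"stopping {len(node_data)} instance(s)")


def main(run_ycsb, nodes, ssh, choice=None):
    """Entry point called by main.py; dispatches on the selected menu option."""
    node_data = map_ip_port(nodes)
    if choice is None:
        choice = int(input("0) start  1) stop  2) benchmark: "))
    if choice == 0:
        start(node_data, ssh)
    elif choice == 1:
        stop(node_data, ssh)
    elif choice == 2:
        # Hand the client-facing addresses to the YCSB runner supplied by main.py.
        run_ycsb([item["client_addr"] for item in node_data])
    else:
        raise ValueError(f"unknown option: {choice}")


if __name__ == "__main__":
    # Stand-in for the real run_ycsb: just print the addresses it would target.
    main(print, ["127.0.0.1", "127.0.0.1", "127.0.0.1"], {}, choice=2)
```

&lt;p>Because &lt;code>run_ycsb&lt;/code> is passed in as a function, the protocol script never needs to know how the workload is executed; it only reports where clients should connect.&lt;/p>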
&lt;h2 id="protocols-which-have-been-tested">Protocols which have been tested&lt;/h2>
&lt;p>Distrobench has tested 20 different distributed consensus protocols across 7 different implementation projects.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;a href="https://github.com/ailidani/paxi" target="_blank" rel="noopener">ailidani/paxi&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability, Eventual&lt;/li>
&lt;li>Protocol : Paxos, EPaxos, SDpaxos, WPaxos, ABD, chain, VPaxos, WanKeeper, KPaxos, Paxos_groups, Dynamo, Blockchain, M2Paxos, HPaxos.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/apache/zookeeper" target="_blank" rel="noopener">apache/zookeeper&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Java&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability + Primary Integrity&lt;/li>
&lt;li>Protocol : Zookeeper implements ZAB (Zookeeper Atomic Broadcast)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/etcd-io/etcd" target="_blank" rel="noopener">etcd-io/etcd&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/fadhilkurnia/xdn" target="_blank" rel="noopener">fadhilkurnia/xdn&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Java, Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability, Linearizability + Primary Integrity&lt;/li>
&lt;li>Protocol : Gigapaxos&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/Zhiying12/holipaxos-artifect" target="_blank" rel="noopener">Zhiying12/holipaxos-artifect&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go, Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Holipaxos, Omnipaxos, Multipaxos&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/otoolep/hraftd" target="_blank" rel="noopener">otoolep/hraftd&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/tikv/tikv" target="_blank" rel="noopener">tikv/tikv&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;ul>
&lt;li>When attempting to benchmark HoliPaxos, the main challenge was handling versions that rely on persistent storage with RocksDB. Since some implementations are written in Go, it was necessary to find compatible versions of RocksDB and gRocksDB (for example, RocksDB 10.5.1 works with gRocksDB 1.10.2). Another difficulty was that RocksDB is resource-intensive to compile, and in our project we did not have sufficient CPU capacity on the remote machine to build RocksDB and run remote benchmarks.&lt;/li>
&lt;li>Some projects did not compile successfully at first and required minor modifications to run.&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion-and-future-improvements">Conclusion and future improvements&lt;/h2>
&lt;p>The current benchmark results show the performance of all the protocols mentioned above in terms of throughput and benchmark runtime. The results are subject to revision because they may not reflect the best achievable performance of the protocols, given that the deployment scripts are not yet optimized. We are also planning to switch to a more powerful EC2 instance type because t2.micro does not have enough resources to support RocksDB or TiKV.&lt;/p>
&lt;p>In the near future, additional features will be added to Distrobench such as:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Multi-Client Support:&lt;/strong> The YCSB client will start multiple clients which will send requests in parallel to different servers in the group.&lt;/li>
&lt;li>&lt;strong>Commit Versioning:&lt;/strong> Allows the labelling of all benchmark results with the commit hash of the protocol&amp;rsquo;s repository version. This allows comparing different versions of the same project.&lt;/li>
&lt;li>&lt;strong>Adding more Primary-Backup, Sequential, Causal, and Eventual consistency protocols:&lt;/strong> Implementations that support a consistency model other than linearizability and that also provide an existing key-value store application are notoriously difficult to find.&lt;/li>
&lt;li>&lt;strong>Benchmark on node failure&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Benchmark on the addition of a new node&lt;/strong>&lt;/li>
&lt;/ul></description></item><item><title>Mid-term Blog: Building a Simulator for Benchmarking Replicated Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-mchan/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-mchan/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello there, I&amp;rsquo;m Michael. In this report, I&amp;rsquo;ll be sharing my progress as part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a>.&lt;/p>
&lt;h2 id="about-the-project">About the Project&lt;/h2>
&lt;p>The goal of the project is to build a &lt;em>language-agnostic&lt;/em> interface that enables communication between clients and any consensus protocol, such as MultiPaxos, Raft, or Zookeeper Atomic Broadcast (ZAB). Currently, many of these protocols implement their own custom mechanisms for clients to communicate with the group of peers in the network. For example, an implementation of MultiPaxos from the &lt;a href="https://arxiv.org/abs/2405.11183" target="_blank" rel="noopener">MultiPaxos Made Complete&lt;/a> paper uses a custom Protobuf definition for the packets that clients send to the MultiPaxos system. With a generalized interface, different consensus protocols can be tested under the same workload and their performance compared objectively.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Literature Study:&lt;/strong>
Reviewed papers and implementations of various protocols including GigaPaxos, Raft, Viewstamped Replication (VSR), and ZAB. Analysis focused on their log replication strategies, fault handling, and performance implications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Development of Custom Protocol:&lt;/strong>
Two custom protocols are currently under development and will serve as initial test subjects for the testbed:&lt;/p>
&lt;ul>
&lt;li>A modified GigaPaxos protocol&lt;/li>
&lt;li>A Primary-Backup Replication protocol with strict log ordering similar to ZAB (logs are ordered based on the sequence proposed by the primary)&lt;/li>
&lt;/ul>
&lt;p>Most of my time has been spent working on the two protocols, particularly on snapshotting and state transfer functionality in the Primary-Backup protocol. Ideally, the testbed should be able to evaluate protocol performance in scenarios involving node failure or a new node being added. In these scenarios, different protocol implementations often vary in their decision of whether to take periodic snapshots or to roll forward whenever possible and generate a snapshot only when necessary.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>Early in the project, the initial goal was to benchmark different consensus protocols using arbitrary full-stack web applications as their workload. Different protocols would replicate a full-stack application running inside Docker containers across multiple nodes, and the testbed would send requests for them to coordinate between those nodes. In fact, the two custom protocols being worked on are specifically made to fit these constraints.&lt;/p>
&lt;p>Developing a custom protocol that supports the replication of a Docker container is already a difficult task in itself. Abstracting away the functionality for communicating with the Docker containers, handling log entries, and snapshotting the state is an order of magnitude more complicated.&lt;/p>
&lt;p>As mentioned in the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250613-mchan/">first blog&lt;/a>, applications fall into two categories: deterministic and non-deterministic. Coordination of these two types is handled in very different ways. Most consensus protocols support only deterministic systems, such as key-value stores, and can&amp;rsquo;t easily handle coordination of complex services or external side effects. Supporting non-deterministic applications would require abstracting over protocol-specific log structures. This effectively restricts the interface to protocols that conform to the abstraction, defeating the goal of making the interface broadly usable and protocol-agnostic.&lt;/p>
&lt;p>Furthermore, allowing &lt;strong>any&lt;/strong> existing protocol to run something as complex as a stateful Docker container, without the protocol itself being aware of it, adds another layer of complexity to the system.&lt;/p>
&lt;h2 id="future-goals">Future Goals&lt;/h2>
&lt;p>Given these challenges, I decided to pivot to using only key-value stores as the application being used in the benchmark. This aligns with the implementations of most of the existing protocols which typically use key-value stores. In doing so, now the main focus would be to implement an interface that supports HTTP requests from clients to any arbitrary protocols.&lt;/p></description></item><item><title>Building a Simulator for Benchmarking Replicated Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250613-mchan/</link><pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250613-mchan/</guid><description>&lt;p>Hi, I&amp;rsquo;m Michael. I&amp;rsquo;m currently contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a>. You can find more details on the project proposal &lt;a href="https://drive.google.com/file/d/1LQCPu1h9vXAbdL6AX_E9S43dsIOndTyW/view?usp=sharing" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Our aim is to create a system for testing and evaluating the performance of different consensus protocols and consistency models under the same application and workload. Both are tested on various replicated black-box applications. Essentially, the testbed can deploy any arbitrary stateful application on multiple machines (nodes) as long as it is packaged as a Docker image. The consensus protocol is used to synchronize the stateful part of the application (in most cases, the database) across nodes. The goal is that, by the end of this project, the testbed will provide the functionality and abstractions needed to create new consensus protocols and run tests on them.&lt;/p>
&lt;p>One major challenge in implementing this is handling replication for the running Docker containers. Generally, the services that can be deployed in this system are of two types:&lt;/p>
&lt;ol>
&lt;li>A Deterministic Application (an application that always returns the same output when given the same input, e.g., a simple CRUD app)&lt;/li>
&lt;li>A Non-Deterministic Application (an application that may return different outputs when given the same input, e.g., an LLM, which may return different responses to the same prompt)&lt;/li>
&lt;/ol>
&lt;p>These two application types require different implementations of consensus protocols. For a deterministic application, since every request always yields the same response (and the same changes inside the application&amp;rsquo;s database), the replication protocol can replicate the request itself to all nodes. For a non-deterministic application, on the other hand, the replication protocol synchronizes the state of the database directly, since the same request may produce a different response.&lt;/p></description></item></channel></rss>