<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Zhiyan "Alex" Wang | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zhiyan-alex-wang/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zhiyan-alex-wang/index.xml" rel="self" type="application/rss+xml"/><description>Zhiyan "Alex" Wang</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zhiyan-alex-wang/avatar_hu597e6505c129fd4b7d7c6a0de24e0a62_30174_270x270_fill_q75_lanczos_center.jpeg</url><title>Zhiyan "Alex" Wang</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zhiyan-alex-wang/</link></image><item><title>Reproducible Evaluation of Multi-level Erasure Coding (Midterm)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230801-zhiyanw/</link><pubDate>Sat, 05 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230801-zhiyanw/</guid><description>&lt;p>Hi Everyone,&lt;/p>
&lt;p>I hope everything is going well! This is my second blog post for my project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/MultiLevelErasure">Reproducible Evaluation of Multi-level Erasure Coding&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a>, and Meng Wang. In summary, my project aims to build a platform to reproducibly evaluate the performance and durability of MLEC (Multi-Level Erasure Coding) for large-scale storage systems under different design configurations. The details are in this &lt;a href="https://docs.google.com/document/d/1dO1aING1QcSB---XklzUjNz0usVh7qWffVGC3GZq2AE/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;p>Over the past few weeks, I&amp;rsquo;ve completed several tasks toward this goal, including:&lt;/p>
&lt;ul>
&lt;li>Literature Review&lt;/li>
&lt;li>Studying the Erasure Coding Simulator and Creating Reproducible Evaluations for the following policies:
&lt;ul>
&lt;li>Clustered/Declustered Local-level SLEC&lt;/li>
&lt;li>Clustered/Declustered Network-level SLEC&lt;/li>
&lt;li>MLEC with C/C, C/D, D/C, and D/D configurations&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="literature-review">Literature Review&lt;/h2>
&lt;p>Before developing the simulator, my first step was to review the literature on different erasure coding policies. To understand a simulator for a complex erasure coding policy such as MLEC, I wanted to start with simpler EC policies and then build up to more complex ones. I also planned to compare the durability of MLEC with comparable EC policies such as LRC in my evaluations, so it was important to understand how these policies are implemented.&lt;/p>
&lt;p>During the first week, I read several papers on chunk placement policies for erasure coding, including LRC (Local Reconstruction Codes), CL-LRC (Combined Locality for Local Reconstruction Codes), SODP (Single-Overlap Declustered Parity), and MLEC (Multi-Level Erasure Coding). These papers gave me a basic understanding of each policy, its advantages and drawbacks, and its practical use in production environments.&lt;/p>
&lt;h2 id="simulator-reproduction">Simulator Reproduction&lt;/h2>
&lt;p>After gaining some understanding from the papers, I started studying the MLEC simulator provided by the mentors. The simulator lacks documentation and guides, which makes it hard for others to reproduce evaluation results. It is also hard to understand: it simulates various EC schemes, chunk placements, and rebuild policies, and amounts to 13,000 lines of code. Therefore, my goal is to understand the design and implementation details of the simulator and then write guides for reproducible evaluations.&lt;/p>
&lt;p>I decided that the best way to fully understand the simulator was to rebuild it myself. The simulator is designed to mimic disk failures over the span of a year under varying chunk placement policies. Once rebuilt, it will let me assess the durability of MLEC relative to other widely used chunk placement policies. Following the given simulator, I rewrote it on my own in Python.&lt;/p>
&lt;p>Based on the skeleton of the given simulator, I first rebuilt a simple simulator for SLEC (single-level erasure coding, in both local and network settings) with clustered parities. Given its arguments, the simulator runs an arbitrary number of iterations, each simulating one year of disk failures, and counts the iterations in which data loss occurs. The ratio of failed iterations to total iterations estimates the probability of data loss in a year, and its complement is the durability of the erasure coding policy. This simulation allows us to evaluate the durability of SLEC, laying the foundation for the later evaluation of MLEC.&lt;/p>
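&lt;p>The iteration-counting estimate described above can be sketched as a small Monte Carlo loop. This is only a minimal illustration, not the code of the actual simulator: the function names, the parameters (num_disks, parity, afr, repair_hours), and the failure model are all assumptions made for the example. It models a single clustered stripe group, lets each disk fail at most once per year with probability afr at a uniformly random time, uses a fixed rebuild time, and treats durability as one minus the estimated loss probability.&lt;/p>

```python
import random

HOURS_PER_YEAR = 365 * 24


def simulate_one_year(num_disks, parity, afr, repair_hours, rng):
    """Return True if this simulated year ends in data loss (hypothetical model)."""
    # Assumption: each disk fails at most once, with probability afr,
    # at a uniformly random hour of the year.
    failure_times = sorted(
        rng.uniform(0, HOURS_PER_YEAR)
        for _ in range(num_disks)
        if afr > rng.random()
    )
    repairing = []  # completion times of rebuilds still in progress
    for t in failure_times:
        # Forget disks whose rebuild finished before this failure happened.
        repairing = [done for done in repairing if done > t]
        repairing.append(t + repair_hours)
        # More concurrent failures than parity chunks: the stripe is lost.
        if len(repairing) > parity:
            return True
    return False


def estimate_durability(iterations, seed=0, **params):
    """Return (loss probability, durability) estimated over many simulated years."""
    rng = random.Random(seed)
    failed = sum(simulate_one_year(rng=rng, **params) for _ in range(iterations))
    loss_probability = failed / iterations
    return loss_probability, 1.0 - loss_probability


loss, durability = estimate_durability(
    1000, num_disks=10, parity=2, afr=0.02, repair_hours=24
)
```

&lt;p>With a fixed seed the run is repeatable, which is the property that matters for reproducible evaluations. A plain loop like this needs a very large number of iterations before it observes any loss at realistic failure rates, so it is only suitable for illustrating the ratio itself.&lt;/p>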
&lt;p>Next, I extended my simulator beyond the local-level SLEC implementation by adding more policies. I began with a network-level SLEC policy with clustered parities. This differs slightly from local-level EC because the simulator must also account for factors such as network bandwidth.&lt;/p>
&lt;p>In addition, I looked deeper into simulating declustered parities and found a method to simulate disk failures and repairs. The simulator generates failures within a one-year timeframe and then repairs them using priority queues: the disks associated with the stripes that have the most failures are given the highest repair priority. With this construction, the simulator can simulate local-level declustered parities with configurable parameters.&lt;/p>
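&lt;p>The repair ordering described above can be sketched with the &lt;code>heapq&lt;/code> module from the Python standard library. This is a hedged illustration of the idea only: the function name, the stripe identifiers, and the dictionary of failure counts are hypothetical and are not taken from the actual simulator.&lt;/p>

```python
import heapq


def repair_order(stripe_failures):
    """Yield (stripe_id, failed_chunks) from the most damaged stripe to the least."""
    # heapq is a min-heap, so the failure count is negated to pop the
    # stripe with the most failed chunks first. Healthy stripes are skipped.
    heap = [
        (-failed, stripe_id)
        for stripe_id, failed in stripe_failures.items()
        if failed > 0
    ]
    heapq.heapify(heap)
    while heap:
        neg_failed, stripe_id = heapq.heappop(heap)
        yield stripe_id, -neg_failed


# Hypothetical failure counts per stripe after a burst of disk failures.
damaged = {"s0": 1, "s1": 3, "s2": 0, "s3": 2}
order = list(repair_order(damaged))
# order == [("s1", 3), ("s3", 2), ("s0", 1)]
```

&lt;p>Repairing the most damaged stripes first shrinks the window in which one more failure would make a stripe unrecoverable. Ties are broken here by stripe identifier, which is an arbitrary choice of this sketch.&lt;/p>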
&lt;p>Once local-level declustered parities were working, building the simulator for network-level declustered parities was fairly straightforward. I then validated my simulator against the simulator and math models provided by the mentors. The results agree perfectly, which confirms my understanding of the SLEC declustered placements. Implementing the simulator myself strengthened my understanding of erasure coding designs and simulation techniques, and gave me a solid foundation for reproducing the MLEC simulations.&lt;/p>
&lt;p>Building on what I learned from implementing the SLEC simulators, I then reverse-engineered the MLEC simulator that the mentors provided from their MLEC paper. I chose to start with the simplest policy, clustered parities at both levels. After spending considerable time digging into the simulator source code, I understood the simulation workflow, the different repair methods it implements, and the splitting method it uses to simulate high durability. I then revised my simulator accordingly and ran a few experiments with the same configurations as specified in the paper. The results agree well with those in the paper, which verifies my reproduction.&lt;/p>
&lt;h2 id="technical-issues">Technical Issues&lt;/h2>
&lt;p>While building the MLEC simulator, I&amp;rsquo;ve encountered many issues, both conceptual and technical. The mentors have been very helpful and responsive throughout, so I was able to make steady progress.&lt;/p>
&lt;h2 id="summary">Summary&lt;/h2>
&lt;p>Overall, I&amp;rsquo;ve rebuilt a Python simulator for various EC policies, and the simulator successfully reproduces the results from the paper.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>My next step is to package the simulator into a reproducible Trovi artifact, so that others can reproduce evaluations of the performance and durability of various EC policies, in particular MLEC.&lt;/p></description></item><item><title>Reproducible Evaluation of Multi-level Erasure Coding</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230531-zhiyanw/</link><pubDate>Wed, 31 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230531-zhiyanw/</guid><description>&lt;p>Hi! My name is Alex, and I am an undergraduate student at the University of Chicago. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/MultiLevelErasure">Reproducible Evaluation of Multi-level Erasure Coding&lt;/a> project, my &lt;a href="https://docs.google.com/document/d/1dO1aING1QcSB---XklzUjNz0usVh7qWffVGC3GZq2AE/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a>, aims to build a platform to reproducibly evaluate the performance and durability of MLEC (Multi-Level Erasure Coding) for large-scale storage systems under different design configurations.&lt;/p>
&lt;p>To provide some context, Erasure Coding (EC) is a common approach to protecting data from disk failures. Data centers increasingly use Multi-Level Erasure Coding (MLEC), a newly developed erasure coding method that aims to address the drawbacks of Single-Level Erasure Coding (SLEC). Despite its increasing popularity, few systematic studies have analyzed and evaluated MLEC; filling that gap is the focus of this project.&lt;/p>
&lt;p>The evaluation will primarily be conducted through simulations, since modifying configurations in a real large-scale system is costly and impractical. The expected deliverables of this project will be:&lt;/p>
&lt;ul>
&lt;li>An MLEC simulator that can reproducibly simulate different configurations of the MLEC system, e.g., coding parameter selection, chunk placement scheme, and repair method choice&lt;/li>
&lt;li>An analysis of the performance and durability tradeoffs between different MLEC design choices based on the evaluation results from the simulation&lt;/li>
&lt;li>Reproduced SLEC evaluation results using existing SLEC simulators&lt;/li>
&lt;li>A comparison between MLEC and SLEC on performance and durability tradeoffs&lt;/li>
&lt;li>Well-written documents and detailed guides on how to reproduce the evaluation results&lt;/li>
&lt;/ul>
&lt;p>Our plan is to build the simulator throughout the summer. We hope our simulator and evaluation results can provide designers of large-scale storage systems with valuable insights on choosing the most appropriate erasure coding configuration per their needs.&lt;/p></description></item></channel></rss>