<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Anjus George | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/index.xml" rel="self" type="application/rss+xml"/><description>Anjus George</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/avatar_hudd1308004fc63796fcf97667a85a63db_88921_270x270_fill_q75_lanczos_center.jpg</url><title>Anjus George</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/</link></image><item><title>OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/openmlec/202406012-jiajunmao/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/openmlec/202406012-jiajunmao/</guid><description>&lt;p>Hello, I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jiajun-mao/">Jiajun Mao&lt;/a>, a BS/MS student at the University of Chicago studying Computer Science. I will be spending this summer working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ornl/openmlec/">OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/meng-wang/">Meng Wang&lt;/a>
and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a>, my &lt;a href="https://docs.google.com/document/d/1nYgNlGdl0jUgW8avpu671oRpMoxaZHZPwlDfBNXRVro/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;p>How to increase data’s durability and reliability while decreasing storage cost have always been interesting topics of research. Erasure coded storage systems in recent years have been seen as strong candidates to replace replications for colder storage tiers. In the paper “Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers”, the authors explored using theory and simulation on how a multiple tiered erasure coded system can out-perform systems using single level erasure codes in areas such as encoding throughput and network bandwidth consumed for repair, addressing a few pain points in adopting erasure coded storage systems. I will be implementing the theoretical and simulation result of this paper by building on top of HDFS and ZFS, and benchmarking the system performance.&lt;/p>
&lt;p>The project will aim to achieve&lt;/p>
&lt;ul>
&lt;li>HDFS understanding the underlying characteristics of ZFS as the filesystem&lt;/li>
&lt;li>HDFS understanding the failure report from ZFS, and use new and special MLEC repair logic to execute parity repair&lt;/li>
&lt;li>ZFS will be able to accept repair data from HDFS to repair a suspended pool caused by catastrophic data corruption&lt;/li>
&lt;/ul></description></item><item><title>OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ornl/openmlec/</link><pubDate>Mon, 05 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ornl/openmlec/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage Systems, Erasure Coding&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Java, Bash scripting, Linux, HDFS, ZFS, Erasure Coding&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/meng-wang/">Meng Wang&lt;/a> (&lt;a href="mailto:wangm12@uchicago.edu">Main contact person&lt;/a>) and Anjus George&lt;/li>
&lt;/ul>
&lt;p>Multi-Level Erasure Coding (MLEC), which performs erasure coding at both network and local levels, has seen large deployments in practice. Our recent research work has shown that MLEC can provide high durability with higher encoding throughput and less repair network traffic compared to other erasure coding methods. This makes MLEC particularly appealing for large-scale data centers, especially high-performance computing (HPC) systems.&lt;/p>
&lt;p>However, current MLEC systems often rely on straightforward design choices, such as Clustered/Clustered (C/C) chunk placement and the Repair-All (RALL) method for catastrophic local failures. Our recent simulations [1] have revealed the potential benefits of more complex chunk placement strategies like Clustered/Declustered (C/D), Declustered/Clustered (D/C), and Declustered/Declustered (D/D). Additionally, advanced repair methods such as Repair Failed Chunks Only (RFCO), Repair Hybrid (RHYB), and Repair Minimum (RMIN) have shown promise for improving durability and performance according to our simulations. Despite promising simulation results, these optimized design choices have not been implemented in real systems.&lt;/p>
&lt;p>In this project, we propose to develop open-source MLEC implementations in real systems, offering a range of design choices from simple to complex. Our approach leverages ZFS for local-level erasure coding and HDFS for network-level erasure coding, supporting both clustered and declustered chunk placement at each level. The student&amp;rsquo;s responsibilities include setting up HDFS on top of ZFS, configuring various MLEC chunk placements (e.g., C/D, D/C, D/D), and implementing advanced repair methods within HDFS and ZFS. The project will culminate in reproducible experiments to evaluate the performance of MLEC systems under different design choices.&lt;/p>
&lt;p>We will open-source our code and aim to provide valuable insights to the community on optimizing erasure-coded systems. Additionally, we will provide comprehensive documentation of our work and share Trovi artifacts on Chameleon Cloud to facilitate easy reproducibility of our experiments.&lt;/p>
&lt;p>[1] Meng Wang, Jiajun Mao, Rajdeep Rana, John Bent, Serkay Olmez, Anjus George, Garrett Wilson Ransom, Jun Li, and Haryadi S. Gunawi. Design Considerations and Analysis of Multi-Level Erasure Coding in Large- Scale Data Centers. In The International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’23), 2023.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Open-source MLEC implementations with a diverse range of design choices.&lt;/li>
&lt;li>Configuration setup for HDFS on top of ZFS, supporting various MLEC chunk placements.&lt;/li>
&lt;li>Implementation of advanced repair methods within HDFS and ZFS.&lt;/li>
&lt;li>Reproducible experiments to assess the performance of MLEC systems across distinct design choices.&lt;/li>
&lt;li>Comprehensive documentation of the project and the provision of shared Trovi artifacts on Chameleon Cloud for ease of reproducibility.&lt;/li>
&lt;/ul></description></item><item><title>Reproducible Evaluation of Multi-level Erasure Coding</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/multilevelerasure/</link><pubDate>Thu, 02 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/multilevelerasure/</guid><description>&lt;p>Massive storage systems rely heavily on erasure coding (EC) to protect data from drive failures and provide data durability. Existing storage systems mostly adopt single-level erasure coding (SLEC) to protect data, either performing EC at the network level or performing EC at the local level. However, both SLEC approaches have limitations, as network-only SLEC introduces heavy network traffic overhead, and local-only SLEC cannot tolerate rack failures.&lt;/p>
&lt;p>Accordingly, some data centers are starting to use multi-level erasure coding (MLEC), which is a hybrid approach performing EC at both the network level and the local level. However, prior EC research and evaluations mostly focused on SLEC, and it remains to be answered how MLEC is compared to SLEC in terms of durability, capacity overhead, encoding throughput, network traffic, and other overheads.&lt;/p>
&lt;p>Therefore, in this project we seek to build a platform to evaluate the durability and overheads of MLEC. The platform will allow us to evaluate dozens of EC strategies in many dimensions including recovery strategies, chunk placement choices, various parity schemes, etc. To the best of our knowledge, there is no other evaluation platform like what we propose here. We seek to make the platform open-source and the evaluation reproducible, allowing future researchers to benefit from it and conduct more research on MLEC.&lt;/p>
&lt;h3 id="building-a-platform-to-evaluate-mlec">Building a platform to evaluate MLEC&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage systems, reproducibility, erasure coding, evaluation&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Linux, C, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zhiyan-alex-wang/">Zhiyan &amp;quot;Alex&amp;quot; Wang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build a platform to evaluate the durability and overheads of MLEC. The platform will be able to evaluate different EC strategies in various dimensions including repair strategies, chunk placement choices, parity schemes, etc. Analyze the evaluation results.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Reproduce the SLEC evaluation results from prior SLEC evaluation tools&lt;/li>
&lt;li>Based on prior SLEC evaluation tools, build a platform to evaluate the durability and overheads of MLEC under various EC strategies&lt;/li>
&lt;li>Collect and analyze the MLEC evaluation results&lt;/li>
&lt;/ul></description></item></channel></rss>