Meng Wang | UCSC OSPO

OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS

Wed, 12 Jun 2024 00:00:00 +0000

Hello, I’m Jiajun Mao, a BS/MS student at the University of Chicago studying Computer Science. This summer I will be working on the project OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS, under the mentorship of Meng Wang and Anjus George, as described in my proposal.

How to increase the durability and reliability of data while decreasing storage cost has long been an interesting research topic. In recent years, erasure-coded storage systems have been seen as strong candidates to replace replication for colder storage tiers. In the paper “Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers”, the authors used theory and simulation to explore how a multi-level erasure-coded system can outperform systems using single-level erasure codes in areas such as encoding throughput and network bandwidth consumed for repair, addressing a few pain points in adopting erasure-coded storage systems. I will turn the theoretical and simulation results of this paper into a real implementation built on top of HDFS and ZFS, and benchmark the system’s performance.

The project aims to achieve the following:

HDFS understands the underlying characteristics of ZFS as its local filesystem
HDFS understands failure reports from ZFS and uses new, MLEC-specific repair logic to execute parity repair
ZFS accepts repair data from HDFS to repair a pool that has been suspended due to catastrophic data corruption

GPEC: An Open Emulation Platform to Evaluate GPU/ML Workloads on Erasure Coding Storage

Thu, 08 Feb 2024 00:00:00 +0000

Project Idea Description

Topics: Storage Systems, Machine Learning, Erasure Coding
Skills: C/C++, Python, PyTorch, Bash scripting, Linux, Erasure Coding, Machine Learning
Difficulty: Hard
Size: Large (350 hours)
Mentors: Meng Wang (primary contact), John Bent

Large-scale data centers store immense amounts of user data across a multitude of disks, necessitating redundancy strategies like erasure coding (EC) to safeguard against disk failures. Numerous research efforts have sought to assess the performance and durability of various erasure coding approaches, including single-level erasure coding, locally recoverable coding, and multi-level erasure coding.
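As a minimal illustration of how erasure coding protects against a lost disk, the sketch below uses a single XOR parity chunk, the simplest possible code. It is not the coding scheme of any system discussed here (production systems typically use Reed-Solomon-style codes with several parity chunks), and the function names are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def encode(data_chunks):
    """Return the k data chunks followed by one XOR parity chunk (a k+1 stripe)."""
    return list(data_chunks) + [xor_chunks(data_chunks)]


def recover(stripe, lost_index):
    """Rebuild the chunk at lost_index by XOR-ing all surviving chunks."""
    survivors = [chunk for i, chunk in enumerate(stripe) if i != lost_index]
    return xor_chunks(survivors)


if __name__ == "__main__":
    stripe = encode([b"abcd", b"efgh", b"ijkl"])
    # Pretend the disk holding chunk 1 failed and rebuild it from the rest.
    print(recover(stripe, 1))
```

Note that rebuilding one chunk requires reading every surviving chunk in the stripe, which is why repair traffic is a central cost in erasure-coded systems.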

Despite the widespread adoption of erasure coding, a significant research gap exists regarding the performance of large-scale erasure-coded storage systems when exposed to machine learning (ML) workloads. While conventional practice often leans towards replication for enhanced performance, this project seeks to explore whether cost-effective erasure coding can deliver comparable performance. In this context, several fundamental questions remain unanswered, including: Can a typical erasure-coded storage system deliver sufficient throughput for ML training tasks? Can an erasure-coded storage system maintain low-latency performance for ML training and inference workloads? How do disk failures and subsequent repairs impact the throughput and latency of ML workloads? What influence do various erasure coding design choices, such as chunk placement strategies and repair methods, have on the aforementioned performance metrics?

To address these questions, the most straightforward approach would be to run ML workloads on large-scale erasure-coded storage systems within HPC data centers. However, this presents challenges for researchers and students due to limited access to expensive GPUs and distributed storage systems, especially when dealing with large-scale evaluations. Consequently, there is a need for a cost-effective evaluation platform.

The objective of this project is to develop an open-source platform that facilitates cheap and reproducible evaluations of erasure-coded storage systems concerning ML workloads. This platform consists of two key components:

GPU Emulator: This emulator is designed to simulate GPU performance for ML workloads. Development of the GPU emulator is near completion.
EC Emulator: This emulator is designed to simulate the performance characteristics of erasure-coded storage systems. It is still in the exploratory phase and requires further development.

The student’s responsibilities will include documenting the GPU emulator, progressing the development of the EC emulator, and packaging the experiments to ensure easy reproducibility. It is anticipated that this platform will empower researchers and students to conduct cost-effective and reproducible evaluations of large-scale erasure-coded storage systems in the context of ML workloads.

Project Deliverable

Build an EC emulator to emulate the performance characteristics of large-scale erasure-coded storage systems
Integrate the EC emulator with ML workloads and the GPU emulator
Conduct reproducible experiments to evaluate the performance of erasure-coded storage systems in the context of ML workloads
Publish a Trovi artifact shared on Chameleon Cloud and a GitHub repository with open-source code

OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS

Mon, 05 Feb 2024 00:00:00 +0000

Project Idea Description

Topics: Storage Systems, Erasure Coding
Skills: C/C++, Java, Bash scripting, Linux, HDFS, ZFS, Erasure Coding
Difficulty: Hard
Size: Large (350 hours)
Mentors: Meng Wang (Main contact person) and Anjus George

Multi-Level Erasure Coding (MLEC), which performs erasure coding at both network and local levels, has seen large deployments in practice. Our recent research work has shown that MLEC can provide high durability with higher encoding throughput and less repair network traffic compared to other erasure coding methods. This makes MLEC particularly appealing for large-scale data centers, especially high-performance computing (HPC) systems.
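To make the two-level structure concrete, here is a small sketch of the arithmetic behind an MLEC configuration. It assumes MDS-style codes at both levels, with (k_net + p_net) chunks per network stripe and (k_loc + p_loc) chunks per local stripe. The function names and the example parameters (10+2 over 17+3) are illustrative assumptions, not values taken from this project description.

```python
def slec_overhead(k, p):
    """Raw bytes stored per byte of user data for a single-level (k+p) code."""
    return (k + p) / k


def mlec_overhead(k_net, p_net, k_loc, p_loc):
    """Raw bytes stored per byte of user data when both levels are applied.

    Each network-level chunk is itself striped locally, so the two
    overheads multiply.
    """
    return slec_overhead(k_net, p_net) * slec_overhead(k_loc, p_loc)


def min_chunk_failures_for_loss(p_net, p_loc):
    """Fewest failed chunks within one network stripe that can cause data loss.

    A local stripe is lost only after p_loc + 1 of its chunks fail, and the
    network stripe is lost only after p_net + 1 local stripes are lost.
    """
    return (p_net + 1) * (p_loc + 1)


if __name__ == "__main__":
    # Hypothetical configuration: (10+2) at the network level, (17+3) locally.
    print(round(mlec_overhead(10, 2, 17, 3), 2))
    print(min_chunk_failures_for_loss(2, 3))
```

The count above is a best-case lower bound for failures that all land in the same network stripe; actual durability depends on chunk placement and repair speed, which is what the design choices discussed in this project affect.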

However, current MLEC systems often rely on straightforward design choices, such as Clustered/Clustered (C/C) chunk placement and the Repair-All (RALL) method for catastrophic local failures. Our recent simulations [1] have revealed the potential benefits of more complex chunk placement strategies like Clustered/Declustered (C/D), Declustered/Clustered (D/C), and Declustered/Declustered (D/D). Additionally, advanced repair methods such as Repair Failed Chunks Only (RFCO), Repair Hybrid (RHYB), and Repair Minimum (RMIN) have shown promise for improving durability and performance according to our simulations. Despite promising simulation results, these optimized design choices have not been implemented in real systems.
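The difference between clustered and declustered placement can be sketched for a single level as follows. This is a simplified illustration under stated assumptions: clustered placement keeps each stripe inside a fixed group of disks, while declustered placement spreads each stripe pseudorandomly over a larger pool. All function names are hypothetical and do not correspond to HDFS or ZFS APIs.

```python
import random


def clustered(stripe_id, num_disks, width):
    """Clustered: disks form fixed groups of `width`; a stripe stays in one group."""
    num_groups = num_disks // width
    start = (stripe_id % num_groups) * width
    return list(range(start, start + width))


def declustered(stripe_id, num_disks, width, seed=0):
    """Declustered: each stripe picks `width` distinct disks from the whole pool."""
    rng = random.Random(seed * 1_000_003 + stripe_id)
    return sorted(rng.sample(range(num_disks), width))


def repair_peers(failed_disk, placement, num_stripes, num_disks, width):
    """Disks sharing at least one stripe with failed_disk (possible repair sources)."""
    peers = set()
    for stripe_id in range(num_stripes):
        disks = placement(stripe_id, num_disks, width)
        if failed_disk in disks:
            peers.update(disks)
    peers.discard(failed_disk)
    return peers


if __name__ == "__main__":
    for name, placement in (("clustered", clustered), ("declustered", declustered)):
        peers = repair_peers(0, placement, num_stripes=1000, num_disks=20, width=5)
        print(name, len(peers))
```

With clustered placement, only the other disks in the failed disk's group can help rebuild it. With declustered placement, many more disks hold related chunks, so repair reads can be spread more widely, at the cost of more disks being involved in each failure.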

In this project, we propose to develop open-source MLEC implementations in real systems, offering a range of design choices from simple to complex. Our approach leverages ZFS for local-level erasure coding and HDFS for network-level erasure coding, supporting both clustered and declustered chunk placement at each level. The student’s responsibilities include setting up HDFS on top of ZFS, configuring various MLEC chunk placements (e.g., C/D, D/C, D/D), and implementing advanced repair methods within HDFS and ZFS. The project will culminate in reproducible experiments to evaluate the performance of MLEC systems under different design choices.

We will open-source our code and aim to provide valuable insights to the community on optimizing erasure-coded systems. Additionally, we will provide comprehensive documentation of our work and share Trovi artifacts on Chameleon Cloud to facilitate easy reproducibility of our experiments.

[1] Meng Wang, Jiajun Mao, Rajdeep Rana, John Bent, Serkay Olmez, Anjus George, Garrett Wilson Ransom, Jun Li, and Haryadi S. Gunawi. Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers. In The International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’23), 2023.

Project Deliverable

Open-source MLEC implementations with a diverse range of design choices.
Configuration setup for HDFS on top of ZFS, supporting various MLEC chunk placements.
Implementation of advanced repair methods within HDFS and ZFS.
Reproducible experiments to assess the performance of MLEC systems across distinct design choices.
Comprehensive documentation of the project and the provision of shared Trovi artifacts on Chameleon Cloud for ease of reproducibility.