<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>performance | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/performance/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/performance/index.xml" rel="self" type="application/rss+xml"/><description>performance</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Tue, 21 Feb 2023 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>performance</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/performance/</link></image><item><title>eBPF Monitoring Tools</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lanl/ebpftools/</link><pubDate>Tue, 21 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lanl/ebpftools/</guid><description>&lt;p>&lt;a href="https://ebpf.io" target="_blank" rel="noopener">eBPF&lt;/a> is a technology that allows sandboxed programs to run in a privileged context such as the Linux kernel. eBPF is for operating systems what JavaScript is for web browsers: new functionality can be loaded safely and executed efficiently without restarting or continually upgrading the operating system or browser. eBPF is used to introduce new functionality into a running Linux kernel, including next-generation networking, observability, and security functionality. The following is just one of many possible project ideas.&lt;/p>
&lt;h3 id="implement-darshan-functionality-as-ebpf-tool">Implement Darshan functionality as eBPF tool&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> performance, I/O, workload characterization&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:treddy@lanl.gov">Tyler Reddy&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://www.mcs.anl.gov/research/projects/darshan/" target="_blank" rel="noopener">Darshan&lt;/a> is an HPC I/O characterization tool that collects statistics using a lightweight design that makes it suitable for full-time deployment. Darshan is an interposer library that intercepts and counts I/O requests (open, write, read, etc.) to a file or file system and keeps the counters in buckets in a data structure that can be queried. For example, it counts how many reads were of small, medium, or large size.&lt;/p>
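&lt;p>As a rough illustration of the bucketed counters described above, the following minimal Python sketch tallies read sizes into small, medium, and large bins. The bucket boundaries (4 KiB and 1 MiB) and the names &lt;code>SIZE_BUCKETS&lt;/code>, &lt;code>bucket_for&lt;/code>, and &lt;code>record_read&lt;/code> are assumptions made for this example; they are not Darshan's actual bins or API.&lt;/p>

```python
from collections import Counter

# Hypothetical bucket upper bounds in bytes; Darshan's real bins differ.
SIZE_BUCKETS = [("small", 4 * 1024), ("medium", 1024 * 1024)]


def bucket_for(size):
    """Return the name of the first bucket whose upper bound exceeds size."""
    for name, upper_bound in SIZE_BUCKETS:
        if upper_bound > size:
            return name
    return "large"


def record_read(counters, size):
    """Count one read request of the given size in its bucket."""
    counters[bucket_for(size)] += 1


# Tally a few example read sizes (in bytes).
counters = Counter()
for size in (512, 8192, 2 * 1024 * 1024, 100):
    record_read(counters, size)
```

&lt;p>An eBPF version would presumably keep similar per-bucket counters in a kernel-side map and update them on each intercepted I/O call, but that design is not specified here.&lt;/p>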
&lt;p>Because Darshan is an interposer library, users must link their applications against it. Implementing the same functionality in eBPF would make it transparent to users. Darshan already defines the functions to instrument and could provide the list of functions to implement; the programmer could then build and test them in eBPF on a Linux machine. The result could be a broadly available open tool that is generally useful, and just one of perhaps hundreds of eBPF-based tools that could be shared in the open community for all to leverage.&lt;/p></description></item><item><title>GPU Emulator for Easy Reproducibility of DNN Training</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/utexas/gpuemulator/</link><pubDate>Sun, 05 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/utexas/gpuemulator/</guid><description>&lt;p>Deep Neural Networks (DNNs) have achieved success in many machine learning (ML) tasks, including image recognition, video classification, and natural language processing. Nonetheless, training DNN models is highly computation-intensive and usually requires running complex computations on GPUs, which are expensive and scarce. As a result, much research on DNN training is delayed by a lack of access to GPUs. However, many research prototypes don&amp;rsquo;t require GPUs themselves, only the performance profiles of GPUs. For example, research on DNN training storage systems doesn&amp;rsquo;t need to run real computations on GPUs; it only needs to know how much time each GPU computation will take. Meanwhile, GPU performance in DNN training is predictable and reproducible, as every training batch performs a deterministic sequence of mathematical operations on a fixed amount of data.&lt;/p>
&lt;p>Therefore, in this project we seek to build a GPU emulator platform on PyTorch that makes it easy to reproduce DNN training without real GPUs. We will measure the performance profiles of GPU computations for different models, GPU types, and batch sizes. Based on the measured profiles, we will build a platform that emulates GPU behavior and reproduces DNN training using CPUs only. We will make the platform and the measurements open-source, allowing other researchers to reproduce the performance measurements and easily conduct research on DNN training systems. We will also encourage the community to enrich the database by adding GPU performance measurements for their own models and GPU types. We will be the first to build and release this kind of GPU emulator for DNN training, and we believe researchers and the community will benefit from it, especially as the community adds more GPU performance profiles.&lt;/p>
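&lt;p>One way to picture the emulation idea is to look up a previously measured per-batch compute time and wait that long instead of running the computation. The sketch below is a minimal Python illustration under that assumption. The profile table, its keys, and its timings are made-up placeholders, and &lt;code>emulate_training&lt;/code> is a hypothetical helper, not part of PyTorch or of the project's code.&lt;/p>

```python
import time

# Hypothetical profile: (model, gpu, batch_size) maps to seconds per batch.
# The numbers are placeholders, not measurements.
GPU_PROFILE = {
    ("resnet50", "gpu-a", 32): 0.002,
    ("resnet50", "gpu-b", 32): 0.001,
}


def emulate_training(model, gpu, batch_size, num_batches, sleep=time.sleep):
    """Stand in for GPU compute by waiting the profiled time per batch.

    Returns the total emulated compute time in seconds.
    """
    per_batch = GPU_PROFILE[(model, gpu, batch_size)]
    total = 0.0
    for _ in range(num_batches):
        # A real pipeline would load and preprocess a batch on the CPU here.
        sleep(per_batch)
        total += per_batch
    return total


# Emulate five batches on the placeholder "gpu-a" profile.
total = emulate_training("resnet50", "gpu-a", 32, num_batches=5)
```

&lt;p>Passing a different &lt;code>sleep&lt;/code> callable (for example, one that only records the requested delays) lets the surrounding pipeline be exercised without waiting in real time.&lt;/p>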
&lt;h3 id="building-a-platform-to-emulate-gpu-performance-in-dnn-training">Building a platform to emulate GPU performance in DNN training&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> DNN training, reproducibility, GPU emulator, performance measurement&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Linux, Python, PyTorch, deep learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vijay-chidambaram/">Vijay Chidambaram&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yeonju-ro/">Yeonju Ro&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haoran-wu/">Haoran Wu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The student will measure GPU performance profiles for different models and GPU types and, based on these measurements, build a platform that emulates GPU behavior and makes DNN training easy to reproduce. The GPU performance measurements should be open-source and reproducible so that other researchers can reproduce the results and add GPU profiles for their own needs.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Study and become familiar with the PyTorch DNN training pipelines.&lt;/li>
&lt;li>Measure GPU performance profiles for different DNN models and GPU types.&lt;/li>
&lt;li>Based on the GPU performance measurements, build a platform to emulate GPU behavior and reproduce DNN training without using real GPUs.&lt;/li>
&lt;li>Organize and document the code to make it reproducible for the community.&lt;/li>
&lt;/ul></description></item><item><title>FlashNet: Towards Reproducible Data Science for Storage System</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet/</link><pubDate>Thu, 02 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet/</guid><description>&lt;p>The Data Storage Research Vision 2025, organized in an NSF workshop, calls for more “AI for storage” research. However, performing ML-for-storage research can be a daunting task for new storage researchers. A researcher must know both the storage side and the ML side, as if studying two different fields at the same time. This project aims to answer these questions:&lt;/p>
&lt;ol>
&lt;li>How can we encourage data scientists to look into storage problems?&lt;/li>
&lt;li>How can we create a transparent platform that allows such decoupling?&lt;/li>
&lt;li>Within the storage/ML community, can we create two collaborative communities: the storage engineers and the storage data scientists?&lt;/li>
&lt;/ol>
&lt;p>In the ML/Deep Learning community, the large ImageNet benchmarks have spurred research in image recognition. Similarly, we would like to provide benchmarks that foster storage research in ML-based per-I/O latency prediction. Therefore, we present FlashNet, a reproducible data science platform for storage systems. As a first step toward this larger goal, we use I/O latency prediction as a case study, so FlashNet has been built for I/O latency prediction tasks. With FlashNet, data engineers can collect the I/O traces of various devices. Data scientists can then train ML models to predict I/O latency based on those traces. All traces, results, and code will be shared in the FlashNet training ground platform, which uses Chameleon Trovi for better reproducibility.&lt;/p>
&lt;p>In this project, we plan to improve the modularity of the FlashNet pipeline and develop the Chameleon Trovi packages. We will also continue to improve the performance of our binary-class and multiclass classifiers and test them on the new production traces that we collected from the SNIA IOTTA public trace repository. Finally, we will optimize the deployment of our continual-learning mechanism and test it in a cloud system environment. To the best of our knowledge, we are building the world&amp;rsquo;s first end-to-end data science platform for storage systems.&lt;/p>
&lt;h3 id="building-flashnet-platform">Building FlashNet Platform&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage systems, reproducibility, machine learning, continual learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C++, Python, PyTorch, experience with machine learning pipelines&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/justin-shin/">Justin Shin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/maharani-ayu-putri-irawan/">Maharani Ayu Putri Irawan&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build an open-source platform that enables collaboration between the storage and ML communities, specifically by providing a common platform for advancing data science research for storage systems. The platform will be able to reproduce and evaluate different ML models and architectures, dataset patterns, data preprocessing techniques, and feature engineering strategies.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Reproduce the FlashNet evaluation results from prior works.&lt;/li>
&lt;li>Build and improve FlashNet components based on the existing blueprint.&lt;/li>
&lt;li>Collect and analyze the FlashNet evaluation results.&lt;/li>
&lt;/ul></description></item><item><title>Efficient Communication with Key/Value Storage Devices</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore/</link><pubDate>Sun, 27 Feb 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore/</guid><description>&lt;p>Network key-value (KV) stores are used throughout the cloud as storage backends (e.g., AWS ShardStore) and are showing up in devices (e.g., NVMe KV SSDs). KV clients use traditional network sockets and POSIX APIs to communicate with the KV store. An advancement of the last two years is a new kernel interface that can be used in lieu of the POSIX API, namely &lt;code>io_uring&lt;/code>. This new interface uses a set of shared-memory queues for communication between the kernel and user space and permits zero-copy data transfer. This scheme reduces system call overhead and can improve performance.&lt;/p>
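&lt;p>The queue-based scheme above can be pictured with a toy model: the application places several requests on a submission queue, crosses into the kernel side once, and then collects the results from a completion queue. The Python sketch below is only a conceptual stand-in. &lt;code>ToyRing&lt;/code> and its methods are hypothetical names for this illustration; it does not use shared memory and does not call the real &lt;code>io_uring&lt;/code> interface.&lt;/p>

```python
from collections import deque


class ToyRing:
    """Toy stand-in for a submission/completion queue pair (not the kernel API)."""

    def __init__(self, store):
        self.store = store            # dict acting as the key/value backend
        self.submissions = deque()    # requests queued by the application
        self.completions = deque()    # results posted by the "kernel" side
        self.enter_calls = 0          # how many times we crossed the boundary

    def submit_get(self, key):
        """Queue a GET request without crossing into the kernel side."""
        self.submissions.append(key)

    def enter(self):
        """Process every queued request in one boundary crossing."""
        self.enter_calls += 1
        while self.submissions:
            key = self.submissions.popleft()
            self.completions.append((key, self.store.get(key)))

    def reap(self):
        """Return and clear all completed results."""
        results = list(self.completions)
        self.completions.clear()
        return results


# Three requests are batched behind a single boundary crossing.
ring = ToyRing({"a": 1, "b": 2})
for key in ("a", "b", "missing"):
    ring.submit_get(key)
ring.enter()
results = ring.reap()
```

&lt;p>In this model three lookups cost one call to &lt;code>enter&lt;/code>, whereas a call-per-request design would cross the boundary three times; that batching is the source of the reduced system call overhead described above.&lt;/p>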
&lt;h3 id="implement-io_uring-communication-backend">Implement &lt;code>io_uring&lt;/code> communication backend&lt;/h3>
&lt;p>&lt;strong>Topics:&lt;/strong> performance, I/O, network, key-value, storage&lt;br>
&lt;strong>Difficulty:&lt;/strong> Medium&lt;br>
&lt;strong>Size:&lt;/strong> Medium or large (120 or 150 hours)&lt;br>
&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:philip.kufeldt@seagate.com">Philip Kufeldt (Seagate)&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aldrin-montana/">Aldrin Montana&lt;/a> (UC Santa Cruz)&lt;br>
&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/manank-patel/">Manank Patel&lt;/a>&lt;/p>
&lt;p>Seagate has been using a network-based KV HDD as a research vehicle for computational storage. This research vehicle uses an open-source user library that implements a KV API by sending protobuf-based RPCs over the network to a network KV store. The library currently uses the standard socket and POSIX APIs to communicate with the KV backend. This project would implement an &lt;code>io_uring&lt;/code> communication backend and compare the performance of the two implementations.&lt;/p></description></item></channel></rss>