<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Maharani Ayu Putri Irawan | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/maharani-ayu-putri-irawan/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/maharani-ayu-putri-irawan/index.xml" rel="self" type="application/rss+xml"/><description>Maharani Ayu Putri Irawan</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/maharani-ayu-putri-irawan/avatar_hu65e7eba29424060006a3de81080abe58_373851_270x270_fill_q75_lanczos_center.jpg</url><title>Maharani Ayu Putri Irawan</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/maharani-ayu-putri-irawan/</link></image><item><title>[Midterm] FlashNet: Towards Reproducible Continual Learning for Storage System</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/</guid><description>&lt;h2 id="mid-term-report">Mid-Term Report&lt;/h2>
&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet">FlashNet&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1EhJm3kqrpybOkpXiiRMfqVxGeKe9iIsh/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a> and &lt;strong>Daniar Kurniawan&lt;/strong>, aims to implement and optimize the FlashNet model in real-world storage systems using continual learning techniques. We focus on predicting I/O latency to decide whether an I/O should be failed over to another SSD. The following sections elaborate on the work description, major milestones achieved, accomplishments, and challenges during the first half of the summer.&lt;/p>
&lt;h2 id="work-description-major-milestones-achieved-and-accomplishments">Work Description, Major Milestones Achieved, and Accomplishments&lt;/h2>
&lt;p>For the first half of the summer, I implemented the model&amp;rsquo;s continual learning pipeline along with several drift detection algorithms, and then evaluated their effectiveness. Below is a detailed description of each subtask.&lt;/p>
&lt;h3 id="1-continual-learning-pipeline">1. Continual Learning pipeline&lt;/h3>
&lt;p>Firstly, I designed the pipeline. As shown in the flowchart below, the pipeline consists of four main modules: initial train, retrain, inference, and monitor.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pipeline Flowchart" srcset="
/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_2067e763ad30087275106bc5b2921a5a.webp 400w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_fcd6d4a25c164fcfc872329662c36fa5.webp 760w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_2067e763ad30087275106bc5b2921a5a.webp"
width="760"
height="249"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The modules were first developed in Python using a linear regression model. It turned out that linear regression was not good enough: it gave poor accuracy. To overcome this problem, I introduced additional models and learning tasks.&lt;/p>
&lt;p>Hence, in the final implementation, we have random forest and neural network models for both the regression and classification tasks. These models outperform linear regression, and the pipeline itself has also been optimized.&lt;/p>
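&lt;p>The interaction between the four modules can be sketched as a simple loop. The code below is a minimal illustration only: the function names, the trivial mean-latency &amp;ldquo;model&amp;rdquo;, and the 1.5&amp;times; drift threshold are all hypothetical stand-ins, not the actual FlashNet implementation.&lt;/p>

```python
# Minimal sketch of the four-module pipeline (illustrative names only):
# initial train, inference, monitor, retrain.

def train(samples):
    """Fit a trivial 'model' on (features, latency) pairs: the mean latency."""
    latencies = [lat for _, lat in samples]
    return sum(latencies) / len(latencies)

def infer(model, features):
    """Predict latency; here the 'model' is just the training mean."""
    return model

def monitor(window, model, threshold=1.5):
    """Flag drift when the recent mean latency diverges from the model."""
    recent = sum(lat for _, lat in window) / len(window)
    return recent > threshold * model

def run_pipeline(stream, window_size=100):
    window = stream[:window_size]
    model = train(window)                      # 1. initial train
    for i, (feat, lat) in enumerate(stream[window_size:], window_size):
        _ = infer(model, feat)                 # 2. inference on each I/O
        window = stream[max(0, i - window_size + 1): i + 1]
        if monitor(window, model):             # 3. monitor for drift
            model = train(window)              # 4. retrain on recent data
    return model
```

&lt;p>On a synthetic stream whose latency jumps mid-way, this loop retrains several times until the model tracks the new regime.&lt;/p>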
&lt;h3 id="2-drift-detection-algorithms">2. Drift detection algorithms&lt;/h3>
&lt;p>Sometimes the built model&amp;rsquo;s performance degrades when recent I/Os have different characteristics from the data it was trained on. Hence, the model must be retrained, and retraining needs a trigger. The trigger could be as simple as a fixed schedule, or it could use a technique called drift detection. Retraining too often incurs a large computational overhead, while retraining too seldom causes performance degradation. We therefore need a reliable drift detection algorithm that can sense the presence of concept and covariate drift in recent data.&lt;/p>
&lt;p>In order to build a good algorithm, I first used heuristics derived from an understanding of how latency and throughput change over time. However, the results were not satisfactory, so I have been relying on statistical tests as the drift detector. So far, the Kolmogorov-Smirnov test&amp;ndash;commonly known as the KS test&amp;ndash;is the best drift detector.&lt;/p>
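&lt;p>To illustrate how the KS test can flag drift between a reference window and a recent window of latencies, here is a pure-Python sketch of the two-sample KS statistic (the maximum gap between the two empirical CDFs). This is not the project&amp;rsquo;s actual implementation, and the 0.3 threshold is a made-up example value; in practice a library routine such as &lt;code>scipy.stats.ks_2samp&lt;/code> would typically be used.&lt;/p>

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    max_diff = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, v) / n_a
        cdf_b = bisect.bisect_right(b, v) / n_b
        max_diff = max(max_diff, abs(cdf_a - cdf_b))
    return max_diff

def drifted(reference, recent, threshold=0.3):
    """Declare drift when the KS statistic exceeds a tuned threshold."""
    return ks_statistic(reference, recent) > threshold
```

&lt;p>A window of latencies drawn from the same distribution yields a small statistic, while a shifted window pushes it toward 1.0 and triggers the detector.&lt;/p>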
&lt;h3 id="3-evaluation">3. Evaluation&lt;/h3>
&lt;p>The featured image of this post, also shown below, is the result of the evaluation. I evaluated the models and drift detection algorithms using Cumulative Distribution Function (CDF) graphs, to see whether any cut to the latency tail is achieved.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Evaluation" srcset="
/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_4866e846612d96725d801519edf06392.webp 400w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_9203cd36fc4c6de03e02a799cd564f1d.webp 760w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_4866e846612d96725d801519edf06392.webp"
width="760"
height="396"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
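&lt;p>The kind of tail comparison the CDF graphs show can be sketched numerically: compute a high percentile of each latency sample and measure how much the failover policy shrinks it. The helper names and the nearest-rank percentile below are illustrative, not the project&amp;rsquo;s evaluation code.&lt;/p>

```python
def percentile(latencies, q):
    """Nearest-rank percentile of a latency sample, q in [0, 100]."""
    s = sorted(latencies)
    rank = max(0, min(len(s) - 1, int(round(q / 100.0 * (len(s) - 1)))))
    return s[rank]

def tail_cut(baseline, with_failover, q=99.0):
    """Relative reduction of the q-th percentile latency, i.e. how much
    of the tail the failover policy cuts compared to the baseline."""
    base = percentile(baseline, q)
    new = percentile(with_failover, q)
    return (base - new) / base
```

&lt;p>For example, if 10% of baseline I/Os are ten times slower than the rest and failover removes almost all of them, the p99 latency drops accordingly, which appears on a CDF graph as the tail bending left.&lt;/p>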
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>During the implementation, I encountered several challenges, described below.&lt;/p>
&lt;h3 id="1-choice-of-model">1. Choice of Model&lt;/h3>
&lt;p>Since we want to integrate the pipeline into real storage systems, we had to be mindful of model choice. Classical machine learning models are lighter than deep learning models; however, deep learning models offer higher accuracy, which makes them preferable. Hence, I implemented both and examined their effectiveness.&lt;/p>
&lt;h3 id="2-choice-of-drift-detection-algorithm">2. Choice of Drift Detection Algorithm&lt;/h3>
&lt;p>The continual learning technique chosen for this task requires the model to be retrained, since the workload may change over time. The implication is that we need a condition that triggers retraining. As training a model is costly, we need to retrain mindfully. Thus, we use a drift detection algorithm to decide whether retraining is needed.&lt;/p>
&lt;p>There are two types of drift detection algorithms: statistical tests and model-based drift detection. To minimize overhead, we chose statistical tests. Various such algorithms exist; I picked five of them to implement and evaluate.&lt;/p>
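&lt;p>Comparing several statistical detectors can be done by sliding a pair of adjacent windows over the trace and counting how often each detector raises an alarm. The two detectors below (mean shift and median shift) are simple stand-ins for illustration, not the five algorithms actually evaluated in the project, and the 0.5 threshold is a made-up example value.&lt;/p>

```python
def mean_shift(ref, cur):
    """Absolute relative change of the window mean."""
    m_ref = sum(ref) / len(ref)
    m_cur = sum(cur) / len(cur)
    return abs(m_cur - m_ref) / max(abs(m_ref), 1e-12)

def median_shift(ref, cur):
    """Absolute relative change of the window median (nearest-rank)."""
    med = lambda xs: sorted(xs)[len(xs) // 2]
    return abs(med(cur) - med(ref)) / max(abs(med(ref)), 1e-12)

def count_triggers(stream, detector, window=50, threshold=0.5):
    """Slide two adjacent windows over the stream and count drift alarms."""
    triggers = 0
    for start in range(0, len(stream) - 2 * window + 1, window):
        ref = stream[start:start + window]
        cur = stream[start + window:start + 2 * window]
        if detector(ref, cur) > threshold:
            triggers += 1
    return triggers
```

&lt;p>Running every candidate detector through the same harness makes the trade-off visible: a detector that alarms on stable traffic wastes retraining compute, while one that stays silent through a genuine shift lets accuracy decay.&lt;/p>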
&lt;h2 id="plan">Plan&lt;/h2>
&lt;p>For the second half of the summer, I am going to study Riak and create a Chameleon Trovi artifact for deploying Riak in a cluster.&lt;/p></description></item><item><title>FlashNet: Towards Reproducible Continual Learning for Storage System</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230604-rannnayy/</link><pubDate>Sun, 04 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230604-rannnayy/</guid><description>&lt;p>Hello! I&amp;rsquo;m Rani, a third-year undergraduate student at Institut Teknologi Bandung majoring in Informatics. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet">FlashNet&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1EhJm3kqrpybOkpXiiRMfqVxGeKe9iIsh/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a> and &lt;strong>Daniar Kurniawan&lt;/strong>, aims to implement and optimize the FlashNet model in real-world storage systems using continual learning techniques.&lt;/p>
&lt;p>In real-world workloads, the I/O stream is known to change and vary over time. Hence, I/O read/write performance can vary and introduce tail latency. We would like to predict the latency of I/O reads in order to cut the tail and improve the system&amp;rsquo;s performance. This project focuses on improving the FlashNet pipeline and introducing adaptability to the machine learning models built.&lt;/p>
&lt;p>Over the summer, we plan to implement the continual learning pipeline using the machine learning models we built previously in the project. Of course, continual learning isn&amp;rsquo;t truly continual without the ability to retrain itself. Thus, we will implement several drift detection algorithms, then evaluate and test them. We will also build a visualization platform to evaluate and monitor the performance of the models. Lastly, we plan to create Chameleon Trovi artifacts to demonstrate our experiments and make these implementations available and reproducible to the public.&lt;/p></description></item></channel></rss>