<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>reproducibility | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/reproducibility/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/reproducibility/index.xml" rel="self" type="application/rss+xml"/><description>reproducibility</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sat, 31 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>reproducibility</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/reproducibility/</link></image><item><title>Reconfigurable and Placement-Aware Replication for Edge Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/umass/edge-replication/</link><pubDate>Sat, 31 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/umass/edge-replication/</guid><description>&lt;h2 id="project-description">Project Description&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Distributed systems&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Rust, Java, Go, Python, Bash scripting, Linux, Docker.&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="mailto:fikurnia@cs.umass.edu">Fadhil I. Kurnia&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Modern replicated systems are typically evaluated under static configurations with fixed replica placement. However, real-world edge deployments are highly dynamic: workloads shift geographically, edge nodes join or fail, and latency conditions change over time. Our existing testbed provides reproducible evaluation for replicated systems but lacks support for dynamic reconfiguration and adaptive edge placement policies.&lt;/p>
&lt;p>This project extends the existing open testbed to support:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Dynamic Replica Reconfiguration&lt;/p>
&lt;ul>
&lt;li>Membership changes (add/remove replicas)&lt;/li>
&lt;li>Leader migration and shard movement&lt;/li>
&lt;li>Online reconfiguration cost measurement (latency spikes, recovery overhead, state transfer cost)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Edge-Aware Placement Policies&lt;/p>
&lt;ul>
&lt;li>Demand-aware placement based on geographic workload skew&lt;/li>
&lt;li>Latency-aware and bandwidth-aware replica selection&lt;/li>
&lt;li>Comparison of static vs. adaptive placement strategies&lt;/li>
&lt;li>Evaluation under real-world latency matrices (e.g., US metro-level or cloud region traces)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>What-if Simulation Framework&lt;/p>
&lt;ul>
&lt;li>Replay workload traces with time-varying demand&lt;/li>
&lt;li>Simulate hundreds of edge sites with realistic network conditions&lt;/li>
&lt;li>Quantify trade-offs between consistency, availability, reconfiguration overhead, and cost&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
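&lt;p>To make the placement goal concrete, the sketch below shows one possible latency-minimizing heuristic: it greedily picks replica sites so that the demand-weighted latency from each client region to its nearest replica is as low as possible. It is only an illustration and not the testbed's API; the function name &lt;code>greedy_placement&lt;/code>, the site and region names, and all latency and demand numbers are hypothetical.&lt;/p>

```python
from typing import Dict, List


def greedy_placement(latency_ms: Dict[str, Dict[str, float]],
                     demand: Dict[str, float],
                     k: int) -> List[str]:
    """Greedily pick up to k replica sites minimizing demand-weighted latency."""
    sites = sorted(latency_ms)  # sorted so that ties resolve deterministically
    chosen: List[str] = []

    def cost(candidates: List[str]) -> float:
        # Each client region is served by its nearest chosen replica.
        return sum(weight * min(latency_ms[site][region] for site in candidates)
                   for region, weight in demand.items())

    while k > len(chosen):
        remaining = [site for site in sites if site not in chosen]
        if not remaining:
            break
        # Add the site that lowers the total weighted latency the most.
        best = min(remaining, key=lambda site: cost(chosen + [site]))
        chosen.append(best)
    return chosen


# Hypothetical latencies (ms) from each candidate site to each client region.
latency_ms = {
    "boston": {"east": 5.0, "central": 30.0, "west": 70.0},
    "chicago": {"east": 25.0, "central": 5.0, "west": 50.0},
    "seattle": {"east": 75.0, "central": 45.0, "west": 5.0},
}
# Hypothetical request rates per client region.
demand = {"east": 100.0, "central": 20.0, "west": 80.0}

print(greedy_placement(latency_ms, demand, k=2))
```

&lt;p>Re-running the same function on a demand snapshot taken later in a trace gives a simple adaptive policy to compare against a static placement; a real policy would also need to account for the reconfiguration cost of moving replicas.&lt;/p>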
&lt;p>The outcome will be an &lt;a href="https://distrobench.org" target="_blank" rel="noopener">open-source framework&lt;/a> that enables researchers to evaluate not only steady-state replication performance, but also how systems behave under churn, scaling events, and demand shifts. These are central challenges in real edge environments.&lt;/p>
&lt;h3 id="expected-deliverables">Expected Deliverables&lt;/h3>
&lt;ul>
&lt;li>Reconfiguration abstraction layer (API for membership &amp;amp; placement changes)&lt;/li>
&lt;li>Placement policy plugin framework (k-means, facility-location heuristics, latency-minimizing, cost-aware)&lt;/li>
&lt;li>Trace-driven dynamic workload engine&lt;/li>
&lt;li>Public benchmark scenarios and reproducible experiment scripts&lt;/li>
&lt;li>Artifact-ready documentation and evaluation report&lt;/li>
&lt;/ul></description></item><item><title>Reproducible CXL Emulation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucmerced/cxl_emu/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucmerced/cxl_emu/</guid><description>&lt;p>Compute Express Link (CXL) is an emerging memory interconnect standard that enables shared, coherent memory across CPUs, accelerators, and multiple hosts, unlocking new possibilities in hyperscale, HPC, and disaggregated systems. However, because access to real multi-host CXL hardware is limited, it is difficult for researchers and students to experiment with, evaluate, and reproduce results on advanced CXL topologies.
OCEAN (Open-source CXL Emulation At Hyperscale) [https://github.com/cxl-emu/OCEAN] is a full-stack CXL emulation platform built on QEMU that enables detailed emulation of CXL 3.0 memory systems, including multi-host shared memory pools, coherent fabric topologies, and latency modeling. This project will create reproducible experiment pipelines, automated deployment workflows, and user-friendly tutorials so that others can reliably run and extend CXL emulation experiments without requiring specialized hardware.&lt;/p>
&lt;h3 id="reproducible-cxl-emulation-for-multi-host-memory-systems">Reproducible CXL Emulation for Multi-Host Memory Systems&lt;/h3>
&lt;p>Streamline multi-host CXL emulation without specialized hardware.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>CXL emulation&lt;/code> &lt;code>Memory Systems&lt;/code> &lt;code>Reproducibility&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Virtualization (QEMU), Scripting, Performance Modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrafi@ucmerced.edu">Mujahid Al Rafi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Create automated deployment scripts and configuration templates for OCEAN-based CXL emulation topologies (single-host and multi-host).&lt;/li>
&lt;li>Develop a standardized experiment harness for running memory performance benchmarks (e.g., OSU micro-benchmarks, STREAM-style tests) in emulated CXL environments.&lt;/li>
&lt;li>Build reproducible experiment pipelines that others can run to evaluate latency, bandwidth, and scaling properties of CXL memory systems.&lt;/li>
&lt;li>Produce tutorials, documentation, and reproducibility artifacts to guide new users through setup, execution, and analysis.&lt;/li>
&lt;li>Package and contribute all scripts, configurations, and documentation back to the OCEAN open-source repository.&lt;/li>
&lt;/ul>
&lt;h3 id="exploring-security-and-isolation-in-cxl-based-memory-systems">Exploring Security and Isolation in CXL-Based Memory Systems&lt;/h3>
&lt;p>Investigate security and isolation properties of CXL-based memory systems using software emulation.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>CXL Systems&lt;/code> &lt;code>Security&lt;/code> &lt;code>Memory Isolation&lt;/code> &lt;code>Side Channel&lt;/code> &lt;code>Emulation&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Virtualization (QEMU), Scripting, Computer Architecture, Security&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrafi@ucmerced.edu">Mujahid Al Rafi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Study the CXL memory model and fabric architecture to identify potential security and isolation risks in multi-host shared memory environments (e.g., contention, timing variation, and resource interference).&lt;/li>
&lt;li>Set up multi-host or multi-VM CXL emulation environments using OCEAN that mimic realistic multi-tenant deployments.&lt;/li>
&lt;li>Design and implement reproducible micro-benchmarks to measure timing, bandwidth contention, or observable interference through shared CXL memory pools.&lt;/li>
&lt;li>Analyze how fabric configuration choices (e.g., topology, latency injection, memory partitioning, or allocation policies) affect isolation and leakage behavior.&lt;/li>
&lt;li>Explore and prototype mitigation strategies—such as memory partitioning, throttling, or policy-driven allocation—and evaluate their effectiveness using the emulation platform.&lt;/li>
&lt;/ul></description></item><item><title>StatWrap</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/northwestern/statwrap/</link><pubDate>Thu, 29 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/northwestern/statwrap/</guid><description>&lt;p>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap&lt;/a> is a free and open-source assistive, non-invasive discovery and inventory tool to document research projects. It inventories project assets (e.g., code files, data files, manuscripts, documentation) and organizes information without additional input from the user. It also provides structure for users to add searchable and filterable notes connected to files to help communicate metadata about intent and analysis steps.&lt;/p>
&lt;p>At its core, StatWrap helps investigators identify and track changes in a research project as it evolves - which may affect reproducibility. For example: (1) people on the project can change over time, so processes may not be consistently executed due to transitions in employment; (2) data changes over time, due to accruing additional cases, adding new variables, or correcting mistakes in existing data; (3) software (e.g. used for data preparation and statistical analysis) evolves as it is edited, improved, and optimized; and (4) software can break or produce different results due to changes &amp;lsquo;under the hood&amp;rsquo; such as updates to statistical packages, compilers, or interpreters. StatWrap passively and actively documents these changes to support reproducibility.&lt;/p>
&lt;p>Additional information:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap home&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/stattag/statwrap" target="_blank" rel="noopener">StatWrap code (GitHub)&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="group-and-individual-customizations">Group and Individual Customizations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>configuration&lt;/code>, &lt;code>user interface&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: JavaScript, React&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>, &lt;a href="mailto:ewhitley@northwestern.edu">Eric Whitley&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of this project is to expand the existing capabilities of StatWrap to provide more flexibility to individual users and groups. Currently, features within StatWrap such as the directory template for creating new projects and the reproducibility checklist are static, meaning everyone who downloads StatWrap has the same configuration. However, each user and team works differently and should be able to configure StatWrap to support their needs.&lt;/p>
&lt;p>When a user creates a new project, StatWrap provides a collection of project templates. These create a directory hierarchy, along with some seed files (e.g., a README.md file in the project root). Different groups have their own conventions for creating project directories. While StatWrap can be released with additional project templates defined, there are many situations in which users would want to keep their project template local. StatWrap should allow a user to create a project template configuration, either from scratch or seeded from the contents of an existing project. A user should then be able to export this configuration and share it with others, and other users should be able to import the configuration into their instance of StatWrap.&lt;/p>
&lt;p>Similarly, StatWrap provides a reproducibility checklist that includes six existing checklist items. However, individual users and groups may have their own checklists, including institution-specific steps. Similar to the project template, a user should be able to configure additional items for the checklist. A user should be able to create a &amp;ldquo;checklist template&amp;rdquo; that can be used and applied in multiple projects. A specific project&amp;rsquo;s template should also be modifiable once the checklist has been created.&lt;/p>
&lt;p>The specific tasks of the project include:&lt;/p>
&lt;ul>
&lt;li>Develop a configuration scheme for New Project templates&lt;/li>
&lt;li>Provide a way for a user to import/export a template for New Projects&lt;/li>
&lt;li>Develop a configuration scheme for Reproducibility Checklist questions&lt;/li>
&lt;li>Provide a way for a user to import/export a template for the Reproducibility Checklist&lt;/li>
&lt;li>Develop a configuration scheme for asset (file) attributes&lt;/li>
&lt;li>Develop unit tests and conduct system testing&lt;/li>
&lt;/ul></description></item><item><title>Final Report for Smart Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20251105-sam_huang/</link><pubDate>Wed, 05 Nov 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20251105-sam_huang/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>The process of creating the necessary software environment for code to run is a significant challenge in software development. Given a piece of open-source software intended for research, setting up the environmental dependencies to run the software could take significant manual effort. Existing automation methods struggle due to the complexity of managing diverse languages, dependencies, and hardware. In Smart Environments, I have created ENVAGENT, a general multi-agent framework designed to automate the construction of executable environments for reproducing research prototypes from top-tier conferences and journals. While reproducibility has become a growing concern in the research community, the process of setting up environments remains time-consuming, error-prone, and often poorly documented.&lt;/p>
&lt;p>To assess this capability, a new benchmark, ENVBENCH, was created, containing 54 popular projects across seven languages. Results show ENVAGENT dramatically improves environment construction compared to current agents (+16.2%). Furthermore, the system shows initial promise in dynamically adjusting cloud-based hardware resources based on the code’s needs.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="EnvGym Cover" srcset="
/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp 400w,
/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_d39b2369b5df80ffa715197c993f0681.webp 760w,
/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp"
width="760"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="method">Method&lt;/h2>
&lt;h3 id="envagent">EnvAgent&lt;/h3>
&lt;p>The EnvAgent I created during my time at OSRE utilizes a multi-agent workflow to automatically build software execution environments. The process is structured into three phases: preparation, construction, and refinement.&lt;/p>
&lt;p>Phase 1 (Preparation): Specialized agents collect information about the software repository – its structure, relevant files, and the host system’s hardware specifications (CPU, memory, etc.). This data is then used by a planning agent to generate a detailed, step-by-step instruction set for creating a functional Dockerfile.&lt;/p>
&lt;p>Phase 2 (Construction): Two agents work in tandem: one generates or modifies the Dockerfile based on the plan, while the other executes the Dockerfile within an isolated container, capturing any errors.&lt;/p>
&lt;p>Phase 3 (Refinement): A final agent analyzes the container execution data, identifying areas for improvement in the Dockerfile. This process repeats until a stable, executable environment is achieved.&lt;/p>
&lt;p>To improve efficiency, EnvAgent incorporates rule-based tools for predictable tasks like directory setup and log management, reducing the need for complex agent reasoning. This combination of intelligent agents and automated routines (&amp;ldquo;scaffolding&amp;rdquo;) ensures a robust and adaptive system.&lt;/p>
&lt;h3 id="enveval-benchmark">EnvEval Benchmark&lt;/h3>
&lt;p>In addition to the agent, one significant contribution is the manual curation of a benchmark that measures the quality of generated environments. EnvEval is a benchmark specifically designed to assess the quality of environment setup across 54 carefully curated open-source repositories. These repositories are drawn from both Chameleon reproducible artifacts and the Multi-SWE-bench dataset. EnvEval contains JSON rubrics that can be used to automatically determine the quality of constructed environments.&lt;/p>
&lt;p>Each rubric is divided into three parts, corresponding to three major objectives that a successfully constructed environment should have:&lt;/p>
&lt;ol>
&lt;li>Structure: Checks for basic directory structure, file presence, and environment variables.&lt;/li>
&lt;li>Configuration: Asks the question &amp;ldquo;Is this configured?&amp;rdquo; and checks whether dependencies have been correctly configured.&lt;/li>
&lt;li>Functionality: Asks the question &amp;ldquo;Is this usable?&amp;rdquo; and runs actual tests to see whether the expected functionality is present.&lt;/li>
&lt;/ol>
&lt;p>There are many tests in each category, and their weights are adjusted based on their importance.&lt;/p>
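&lt;p>As a rough illustration of how weighted checks can roll up into a score, the sketch below computes per-category and overall scores from a small rubric. The rubric layout, check names, and weights are hypothetical and are not taken from the actual EnvEval JSON files.&lt;/p>

```python
from typing import Dict, List, Tuple

# Hypothetical rubric layout: category -> list of (check name, weight, passed).
Rubric = Dict[str, List[Tuple[str, float, bool]]]


def rubric_score(rubric: Rubric) -> Dict[str, float]:
    """Return per-category scores and an overall score on a 0-100 scale."""
    scores: Dict[str, float] = {}
    earned_total = 0.0
    weight_total = 0.0
    for category, checks in rubric.items():
        weight = sum(w for _, w, _ in checks)
        earned = sum(w for _, w, passed in checks if passed)
        scores[category] = 100.0 * earned / weight if weight else 0.0
        earned_total += earned
        weight_total += weight
    # The overall score weights every check by its own importance.
    scores["overall"] = 100.0 * earned_total / weight_total if weight_total else 0.0
    return scores


example: Rubric = {
    "structure": [("workdir exists", 1.0, True), ("PATH is set", 1.0, True)],
    "configuration": [("dependencies installed", 2.0, True),
                      ("compiler version matches", 2.0, False)],
    "functionality": [("unit tests run", 4.0, False)],
}

print(rubric_score(example))
```

&lt;p>In this example the functionality check carries the largest weight, so an environment that builds but cannot run its tests still ends up with a low overall score.&lt;/p>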
&lt;h2 id="evaluation">Evaluation&lt;/h2>
&lt;p>Baseline Systems:&lt;/p>
&lt;p>The study compared EnvAgent to two established automated code generation systems: one utilizing Anthropic’s advanced reasoning models and the other employing OpenAI’s code-focused models. These systems were chosen for their strong performance in creating software code and their prevalence in automated engineering processes. Both baselines were given full access to the target software repositories and complete details about the host system’s hardware.&lt;/p>
&lt;p>Evaluation Metrics:&lt;/p>
&lt;p>The performance of EnvAgent was assessed using three key metrics: the ability to create working environments, the quality of those environments, and a single combined score. Results showed EnvAgent significantly outperformed the baselines, achieving a 33.91% improvement in the final overall score – reaching 74.01, which was higher than the best baseline score of 30.10. This suggests EnvAgent both produced more functional environments and ensured greater accuracy through extensive testing.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>The process of creating the necessary software environments for code agents is a major hurdle in scaling up research and development. Currently, this task relies heavily on manual labor. To address this, a new system, ENVAGENT, was created to automatically build these environments using intelligent agents and by understanding dependencies. A new benchmark, ENVBENCH, was also developed to assess this system’s effectiveness. Preliminary results demonstrate a significant improvement – ENVAGENT achieved a 33.91% increase in success rates compared to existing automated agents, representing a substantial step towards more efficient and reproducible research.&lt;/p>
&lt;h1 id="thank-you">Thank you!&lt;/h1>
</description></item><item><title>Final Report : Streamlining Reproducible Machine Learning Research with Automated MLOps Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/</link><pubDate>Thu, 18 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/</guid><description>&lt;h1 id="final-report-applying-mlops-to-overcome-reproducibility-barriers-in-ml">Final Report: Applying MLOps to Overcome Reproducibility Barriers in ML&lt;/h1>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Generating project" srcset="
/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_8793561656181f829e3597ae957831b0.webp 400w,
/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_c1f605866d28e52418a2120d1e90b899.webp 760w,
/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/image1_hu9510d428e5a70e6f0fb80fd1f824e093_949203_8793561656181f829e3597ae957831b0.webp"
width="760"
height="447"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>Hello! I’m Ahmed Alghali, and this is my final report for the project &lt;a href="https://ucsc-ospo.github.io/project/osre25/nyu/mlops/" target="_blank" rel="noopener">&lt;strong>Applying MLOps to Overcome Reproducibility Barriers in ML&lt;/strong>&lt;/a> under the mentorship of Professor &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a>.&lt;/p>
&lt;p>This project aims to address the &lt;strong>reproducibility problem&lt;/strong> in machine learning—both in core ML research and in applications to other areas of science.&lt;/p>
&lt;p>The focus is on making large-scale ML experiments &lt;strong>reproducible on &lt;a href="https://www.chameleoncloud.org/" target="_blank" rel="noopener">Chameleon Cloud&lt;/a>&lt;/strong>. To do this, we developed &lt;a href="https://github.com/A7med7x7/ReproGen" target="_blank" rel="noopener">&lt;strong>ReproGen&lt;/strong>&lt;/a>, a template generator that produces ready-to-use, reproducible ML training workflows. The goal is to make the cloud easy for researchers to use when setting up experiments, without having to worry about the complexity of stitching everything together.&lt;/p>
&lt;hr>
&lt;h2 id="progress-since-mid-report">Progress Since Mid-Report&lt;/h2>
&lt;h3 id="migration-from-cookiecutter-to-copier">Migration from Cookiecutter to Copier&lt;/h3>
&lt;p>We initially used &lt;a href="https://www.cookiecutter.io/" target="_blank" rel="noopener">Cookiecutter&lt;/a> as the templating engine, but it lacked features we were interested in (e.g., conditional questions). We switched to &lt;a href="https://copier.readthedocs.io/en/stable/" target="_blank" rel="noopener">Copier&lt;/a>, which provides more flexibility and better matches our use case.&lt;/p>
&lt;h3 id="support-for-multiple-setup-modes">Support for Multiple Setup Modes&lt;/h3>
&lt;p>We now offer &lt;strong>two setup modes&lt;/strong>, designed to serve both beginners and users who want advanced options/customization:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Basic Mode&lt;/strong> – minimal prompts (project name, repository link, framework).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Advanced Mode&lt;/strong> – detailed control (compute site, GPU type, CUDA version, storage site, etc.).&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>This ensures accessibility for new users while still enabling fine-grained control for advanced users.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="prompting" srcset="
/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_c0169673360dadfbcd30a72263676479.webp 400w,
/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_416b0bbcc859df3cd794d760ce0308c8.webp 760w,
/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/09182025-alghali/image2_hu192805bf2f3285f5d80677238d9527e7_1255822_c0169673360dadfbcd30a72263676479.webp"
width="760"
height="448"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="automated-credential-generation">Automated Credential Generation&lt;/h3>
&lt;p>Previously, users had to manually generate application credentials (via the Horizon OpenStack UI). Now, we provide scripts that generate two types of credentials programmatically—&lt;strong>Swift&lt;/strong> and &lt;strong>EC2&lt;/strong>—using &lt;strong>Chameleon JupyterHub credentials&lt;/strong> with &lt;code>python-chi&lt;/code> and the &lt;code>openstack-sdk&lt;/code> client.&lt;/p>
&lt;h3 id="automatic-readmemd-generation">Automatic README.md Generation&lt;/h3>
&lt;p>Each generated project includes a &lt;strong>customized README.md&lt;/strong>, containing setup guidance and commands tailored to the user’s configuration.&lt;/p>
&lt;h3 id="bug-fixes-and-ux-enhancements">Bug Fixes and UX Enhancements&lt;/h3>
&lt;p>Alongside major features, we implemented numerous smaller changes and fixes to improve the reliability and user experience of the tool.&lt;/p>
&lt;hr>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://github.com/A7med7x7/ReproGen" target="_blank" rel="noopener">&lt;strong>ReproGen GitHub Repository&lt;/strong>&lt;/a>: source code for the template generator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/A7med7x7/ReproGen/tree/mlflow-replay" target="_blank" rel="noopener">&lt;strong>mlflow-replay branch&lt;/strong>&lt;/a>: explore a past experiment, artifacts, and logged insights.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/A7med7x7/ReproGen/tree/training-demo" target="_blank" rel="noopener">&lt;strong>LLM-Demo branch&lt;/strong>&lt;/a>: hands-on demo to track fine-tuning of an LLM using infrastructure generated by ReproGen.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Compatibility Matrix&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Both the tool and the generated setup depend on software whose compatibility requires attention at every level: hardware, OS, drivers, computing platforms, and core and third-party libraries. Writing documentation is a first step toward easing future debugging and adding pieces without breaking what is already there.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Maintain Docker Images&lt;/strong>&lt;/p>
&lt;p>So far we have CPU and GPU Docker images for several of the most frequently used frameworks.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>CPU-based image&lt;/strong>: for data science workloads (scikit-learn)&lt;/li>
&lt;li>&lt;strong>GPU-Nvidia Variant&lt;/strong>: for deep learning workloads on NVIDIA machines (PyTorch, Lightning, TensorFlow)&lt;/li>
&lt;li>&lt;strong>GPU-AMD Variant&lt;/strong>: for deep learning workloads on AMD machines (PyTorch, Lightning, TensorFlow).
Adding variants for more frameworks and improving the existing images is recommended.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="reflection">Reflection&lt;/h2>
&lt;p>When I first joined SoR 2025, I had trouble crystallizing how I could practically achieve reproducibility and package a tool that maximizes the chance of reproducing experiments built with it. Throughout the journey, my mentors took me under their wings and helped me understand the &lt;strong>reproducibility challenges in ML&lt;/strong>. My mentor, Professor &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a>, wrote materials that saved me a lot of time in familiarizing myself with the &lt;a href="https://www.chameleoncloud.org/">testbed&lt;/a> and with important Linux tools and commands, and that even gave me hands-on practice with how &lt;a href="https://teaching-on-testbeds.github.io/mltrain-chi/" target="_blank" rel="noopener">large model training&lt;/a> with an MLflow tracking server is done in the cloud. &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a> took the time to review my presentation and pushed me to do my best. I&amp;rsquo;m forever thankful for the way they shaped the project and my personal growth.
This hands-on experience helped me view &lt;strong>MLOps, cloud APIs, and workflow design&lt;/strong> through a different lens, and I’m proud to have contributed a tool that can help simplify reproducible research for others.&lt;/p></description></item><item><title>Final Report: A Systematic Investigation into the Reproducibility of RAG Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/pnnl/llm_rag_reproducibility/20250905-wbq321/</link><pubDate>Fri, 05 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/pnnl/llm_rag_reproducibility/20250905-wbq321/</guid><description>&lt;p>I&amp;rsquo;m Baiqiang, and this is the final report for the &lt;a href="https://ucsc-ospo.github.io/project/osre25/pnnl/llm_rag_reproducibility/" target="_blank" rel="noopener">Enhancing Reproducibility in RAG Frameworks for Scientific Workflows&lt;/a> project, mentored by Luanzheng &amp;ldquo;Lenny&amp;rdquo; Guo and Dongfang Zhao. This project successfully developed a novel framework to quantitatively measure reproducibility in AI systems, yielding several surprising and impactful results.&lt;/p>
&lt;h3 id="the-challenge-the-need-for-systematic-measurement">The Challenge: The Need for Systematic Measurement&lt;/h3>
&lt;p>Retrieval-Augmented Generation (RAG) is a cornerstone of AI for science, but its reliability is often compromised by non-determinism. While this issue was a known concern, a fundamental challenge was the lack of standardized tools and methodologies to systematically measure and quantify the sources of this inconsistency. Without a rigorous way to analyze the problem, it was difficult to move beyond ad-hoc tests and establish the true root causes, hindering the development of truly trustworthy AI systems for science.&lt;/p>
&lt;h3 id="our-contribution-the-reprorag-framework">Our Contribution: The ReproRAG Framework&lt;/h3>
&lt;p>To address this gap, the central contribution of this project is &lt;strong>ReproRAG&lt;/strong>, a comprehensive, open-source benchmarking framework. ReproRAG is designed to systematically investigate sources of uncertainty across the entire RAG pipeline by:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Isolating Variables:&lt;/strong> It allows for controlled experiments on embedding models, numerical precision, retrieval algorithms, hardware configurations (CPU/GPU), and distributed execution environments.&lt;/li>
&lt;li>&lt;strong>Quantifying Uncertainty:&lt;/strong> It employs a suite of metrics—including Exact Match Rate, Jaccard Similarity, and Kendall&amp;rsquo;s Tau—to precisely measure the impact of each variable on the final retrieved results.&lt;/li>
&lt;/ul>
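&lt;p>To make the metrics above concrete, here is a minimal sketch of how Exact Match Rate, Jaccard Similarity, and Kendall&amp;rsquo;s Tau could be computed for two top-k result lists. It is not taken from the ReproRAG code base; the function names and the sample document IDs are assumptions chosen for illustration.&lt;/p>

```python
from itertools import combinations


def exact_match(a, b):
    """1.0 if both runs returned the same IDs in the same order, else 0.0."""
    return 1.0 if list(a) == list(b) else 0.0


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa.intersection(sb)) / len(union) if union else 1.0


def kendall_tau(a, b):
    """Kendall's Tau over the documents that appear in both rankings."""
    pos_a = {doc: i for i, doc in enumerate(a)}
    pos_b = {doc: i for i, doc in enumerate(b)}
    shared = [doc for doc in a if doc in pos_b]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0
    # A pair is concordant when both rankings order it the same way.
    concordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0
    )
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


# Hypothetical top-4 results from two runs of the same query.
run_1 = ["d1", "d2", "d3", "d4"]
run_2 = ["d1", "d3", "d2", "d5"]
print(exact_match(run_1, run_2), jaccard(run_1, run_2), kendall_tau(run_1, run_2))
```

&lt;p>Restricting Kendall&amp;rsquo;s Tau to the shared documents is one possible design choice; it separates "which documents were retrieved" (Jaccard) from "in what order" (Tau).&lt;/p>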
&lt;h3 id="key-findings-a-new-hierarchy-of-uncertainty">Key Findings: A New Hierarchy of Uncertainty&lt;/h3>
&lt;p>Our large-scale empirical study using ReproRAG challenged common assumptions and established a clear hierarchy of what actually impacts reproducibility.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Core Algorithms Are Not the Problem:&lt;/strong> Our most surprising finding is that modern retrieval libraries like FAISS are perfectly reproducible out-of-the-box. Across all tested index types (including approximate ones like HNSW and IVF) and execution environments (single-node CPU/GPU and multi-node distributed systems), we achieved perfect run-to-run reproducibility (1.000 scores on all metrics) when environmental factors like random seeds were controlled. This falsifies the common hypothesis that approximate nearest neighbor algorithms are a primary source of randomness.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Embedding Model Choice is a Dominant Source of Variation:&lt;/strong> We found that the choice of the embedding model is a dominant factor driving result variation. When comparing outputs from different state-of-the-art models (BGE, E5, Qwen) for the same query, the agreement was very low (e.g., Overlap Coefficient of ~0.43-0.54). This means a scientific conclusion drawn with one model may not be reproducible with another, as they are fundamentally &amp;ldquo;seeing&amp;rdquo; different evidence.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Environmental Factors Introduce Measurable &amp;ldquo;Drift&amp;rdquo;:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Numerical Precision:&lt;/strong> Changing floating-point precision (e.g., FP32 vs. FP16) was a guaranteed source of variation, but it caused a small and quantifiable &amp;ldquo;embedding drift&amp;rdquo; rather than chaotic changes.&lt;/li>
&lt;li>&lt;strong>Data Insertion:&lt;/strong> Incrementally adding new data to an index caused a predictable &amp;ldquo;displacement&amp;rdquo; of old results, not a re-shuffling. The relative ranking of the remaining original documents was perfectly stable (Kendall&amp;rsquo;s Tau of 1.000).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Common Determinism Flags Can Be Ineffective:&lt;/strong> Our tests showed that popular software-level controls, like &lt;code>cudnn.deterministic&lt;/code> flags in PyTorch, had no observable effect on the output of modern transformer-based embedding models. This underscores the necessity of empirical validation over assuming that framework settings work as advertised.&lt;/p>
&lt;/li>
&lt;/ol>
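&lt;p>Two of the findings above can be illustrated with a short sketch: the Overlap Coefficient used to compare embedding models, and a check that distinguishes "displacement" from "re-shuffling" after data insertion. The helper names and document IDs below are hypothetical and are not part of ReproRAG.&lt;/p>

```python
def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa.intersection(sb)) / smaller if smaller else 1.0


def surviving_order_preserved(before, after):
    """True if documents present in both lists keep their relative order."""
    kept = set(before).intersection(after)
    return [d for d in before if d in kept] == [d for d in after if d in kept]


# Hypothetical top-4 results before and after inserting a new document.
before = ["d1", "d2", "d3", "d4"]
after = ["d1", "new1", "d2", "d3"]  # d4 was displaced by new1

print(overlap_coefficient(before, after))
print(surviving_order_preserved(before, after))
```

&lt;p>In this example the new document pushes one old result out of the top-k, but the remaining original documents keep their relative order, which corresponds to a Kendall&amp;rsquo;s Tau of 1.000 on the surviving items.&lt;/p>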
&lt;h3 id="conclusion">Conclusion&lt;/h3>
&lt;p>This project successfully shifted the focus of the RAG reproducibility problem. The key challenge is not to fix supposedly &amp;ldquo;random&amp;rdquo; algorithms, but to rigorously control the entire experimental environment. We delivered &lt;strong>ReproRAG&lt;/strong>, a framework that empowers researchers to do just that. Our findings provide actionable insights for the community: efforts to improve reproducibility should focus less on the retrieval algorithms themselves and more on disciplined management of embedding models, data versioning, and numerical precision.&lt;/p></description></item><item><title>Final Report: MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250901-rohan-babbar/</link><pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250901-rohan-babbar/</guid><description>&lt;p>Hi everyone, this is my final report for the project I completed this summer as a &lt;a href="https://ucsc-ospo.github.io/sor/" target="_blank" rel="noopener">Summer of Reproducibility (SOR)&lt;/a> student.
The project, titled &amp;ldquo;&lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">MPI Appliance for HPC Research on Chameleon&lt;/a>,&amp;rdquo; was undertaken in collaboration with Argonne National Laboratory
and the Chameleon Cloud community. The project was mentored by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ken-raffenetti/">Ken Raffenetti&lt;/a>.
This blog post details the work and outcomes of the project.&lt;/p>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>Message Passing Interface (MPI) is the backbone of high-performance computing (HPC), enabling efficient scaling across thousands of
processing cores. However, reproducing MPI-based experiments remains challenging due to dependencies on specific library versions,
network configurations, and multi-node setups.&lt;/p>
&lt;p>To address this, we introduce a reproducibility initiative that provides standardized MPI environments on the Chameleon testbed.
This is set up as a master–worker MPI cluster. The master node manages tasks and communication, while the worker nodes do the computations.
All nodes have the same MPI libraries, software, and network settings, making experiments easier to scale and reproduce.&lt;/p>
&lt;h2 id="objectives">Objectives&lt;/h2>
&lt;p>The aim of this project is to create an MPI cluster that is reproducible, easily deployable, and efficiently configurable.&lt;/p>
&lt;p>The key objectives of this project were:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Pre-built MPI Images: Create ready-to-use images with MPI and all dependencies installed.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Automated Cluster Configuration: Develop Ansible playbooks to configure master–worker communication, including host setup, SSH key distribution, and MPI configuration across nodes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Cluster Orchestration: Develop an orchestration template to provision resources and invoke the Ansible playbooks for automated cluster setup.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="implementation-strategy-and-deliverables">Implementation Strategy and Deliverables&lt;/h2>
&lt;h3 id="openstack-image-creation">OpenStack Image Creation&lt;/h3>
&lt;p>The first step was to create a standardized pre-built image, which serves as the base image for all nodes in the cluster.&lt;/p>
&lt;p>Some important features of the image include:&lt;/p>
&lt;ol>
&lt;li>Built on Ubuntu 22.04 for a stable base environment.&lt;/li>
&lt;li>&lt;a href="https://spack.io/" target="_blank" rel="noopener">Spack&lt;/a> + Lmod integration:
&lt;ul>
&lt;li>Spack handles reproducible, version-controlled installations of software packages.&lt;/li>
&lt;li>Lmod (Lua Modules) provides a user-friendly way to load/unload software environments dynamically.&lt;/li>
&lt;li>Together, they allow users to easily switch between MPI versions, libraries, and GPU toolkits.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://github.com/pmodels/mpich" target="_blank" rel="noopener">MPICH&lt;/a> and &lt;a href="https://github.com/open-mpi/ompi" target="_blank" rel="noopener">OpenMPI&lt;/a> are pre-installed for standard MPI support and can be loaded or unloaded on demand.&lt;/li>
&lt;li>Three image variants for various HPC workloads: CPU-only, NVIDIA GPU (CUDA 12.8), and AMD GPU (ROCm 6.4.2).&lt;/li>
&lt;/ol>
&lt;p>These images have been published and are available in the Chameleon Cloud Appliance Catalog:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://chameleoncloud.org/appliances/127/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04)&lt;/a> - CPU Only&lt;/li>
&lt;li>&lt;a href="https://chameleoncloud.org/appliances/130/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - CUDA)&lt;/a> - NVIDIA GPU (CUDA 12.8)&lt;/li>
&lt;li>&lt;a href="https://chameleoncloud.org/appliances/131/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - ROCm)&lt;/a> - AMD GPU (ROCm 6.4.2)&lt;/li>
&lt;/ul>
&lt;h3 id="cluster-configuration-using-ansible">Cluster Configuration using Ansible&lt;/h3>
&lt;p>The next step was to create scripts and playbooks that configure these nodes and set up an HPC cluster.
We assigned specific roles to different nodes in the cluster and combined them into a single playbook to configure the entire cluster automatically.&lt;/p>
&lt;p>Some key steps the playbook performs:&lt;/p>
&lt;ol>
&lt;li>Configure /etc/hosts entries for all nodes.&lt;/li>
&lt;li>Mount Manila NFS shares on each node.&lt;/li>
&lt;li>Generate an SSH key pair on the master node and add the master’s public key to the workers’ authorized_keys.&lt;/li>
&lt;li>Scan worker node keys and update known_hosts on the master.&lt;/li>
&lt;li>(Optional) Manage software:
&lt;ul>
&lt;li>Install new compilers with Spack&lt;/li>
&lt;li>Add new Spack packages&lt;/li>
&lt;li>Update environment modules to recognize them&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Create a hostfile at /etc/mpi/hostfile.&lt;/li>
&lt;/ol>
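&lt;p>As a rough illustration of steps 1 and 6 above, the sketch below renders &lt;code>/etc/hosts&lt;/code> entries and an MPI hostfile from a list of nodes. It is not the playbook code: the node names, addresses, and slot count are placeholders, and the two hostfile layouts shown (&lt;code>host:slots&lt;/code> for MPICH, &lt;code>host slots=N&lt;/code> for Open MPI) are the common conventions of those launchers rather than a statement about what the playbook actually writes.&lt;/p>

```python
def render_etc_hosts(nodes):
    """Return one 'IP hostname' line per node, as used in /etc/hosts."""
    return "\n".join(f"{ip} {name}" for name, ip in nodes)


def render_hostfile(nodes, slots, flavor="mpich"):
    """Return hostfile text listing every node with its slot count."""
    if flavor == "mpich":
        lines = [f"{name}:{slots}" for name, _ip in nodes]
    elif flavor == "openmpi":
        lines = [f"{name} slots={slots}" for name, _ip in nodes]
    else:
        raise ValueError(f"unknown MPI flavor: {flavor}")
    return "\n".join(lines)


# Hypothetical three-node cluster; names and addresses are placeholders.
cluster = [
    ("master", "10.0.0.10"),
    ("worker-1", "10.0.0.11"),
    ("worker-2", "10.0.0.12"),
]

print(render_etc_hosts(cluster))
print(render_hostfile(cluster, slots=4))
print(render_hostfile(cluster, slots=4, flavor="openmpi"))
```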
&lt;p>The code is publicly available and can be found on the GitHub repository: &lt;a href="https://github.com/rohanbabbar04/MPI-Spack-Experiment-Artifact" target="_blank" rel="noopener">https://github.com/rohanbabbar04/MPI-Spack-Experiment-Artifact&lt;/a>&lt;/p>
&lt;h3 id="orchestration">Orchestration&lt;/h3>
&lt;p>With the image now created and deployed, and the Ansible scripts ready for cluster configuration, we put everything
together to orchestrate the cluster deployment.&lt;/p>
&lt;p>This can be done in two primary ways:&lt;/p>
&lt;h4 id="python-chijupyter--ansible">Python CHI(Jupyter) + Ansible&lt;/h4>
&lt;p>&lt;a href="https://github.com/ChameleonCloud/python-chi" target="_blank" rel="noopener">Python-CHI&lt;/a> is a Python library designed to facilitate interaction with the Chameleon testbed. It is often used within environments like Jupyter notebooks.&lt;/p>
&lt;p>The workflow is as follows:&lt;/p>
&lt;ol>
&lt;li>Create leases, launch instances, and set up shared storage using python-chi commands.&lt;/li>
&lt;li>Automatically generate inventory.ini for Ansible based on launched instances.&lt;/li>
&lt;li>Run the Ansible playbook programmatically using &lt;code>ansible_runner&lt;/code>.&lt;/li>
&lt;li>Outcome: fully configured, ready-to-use HPC cluster; SSH into master to run examples.&lt;/li>
&lt;/ol>
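&lt;p>Step 2 above can be sketched as follows. This is only an assumed shape for the generated file: the group names &lt;code>master&lt;/code> and &lt;code>workers&lt;/code>, the host aliases, the IP addresses, and the default login user &lt;code>cc&lt;/code> are illustrative choices, not values taken from the published artifact.&lt;/p>

```python
def render_inventory(master_ip, worker_ips, user="cc"):
    """Build INI-style Ansible inventory text for one master and N workers."""
    lines = [
        "[master]",
        f"master ansible_host={master_ip} ansible_user={user}",
        "",
        "[workers]",
    ]
    for index, ip in enumerate(worker_ips, start=1):
        lines.append(f"worker-{index} ansible_host={ip} ansible_user={user}")
    return "\n".join(lines) + "\n"


# Hypothetical addresses of the launched instances.
inventory = render_inventory("10.0.0.10", ["10.0.0.11", "10.0.0.12"])
print(inventory)

# The text could then be written to inventory.ini, for example:
# with open("inventory.ini", "w") as handle:
#     handle.write(inventory)
```

&lt;p>The resulting file could then be passed to the playbook run, for instance through &lt;code>ansible_runner.run(private_data_dir=..., playbook=...)&lt;/code>, with the directory and playbook name depending on how the repository is laid out.&lt;/p>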
&lt;p>If you would like to see a working example, you can view it in the &lt;a href="https://chameleoncloud.org/experiment/share/7424a8dc-0688-4383-9d67-1e40ff37de17" target="_blank" rel="noopener">Trovi example&lt;/a>.&lt;/p>
&lt;h4 id="heat-orchestration-template">Heat Orchestration Template&lt;/h4>
&lt;p>A Heat Orchestration Template (HOT) is a YAML-based configuration file. It defines a stack that automates
the deployment and configuration of OpenStack cloud resources.&lt;/p>
&lt;p>&lt;strong>Challenges&lt;/strong>&lt;/p>
&lt;p>We faced some challenges while working with Heat templates and stacks, particularly on Chameleon Cloud:&lt;/p>
&lt;ol>
&lt;li>&lt;code>OS::Nova::Keypair&lt;/code> (new version): In the latest OpenStack version, the stack fails to launch if the &lt;code>public_key&lt;/code> parameter is not provided for the keypair,
as auto-generation is no longer supported.&lt;/li>
&lt;li>&lt;code>OS::Heat::SoftwareConfig&lt;/code>: Deployment scripts often fail, hang, or time out, preventing proper configuration of nodes and causing unreliable deployments.&lt;/li>
&lt;/ol>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Heat Approach" srcset="
/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_05fca9fb65271d31e3fd79f2e7b58a53.webp 400w,
/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_19399eb0dbf598de84852723f8d60783.webp 760w,
/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250901-rohan-babbar/heatapproach_hua2bf48ad20dec386c348c909fcaf7111_39548_05fca9fb65271d31e3fd79f2e7b58a53.webp"
width="760"
height="235"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To tackle these challenges, we designed an approach that is both easy to implement and reproducible. First, we launch instances
by provisioning master and worker nodes using the HOT template in OpenStack. Next, we set up a bootstrap node, install Git and Ansible,
and run an Ansible playbook from the bootstrap node to configure the master and worker nodes, including SSH, host communication, and
MPI setup. The outcome is a fully configured, ready-to-use HPC cluster, where users can simply SSH into the master node to run examples.&lt;/p>
&lt;p>Users can view/use the template published in the Appliance Catalog: &lt;a href="https://chameleoncloud.org/appliances/132/" target="_blank" rel="noopener">MPI+Spack Bare Metal Cluster&lt;/a>.
A demonstration of how to pass parameters is available on &lt;a href="https://chameleoncloud.org/experiment/share/7424a8dc-0688-4383-9d67-1e40ff37de17" target="_blank" rel="noopener">Trovi&lt;/a>.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>In conclusion, this work demonstrates a reproducible approach to building and configuring MPI clusters on the Chameleon testbed. By using standardized images,
Ansible automation, and Orchestration Templates, we ensure that every node is consistently set up, reducing manual effort and errors. The artifact, published on Trovi,
makes the entire process transparent, reusable, and easy to implement, enabling users/researchers to reliably recreate and extend the cluster environment for their own
experiments.&lt;/p>
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;p>Future work includes maintaining these images and possibly creating a script that reproduces the MPI and Spack setup on a different base image.&lt;/p></description></item><item><title>Final Update(Mid-Term -> Final): MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250831-rohan-babbar/</link><pubDate>Sun, 31 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250831-rohan-babbar/</guid><description>&lt;p>Hi everyone! This is my final update, covering the progress made every two weeks from the midterm to the end of the
project &lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">MPI Appliance for HPC Research on Chameleon&lt;/a>, developed
in collaboration with Argonne National Laboratory and the Chameleon Cloud community.
This blog follows up on my earlier post, which you can find &lt;a href="https://ucsc-ospo.github.io/report/osre25/uchicago/mpi/20250803-rohan-babbar/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;h3 id="-july-29--august-11-2025">🔧 July 29 – August 11, 2025&lt;/h3>
&lt;p>With the CUDA- and MPI-Spack–based appliances published, we considered releasing another image variant (ROCm-based) for AMD GPUs.
This variant will be used primarily at CHI@TACC, which provides AMD GPUs. We published the new image on Chameleon as &lt;a href="https://chameleoncloud.org/appliances/131/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - ROCm)&lt;/a>,
and we also added an example to demonstrate its usage.&lt;/p>
&lt;h3 id="-august-12--august-25-2025">🔧 August 12 – August 25, 2025&lt;/h3>
&lt;p>With the examples now available on Trovi for creating an MPI cluster using Ansible and Python-CHI, my next step was to experiment with stack orchestration using Heat Orchestration Templates (HOT) on OpenStack Chameleon Cloud.
This turned out to be more challenging due to a few restrictions:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>OS::Nova::Keypair (new version)&lt;/strong>: In the latest OpenStack version, the stack fails to launch if the public_key parameter is not provided for the keypair, as auto-generation is no longer supported.&lt;/li>
&lt;li>&lt;strong>OS::Heat::SoftwareConfig&lt;/strong>: Deployment scripts often fail, hang, or time out, preventing proper configuration of nodes and causing unreliable deployments.&lt;/li>
&lt;/ol>
&lt;p>To address these issues, we adopted a new strategy for configuring and creating the MPI cluster: using a temporary bootstrap node.&lt;/p>
&lt;p>In simple terms, the workflow of the Heat template is:&lt;/p>
&lt;ol>
&lt;li>Provision master and worker nodes via the HOT template on OpenStack.&lt;/li>
&lt;li>Launch a bootstrap node, install Git and Ansible on it, and then run an Ansible playbook from the bootstrap node to configure the master and worker nodes. This includes setting up SSH, host communication, and the MPI environment.&lt;/li>
&lt;/ol>
&lt;p>This provides an alternative method for creating an MPI cluster.&lt;/p>
&lt;p>We presented this work on August 26, 2025, to the Chameleon Team and the Argonne MPICH Team. The project was very well received.&lt;/p>
&lt;p>Stay tuned for my final report on this work, which I’ll be sharing in my next blog post.&lt;/p></description></item><item><title>[Final Blog] Distrobench: Distributed Protocol Benchmark</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/</link><pubDate>Sat, 30 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>This is the final blog for our contribution to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a> for the OSRE program.&lt;/p>
&lt;p>&lt;a href="https://github.com/fadhilkurnia/distro" target="_blank" rel="noopener">Distrobench&lt;/a> is a framework for evaluating the performance of replication and coordination protocols for distributed systems. It standardizes benchmarking by testing different protocols under an identical workload, and it supports both local and remote deployment of the protocols. The tested implementations are limited to key-value store applications and are categorized by &lt;a href="https://jepsen.io/consistency/models" target="_blank" rel="noopener">consistency model&lt;/a>, programming language, and persistence (whether the implementation stores its data in memory or on disk).&lt;/p>
&lt;p>All the benchmark results are stored in a &lt;code>data.json&lt;/code> file which can be viewed through a webpage we have provided. A user can clone the git repository, benchmark different protocols on their own machine or in a cluster of remote machines, then view the results locally. We also provided a &lt;a href="https://distrobench.org" target="_blank" rel="noopener">webpage&lt;/a> that shows our own benchmark results which ran on 3 Amazon EC2 t2.micro instances.&lt;br>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_2eb41220c4287bdc730b38c76a5643f8.webp 400w,
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_789a9a55850eed73f3a681f8423873cf.webp 760w,
/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250830-panjisri/image_hu785d614b38f6808c04fc85bf3c31eb36_153748_2eb41220c4287bdc730b38c76a5643f8.webp"
width="760"
height="381"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="how-to-run-a-benchmark-on-distrobench">How to run a benchmark on Distrobench&lt;/h2>
&lt;p>Before running a benchmark with Distrobench, the protocol to be benchmarked must first be built. This lets the script initialize the protocol instance for a local benchmark or send the binaries to the remote machines. A remote machine running the protocol does not need the source code of the protocol implementation, but it does need the runtime dependencies of that protocol, such as Java, Docker, or rsync. The following commands build the &lt;a href="https://github.com/ailidani/paxi" target="_blank" rel="noopener">ailidani/paxi&lt;/a> project, which needs no additional dependencies to run on a remote machine:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sh" data-lang="sh">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Clone the Distrobench repository &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git clone git@github.com:fadhilkurnia/distro.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Clone the Paxi repository and build the binary &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> distro/sut/ailidani.paxi
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git clone git@github.com:ailidani/paxi.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> paxi/bin/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">./build.sh
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Go back to the Distrobench root directory &amp;amp; run python script &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> ../../../..
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">python main.py
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>By default, the script starts 3 local instances of the Paxi protocol implementation that the user chooses through the CLI. The user can change the number of running instances, and whether they are deployed locally or on remote machines, by editing the &lt;code>.env&lt;/code> file in the root directory. The following is the content of the default &lt;code>.env&lt;/code> file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">NUM_OF_NODES=3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">SSH_KEY=ssh-key.pem
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">REMOTE_USERNAME=ubuntu
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP1=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP2=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PUBLIC_IP3=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP1=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP2=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">PRIVATE_IP3=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">CLIENT_IP=127.0.0.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">OUTPUT=data.json
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When running a remote benchmark, an SSH key must also be added to the root directory so that the Python script can use ssh and rsync. All machines must also allow TCP connections on ports 2000-2300 and 3000-3300, because these ranges are used for communication between the running instances and for the YCSB benchmark. Running the benchmark requires at least 3 nodes, the minimum needed to support most protocols (5 nodes are recommended).&lt;/p>
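&lt;p>To show how the settings in the &lt;code>.env&lt;/code> file fit together, here is a minimal sketch that parses the default file and decides whether the run is local. It is not the parsing code used by &lt;code>main.py&lt;/code>; the helper names are hypothetical, and treating "every public IP is 127.0.0.1" as the test for a local run is an assumption.&lt;/p>

```python
DEFAULT_ENV = """
NUM_OF_NODES=3
SSH_KEY=ssh-key.pem
REMOTE_USERNAME=ubuntu
PUBLIC_IP1=127.0.0.1
PUBLIC_IP2=127.0.0.1
PUBLIC_IP3=127.0.0.1
PRIVATE_IP1=127.0.0.1
PRIVATE_IP2=127.0.0.1
PRIVATE_IP3=127.0.0.1
CLIENT_IP=127.0.0.1
OUTPUT=data.json
"""


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def node_addresses(config):
    """Collect the public/private IP pair of every configured node."""
    count = int(config["NUM_OF_NODES"])
    return [
        {"public": config[f"PUBLIC_IP{i}"], "private": config[f"PRIVATE_IP{i}"]}
        for i in range(1, count + 1)
    ]


def is_local(config):
    """Assumed rule: the run is local when every node uses the loopback IP."""
    return all(node["public"] == "127.0.0.1" for node in node_addresses(config))


config = parse_env(DEFAULT_ENV)
print(node_addresses(config))
print("local run" if is_local(config) else "remote run")
```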
&lt;p>To view the benchmark result in the web page locally, move &lt;code>data.json&lt;/code> into the &lt;code>docs/&lt;/code> directory and run &lt;code>python -m http.server 8000&lt;/code>. The page is then accessible through &lt;code>http://localhost:8000&lt;/code>.&lt;/p>
&lt;h2 id="deep-dive-on-how-distrobench-works">Deep dive on how Distrobench works&lt;/h2>
&lt;p>The following is the project structure of the Distrobench repository:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">distro/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── main.py // Main python script for running benchmark
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── data.json // Output file for main.py
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── README.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── .env // Config for running the benchmark
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── docs/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── index.html // Web page to show benchmark results
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── data.json // Output file displayed by web page
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── README.md
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── src/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── utils/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ └── ycsb/ // Submodule for YCSB
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">└── sut/ // Systems under test
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── ailidani.paxi/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> └── run.py // Protocol-specific benchmark script called by main.py
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── apache.zookeeper/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── etcd-io.etcd/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── fadhilkurnia.xdn/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── holipaxos-artifect.holipaxos/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ├── otoolep.hraftd/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> └── tikv.tikv/
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>main.py&lt;/code> will automatically detect directories inside &lt;code>sut/&lt;/code> and will call the main function inside &lt;code>run.py&lt;/code>. The following is the structure of &lt;code>run.py&lt;/code> written in pseudocode style:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">FUNCTION main(run_ycsb: Function, nodes: List of Nodes, ssh: Dictionary)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> node_data = map_ip_port(nodes)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> SWITCH user_input
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 0:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> start()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 1:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> stop()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> CASE 2:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> client_data = []
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> FOR EACH item IN node_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ADD item.client_addr TO client_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> END FOR
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> run_ycsb(client_data)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> RETURN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> END SWITCH
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION start()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Start the protocol instance (local or remote)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION stop()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Stop the protocol instance (local or remote)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">FUNCTION map_ip_port(nodes: List of Nodes) -&amp;gt; List of Dictionary
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Generate port numbers based on the protocol requirements
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">END FUNCTION
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The .env file provides both public and private IP addresses to add versatility when running a remote benchmark. Private IP is used for communication between remote machines if they are under the same network group. In the case of our own benchmark, four t2.micro EC2 instances are deployed under the same network group. Three of them are used to run the protocol and the fourth machine acts as the YCSB client. It is possible to use your local machine as the YCSB client instead of through another remote machine by specifying &lt;code>CLIENT_IP&lt;/code> in the .env file as &lt;code>127.0.0.1&lt;/code>. The decision to use the remote machine as the YCSB client is made to reduce the impact of network latency between the client and the protocol servers to a minimum.&lt;/p>
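&lt;p>The &lt;code>run.py&lt;/code> pseudocode and the public/private IP split could look roughly like this in Python. This is a sketch under several assumptions, not the code in the repository: each node is assumed to be a dictionary with &lt;code>public&lt;/code> and &lt;code>private&lt;/code> keys, the port scheme (2000+ for peers, 3000+ for clients) is only one way to stay inside the port ranges mentioned earlier, and the optional &lt;code>choice&lt;/code> argument is added here so the menu can be driven without keyboard input.&lt;/p>

```python
def map_ip_port(nodes, peer_base=2000, client_base=3000):
    """Assign one peer port and one client port per node (assumed scheme)."""
    node_data = []
    for index, node in enumerate(nodes):
        node_data.append({
            # Private IPs are used for traffic between protocol instances.
            "peer_addr": f"{node['private']}:{peer_base + index}",
            # Assumption: the YCSB client reaches each node via its public IP.
            "client_addr": f"{node['public']}:{client_base + index}",
        })
    return node_data


def start(node_data, ssh):
    """Placeholder: generate configs, rsync binaries, start the instances."""
    print(f"starting {len(node_data)} instance(s)")


def stop(node_data, ssh):
    """Placeholder: kill the protocol processes, optionally clean up."""
    print(f"stopping {len(node_data)} instance(s)")


def main(run_ycsb, nodes, ssh, choice=None):
    node_data = map_ip_port(nodes)
    if choice is None:
        choice = int(input("0 = start, 1 = stop, 2 = benchmark: "))
    if choice == 0:
        start(node_data, ssh)
    elif choice == 1:
        stop(node_data, ssh)
    elif choice == 2:
        run_ycsb([item["client_addr"] for item in node_data])
    else:
        raise ValueError(f"unknown menu option: {choice}")


# Example with three local nodes and a stand-in for run_ycsb.
local_nodes = [{"public": "127.0.0.1", "private": "127.0.0.1"}] * 3
main(print, local_nodes, ssh={}, choice=2)
```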
&lt;p>The main tasks of the &lt;code>start()&lt;/code> function can be broken down into the following:&lt;/p>
&lt;ol>
&lt;li>Generate custom configuration files for each remote machine instance (this may differ between implementations: some do not require a config file because they support flag parameters out of the box, while others require multiple configuration files for each instance)&lt;/li>
&lt;li>rsync binaries to the remote machine (if running a remote benchmark)&lt;/li>
&lt;li>Start the instances&lt;/li>
&lt;/ol>
&lt;p>The &lt;code>stop()&lt;/code> function is much simpler: it only kills the process running the protocol and optionally removes the copied binaries from the remote machine. The &lt;code>run_ycsb()&lt;/code> function passed to &lt;code>run.py&lt;/code> is defined in &lt;code>main.py&lt;/code> and currently supports two types of workload:&lt;/p>
&lt;ol>
&lt;li>Read-heavy: A single-client workload with 95% read and 5% update (write) operations&lt;/li>
&lt;li>Update-heavy: A single-client workload with 50% read and 50% update (write) operations&lt;/li>
&lt;/ol>
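&lt;p>As a rough illustration, the two workloads could be expressed with YCSB&amp;rsquo;s standard &lt;code>CoreWorkload&lt;/code> properties. The helper below generates the text of such a properties file. The operation count of 1000 matches the description in this post, while the record count and the exact contents of the real workload files are assumptions.&lt;/p>

```python
def workload_properties(read, update, operations=1000, records=1000):
    """Render YCSB CoreWorkload properties for a read/update mix."""
    if abs(read + update - 1.0) > 1e-9:
        raise ValueError("read and update proportions must sum to 1")
    settings = {
        "workload": "site.ycsb.workloads.CoreWorkload",
        "recordcount": records,        # assumed value, not from the post
        "operationcount": operations,
        "readproportion": read,
        "updateproportion": update,
        "scanproportion": 0,           # scan is never used in this benchmark
        "insertproportion": 0,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


READ_HEAVY = workload_properties(read=0.95, update=0.05)
UPDATE_HEAVY = workload_properties(read=0.5, update=0.5)
```

&lt;p>Raising &lt;code>operations&lt;/code> is the simplest way to lengthen a run when 1000 operations are too few to give stable numbers.&lt;/p>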
&lt;p>A new workload can be added inside the &lt;code>src/ycsb/workloads&lt;/code> directory. Both workloads above run only 1000 operations, which may not be enough to properly evaluate the performance of the protocols. Note also that while YCSB supports a &lt;code>scan&lt;/code> operation, our benchmark never uses it because none of the tested protocols implement it.&lt;/p>
&lt;h3 id="how-to-implement-a-new-protocol-in-distrobench">How to implement a new protocol in Distrobench&lt;/h3>
&lt;p>Adding a new protocol to Distrobench requires implementing two main components: a Python integration script (&lt;code>run.py&lt;/code>) and a YCSB database binding for benchmarking.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Create the protocol directory structure&lt;/p>
&lt;ul>
&lt;li>Create a new directory under &lt;code>sut/&lt;/code> using the format &lt;code>yourrepo.yourprotocol/&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Write the &lt;code>run.py&lt;/code> integration&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Put the script inside the &lt;code>yourrepo.yourprotocol/&lt;/code> directory&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The script must define a &lt;code>main(run_ycsb, nodes, ssh)&lt;/code> function&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add start/stop/benchmark menu options&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Handle both local (127.0.0.1) and remote deployment&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Create a YCSB client&lt;/p>
&lt;ul>
&lt;li>Create a Java class that extends YCSB&amp;rsquo;s &lt;code>DB&lt;/code> class&lt;/li>
&lt;li>Put it inside &lt;code>src/ycsb/yourprotocol/src/main/java/site/ycsb/yourprotocol&lt;/code>&lt;/li>
&lt;li>Implement &lt;code>read()&lt;/code>, &lt;code>insert()&lt;/code>, &lt;code>update()&lt;/code>, &lt;code>delete()&lt;/code> methods&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Register your client&lt;/p>
&lt;ul>
&lt;li>Register your client in &lt;code>src/pom.xml&lt;/code>, &lt;code>src/ycsb/bin/binding.properties&lt;/code>, and &lt;code>src/ycsb/bin/ycsb&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Build and test&lt;/p>
&lt;ul>
&lt;li>Run &lt;code>cd src/ycsb &amp;amp;&amp;amp; mvn clean package&lt;/code>&lt;/li>
&lt;li>Run &lt;code>python main.py&lt;/code>&lt;/li>
&lt;li>Select your protocol and test it&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
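&lt;p>To make step 2 more concrete, here is a hedged skeleton of a &lt;code>run.py&lt;/code> with the required &lt;code>main(run_ycsb, nodes, ssh)&lt;/code> entry point and a start/stop/benchmark menu. The bodies of &lt;code>start()&lt;/code> and &lt;code>stop()&lt;/code> are placeholders, the extra &lt;code>choose&lt;/code> parameter is a hypothetical hook added only so the menu can be driven from a test, and calling &lt;code>run_ycsb()&lt;/code> without arguments is an assumption because its real signature is defined in &lt;code>main.py&lt;/code>.&lt;/p>

```python
def start(nodes, ssh):
    """Placeholder: generate configs, copy binaries, launch the servers."""
    print(f"starting {len(nodes)} node(s)")


def stop(nodes, ssh):
    """Placeholder: kill the protocol processes and clean up binaries."""
    print(f"stopping {len(nodes)} node(s)")


def main(run_ycsb, nodes, ssh, choose=input):
    """Menu loop; still callable as main(run_ycsb, nodes, ssh)."""
    actions = {
        "1": ("Start", lambda: start(nodes, ssh)),
        "2": ("Stop", lambda: stop(nodes, ssh)),
        # Assumption: the real run_ycsb may expect arguments.
        "3": ("Benchmark", lambda: run_ycsb()),
    }
    while True:
        for key, (label, _) in actions.items():
            print(f"{key}. {label}")
        print("0. Exit")
        choice = choose("Select an option: ").strip()
        if choice == "0":
            return
        if choice in actions:
            actions[choice][1]()
        else:
            print("Unknown option")
```

&lt;p>Keeping the menu as a dictionary of labelled callables makes it easy to add protocol-specific options later without touching the loop.&lt;/p>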
&lt;h2 id="protocols-which-have-been-tested">Protocols which have been tested&lt;/h2>
&lt;p>Distrobench has tested 20 different distributed consensus protocols across 7 different implementation projects.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;a href="https://github.com/ailidani/paxi" target="_blank" rel="noopener">ailidani/paxi&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability, Eventual&lt;/li>
&lt;li>Protocol : Paxos, EPaxos, SDpaxos, WPaxos, ABD, chain, VPaxos, WanKeeper, KPaxos, Paxos_groups, Dynamo, Blockchain, M2Paxos, HPaxos.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/apache/zookeeper" target="_blank" rel="noopener">apache/zookeeper&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Java&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability + Primary Integrity&lt;/li>
&lt;li>Protocol : ZooKeeper implements ZAB (ZooKeeper Atomic Broadcast)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/etcd-io/etcd" target="_blank" rel="noopener">etcd-io/etcd&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/fadhilkurnia/xdn" target="_blank" rel="noopener">fadhilkurnia/xdn&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Java, Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability, Linearizability + Primary Integrity&lt;/li>
&lt;li>Protocol : Gigapaxos&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/Zhiying12/holipaxos-artifect" target="_blank" rel="noopener">Zhiying12/holipaxos-artifect&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go, Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Holipaxos, Omnipaxos, Multipaxos&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/otoolep/hraftd" target="_blank" rel="noopener">otoolep/hraftd&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Go&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/tikv/tikv" target="_blank" rel="noopener">tikv/tikv&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Programming Language : Rust&lt;/li>
&lt;li>Persistency : On-Disk&lt;/li>
&lt;li>Consistency Model : Linearizability&lt;/li>
&lt;li>Protocol : Raft&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;ul>
&lt;li>When attempting to benchmark HoliPaxos, the main challenge was handling versions that rely on persistent storage with RocksDB. Since some implementations are written in Go, it was necessary to find compatible versions of RocksDB and gRocksDB (for example, RocksDB 10.5.1 works with gRocksDB 1.10.2). Another difficulty was that RocksDB is resource-intensive to compile, and in our project we did not have sufficient CPU capacity on the remote machine to build RocksDB and run remote benchmarks.&lt;/li>
&lt;li>Some projects did not compile successfully at first and required minor modifications to run.&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion-and-future-improvements">Conclusion and future improvements&lt;/h2>
&lt;p>The current benchmark results show the performance of all the protocols mentioned above in terms of throughput and benchmark runtime. The results are subject to revision because they may not reflect the best achievable performance of the protocols, owing to unoptimized deployment scripts. We also plan to switch to a more powerful EC2 instance type because t2.micro does not have enough resources to support RocksDB or TiKV.&lt;/p>
&lt;p>In the near future, additional features will be added to Distrobench such as:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Multi-Client Support:&lt;/strong> The YCSB client will start multiple clients which will send requests in parallel to different servers in the group.&lt;/li>
&lt;li>&lt;strong>Commit Versioning:&lt;/strong> Allows the labelling of all benchmark results with the commit hash of the protocol&amp;rsquo;s repository version. This allows comparing different versions of the same project.&lt;/li>
&lt;li>&lt;strong>Adding more Primary-Backup, Sequential, Causal, and Eventual consistency protocols:&lt;/strong> Implementations that support a consistency model other than linearizability and also ship with an existing key-value store application are notoriously difficult to find.&lt;/li>
&lt;li>&lt;strong>Benchmark on node failure&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Benchmark on the addition of a new node&lt;/strong>&lt;/li>
&lt;/ul></description></item><item><title>Final Blog: Improving Usability and Performance in cc-snapshot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/cc-snapshot/20250824-zahratm/</link><pubDate>Sun, 24 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/cc-snapshot/20250824-zahratm/</guid><description>&lt;p>My name is Zahra Temori, and I&amp;rsquo;m thrilled to collaborate with mentor Paul Marshall this summer on the cc-snapshot project.&lt;/p>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Reproducibility is an important concept in high-performance computing and research. It ensures that experiments can be repeated, validated, and extended with confidence. Achieving a reproducible environment requires identical software stacks, with exactly the same dependencies and configuration. The Chameleon Cloud testbed provides the cc-snapshot tool to support reproducibility by capturing the complete state of a running system. This allows researchers to rerun experiments exactly as before, share setups with each other, and avoid environmental issues such as missing dependencies or version mismatches. In this work, we explore how to enhance snapshotting as a reproducibility method and make it an effective strategy for HPC research.&lt;/p>
&lt;h2 id="key-achievements">Key Achievements&lt;/h2>
&lt;p>The project was divided into two phases. The first phase focused on usability, reorganizing the tool, and expanding its capabilities. The second phase was a benchmarking study that evaluated alternative image formats and compression methods to improve snapshotting performance.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Usability Enhancements:&lt;/strong>
The original snapshotting tool had several limitations, including a limited command-line interface, tightly coupled logic, and minimal testing support, which made it difficult for users to interact with and for developers to maintain. To enhance the command-line interface, we added a flag to disable automatic updates, giving users more control over when to pull the latest version. We also added a dry-run flag to simulate actions before running a snapshot, allowing developers to test changes safely. Moreover, we implemented support for a custom source path, enabling snapshots of specific directories. This lets developers test against smaller directories rather than full snapshots, which are harder to work with when testing individual features.
To improve maintainability, we refactored the codebase into five modular functions, allowing developers to make future changes more easily. In addition, we added automated tests with GitHub Actions to validate new and existing features and ensure that changes work as expected.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Performance Optimization:&lt;/strong>
The default snapshot format and compression was QCOW2 with zlib, which often resulted in long snapshot creation times. To address this performance issue, we benchmarked alternatives such as QCOW2 with zstd compression and RAW with no compression. We chose three images of varying sizes: small (4.47 GiB), medium (7.62 GiB), and large (12.7 GiB). The medium-sized image was user-created, to demonstrate that snapshotting and compression work for both Chameleon-supported images and user-created images.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Results:&lt;/strong>
We ran each image with the different compression methods and recorded four key metrics: creation time, upload time, boot time, and final image size. To evaluate which method performed better, we calculated the overall time of each compression method from the experiments on the three image sizes. The results revealed that zstd compression reduced the creation time by around 80.6% across the three image sizes. The upload time for zstd was nearly equal to that of zlib, while RAW images, being uncompressed and larger, uploaded much more slowly than images compressed with zlib or zstd. The boot time was nearly the same for the zlib and zstd images, confirming that the two take about the same time to decompress, while RAW images took longer to boot due to their larger size. Our results suggest that QCOW2 with zstd compression should be used instead of QCOW2 with zlib compression when creating a snapshot. This enables researchers to generate and share reproducible environments faster.&lt;/p>
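&lt;p>For readers who want to try a similar comparison, the three format and compression combinations can be produced with &lt;code>qemu-img convert&lt;/code>. The sketch below only builds the command lines. The file names are placeholders, and whether cc-snapshot invokes &lt;code>qemu-img&lt;/code> in exactly this way is an assumption. The &lt;code>compression_type=zstd&lt;/code> option requires a QEMU build with zstd support.&lt;/p>

```python
def convert_command(source, target, image_format, compression=None):
    """Build a qemu-img convert command for one format/compression pair."""
    command = ["qemu-img", "convert", "-O", image_format]
    if compression is not None:
        # -c enables compression; compression_type picks zlib or zstd (qcow2).
        command += ["-c", "-o", f"compression_type={compression}"]
    return command + [source, target]


# Placeholder file names for the three variants compared in this post.
VARIANTS = {
    "qcow2-zlib": convert_command("disk.raw", "snap-zlib.qcow2", "qcow2", "zlib"),
    "qcow2-zstd": convert_command("disk.raw", "snap-zstd.qcow2", "qcow2", "zstd"),
    "raw": convert_command("disk.qcow2", "snap.raw", "raw"),
}
```

&lt;p>Wrapping each command in a timer such as &lt;code>time.perf_counter()&lt;/code> around &lt;code>subprocess.run&lt;/code> would give a creation-time measurement comparable in spirit to the one reported here.&lt;/p>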
&lt;h2 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h2>
&lt;p>Snapshotting is a practical way to support reproducibility in HPC, but to be effective, it should be easy to use and fast enough for real research workflows. Our results show that zstd compression can cut snapshot creation time by over 80% compared to the default zlib compression, without affecting upload or boot performance. Looking ahead, we plan to integrate zstd, try it on more workloads and image types, and explore further ways to speed up snapshotting while keeping results reliable.&lt;/p>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Repository:&lt;/strong> The analysis code and source code can be found in the &lt;a href="https://github.com/ChameleonCloud/cc-snapshot/tree/reproducibility-improvements" target="_blank" rel="noopener">CC-SNAPSHOT GitHub Repository&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>End-term Blog: StatWrap: Cross-Project Searching and Classification using Local Indexing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Heading" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_f9e5e16b2001b9950ad995b2c786abc9.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_27bc4379277ab462935158b3db96d992.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image0_hu69efae69f006c4366342bdc2ded8b248_187729_f9e5e16b2001b9950ad995b2c786abc9.webp"
width="760"
height="392"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="introduction">&lt;strong>Introduction&lt;/strong>&lt;/h1>
&lt;p>Hello everyone!&lt;br>
I am Debangi Ghosh from India, an undergraduate student at the Indian Institute of Technology (IIT) BHU, Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/northwestern/statwrap/">StatWrap: Cross-Project Searching and Classification using Local Indexing&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1dxyBP2oMJwYDCKyIWzr465zNmm6UWtnI/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>, focuses on developing a full-text search service within the StatWrap user interface. This involves evaluating different search libraries and implementing a classification system to distinguish between active and past projects.&lt;/p>
&lt;h1 id="about-the-project">&lt;strong>About the Project&lt;/strong>&lt;/h1>
&lt;p>As part of the project, I am working on enhancing the usability of StatWrap by enabling efficient cross-project search capabilities. The goal is to make it easier for researchers to discover relevant projects, notes, and assets across both current and archived work, using information that is either user-entered or passively collected by StatWrap.&lt;/p>
&lt;p>Given the sensitivity of the data involved, one of the key requirements is that all indexing and search operations must be performed locally. To address this, my responsibilities include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Evaluating open-source search libraries&lt;/strong> suitable for local indexing and retrieval&lt;/li>
&lt;li>&lt;strong>Building the full-text search functionality&lt;/strong> directly into the StatWrap UI to allow seamless querying across projects&lt;/li>
&lt;li>&lt;strong>Ensuring reliability&lt;/strong> through the development of unit tests and comprehensive system testing&lt;/li>
&lt;li>&lt;strong>Implementing a classification system&lt;/strong> to label projects as “Active,” “Pinned,” or “Past” within the user interface&lt;/li>
&lt;/ul>
&lt;p>This project offers a great opportunity to work at the intersection of software development, information retrieval, and user-centric design—while contributing to research reproducibility and collaboration within scientific workflows.&lt;/p>
&lt;h1 id="deliverables">&lt;strong>Deliverables&lt;/strong>&lt;/h1>
&lt;p>The project has reached the end of its scope after 12 weeks of work. Here&amp;rsquo;s a breakdown:&lt;/p>
&lt;h2 id="1-descriptive-comparison-of-open-source-libraries">&lt;strong>1. Descriptive Comparison of Open-Source Libraries&lt;/strong>&lt;/h2>
&lt;p>I compared various open-source search libraries on evaluation criteria such as &lt;strong>indexing speed, search speed, memory usage, typo tolerance, fuzzy searching, partial matching, full-text queries, contextual search, Boolean support, exact word match, installation ease, maintenance, documentation&lt;/strong>, and &lt;strong>developer experience&lt;/strong>. We then decided on the weight to assign to each feature and used the weighted scores to identify the best library to use. The scores under our assigned weights are shown below:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Evaluation" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_4b5e863d88146124b333878508147eff.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_c2220a56c480048842e8b750cc2ca56f.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image1_hu63c79919752d2305350a1cb96819590d_110608_4b5e863d88146124b333878508147eff.webp"
width="760"
height="603"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>These results were obtained after tuning the hyperparameters to give the best set of results.
For very large datasets, FlexSearch has the lowest memory usage, followed by MiniSearch. The examples we used were limited in size, so MiniSearch showed better memory usage in our results.
Alongside this research and evaluation, I consulted the Performance Benchmark of Full-Text-Search Libraries (Stress Test), available &lt;a href="https://nextapps-de.github.io/flexsearch/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
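&lt;p>The weighted comparison described above boils down to a weighted average per library. The sketch below shows the arithmetic only. The criteria names, weights, and ratings in any call are placeholders chosen by the reader, not the values used in our evaluation.&lt;/p>

```python
def weighted_score(ratings, weights):
    """Weighted average of per-criterion ratings, normalised by total weight."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(weights[name] * ratings[name] for name in weights)
    return weighted_sum / total_weight


def rank(candidates, weights):
    """Return (library, score) pairs sorted from best to worst."""
    scores = {name: weighted_score(r, weights) for name, r in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

&lt;p>For example, with hypothetical weights of 3 for search speed and 1 for memory usage, a library rated 5 and 2 scores (3*5 + 1*2) / 4 = 4.25, while one rated 3 and 5 scores 3.5.&lt;/p>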
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Stress Test" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_407cb964e7e05c64834433b6a84182ff.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_167223f62fbaf30991601d7745fad9f5.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image2_hu9b739b80416dccda0a7e0361ba4f7e36_163727_407cb964e7e05c64834433b6a84182ff.webp"
width="760"
height="384"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The benchmark is measured in terms per second; higher values are better (except for the &amp;ldquo;Memory&amp;rdquo; test). The memory value refers to the amount of memory additionally allocated during search.&lt;/p>
&lt;p>According to the FlexSearch documentation, it performs queries up to 1,000,000 times faster than other libraries while also providing powerful search capabilities such as multi-field search (document search), phonetic transformations, partial matching, tag search, result highlighting, and suggestions.
Larger workloads can be scaled through workers, which perform updates or queries on the index in parallel on dedicated, balanced threads.&lt;/p>
&lt;h2 id="2-the-search-user-interface">&lt;strong>2. The Search User Interface&lt;/strong>&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ui" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_5c88d9d2587c54c50da97d6c489519dc.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_82065ca30e98bced61362bca45765215.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image3_hu2c7c529fbdaba5c9b4f85e802acf251e_292973_5c88d9d2587c54c50da97d6c489519dc.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ui2" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_7a3499ad0fc3cd06919fcdd17194742a.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_5840b85d48a6e608855c8e0d96b4fe49.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image5_hu55f5482f96b2f6db562c5a51f9b5f629_220424_7a3499ad0fc3cd06919fcdd17194742a.webp"
width="760"
height="652"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="3-complete-search-execution-pipeline">&lt;strong>3. Complete Search Execution Pipeline&lt;/strong>&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ui2" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_bd4ac2fa5efb17e2b237cf8d78278398.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_a0e8f31fdbdc656a2886def3dca3410b.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/Flowchart__hu0123533bb7a682ac6b28d9b34fa57bc0_349775_bd4ac2fa5efb17e2b237cf8d78278398.webp"
width="513"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="4-flexsearch-features">&lt;strong>4. FlexSearch Features&lt;/strong>&lt;/h2>
&lt;h4 id="1-persistent-indexing-with-automatic-loading">1. &lt;strong>Persistent Indexing with Automatic Loading&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Index persistence&lt;/strong>: Search index automatically saves to disk and loads on startup&lt;/li>
&lt;li>&lt;strong>Fast restoration&lt;/strong>: Rebuilds FlexSearch indices from saved document store without re-scanning files&lt;/li>
&lt;li>&lt;strong>Incremental updates&lt;/strong>: Detects project changes and updates only modified content&lt;/li>
&lt;li>&lt;strong>Background processing&lt;/strong>: Index updates happen asynchronously without blocking the User Interface.&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="indexing" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_23074ee37edbb0f6abbd289ef211f756.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_993d6a1363d2cddf66632c4102acb8f5.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image4_hu4893772edaa569a0d2e6454373f66573_78656_23074ee37edbb0f6abbd289ef211f756.webp"
width="494"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h4 id="2-multi-document-type-support">2. &lt;strong>Multi-Document Type Support&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Unified search&lt;/strong>: Single search interface for projects, files, people, notes, and assets&lt;/li>
&lt;li>&lt;strong>Type-specific indices&lt;/strong>: Separate FlexSearch indices optimized for each document type&lt;/li>
&lt;li>&lt;strong>Cross-reference capabilities&lt;/strong>: Documents can reference and link to each other&lt;/li>
&lt;li>&lt;strong>Flexible schema&lt;/strong>: Each document type has tailored fields for optimal search performance&lt;/li>
&lt;/ul>
&lt;h4 id="3-intelligent-file-content-indexing">3. &lt;strong>Intelligent File Content Indexing&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Configurable file size limits&lt;/strong>: Admin-controlled maximum file size for content indexing&lt;/li>
&lt;li>&lt;strong>Smart file detection&lt;/strong>: Automatically identifies text files by extension and filename patterns&lt;/li>
&lt;li>&lt;strong>Content extraction&lt;/strong>: Full-text indexing with snippet generation for search results&lt;/li>
&lt;li>&lt;strong>Performance optimization&lt;/strong>: Skips binary files and respects size constraints to maintain speed&lt;/li>
&lt;/ul>
&lt;h4 id="4-advanced-query-processing">4. &lt;strong>Advanced Query Processing&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Multi-strategy search&lt;/strong>: Combines exact matches, fuzzy search, partial matches, and contextual search&lt;/li>
&lt;li>&lt;strong>Query preprocessing&lt;/strong>: Removes stop words and applies linguistic filters&lt;/li>
&lt;li>&lt;strong>Relevance scoring&lt;/strong>: Custom scoring algorithm considering multiple factors:
&lt;ul>
&lt;li>Exact phrase matches (highest weight)&lt;/li>
&lt;li>Individual word matches&lt;/li>
&lt;li>Term frequency with logarithmic capping&lt;/li>
&lt;li>Position-based scoring (earlier matches rank higher)&lt;/li>
&lt;li>Proximity bonuses for terms appearing near each other&lt;/li>
&lt;li>Completeness penalties for missing query terms&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
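&lt;p>The scoring factors listed above can be combined roughly as in the following sketch. It is written in Python for illustration and is not the StatWrap implementation. The specific weights (10 for a phrase match, 2 per word, and so on) and the whitespace tokenizer are assumptions.&lt;/p>

```python
import math


def relevance(query, text):
    """Score a text against a query using the factors listed above."""
    terms = query.lower().split()
    words = text.lower().split()
    if not terms or not words:
        return 0.0
    score = 0.0
    if query.lower() in text.lower():
        score += 10.0  # exact phrase match carries the highest weight
    first_positions = []
    for term in terms:
        hits = [i for i, word in enumerate(words) if word == term]
        if not hits:
            continue
        score += 2.0                              # individual word match
        score += min(math.log1p(len(hits)), 2.0)  # term frequency, log-capped
        score += 1.0 / (1 + hits[0])              # earlier matches rank higher
        first_positions.append(hits[0])
    if len(first_positions) > 1:
        span = max(first_positions) - min(first_positions)
        score += 1.0 / max(span, 1)               # proximity bonus
    # Completeness penalty: scale by the fraction of query terms found.
    return score * len(first_positions) / len(terms)
```

&lt;p>With these weights, a document containing the exact phrase near its start outranks one where the same words are scattered, and a document matching only some of the query terms is scaled down in proportion.&lt;/p>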
&lt;h4 id="5-real-time-search-suggestions">5. &lt;strong>Real-Time Search Suggestions&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Autocomplete support&lt;/strong>: Dynamic suggestions based on indexed document titles&lt;/li>
&lt;li>&lt;strong>Search history&lt;/strong>: Maintains recent searches for quick re-execution&lt;/li>
&lt;li>&lt;strong>Debounced input&lt;/strong>: Prevents excessive API calls during typing&lt;/li>
&lt;li>&lt;strong>Contextual suggestions&lt;/strong>: Suggestions adapt based on current filters and context&lt;/li>
&lt;/ul>
&lt;h4 id="6-comprehensive-filtering-system">6. &lt;strong>Comprehensive Filtering System&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Type filtering&lt;/strong>: Filter by document type (projects, files, people, etc.)&lt;/li>
&lt;li>&lt;strong>Project scoping&lt;/strong>: Limit searches to specific projects&lt;/li>
&lt;li>&lt;strong>File type filtering&lt;/strong>: Filter files by extension&lt;/li>
&lt;li>&lt;strong>Advanced search panel&lt;/strong>: Collapsible interface for power users&lt;/li>
&lt;li>&lt;strong>Filter persistence&lt;/strong>: Maintains filter state across searches&lt;/li>
&lt;/ul>
&lt;h4 id="7-performance-monitoring--analytics">7. &lt;strong>Performance Monitoring &amp;amp; Analytics&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Real-time metrics&lt;/strong>: Track search times, cache hit rates, and index statistics&lt;/li>
&lt;li>&lt;strong>Performance dashboard&lt;/strong>: Visual indicators for system health&lt;/li>
&lt;li>&lt;strong>Cache management&lt;/strong>: LRU cache with configurable size and TTL&lt;/li>
&lt;li>&lt;strong>Search analytics&lt;/strong>: Historical data on search patterns and performance&lt;/li>
&lt;/ul>
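&lt;p>As an illustration of the cache management item above, here is a small LRU cache with a configurable size and TTL that also tracks its hit rate. It is a Python sketch of the general technique, not the code used in StatWrap. The class name, default values, and the injectable &lt;code>clock&lt;/code> parameter are assumptions.&lt;/p>

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size=100, ttl_seconds=300.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None or self.clock() - entry[0] > self.ttl:
            self.entries.pop(key, None)  # drop the entry if it has expired
            self.misses += 1
            return None
        self.entries.move_to_end(key)    # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key, value):
        self.entries[key] = (self.clock(), value)
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

&lt;p>Keying such a cache on the normalized query string plus the active filters would let repeated searches skip the index entirely, and &lt;code>hit_rate()&lt;/code> gives the kind of metric a performance dashboard can display.&lt;/p>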
&lt;h4 id="8-index-management-tools">8. &lt;strong>Index Management Tools&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Export/Import functionality&lt;/strong>: Backup and restore search indices&lt;/li>
&lt;li>&lt;strong>Full reindexing&lt;/strong>: Complete index rebuild with progress tracking&lt;/li>
&lt;li>&lt;strong>Index deletion&lt;/strong>: Clean slate functionality for troubleshooting&lt;/li>
&lt;li>&lt;strong>File size adjustment&lt;/strong>: Modify indexing constraints and rebuild affected content&lt;/li>
&lt;li>&lt;strong>Index statistics&lt;/strong>: Detailed breakdown of indexed content by type and project&lt;/li>
&lt;/ul>
&lt;h4 id="9-robust-error-handling--resilience">9. &lt;strong>Robust Error Handling &amp;amp; Resilience&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Graceful degradation&lt;/strong>: System continues operating even with partial index corruption&lt;/li>
&lt;li>&lt;strong>File system error handling&lt;/strong>: Handles missing files, permission issues, and path changes&lt;/li>
&lt;li>&lt;strong>Memory management&lt;/strong>: Prevents memory leaks during large indexing operations&lt;/li>
&lt;li>&lt;strong>Recovery mechanisms&lt;/strong>: Automatic fallback to basic search if advanced features fail&lt;/li>
&lt;/ul>
&lt;h4 id="10-user-experience-enhancements">10. &lt;strong>User Experience Enhancements&lt;/strong>&lt;/h4>
&lt;ul>
&lt;li>&lt;strong>Keyboard shortcuts&lt;/strong>: Ctrl+K to focus search, Escape to clear&lt;/li>
&lt;li>&lt;strong>Result highlighting&lt;/strong>: Visual emphasis on matching terms in results&lt;/li>
&lt;li>&lt;strong>Expandable results&lt;/strong>: Drill down into detailed information for each result&lt;/li>
&lt;li>&lt;strong>Loading states&lt;/strong>: Clear feedback during indexing and search operations&lt;/li>
&lt;li>&lt;strong>Responsive tabs&lt;/strong>: Organized results by type with badge counts&lt;/li>
&lt;/ul>
&lt;h2 id="5-classification-of-active-and-past-projects">&lt;strong>5. Classification of Active and Past Projects&lt;/strong>&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Active Pinned" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_1d3344ebb95180438d54893a9b5683e4.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_a0f8ee7f62445c2f5f806022268d0821.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image6_huacf20425d6903f6cfe6149bc5cb1772d_171494_1d3344ebb95180438d54893a9b5683e4.webp"
width="733"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Past" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_76660a0dce9ac0ba1fa91c959db2773c.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_cc2abd1a6a3019f703ca3e656e55f920.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image7_hu7cccff315a5d098cd440d7277689d606_85529_76660a0dce9ac0ba1fa91c959db2773c.webp"
width="740"
height="542"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>A classification system has been added to the user interface, similar to the &lt;strong>&amp;ldquo;Add to Favorites&amp;rdquo;&lt;/strong> option. A newly added project goes to the &lt;strong>&amp;ldquo;Active&amp;rdquo;&lt;/strong> section by default, unless explicitly marked as &lt;strong>&amp;ldquo;Past&amp;rdquo;&lt;/strong>. Similarly, when a project is unpinned from Favorites, it returns to the &amp;ldquo;Active&amp;rdquo; section.&lt;/p>
&lt;h1 id="conclusion-and-future-scope">&lt;strong>Conclusion and future Scope&lt;/strong>&lt;/h1>
&lt;p>Building a comprehensive search system requires careful attention to performance, user experience, and maintainability. FlexSearch provided the foundation, but the real value came from thoughtful implementation of persistent indexing, advanced scoring, and robust error handling. The result is a search system that feels instant to users while handling complex queries across diverse document types.&lt;/p>
&lt;p>The key to success was treating search not as a single feature, but as a complete subsystem with its own data management, performance monitoring, and user interface considerations. By investing in these supporting systems, the search functionality became a central, reliable part of the application that users can depend on.&lt;/p>
&lt;p>The future scope would include:&lt;/p>
&lt;ol>
&lt;li>Replacing the JSON storage with a database (for example, SQLite), which suits this use case better because of its more efficient query performance and atomic (CRUD) operations.&lt;/li>
&lt;li>Integrating any suggestions from my mentors, as well as improvements we feel are necessary.&lt;/li>
&lt;li>Developing unit tests for further functionalities and improvements.&lt;/li>
&lt;/ol>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Thank You!" srcset="
/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_f70985a589ad6b79f8c95b36c5279852.webp 400w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_b28b9dbb6c70c33ca845fda461a64fcf.webp 760w,
/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250823-debangi29/image_hu81a7405087771991938f164c6a45c6d2_109315_f70985a589ad6b79f8c95b36c5279852.webp"
width="760"
height="235"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p></description></item><item><title>[Final]Reproducibility of Interactive Notebooks in Distributed Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/depaul/notebook-rep/08202025-rahmad/</link><pubDate>Wed, 20 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/depaul/notebook-rep/08202025-rahmad/</guid><description>&lt;p>I am sharing an overview of my project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/06122025-rahmad">Reproducibility of Interactive Notebooks in Distributed Environments&lt;/a> and the work that I did this summer.&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>This project aims to improve the reproducibility of interactive notebooks that are executed in a distributed environment. Notebooks, such as those in the &lt;a href="https://jupyter.org/" target="_blank" rel="noopener">Jupyter&lt;/a> environment, have become increasingly popular and are widely used in the scientific community due to their ease of use and portability. Reproducing these notebooks is challenging, especially in a distributed cluster environment.&lt;/p>
&lt;p>In the distributed environments we consider, the notebook code is divided into manager and worker code. The manager code is the main entry point of the program; it divides the task at hand into one or more pieces of worker code, which run in a parallel, distributed fashion. We utilize several open-source tools to package and containerize the application code so that it can be reproduced across different machines and environments. They include &lt;a href="https://github.com/radiant-systems-lab/sciunit" target="_blank" rel="noopener">Sciunit&lt;/a>, &lt;a href="https://github.com/radiant-systems-lab/Flinc" target="_blank" rel="noopener">FLINC&lt;/a>, and &lt;a href="https://cctools.readthedocs.io/en/stable/taskvine/" target="_blank" rel="noopener">TaskVine&lt;/a>. These are the high-level goals of this project:&lt;/p>
&lt;ol>
&lt;li>Generate execution logs for a notebook program.&lt;/li>
&lt;li>Generate code and data dependencies for notebook programs in an automated manner.&lt;/li>
&lt;li>Utilize the generated dependencies at various granularities to automate the deployment and execution of notebooks in a parallel and distributed environment.&lt;/li>
&lt;li>Audit and package the notebook code running in a distributed environment.&lt;/li>
&lt;li>Overall, support efficient reproducibility of notebook programs.&lt;/li>
&lt;/ol>
&lt;h1 id="progress-highlights">Progress Highlights&lt;/h1>
&lt;p>Here are the details of the work that I did during this summer.&lt;/p>
&lt;h2 id="generation-of-execution-logs">Generation of Execution Logs&lt;/h2>
&lt;p>We generate execution logs for notebook programs in a distributed environment using the Linux utility &lt;a href="https://man7.org/linux/man-pages/man1/strace.1.html" target="_blank" rel="noopener">strace&lt;/a>, which records every system call made by the notebook, including all files accessed during its execution. We collect separate logs for the manager and the worker code since they are executed on different machines and have different dependencies. By recording the entire notebook execution, we capture all libraries, packages, and data files referenced during execution in the form of execution logs. These logs are then used for further analyses.&lt;/p>
&lt;h2 id="extracting-software-dependencies">Extracting Software Dependencies&lt;/h2>
&lt;p>When a library such as the Python package &lt;em>NumPy&lt;/em> is used by the notebook program, an entry is made in the execution log containing the complete path of each accessed library file along with additional information. We analyze the execution logs of both the manager and the workers to find and list all dependencies. So far, we are limited to Python packages, though the methodology is general and can be used to find dependencies for any programming language. For Python packages, version numbers are also obtained by querying package managers such as &lt;em>pip&lt;/em> or &lt;em>Conda&lt;/em> on the local system.&lt;/p>
&lt;h2 id="extracting-data-dependencies">Extracting Data Dependencies&lt;/h2>
&lt;p>We utilize similar execution logs to identify which data files were used by the notebook program. The list of logged files also contains various configuration or settings files used by certain packages and libraries. These files are removed from the list of data dependencies in a post-processing step that analyzes file paths.&lt;/p>
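&lt;p>The two extraction steps above can be sketched roughly as follows. This is an illustrative sketch only, not the project code: it assumes the default one-line strace format for open and openat calls, and the names classify_accesses and CONFIG_PREFIXES, as well as the prefix list itself, are hypothetical.&lt;/p>

```python
import re

# Matches successful open/openat calls in strace output, for example:
#   openat(AT_FDCWD, "/data/input.csv", O_RDONLY) = 3
# An optional leading PID (as printed by `strace -f`) is tolerated.
# Failed calls (= -1 ENOENT ...) do not match and are ignored.
OPEN_CALL = re.compile(
    r'^(?:\d+\s+)?(?:open|openat)\((?:AT_FDCWD, )?"([^"]+)".*\)\s+=\s+\d+'
)

# Hypothetical path prefixes treated as configuration rather than data.
CONFIG_PREFIXES = ("/etc/", "/proc/", "/sys/", "/usr/share/")


def classify_accesses(log_lines):
    """Split successfully opened paths into Python packages and data files."""
    packages, data_files = set(), set()
    for line in log_lines:
        match = OPEN_CALL.match(line)
        if match is None:
            continue
        path = match.group(1)
        if "/site-packages/" in path:
            # The first path component after site-packages names the package.
            packages.add(path.split("/site-packages/")[1].split("/")[0])
        elif not path.startswith(CONFIG_PREFIXES):
            data_files.add(path)
    return packages, data_files


sample_log = [
    'openat(AT_FDCWD, "/usr/lib/python3.10/site-packages/numpy/__init__.py", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    '4242 openat(AT_FDCWD, "/home/user/data/climate.csv", O_RDONLY) = 5',
    'openat(AT_FDCWD, "/home/user/missing.csv", O_RDONLY) = -1 ENOENT (No such file or directory)',
]
packages, data_files = classify_accesses(sample_log)
```

&lt;p>Filtering by path prefix is only one possible post-processing rule; a real pipeline would likely need a longer, environment-specific list.&lt;/p>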
&lt;h2 id="testing-the-pipeline">Testing the Pipeline&lt;/h2>
&lt;p>We have conducted our experiments on three use cases from different domains, using between 5 and 10 workers. They include distributed image convolution, climate trend analysis, and high energy physics experiment analysis. The results so far are promising, with good accuracy and only a slight running-time overhead.&lt;/p>
&lt;h2 id="processing-at-cell-level">Processing at Cell-level&lt;/h2>
&lt;p>We perform the same steps of log generation and data and software dependency extraction at the level of individual cells in a notebook instead of once for the whole notebook. As a result, we generate software and data dependencies at the level of individual notebook cells. This is achieved by interrupting control flow before and after execution of each cell to write special instructions to the execution log for marking boundaries of cell execution. We then analyze the intervals between these instructions to identify which files and Python packages are accessed by each specific cell. We use this information to generate the list of software dependencies used by that cell only.&lt;/p>
&lt;p>We also capture data dependencies by analyzing the execution logs generated by overriding the &lt;em>open&lt;/em> function used to access files.&lt;/p>
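&lt;p>A minimal sketch of the idea of overriding &lt;em>open&lt;/em> to attribute file accesses to a cell is shown below. It is an assumption about how such an override could look, not the project implementation: record_opened_files, access_log, and the cell numbering are hypothetical names, and the with-block stands in for the execution of one notebook cell.&lt;/p>

```python
import builtins
import contextlib
import os
import tempfile


@contextlib.contextmanager
def record_opened_files(cell_id, log):
    """Record every path passed to open() while the block runs."""
    original_open = builtins.open

    def logging_open(file, *args, **kwargs):
        # Attribute the access to the cell that is currently executing.
        log.setdefault(cell_id, []).append(str(file))
        return original_open(file, *args, **kwargs)

    builtins.open = logging_open
    try:
        yield
    finally:
        builtins.open = original_open  # always restore the real open()


# Usage: the body of the with-block plays the role of notebook cell 1.
access_log = {}
handle, data_path = tempfile.mkstemp(suffix=".csv")
os.close(handle)
with record_opened_files(1, access_log):
    with open(data_path) as source:
        source.read()
os.remove(data_path)
```

&lt;p>Restoring the original function in a finally clause keeps later cells unaffected even if the wrapped cell raises an exception.&lt;/p>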
&lt;h2 id="distributed-notebook-auditing">Distributed Notebook Auditing&lt;/h2>
&lt;p>In order to execute and audit workloads in parallel, we use &lt;a href="https://github.com/radiant-systems-lab/parallel-sciunit" target="_blank" rel="noopener">Sciunit Parallel&lt;/a>, which uses GNU Parallel for efficient parallel execution of tasks. The user specifies the number of tasks or machines, and the workload is then distributed across them. Once execution completes, the containerized executions need to be gathered at the host location.&lt;/p>
&lt;h2 id="efficient-reproducibility-with-checkpointing">Efficient Reproducibility with Checkpointing&lt;/h2>
&lt;p>An important challenge with Jupyter notebooks is that re-executing them is sometimes unnecessarily time-consuming and resource-intensive, especially when most cells remain unchanged. We worked on &lt;a href="https://github.com/talha129/NBRewind/tree/master" target="_blank" rel="noopener">NBRewind&lt;/a>, a lightweight tool that accelerates notebook re-execution by avoiding redundant computation. It integrates checkpointing, application virtualization, and content-based deduplication, and it supports two kinds of checkpoints: incremental and full-state. With incremental checkpoints, notebook states and dependencies shared across multiple cells are stored once, and only their deltas are stored afterwards. With full-state checkpoints, the complete state is stored after each cell. During restore, NBRewind restores the outputs of unchanged cells and thus enables efficient re-execution. Our empirical
evaluation demonstrates that NBRewind can significantly reduce both notebook audit and repeat times with incremental checkpoints.&lt;/p>
&lt;p>I am very happy about the experience I have had in this project and I would encourage other students to join this program in the future.&lt;/p></description></item><item><title>Midterm Report: Simulation, Comparison, and Conclusion of Cache Eviction</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/harvard/cachebench/2025-08-06-haochengxia/</link><pubDate>Wed, 06 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/harvard/cachebench/2025-08-06-haochengxia/</guid><description>&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>&lt;strong>CacheBench&lt;/strong> is a benchmarking suite designed for comprehensive cache performance evaluation, with a particular focus on analyzing the miss ratios of various cache eviction algorithms.&lt;/p>
&lt;p>At the core of CacheBench lie two key components: the high-performance cache simulator, &lt;a href="https://github.com/1a1a11a/libCacheSim" target="_blank" rel="noopener">libCacheSim&lt;/a>, and the extensive &lt;a href="https://github.com/cacheMon/cache_dataset" target="_blank" rel="noopener">open-source cache datasets&lt;/a>, which collectively contain over 8,000 traces from diverse applications. This ensures broad coverage across a range of realistic workloads.&lt;/p>
&lt;p>Our primary goal is to evaluate all major and widely-used cache eviction algorithms on thousands of traces, in order to gain insights into their behaviors and design trade-offs. Additionally, we aim to identify and distill representative workloads, making benchmarking more efficient and comprehensive for future cache research.&lt;/p>
&lt;h2 id="progress-and-pain-points">Progress and Pain Points&lt;/h2>
&lt;p>We began by benchmarking prevalent eviction algorithms, including FIFO, LRU, CLOCK, LFU, Random, Belady (BeladySize), CAR, ARC, LIRS, LHD, Hyperbolic, GDSF, W-TinyLFU, 2Q, SLRU, S3-FIFO, SIEVE, and LeCaR. As we developed the suite, we made progressive improvements to both the simulator and dataset infrastructure. Our progress can be summarized as follows:&lt;/p>
&lt;ul>
&lt;li>Collected miss ratio results for all listed algorithms across 8,000+ traces.&lt;/li>
&lt;li>Identified best- and worst-performing traces for each algorithm, and conducted feature analysis of these traces.&lt;/li>
&lt;li>Developed Python bindings: To increase accessibility, we provided a Python package that allows users to easily download traces and run simulation analyses using &lt;a href="https://github.com/1a1a11a/libCacheSim" target="_blank" rel="noopener">libCacheSim&lt;/a> and the &lt;a href="https://github.com/cacheMon/cache_dataset" target="_blank" rel="noopener">cache datasets&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>However, analysis remains challenging because there is no universally accepted metric or baseline for objectively comparing cache eviction algorithms&amp;rsquo; performance across all workloads.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>For the second half of the project, my focus will shift to:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Evaluating More Complex Eviction Algorithms&lt;/strong>: Having concentrated mainly on static eviction policies so far (which are generally more deterministic and understandable), I will now investigate learning-based eviction algorithms such as LRB and 3L-Cache. These models incorporate learning components and incur additional computational overhead, making simulations slower and more complex.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Detailed Trace Analysis&lt;/strong>: Since eviction algorithms can have highly variable performance on the same trace, I plan to analyze why certain algorithms excel on specific traces while others do not. Understanding these factors is crucial to characterizing both the algorithms and the workload traces.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Constructing Representative Workload Sets&lt;/strong>: Based on ongoing simulations and trace analyses, I aim to identify a minimal but representative subset of traces that can serve as a basic evaluation suite, simplifying testing and improving accessibility.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="reflection">Reflection&lt;/h2>
&lt;p>This project has truly been the highlight of my summer. By evaluating a wide range of cache eviction algorithms, I&amp;rsquo;ve significantly deepened my understanding of cache design and its underlying principles.&lt;/p>
&lt;p>I&amp;rsquo;m especially grateful to my mentors for their constant support, patience, and guidance throughout this journey. It’s been a privilege to learn from you!&lt;/p>
&lt;p>I&amp;rsquo;m excited to see the final results of CacheBench!&lt;/p></description></item><item><title>Mid-Term Update: MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250803-rohan-babbar/</link><pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250803-rohan-babbar/</guid><description>&lt;p>Hi everyone! This is my mid-term blog update for the project &lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">MPI Appliance for HPC Research on Chameleon&lt;/a>, developed in collaboration with Argonne National Laboratory and the Chameleon Cloud community.
This blog follows up on my earlier post, which you can find &lt;a href="https://ucsc-ospo.github.io/report/osre25/uchicago/mpi/20250614-rohan-babbar/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;h3 id="-june-15--june-29-2025">🔧 June 15 – June 29, 2025&lt;/h3>
&lt;p>Worked on creating and configuring images on Chameleon Cloud for the following three sites:
CHI@UC, CHI@TACC, and KVM@TACC.&lt;/p>
&lt;p>Key features of the images:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Spack&lt;/strong>: Pre-installed and configured for easy package management of HPC software.&lt;/li>
&lt;li>&lt;strong>Lua Modules (LMod)&lt;/strong>: Installed and configured for environment module management.&lt;/li>
&lt;li>&lt;strong>MPI Support&lt;/strong>: Both MPICH and Open MPI are pre-installed, enabling users to run distributed applications out-of-the-box.&lt;/li>
&lt;/ul>
&lt;p>These images are now publicly available and can be seen directly on the Chameleon Appliance Catalog, titled &lt;a href="https://chameleoncloud.org/appliances/127/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04)&lt;/a>.&lt;/p>
&lt;p>I also worked on some example Jupyter notebooks on how to get started using these images.&lt;/p>
&lt;h3 id="-june-30--july-13-2025">🔧 June 30 – July 13, 2025&lt;/h3>
&lt;p>With the MPI Appliance now published on Chameleon Cloud, the next step was to automate the setup of an MPI-Spack cluster.&lt;/p>
&lt;p>To achieve this, I developed a set of Ansible playbooks that:&lt;/p>
&lt;ol>
&lt;li>Configure both master and worker nodes with site-specific settings&lt;/li>
&lt;li>Set up seamless access to Chameleon NFS shares&lt;/li>
&lt;li>Allow users to easily install Spack packages, compilers, and dependencies across all nodes&lt;/li>
&lt;/ol>
&lt;p>These playbooks aim to simplify the deployment of reproducible HPC environments and reduce the time required to get a working cluster up and running.&lt;/p>
&lt;h3 id="-july-14--july-28-2025">🔧 July 14 – July 28, 2025&lt;/h3>
&lt;p>This period began with me fixing some issues in python-chi, the official Python client for the Chameleon testbed.
We also discussed adding support for CUDA-based packages, which would make it easier to work with NVIDIA GPUs.
We successfully published a new image on Chameleon, titled &lt;a href="https://chameleoncloud.org/appliances/130/" target="_blank" rel="noopener">MPI and Spack for HPC (Ubuntu 22.04 - CUDA)&lt;/a>, and added an example to demonstrate its usage.&lt;/p>
&lt;p>We compiled the artifact containing the Jupyter notebooks and Ansible playbooks and published it on Chameleon Trovi.
Feel free to check it out &lt;a href="https://chameleoncloud.org/experiment/share/7424a8dc-0688-4383-9d67-1e40ff37de17" target="_blank" rel="noopener">here&lt;/a>. The documentation still needs some work.&lt;/p>
&lt;p>📌 That’s it for now! I’m currently working on the documentation, a ROCm-based image for AMD GPUs, and some container-based examples.
Stay tuned for more updates in the next blog.&lt;/p></description></item><item><title>Mid-Term Report: Uncovering the True Sources of Non-Reproducibility in AI for Science</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/pnnl/llm_rag_reproducibility/20250725-wbq321/</link><pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/pnnl/llm_rag_reproducibility/20250725-wbq321/</guid><description>&lt;p>Hello, I&amp;rsquo;m Baiqiang. I’m excited to share a mid-term update from the &lt;a href="https://ucsc-ospo.github.io/project/osre25/pnnl/llm_rag_reproducibility/" target="_blank" rel="noopener">Enhancing Reproducibility in RAG Frameworks for Scientific Workflows&lt;/a> project. This journey, mentored by Luanzheng &amp;ldquo;Lenny&amp;rdquo; Guo and Dongfang Zhao, has taken a fascinating and unexpected turn, leading to a much deeper understanding of what it takes to build truly reliable AI for science.&lt;/p>
&lt;h3 id="the-search-for-an-invisible-bug">The Search for an Invisible Bug&lt;/h3>
&lt;p>As a quick recap, our project tackles the critical problem of &lt;strong>non-determinism&lt;/strong> in Retrieval-Augmented Generation (RAG) systems. For science to be trustworthy, it must be repeatable. If an AI system gives different answers to the same question, it fails this fundamental test. Our initial goal, outlined in my &lt;a href="https://www.overleaf.com/read/fcbxtpngdnhw#8cc2c8" target="_blank" rel="noopener">proposal&lt;/a>, was to find and fix the sources of this inconsistency, which we believed lay within the retrieval algorithms themselves.&lt;/p>
&lt;p>To do this, we built a comprehensive testing framework capable of running thousands of controlled experiments. We designed it to meticulously measure the consistency of retrieval results while varying everything from the indexing algorithm to the underlying hardware.&lt;/p>
&lt;h3 id="a-surprising-discovery-the-usual-suspect-is-innocent">A Surprising Discovery: The Usual Suspect is Innocent&lt;/h3>
&lt;p>The common wisdom in the community is that high-performance, approximate search libraries like FAISS are a major source of randomness. We put this to the test, running repeated queries against various index types, including complex ones like &lt;code>HNSW&lt;/code> and &lt;code>IndexIVF&lt;/code>.&lt;/p>
&lt;p>Our results were clear and surprising: &lt;strong>FAISS is remarkably reproducible out of the box.&lt;/strong> When run on a consistent hardware and software stack, it returns the exact same results, every single time. The library appears to have robust internal seed management that ensures deterministic behavior.&lt;/p>
&lt;p>This finding was a pivotal moment. The non-reproducibility that researchers observe in practice is real, but it doesn&amp;rsquo;t come from where we expected. The problem isn&amp;rsquo;t the algorithm itself, but the environment it runs in. Our investigation immediately shifted to find the real culprits.&lt;/p>
&lt;h3 id="pinpointing-the-true-sources-of-non-determinism">Pinpointing the True Sources of Non-Determinism&lt;/h3>
&lt;p>Our framework quickly helped us identify the true sources of inconsistency:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Hardware-Induced Variation (CPU vs. GPU):&lt;/strong> This is the most significant factor. Running the exact same retrieval code can produce different document rankings and even different document sets when executed on a CPU versus a GPU. This is likely due to subtle differences in floating-point arithmetic and library optimizations in the hardware stack.&lt;/li>
&lt;li>&lt;strong>The Impact of Numerical Precision:&lt;/strong> We also confirmed that changing the floating-point precision of the data (e.g., from FP32 to FP16) can introduce small numerical variations that are just large enough to reorder the results, potentially changing the evidence the LLM receives.&lt;/li>
&lt;/ol>
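&lt;p>The precision effect in the second item can be illustrated with a small, self-contained sketch. The two similarity scores below are hypothetical values chosen for illustration, not measurements from our experiments, and the helper names are made up. The scores are distinct in FP32 but round to the same FP16 value, so the ranking then depends on tie-breaking rather than on the scores themselves.&lt;/p>

```python
import struct


def to_fp32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


def rank(scores, cast):
    """Rank documents by descending score after casting to a given precision.

    The sort is stable, so tied documents keep their input order.
    """
    return sorted(scores, key=lambda doc: cast(scores[doc]), reverse=True)


# Hypothetical similarity scores for two retrieved documents.
scores = {"doc_a": 0.9003, "doc_b": 0.9004}

fp32_ranking = rank(scores, to_fp32)  # doc_b scores higher and comes first
fp16_ranking = rank(scores, to_fp16)  # both scores collapse to one value
```

&lt;p>Here the FP16 ranking falls back to input order because the two scores become identical, which is one simple way a precision change can alter the order of retrieved documents.&lt;/p>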
&lt;h3 id="our-mission-refined-building-tools-for-environmental-control">Our Mission Refined: Building Tools for Environmental Control&lt;/h3>
&lt;p>This discovery has sharpened our project&amp;rsquo;s mission. The challenge is not to &amp;ldquo;fix&amp;rdquo; a supposedly random algorithm, but to develop the tools and best practices to control for the entire experimental environment. Our focus for the second half of the project is to:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Develop a Hardware-Aware Configuration Tracker:&lt;/strong> We are building a tool that goes beyond logging software versions. It will capture the critical details of the hardware environment—CPU/GPU model, CUDA version, etc.—and link them directly to an experiment&amp;rsquo;s results.&lt;/li>
&lt;li>&lt;strong>Create a Cross-Environment Validation Suite:&lt;/strong> Our open-source benchmarking suite will empower researchers to test their own pipelines. Crucially, it will help them identify and diagnose inconsistencies when moving workflows between different machines, such as from a local laptop to a cloud-based GPU.&lt;/li>
&lt;li>&lt;strong>Establish New Best Practices:&lt;/strong> We will distill our findings into clear, actionable guidance. The key recommendation is no longer just about choosing the right algorithm, but ensuring a consistent and well-documented hardware and software environment to guarantee reproducible outcomes.&lt;/li>
&lt;/ol>
&lt;p>By following the evidence, we’ve uncovered the root cause of a critical problem in AI-driven research. We are now developing the solutions needed to manage it, paving the way for a future where scientific discoveries powered by AI are built on a foundation of verifiable trust.&lt;/p></description></item><item><title>Midterm Report : Streamlining Reproducible Machine Learning Research with Automated MLOps Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/</guid><description>&lt;h3 id="refresher-about-the-project">Refresher about the Project&lt;/h3>
&lt;p>Hi everyone! For the last month I have been working with my mentors, Professor &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a>, on our project &lt;a href="https://ucsc-ospo.github.io/project/osre25/nyu/mlops/" target="_blank" rel="noopener">Applying MLOps to overcome reproducibility barriers in machine learning research&lt;/a>. As a refresher, our goal is to build a template generator for reproducible machine learning training workflows on the Chameleon testbed. We want to provide our users with the necessary environment configuration in a convenient way, so they won&amp;rsquo;t be overwhelmed by the intricate details of setting up the environment. This will allow for validation and further development of their setup.&lt;/p>
&lt;hr>
&lt;h3 id="what-we-have-done-so-far">What we have done so far&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="userflow" srcset="
/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_d0aee31c44beeded617d15565a3078b7.webp 400w,
/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_23aab3e41951725ceb2ba1683e8a5455.webp 760w,
/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/userflow_hu8aae690c470ebe5647870c6d86c96c68_71910_d0aee31c44beeded617d15565a3078b7.webp"
width="760"
height="307"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The current workflow begins in JupyterHub, where the user provides basic details such as project name, site, and node type. The notebooks handle key setup tasks, like creating storage buckets, provisioning and configuring a server with GPU support, and mounting buckets locally via rclone. Once the host environment is ready, the user connects to that machine over SSH, generates the necessary variables via a script, and launches a containerized virtual lab that integrates Jupyter and MLflow. Inside the container, users authenticate with GitHub, connect or initialize their repositories, and can immediately begin training models, with all metrics, artifacts, and environment details logged for reproducibility.&lt;/p>
&lt;p>The progress on the project so far is as follows:&lt;/p>
&lt;h4 id="we-finalized-the-selection-of-frameworks-and-storage-options">We finalized the selection of frameworks and storage options.&lt;/h4>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="results" srcset="
/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_cc94f8d2983a972d5d551a1fd1b51c86.webp 400w,
/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_bd2f06761e3836b650d87a84b3ed4d00.webp 760w,
/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/setup_huf1d9f9b29ea3e918ebffad4d45a90b19_52037_cc94f8d2983a972d5d551a1fd1b51c86.webp"
width="760"
height="346"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Artifacts are now logged directly from the MLflow server to the Chameleon object store, without relying on a database backend or an intermediate MinIO S3 layer.&lt;/p>
&lt;h4 id="different-jupyter-lab-images-for-each-framework">Different JupyterLab images for each framework.&lt;/h4>
&lt;p>We’ve started with the top ML frameworks — PyTorch Lightning, Keras/TensorFlow, and Scikit-Learn. Each framework now has its own image, which will later be tailored to the user’s selection.&lt;/p>
&lt;h4 id="github-cli-and-hugging-face-integration-inside-the-container">GitHub CLI and Hugging Face integration inside the container.&lt;/h4>
&lt;p>The Jupyter container now integrates both the GitHub CLI and Hugging Face authentication. Users can manage their code repositories via GitHub CLI commands and authenticate with Hugging Face tokens to download/upload models and datasets. This eliminates the need for manual credential setup and streamlines ML experimentation within the environment.&lt;/p>
&lt;h4 id="custom-logging-utility">Custom Logging Utility&lt;/h4>
&lt;p>To ensure robust tracking of code versioning and environment details, we added a custom logging utility.&lt;br>
These logs are stored alongside metrics and model artifacts in MLflow, ensuring every experiment is fully documented and reproducible. A summary of the functionality follows:&lt;/p>
&lt;hr>
&lt;h5 id="log_git--captures-code-versioning">&lt;code>log_git()&lt;/code> — Captures Code Versioning&lt;/h5>
&lt;p>Uses Git commands (via subprocess) to log:&lt;/p>
&lt;ul>
&lt;li>Current branch name&lt;/li>
&lt;li>Commit hash&lt;/li>
&lt;li>Repository status (clean or dirty)&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Example Output:&lt;/strong>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">commit: a7c3e9d
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">branch: main
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">status: dirty (1 file modified)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># and git diff output
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h5 id="log_python-tracks-the-python-environment">&lt;code>log_python()&lt;/code>— Tracks the Python Environment&lt;/h5>
&lt;ul>
&lt;li>
&lt;p>Platform information + Python environment info (version)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Exports a full pip freeze list to a .txt file&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Saved as an MLflow artifact to guarantee exact package version reproducibility&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Example Output (pip freeze extract):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-txt" data-lang="txt">&lt;span class="line">&lt;span class="cl">numpy==1.26.4
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pandas==2.2.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">scikit-learn==1.4.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">torch==2.2.0
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h5 id="log_gpu---records-gpu-information">&lt;code>log_gpu()&lt;/code> - Records GPU Information&lt;/h5>
&lt;ul>
&lt;li>
&lt;p>Detects available GPU devices&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Collects details using NVIDIA’s pynvml or AMD’s ROCm tools&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Logs:&lt;/p>
&lt;ul>
&lt;li>GPU name&lt;/li>
&lt;li>Driver version&lt;/li>
&lt;li>CUDA/ROCm version&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
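&lt;p>Captures the output of the vendor SMI tool (&lt;code>nvidia-smi&lt;/code> or &lt;code>rocm-smi&lt;/code>, depending on the GPU type) for deeper inspection&lt;/p>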
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>These utilities ensure that each run can be traced back with:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The exact code version&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The full Python environment&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The hardware details used&lt;/p>
&lt;/li>
&lt;/ul>
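&lt;p>As a rough illustration of the code-versioning part, the sketch below collects the same three fields with standard Git commands through subprocess. It is a simplified stand-in under stated assumptions, not the utility shipped in the template: collect_git_info and the injectable run parameter are hypothetical names, and the status string format is invented for this example.&lt;/p>

```python
import subprocess


def _git(args, cwd="."):
    """Run a git subcommand and return its stripped standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def collect_git_info(run=_git):
    """Return branch, short commit hash, and clean/dirty status as a dict.

    `run` is injectable so the function can be exercised without a repository.
    """
    # `git status --porcelain` prints one line per changed or untracked path.
    changed = [line for line in run(["status", "--porcelain"]).splitlines() if line]
    return {
        "commit": run(["rev-parse", "--short", "HEAD"]),
        "branch": run(["rev-parse", "--abbrev-ref", "HEAD"]),
        "status": "clean" if not changed else f"dirty ({len(changed)} file(s) modified)",
    }


def fake_run(args):
    """Stand-in runner that mimics git output for a repository with one change."""
    if args[0] == "status":
        return " M train.py"
    return "a7c3e9d" if "--short" in args else "main"


example_info = collect_git_info(run=fake_run)
```

&lt;p>Calling collect_git_info() with no arguments inside a Git working tree uses the real git binary; the resulting dictionary could then be stored with the run alongside the other artifacts.&lt;/p>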
&lt;hr>
&lt;h3 id="initial-customizable-template">Initial customizable template&lt;/h3>
&lt;p>We’ve prototyped an initial customizable template using Cookiecutter. It provides an interactive CLI in which users supply key project details (e.g., project name, frameworks, GPU type, and integrations, if any). Cookiecutter then generates a ready-to-use project structure with pre-configured integrations, reducing manual setup and ensuring consistency across environments.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="template generator"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/generator.gif"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The generated project gives the user notebooks for communicating with Chameleon testbed resources, a containerized environment, and custom training scripts into which they can plug their own code.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="elements" srcset="
/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_d71a4a6bed166f1ba25e0480abe6d891.webp 400w,
/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_0451200eb97ac154443b7261da58399a.webp 760w,
/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/07292025-alghali/elements_huf8c1a359b014b199be1f96460f6453ca_50752_d71a4a6bed166f1ba25e0480abe6d891.webp"
width="760"
height="262"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="whats-next">What’s Next&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Template Generation via Config + interactive widgets&lt;/strong>&lt;br>
We are exploring different ways to generate experiment templates using configuration files and interactive widgets in Jupyter notebooks. This would let users quickly customize logging setups and is expected to be more user-friendly.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>AMD-Compatible Images&lt;/strong>&lt;br>
Extend support by building and testing Docker images optimized for AMD GPUs. Up to now, our development efforts have focused on NVIDIA GPUs using CUDA-based images.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>End-to-End Lifecycle Example&lt;/strong>&lt;br>
Provide a larger example demonstrating the entire ML workflow:&lt;/p>
&lt;ul>
&lt;li>Data preparation&lt;/li>
&lt;li>Training with GPU logging&lt;/li>
&lt;li>Tracking metrics, artifacts, and environment info in MLflow&lt;/li>
&lt;li>Model evaluation and logging&lt;/li>
&lt;li>Reproducing results on different hardware backends&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>Working on this project so far has been both challenging and eye-opening. I’ve seen how many moving parts need to come together for a smooth workflow. The support from my mentors has been key in helping me turn challenges into real progress.&lt;/p>
&lt;p>Thank you for following along. I’m looking forward to sharing more concrete results soon.&lt;/p></description></item><item><title>Type Narrowing: Evaluate New Gradual Languages and Do Unsound Narrowings Lead to Exploits</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uutah/type-narrowing/20250729-sivasathyaseelan/</link><pubDate>Tue, 29 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uutah/type-narrowing/20250729-sivasathyaseelan/</guid><description>&lt;p>Hello! I’m Siva Sathyaseelan D N, a pre-final year B.Tech + M.Tech Engineering student at IIT BHU, Varanasi, India. With a deep-rooted passion for software development and scientific computing, I thrive at the intersection of code and real-world problem-solving. For two years, I’ve engaged in open-source work across scientific simulation, blockchain, and cloud-native technologies, through hobby projects, hackathons, internships, and an LFX mentorship. I&amp;rsquo;m contributing to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uutah/type-narrowing/">Type Narrowing: Evaluate New Gradual Languages and Do Unsound Narrowings Lead to Exploits&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/content/authors/bennn">Ben Greenman&lt;/a>. My proposal can be viewed &lt;a href="https://docs.google.com/document/d/1QcfiOWQQBxTW3YnkCmgfz-xHwLGad4OuCMjyphbaz54/edit?usp=sharing" target="_blank" rel="noopener">here&lt;/a>!&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>Gradual typing enhances untyped languages like JavaScript and Python with static type checkers in systems like TypeScript, Flow, Mypy, Pyright, and Typed Racket. These checkers use type narrowing to refine types via runtime checks (e.g., &lt;code>typeof item["price"] === "number"&lt;/code>). Designs vary: TypeScript permits unverified predicates, Flow ensures soundness, and Typed Racket tracks types compositionally. This variety prompted the If-T benchmark (&lt;a href="https://github.com/utahplt/ift-benchmark" target="_blank" rel="noopener">ift-benchmark&lt;/a>) to evaluate narrowing across five languages. The benchmark omits tools like Sorbet, Hack, Luau, Pyre, Cinder/Static Python, Typed Clojure, and Elixir, however, and the risks of unsound narrowings remain unclear.&lt;/p>
&lt;p>&lt;strong>Objectives&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Extend the If-T benchmark to Sorbet, Hack, Luau, Pyre, Cinder/Static Python, Typed Clojure, and potentially Elixir.&lt;/li>
&lt;li>Analyze their type narrowing precision, expressiveness, and soundness.&lt;/li>
&lt;li>Conduct a corpus study of TypeScript or Python code using GitHub or Software Heritage APIs.&lt;/li>
&lt;li>Assess the prevalence and exploit potential of unsound narrowings.&lt;/li>
&lt;li>Link corpus findings to benchmark results for broader insights.&lt;/li>
&lt;/ul>
&lt;h2 id="progress-so-far">Progress So Far&lt;/h2>
&lt;p>During the first half of the SoR 2025 period, I focused on extending the If-T benchmark to Sorbet, Pyre, Cinder/Static Python, and Typed Clojure. The following PRs extend the If-T benchmark:&lt;/p>
&lt;ul>
&lt;li>Sorbet -&amp;gt; &lt;a href="https://github.com/utahplt/ifT-benchmark/pull/20" target="_blank" rel="noopener">https://github.com/utahplt/ifT-benchmark/pull/20&lt;/a>&lt;/li>
&lt;li>Pyre -&amp;gt; &lt;a href="https://github.com/utahplt/ifT-benchmark/pull/26" target="_blank" rel="noopener">https://github.com/utahplt/ifT-benchmark/pull/26&lt;/a>&lt;/li>
&lt;li>Typed Clojure -&amp;gt; &lt;a href="https://github.com/utahplt/ifT-benchmark/pull/27" target="_blank" rel="noopener">https://github.com/utahplt/ifT-benchmark/pull/27&lt;/a>&lt;/li>
&lt;li>Cinder -&amp;gt; &lt;a href="https://github.com/utahplt/ifT-benchmark/pull/28" target="_blank" rel="noopener">https://github.com/utahplt/ifT-benchmark/pull/28&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="whats-next">What&amp;rsquo;s Next&lt;/h2>
&lt;p>Next, I will conduct a corpus study of TypeScript or Python code using GitHub or Software Heritage APIs and assess the prevalence and exploit potential of unsound narrowings. I will also link the corpus findings to the benchmark results for broader insights (see &lt;a href="https://github.com/utahplt/TGUsage" target="_blank" rel="noopener">TGUsage&lt;/a>).&lt;/p>
&lt;h2 id="final-thoughts">Final Thoughts&lt;/h2>
&lt;p>Working on &lt;strong>Type Narrowing&lt;/strong> has been incredibly rewarding. It’s more than just code: it means studying the type systems of different programming languages, which matters greatly for large-scale software systems and software security, and I’m honored to be a part of that.&lt;/p>
&lt;p>Big thanks to my mentor &lt;strong>Ben Greenman&lt;/strong> for their support and thoughtful feedback throughout. I’ve learned a ton already, and I can’t wait to keep building.&lt;/p></description></item><item><title>Mid-term Blog: Building a Simulator for Benchmarking Replicated Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-mchan/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-mchan/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello there, I&amp;rsquo;m Michael. In this report, I&amp;rsquo;ll be sharing my progress as part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a>.&lt;/p>
&lt;h2 id="about-the-project">About the Project&lt;/h2>
&lt;p>The goal of the project is to build a &lt;em>language-agnostic&lt;/em> interface that enables communication between clients and any consensus protocol such as MultiPaxos, Raft, Zookeeper Atomic Broadcast (ZAB), and others. Currently, many of these protocols implement their own custom mechanisms for the client to communicate with the group of peers in the network. An implementation of MultiPaxos from the &lt;a href="https://arxiv.org/abs/2405.11183" target="_blank" rel="noopener">MultiPaxos Made Complete&lt;/a> paper, for example, uses a custom Protobuf definition for the packets that clients send to the MultiPaxos system. With the support of a generalized interface, different consensus protocols can be tested under the same workload to compare their performance objectively.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Literature Study:&lt;/strong>
Reviewed papers and implementations of various protocols including GigaPaxos, Raft, Viewstamped Replication (VSR), and ZAB. Analysis focused on their log replication strategies, fault handling, and performance implications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Development of Custom Protocol:&lt;/strong>
Two custom protocols are currently under development and will serve as initial test subjects for the testbed:&lt;/p>
&lt;ul>
&lt;li>A modified GigaPaxos protocol&lt;/li>
&lt;li>A Primary-Backup Replication protocol with strict log ordering similar to ZAB (logs are ordered based on the sequence proposed by the primary)&lt;/li>
&lt;/ul>
&lt;p>Most of my time has been spent working on the two protocols, particularly on snapshotting and state transfer functionality in the Primary-Backup protocol. Ideally, the testbed should be able to evaluate protocol performance in scenarios involving node failure or a new node being added. In these scenarios, different protocol implementations often vary in their decision of whether to take periodic snapshots or to roll forward whenever possible and generate a snapshot only when necessary.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>Early in the project, the initial goal was to benchmark different consensus protocols using arbitrary full-stack web applications as their workload. Different protocols would replicate a full-stack application running inside Docker containers across multiple nodes, and the testbed would send requests for them to coordinate between those nodes. In fact, the two custom protocols being worked on are specifically made to fit these constraints.&lt;/p>
&lt;p>Developing a custom protocol that supports the replication of a Docker container is in itself already a difficult task. Abstracting away the functionality that allows communicating with the Docker containers, as well as handling entry logs and snapshotting the state, is an order of magnitude more complicated.&lt;/p>
&lt;p>As mentioned in the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250613-mchan/">first blog&lt;/a>, applications can be categorized into two types: deterministic and non-deterministic. The coordination of these two types is handled in very different ways. Most consensus protocols support only deterministic systems, such as key-value stores, and can&amp;rsquo;t easily handle coordination of complex services or external side effects. Supporting non-deterministic applications would require abstracting over protocol-specific log structures. This effectively restricts the interface to protocols that conform to the abstraction, defeating the goal of making the interface broadly usable and protocol-agnostic.&lt;/p>
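&lt;p>The difference between the two categories can be illustrated with a toy log replay. This is only a sketch under simple assumptions: the log is a list of key-value writes, and the non-deterministic variant attaches a locally drawn random value to each write. Neither function is taken from the protocols discussed here.&lt;/p>

```python
import random


def apply_log(log):
    """Replay (key, value) writes on an empty key-value store."""
    store = {}
    for key, value in log:
        store[key] = value
    return store


def apply_log_nondeterministic(log, rng):
    """Same replay, but each write also records a locally drawn random token."""
    store = {}
    for key, value in log:
        store[key] = (value, rng.random())
    return store


log = [("x", 1), ("y", 2), ("x", 3)]

# Deterministic: every replica that replays the same log ends in the same state.
replica_a = apply_log(log)
replica_b = apply_log(log)

# Non-deterministic: replicas with different local randomness diverge,
# even though they agreed on the order of the log entries.
replica_c = apply_log_nondeterministic(log, random.Random(1))
replica_d = apply_log_nondeterministic(log, random.Random(2))
```

&lt;p>Agreeing on the order of requests is enough for the first pair of replicas to converge, but not for the second pair, which is why non-deterministic applications need state to be captured and shipped rather than requests to be replayed.&lt;/p>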
&lt;p>Furthermore, allowing &lt;strong>any&lt;/strong> existing protocol to run something as complex as a stateful Docker container, without the protocol itself even being aware of it, adds another layer of complexity to the system.&lt;/p>
&lt;h2 id="future-goals">Future Goals&lt;/h2>
&lt;p>Given these challenges, I decided to pivot to using only key-value stores as the benchmark application. This aligns with most existing protocol implementations, which typically use key-value stores. The main focus is now to implement an interface that supports HTTP requests from clients to any protocol.&lt;/p></description></item><item><title>Midterm Blog: Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/</guid><description>&lt;p>Hello! I&amp;rsquo;m Panji Sri Kuncara Wisma and I want to share my midterm progress on the &amp;ldquo;Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&amp;rdquo; project under the mentorship of Fadhil I. Kurnia.&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>The goal of our project is to create an open testbed that enables fair, reproducible evaluation of different consensus protocols (Paxos variants, EPaxos, Raft, etc.) when deployed at network edges. Currently, researchers struggle to compare these systems because they lack standardized evaluation environments and often rely on mock implementations of proprietary systems.&lt;/p>
&lt;p>XDN (eXtensible Distributed Network) is one of the important consensus systems we plan to evaluate in our benchmarking testbed. Built on GigaPaxos, it allows deployment of replicated stateful services across edge locations. As part of preparing our benchmarking framework, we need to ensure that the systems we evaluate, including XDN, are robust for fair comparison.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>As part of preparing our benchmarking tool, I have been working on refactoring XDN&amp;rsquo;s FUSE filesystem from C++ to Rust. This work is essential for creating a stable and reliable XDN platform.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="System Architecture" srcset="
/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_5600401ae6570bf38b96fa89a080f4f7.webp 400w,
/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_6d3b555dbec3bdb305839eda9b227acf.webp 760w,
/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/fuselog_design_hu4e0250a1afb641f82d064bca3b5b892d_118470_5600401ae6570bf38b96fa89a080f4f7.webp"
width="760"
height="439"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The diagram above illustrates how the FUSE filesystem integrates with XDN&amp;rsquo;s distributed architecture. On the left, we see the standard FUSE setup where applications interact with the filesystem through the kernel&amp;rsquo;s VFS layer. On the right, the distributed replication flow is shown: Node 1 runs &lt;code>fuselog_core&lt;/code> which captures filesystem operations and generates statediffs, while Nodes 2 and 3 run &lt;code>fuselog_apply&lt;/code> to receive and apply these statediffs, maintaining replica consistency across the distributed system.&lt;/p>
&lt;p>This FUSE component is critical for XDN&amp;rsquo;s operation as it enables transparent state capture and replication across edge nodes. By refactoring this core component from C++ to Rust, we&amp;rsquo;re hopefully strengthening the foundation for fair benchmarking comparisons in our testbed.&lt;/p>
&lt;h3 id="core-work-c-to-rust-fuse-filesystem-migration">Core Work: C++ to Rust FUSE Filesystem Migration&lt;/h3>
&lt;p>XDN relies on a FUSE (Filesystem in Userspace) component to capture filesystem operations and generate &amp;ldquo;statediffs&amp;rdquo; - records of changes that get replicated across edge nodes. The original C++ implementation worked but had memory safety concerns and limited optimization capabilities.&lt;/p>
&lt;p>I worked on refactoring from C++ to Rust, implementing several improvements:&lt;/p>
&lt;p>&lt;strong>New Features Added:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Zstd Compression&lt;/strong>: Reduces statediff payload sizes&lt;/li>
&lt;li>&lt;strong>Adaptive Compression&lt;/strong>: Intelligently chooses compression strategies&lt;/li>
&lt;li>&lt;strong>Advanced Pruning&lt;/strong>: Removes redundant operations (duplicate chmod/chown, created-then-deleted files)&lt;/li>
&lt;li>&lt;strong>Bincode Serialization&lt;/strong>: Helps avoid manual serialization code and reduces the risk of related bugs&lt;/li>
&lt;li>&lt;strong>Extended Operations&lt;/strong>: Added support for additional filesystem operations (mkdir, symlink, hardlinks, etc.)&lt;/li>
&lt;/ul>
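&lt;p>The pruning idea in the list above can be sketched in a few lines. The real implementation is in Rust; the Python below is only an illustration under stated assumptions: operations are hypothetical &lt;code>(kind, path, arg)&lt;/code> tuples, a &lt;code>create&lt;/code> always refers to a file that did not exist before the batch, and only the last &lt;code>chmod&lt;/code> or &lt;code>chown&lt;/code> on a path matters for the final state.&lt;/p>

```python
def prune(ops):
    """Drop captured operations that cannot affect the final state.

    ops is a list of (kind, path, arg) tuples in capture order.
    """
    created = set()
    deleted_after_create = set()
    for kind, path, _ in ops:
        if kind == "create":
            created.add(path)
            # A re-created file exists at the end, so keep its history.
            deleted_after_create.discard(path)
        elif kind == "unlink" and path in created:
            deleted_after_create.add(path)

    # Remember the position of the last chmod/chown for each path.
    last_meta = {}
    for index, (kind, path, _) in enumerate(ops):
        if kind in ("chmod", "chown"):
            last_meta[(kind, path)] = index

    kept = []
    for index, (kind, path, arg) in enumerate(ops):
        if path in deleted_after_create:
            continue  # file was created and removed within this batch
        if kind in ("chmod", "chown") and last_meta[(kind, path)] != index:
            continue  # superseded by a later metadata change
        kept.append((kind, path, arg))
    return kept
```

&lt;p>For a batch that creates, writes, and unlinks a temporary file while also changing the mode of a database file twice, only the final &lt;code>chmod&lt;/code> and the writes to the database file survive, so the replicas receive a smaller statediff with the same end result.&lt;/p>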
&lt;p>&lt;strong>Architectural Improvements:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Memory Safety&lt;/strong>: Rust&amp;rsquo;s ownership system helps prevent common memory management issues&lt;/li>
&lt;li>&lt;strong>Type Safety&lt;/strong>: Using Rust enums instead of integer constants for better type checking&lt;/li>
&lt;/ul>
&lt;h2 id="findings">Findings&lt;/h2>
&lt;p>The optimizations performed as expected:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Database Performance Comparison" srcset="
/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_cb1ea5caaa82d543dfeabd0c97f7c4fe.webp 400w,
/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_d65f44ef3f769dddda7f0211b94ad6b6.webp 760w,
/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/performance_hudc10c2ffc95d775aedb0a1dad587d6fd_55711_cb1ea5caaa82d543dfeabd0c97f7c4fe.webp"
width="760"
height="433"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;strong>Statediff Size Reductions:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>MySQL workload&lt;/strong>: 572MB → 29.6MB (95% reduction)&lt;/li>
&lt;li>&lt;strong>PostgreSQL workload&lt;/strong>: 76MB → 11.9MB (84% reduction)&lt;/li>
&lt;li>&lt;strong>SQLite workload&lt;/strong>: 4MB → 29KB (99% reduction)&lt;/li>
&lt;/ul>
&lt;p>The combination of write coalescing, pruning, and compression proves especially effective for database workloads, where many operations involve small changes to large files.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Rust vs C&amp;#43;&amp;#43; Performance Comparison" srcset="
/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_2adee964972897a04e60327dcfe9675e.webp 400w,
/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_dd86a6fc0dabbac3beb17266f1f49002.webp 760w,
/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250725-panjisri/latency_hu3b080735c91d058ad2f9cf67a54d5f14_21553_2adee964972897a04e60327dcfe9675e.webp"
width="760"
height="470"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;strong>Performance Comparison:&lt;/strong>
Remarkably, the Rust implementation matches or exceeds C++ performance:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>POST operations&lt;/strong>: 30% faster (10.5ms vs 15ms)&lt;/li>
&lt;li>&lt;strong>DELETE operations&lt;/strong>: 33% faster (10ms vs 15ms)&lt;/li>
&lt;li>&lt;strong>Overall latency&lt;/strong>: Consistently better (9ms vs 11ms)&lt;/li>
&lt;/ul>
&lt;h2 id="current-challenges">Current Challenges&lt;/h2>
&lt;p>While the core implementation is complete and functional, I&amp;rsquo;m currently debugging occasional latency spikes that occur under specific workload patterns. These edge cases need to be resolved before moving on to the benchmarking phase, as inconsistent performance could compromise the reliability of the evaluation.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>With the FUSE filesystem foundation nearly complete, next steps include:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Resolve latency spike issues&lt;/strong> and complete XDN stabilization&lt;/li>
&lt;li>&lt;strong>Build benchmarking framework&lt;/strong> - a comparison tool that can systematically evaluate different consensus protocols with standardized metrics.&lt;/li>
&lt;li>&lt;strong>Run systematic evaluation&lt;/strong> across protocols&lt;/li>
&lt;/ol>
&lt;p>The optimized filesystem will hopefully provide a stable base for reproducible performance comparisons between distributed consensus protocols.&lt;/p></description></item><item><title>Midterm for Smart Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/</link><pubDate>Thu, 24 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/</guid><description>&lt;h2 id="what-is-envgym">What is EnvGym?&lt;/h2>
&lt;p>EnvGym is a general multi-agent framework designed to automate the construction of executable environments for reproducing research prototypes from top-tier conferences and journals. While reproducibility has become a growing concern in the research community, the process of setting up environments remains time-consuming, error-prone, and often poorly documented.&lt;/p>
&lt;p>EnvGym addresses this gap by leveraging LLM-powered agents to analyze project instructions, resolve dependencies, configure execution environments, and validate results—thereby reducing human overhead and improving reproducibility at scale.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="EnvGym Cover" srcset="
/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp 400w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_d39b2369b5df80ffa715197c993f0681.webp 760w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp"
width="760"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;h3 id="new-tools">New Tools&lt;/h3>
&lt;p>Initially, our agent had access to only one tool: the command line. This constrained the agent’s ability to decompose complex tasks and respond flexibly to failures. Over the last few weeks, we introduced a modular tool system, enabling the agent to handle specific subtasks more effectively.&lt;/p>
&lt;p>The new toolset includes:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>dockerrun: Executes Dockerfiles.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>hardware_checking, hardware_adjustment: Tailor builds to available resources.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>history_manager, stats: Tracks historical data for improvement and reproducibility.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>planning: Generates high-level execution plans.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>summarize: Interprets build results to adjust subsequent iterations.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>writing_docker_initial, writing_docker_revision: Generate and refine Dockerfiles.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>While some of these tools, such as dockerrun, run programmatic scripts, others, such as planning, are more complex and call LLMs themselves.&lt;/p>
&lt;h3 id="agent-re-architecture-moving-beyond-codex">Agent Re-Architecture: Moving Beyond Codex&lt;/h3>
&lt;p>We transitioned away from OpenAI&amp;rsquo;s Codex agent implementation. While powerful, Codex&amp;rsquo;s framework was overly reliant on its CLI frontend, which added unnecessary complexity and limited customizability for our research context.&lt;/p>
&lt;p>We implemented our own lightweight, customizable agent pipeline that integrates LLM-based planning with iterative execution. Conceptually, the agent executes the following loop:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Repo Scanning&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Hardware Check&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Planning &amp;amp; Initial Dockerfile Generation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Docker Execution&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Progress Summarization &amp;amp; Adjustment&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Iterative Dockerfile Refinement (up to 20 rounds)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Success Check &amp;amp; Logging&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>This new agent design is easier to control, extend, and debug—aligning better with the needs of reproducibility research.&lt;/p>
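&lt;p>The loop above can be outlined as follows. This is a simplified sketch, not EnvGym&amp;rsquo;s code: the four callables stand in for the tools described earlier, their signatures are assumptions, and the cap of 20 is applied here to the total number of build attempts.&lt;/p>

```python
MAX_ROUNDS = 20  # refinement cap mentioned in the loop description


def build_environment(repo, write_initial, run_docker, summarize, revise):
    """Iterate on a Dockerfile until the build succeeds or rounds run out.

    The four callables are placeholders for the agent's tools.
    """
    dockerfile = write_initial(repo)
    history = []
    for round_number in range(1, MAX_ROUNDS + 1):
        ok, output = run_docker(dockerfile)
        history.append((round_number, ok))
        if ok:
            return dockerfile, history
        # Condense the build output, then revise the Dockerfile with it.
        feedback = summarize(output)
        dockerfile = revise(dockerfile, feedback)
    return None, history


def demo():
    """Dry run with stub tools: the build passes after two revisions."""
    return build_environment(
        repo="example-repo",
        write_initial=lambda repo: "FROM python:3.11\n",
        run_docker=lambda df: (df.count("# fixed") >= 2, "missing dependency"),
        summarize=lambda output: output.upper(),
        revise=lambda df, feedback: df + "# fixed\n",
    )
```

&lt;p>Keeping the tools behind plain callables makes it straightforward to swap an LLM-backed step for a scripted one when debugging a single stage of the loop.&lt;/p>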
&lt;h3 id="prompt-engineering">Prompt Engineering&lt;/h3>
&lt;p>For each tool that requires LLMs to function, we created a set of custom prompts that outline the task and break down the goals. For instance, the prompt used in summarize differs from the one in planning, allowing us to optimize the behavior of LLM agents per context.&lt;/p>
&lt;h3 id="performance-gains">Performance Gains&lt;/h3>
&lt;p>With these improvements, EnvGym now successfully replicates 9 repositories, surpassing our baseline Codex agent which struggled with the same set. We’ve observed more reliable planning, better handling of edge-case dependencies, and faster convergence in iterative Dockerfile revisions.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;h3 id="granular-evaluation-metric">Granular Evaluation Metric&lt;/h3>
&lt;p>We plan to adopt a tree-structured rubric-based evaluation, inspired by PaperBench. Instead of binary success/failure, each repo will be assigned a reproducibility score from 0–100.&lt;/p>
&lt;p>Key tasks include:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Rubric Design: Define a hierarchical rubric with criteria like dependency resolution, test success rate, runtime match, etc.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Manual Annotation: Build a dataset of ground-truth rubrics for a subset of repos to calibrate our automatic judge.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Judge Implementation: Develop an LLM-based judge function that takes (i) rubric and (ii) environment state, and returns a reproducibility score.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example of a rubric tree" srcset="
/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_ae181d659b85544bd98fa2bbdbe0c09d.webp 400w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_700416bce638eba7acc49573f12b11b0.webp 760w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_ae181d659b85544bd98fa2bbdbe0c09d.webp"
width="557"
height="497"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Source: Starace, Giulio, et al. &amp;ldquo;PaperBench: Evaluating AI&amp;rsquo;s Ability to Replicate AI Research.&amp;rdquo; arXiv preprint arXiv:2504.01848 (2025).&lt;/p>
&lt;p>This will make EnvGym suitable for benchmarking. We will run our new method and obtain a score to compare with baseline methods!&lt;/p>
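&lt;p>As a rough illustration of how a tree-structured rubric could yield a single 0 to 100 score, the sketch below assumes that each leaf is scored by the judge and that a parent&amp;rsquo;s score is the weighted mean of its children. The class name, weights, and example criteria are all hypothetical.&lt;/p>

```python
from dataclasses import dataclass, field


@dataclass
class RubricNode:
    name: str
    weight: float = 1.0
    score: float = 0.0  # leaf score in [0, 100], as set by the judge
    children: list = field(default_factory=list)


def aggregate(node: RubricNode) -> float:
    """Return the node's score: leaves as-is, inner nodes as a weighted mean."""
    if not node.children:
        return node.score
    total_weight = sum(child.weight for child in node.children)
    weighted = sum(child.weight * aggregate(child) for child in node.children)
    return weighted / total_weight


# Example tree with made-up criteria and scores.
rubric = RubricNode("reproducibility", children=[
    RubricNode("dependency resolution", weight=2, score=100),
    RubricNode("tests", weight=2, children=[
        RubricNode("unit tests pass", score=50),
        RubricNode("integration tests pass", score=100),
    ]),
    RubricNode("runtime match", weight=1, score=0),
])

overall = aggregate(rubric)
```

&lt;p>With these example numbers the tests subtree averages to 75, and the root combines 100, 75, and 0 with weights 2, 2, and 1, so partial progress is reflected in the score instead of collapsing to a binary failure.&lt;/p>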
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>EnvGym has made strong progress toward automating reproducibility in computational research. Through modularization, agentic design, and prompt optimizations, we’ve surpassed existing baselines and laid the groundwork for even more improvement.&lt;/p>
&lt;p>The upcoming focus on metrics and benchmarking will elevate EnvGym from a functional prototype to a standardized reproducibility benchmark tool and also quantitatively prove that our new agentic method is better than existing tools such as Codex. Excited for what&amp;rsquo;s to come!&lt;/p>
</description></item><item><title>Mid-term Blog: StatWrap: Cross-Project Searching and Classification using Local Indexing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone!&lt;br>
I am Debangi Ghosh from India, an undergraduate student at the Indian Institute of Technology (IIT) BHU, Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/northwestern/statwrap/">StatWrap: Cross-Project Searching and Classification using Local Indexing&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1dxyBP2oMJwYDCKyIWzr465zNmm6UWtnI/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>, focuses on developing a full-text search service within the StatWrap user interface. This involves evaluating different search libraries and implementing a classification system to distinguish between active and past projects.&lt;/p>
&lt;h2 id="about-the-project">&lt;strong>About the Project&lt;/strong>&lt;/h2>
&lt;p>As part of the project, I am working on enhancing the usability of StatWrap by enabling efficient cross-project search capabilities. The goal is to make it easier for investigators to discover relevant projects, notes, and assets—across both current and archived work—using information that is either user-entered or passively collected by StatWrap.&lt;/p>
&lt;p>Given the sensitivity of the data involved, one of the key requirements is that all indexing and search operations must be performed locally. To address this, my responsibilities include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Evaluating open-source search libraries&lt;/strong> suitable for local indexing and retrieval&lt;/li>
&lt;li>&lt;strong>Building the full-text search functionality&lt;/strong> directly into the StatWrap UI to allow seamless querying across projects&lt;/li>
&lt;li>&lt;strong>Ensuring reliability&lt;/strong> through the development of unit tests and comprehensive system testing&lt;/li>
&lt;li>&lt;strong>Implementing a classification system&lt;/strong> to label projects as “Active,” “Pinned,” or “Past” within the user interface&lt;/li>
&lt;/ul>
&lt;p>This project offers a great opportunity to work at the intersection of software development, information retrieval, and user-centric design—while contributing to research reproducibility and collaboration within scientific workflows.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>It has been more than six weeks since the project began, and significant progress has been made. Here&amp;rsquo;s a breakdown:&lt;/p>
&lt;h3 id="1-descriptive-comparison-of-open-source-libraries">1. &lt;strong>Descriptive Comparison of Open-Source Libraries&lt;/strong>&lt;/h3>
&lt;p>I compared various open-source search libraries against the following evaluation criteria: &lt;strong>indexing speed, search speed, memory usage, typo tolerance, fuzzy searching, partial matching, full-text queries, contextual search, Boolean support, exact word match, installation ease, maintenance, documentation&lt;/strong>, and &lt;strong>developer experience&lt;/strong>.&lt;/p>
&lt;h3 id="2-the-libraries">2. &lt;strong>The Libraries&lt;/strong>&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Lunr.js&lt;/strong>&lt;br>
A small, client-side full-text search engine that mimics Solr capabilities.&lt;/p>
&lt;ul>
&lt;li>Field-based search, boosting&lt;/li>
&lt;li>Supports TF-IDF, inverted index&lt;/li>
&lt;li>No built-in fuzzy search (only basic wildcards)&lt;/li>
&lt;li>Can serialize/deserialize index&lt;/li>
&lt;li>Not designed for large datasets&lt;/li>
&lt;li>Moderate memory usage and indexing speed&lt;/li>
&lt;li>Good documentation&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Static websites or SPAs needing simple in-browser search&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>ElasticLunr.js&lt;/strong>&lt;br>
A lightweight, more flexible alternative to Lunr.js.&lt;/p>
&lt;ul>
&lt;li>Dynamic index (add/remove docs)&lt;/li>
&lt;li>Field-based and weighted search&lt;/li>
&lt;li>No advanced fuzzy matching&lt;/li>
&lt;li>Faster and more customizable than Lunr&lt;/li>
&lt;li>Smaller footprint&lt;/li>
&lt;li>Easy to use and maintain&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Developers wanting Lunr-like features with simpler customization&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Fuse.js&lt;/strong>&lt;br>
A fuzzy search library ideal for small to medium datasets.&lt;/p>
&lt;ul>
&lt;li>Fuzzy search with typo tolerance&lt;/li>
&lt;li>Deep key/path searching&lt;/li>
&lt;li>No need to build index&lt;/li>
&lt;li>Highly configurable (threshold, distance, etc.)&lt;/li>
&lt;li>Linear scan = slower on large datasets&lt;/li>
&lt;li>Not full-text search (scoring-based match)&lt;/li>
&lt;li>Extremely easy to set up and use&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Fuzzy search in small in-memory arrays (e.g., auto-suggest, dropdown filters)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>FlexSearch&lt;/strong>&lt;br>
A blazing-fast, modular search engine with advanced indexing options.&lt;/p>
&lt;ul>
&lt;li>Extremely fast search and indexing&lt;/li>
&lt;li>Supports phonetic, typo-tolerant, and partial matching&lt;/li>
&lt;li>Asynchronous support&lt;/li>
&lt;li>Multi-language + Unicode-friendly&lt;/li>
&lt;li>Low memory footprint&lt;/li>
&lt;li>Configuration can be complex for beginners&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: High-performance search in large/multilingual datasets&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>MiniSearch&lt;/strong>&lt;br>
A small, full-text search engine with balanced performance and simplicity.&lt;/p>
&lt;ul>
&lt;li>Fast indexing and searching&lt;/li>
&lt;li>Fuzzy search, stemming, stop words&lt;/li>
&lt;li>Field boosting and prefix search&lt;/li>
&lt;li>Compact, can serialize index&lt;/li>
&lt;li>Clean and modern API&lt;/li>
&lt;li>Lightweight and easy to maintain&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Balanced, in-browser full-text search for moderate datasets&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Search-Index&lt;/strong>&lt;br>
A persistent, full-featured search engine for Node.js and browsers.&lt;/p>
&lt;ul>
&lt;li>Persistent storage with LevelDB&lt;/li>
&lt;li>Real-time indexing&lt;/li>
&lt;li>Fielded queries, faceting, filtering&lt;/li>
&lt;li>Advanced queries (Boolean, range, etc.)&lt;/li>
&lt;li>Slightly heavier setup&lt;/li>
&lt;li>Good for offline/local-first apps&lt;/li>
&lt;li>Browser usage more complex than others&lt;/li>
&lt;li>&lt;strong>Best for&lt;/strong>: Node.js apps, &lt;strong>not directly compatible with the Electron + React environment of StatWrap&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="3-developer-experience-and-maintenance">3. Developer Experience and Maintenance&lt;/h3>
&lt;p>We analyzed the download trends of the search libraries using npm trends, and also reviewed their maintenance statistics to assess how frequently they are updated.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="DOWNLOADS" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_2981b0e25cc7e6da71dd1af69f1ab499.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_52b5a1c87803e2c8a2f59ad52703cd75.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/downloads_hu3acc13cb2503d87ec01b259eecff7d9f_205568_2981b0e25cc7e6da71dd1af69f1ab499.webp"
width="760"
height="362"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;br>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Maintenance" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_50f35746c2224661759e3d1f68308f5c.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_1f83a8585ae086eae8ad16a0d18c8fff.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/Maintenance_hub392779bb7551900858e36e62009d315_166372_50f35746c2224661759e3d1f68308f5c.webp"
width="760"
height="261"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="4-comparative-analysis-after-testing">4. Comparative Analysis After Testing&lt;/h3>
&lt;p>Each search library was benchmarked against a predefined set of queries based on the same evaluation criteria.&lt;br>
We are yet to finalize the weights for each criterion, which will be done during the end-term evaluation.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="COMPARATIVE ANALYSIS" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_cf08ab4466e54fc0970dac451ab583d2.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_4d08ea843125818ade4b1288b2ed91fd.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/image_huff63b524c7af2307fdfe0ebf7a2c55bc_128809_cf08ab4466e54fc0970dac451ab583d2.webp"
width="760"
height="578"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="5-the-user-interface">5. The User Interface&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="User Interface" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_ad72fdc47d934ea42f989055b49d88aa.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_51decc3c2ce6793ca567153dd67113d0.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/UI_hu614745e803a206ba95d1613340cef4da_263973_ad72fdc47d934ea42f989055b49d88aa.webp"
width="760"
height="475"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;br>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Debug Tools" srcset="
/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_e86edc8fa7aba824f1fd8a90948c619c.webp 400w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_ba6358e5089040847a0e39704677cc12.webp 760w,
/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250715-debangi29/image-1_huff1ce04307fd90cec714c35adb969f67_82199_e86edc8fa7aba824f1fd8a90948c619c.webp"
width="760"
height="482"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The user interface includes options to search using three search modes (Basic, Advanced, Boolean operators) with configurable parameters. Results are sorted based on relevance score (highest first), and also grouped by category.&lt;/p>
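&lt;p>As a rough illustration of the sorting and grouping described above, the following minimal Python sketch orders hits by relevance score and buckets them by category. StatWrap itself is an Electron + React application, so this is not its code; the field names &lt;code>score&lt;/code> and &lt;code>category&lt;/code> and the sample hits are assumptions made for the example.&lt;/p>

```python
from collections import defaultdict


def group_results(results):
    """Group hits by category; each group is sorted by score, highest first."""
    grouped = defaultdict(list)
    # Sorting once up front means every per-category list is already ordered.
    for hit in sorted(results, key=lambda h: h["score"], reverse=True):
        grouped[hit["category"]].append(hit)
    return dict(grouped)


# Hypothetical hits, only for illustration.
hits = [
    {"id": "notes.md", "category": "files", "score": 1.2},
    {"id": "Trial A", "category": "projects", "score": 3.4},
    {"id": "README.md", "category": "files", "score": 2.8},
]
grouped = group_results(hits)
for category, items in grouped.items():
    print(category, [item["id"] for item in items])
```

&lt;p>Because the categories are created in the order their best hit appears, the category holding the top-scoring result is listed first.&lt;/p>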
&lt;h3 id="6-overall-functioning">6. Overall Functioning&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Indexing Workflow&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Projects are processed sequentially&lt;/li>
&lt;li>Metadata, files, people, and notes are indexed (larger files are queued for later)&lt;/li>
&lt;li>Uses a &amp;ldquo;brute-force&amp;rdquo; recursive approach to walk through project directories
&lt;ul>
&lt;li>Skips directories like &lt;code>node_modules&lt;/code>, &lt;code>.git&lt;/code>, &lt;code>.statwrap&lt;/code>&lt;/li>
&lt;li>Identifies eligible text files for indexing&lt;/li>
&lt;li>Logs progress every 10 files&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Document Creation Logic&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Reads file content as UTF-8 text&lt;/li>
&lt;li>Builds searchable documents with filename, content, and metadata&lt;/li>
&lt;li>Auto-generates tags based on content and file type&lt;/li>
&lt;li>Adds documents to the search index and document store&lt;/li>
&lt;li>Handles errors gracefully with debug logging&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Search Functionality&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Uses field-weighted search&lt;/li>
&lt;li>Enriches results with document metadata&lt;/li>
&lt;li>Supports filtering by type or project&lt;/li>
&lt;li>Groups results by category (files, projects, people, etc.)&lt;/li>
&lt;li>Implements caching for improved performance&lt;/li>
&lt;li>Search statistics are generated to monitor performance&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
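&lt;p>The indexing workflow and document creation steps listed above can be sketched roughly as follows. This is a simplified Python illustration rather than the StatWrap implementation: the set of eligible extensions, the function names, and the shape of the document dictionary are all assumptions, while the skipped directories and the progress interval of 10 files come from the description above.&lt;/p>

```python
import os

SKIP_DIRS = {"node_modules", ".git", ".statwrap"}
# Assumed list of "eligible text files"; the real criteria are not given here.
TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".r", ".py"}


def find_indexable_files(root, log_every=10):
    """Walk a project directory recursively and collect eligible text files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in TEXT_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
                if len(found) % log_every == 0:
                    print(f"eligible files found so far: {len(found)}")
    return found


def build_document(path):
    """Read a file as UTF-8 and wrap it in a searchable document."""
    try:
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
    except (OSError, UnicodeDecodeError) as exc:
        # Fail softly: log the problem and skip the file.
        print(f"skipping {path}: {exc}")
        return None
    return {"id": path, "filename": os.path.basename(path), "content": content}
```

&lt;p>A caller would pass each path returned by &lt;code>find_indexable_files&lt;/code> to &lt;code>build_document&lt;/code> and add the non-empty results to whichever search index is eventually selected.&lt;/p>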
&lt;h2 id="challenges-and-end-term-goals">Challenges and End-Term Goals&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>In-Memory Index and Metadata Storage&lt;/strong>&lt;br>
Most JavaScript search libraries (like Fuse.js, Lunr, MiniSearch) store indexes entirely in memory, which can become problematic for large-scale datasets. A key challenge is designing a scalable solution that allows for disk persistence or lazy loading to prevent memory overflows.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Choosing the Scoring Weights&lt;/strong>&lt;br>
An important challenge is tuning the relevance scoring by assigning appropriate weights to different aspects of the search, such as exact word matches, prefix matches, and typo tolerance. For instance, we prefer exact matches to be ranked higher than fuzzy or partial matches.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Implementing the Selected Library&lt;/strong>&lt;br>
Once a library is selected (based on speed, features, and compatibility with Electron + React), the next challenge is integrating it into StatWrap efficiently—ensuring local indexing, accurate search results, and smooth performance even with large projects.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Classifying Active and Past Projects in the User Interface&lt;/strong>&lt;br>
To improve navigation and search scoping, we plan to introduce three project sections in the interface: &lt;strong>Pinned&lt;/strong>, &lt;strong>Active&lt;/strong>, and &lt;strong>Past&lt;/strong> projects. This classification will help users prioritize relevant content while enabling smarter indexing strategies.&lt;/p>
&lt;/li>
&lt;/ul>
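&lt;p>To make the weighting challenge above more concrete, here is a minimal Python sketch of a scorer that ranks exact matches above prefix matches and prefix matches above fuzzy matches. The weight values and the similarity threshold are placeholders, not the weights we will settle on, and the fuzzy comparison uses the &lt;code>difflib&lt;/code> module from the Python standard library rather than any of the libraries evaluated earlier.&lt;/p>

```python
from difflib import SequenceMatcher

# Placeholder weights: exact beats prefix, prefix beats fuzzy.
WEIGHTS = {"exact": 3.0, "prefix": 2.0, "fuzzy": 1.0}
FUZZY_THRESHOLD = 0.8  # assumed minimum similarity for a fuzzy hit


def term_score(query, term):
    """Score one term of a document against the query."""
    query, term = query.lower(), term.lower()
    if term == query:
        return WEIGHTS["exact"]
    if term.startswith(query):
        return WEIGHTS["prefix"]
    similarity = SequenceMatcher(None, query, term).ratio()
    if similarity >= FUZZY_THRESHOLD:
        # Scale the fuzzy weight by how close the match is.
        return WEIGHTS["fuzzy"] * similarity
    return 0.0


def score_document(query, text):
    """Score a document by its best-matching term."""
    return max((term_score(query, term) for term in text.split()), default=0.0)


for text in ["the cohort study", "cohorts enrolled", "cohrot analysis", "banana"]:
    print(text, score_document("cohort", text))
```

&lt;p>Since a fuzzy score can never exceed the fuzzy weight, the ordering exact, prefix, fuzzy holds as long as the three weights are strictly decreasing.&lt;/p>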
&lt;p>Stay tuned for the next blog!&lt;/p></description></item><item><title>Enhancing Reproducibility in RAG Frameworks for Scientific Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/pnnl/llm_rag_reproducibility/20250625-wbq321/</link><pubDate>Wed, 25 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/pnnl/llm_rag_reproducibility/20250625-wbq321/</guid><description>&lt;p>Hello, I&amp;rsquo;m Baiqiang. As part of the &lt;a href="https://ucsc-ospo.github.io/project/osre25/pnnl/llm_rag_reproducibility/" target="_blank" rel="noopener">Enhancing Reproducibility in RAG Frameworks for Scientific Workflows&lt;/a> project, I am excited to introduce my work on a crucial challenge in modern computational science. My &lt;a href="https://www.overleaf.com/read/fcbxtpngdnhw#8cc2c8" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of Luanzheng &amp;ldquo;Lenny&amp;rdquo; Guo at Pacific Northwest National Laboratory and Dongfang Zhao at the University of Washington aims to enhance the reproducibility of AI-driven scientific workflows.&lt;/p>
&lt;h3 id="the-problem-a-crisis-of-confidence-in-ai-for-science">The Problem: A Crisis of Confidence in AI for Science&lt;/h3>
&lt;p>Large Language Models (LLMs) are transforming scientific research, from accelerating literature reviews to generating novel hypotheses. However, their power is matched by their pitfalls: a tendency to &amp;ldquo;hallucinate&amp;rdquo; facts and a lack of transparency. Retrieval-Augmented Generation (RAG) was developed as a powerful solution, grounding LLM outputs in factual evidence retrieved from a specific knowledge base (like a database of scientific papers).&lt;/p>
&lt;p>But a hidden problem lurks within RAG: &lt;strong>non-determinism&lt;/strong>. The very first step of a RAG system—the similarity search that finds relevant documents—can produce different results even when asked the same question. Variations in indexing algorithms, data updates, or even the underlying software can change which documents are retrieved. For science, this is a critical flaw. If an experiment cannot be repeated with the same results, its conclusions cannot be trusted. This project tackles that challenge head-on.&lt;/p>
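&lt;p>One simple way to put a number on this kind of variation is to compare the document lists returned by two runs of the same query. The Python sketch below computes a set overlap (Jaccard index) and a Kendall-style rank correlation over the documents both runs share. The document IDs are made up, and these helper functions are illustrative assumptions rather than part of the planned benchmarking suite.&lt;/p>

```python
from itertools import combinations


def jaccard_overlap(run_a, run_b):
    """Fraction of retrieved documents that the two runs have in common."""
    a, b = set(run_a), set(run_b)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


def kendall_tau(run_a, run_b):
    """Rank agreement over shared documents: 1.0 same order, -1.0 reversed."""
    in_b = set(run_b)
    shared = [doc for doc in run_a if doc in in_b]
    rank_b = {doc: position for position, doc in enumerate(run_b)}
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0
    # Each pair (x, y) is ordered x before y in run_a; it is concordant
    # when run_b keeps that order.
    concordant = sum(1 for x, y in pairs if rank_b[y] > rank_b[x])
    return (2 * concordant - len(pairs)) / len(pairs)


# Hypothetical top-4 results for the same query from two runs.
run_1 = ["d1", "d2", "d3", "d4"]
run_2 = ["d1", "d3", "d2", "d5"]
print(jaccard_overlap(run_1, run_2))
print(kendall_tau(run_1, run_2))
```

&lt;p>A fully deterministic retrieval step would give 1.0 for both values on every repeated query.&lt;/p>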
&lt;h3 id="our-mission-forging-a-path-to-reproducible-rag">Our Mission: Forging a Path to Reproducible RAG&lt;/h3>
&lt;p>This project proposes a comprehensive solution to systematically identify, measure, and mitigate non-determinism in RAG frameworks. Our goal is to empower researchers to build and use AI tools with confidence.&lt;/p>
&lt;p>Our approach is built on four key pillars:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Systematic Analysis:&lt;/strong> We will conduct a deep dive into popular RAG components (like FAISS, ScaNN, and HNSW) to pinpoint the exact sources of randomness and variability.&lt;/li>
&lt;li>&lt;strong>Rigorous Benchmarking:&lt;/strong> We will develop a public, open-source benchmarking suite using standardized scientific datasets (from PubMed, arXiv, etc.). This will allow anyone to quantitatively measure the reproducibility of their own RAG pipeline using clear metrics like retrieval overlap and rank correlation.&lt;/li>
&lt;li>&lt;strong>Targeted Enhancements:&lt;/strong> Based on our findings, we will implement practical solutions, including:
&lt;ul>
&lt;li>Promoting deterministic algorithms and configurations.&lt;/li>
&lt;li>Building robust data versioning and provenance tracking tools (inspired by DVC and Git LFS).&lt;/li>
&lt;li>Creating tools for precise configuration management to capture the entire experimental setup.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Practical Guidance and Open Source Tools:&lt;/strong> We will distill our insights into comprehensive documentation, reusable code examples, and best practices. All tools and findings will be contributed back to the open-source community.&lt;/li>
&lt;/ol></description></item><item><title>From Friction to Flow: Why I'm Building Widgets for Reproducible Research</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/jupyter-widgets/20250624-nbrewer/</link><pubDate>Tue, 24 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/jupyter-widgets/20250624-nbrewer/</guid><description>&lt;blockquote>
&lt;p>This summer, I’m building Jupyter Widgets to reduce friction in reproducible workflows on Chameleon. Along the way, I’m reflecting on what usability teaches us about the real meaning of reproducibility.&lt;/p>
&lt;/blockquote>
&lt;h2 id="supercomputing-competition-reproducibility-reality-check">Supercomputing Competition: Reproducibility Reality Check&lt;/h2>
&lt;p>My first reproducibility experience threw me into the deep end—trying to recreate a tsunami simulation with a GitHub repository, a scientific paper, and a lot of assumptions. I was part of a student cluster competition at the Supercomputing Conference, where one of our challenges was to reproduce the results of a prior-year paper. I assumed “reproduce” meant something like “re-run the code and get the same numbers.” But what we actually had to do was rebuild the entire computing environment from scratch—on different hardware, with different software versions, and vague documentation. I remember thinking: &lt;em>If all these conditions are so different, what are we really trying to learn by conducting reproducibility experiments?&lt;/em> That experience left me with more questions than answers, and those questions have stayed with me. In fact, they’ve become central to my PhD research.&lt;/p>
&lt;h2 id="summer-of-reproducibility-lessons-from-100-experiments-on-chameleon">Summer of Reproducibility: Lessons from 100+ Experiments on Chameleon&lt;/h2>
&lt;p>I’m currently a PhD student and research software engineer exploring questions around what computational reproducibility really means, and when and why it matters. I also participated in the &lt;strong>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/repronb/">Summer of Reproducibility 2024&lt;/a>&lt;/strong>, where I helped assess over 100 public experiments on the Chameleon platform. &lt;a href="https://doi.org/10.1109/e-Science62913.2024.10678673" target="_blank" rel="noopener">Our analysis&lt;/a> revealed key friction points—especially around usability—that don’t necessarily prevent reproducibility in the strictest sense, but introduce barriers in terms of time, effort, and clarity. These issues may not stop an expert from reproducing an experiment, but they can easily deter others from even trying. This summer’s project is about reducing that friction—some of which I experienced firsthand—by improving the interface between researchers and the infrastructure they rely on.&lt;/p>
&lt;h2 id="from-psychology-labs-to-jupyter-notebooks-usability-is-central-to-reproducibility">From Psychology Labs to Jupyter Notebooks: Usability is Central to Reproducibility&lt;/h2>
&lt;p>My thinking shifted further when I was working as a research software engineer at Purdue, supporting a psychology lab that relied on a complex statistical package. For most researchers in the lab, using the tool meant wrestling with cryptic scripts and opaque parameters. So I built a simple Jupyter-based interface to help them visualize input matrices, validate settings, and run analyses without writing code. The difference was immediate: suddenly, people could actually use the tool. It wasn’t just more convenient—it made the research process more transparent and repeatable. That experience was a turning point for me. I realized that usability isn’t a nice-to-have; it’s critical for reproducibility.&lt;/p>
&lt;h2 id="teaching-jupyter-widget-tutorials-at-scipy">Teaching Jupyter Widget Tutorials at SciPy&lt;/h2>
&lt;p>Since that first experience, I’ve leaned into building better interfaces for research workflows—especially using Jupyter Widgets. Over the past few years, I’ve developed and taught tutorials on how to turn scientific notebooks into interactive web apps, including at the &lt;strong>SciPy conference&lt;/strong> in &lt;a href="https://github.com/Jupyter4Science/scipy23-jupyter-web-app-tutorial" target="_blank" rel="noopener">2023&lt;/a> and &lt;a href="https://github.com/Jupyter4Science/scipy2024-jupyter-widgets-tutorial" target="_blank" rel="noopener">2024&lt;/a>. These tutorials go beyond the basics: I focus on building real, multi-tab applications that reflect the complexity of actual research tools. Teaching others how to do this has deepened my own knowledge of the widget ecosystem and reinforced my belief that good interfaces can dramatically reduce the effort it takes to reproduce and reuse scientific code. That’s exactly the kind of usability work I’m continuing this summer—this time by improving the interface between researchers and the Chameleon platform itself.&lt;/p>
&lt;h2 id="making-chameleon-even-more-reproducible-with-widgets">Making Chameleon Even More Reproducible with Widgets&lt;/h2>
&lt;p>This summer, I’m returning to Chameleon with a more focused goal: reducing some of the friction I encountered during last year’s reproducibility project. One of Chameleon’s standout features is its Jupyter-based interface, which already goes a long way toward making reproducibility more achievable. My work builds on that strong foundation by improving and extending interactive widgets in the &lt;strong>Python-chi&lt;/strong> library — making tasks like provisioning resources, managing leases, and tracking experiment progress on Chameleon even more intuitive. For example, instead of manually digging through IDs to find an existing lease, a widget could present your current leases in a dropdown or table, making it easier to pick up where you left off and avoid unintentionally reserving unnecessary resources. It’s a small feature, but smoothing out this kind of interaction can make the difference between someone giving up or trying again. That’s what this project is about.&lt;/p>
&lt;h2 id="looking-ahead-building-for-people-not-just-platforms">Looking Ahead: Building for People, Not Just Platforms&lt;/h2>
&lt;p>I’m excited to spend the next few weeks digging into these questions—not just about what we can build, but how small improvements in usability can ripple outward to support more reproducible, maintainable, and accessible research. Reproducibility isn’t just about rerunning code; it’s about supporting the people who do the work. I’ll be sharing updates as the project progresses, and I’m looking forward to learning (and building) along the way. I’m incredibly grateful to once again take part in this paid experience, made possible by the 2025 Open Source Research Experience team and my mentors.&lt;/p></description></item><item><title>Applying MLOps to overcome reproducibility barriers in machine learning research</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/</link><pubDate>Sun, 22 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/</guid><description>&lt;h3 id="about-the-project">About the Project&lt;/h3>
&lt;p>Hello! I&amp;rsquo;m Ahmed, an undergraduate Computer Science student at the University of Khartoum. I&amp;rsquo;m working on making machine learning research more reproducible on open-access research facilities like the &lt;a href="chameleoncloud.org">Chameleon testbed&lt;/a>, under the project &lt;a href="https://ucsc-ospo.github.io/project/osre25/nyu/mlops/" target="_blank" rel="noopener">Applying MLOps to overcome reproducibility barriers in machine learning research&lt;/a>, mentored by Prof. &lt;a href="https://ucsc-ospo.github.io/author/fraida-fund/" target="_blank" rel="noopener">Fraida Fund&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/mohamed-saeed/" target="_blank" rel="noopener">Mohamed Saeed&lt;/a>. As part of this project, my &lt;a href="https://docs.google.com/document/d/146PutdVy7cWSf_Gn8qcn0Ba2llMHjNtHIQzZ5a-xRvQ/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> aims to build a template generator that creates repositories for reproducible model training on the Chameleon testbed.&lt;/p>
&lt;h3 id="reproducibility">Reproducibility&lt;/h3>
&lt;blockquote>
&lt;p>&lt;em>We argue that unless reproducing research becomes as vital and mainstream part of scientific exploration as reading papers is today, reproducibility will be hard to sustain in the long term because the incentives to make research results reproducible won’t outweigh the still considerable costs&lt;/em>&lt;/p>
&lt;p>— &lt;a href="https://www.chameleoncloud.org/media/filer_public/25/18/25189b96-c3a2-4a55-b99b-c25322fe6682/reproducibility_on_chameleon-3.pdf" target="_blank" rel="noopener">Three Pillars of Practical Reproducibility Paper&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Academic code quality" srcset="
/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_1883a01619446991471adb625dc1a04c.webp 400w,
/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_a0629a8267968adb7dca83065a454987.webp 760w,
/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/codquality_huc392b48b950e52e3828e898b495a387e_63844_1883a01619446991471adb625dc1a04c.webp"
width="733"
height="646"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>By reproducibility in science, we refer to the ability to obtain consistent results using the same methods and conditions as a previous study. In simple terms, if I use the same data and methodology that were used before, I should obtain the same results. This principle applies to almost every scientific field, including both machine learning applied to scientific research and core machine learning.&lt;/p>
&lt;h3 id="challenges-in-reproducibility">Challenges in Reproducibility&lt;/h3>
&lt;p>Just as the famous paper on the &lt;a href="https://www.nature.com/articles/d41586-019-00067-3" target="_blank" rel="noopener">reproducibility crisis in science&lt;/a> was published in 2016, similar discussions have appeared in the machine learning research setting. The paper &lt;a href="https://ojs.aaai.org/index.php/AAAI/article/view/11503" target="_blank" rel="noopener">State of the Art: Reproducibility in Artificial Intelligence&lt;/a> analyzed 400 papers from top AI conferences and found that around 6% shared code and approximately 33% shared test data. In contrast, 54% shared only pseudocode (a summary of the algorithm).&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Percentage of papers documenting each variable for the three factors" srcset="
/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_98f72f91d5f4040ac93d46a70ece1f4c.webp 400w,
/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_af62a4672817798441065a29b632ce1d.webp 760w,
/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/variables_hu1dd3560d8f29bff068e6ba2a71eed30f_236032_98f72f91d5f4040ac93d46a70ece1f4c.webp"
width="760"
height="312"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The lack of software dependency management, proper version control, log tracking, and effective artifact sharing makes it very difficult to reproduce research in machine learning.&lt;/p>
&lt;p>Reproducibility in machine learning is largely supported by MLOps practices. This is the case in industry, where most researchers are backed by software engineers who are responsible for setting up experimental environments or developing tools that streamline the workflow. In academic settings, however, reproducibility remains a great challenge: researchers prefer to focus on coding and pay little attention to the complexities involved in configuring their experimental environment. As a result, the adoption and standardization of MLOps practices in academia progress slowly. The best way to ensure a seamless experience with MLOps is to make these capabilities easily accessible within the researchers&amp;rsquo; workflow, by developing a tool that streamlines resource provisioning, environment setup, model training, and artifact tracking so that results are reproducible.&lt;/p>
&lt;h3 id="proposed-solution">Proposed Solution&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Solution Architecture" srcset="
/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_eca83abc0b11e0d295efffaa464eaf53.webp 400w,
/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_4abd128ad260ffc60e4a7ebd623e4e32.webp 760w,
/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/nyu/mlops/06212025-alghali/Design_hue0a2172c7dadd98f1563084aefb8ce3c_266216_eca83abc0b11e0d295efffaa464eaf53.webp"
width="760"
height="547"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>We want researchers to be able to spin up ML research instances or bare-metal nodes on the Chameleon testbed while the technical complexity of configuring and stitching everything together stays abstracted away. Users simply answer a few questions about their project info, frameworks, tools, features, and integrations (if any), and receive a fully generated, reproducible project. The project contains a provisioning/infrastructure config layer for provisioning resources on the cloud, a Dockerfile to spin up services, and persistent storage for data. An ML tracking server logs the artifacts, metadata, environment configuration, system specification (GPU type), and Git status using MLflow, backed by PostgreSQL for storing metadata and an S3-compatible MinIO bucket for storing artifacts. At its core, the ML code runs in a containerized training environment backed by persistent storage for the datasets and for the artifacts generated by the experiment; containerizing all of these components helps ensure reproducibility. We also aim to make the cloud experience easier by handling the configuration needed to set up the environment with third-party frameworks and by enabling seamless access from the container to benchmarking datasets or other necessary components from services such as Hugging Face and GitHub. For more technical details about the solution, you can read my proposal &lt;a href="https://docs.google.com/document/d/1ilm-yMEq-UTiJPGMl8tQc3Anl5cKM5RD2sUGInLjLbU" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
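&lt;p>To give a feel for the kind of information the tracking layer is meant to record, here is a small Python sketch that gathers run parameters, the interpreter version, the platform, and the current Git commit into one dictionary. It uses only the standard library and does not call MLflow; the function names and the example parameters are assumptions for illustration, and in the generated project this information would be sent to the tracking server instead of printed.&lt;/p>

```python
import json
import platform
import subprocess
import sys


def git_commit():
    """Return the current Git commit hash, or None outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def capture_run_metadata(params):
    """Collect the facts needed to describe how a training run was produced."""
    return {
        "params": params,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "git_commit": git_commit(),
    }


# Hypothetical hyperparameters for one run.
metadata = capture_run_metadata({"learning_rate": 0.001, "seed": 42})
print(json.dumps(metadata, indent=2))
```

&lt;p>Storing such a record next to every set of artifacts lets a later reader check which code version and environment produced a given result.&lt;/p>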
&lt;p>By addressing these challenges, we can accelerate scientific discovery. This benefits not only those who are conducting the research but also the ones building on top of it in the future. I look forward to sharing more updates as the project progresses, and I welcome feedback from others interested in advancing reproducibility in ML research.&lt;/p></description></item><item><title>Building a Benchmarking Suite for Cache Performance Evaluation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/harvard/cachebench/2025-06-21-haochengxia/</link><pubDate>Sat, 21 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/harvard/cachebench/2025-06-21-haochengxia/</guid><description>&lt;p>Hi! I&amp;rsquo;m Haocheng Xia, a Computer Science student at the &lt;strong>University of Illinois Urbana-Champaign&lt;/strong>, passionate about the intersection of &lt;strong>machine learning and storage systems&lt;/strong>. Specifically, I&amp;rsquo;m keen on &lt;strong>workload analysis&lt;/strong> and &lt;strong>KV cache management for large language models&lt;/strong>.&lt;/p>
&lt;p>This summer, I&amp;rsquo;m happy to be a part of &lt;strong>SoR 2025&lt;/strong> and &lt;strong>OSRE 2025&lt;/strong>. I&amp;rsquo;m contributing to the &lt;strong>CacheBench&lt;/strong> project. My initiative, &lt;strong>&amp;lsquo;Building a Benchmarking Suite for Cache Performance Evaluation,&amp;rsquo;&lt;/strong> will create a robust platform. This involves extensive simulation of existing eviction algorithms using &lt;a href="https://github.com/cacheMon/libCacheSim" target="_blank" rel="noopener">libCacheSim&lt;/a>, developing microbenchmarks, and building a user-friendly platform for researchers to effortlessly evaluate novel cache designs. The ultimate goal is to establish a competitive leaderboard.&lt;/p>
&lt;p>My contributions will include a comprehensive dataset detailing simulated &lt;strong>miss ratios&lt;/strong> and &lt;strong>throughput&lt;/strong> of current cache eviction algorithms, an extension to &lt;a href="https://github.com/cacheMon/libCacheSim" target="_blank" rel="noopener">libCacheSim&lt;/a> for executing microbenchmarks both locally and on our online platform, and the creation and ongoing maintenance of a public web leaderboard. I&amp;rsquo;m grateful to be mentored by &lt;strong>Juncheng Yang&lt;/strong> and &lt;strong>Yazhuo Zhang&lt;/strong>.&lt;/p>
&lt;p>I&amp;rsquo;m thrilled to be part of building tools that empower users and advance the vision of a more decentralized web. Looking forward to a productive summer!&lt;/p></description></item><item><title>EnvGym – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/envgym/</link><pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/envgym/</guid><description>&lt;p>Hello, My name is Yiming Cheng. I am a Pre-doc researcher in Computer Science at University of Chicago. I&amp;rsquo;m excited to be working with the Summer of Reproducibility and the Chameleon Cloud community as a project leader. My project is &lt;a href="https://github.com/eaminc/envgym" target="_blank" rel="noopener">EnvGym&lt;/a> that focuses on developing an AI-driven system to automatically generate and configure reproducible computing environments based on natural language descriptions from artifact descriptions, Trovi artifacts, and research papers.&lt;/p>
&lt;p>The complexity of environment setup often hinders reproducibility in scientific computing. My project aims to bridge the knowledge gap between experiment authors and reviewers by translating natural language requirements into actionable, reproducible configurations using AI and NLP techniques.&lt;/p>
&lt;h3 id="project-overview">Project Overview&lt;/h3>
&lt;p>EnvGym addresses fundamental reproducibility barriers by:&lt;/p>
&lt;ul>
&lt;li>Using AI to translate natural language environment requirements into actionable configurations&lt;/li>
&lt;li>Automatically generating machine images deployable on bare metal and VM instances&lt;/li>
&lt;li>Bridging the knowledge gap between experiment authors and reviewers&lt;/li>
&lt;li>Standardizing environment creation across different hardware platforms&lt;/li>
&lt;/ul>
&lt;h3 id="june-10--june-16-2025">June 10 – June 16, 2025&lt;/h3>
&lt;p>Getting started with the project setup and initial development:&lt;/p>
&lt;ul>
&lt;li>I began designing the NLP pipeline architecture to parse plain-English descriptions (e.g., &amp;ldquo;I need Python 3.9, CUDA 11, and scikit-learn&amp;rdquo;) into structured environment &amp;ldquo;recipes&amp;rdquo;&lt;/li>
&lt;li>I set up the initial project repository and development environment&lt;/li>
&lt;li>I met with my mentor Prof. Kexin Pei to discuss the project roadmap and technical approach&lt;/li>
&lt;li>I started researching existing artifact descriptions from conferences and Trovi to understand common patterns in environment requirements&lt;/li>
&lt;li>I began prototyping the backend environment builder logic that will convert parsed requirements into machine-image definitions&lt;/li>
&lt;li>I explored Chameleon&amp;rsquo;s APIs for provisioning servers and automated configuration&lt;/li>
&lt;/ul>
&lt;h3 id="next-steps">Next Steps&lt;/h3>
&lt;ul>
&lt;li>Continue developing the NLP component for requirement parsing&lt;/li>
&lt;li>Implement the core backend logic for environment generation&lt;/li>
&lt;li>Begin integration with Chameleon Cloud APIs&lt;/li>
&lt;li>Start building the user interface for environment specification&lt;/li>
&lt;/ul>
&lt;p>This is an exciting and challenging project that combines my interests in AI systems and reproducible research. I&amp;rsquo;m looking forward to building a system that will help researchers focus on their science rather than struggling with environment setup issues.&lt;/p>
&lt;p>Thanks for reading, I will keep you updated as I make progress on EnvGym!&lt;/p></description></item><item><title>Smart Environments – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250616-sam_huang/</link><pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250616-sam_huang/</guid><description>&lt;p>Hi everyone, I&amp;rsquo;m Sam! I&amp;rsquo;m excited to be working with the Argonne National Laboratory and SoR this summer on Smart Environments. Have you ever encountered a great opensource project and wanted to run it or use it locally, only to find that it&amp;rsquo;s such a headache to set up all the dependencies? Maybe your system version wasn&amp;rsquo;t correct, or a piece of software was outdated, or the dependencies were incompatible with something you had already on your machine?&lt;/p>
&lt;p>In comes EnvGym to save the day! We want EnvGym to be an agent that helps reproduce opensource projects by automatically setting up the environmental dependencies required to get them running. That&amp;rsquo;s what I will be working on for the rest of the summer! To make EnvGym work, we will be leveraging LLM agents to tackle the problem. We will use EnvGym to read documentation, understand code structures, run commands to set up environments, and reflectively react to any errors and warnings.&lt;/p>
&lt;p>To build EnvGym, I have the following to-do&amp;rsquo;s in mind:&lt;/p>
&lt;ul>
&lt;li>Building a dataset that includes repos to be reproduced&lt;/li>
&lt;li>Establishing a baseline using current methods&lt;/li>
&lt;li>Implementing the actual EnvGym algorithm&lt;/li>
&lt;li>Testing EnvGym against baseline performance and iteratively improving it&lt;/li>
&lt;li>Deploying EnvGym to real-world use cases and gathering feedback&lt;/li>
&lt;/ul>
&lt;p>Here is the repo that we are working on:
&lt;a href="https://github.com/EaminC/EnvGym/tree/main" target="_blank" rel="noopener">https://github.com/EaminC/EnvGym/tree/main&lt;/a>&lt;/p>
&lt;p>More updates to come, thanks for reading!&lt;/p></description></item><item><title>Assessing and Enhancing CC-Snapshot for Reproducible Experiment Enviroments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/cc-snapshot/20250616-zahratm/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/cc-snapshot/20250616-zahratm/</guid><description>&lt;p>Hello, my name is Zahra Temori. I am a rising senior in Computer Science at the University of Delaware. I’m excited to be working with the Summer of Reproducibility and the Chameleon Cloud community. My project, &lt;a href="https://github.com/ChameleonCloud/cc-snapshot" target="_blank" rel="noopener">cc-snapshot&lt;/a>, focuses on enhancing features that help researchers capture and share reproducible experimental environments within the Chameleon Cloud testbed.&lt;/p>
&lt;p>Detailed information about my project and my plans for the summer is available in my &lt;a href="https://docs.google.com/document/d/1kFOFL-H4WrXF7EUuXzcHLZ2p5w_DxbbWOGi-IGx39LM/edit?tab=t.0" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;h3 id="june-10--june-14-2025">June 10 – June 14, 2025&lt;/h3>
&lt;p>Getting started with the first milestone and beginning to explore the Chameleon Cloud and the project:&lt;/p>
&lt;ul>
&lt;li>I began familiarizing myself with the Chameleon Cloud platform. I created an account and successfully accessed a project.&lt;/li>
&lt;li>I learned how to launch an instance and create a lease for using computing resources.&lt;/li>
&lt;li>I met with my mentor to discuss the project goals and outline the next steps.&lt;/li>
&lt;li>I experimented with the environment and captured a snapshot to understand the process.&lt;/li>
&lt;/ul>
&lt;p>It has been less than a week, and I have learned a lot, especially about the Chameleon Cloud and how it differs from other clouds like AWS. I am excited to learn more and make progress.&lt;/p>
&lt;p>Thanks for reading, I will keep you updated as I work :)&lt;/p></description></item><item><title>Developing an Open Testbed for Edge Replication System Evaluation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250615-panjisri/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250615-panjisri/</guid><description>&lt;p>Hi, I&amp;rsquo;m Panji. I&amp;rsquo;m currently contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> under the mentorship of Fadhil I. Kurnia. You can find more details on the project proposal &lt;a href="https://drive.google.com/file/d/1CFT5CJJXbQlVPz8_A9Dxkjl7oRjESdli/view?usp=sharing" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>The primary challenge we&amp;rsquo;re addressing is the current difficulty in fairly comparing different edge replication systems. To fix this, we&amp;rsquo;re trying to build a testing platform with four key parts. We&amp;rsquo;re collecting real data about how people actually use edge services, creating a tool that can simulate realistic user traffic across many locations, building a system that mimics network delays between hundreds of edge servers, and packaging everything into an open-source toolkit.&lt;/p>
&lt;p>This will let researchers test different coordination methods like EPaxos, Raft, and others using the same data and conditions. We hope this will help provide researchers with a more standardized way to evaluate their systems. We&amp;rsquo;re working with multiple programming languages and focusing on making complex edge computing scenarios accessible to everyone in the research community.&lt;/p>
&lt;p>One of the most interesting aspects of this project is tackling the challenge of creating realistic simulations that accurately reflect the performance characteristics different coordination protocols would exhibit in actual edge deployments. The end goal is to provide the research community with a standardized, reproducible environment for edge replication.&lt;/p></description></item><item><title>Type Narrowing: Evaluate New Gradual Languages and Do Unsound Narrowings Lead to Exploits</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uutah/type-narrowing/20250615-sivasathyaseelan/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uutah/type-narrowing/20250615-sivasathyaseelan/</guid><description>&lt;p>Hello! I’m Siva Sathyaseelan D N, a pre-final year B.Tech + M.Tech Engineering student at IIT BHU, Varanasi, India. With a deep-rooted passion for software development and scientific computing, I thrive at the intersection of code and real-world problem-solving. For two years, I’ve engaged in open-source work across scientific simulation, blockchain, and cloud-native technologies through hobby projects, hackathons, internships, and an LFX mentorship. I will be working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uutah/type-narrowing/">Type Narrowing: Evaluate New Gradual Languages and Do Unsound Narrowings Lead to Exploits&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/content/authors/bennn">Ben Greenman&lt;/a>.
&lt;a href="https://docs.google.com/document/d/1QcfiOWQQBxTW3YnkCmgfz-xHwLGad4OuCMjyphbaz54/edit?usp=sharing" target="_blank" rel="noopener">My proposal can be viewed here!&lt;/a>&lt;/p></description></item><item><title>Building a Simulator for Benchmarking Replicated Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250613-mchan/</link><pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/umass/edge-replication/20250613-mchan/</guid><description>&lt;p>Hi, I&amp;rsquo;m Michael. I&amp;rsquo;m currently contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/">Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fadhil-kurnia/">Fadhil Kurnia&lt;/a>. You can find more details on the project proposal &lt;a href="https://drive.google.com/file/d/1LQCPu1h9vXAbdL6AX_E9S43dsIOndTyW/view?usp=sharing" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>What we are trying to achieve is to create a system to test and evaluate the performance of different consensus protocols and consistency models under the same application and workload. The consensus protocols and consistency models are both tested on various replicated black-box applications. Essentially, the testbed itself is able to deploy any arbitrary stateful application on multiple machines (nodes) as long as it is packaged in the form of a docker image. The consensus protocol is used to perform synchronization between the stateful part of the application (in most cases, the database). The goal is that by the end of this project, the testbed we are building has provided the functionality and abstraction to support the creation of new consensus protocols to run tests on.&lt;/p>
&lt;p>One major challenge in implementing this is handling replication across the running Docker containers. Generally, the services that can be deployed in this system are of two types:&lt;/p>
&lt;ol>
&lt;li>A Deterministic Application (an application that always returns the same output when given the same input, e.g., a simple CRUD app)&lt;/li>
&lt;li>A Non-Deterministic Application (an application that may return different outputs when given the same input, e.g., an LLM that may return different responses to the same prompt)&lt;/li>
&lt;/ol>
&lt;p>These two application types require different implementations of the consensus protocol. For a deterministic application, every request always yields the same response (and the same changes inside the application&amp;rsquo;s database), so the replication protocol can replicate the request itself to all nodes. For a non-deterministic application, by contrast, the replication protocol synchronizes the state of the database directly, since the same request may return a different response.&lt;/p></description></item><item><title>MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250614-rohan-babbar/</link><pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/mpi/20250614-rohan-babbar/</guid><description>&lt;p>Hi Everyone,&lt;/p>
&lt;p>I’m Rohan Babbar from Delhi, India. This summer, I’m excited to be working with the Argonne National Laboratory and the Chameleon Cloud community. My &lt;a href="https://ucsc-ospo.github.io/project/osre25/uchicago/mpi/" target="_blank" rel="noopener">project&lt;/a> focuses on developing an MPI Appliance to support reproducible High-Performance Computing (HPC) research on the Chameleon testbed.&lt;/p>
&lt;p>For more details about the project and the planned work for the summer, you can read my proposal &lt;a href="https://docs.google.com/document/d/1iOx95-IcEOSVxpOkL20-jT5SSDOwBiP78ysSUNpRwXs/edit?usp=sharing" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;h3 id="-community-bonding-period">👥 Community Bonding Period&lt;/h3>
&lt;p>Although the project officially started on June 2, 2025, I made good use of the community bonding period beforehand.&lt;/p>
&lt;ul>
&lt;li>I began by getting access to the Chameleon testbed, familiarizing myself with its features and tools.&lt;/li>
&lt;li>I experimented with different configurations to understand the ecosystem.&lt;/li>
&lt;li>My mentor, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ken-raffenetti/">Ken Raffenetti&lt;/a>, and I had regular check-ins to align our vision and finalize our milestones, many of which were laid out in my proposal.&lt;/li>
&lt;/ul>
&lt;h3 id="-june-2--june-14-2025">🔧 June 2 – June 14, 2025&lt;/h3>
&lt;p>Our first milestone was to build a base image with MPI pre-installed. For this:&lt;/p>
&lt;ul>
&lt;li>We decided to use &lt;a href="https://spack.io/" target="_blank" rel="noopener">Spack&lt;/a>, a flexible package manager tailored for HPC environments.&lt;/li>
&lt;li>The image includes multiple MPI implementations, allowing users to choose the one that best suits their needs and switch between them using simple &lt;a href="https://lmod.readthedocs.io/en/latest/" target="_blank" rel="noopener">Lua Module&lt;/a> commands.&lt;/li>
&lt;/ul>
&lt;p>📌 That’s all for now! Stay tuned for more updates in the next blog.&lt;/p>
&lt;p>Thanks for reading!&lt;/p></description></item><item><title>StatWrap: Cross-Project Searching and Classification using Local Indexing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250614-debangi29/</link><pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/northwestern/statwrap/20250614-debangi29/</guid><description>&lt;p>Hello👋! I am &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/debangi-ghosh/">Debangi Ghosh&lt;/a>, currently pursuing a degree in Mathematics and Computing at IIT (BHU) Varanasi, India. This summer, I will be working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/northwestern/statwrap/">StatWrap: Cross-Project Searching and Classification using Local Indexing&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>. You can view my &lt;a href="https://drive.google.com/file/d/1dxyBP2oMJwYDCKyIWzr465zNmm6UWtnI/view?usp=sharing" target="_blank" rel="noopener">project proposal&lt;/a> for more details.&lt;/p>
&lt;p>My project aims to address the challenges in project navigation and discoverability by integrating a robust full-text search capability within the user interface. Instead of relying on basic keyword-based search—where remembering exact terms can be difficult—we plan to implement a natural language-based full-text search. This approach involves two main stages: indexing, which functions like creating a searchable map of the content, and searching, which retrieves relevant information from that map. We will evaluate and compare available open-source libraries to choose and implement the most effective one.
In addition, my project aims to enhance project organization by introducing a new classification system that clearly distinguishes between “Active” and “Past” projects in the user interface. This will improve clarity, reduce clutter, and provide a more streamlined experience as the number of projects grows.&lt;/p>
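&lt;p>To make the indexing and searching stages described above concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration under stated assumptions: the function names (&lt;code>tokenize&lt;/code>, &lt;code>build_index&lt;/code>, &lt;code>search&lt;/code>) and the sample documents are hypothetical, it does plain keyword matching rather than natural language search, and it is not the library the project will end up choosing.&lt;/p>
&lt;pre>&lt;code class="language-python">import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing stage: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Searching stage: return ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results = results.intersection(index.get(token, set()))
    return results


# Hypothetical project files, used only for illustration.
documents = {
    "project-a/README.md": "Survival analysis of the cohort data",
    "project-b/notes.txt": "Cleaning steps for cohort data import",
}
index = build_index(documents)
print(sorted(search(index, "cohort data")))
&lt;/code>&lt;/pre>
&lt;p>The index is built once and reused for every query, which is why the two stages are kept separate; a real full-text search library would typically add ranking and fuzzy matching on top of this basic structure.&lt;/p>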
&lt;p>Stay tuned for updates on my progress in the coming weeks! 🚀&lt;/p></description></item><item><title>Applying MLOps to overcome reproducibility barriers in machine learning research</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/nyu/mlops/</link><pubDate>Sat, 01 Mar 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/nyu/mlops/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> machine learning, MLOps, reproducibility&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, machine learning, GitOps, systems, Linux, data, Docker&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>Reproducibility remains a significant problem in machine learning research, both in core ML and in the application of ML to other areas of science. In many cases, due to inadequate experiment tracking, dependency capturing, source code versioning, data versioning, and artifact sharing, even the authors of a paper may find it challenging to reproduce their own study several years later. This makes it difficult to validate and build on previous work, and raises concerns about its trustworthiness.&lt;/p>
&lt;p>In contrast, outside of academic research, MLOps tools and frameworks have been identified as a key enabler of reliable, reproducible, and trustworthy machine learning systems in production. A good reference on this topic is:&lt;/p>
&lt;blockquote>
&lt;p>Firas Bayram and Bestoun S. Ahmed. 2025. Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach. ACM Comput. Surv. 57, 5, Article 121 (May 2025), 35 pages. &lt;a href="https://doi.org/10.1145/3708497" target="_blank" rel="noopener">https://doi.org/10.1145/3708497&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;p>This project seeks to bridge the gap between widely adopted practices in industry and academic research:&lt;/p>
&lt;ul>
&lt;li>by making it easier for researchers and scientists to use MLOps tools to support reproducibility. To achieve this, we will develop starter templates and recipes for research in computer vision, NLP, and ML for science, that have reproducibility &amp;ldquo;baked in&amp;rdquo; thanks to the integration of MLOps tools and frameworks. Researchers will launch these templates on open access research facilities like &lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon&lt;/a>.&lt;/li>
&lt;li>and, by developing complementary education and training materials to emphasize the importance of reproducibility in ML, and to explain how the tools and frameworks used in the starter templates can support this goal.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Writing a successful proposal for this project&lt;/strong>&lt;/p>
&lt;p>A good proposal for this project should -&lt;/p>
&lt;ul>
&lt;li>demonstrate a good understanding of the current barriers to reproducibility in machine learning research (specific examples are welcome),&lt;/li>
&lt;li>describe a &amp;ldquo;base&amp;rdquo; starter template, including the platforms and tools that will be integrated, as well as specific adaptations of this template for computer vision, NLP, and ML for science,&lt;/li>
&lt;li>explain the &amp;ldquo;user flow&amp;rdquo; - how a researcher would use the template to conduct an experiment or series of experiments, what the lifecycle of that experiment would look like, and how it would be made reproducible,&lt;/li>
&lt;li>include the contributor&amp;rsquo;s own ideas about how to make the starter templates more usable, and how to make the education and training materials relatable and useful,&lt;/li>
&lt;li>and show that the contributor has the necessary technical background and soft skills to contribute to this project. In particular, the contributor will need to create education and training materials that are written in a clear, straightforward, and concise manner, without unnecessary jargon. The proposal should show evidence of the contributor&amp;rsquo;s writing abilities.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Github link&lt;/strong>&lt;/p>
&lt;p>There is no pre-existing Git repository for this project - at the beginning of the summer, the contributor will create a new repository in the &lt;a href="https://github.com/teaching-on-testbeds/" target="_blank" rel="noopener">Teaching on Testbeds&lt;/a> organization, and the project materials will &amp;ldquo;live&amp;rdquo; there.&lt;/p></description></item><item><title>FairFace</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/fair-face/</link><pubDate>Fri, 28 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/fair-face/</guid><description>&lt;h3 id="fairface-reproducible-bias-evaluation-in-facial-ai-models-via-controlled-skin-tone-manipulation">FairFace: Reproducible Bias Evaluation in Facial AI Models via Controlled Skin Tone Manipulation&lt;/h3>
&lt;p>Bias in facial AI models remains a persistent issue, particularly concerning skin tone disparities. Many studies report that AI models perform differently on lighter vs. darker skin tones, but these findings are often difficult to reproduce due to variations in datasets, model architectures, and evaluation settings.
The goal of this project is to investigate bias in facial AI models by manipulating skin tone and related properties in a controlled, reproducible manner. By leveraging BioSkin, we will adjust melanin levels and other skin properties on existing human datasets to assess whether face-based AI models (e.g., classification and vision-language models) exhibit biased behavior toward specific skin tones.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Fairness &amp;amp; Bias in AI&lt;/code>, &lt;code>Face Recognition &amp;amp; Vision-Language Models&lt;/code>, &lt;code>Dataset Augmentation for Reproducibility&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Machine Learning &amp;amp; Computer Vision, Deep Learning (PyTorch/TensorFlow), Data Augmentation &amp;amp; Image Processing, Reproducibility &amp;amp; Documentation (GitHub, Jupyter Notebooks).&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (can be completed in either 175 or 350 hours, depending on the depth of analysis and number of models tested)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:davisje@ucsc.edu">James Davis&lt;/a>, &lt;a href="mailto:pang@soe.ucsc.edu">Alex Pang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="key-research-questions">Key Research Questions&lt;/h3>
&lt;ol>
&lt;li>Do AI models perform differently based on skin tone?
&lt;ul>
&lt;li>How do classification accuracy, confidence scores, and error rates change when skin tone is altered systematically?&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>What are the underlying causes of bias?
&lt;ul>
&lt;li>Is bias solely dependent on skin tone, or do other skin-related properties (e.g., texture, reflectance) contribute to model predictions?&lt;/li>
&lt;li>Is bias driven by dataset imbalances (e.g., underrepresentation of certain skin tones)?&lt;/li>
&lt;li>Do facial features beyond skin tone (e.g., structure, expression, pose) contribute to biased predictions?&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Are bias trends reproducible?
&lt;ul>
&lt;li>Can we replicate bias patterns across different datasets, model architectures, and experimental setups?&lt;/li>
&lt;li>How consistent are the findings when varying image sources and preprocessing methods?&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="specific-tasks">Specific Tasks:&lt;/h3>
&lt;ol>
&lt;li>Dataset Selection &amp;amp; Preprocessing
&lt;ul>
&lt;li>Choose appropriate face/human datasets (e.g., FairFace, CelebA, COCO-Human).&lt;/li>
&lt;li>Preprocess images to ensure consistent lighting, pose, and resolution before applying transformations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Skin Tone Manipulation with BioSkin
&lt;ul>
&lt;li>Systematically modify melanin levels while keeping facial features unchanged.&lt;/li>
&lt;li>Generate multiple variations per image (lighter to darker skin tones).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Model Evaluation &amp;amp; Bias Analysis
&lt;ul>
&lt;li>Test face classification models (e.g., ResNet, FaceNet) and vision-language models (e.g., BLIP, LLaVA) on the modified images.&lt;/li>
&lt;li>Compute fairness metrics (e.g., demographic parity, equalized odds).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Investigate Underlying Causes of Bias
&lt;ul>
&lt;li>Compare model behavior across different feature sets.&lt;/li>
&lt;li>Test whether bias persists across multiple datasets and model architectures.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Ensure Reproducibility
&lt;ul>
&lt;li>Develop an open-source pipeline for others to replicate bias evaluations.&lt;/li>
&lt;li>Provide codebase and detailed documentation for reproducibility.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol></description></item><item><title>Enhancing Reproducibility in Distributed AI Training: Leveraging Checkpointing and Metadata Analytics</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/reproducibility_w_checkpoint/</link><pubDate>Fri, 21 Feb 2025 09:00:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/reproducibility_w_checkpoint/</guid><description>&lt;p>Reproducibility in distributed AI training is a crucial challenge due to several sources of uncertainty, including stragglers, data variability, and inherent randomness. Stragglers (slower processing nodes in a distributed system) can introduce timing discrepancies that affect the synchronization of model updates, leading to inconsistent states across training runs. Data variability, stemming from non-deterministic data shuffling and differing data partitions across nodes, can also lead to variations in model performance. Additionally, inherent randomness, such as random weight initialization and stochastic processes like dropout, further compounds these challenges. Reproducibility in AI is pivotal for ensuring the credibility of AI-driven scientific findings, akin to how reproducibility underpins traditional scientific research.&lt;/p>
&lt;p>To enhance AI reproducibility, leveraging metadata analytics and visualization along with saved checkpoints offers a promising solution. Checkpointing saves snapshots of a model and its parameters at regular intervals throughout training. This practice preserves progress in the face of interruptions such as hardware failures and lets training resume without restarting from scratch. In distributed AI training, checkpointing also provides a framework for analyzing and ensuring reproducibility, because it systematically captures the training trajectory of a model. Analyzing checkpoints can specifically help identify stragglers. For example, by examining the timestamps and resource utilization data associated with each checkpoint, anomalies in processing time can be detected, revealing nodes that consistently lag behind others. This analysis enables teams to diagnose performance bottlenecks and optimize resource allocation across the distributed system, leading to smoother and more consistent training runs. Combining checkpointing with metadata analytics makes it possible to pinpoint the training iterations where delays occur, which supports targeted investigation and improves overall reproducibility and efficiency.&lt;/p>
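&lt;p>As a rough illustration of the timestamp analysis described above, the following minimal sketch flags nodes whose typical checkpoint time is well above the cluster median. The function name, the input layout (node name mapped to a list of per-checkpoint durations in seconds), and the 1.5x threshold are all assumptions made for this example, not part of any existing framework.&lt;/p>

```python
from statistics import median


def find_stragglers(checkpoint_seconds, slowdown=1.5):
    """Return nodes whose median checkpoint time exceeds slowdown x the cluster median.

    checkpoint_seconds maps a node name to the durations (in seconds) that node
    took to reach each checkpoint. Both this layout and the default threshold
    are assumptions made for this sketch.
    """
    per_node = {node: median(times) for node, times in checkpoint_seconds.items()}
    cluster_median = median(per_node.values())
    return sorted(node for node, t in per_node.items() if t > slowdown * cluster_median)


# Hypothetical durations recorded alongside three checkpoints on four nodes.
durations = {
    "node0": [10.1, 10.3, 9.9],
    "node1": [10.0, 10.2, 10.4],
    "node2": [17.8, 18.5, 19.1],
    "node3": [10.2, 9.8, 10.0],
}
print(find_stragglers(durations))  # ['node2']
```

&lt;p>Using medians rather than means keeps a single slow checkpoint from marking an otherwise healthy node as a straggler; a real analysis would likely also account for resource utilization data.&lt;/p>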
&lt;h3 id="workplan">Workplan&lt;/h3>
&lt;p>The proposed work will include: 1) Setting up a checkpointing system within the distributed AI training framework to periodically save model states and metadata; 2) Designing a metadata analysis schema for populating model and system statistics from the saved checkpoints; 3) Conducting exploratory data analysis to identify patterns, anomalies, and sources of variability in the training process; 4) Creating visualization tools to represent metadata insights with collected statistics and patterns; 5) Using insights from metadata analytics and visualization to optimize resource distribution across the distributed system and mitigate straggler effects; and 6) Disseminating results and methodologies through academic papers, workshops, and open-source contributions.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Reproducibility&lt;/code> &lt;code>AI&lt;/code> &lt;code>distributed AI&lt;/code> &lt;code>checkpoint&lt;/code> &lt;code>metadata analysis&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Enhancing Reproducibility in RAG Frameworks for Scientific Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/llm_rag_reproducibility/</link><pubDate>Thu, 20 Feb 2025 09:00:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/llm_rag_reproducibility/</guid><description>&lt;p>Retrieval-Augmented Generation (RAG) frameworks, which merge the capabilities of retrieval systems and generative models, significantly enhance the relevance and accuracy of responses produced by large language models (LLMs). These frameworks retrieve relevant documents from a large corpus and use these documents to inform the generative process, thereby improving the contextuality and precision of the generated content. Ensuring reproducibility in data queries using similarity search within these RAG frameworks is critical for maintaining the reliability and consistency of scientific workflows. Reproducibility ensures that the same input query consistently yields the same output, which is vital for scientific tasks that rely on precise and repeatable results. Inconsistencies can arise from various sources, affecting the trustworthiness of scientific outcomes. Differences in retrieval algorithms can lead to variable sets of documents being retrieved for the same query. Variations in data indexing methods can cause inconsistencies in how documents are ranked and accessed. The stochastic nature of LLM operations introduces an element of randomness in the generative process. Updates in datasets can also alter the baseline against which queries are processed and interpreted, leading to different results over time.&lt;/p>
&lt;p>This proposal aims to address these reproducibility challenges in similarity searches within RAG frameworks. This work involves analyzing the root causes of non-determinism, benchmarking and validating the consistency of query results, implementing enhancements to minimize variability, and developing tools and best practices to ensure reproducibility. Reproducibility in data queries can be influenced by several factors, including updates in datasets, differences in retrieval algorithms, varying data indexing methods, and the stochastic nature of LLM operations. Each of these factors can cause variability in the documents retrieved and in the generated responses. Ensuring consistency in query results across different runs is crucial for maintaining the integrity of LLM-driven scientific research, allowing researchers to confidently build upon prior work and achieve reliable, trustworthy outcomes.&lt;/p>
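&lt;p>One small source of variability in similarity search is the ordering of documents with equal (or nearly equal) scores. The sketch below, written in plain Python under stated assumptions, ranks documents by cosine similarity and breaks ties by document id so that the result does not depend on index order. The function names, the score rounding, and the tie-break rule are illustrative choices, not the behavior of any particular RAG framework.&lt;/p>

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, corpus, k=2, digits=6):
    """Rank documents by cosine similarity, breaking ties by document id.

    Scores are rounded to `digits` places so tiny floating-point differences
    do not reorder results. The rounding and the id tie-break are assumptions
    of this sketch.
    """
    scored = [(-round(cosine(query, vec), digits), doc_id) for doc_id, vec in corpus.items()]
    return [doc_id for _, doc_id in sorted(scored)[:k]]


# Hypothetical two-dimensional embeddings; doc-a and doc-b score identically.
corpus = {"doc-b": [1.0, 0.0], "doc-a": [2.0, 0.0], "doc-c": [0.0, 1.0]}
shuffled = dict(reversed(list(corpus.items())))
print(top_k([1.0, 0.0], corpus))    # ['doc-a', 'doc-b']
print(top_k([1.0, 0.0], shuffled))  # ['doc-a', 'doc-b']
```

&lt;p>Both calls return the same ranking even though the corpus is stored in a different order, which is the property a reproducibility benchmark would check across runs.&lt;/p>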
&lt;h3 id="workplan">Workplan&lt;/h3>
&lt;p>The proposed work will include: (1) Identifying sources of non-determinism and variability, such as algorithmic differences and indexing methods, in RAG; (2) Utilizing standardized scientific datasets to benchmark the reproducibility of similarity search results across different RAG frameworks; (3) Establishing protocols for handling dataset updates to ensure that such changes do not impact the reproducibility of similarity search results; and (4) Implementing mechanisms to track and document updates to datasets, ensuring that changes are reflected consistently across all instances of the RAG framework. By addressing these areas, the proposed work aims to mitigate challenges related to reproducibility in similarity search queries within RAG frameworks, ultimately enhancing the reliability and trustworthiness of scientific research outcomes.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Reproducibility&lt;/code> &lt;code>LLM&lt;/code> &lt;code>RAG&lt;/code> &lt;code>Scientific Workflows&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Exploration of I/O Reproducibility with HDF5</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/h5_reproducibility/</link><pubDate>Wed, 19 Feb 2025 09:00:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/h5_reproducibility/</guid><description>&lt;p>Parallel I/O is a critical component in high-performance computing (HPC), allowing multiple processes to read and write data concurrently from a shared storage system. &lt;a href="https://github.com/HDFGroup/hdf5" target="_blank" rel="noopener">HDF5&lt;/a>, a widely adopted data model and library for managing complex scientific data, supports parallel I/O but introduces challenges in I/O reproducibility, where repeated executions do not always produce identical results. This lack of reproducibility can stem from non-deterministic execution orders, variations in collective buffering strategies, and race conditions in metadata and dataset chunking operations within HDF5’s parallel I/O hierarchy. Moreover, many HDF5 operations that leverage &lt;a href="https://www.hdfgroup.org/wp-content/uploads/2020/02/20200206_ECPTutorial-final.pdf">MPI I/O&lt;/a> require collective communication; that is, all processes within a communicator must participate in operations such as metadata creation, chunk allocation, and data aggregation. These collective calls ensure that the file structure and data layout remain consistent across processes, but they also introduce additional synchronization complexity that can impact reproducibility if not properly managed. In HPC scientific workflows, consistent I/O reproducibility is essential for accurate debugging, validation, and benchmarking, ensuring that scientific results are both verifiable and trustworthy.
Tools such as &lt;a href="https://github.com/hpc-io/h5bench" target="_blank" rel="noopener">h5bench&lt;/a>—a suite of I/O kernels designed to exercise HDF5 I/O on parallel file systems—play an important role in identifying these reproducibility challenges, tuning performance, and ultimately supporting the overall robustness of large-scale scientific applications.&lt;/p>
&lt;h3 id="workplan">Workplan&lt;/h3>
&lt;p>The proposed work will include (1) analyzing and characterizing parallel I/O operations in &lt;a href="https://www.hdfgroup.org/wp-content/uploads/2020/02/20200206_ECPTutorial-final.pdf" target="_blank" rel="noopener">HDF5&lt;/a> with &lt;a href="https://github.com/hpc-io/h5bench" target="_blank" rel="noopener">h5bench&lt;/a> miniapps, (2) exploring and validating potential reproducibility challenges within the parallel I/O hierarchy (e.g., MPI I/O), and (3) implementing solutions to address parallel I/O reproducibility.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Parallel I/O&lt;/code> &lt;code>MPI-I/O&lt;/code> &lt;code>Reproducibility&lt;/code> &lt;code>HPC&lt;/code> &lt;code>HDF5&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/wei-zhang/">Wei Zhang&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Assessing and Enhancing CC-Snapshot for Reproducible Experiment Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/cc-snapshot/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/cc-snapshot/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>A critical challenge in computer systems research reproducibility is establishing and sharing experimental environments. While open testbeds like Chameleon provide access to hardware resources, researchers still face significant barriers when attempting to recreate the precise software configurations, dependencies, and system states needed for reproducible experiments. Environment snapshotting tools offer a solution, but face technical challenges in consistently capturing running systems without introducing distortions or requiring disruptive system modifications. This project addresses these fundamental reproducibility barriers by enhancing CC-Snapshot, a tool that captures the experimental environment a user has configured on a bare-metal image, to create more reliable and consistent system captures that can be shared and redeployed without loss of fidelity.&lt;/p>
&lt;p>&lt;a href="https://chameleoncloud.readthedocs.io/en/latest/technical/images.html#the-cc-snapshot-utility" target="_blank" rel="noopener">CC-Snapshot&lt;/a> is a tool on the &lt;a href="https://chameleoncloud.org">Chameleon&lt;/a> testbed that enables users to package their customized environments as complex images or appliances. By allowing researchers to share these environments easily, CC-Snapshot offers a powerful mechanism for reproducibility, ensuring that experiments can be replicated and extended by others.&lt;/p>
&lt;p>In this project, you will review existing CC-Snapshot workflows, research the latest snapshotting technologies, and develop enhancements that improve the tool’s usability and reliability. This includes ensuring snapshots are created consistently (even when the OS is actively running), preserving the integrity of user systems, and exploring advanced features such as out-of-band snapshotting and API-based triggers.&lt;/p>
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Improved Snapshot Consistency: New methods to capture the full state of a disk without risking corruption or data inconsistency.&lt;/li>
&lt;li>Enhanced Reproducibility: A refined workflow that allows researchers to reliably share custom environments, facilitating collaborative and repeatable experiments.&lt;/li>
&lt;li>User-Friendly Tooling: Streamlined processes that reduce disruption to running systems—so installing dependencies or rebooting into special environments is less burdensome.&lt;/li>
&lt;li>Exploratory Features (Stretch Goals): Advanced mechanisms to stream disk data in real time during snapshotting and to initiate snapshots via an API call (for parity with VM snapshots).&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Topics&lt;/strong>: Cloud Computing, Systems &amp;amp; Infrastructure, Reproducibility, Operating System Internals&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>: Linux / OS Concepts, Cloud Tools, Systems Programming / Scripting, DevOps / CI&lt;/p>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Moderate&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Medium&lt;/p>
&lt;p>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-sherman/">Michael Sherman&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Ensure Snapshot Consistency
&lt;ul>
&lt;li>Reboot into a ramdisk and copy the offline disk.&lt;/li>
&lt;li>Use kexec to switch to/from a ramdisk environment without a full reboot.&lt;/li>
&lt;li>Change images to use a snapshot-capable storage layer (e.g., LVM) for safer live snapshots.&lt;/li>
&lt;li>Investigate additional methods (e.g., blog.benjojo.co.uk) for safely imaging live disks.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Prevent System Modifications During Snapshot
&lt;ul>
&lt;li>Currently, CC-Snapshot installs dependencies (e.g., qemu-img) on the running system, affecting its state.&lt;/li>
&lt;li>In-Band Fix: Download and run tools in a temp directory with static linking, avoiding system-level changes.&lt;/li>
&lt;li>Out-of-Band Approach: Snapshots done via ramdisk or kexec do not require altering the running system.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>API-Triggered Snapshots
&lt;ul>
&lt;li>Extend or integrate with the Nova “snapshot instance” API to support the same workflow for bare metal.&lt;/li>
&lt;li>Leverage Ironic’s new “service steps” feature for an automated snapshot pipeline.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>(Stretch Goal) Streaming Snapshots
&lt;ul>
&lt;li>Modify the workflow to stream data directly to storage, rather than making a full local copy first.&lt;/li>
&lt;li>Explore incremental or differential snapshot techniques to reduce bandwidth usage and storage overhead.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Chameleon Trovi Support for Complex Experiment Appliances</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/trovi/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/trovi/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>The discoverability and accessibility of research artifacts remain a significant barrier to reproducibility in computer science research. While digital libraries index research papers, they rarely provide direct access to the artifacts needed to reproduce experiments, especially complex multi-node systems. Additionally, when artifacts are available, they often lack standardized metadata, versioning, and deployment mechanisms that would enable researchers to easily find and reuse them. This project addresses these challenges by extending Trovi, a repository of experimental artifacts executable on open platforms, to support complex, multi-node appliances, making sophisticated experimental environments discoverable, shareable, and deployable through a standardized interface, ultimately lowering the barriers to reproducing complex systems experiments.&lt;/p>
&lt;p>&lt;a href="https://chameleoncloud.org/">Chameleon&lt;/a> has historically enabled researchers to orchestrate complex appliances (large, multi-node clusters configured via OpenStack Heat) to conduct advanced experiments. Meanwhile, the Chameleon team introduced &lt;a href="https://chameleoncloud.org/experiment/share">Trovi&lt;/a> as a repository for open platforms (beyond Chameleon) that pioneers mechanisms for artifact and platform integration, leading to immediate execution for practical reproducibility. This project aims to bridge the two by adding support in Trovi for importing, discovering, and launching complex appliances. With these capabilities integrated, researchers will be able to deploy complex appliances with one click directly from the Trovi dashboard, archive them for future reference, and reproduce experiments on demand.&lt;/p>
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Extended Trovi API: Enable the import and management of complex appliances as artifacts.&lt;/li>
&lt;li>Streamlined One-Click Launch: Integrate with Chameleon’s existing provisioning workflows so users can launch multi-node clusters directly from Trovi.&lt;/li>
&lt;li>Enhanced Dashboard Experience: Provide UI assistance for discovering, reviewing, and customizing complex appliance artifacts.&lt;/li>
&lt;li>Improved Artifact Reproducibility: Automate the process of exporting CC-snapshot images and other resources to ensure everything is preserved across sites (UC, TACC), highlighting any parameters that need user attention for cross-site portability.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Topics&lt;/strong>: &lt;code>Reproducible Research&lt;/code>, &lt;code>Cloud Computing &amp;amp; Orchestration&lt;/code>, &lt;code>OpenStack Heat&lt;/code>, &lt;code>UI/UX &amp;amp; Web Development&lt;/code>&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>: Python, APIs, Cloud (OpenStack), DevOps &amp;amp; Automation, Frontend&lt;/p>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Large&lt;/p>
&lt;p>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Extensions to the Trovi API
&lt;ul>
&lt;li>Add support for importing complex appliances as artifacts (including Heat templates, metadata, and associated disk images).&lt;/li>
&lt;li>Develop methods for tagging, versioning, and categorizing these appliances, making them easier to discover.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>One-Click Launch of Complex Appliances
&lt;ul>
&lt;li>Integrate with Chameleon’s orchestration engine, enabling single-click cluster deployments from the Trovi UI.&lt;/li>
&lt;li>Validate correct configuration and resource availability through automated checks.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Trovi Dashboard Enhancements
&lt;ul>
&lt;li>Update the front-end to provide intuitive controls for customizing or parameterizing complex appliances before launching.&lt;/li>
&lt;li>Offer a clear workflow for reviewing dependencies, resource requirements, and usage instructions.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Automated Export &amp;amp; Multi-Site Testing
&lt;ul>
&lt;li>Streamline the export of snapshots or images into Trovi as part of the appliance import process.&lt;/li>
&lt;li>Optionally re-run the imported appliances at multiple sites (UC, TACC), detecting any unparameterized settings or missing dependencies.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Contextualization – Extending Chameleon’s Orchestration for One-Click Experiment Deployment</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/contextualization/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/contextualization/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Reproducibility in computer systems research is often hindered by the quality and completeness of artifact descriptions and the complexity of establishing experimental environments. When experiments involve multiple interconnected components, researchers struggle with hardcoded configurations, inadequate documentation of setup processes, and missing validation steps that would verify correct environment establishment. This project addresses these challenges by extending orchestration capabilities beyond basic hardware provisioning to include comprehensive contextualization—making complex, multi-component experimental environments deployable via parameterized templates with clear validation points, standardized metadata, and minimal user intervention—thus significantly reducing the barriers to reproducing complex distributed systems experiments.&lt;/p>
&lt;p>&lt;a href="https://chameleoncloud.org">Chameleon&lt;/a> already provides powerful capabilities to orchestrate and configure resources through Heat templates (similar to Terraform) and the &lt;a href="https://python-chi.readthedocs.io/" target="_blank" rel="noopener">python-chi&lt;/a> library. However, these focus primarily on provisioning (i.e., allocating and configuring hardware resources). This project goes a step further by addressing contextualization (the process of creating complete, ready-to-use experimental environments that incorporate everything from network layout to instance-level configuration and discovery), with additional features such as parameterized templates, experiment-level metadata, and output reporting.&lt;/p>
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Template-Based One-Click Launch: Users can deploy multi-resource experiments (VMs, networks, storage, etc.) via a single click or a minimal set of input parameters.&lt;/li>
&lt;li>Enhanced Experiment Contextualization: Each launched resource can gain access to global “experiment-level” metadata (e.g., IP-to-hostname mappings for cluster authentication) and outputs that summarize important details.&lt;/li>
&lt;li>Streamlined User Experience: An asynchronous deployment workflow that provides notifications and uses “outputs” to highlight critical connection information (e.g., bastion host IP, final results).&lt;/li>
&lt;li>Optional Advanced Features: Partial reconfiguration to avoid full rebuilds when changes are minor, an “export” function to capture existing deployments into a new template, and potential publishing to Trovi for reproducibility and archiving.&lt;/li>
&lt;/ul>
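&lt;p>To make the idea of experiment-level metadata concrete, here is a minimal sketch that turns an IP-to-hostname mapping into hosts-file style lines that each instance could consume. The function name, the dictionary layout, and the addresses are hypothetical; how Chameleon would actually deliver this metadata to instances is left open by the project description.&lt;/p>

```python
def render_hosts(mapping):
    """Render hostname-to-IP metadata as /etc/hosts-style lines, sorted by hostname."""
    return "\n".join(f"{ip}\t{host}" for host, ip in sorted(mapping.items()))


# Hypothetical experiment-level metadata: hostname -> private IP address.
cluster = {"worker-1": "10.0.0.12", "head": "10.0.0.10", "worker-0": "10.0.0.11"}
print(render_hosts(cluster))
```

&lt;p>Sorting by hostname keeps the rendered output identical across instances and across deployments, which makes differences between runs easier to spot.&lt;/p>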
&lt;p>&lt;strong>Topics&lt;/strong>: Cloud Computing &amp;amp; Orchestration, Infrastructure as Code, DevOps &amp;amp; Automation, Reproducible Research Environments&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>OpenStack &amp;amp; Heat Templates: Familiarity with provisioning resources on Chameleon using Heat or Terraform-like workflows.&lt;/li>
&lt;li>Python &amp;amp; Scripting: For enhancing or extending the python-chi library.&lt;/li>
&lt;li>Systems / Network Knowledge: Understanding multi-VM topologies, cluster configurations, and network-level interactions.&lt;/li>
&lt;li>CI/CD &amp;amp; DevOps: Experience building or integrating asynchronous deployment and notifications.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Large (suitable for a semester-long project or a summer internship)&lt;/p>
&lt;p>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/paul-marshall/">Paul Marshall&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>One-Click Template Launch
&lt;ul>
&lt;li>Design a template (in Heat or similar) specifying multiple cloud resources (images, networks, disk images, SSH keys, etc.).&lt;/li>
&lt;li>Ensure the template author can define input parameters with defaults.&lt;/li>
&lt;li>Allow the user to launch the template quickly with default values or adjust parameters before deployment.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Asynchronous Provisioning &amp;amp; Notifications
&lt;ul>
&lt;li>Implement a long-running process that deploys resources step-by-step.&lt;/li>
&lt;li>Provide status updates to the user (e.g., via UI notifications, email, or logs) when deployments complete or fail.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Experiment-Level Metadata
&lt;ul>
&lt;li>Inject metadata such as IP-to-hostname mappings to each instance for easy cluster authentication.&lt;/li>
&lt;li>Allow the template to define “outputs” (like a public IP of a bastion or location of final results).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Partial Reconfiguration (Optional)
&lt;ul>
&lt;li>Enable partial updates if only one of several servers changes, saving time and resources.&lt;/li>
&lt;li>Improve fault tolerance by avoiding full redeploys in the event of partial failures.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Export Running Configurations into a New Template (Optional)
&lt;ul>
&lt;li>Build a web-interface or script to detect existing user-owned resources (servers, networks, etc.).&lt;/li>
&lt;li>Generate a proposed template from those resources, suggesting parameters (e.g., flavor, disk image, or SSH key).&lt;/li>
&lt;li>Extend or modify existing templates by adding discovered resources.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Integration with Trovi / Multi-Site Testing (Optional)
&lt;ul>
&lt;li>Provide a method to archive or publish the final template (and associated disk images, data sets) in Trovi.&lt;/li>
&lt;li>Attempt to re-run the template at multiple Chameleon sites (e.g., UC, TACC) to identify parameters or modifications needed for cross-site reproducibility.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>MPI Appliance for HPC Research on Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/mpi/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/mpi/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Message Passing Interface (MPI) is the dominant programming model for high-performance computing (HPC), enabling applications to scale efficiently across thousands of processing cores. In reproducibility initiatives for HPC research, MPI implementations are critical as they manage the complex communications that underpin parallel scientific applications. However, reproducing MPI-based experiments remains challenging due to the need for specific library versions, network configurations, and multi-node setups that must be precisely orchestrated.&lt;/p>
&lt;p>An “MPI cluster” is the base layer for many HPC results, so the SC24 reproducibility chair specifically requested an MPI template and appliance to support the conference&amp;rsquo;s reproducibility initiative and give researchers standardized environments for validating results. By extending the work begun for SC24, this project aims to create higher-quality, ready-to-use, and maintainable MPI environments for the Chameleon testbed that abstract away complex configuration details while ensuring consistent performance across experiments, making HPC experiments more accessible and reproducible for the broader research community.&lt;/p>
&lt;p>You will lead efforts to configure disk images with the necessary MPI dependencies and provide orchestration templates that set up networking and instances automatically. The resulting appliance will allow researchers to quickly and consistently deploy distributed computing environments with MPI. The goal is to facilitate reproducible and scalable computational experiments for a wide range of scientific and engineering applications.&lt;/p>
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Ready-to-Use MPI Disk Images: Create one or more images pre-configured with the correct versions of MPI and dependencies, ensuring a consistent environment.&lt;/li>
&lt;li>Simple Cluster Configuration Scripts: Provide scripts or playbooks that efficiently bring up a fully functional MPI cluster on Chameleon, abstracting away manual setup steps.&lt;/li>
&lt;li>Orchestration Template: An automated workflow that sets up networks, instances, and additional resources needed to run large-scale MPI workloads.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Topics&lt;/strong>: High-Performance Computing (HPC), Cloud Computing, MPI &amp;amp; Distributed Systems, DevOps &amp;amp; Automation&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>MPI &amp;amp; Parallel Programming: Understanding of MPI libraries, cluster configuration, and typical HPC workflows.&lt;/li>
&lt;li>Cloud Orchestration: Familiarity with OpenStack Heat or other Infrastructure-as-Code (IaC) tools for provisioning resources.&lt;/li>
&lt;li>Linux System Administration: Experience configuring and troubleshooting packages, network settings, and performance optimizations.&lt;/li>
&lt;li>Scripting &amp;amp; Automation: Ability to write scripts (e.g., Bash, Python) to automate setup and deployment steps.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Moderate to Hard&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Medium&lt;/p>
&lt;p>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ken-raffenetti/">Ken Raffenetti&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Disk Images with MPI Dependencies
&lt;ul>
&lt;li>Build base images with the correct versions of MPI (e.g., MPICH, OpenMPI) and any required libraries (e.g., GCC, network libraries).&lt;/li>
&lt;li>Ensure all packages are up to date and tested for compatibility with Chameleon’s bare metal and/or VM environments.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Cluster Setup Scripts
&lt;ul>
&lt;li>Develop lightweight scripts or Ansible playbooks that join new instances into an MPI cluster, configuring hostnames, SSH keys, and MPI runtime settings.&lt;/li>
&lt;li>Validate cluster functionality by running simple distributed “Hello World” tests and more advanced benchmarks (e.g., Intel MPI Benchmarks).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Orchestration Template
&lt;ul>
&lt;li>Provide a Heat template (or similar) specifying the network configuration, instance counts, and environment variables for MPI.&lt;/li>
&lt;li>Enable easy parameterization of cluster size, disk images, and other variables so users can customize their setups on the fly.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Integration &amp;amp; Testing
&lt;ul>
&lt;li>Document best practices for launching and using the MPI images in Chameleon.&lt;/li>
&lt;li>Demonstrate reproducibility with multiple cluster sizes and workloads to ensure reliability.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Smart Environments – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/envgym/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/envgym/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>The complexity of environment setup and the expertise required to configure specialized software stacks can often hinder efforts to reproduce important scientific achievements in HPC and systems studies. Researchers often struggle with incomplete or ambiguous artifact descriptions that make assumptions about &amp;ldquo;common knowledge&amp;rdquo; that is actually specific domain expertise. When trying to reproduce experiments, reviewers may spend excessive time debugging environment inconsistencies rather than evaluating the actual research. These challenges are compounded when experiments need to run on different hardware configurations.&lt;/p>
&lt;p>This project seeks to address these fundamental reproducibility barriers by using AI to translate the natural-language environment requirements commonly found in papers and artifact descriptions into actionable, reproducible configurations, bridging the knowledge gap between experiment authors and reviewers while standardizing environment creation across different hardware platforms. We will develop an AI-driven system that automatically generates and configures reproducible computing environments based on artifact descriptions from conferences, Trovi artifacts on the &lt;a href="https://chameleoncloud.org">Chameleon&lt;/a> testbed, and other reliable sources for scientific experiment code and associated documentation. Leveraging Natural Language Processing (NLP), the system will allow researchers to describe desired environments in plain English, then map those descriptions onto predefined configuration templates. By simplifying environment creation and ensuring reproducibility, the system promises to eliminate duplicate setup efforts, accelerate research workflows, and promote consistent experimentation practices across diverse hardware.&lt;/p>
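&lt;p>The mapping from a plain-English request to a configuration template can be illustrated with a deliberately simple sketch. It uses keyword and version matching rather than a real NLP model, and the package catalogue and the &lt;code>parse_requirements&lt;/code> name are assumptions made for illustration, not part of the project.&lt;/p>

```python
import re

# Hypothetical catalogue of package names the parser recognizes; a real
# system would need a far larger vocabulary or a learned model.
KNOWN_PACKAGES = ("python", "cuda", "scikit-learn", "openmpi", "gcc")


def parse_requirements(text):
    """Map a plain-English request onto a {package: version-or-None} recipe."""
    recipe = {}
    lowered = text.lower()
    for name in KNOWN_PACKAGES:
        # Match the package name, optionally followed by a version such as 3.9 or 11.
        pattern = r"\b" + re.escape(name) + r"\b(?:\s+v?(\d+(?:\.\d+)*))?"
        match = re.search(pattern, lowered)
        if match:
            recipe[name] = match.group(1)
    return recipe


recipe = parse_requirements("I need Python 3.9, CUDA 11, and scikit-learn")
print(recipe)
```

&lt;p>A package mentioned without a version maps to &lt;code>None&lt;/code>, which a later stage could resolve to a pinned default so that the resulting image stays reproducible.&lt;/p>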
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Working Prototype: A system that automatically generates machine images deployable on bare metal and VM instances, based on user-provided requirements.&lt;/li>
&lt;li>Comprehensive Documentation: Detailed user manuals, guides, and best practices tailored to researchers, ensuring a smooth adoption process.&lt;/li>
&lt;li>Live Demo: A demonstration environment (e.g., a web app or Jupyter notebook) that shows how to request, configure, and launch reproducible cloud environments on both hardware profiles.&lt;/li>
&lt;li>Long-Term Impact: Building blocks for future AI-driven automation of cloud infrastructure, reducing human error and enabling fast, repeatable research pipelines.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Topics&lt;/strong>: Reproducibility, AI &amp;amp; NLP, Cloud Computing, DevOps and Automation&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Machine Learning / AI: Familiarity with NLP methods to interpret user requirements.&lt;/li>
&lt;li>Python: Primary language for backend services and cloud interactions.&lt;/li>
&lt;li>Cloud API Integration: Experience with OpenStack or similar APIs to provision and configure images on both bare metal and virtual machines.&lt;/li>
&lt;li>DevOps: Automated environment configuration, CI/CD workflows, and containerization.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Large&lt;/p>
&lt;p>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/paul-marshall/">Paul Marshall&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Requirement Gathering &amp;amp; NLP Design
&lt;ul>
&lt;li>Research the specific needs of researchers building experimental setups.&lt;/li>
&lt;li>Design an NLP pipeline to parse plain-English descriptions (e.g., “I need Python 3.9, CUDA 11, and scikit-learn”) into environment “recipes.”&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Backend Environment Builder
&lt;ul>
&lt;li>Implement logic that converts parsed user requirements into machine-image definitions for bare metal and VM instances.&lt;/li>
&lt;li>Integrate with Chameleon’s APIs to provision servers, install software, and run configuration validation automatically.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Front-End &amp;amp; User Experience
&lt;ul>
&lt;li>Develop an intuitive web or CLI interface that researchers can use to capture experiment environment requirements.&lt;/li>
&lt;li>Provide real-time status updates during environment setup, along with meaningful error messages and quick-start templates.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Testing &amp;amp; Validation
&lt;ul>
&lt;li>Conduct end-to-end tests using diverse software stacks (e.g., HPC libraries, machine learning frameworks) on bare metal and VM instances.&lt;/li>
&lt;li>Ensure reproducibility by re-creating the same environment multiple times and comparing configurations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Documentation &amp;amp; Demonstration
&lt;ul>
&lt;li>Produce user-facing documentation, including tutorials and best practices for researchers who frequently run experiments on Chameleon Cloud.&lt;/li>
&lt;li>Create a short live demo or screencast showcasing how to configure an environment for a specific research workflow.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Smart Environments – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/smart-environments/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/smart-environments/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>The complexity of environment setup and the expertise required to configure specialized software stacks can often hinder efforts to reproduce important scientific achievements in HPC and systems studies. Researchers often struggle with incomplete or ambiguous artifact descriptions that make assumptions about &amp;ldquo;common knowledge&amp;rdquo; that is actually specific domain expertise. When trying to reproduce experiments, reviewers may spend excessive time debugging environment inconsistencies rather than evaluating the actual research. These challenges are compounded when experiments need to run on different hardware configurations.&lt;/p>
&lt;p>This project seeks to address these fundamental reproducibility barriers by using AI to translate the natural-language environment requirements commonly found in papers and artifact descriptions into actionable, reproducible configurations, bridging the knowledge gap between experiment authors and reviewers while standardizing environment creation across different hardware platforms. We will develop an AI-driven system that automatically generates and configures reproducible computing environments based on artifact descriptions from conferences, Trovi artifacts on the &lt;a href="https://chameleoncloud.org">Chameleon&lt;/a> testbed, and other reliable sources for scientific experiment code and associated documentation. Leveraging Natural Language Processing (NLP), the system will allow researchers to describe desired environments in plain English, then map those descriptions onto predefined configuration templates. By simplifying environment creation and ensuring reproducibility, the system promises to eliminate duplicate setup efforts, accelerate research workflows, and promote consistent experimentation practices across diverse hardware.&lt;/p>
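&lt;p>One way to check the reproducibility claim above is to compare repeated builds of the same environment. The following is a minimal sketch, assuming each build can be summarized as a mapping of package names to versions; the package versions and the function names are illustrative and not taken from the project.&lt;/p>

```python
import hashlib
import json


def environment_fingerprint(packages):
    """Return a stable SHA-256 digest for a {name: version} mapping."""
    # Sorting keys makes the digest independent of insertion order.
    canonical = json.dumps(packages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_environments(first, second):
    """List packages whose versions differ or that exist in only one build."""
    names = sorted(set(first) | set(second))
    return [(n, first.get(n), second.get(n)) for n in names if first.get(n) != second.get(n)]


# Illustrative package sets, not real build output.
build_a = {"python": "3.9.18", "numpy": "1.26.4"}
build_b = {"numpy": "1.26.4", "python": "3.9.18"}
build_c = {"python": "3.9.18", "numpy": "2.0.0"}

print(environment_fingerprint(build_a) == environment_fingerprint(build_b))
print(diff_environments(build_a, build_c))
```

&lt;p>Matching digests indicate that two builds installed the same versions; when they differ, the diff points at the packages responsible.&lt;/p>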
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Working Prototype: A system that automatically generates machine images deployable on bare metal and VM instances, based on user-provided requirements.&lt;/li>
&lt;li>Comprehensive Documentation: Detailed user manuals, guides, and best practices tailored to researchers, ensuring a smooth adoption process.&lt;/li>
&lt;li>Live Demo: A demonstration environment (e.g., a web app or Jupyter notebook) that shows how to request, configure, and launch reproducible cloud environments on both hardware profiles.&lt;/li>
&lt;li>Long-Term Impact: Building blocks for future AI-driven automation of cloud infrastructure, reducing human error and enabling fast, repeatable research pipelines.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Topics&lt;/strong>: Reproducibility, AI &amp;amp; NLP, Cloud Computing, DevOps and Automation&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Machine Learning / AI: Familiarity with NLP methods to interpret user requirements.&lt;/li>
&lt;li>Python: Primary language for backend services and cloud interactions.&lt;/li>
&lt;li>Cloud API Integration: Experience with OpenStack or similar APIs to provision and configure images on both bare metal and virtual machines.&lt;/li>
&lt;li>DevOps: Automated environment configuration, CI/CD workflows, and containerization.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Large&lt;/p>
&lt;p>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/paul-marshall/">Paul Marshall&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Requirement Gathering &amp;amp; NLP Design
&lt;ul>
&lt;li>Research the specific needs of researchers building experimental setups.&lt;/li>
&lt;li>Design an NLP pipeline to parse plain-English descriptions (e.g., “I need Python 3.9, CUDA 11, and scikit-learn”) into environment “recipes.”&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Backend Environment Builder
&lt;ul>
&lt;li>Implement logic that converts parsed user requirements into machine-image definitions for bare metal and VM instances.&lt;/li>
&lt;li>Integrate with Chameleon’s APIs to provision servers, install software, and run configuration validation automatically.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Front-End &amp;amp; User Experience
&lt;ul>
&lt;li>Develop an intuitive web or CLI interface that researchers can use to capture experiment environment requirements.&lt;/li>
&lt;li>Provide real-time status updates during environment setup, along with meaningful error messages and quick-start templates.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Testing &amp;amp; Validation
&lt;ul>
&lt;li>Conduct end-to-end tests using diverse software stacks (e.g., HPC libraries, machine learning frameworks) on bare metal and VM instances.&lt;/li>
&lt;li>Ensure reproducibility by re-creating the same environment multiple times and comparing configurations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Documentation &amp;amp; Demonstration
&lt;ul>
&lt;li>Produce user-facing documentation, including tutorials and best practices for researchers who frequently run experiments on Chameleon Cloud.&lt;/li>
&lt;li>Create a short live demo or screencast showcasing how to configure an environment for a specific research workflow.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Widgets for Python-chi in Jupyter</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/jupyter-widgets/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/jupyter-widgets/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Reproducibility challenges in research extend beyond code and environments to the experimental workflow itself. When experiments involve dynamic resource allocation, monitoring, and reconfiguration, researchers often struggle to document these interactive steps in a way that others can precisely follow. The lack of structured workflow documentation and real-time feedback creates barriers for reviewers attempting to reproduce experiments, as they cannot easily verify whether their resource configurations match the original experiment&amp;rsquo;s state. This project addresses these challenges by developing interactive Jupyter widgets that make experiment resource management more visual, intuitive, and self-documenting—transforming ad-hoc command sequences into reproducible workflows that automatically log interactions and configuration changes while providing immediate visual feedback on experiment topology and resource states.&lt;/p>
&lt;p>As cloud researchers often work with Jupyter Notebooks for interactive data analysis and experimentation, the &lt;a href="https://python-chi.readthedocs.io/" target="_blank" rel="noopener">python-chi&lt;/a> library offers a powerful way to automate and control resources on &lt;a href="https://chameleoncloud.org">Chameleon Cloud&lt;/a>. This project will extend python-chi by adding interactive widgets specifically designed for use in Jupyter, empowering users to launch, monitor, and manage their experiments without leaving the notebook environment. By bringing visual and intuitive controls directly into the user’s workflow, we aim to improve both reproducibility and usability for complex resource management tasks.&lt;/p>
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>User-Friendly Jupyter Widgets: Develop a suite of widgets to visualize reserved resources, hardware availability, and experiment topologies in real time.&lt;/li>
&lt;li>Integrated Experiment Management: Enable researchers to orchestrate experiments (launch, configure, monitor) within a single, notebook-centric workflow.&lt;/li>
&lt;li>Enhanced Feedback &amp;amp; Usability: Provide clear, asynchronous status updates and resource reconfiguration progress, reducing confusion and user error.&lt;/li>
&lt;li>Improved Reproducibility: By automating and logging widget interactions, experiments become more traceable and easier to replicate.&lt;/li>
&lt;/ul>
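&lt;p>The idea of logging widget interactions so that an experiment can be replayed might look roughly like the sketch below. It is plain Python and does not use python-chi or ipywidgets; the &lt;code>InteractionLog&lt;/code> class, the action names, and the handlers are all hypothetical.&lt;/p>

```python
import json
import time


class InteractionLog:
    """Append-only record of user actions, serializable as JSON lines."""

    def __init__(self):
        self.events = []

    def record(self, action, **details):
        # Store a timestamp with each action so the log documents when it happened.
        event = {"time": time.time(), "action": action, "details": details}
        self.events.append(event)
        return event

    def to_jsonl(self):
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)

    def replay(self, handlers):
        """Re-run logged actions, in order, through a {action: callable} mapping."""
        return [handlers[e["action"]](**e["details"]) for e in self.events]


log = InteractionLog()
log.record("reserve_node", node_type="compute", count=2)
log.record("launch_server", image="ubuntu", name="exp-1")

# Stand-in handlers; a real widget would call the resource API here.
handlers = {
    "reserve_node": lambda node_type, count: f"reserved {count} x {node_type}",
    "launch_server": lambda image, name: f"launched {name} from {image}",
}
print(log.replay(handlers))
```

&lt;p>A widget callback would call &lt;code>record&lt;/code> on every user action, and the exported JSON lines could be stored next to the notebook so that a reviewer can replay the same sequence.&lt;/p>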
&lt;p>&lt;strong>Topics&lt;/strong>: Interactive Data Tools, Cloud Resource Management, DevOps &amp;amp; Automation, User Experience (UX)&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Python &amp;amp; Jupyter: Experience creating custom Jupyter widgets, using ipywidgets or similar frameworks.&lt;/li>
&lt;li>Cloud Automation: Familiarity with how resources are provisioned, monitored, and deprovisioned on Chameleon.&lt;/li>
&lt;li>Frontend / GUI Development: Basic understanding of web technologies (HTML/CSS/JavaScript) can be helpful for widget design.&lt;/li>
&lt;li>Software Engineering &amp;amp; CI: Ability to version-control, test, and deploy Python packages.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Moderate&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Medium&lt;/p>
&lt;p>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-sherman/">Michael Sherman&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Resource Visualization Widgets
&lt;ul>
&lt;li>Build custom widgets that show reserved resources (nodes, networks, storage) in Jupyter.&lt;/li>
&lt;li>Provide an interactive topology view for experiments, indicating node statuses and connections.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Experiment Setup &amp;amp; Execution
&lt;ul>
&lt;li>Add controls for launching and managing experiments directly from notebooks.&lt;/li>
&lt;li>Show feedback (e.g., progress bars, status messages) as resources are being allocated or reconfigured.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Hardware Availability &amp;amp; Status Tracking
&lt;ul>
&lt;li>Implement a widget that provides real-time data on Chameleon’s hardware availability (bare metal, VMs, GPU nodes, etc.).&lt;/li>
&lt;li>Allow users to filter or select specific resources based on current hardware states.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Usability &amp;amp; Feedback Loop
&lt;ul>
&lt;li>Gather user feedback on the widget designs and workflows.&lt;/li>
&lt;li>Refine the interface to minimize clicks, improve clarity, and reduce friction for common tasks.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Open Testbed for Reproducible Evaluation of Replicated Systems at the Edges</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/</link><pubDate>Sat, 15 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/umass/edge-replication/</guid><description>&lt;h2 id="project-description">Project Description&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Distributed systems&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Java, Go, Python, Bash scripting, Linux, Docker.&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="mailto:fikurnia@cs.umass.edu">Fadhil I. Kurnia&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Replication is commonly employed to improve system availability and reduce latency. By maintaining multiple copies, the system can continue operating even if some replicas fail, thereby ensuring consistent availability. Placing replicas closer to users further decreases latency by minimizing the distance data must travel. A typical illustration of these advantages is a Content Delivery Network (CDN), where distributing content to edge servers can yield latencies of under 10 milliseconds when users and content are in the same city.&lt;/p>
&lt;p>Recently, numerous edge datastores have emerged, allowing dynamic data to be served directly from network-edge replicas. Each of these replicated systems may employ different coordination protocols to synchronize replicas, leading to varied performance and consistency characteristics. For instance, Workers KV relies on a push-based coordination mechanism that provides eventual consistency, whereas Cloudflare Durable Objects and Turso deliver stronger consistency guarantees. Additionally, researchers have introduced various coordination protocols, including SwiftPaxos, EPaxos, OPaxos, WPaxos, Raft, PANDO, and QuePaxa, each with its own performance profile, especially when used in geo-distributed deployments.&lt;/p>
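&lt;p>The effect of distance on latency can be estimated with a back-of-the-envelope calculation. The sketch below assumes a signal speed in optical fiber of roughly 200,000 km/s (about two thirds of the speed of light in vacuum) and ignores routing, queueing, and processing delays, so it gives only a lower bound; the distances are illustrative.&lt;/p>

```python
# Approximate signal speed in optical fiber: about two thirds of the speed
# of light in vacuum, i.e. roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km):
    """Lower bound on round-trip propagation delay, in milliseconds."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000.0


# Illustrative distances, not measurements from the project.
for label, km in [("same city", 50), ("cross-country", 4000), ("intercontinental", 10000)]:
    print(f"{label}: {round_trip_ms(km):.1f} ms")
```

&lt;p>Under these assumptions every 100 km of distance adds about 1 ms of round-trip time, which is why a replica in the same city can stay well below the 10 ms mark while a distant one cannot.&lt;/p>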
&lt;p>This project aims to develop an open testbed for evaluating replicated systems and their coordination protocols under edge deployment. Currently, researchers face challenges in fairly comparing different replicated systems, as they often lack control over replica placement. Many previous studies on coordination protocols and replicated systems relied on mock implementations, particularly for well-known systems like Dynamo and Spanner, which are not open source. An open testbed would provide a standardized environment where researchers can compare various replicated systems, classes of coordination protocols, and specific protocol implementations using common benchmarks. Since the performance of replicated systems and coordination protocols varies depending on the application, workload, and replica placement, this testbed would offer a more systematic and fair evaluation framework. Furthermore, by enabling easier testing and validation, the testbed could accelerate the adoption of research prototypes in the industry.&lt;/p>
&lt;h2 id="project-deliverables">Project Deliverables&lt;/h2>
&lt;ul>
&lt;li>Compilation of traces and applications from various open traces and open benchmarks.&lt;/li>
&lt;li>Distributed workload generator to run the traces and applications.&lt;/li>
&lt;li>Test framework to simulate the latency of hundreds of edge servers for measurement.&lt;/li>
&lt;li>Open artifact of the traces, applications, workload generator, and test framework, published on GitHub.&lt;/li>
&lt;/ul></description></item><item><title>StatWrap</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/northwestern/statwrap/</link><pubDate>Sun, 09 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/northwestern/statwrap/</guid><description>&lt;p>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap&lt;/a> is a free and open-source assistive, non-invasive discovery and inventory tool to document research projects. It inventories project assets (e.g., code files, data files, manuscripts, documentation) and organizes information without additional input from the user. It also provides structure for users to add searchable and filterable notes connected to files to help communicate metadata about intent and analysis steps.&lt;/p>
&lt;p>At its core, StatWrap helps investigators identify and track changes in a research project as it evolves, since such changes may affect reproducibility. For example: (1) people on the project can change over time, so processes may not be consistently executed due to transitions in employment; (2) data changes over time, due to accruing additional cases, adding new variables, or correcting mistakes in existing data; (3) software (e.g., used for data preparation and statistical analysis) evolves as it is edited, improved, and optimized; and (4) software can break or produce different results due to changes &amp;lsquo;under the hood&amp;rsquo; such as updates to statistical packages, compilers, or interpreters. StatWrap passively and actively documents these changes to support reproducibility.&lt;/p>
&lt;p>Additional information:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap home&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/stattag/statwrap" target="_blank" rel="noopener">StatWrap code (GitHub)&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="project-search">Project Search&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>search&lt;/code>, &lt;code>user interface&lt;/code>, &lt;code>indexing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: JavaScript, React&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>, &lt;a href="mailto:ewhitley@northwestern.edu">Eric Whitley&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of this project is to leverage the information entered by users and passively discovered by StatWrap to facilitate cross-project searching. This functionality will allow investigators to search across projects (current and past) to find relevant projects, assets, and notes. Given the potentially sensitive nature of data included in projects, the indexing of content for searching must be done locally.&lt;/p>
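&lt;p>Local indexing of this kind is often built on an inverted index. The sketch below is a minimal in-memory version in Python, meant only to show the data structure; StatWrap itself is a JavaScript application, and the file names, note text, and function names here are hypothetical.&lt;/p>

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an in-memory inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results = results.intersection(index.get(token, set()))
    return results


# Hypothetical notes from two projects; nothing leaves the local machine.
notes = {
    "project-a/notes.md": "Cleaned survey data and reran regression",
    "project-b/readme.md": "Survey instrument and codebook",
}
index = build_index(notes)
print(sorted(search(index, "survey data")))
```

&lt;p>Because the index lives entirely in local memory or on local disk, sensitive project content never has to be sent to an external search service.&lt;/p>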
&lt;p>The specific tasks of the project include:&lt;/p>
&lt;ul>
&lt;li>Identify and evaluate open-source projects to index content for searching&lt;/li>
&lt;li>Add a new classification for projects of “Active” and “Past” in the user interface&lt;/li>
&lt;li>Implement the search capability within the user interface&lt;/li>
&lt;li>Develop unit tests and conduct system testing&lt;/li>
&lt;/ul></description></item><item><title>Final Blog: SS_Bench - Benchmarking SciStream</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240820-kraislaik/</link><pubDate>Fri, 31 Jan 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240820-kraislaik/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is Acheme, and I&amp;rsquo;m thrilled to have collaborated with my mentors &lt;a href="https://github.com/ucsc-ospo/ucsc-ospo.github.io/blob/main/content/authors/chungmiranda/_index.md" target="_blank" rel="noopener">Joaquin Chung&lt;/a> and &lt;a href="https://github.com/ucsc-ospo/ucsc-ospo.github.io/blob/main/content/authors/fcastro/_index.md" target="_blank" rel="noopener">Flavio Castro&lt;/a> under the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/scistream/">SciStream&lt;/a> project. This project aims to develop SciStream-bench, a set of benchmarks and artifacts designed to precisely evaluate the performance of scientific streaming applications across diverse traffic patterns when running over the SciStream framework.&lt;/p>
&lt;p>In the first half of the project, I focused on describing scientific streaming profiles based on use cases experienced at Argonne National Laboratory. I developed the Python scripts needed to generate bursty and constant-rate streaming traffic profiles.&lt;/p>
&lt;p>In the second half, I built upon this foundation by conducting experiments with the traffic profiles and measuring performance in terms of latency, jitter, and throughput. These experiments were conducted with different message sizes across LAN and WAN network topologies.&lt;/p>
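&lt;p>The difference between the two traffic profiles, and one simple way of computing jitter, can be sketched as follows. This is not the project's code: the function names and parameters are assumptions, and jitter is taken here to be the mean absolute difference between consecutive latency samples, which is only one of several definitions in use.&lt;/p>

```python
def constant_rate_schedule(rate_hz, duration_s):
    """Send times (seconds) for messages emitted at a fixed rate."""
    count = int(rate_hz * duration_s)
    return [i / rate_hz for i in range(count)]


def bursty_schedule(burst_size, burst_rate_hz, idle_s, bursts):
    """Send times for bursts of closely spaced messages separated by idle gaps."""
    times = []
    start = 0.0
    for _ in range(bursts):
        for i in range(burst_size):
            times.append(start + i / burst_rate_hz)
        # The next burst begins after the current one ends plus the idle gap.
        start = times[-1] + idle_s
    return times


def mean_jitter(latencies):
    """Mean absolute difference between consecutive latency samples."""
    diffs = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


print(constant_rate_schedule(4, 1))
print(bursty_schedule(burst_size=2, burst_rate_hz=10, idle_s=1.0, bursts=2))
print(mean_jitter([10.0, 12.0, 11.0]))
```

&lt;p>A sender would sleep until each scheduled time before emitting a message of the chosen size, which keeps the profile configurable through a handful of parameters.&lt;/p>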
&lt;h2 id="key-achievements">Key Achievements&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Streaming Traffic Profile:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Developed scripts to generate streaming traffic profiles with configurable parameters.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Created an Artifact:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>I created an artifact using a Jupyter notebook to document an easy-to-follow integration of SciStream with the FABRIC testbed for future experimenters.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h2 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h2>
&lt;p>The work demonstrated that SciStream offers tolerable overhead for secure data streaming and that experimentation with this middlebox is possible on publicly available testbeds such as FABRIC.
Future work could compare the performance of SciStream with and without hardware acceleration or offloading.&lt;/p>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>SciStream on FABRIC Demo:&lt;/strong> A demo of how to integrate SciStream with the FABRIC testbed: &lt;a href="https://www.youtube.com/watch?v=2NNAWPAreU8" target="_blank" rel="noopener">SciStream on FABRIC&lt;/a>.&lt;/li>
&lt;li>&lt;strong>Jupyter Notebook:&lt;/strong> An artifact on the FABRIC portal: &lt;a href="https://artifacts.fabric-testbed.net/artifacts/1d604943-b5c0-4046-9971-ffb8f2535e42" target="_blank" rel="noopener">FABRIC Artifact&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Final Report: Deriving Realistic Performance Benchmarks for Python Interpreters</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20241113-mrigankpawagi/</link><pubDate>Tue, 12 Nov 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20241113-mrigankpawagi/</guid><description>&lt;p>Hi, I am Mrigank. As a &lt;em>Summer of Reproducibility 2024&lt;/em> fellow, I have been working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20240817-mrigankpawagi/">deriving realistic performance benchmarks for Python interpreters&lt;/a> with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a> from the University of Utah. In particular, we want to benchmark Meta&amp;rsquo;s Static Python interpreter (which is a part of their Cinder project) and compare its performance with CPython on different levels of typing. In this post, I will share updates on my work since my &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20240909-mrigankpawagi/">last update&lt;/a>. This post forms my final report for the &lt;em>Summer of Reproducibility 2024&lt;/em>.&lt;/p>
&lt;h2 id="since-last-time-typing-django-files">Since Last Time: Typing Django Files&lt;/h2>
&lt;p>Based on the profiling results from load testing a Wagtail blog site, I identified three modules in Django that were performance bottlenecks and added shallow types to them. These are available on our GitHub repository.&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://github.com/utahplt/static-python-perf/blob/main/Benchmark/django/shallow/db/backends/sqlite3/_functions.py" target="_blank" rel="noopener">&lt;code>django.db.backends.sqlite3._functions&lt;/code>&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/utahplt/static-python-perf/blob/main/Benchmark/django/shallow/utils/functional.py" target="_blank" rel="noopener">&lt;code>django.utils.functional&lt;/code>&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/utahplt/static-python-perf/blob/main/Benchmark/django/shallow/views/debug.py" target="_blank" rel="noopener">&lt;code>django.views.debug&lt;/code>&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>I also wrote a &lt;a href="https://github.com/utahplt/static-python-perf/tree/main/Tool_shed/driver" target="_blank" rel="noopener">script&lt;/a> to mix untyped, shallow-typed, and advanced-typed versions of a Python module and create a series of such &lt;em>gradually typed&lt;/em> versions.&lt;/p>
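&lt;p>The idea of enumerating gradually typed variants can be sketched as below. This is not the linked driver script: it assumes the code is split into named components that each exist at all three typing levels, and the component names and the &lt;code>typing_configurations&lt;/code> function are hypothetical.&lt;/p>

```python
from itertools import product

LEVELS = ("untyped", "shallow", "advanced")


def typing_configurations(components):
    """Yield every assignment of a typing level to each component."""
    for combo in product(LEVELS, repeat=len(components)):
        yield dict(zip(components, combo))


# Hypothetical component names, for illustration only.
components = ["functional", "debug"]
configs = list(typing_configurations(components))
print(len(configs))
print(configs[0])
print(configs[-1])
```

&lt;p>The number of variants grows as three to the power of the number of components, so a real benchmark would likely sample from this space rather than test every combination.&lt;/p>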
&lt;h2 id="summary-of-experience-and-contributions">Summary of Experience and Contributions&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>I tried to set up different versions of Zulip to make them work with Static Python. My setup scripts are available in our &lt;a href="https://github.com/utahplt/static-python-perf/tree/main/Benchmark/zulip" target="_blank" rel="noopener">repository&lt;/a>. Unfortunately, Zulip&amp;rsquo;s Zerver did not run with Static Python because some Django modules were incompatible. A few non-Django modules also initially threw errors when run with Static Python due to a &lt;a href="https://github.com/facebookincubator/cinder/issues/137" target="_blank" rel="noopener">bug in Cinder&lt;/a>, but I was able to work around it with a hack (which I have described in the linked GitHub issue I opened on Cinder&amp;rsquo;s repository).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>I created a &lt;em>locust-version&lt;/em> of the small Django-related benchmarks available in &lt;a href="https://github.com/python/pyperformance" target="_blank" rel="noopener">pyperformance&lt;/a> and &lt;a href="https://github.com/facebookarchive/skybison" target="_blank" rel="noopener">skybison&lt;/a>. This helped me confirm that Django is by itself compatible with Static Python, and helped me get started with Locust. This too is available in our &lt;a href="https://github.com/utahplt/static-python-perf/tree/main/Benchmark/django_sample" target="_blank" rel="noopener">repository&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>As described in the midterm report, I created a complete pipeline with Locust to simulate real-world load on a Wagtail blog site. The instructions and scripts for running these load tests as well as profiling the Django codebase are available (like everything else!) in our &lt;a href="https://github.com/utahplt/static-python-perf/tree/main/Benchmark/wagtail" target="_blank" rel="noopener">repository&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We added shallow types to the three Django modules mentioned above, and I created scripts to mix untyped, shallow-typed, and advanced-typed versions of a Python module to create a series of &lt;em>gradually typed&lt;/em> versions to be tested for performance. We found that advanced-typed code may often be structurally incompatible with shallow-typed code and are looking for a solution for this. We are tracking some examples of this in a &lt;a href="https://github.com/utahplt/static-python-perf/issues/16" target="_blank" rel="noopener">GitHub issue&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="going-forward">Going Forward&lt;/h2>
&lt;p>I had a great time exploring Static Python, typing in Python, load testing, and all other aspects of this project. I was also fortunate to have a helpful mentor along with other amazing team members in the group. During this project, we hit several roadblocks like the challenges in setting up real-world applications with Static Python and the difficulty in adding &lt;em>advanced&lt;/em> types – but are managing to work around them. I will be continuing to work on this project until we have a complete set of benchmarks and a comprehensive report on the performance of Static Python.&lt;/p>
&lt;p>Our work will continue to be open-sourced and available on our &lt;a href="https://github.com/utahplt/static-python-perf" target="_blank" rel="noopener">GitHub repository&lt;/a> for anyone interested in following along or contributing.&lt;/p></description></item><item><title>[Final Report] Automated Reproducibility Checklist support within StatWrap</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20241102-adi/</link><pubDate>Sat, 02 Nov 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20241102-adi/</guid><description>&lt;p>Namaste🙏🏻! I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/adi-akhilesh-singh/">Adi Akhilesh Singh&lt;/a>, and I&amp;rsquo;m excited to share my final updates on the &lt;a href="https://drive.google.com/file/d/1xV7eHL9lIWGKueQJxBks6OB_rcXCr8JY/view?usp=sharing" target="_blank" rel="noopener">Reproducibility Checklists project&lt;/a> by StatWrap, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>.&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>This project introduces customizable reproducibility checklists in StatWrap, enabling metadata-driven and user-guided generation of checklists. The goal is to enhance the reproducibility of research projects by providing researchers with a structured and comprehensive checklist to ensure their work is reproducible.&lt;/p>
&lt;h2 id="project-links">Project Links&lt;/h2>
&lt;p>Explore the StatWrap project repository and my contributions during GSoC &amp;lsquo;24:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/StatTag/StatWrap" target="_blank" rel="noopener">StatWrap&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/StatTag/StatWrap/tree/gsoc24" target="_blank" rel="noopener">GSoC &amp;lsquo;24 Contributions&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="progress-and-achievements">Progress And Achievements&lt;/h2>
&lt;p>During the timeline of this project, I worked on designing the interface for the checklist page and the data structure to support the project needs.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Checklist Interface" srcset="
/report/osre24/ucsc/statwrap/20241102-adi/interface_hu5405d4f4fe0fcc5c29037ce596b14456_175744_0e20d5ebd32af685d0d2ccea73085611.webp 400w,
/report/osre24/ucsc/statwrap/20241102-adi/interface_hu5405d4f4fe0fcc5c29037ce596b14456_175744_3b54d1a2b420d4de3f33b717849e243e.webp 760w,
/report/osre24/ucsc/statwrap/20241102-adi/interface_hu5405d4f4fe0fcc5c29037ce596b14456_175744_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20241102-adi/interface_hu5405d4f4fe0fcc5c29037ce596b14456_175744_0e20d5ebd32af685d0d2ccea73085611.webp"
width="760"
height="432"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
The interface was designed with user needs in mind, featuring components such as:&lt;/p>
&lt;ul>
&lt;li>URLs component to manage external links or file URIs, attached to the project.&lt;/li>
&lt;li>Images component to display project image files.&lt;/li>
&lt;li>Checklist Notes component to manage user-added notes.&lt;/li>
&lt;/ul>
&lt;p>All these assets (Files, URLs, Images) can be added to each checklist statement from the existing assets and external resources (URLs) present in the project.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Add Asset Dialog" srcset="
/report/osre24/ucsc/statwrap/20241102-adi/addasset_huf5f93b812eac7fe9e6235b66e18b25cf_152586_046cbc4227c33853dde195b066b2af19.webp 400w,
/report/osre24/ucsc/statwrap/20241102-adi/addasset_huf5f93b812eac7fe9e6235b66e18b25cf_152586_2076fca94822416fc8dfb0806ae54833.webp 760w,
/report/osre24/ucsc/statwrap/20241102-adi/addasset_huf5f93b812eac7fe9e6235b66e18b25cf_152586_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20241102-adi/addasset_huf5f93b812eac7fe9e6235b66e18b25cf_152586_046cbc4227c33853dde195b066b2af19.webp"
width="760"
height="432"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
In addition, for each checklist item, StatWrap runs relevant scans to provide meaningful data based on that item’s requirements. For example, for the item “All the software dependencies for the project are documented,” StatWrap scans project files to list the languages and dependencies detected.
For each checklist statement supported in StatWrap, we implement methods to retrieve specific information by scanning project data. StatWrap currently supports six such checklist statements identified as foundational for ensuring research reproducibility.
The checklist can also be exported as a PDF summary, generated by StatWrap from the checklist data, with an option to include notes.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Checklist Report" srcset="
/report/osre24/ucsc/statwrap/20241102-adi/report_hu7a01407dc27c71052bc56e4eb6e3d4fb_270768_06eff861f558dd904b00349a9a2d2717.webp 400w,
/report/osre24/ucsc/statwrap/20241102-adi/report_hu7a01407dc27c71052bc56e4eb6e3d4fb_270768_70ef332c2c3871d3a097995a59a7dd65.webp 760w,
/report/osre24/ucsc/statwrap/20241102-adi/report_hu7a01407dc27c71052bc56e4eb6e3d4fb_270768_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20241102-adi/report_hu7a01407dc27c71052bc56e4eb6e3d4fb_270768_06eff861f558dd904b00349a9a2d2717.webp"
width="760"
height="432"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="future-prospects">Future Prospects&lt;/h2>
&lt;p>As the project concludes, several areas for growth have emerged:&lt;/p>
&lt;ul>
&lt;li>Expanding language support within StatWrap. While StatWrap already includes key languages used in research, there is always scope to extend compatibility to cover more technologies.&lt;/li>
&lt;li>Options to export a data-extensive report that includes checklists and their associated scan results.&lt;/li>
&lt;/ul>
&lt;p>These and other enhancements, like adding new checklist statements with their scanning methods, will extend StatWrap’s impact on reproducibility in research.&lt;/p>
&lt;h2 id="earlier-blogs">Earlier Blogs&lt;/h2>
&lt;p>If you’re interested in seeing the project’s evolution, check out my earlier posts:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240614-adi/">Intro Blog&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240916-adi/">MidTerm Blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Thank you for reading!&lt;/p></description></item><item><title>ML-Powered Problem Detection in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/</link><pubDate>Fri, 18 Oct 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/syed-mohammad-qasim/">Syed Mohammad Qasim&lt;/a>, a PhD candidate at the Department of Electrical and Computer Engineering, Boston University.
This summer I worked on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/ml_detect_chameleon/">ML-Powered Problem Detection in Chameleon&lt;/a>
as part of the Summer of Reproducibility (SoR) program with the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ayse-coskun/">Ayse Coskun&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-sherman/">Michael Sherman&lt;/a>.&lt;/p>
&lt;p>Chameleon is an open testbed that has supported over 5,000 users working on more than 500 projects.
It provides access to over 538 bare metal nodes across various sites, offering approximately 15,000 CPU cores and 5 petabytes of storage.
Each site runs independent OpenStack services to deliver its offerings.
Currently, Chameleon Cloud comprehensively monitors the sites at the Texas Advanced Computing Center (TACC) and the University of Chicago.
Metrics are collected using Prometheus at each site and fed into a central Mimir cluster.
All logs are sent to a central Loki, with Grafana used for visualization and alerting.
Chameleon currently collects around 3,000 metrics. Manually reviewing and setting alerts for them is time-consuming and labor-intensive.
This project aims to help Chameleon operators monitor their systems more effectively and improve overall reliability by creating an anomaly detection service to augment the existing alerting framework.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="High level data flow" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_deb097bd50da0d94a76fc0dc7719233e.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_deeb0941e942a319e1cc5a8b743b6993.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/ad_hubea58d1d850a610a2195fe38eece12fb_58199_deb097bd50da0d94a76fc0dc7719233e.webp"
width="760"
height="412"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Over the summer, we focused on analyzing the data. After discussions with Chameleon operators, we identified 33 key metrics from the Prometheus Node Exporter that serve as leading indicators of resource usage on the nodes. For example:&lt;/p>
&lt;ul>
&lt;li>CPU usage: Metrics like node_load1, node_load5, and node_load15.&lt;/li>
&lt;li>Memory usage: Including buffer utilization.&lt;/li>
&lt;li>Disk usage: Metrics for I/O time, and read/write byte rates.&lt;/li>
&lt;li>Network activity: Rate of bytes received and transmitted.&lt;/li>
&lt;li>Filesystem metrics: Such as inode_utilization_ratio and node_procs_blocked.&lt;/li>
&lt;li>System-level metrics: Including node forks, context switches, and interrupts.&lt;/li>
&lt;/ul>
&lt;p>Collected every 5 minutes, these metrics provide a comprehensive view of node performance and resource consumption.
After finalizing the metrics we wanted to monitor, we selected the following four anomaly detection methods, primarily because of their popularity in academia and their recent publication at high-impact conferences such as KDD and SC.&lt;/p>
&lt;ul>
&lt;li>OmniAnomaly [KDD 2019], without POT selection, as it requires labels.&lt;/li>
&lt;li>USAD [KDD 2020]&lt;/li>
&lt;li>TranAD [KDD 2022]&lt;/li>
&lt;li>Prodigy [SC 2023], using only the VAE and not their feature selection, as it requires labels.&lt;/li>
&lt;/ul>
&lt;p>We collected 75 days of healthy data from Chameleon, and after applying min-max scaling, we trained the models.
We then used these models to run inference on the metrics collected during outages, as marked by Chameleon operators.
The goal was to determine whether the outage data revealed something interesting or anomalous.
We can verify our approach by manually reviewing the results generated by these four anomaly detection methods.
Below are the results from the four methods on different outages, followed by an example of how these methods identified the root cause of an anomaly.&lt;/p>
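&lt;p>The min-max scaling step can be sketched as follows. This is a minimal illustration using toy numbers rather than real Chameleon data, and the function names are made up for the example. It assumes the scaling bounds are learned from the healthy data only and then reused for the outage data.&lt;/p>

```python
def fit_min_max(rows):
    """Compute per-metric (min, max) bounds from healthy training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, bounds):
    """Scale rows with bounds learned from training data.

    Metrics that were constant during training are mapped to 0.0.
    """
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, (lo, hi) in zip(row, bounds)
        ])
    return scaled


# Toy values standing in for two metrics (not real Chameleon data).
healthy = [[1.0, 200.0], [3.0, 400.0], [2.0, 300.0]]
outage = [[5.0, 100.0]]

bounds = fit_min_max(healthy)
print(apply_min_max(healthy, bounds))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(apply_min_max(outage, bounds))   # [[2.0, -0.5]]
```

&lt;p>Because the bounds come from healthy data, outage values can fall outside the [0, 1] range, which is one reason unusual behavior can stand out to a model trained on the scaled healthy data.&lt;/p>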
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Results of different approaches" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_adb242a18524d714dae87d46b29e1612.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_9dcdbbc6bac285c06195f54d49bd5ffe.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/comparison_plot_huca4b68c1c2625c3b2e86230c54612ea2_129034_adb242a18524d714dae87d46b29e1612.webp"
width="760"
height="355"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The above figure shows the percentage of outage data that was flagged as anomalous by different models.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="cause of anomaly according to each model" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_6edd22782678b48ce3a7cebad859b982.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_3da0e020cdc4ddfd508b77b6a0adc3d2.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/partial-authentication-outage_plot_hu92456580e56fddf4f3c592621d13c105_392593_6edd22782678b48ce3a7cebad859b982.webp"
width="760"
height="532"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="cause of anomaly according to each model" srcset="
/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_03e6f344d24d9b37a7d615ee3207586b.webp 400w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_6193a4b7b9107cb2693435514d80d21d.webp 760w,
/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20241018-syed/chiuc-uplink-networking_plot_hu92456580e56fddf4f3c592621d13c105_376789_03e6f344d24d9b37a7d615ee3207586b.webp"
width="760"
height="532"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The two plots above show examples of the top five metrics that contributed most to the anomaly score for each anomaly detection model.&lt;/p>
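&lt;p>One common way to rank contributing metrics for reconstruction-based detectors is by per-metric reconstruction error. The sketch below assumes that approach, which may differ from how each of the four models attributes its score. The values are invented, and &lt;code>example_metric&lt;/code> is a hypothetical name.&lt;/p>

```python
def top_contributors(observed, reconstructed, names, k=5):
    """Rank metrics by squared reconstruction error (an assumed contribution measure)."""
    errors = {
        name: (obs - rec) ** 2
        for name, obs, rec in zip(names, observed, reconstructed)
    }
    return sorted(errors, key=errors.get, reverse=True)[:k]


# Invented scaled values for one time step; not taken from the real results.
names = [
    "node_load1", "node_load5", "node_load15",
    "node_procs_blocked", "inode_utilization_ratio", "example_metric",
]
observed = [0.9, 0.5, 0.4, 0.1, 0.3, 0.2]
reconstructed = [0.1, 0.4, 0.4, 0.6, 0.3, 0.25]

print(top_contributors(observed, reconstructed, names, k=3))
# ['node_load1', 'node_procs_blocked', 'node_load5']
```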
&lt;p>Although the methods seem to indicate anomalies during outages, they are not able to pinpoint the affected service or the exact cause.
For example, the first partial authentication outage was due to a DNS error, which can manifest in various ways, such as reduced CPU, memory, or network usage.
This work is still in progress, and we are conducting the same analysis on container-level metrics for each service, allowing us to narrow the scope to the affected service and more effectively identify the root cause of anomalies.
We will share the next set of results soon.&lt;/p>
&lt;p>Thanks for your time, please feel free to reach out to me for any details or questions.&lt;/p></description></item><item><title>Data Leakage in Applied ML: model uses features that are not legitimate</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240924-shaivimalik/</link><pubDate>Tue, 24 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240924-shaivimalik/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I have been working on reproducing the results from &lt;strong>Identification of COVID-19 Samples from Chest X-Ray Images Using Deep Learning: A Comparison of Transfer Learning Approaches&lt;/strong>. This study aimed to distinguish COVID-19 cases from normal and pneumonia cases using chest X-ray images. Since my last blog post, we have successfully reproduced the results using the VGG19 model, achieving a 92% accuracy on the test set. However, a significant demographic inconsistency exists: normal and pneumonia chest X-ray images were from pediatric patients, while COVID-19 chest X-ray images were from adults. This allowed the model to achieve high accuracy by learning features that were not clinically relevant.&lt;/p>
&lt;p>In &lt;a href="https://github.com/shaivimalik/covid_illegitimate_features/blob/main/notebooks/Correcting_Original_Result.ipynb" target="_blank" rel="noopener">Reproducing “Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches” without Data Leakage&lt;/a>, we followed the methodology outlined in the paper, but with a key change: we used datasets containing adult chest X-ray images. This time, the model achieved an accuracy of 51%, a drop of 41 percentage points from the earlier results. This confirms that the metrics reported in the paper were overly optimistic due to data leakage, where the model learned illegitimate features.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="GradCAM from husky vs wolf example " srcset="
/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_7bc94e0816aa962665434756bf41e27d.webp 400w,
/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_a160058d1708baa257daa63de5fada34.webp 760w,
/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240924-shaivimalik/gradcam_hu02772b80d1d95ff5ae817af6261a6059_438521_7bc94e0816aa962665434756bf41e27d.webp"
width="760"
height="329"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To further illustrate this issue, we created a &lt;a href="https://github.com/shaivimalik/covid_illegitimate_features/blob/main/notebooks/Exploring_ConvNet_Activations.ipynb" target="_blank" rel="noopener">toy example&lt;/a> demonstrating how a model can learn illegitimate features. Using a small dataset of wolf and husky images, the model achieved an accuracy of 90%. We then revealed that this performance was due to a data leakage issue: all wolf images had snowy backgrounds, while husky images had grassy backgrounds. When we trained the model on a dataset where both wolf and husky images had white backgrounds, the accuracy dropped to 70%. This shows that the accuracy obtained earlier was an overly optimistic measure due to data leakage.&lt;/p>
&lt;p>You can explore our work on the COVID-19 paper &lt;a href="https://github.com/shaivimalik/covid_illegitimate_features" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Lastly, I would like to thank &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a> for their support and guidance throughout my SoR journey.&lt;/p></description></item><item><title>Towards Scalable Performance Benchmarking of Genomics Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240919-martinputra/</link><pubDate>Thu, 19 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240919-martinputra/</guid><description>&lt;h2 id="project-background">Project Background&lt;/h2>
&lt;p>Optimizing the execution of genomics workflows on a large-scale &amp;amp; heterogeneous cluster requires an in-depth understanding of the resource requirements and utilization patterns of each application in the workflows. Such information can be obtained by using a benchmarking tool. However, the performance data generated by such a tool should represent the scale of its target system, lest the design decisions made from it be misguided. My project aims to build &lt;em>GenScale&lt;/em>, the first benchmarking tool that can rapidly generate genomics workload performance data at a scale representative of production systems.&lt;/p>
&lt;p>As Summer of Reproducibility (SoR) 2024 comes to an end, I took the time to reflect on my time working on &lt;em>GenScale&lt;/em>, the challenges I faced, and the future work &amp;amp; impact I hope &lt;em>GenScale&lt;/em> creates for our community.&lt;/p>
&lt;h2 id="milestones--challenges">Milestones &amp;amp; Challenges&lt;/h2>
&lt;p>The time I spent working on &lt;em>GenScale&lt;/em> during SoR can be classified into three phases:&lt;/p>
&lt;p>&lt;strong>1. Per-Application Container &amp;amp; Input Creation.&lt;/strong>&lt;/p>
&lt;p>Containerization is the current de facto standard for genomics workflow execution, so I designed &lt;em>GenScale&lt;/em> to execute applications as containers. This required me to package each application included in the benchmark as a container. I used state-of-the-art DNA-Seq &amp;amp; RNA-Seq alignment workflows as references for the list of applications &amp;amp; workflow structure. The container images &amp;amp; source files I created are publicly available on GitHub &lt;a href="#deliverables">(Deliverables #1)&lt;/a>.&lt;/p>
&lt;p>I also prepare sample inputs for each application to ease the burden of users who do not have sufficient familiarity with genomics applications. The effort is not trivial, because in a workflow, the inputs for a certain step depend on the outputs of previous step(s). Simply speaking, to prepare inputs for the last application in a workflow, we need to get the outputs of applications executed before it, which also requires the outputs of another set of applications, and so on until we arrive at the beginning of workflow. This translates into significant manual labor of carefully tracing &amp;amp; collecting intermediate files from each step of the reference workflows.&lt;/p>
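&lt;p>The dependency tracing described above can be sketched as a walk over a workflow graph. The workflow below is a hypothetical, simplified chain and not the actual DNA-Seq or RNA-Seq reference workflow. The step names and the &lt;code>upstream_steps&lt;/code> helper are illustrative only.&lt;/p>

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified workflow: each step maps to the steps whose outputs it consumes.
WORKFLOW = {
    "trim": set(),
    "align": {"trim"},
    "sort": {"align"},
    "call_variants": {"sort"},
}


def upstream_steps(workflow, target):
    """Return every step that must run before `target`, in a valid execution order."""
    needed = set()
    stack = [target]
    while stack:
        step = stack.pop()
        for dep in workflow[step]:
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    # Order only the required steps so that dependencies come first.
    subgraph = {step: workflow[step] for step in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(upstream_steps(WORKFLOW, "call_variants"))  # ['trim', 'align', 'sort']
```

&lt;p>Preparing a standalone input for the last step means running, or collecting the intermediate files of, every step in the returned list.&lt;/p>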
&lt;p>All inputs are hosted in a public Google Drive and ChameleonCloud object store &lt;a href="#deliverables">(Deliverables #2)&lt;/a>. In total, I prepared containers and inputs for 7 popular genomics applications: BWA, FastQC, Fastq Cleaner, GATK, Picard, STAR, and Trimmomatic.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre24/uga/genomicswf/20240919-martinputra/genscale-stack_hu2d5cfbf95523918b0bcbd89f95a37c1b_91166_5d15908a9f03f47b787a549dbd280a24.webp 400w,
/report/osre24/uga/genomicswf/20240919-martinputra/genscale-stack_hu2d5cfbf95523918b0bcbd89f95a37c1b_91166_b606b0529a38b68c5979566b35e267ed.webp 760w,
/report/osre24/uga/genomicswf/20240919-martinputra/genscale-stack_hu2d5cfbf95523918b0bcbd89f95a37c1b_91166_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240919-martinputra/genscale-stack_hu2d5cfbf95523918b0bcbd89f95a37c1b_91166_5d15908a9f03f47b787a549dbd280a24.webp"
width="760"
height="353"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">&lt;strong>Figure 1.&lt;/strong> Production-grade software used in GenScale: Kubernetes for task orchestration, and Prometheus + Grafana for real-time resource monitoring.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>2. Components Development.&lt;/strong>&lt;/p>
&lt;p>In this phase, I developed the main components of &lt;em>GenScale&lt;/em>. &lt;em>GenScale&lt;/em> consists of three components: (a) Workflow Manager, (b) Task Orchestrator, and (c) Resource Monitor. The Workflow Manager is built from scratch to allow a high degree of freedom when scheduling workflows. I use industry-grade solutions for the other components, namely Kubernetes for orchestrating tasks / containers, and Prometheus + Grafana for real-time resource monitoring. My deliverables include semi-automatic installation scripts &amp;amp; easy-to-follow instructions to set up all three components &lt;a href="#deliverables">(Deliverables #3)&lt;/a>.&lt;/p>
&lt;p>&lt;strong>3. Performance Data Generation.&lt;/strong>&lt;/p>
&lt;p>The last phase was to use the &lt;em>GenScale&lt;/em> prototype to generate performance data for each application. I focused on collecting data for three types of resources: compute (CPU utilization), memory (resident set size), and I/O (read &amp;amp; write operations over time). &lt;em>GenScale&lt;/em> exports this information into a single CSV file to facilitate easy analysis. My deliverables include performance data for DNA-Seq and RNA-Seq workflows. I also provide a sample Python Notebook that analyzes the CPU utilization pattern of each application in the DNA-Seq workflow &lt;a href="#deliverables">(Deliverables #4)&lt;/a>.&lt;/p>
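&lt;p>As a small illustration of the kind of analysis a single CSV file makes easy, the sketch below averages CPU utilization per application. The column names and sample rows are hypothetical. The real layout of the exported file may differ, so treat this as a sketch rather than a reader for the actual data.&lt;/p>

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical CSV layout and values; the real column names may differ.
SAMPLE = """app,elapsed_s,cpu_percent
bwa,0,350.0
bwa,5,390.0
fastqc,0,95.0
fastqc,5,105.0
"""


def mean_cpu_by_app(csv_text):
    """Average the cpu_percent column for each application."""
    samples = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        samples[row["app"]].append(float(row["cpu_percent"]))
    return {app: mean(values) for app, values in samples.items()}


print(mean_cpu_by_app(SAMPLE))  # {'bwa': 370.0, 'fastqc': 100.0}
```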
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre24/uga/genomicswf/20240919-martinputra/dnaseq-cpu_util_hu80d53d27b8c7b822ba2a4a4a343ec503_499906_9d39d7375c21c3eae305d20af9a8b7ee.webp 400w,
/report/osre24/uga/genomicswf/20240919-martinputra/dnaseq-cpu_util_hu80d53d27b8c7b822ba2a4a4a343ec503_499906_b8d8ac52b9cb53496558934c8a2b441b.webp 760w,
/report/osre24/uga/genomicswf/20240919-martinputra/dnaseq-cpu_util_hu80d53d27b8c7b822ba2a4a4a343ec503_499906_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240919-martinputra/dnaseq-cpu_util_hu80d53d27b8c7b822ba2a4a4a343ec503_499906_9d39d7375c21c3eae305d20af9a8b7ee.webp"
width="760"
height="614"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">&lt;strong>Figure 2.&lt;/strong> CPU utilization pattern of 9 applications in DNA-Seq Alignment workflow collected by &lt;em>GenScale&lt;/em>. &lt;strong>y-axis&lt;/strong>: &lt;em>(num. cores) x 100%&lt;/em>, &lt;strong>x-axis&lt;/strong>: time elapsed in seconds.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;p>This project&amp;rsquo;s deliverables can be found in the following GitHub repo: &lt;a href="https://github.com/martinluttap/sor24-genscale/tree/main" target="_blank" rel="noopener">https://github.com/martinluttap/sor24-genscale/tree/main&lt;/a>. In summary, the deliverables include:&lt;/p>
&lt;ol>
&lt;li>Container Images&lt;/li>
&lt;li>Input Dataset&lt;/li>
&lt;li>Source Code&lt;/li>
&lt;li>Performance Data &amp;amp; Sample Analysis Notebook&lt;/li>
&lt;/ol>
&lt;h2 id="future-works-broader-impacts">Future Works, Broader Impacts&lt;/h2>
&lt;p>Understanding workload characteristics is a crucial step in designing efficient scheduling policies &amp;amp; resource management techniques. &lt;em>GenScale&lt;/em> and the performance data it can generate might be a starting point for such efforts. Furthermore, I hope &lt;em>GenScale&lt;/em> will catalyze meaningful engagement between the computer systems community and the bioinformatics community. I believe state-of-the-art systems techniques can greatly aid the computing efforts of the bioinformatics community. Similarly, domain-specific knowledge &amp;amp; problems within bioinformatics provide unique grounds for the systems community to further advance their field.&lt;/p></description></item><item><title>[Final] ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240918-imzahra/</link><pubDate>Wed, 18 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240918-imzahra/</guid><description>&lt;p>Hello everyone,&lt;/p>
&lt;p>This summer I worked on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/osu/scalerep/">ScaleRep project&lt;/a> for SoR 2024 under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bogdan-bo-stoica/">Bogdan &amp;quot;Bo&amp;quot; Stoica&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a>. I’m excited to share the final progress and insights we’ve gathered on the reproducibility challenges posed by scalability bugs in large-scale distributed systems. Below is a detailed summary of our investigations and findings.&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>As you may recall, our project, ScaleRep, aimed to tackle the challenge of scalability bugs—those insidious issues that often arise in large-scale distributed systems under heavy workloads. These bugs, when triggered, can lead to significant system issues such as downtime, performance bottlenecks, and even data loss. They are particularly difficult to catch using traditional testing methods.&lt;/p>
&lt;p>Our primary focus was on reproducing these bugs, documenting the challenges involved, and providing insights into how these bugs manifest under various conditions. This documentation will help researchers identify, benchmark, and resolve similar issues in the future.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>Since the midterm update, several Apache Ignite bugs have been investigated, some of which have been successfully reproduced and uploaded to Trovi for the research community to access and reuse. Below is the progress on the bugs investigated:&lt;/p>
&lt;h3 id="bugs-investigated">Bugs Investigated&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-20614" target="_blank" rel="noopener">IGNITE-20614&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-17407" target="_blank" rel="noopener">IGNITE-17407&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-20602" target="_blank" rel="noopener">IGNITE-20602&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16600" target="_blank" rel="noopener">IGNITE-16600&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16072" target="_blank" rel="noopener">IGNITE-16072&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16582" target="_blank" rel="noopener">IGNITE-16582&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16581" target="_blank" rel="noopener">IGNITE-16581&lt;/a>&lt;/strong>&lt;/li>
&lt;/ol>
&lt;h2 id="key-insights--challenges">Key Insights &amp;amp; Challenges&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Complexity of Scalability Bugs
Many scalability bugs involve subtle and complex interactions that are not easily detected in standard testing environments. For instance, IGNITE-20602 only manifested under certain high-load conditions and required a specific workload and environment to reliably trigger the issue. This highlights the importance of large-scale testing when investigating scalability issues.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Dependency and Documentation Gaps
We encountered significant challenges with outdated dependencies and incomplete documentation, particularly in older bugs like IGNITE-16072. In these cases, reproducing the bug required extensive modifications or wasn’t feasible without investing disproportionate effort in updating dependencies.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Effectiveness of Trovi and Chameleon&lt;/strong>:
Packaging and sharing our reproducible investigations through Trovi and Chameleon have proven highly effective. By providing researchers with pre-configured environments and detailed documentation, we’ve laid the groundwork for future collaboration and further research on these bugs. We expect this to greatly benefit others attempting to reproduce similar issues.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Impact of Speed-Based Throttling&lt;/strong>:
Our investigation into IGNITE-16600 revealed several important insights into speed-based throttling and its impact on system performance under high-load conditions. By analyzing the checkpoint starvation and thread throttling mechanisms, we were able to identify areas for improvement in the latest Ignite releases.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>Expanding Collaboration: The packaged bugs and replayable Trovi experiments will be made available to the broader research community, encouraging further investigation and enhancements to large-scale distributed systems.&lt;/p>
&lt;p>The ScaleRep project has been an exciting journey into the world of scalability bugs, pushing the boundaries of what’s possible in terms of reproducibility and benchmarking. Through this project, we’ve demonstrated the importance of rigorous testing and comprehensive documentation in improving the reliability of distributed systems.&lt;/p></description></item><item><title>Final Blog: Enhancing User Experience Reproducibility through TROVI Redesign</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleontroviredesign/20240918-aliciaem/</link><pubDate>Wed, 18 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleontroviredesign/20240918-aliciaem/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/alicia-esquivel-morel/">Alicia Esquivel Morel&lt;/a>, and I&amp;rsquo;m a graduate research assistant at the University of Missouri – Columbia, pursuing a PhD in Computer Science. This summer, I worked on a project to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/trovi/">improve user experience reproducibility through a redesign of TROVI&lt;/a>, as part of the Summer of Reproducibility (SoR) program.&lt;/p>
&lt;p>Even before starting this project, as an early-career researcher, I saw reproducibility as one of the biggest challenges in research: being able to consistently replicate experiments and share them in a way that others can follow.&lt;/p>
&lt;p>&lt;strong>TROVI&lt;/strong> is a platform designed to help with this. However, as I joined the project, I knew it had room for improvement, not only in the user interface, but also in the ease of integrating code and data.&lt;/p>
&lt;p>This project aimed to address these challenges by redesigning TROVI to streamline experiment replication, making the platform more intuitive and accessible. The goal was simple: create a user-friendly experience that eliminates confusion and frustration, allowing researchers to focus on their work instead of the technical aspects of running a research experiment.&lt;/p>
&lt;h2 id="our-goals-in-the-beginning-of-the-summer">Our goals in the beginning of the summer:&lt;/h2>
&lt;ul>
&lt;li>We wanted to simplify TROVI’s interface for intuitive navigation, inspired by platforms like Google Colab.&lt;/li>
&lt;li>We wanted to make uploading and sharing code and data easier, with seamless integration with tools like GitHub.&lt;/li>
&lt;li>We wanted to create a mechanism for users to provide feedback, allowing TROVI to evolve based on real user needs.&lt;/li>
&lt;/ul>
&lt;h2 id="how-was-the-progress-and-what-we-have-achieved">How was the progress and what we have achieved&lt;/h2>
&lt;p>I started by conducting thorough UX research and a literature review on reproducibility platforms, establishing a solid foundation for the redesign. With user feedback guiding the process, I created wireframes and low-fidelity prototypes, focusing on making the platform more intuitive.&lt;/p>
&lt;p>As the project progressed, I built a higher-fidelity prototype that connected various components of the platform, ensuring a seamless user journey. I then tackled the back-end integration, which tied together the front-end flows with TROVI’s API.&lt;/p>
&lt;p>Throughout this project, I received &lt;strong>valuable support and guidance from my mentors&lt;/strong>. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a> walked me through TROVI’s architecture and helped me understand exactly what was needed for a successful redesign. Thanks to his mentorship, I not only completed the project but learned a great deal along the way. Thanks &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>!!&lt;/p>
&lt;p>Through iterations and feedback from initial user testing, and with the help of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kate-keahey/">Kate Keahey&lt;/a>, I refined the design to ensure it met the needs of the research community. By the end of the program, TROVI had evolved into a cohesive, user-friendly platform that makes experiments easier to reproduce.&lt;/p>
&lt;h2 id="accomplishments">Accomplishments&lt;/h2>
&lt;ul>
&lt;li>A simplified interface that makes navigating, uploading, and collaborating much easier.&lt;/li>
&lt;li>GitHub integration that streamlines the process of sharing code and data with collaborators.&lt;/li>
&lt;li>A built-in feedback loop that enables TROVI to grow with its users, adapting to their needs as they arise.&lt;/li>
&lt;/ul>
&lt;p>The platform is also getting ready to move into &lt;strong>production&lt;/strong> and will soon be available for the research community.&lt;/p>
&lt;h2 id="whats-next">What’s Next?&lt;/h2>
&lt;p>While the core objectives have been successfully met, future improvements could further enhance the platform&amp;rsquo;s capabilities, such as additional integrations and more advanced collaboration features. User testing will continue to provide insights for ongoing development.&lt;/p>
&lt;p>I&amp;rsquo;m grateful for this opportunity! Thank you for following along!&lt;/p></description></item><item><title>[MidTerm] StatWrap: Automated Reproducibility Checklists Generation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240916-adi/</link><pubDate>Mon, 16 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240916-adi/</guid><description>&lt;p>Namaste🙏🏻! I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/adi-akhilesh-singh/">Adi Akhilesh Singh&lt;/a>, and I&amp;rsquo;m excited to share progress updates on the &lt;a href="https://drive.google.com/file/d/1xV7eHL9lIWGKueQJxBks6OB_rcXCr8JY/view?usp=sharing" target="_blank" rel="noopener">Reproducibility Checklists project&lt;/a> by StatWrap, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>.&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>The project aims to integrate customizable reproducibility checklists into StatWrap, using metadata and user input to automate their generation. The goal is to enhance the reproducibility of research projects by providing researchers with structured and comprehensive checklists to ensure their work is reproducible.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>Over the past few months, my mentors and I have worked on developing the interface for the checklists page and designed key components to support our project goals. We’ve implemented logic that iterates over each checklist item, displaying its statement along with Boolean controls (Yes/No buttons) for user interaction.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Checklists Page" srcset="
/report/osre24/ucsc/statwrap/20240916-adi/checklist1_hu49ca38eb7e3448bf4ed2dfab22f3668a_108784_5fcb2c29a07fa3c85a9668932f8201f8.webp 400w,
/report/osre24/ucsc/statwrap/20240916-adi/checklist1_hu49ca38eb7e3448bf4ed2dfab22f3668a_108784_0a0b05429d19c5fac3cf4bd2f233cd58.webp 760w,
/report/osre24/ucsc/statwrap/20240916-adi/checklist1_hu49ca38eb7e3448bf4ed2dfab22f3668a_108784_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240916-adi/checklist1_hu49ca38eb7e3448bf4ed2dfab22f3668a_108784_5fcb2c29a07fa3c85a9668932f8201f8.webp"
width="760"
height="416"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>We’ve also developed components to display attached images and URLs linked to each checklist item. Additionally, we’ve integrated a notes feature that allows users to add, edit, and view project-related notes. Currently, we are writing methods to integrate real-time project data into the checklists. For example, one method we’ve implemented scans project files (assets) to detect the languages used.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Checklists Details" srcset="
/report/osre24/ucsc/statwrap/20240916-adi/checklist2_hu44aad6e1d0078aeaafbbf946cadf1130_201385_ed67e0f337158d4ded72db604d4b14df.webp 400w,
/report/osre24/ucsc/statwrap/20240916-adi/checklist2_hu44aad6e1d0078aeaafbbf946cadf1130_201385_c2fdd3677c5b7099074f7846753c15ff.webp 760w,
/report/osre24/ucsc/statwrap/20240916-adi/checklist2_hu44aad6e1d0078aeaafbbf946cadf1130_201385_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240916-adi/checklist2_hu44aad6e1d0078aeaafbbf946cadf1130_201385_ed67e0f337158d4ded72db604d4b14df.webp"
width="760"
height="416"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
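&lt;p>To illustrate the kind of asset scan described above, here is a minimal Python sketch of extension-based language detection. It is not StatWrap code: the &lt;code>EXTENSION_LANGUAGES&lt;/code> table, the &lt;code>detect_languages&lt;/code> function, and the sample paths are all hypothetical, and a real scanner would likely need a much larger mapping.&lt;/p>

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-language table (an assumption, not StatWrap's own).
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".r": "R",
    ".sas": "SAS",
    ".do": "Stata",
    ".js": "JavaScript",
}


def detect_languages(asset_paths):
    """Count the languages implied by the file extensions of the given paths."""
    counts = Counter()
    for path in asset_paths:
        # Compare extensions case-insensitively so "model.R" matches ".r".
        suffix = PurePosixPath(path).suffix.lower()
        language = EXTENSION_LANGUAGES.get(suffix)
        if language is not None:
            counts[language] += 1
    return counts


# Hypothetical project assets; files with unknown extensions are ignored.
assets = ["analysis/model.R", "analysis/clean.py", "README.md", "plots/fig1.py"]
print(detect_languages(assets))
```

&lt;p>A result like this could then pre-fill a checklist item such as the list of languages used in the project, with the user still able to correct it.&lt;/p>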
&lt;h2 id="whats-next">What&amp;rsquo;s Next?&lt;/h2>
&lt;p>As we move closer to the final evaluation phase, our focus will be on the following objectives:&lt;/p>
&lt;ul>
&lt;li>Implement methods for each checklist item, integrating real-time data from the project data to auto-populate checklist answers.&lt;/li>
&lt;li>Enhance the &lt;code>Attached Images&lt;/code> component to allow users to select and attach existing image assets from the project.&lt;/li>
&lt;li>Display the results of the scans for each checklist item, providing users with detailed outputs based on the automated analysis.&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for further updates as we continue developing this feature set! 🚀&lt;/p></description></item><item><title>Final Post: Enhancing Reproducibility and Portability in Network Experiments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240905-warmuth/</link><pubDate>Thu, 05 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240905-warmuth/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>As my project with the Summer of Reproducibility (SoR) 2024 comes to a close, I’d like to reflect on the journey and the outcomes achieved. My project focused on &lt;strong>enhancing the reproducibility and portability of network experiments&lt;/strong> by integrating the &lt;strong>RO-Crate standard&lt;/strong> into the &lt;strong>TUM-internal testbed pos (plain orchestrating service)&lt;/strong>, and deploying this testbed on the &lt;strong>Chameleon cloud infrastructure&lt;/strong>. The aim was to ensure that experiments conducted on one platform could be seamlessly reproduced on another, adhering to the &lt;strong>FAIR principles&lt;/strong> (Findable, Accessible, Interoperable, Reusable) for research data.&lt;/p>
&lt;h2 id="project-recap">Project Recap&lt;/h2>
&lt;p>The core goal was to make the experiments reproducible and portable between different testbeds like TUM’s pos and Chameleon. To achieve this, I integrated the &lt;strong>RO-Crate standard&lt;/strong>, which ensures that all experiment data is automatically documented and stored with metadata, making it easier for others and especially for machines to understand, replicate, and build on the results. Additionally, deploying a lightweight version of pos on the &lt;strong>Chameleon testbed&lt;/strong> enabled cross-testbed execution, allowing experiments to be replicated across both environments without significant modifications.&lt;/p>
&lt;h2 id="key-achievements">Key Achievements&lt;/h2>
&lt;p>Over the course of the project, several key milestones were achieved:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>RO-Crate Integration&lt;/strong>: The first step was restructuring the results folder and automating the generation of metadata using RO-Crate. This ensured that all experiment data was documented with details like author information, hardware configurations, and experiment scripts, resulting in comprehensive &lt;code>ro-crate-metadata.json&lt;/code> files as an important part of each result folder.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Improved Data Management&lt;/strong>: The integration of RO-Crate greatly simplified the process of organizing and retrieving experiment data and metadata with information about the experiment and the result files. All metadata was automatically generated, making it easier to share and document the experiments for other researchers to replicate.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Automatic Upload to Zenodo&lt;/strong>: Another crucial achievement was the implementation of automatic uploading of pos experiment result folders to &lt;strong>Zenodo&lt;/strong>, an open-access repository. This step significantly improved the reproducibility and sharing of experiment results, making them easily accessible to the broader scientific community. By utilizing Zenodo, we ensured that experiment results, along with their RO-Crate metadata, could be archived and referenced, fostering greater transparency and collaboration in scientific research.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chameleon Deployment&lt;/strong>: Deploying the pos testbed within the Chameleon environment required managing various complexities, particularly related to Chameleon’s OpenStack API, networking setup, and hardware configurations. Coordinating the network components and infrastructure to support pos functionality in this testbed environment demanded significant adjustments to ensure smooth integration and operation.&lt;/p>
&lt;/li>
&lt;/ul>
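&lt;p>For readers unfamiliar with RO-Crate, the following Python sketch shows roughly what a small &lt;code>ro-crate-metadata.json&lt;/code> looks like. It is not the pos implementation, which is not public: the function name, the experiment name, the author, and the file paths are hypothetical placeholders. The overall layout (a metadata descriptor plus a root dataset in a JSON-LD &lt;code>@graph&lt;/code>) follows the RO-Crate 1.1 specification, which also expects further properties such as a license that are omitted here for brevity.&lt;/p>

```python
import json


def build_ro_crate_metadata(name, description, author, date_published, files):
    """Return a simplified RO-Crate 1.1 metadata document as a dict."""
    graph = [
        # Metadata descriptor: identifies this file and points at the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: describes the result folder as a whole.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "author": {"@id": "#author"},
            "hasPart": [{"@id": path} for path in files],
        },
        {"@id": "#author", "@type": "Person", "name": author},
    ]
    # One File entity per result or script file in the folder.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical values for illustration only.
metadata = build_ro_crate_metadata(
    name="Example experiment results",
    description="Result folder of a hypothetical network experiment",
    author="Jane Doe",
    date_published="2024-09-05",
    files=["experiment.sh", "results/run0.csv"],
)
print(json.dumps(metadata, indent=2))
```

&lt;p>Because the metadata is plain JSON-LD, both people and tools can read which files belong to an experiment and who produced them without knowing anything about the testbed that generated the folder.&lt;/p>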
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>Like any project, this one came with its own set of challenges:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Balancing Automation and Flexibility&lt;/strong>: While automating the generation of RO-Crate metadata, it was crucial to ensure that the flexibility required by researchers for customizing their documentation was not compromised. Finding this balance required in-depth adjustments to the testbed infrastructure.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Complexity of Testbed Systems&lt;/strong>: Integrating RO-Crate into a complex system like pos, and ensuring it works seamlessly with Chameleon, involved understanding and adapting to the complexities of both testbeds.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="future-directions">Future Directions&lt;/h2>
&lt;p>As I move forward with my master&amp;rsquo;s thesis working on these challenges, we plan to expand on this work by:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Extending the Chameleon Deployment&lt;/strong>: We aim to deploy the full version of pos on Chameleon, supporting more complex and larger-scale experiments.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Supporting Complex Experiment Workflows&lt;/strong>: Future work will focus on handling more intricate and larger datasets, ensuring reproducibility for complex workflows. Only by executing more complex experiments will we be able to thoroughly analyze and compare the differences between executions in pos and the pos deployed on Chameleon, helping us better understand the impact of different testbed environments on experiment outcomes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Automation&lt;/strong>: The ultimate goal is to fully automate the process of experiment execution, result documentation, and sharing across testbeds, reducing manual intervention and further enhancing reproducibility.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="reflections">Reflections&lt;/h2>
&lt;p>By integrating the RO-Crate standard and deploying pos on the Chameleon testbed, we have made significant steps toward enhancing the reproducibility, accessibility, and portability of network experiments across research platforms. These efforts contribute to more shareable and replicable research processes in the scientific community.&lt;/p>
&lt;p>I am excited about the future work ahead and am grateful for the mentorship and support I received during this project.&lt;/p>
&lt;h2 id="deliverables-and-availability">Deliverables and Availability&lt;/h2>
&lt;p>Due to the current non-public status of the pos framework, &lt;strong>the code and deliverables are not publicly available&lt;/strong> at the moment.&lt;/p>
&lt;h2 id="previous-blogs">Previous Blogs&lt;/h2>
&lt;p>Make sure to check out my other blogs to see how I started this project and the challenges I faced along the way:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240517-warmuth/">Introduction&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240716-warmuth/">Midterm Blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Servus!&lt;/p></description></item><item><title>Understanding Data Leakage in Machine Learning: A Focus on TF-IDF</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240905-kyrillosishak/</link><pubDate>Thu, 05 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240905-kyrillosishak/</guid><description>&lt;p>Hello again!&lt;/p>
&lt;p>This is my final blog post, and I will be discussing the second material I created for the 2024 Summer of Reproducibility Fellowship. As you may recall from my first post, I am working on the &lt;strong>Exploring Data Leakage in Applied ML: Reproducing Examples of Irreproducibility&lt;/strong> project with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a> as my mentors.&lt;/p>
&lt;p>This blog post will explore how data leakage can occur during feature extraction, particularly with the commonly used &lt;strong>TF-IDF&lt;/strong> vectorizer, and its impact on model generalization.&lt;/p>
&lt;h1 id="introduction">Introduction&lt;/h1>
&lt;p>In machine learning, data leakage is a critical issue that can severely impact model performance. It occurs when information from outside the training dataset is improperly used to create the model, leading to overly optimistic performance during evaluation. One common source of leakage comes from how features, such as those extracted using &lt;strong>TF-IDF&lt;/strong> (Term Frequency-Inverse Document Frequency), are handled. In this post, we&amp;rsquo;ll explore how data leakage can happen during feature extraction with TF-IDF and how it affects model accuracy.&lt;/p>
&lt;h1 id="what-is-tf-idf">What is TF-IDF?&lt;/h1>
&lt;p>TF-IDF is a method used to evaluate how important a word is in a document relative to a collection of documents. It consists of two components:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Term Frequency (TF)&lt;/strong>: Measures how frequently a term appears in a document.&lt;/li>
&lt;li>&lt;strong>Inverse Document Frequency (IDF)&lt;/strong>: Reduces the importance of terms that appear frequently across many documents.&lt;/li>
&lt;/ol>
&lt;p>Together, they provide a weighted value for each word, reflecting its importance relative to the dataset.&lt;/p>
&lt;h1 id="how-data-leakage-occurs-with-tf-idf">How Data Leakage Occurs with TF-IDF&lt;/h1>
&lt;p>Data leakage with TF-IDF happens when the inverse document frequency (IDF) is calculated using the entire dataset (including the test set) before splitting it into training and test sets. This means the model has access to information from the test set during training, leading to artificially inflated results. This is a subtle form of data leakage, as it often goes unnoticed.&lt;/p>
&lt;p>For example, when calculating the TF-IDF score, if the word &amp;ldquo;banana&amp;rdquo; appears more frequently in the test set but is considered during training, the model downplays its significance. As a result, the model may fail to predict correctly when &amp;ldquo;banana&amp;rdquo; is important in the test data.&lt;/p>
&lt;h1 id="why-does-this-matter">Why Does This Matter?&lt;/h1>
&lt;p>If the test data is included when calculating the IDF, the model gains unintended insight into the test set&amp;rsquo;s word distribution. In real-world scenarios, the test data is supposed to be unseen during training. By allowing the model to see this information, you&amp;rsquo;re essentially reducing the uncertainty that the model should have about future data.&lt;/p>
&lt;h1 id="impact-of-data-leakage-on-model-performance">Impact of Data Leakage on Model Performance&lt;/h1>
&lt;p>Let&amp;rsquo;s consider two cases to understand the impact of data leakage in detail:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>When a word is rare in the training set but common in the test set&lt;/strong>: The model will underestimate the importance of this word during training, leading to poor performance when the word is critical in test documents.&lt;/li>
&lt;li>&lt;strong>When a word is common in the training set but rare in the test set&lt;/strong>: The model will overemphasize the word during training, leading to poor predictions when the word doesn’t appear as often in unseen data.&lt;/li>
&lt;/ol>
&lt;h3 id="case-study-data-leakage-in-tf-idf">Case Study: Data Leakage in TF-IDF&lt;/h3>
&lt;p>To see this effect in action, consider a small toy dataset where the presence of the word &amp;ldquo;banana&amp;rdquo; determines the label. If the word &amp;ldquo;banana&amp;rdquo; appears in a sentence, the label is 1; otherwise, the label is 0. Using &lt;strong>TF-IDF&lt;/strong> to vectorize the text, we train a machine learning model to predict this label.&lt;/p>
&lt;p>In the &lt;strong>first scenario&lt;/strong>, we calculate the &lt;strong>TF-IDF&lt;/strong> using the entire dataset before splitting it into training and testing sets. This causes data leakage since the model now knows the distribution of words across both sets. For instance, if &amp;ldquo;banana&amp;rdquo; is more common in the test set than the training set, the &lt;strong>IDF&lt;/strong> score for &amp;ldquo;banana&amp;rdquo; will be lower across the entire dataset, leading the model to downplay its importance.&lt;/p>
&lt;p>In the &lt;strong>second scenario&lt;/strong>, we calculate &lt;strong>TF-IDF&lt;/strong> only on the training set, ensuring that the test set remains unseen. This preserves the integrity of the test set, giving us a more realistic evaluation of the model&amp;rsquo;s performance.&lt;/p>
&lt;p>The model&amp;rsquo;s accuracy differs drastically between the two scenarios. When leakage is present, performance is artificially high during training but poor when tested on unseen data. Without leakage, the model generalizes better, as it is evaluated on truly unseen data.&lt;/p>
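&lt;p>The effect on the IDF weight itself can be reproduced with a few lines of plain Python. This is a minimal sketch, not the notebook from the project: the toy sentences and the &lt;code>idf&lt;/code> helper are made up for illustration, and the smoothed formula &lt;code>ln((1 + n) / (1 + df)) + 1&lt;/code> is assumed because it matches the default of scikit-learn&amp;rsquo;s &lt;code>TfidfVectorizer&lt;/code>.&lt;/p>

```python
import math


def idf(term, docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    # df = number of documents that contain the term at least once.
    df = sum(1 for doc in docs if term in doc.lower().split())
    return math.log((1 + n) / (1 + df)) + 1


# Hypothetical toy corpus: "banana" is rare in training but common in test.
train_docs = [
    "I like apples",
    "oranges are sweet",
    "a banana a day",
    "grapes grow on vines",
]
test_docs = [
    "banana bread is tasty",
    "banana smoothie recipe",
]

# First scenario (leaky): IDF is computed on train and test together.
leaky_idf = idf("banana", train_docs + test_docs)

# Second scenario (clean): IDF is computed on the training documents only.
clean_idf = idf("banana", train_docs)

print(f"leaky IDF for 'banana': {leaky_idf:.3f}")
print(f"clean IDF for 'banana': {clean_idf:.3f}")
```

&lt;p>With this corpus the leaky IDF comes out lower than the clean one, because the test documents raise the document frequency of &amp;ldquo;banana&amp;rdquo;. That is exactly the downplaying described in the first scenario.&lt;/p>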
&lt;h1 id="avoiding-data-leakage">Avoiding Data Leakage&lt;/h1>
&lt;p>Avoiding data leakage is essential for building reliable machine learning models that generalize well to new data. Here are a few guidelines to help prevent leakage:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Split the dataset before feature extraction&lt;/strong>: Always divide your data into training and test sets before applying any feature engineering techniques.&lt;/li>
&lt;li>&lt;strong>Ensure proper cross-validation&lt;/strong>: When using cross-validation, ensure that the training and test splits do not overlap in any way that can leak information between them.&lt;/li>
&lt;li>&lt;strong>Be cautious with time-series data&lt;/strong>: In time-series models, avoid using future data to predict past events, as this can lead to leakage.&lt;/li>
&lt;/ol>
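&lt;p>The first guideline can be sketched as a split, fit, transform sequence. The code below is a simplified stand-in for a real vectorizer (no normalization, whitespace tokenization), and the &lt;code>fit_idf&lt;/code> and &lt;code>transform&lt;/code> helpers as well as the toy data are hypothetical. With scikit-learn, the same pattern is usually written as fitting the vectorizer on the training split and only transforming the test split.&lt;/p>

```python
import math
import random


def fit_idf(train_docs):
    """Learn vocabulary and smoothed IDF weights from training documents only."""
    n = len(train_docs)
    doc_freq = {}
    for doc in train_docs:
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}


def transform(doc, idf_weights):
    """Turn one document into a sparse TF-IDF dict using already-fitted weights.

    Terms that were never seen during fitting are dropped.
    """
    counts = {}
    for term in doc.lower().split():
        if term in idf_weights:
            counts[term] = counts.get(term, 0) + 1
    return {t: c * idf_weights[t] for t, c in counts.items()}


# Hypothetical labelled toy data: the label is 1 when "banana" appears.
data = [
    ("a banana a day", 1),
    ("I like apples", 0),
    ("banana bread is tasty", 1),
    ("oranges are sweet", 0),
    ("grapes grow on vines", 0),
    ("banana smoothie recipe", 1),
]

# 1. Split first (fixed seed so the split is repeatable).
random.Random(0).shuffle(data)
train, test = data[:4], data[4:]

# 2. Fit the IDF weights on the training split only.
idf_weights = fit_idf([text for text, _ in train])

# 3. Reuse the fitted weights, unchanged, for both splits.
X_train = [transform(text, idf_weights) for text, _ in train]
X_test = [transform(text, idf_weights) for text, _ in test]
```

&lt;p>The important design choice is that &lt;code>fit_idf&lt;/code> never receives a test document, so nothing about the test set&amp;rsquo;s word distribution can reach the features the model is trained on.&lt;/p>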
&lt;h1 id="conclusion">Conclusion&lt;/h1>
&lt;p>Avoiding data leakage is crucial for building robust machine learning models. In the case of TF-IDF, ensuring that feature extraction is done &lt;strong>only on the training set&lt;/strong> and not on the entire dataset is key to preventing leakage. Properly addressing this issue leads to better generalization and more reliable models in real-world applications.&lt;/p>
&lt;p>This blog post provided a case study on how TF-IDF can introduce data leakage and why it&amp;rsquo;s important to carefully handle your dataset before feature extraction. By splitting your data properly and ensuring that no test data &amp;ldquo;leaks&amp;rdquo; into the training process, you can build models that truly reflect real-world performance.&lt;/p>
&lt;p>Thanks for reading!&lt;/p></description></item><item><title>AutoAppendix: Towards One-Click reproducibility of high-performance computing experiments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240904-kkrassni/</link><pubDate>Wed, 04 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240904-kkrassni/</guid><description>&lt;p>Hi everyone,&lt;/p>
&lt;p>I&amp;rsquo;m excited to wrap up the AutoAppendix project with our final findings and
insights. Over the course of this initiative, we’ve worked to assess the
reproducibility of artifacts submitted to the SC24 conference and create
guidelines that aim to improve the standard for reproducible experiments in the
future. Here&amp;rsquo;s a summary of the project&amp;rsquo;s final phase and what we’ve learned.&lt;/p>
&lt;h2 id="project-goals-and-progress">Project Goals and Progress&lt;/h2>
&lt;p>The goal of AutoAppendix was to evaluate the computational artifacts provided by
SC24 paper submissions, focusing on reproducibility. These artifacts accompany
papers applying for the &amp;ldquo;Artifact Replicable&amp;rdquo; badge in the conference&amp;rsquo;s
reproducibility initiative. Volunteer members of this initiative assess 1-2 paper appendices each. In this project, we analyzed a larger portion of artifacts to gain a broader perspective on potential improvements to the reproducibility process.&lt;/p>
&lt;p>We selected 18 out of 45 submissions, focusing on experiments that could be
easily replicated on Chameleon Cloud. Our evaluation criteria were based on
simplicity (single-node setups) and availability of resources. The final
analysis expanded on the earlier midterm findings, shedding light on various
challenges and best practices related to artifact reproducibility.&lt;/p>
&lt;h2 id="artifact-evaluation-process">Artifact Evaluation Process&lt;/h2>
&lt;p>During the evaluation process, we focused on examining the completeness and
clarity of the provided artifacts, looking closely at documentation, setup
instructions, and the degree of automation.&lt;/p>
&lt;p>Our first step was to replicate the environments used in the original
experiments as closely as possible using the resources from Chameleon. Many papers included instructions for creating the necessary software environments,
but the clarity of these instructions varied significantly across submissions.
In some cases, we even encountered challenges in reproducing results due to unclear
instructions or missing dependencies, which reinforced the need for
standardized, clear documentation as part of the artifact submission process.&lt;/p>
&lt;p>We observed that &lt;em>containerization&lt;/em> and &lt;em>semi-automated setups&lt;/em> (with scripts
that break down the experiment into smaller steps) were particularly effective
in enhancing the reproducibility of the artifacts. One artifact
particularly caught our attention due to its usage of the Chameleon JupyterHub
platform, making it reproducible with a &lt;em>single click&lt;/em>. This highlighted the
potential for
streamlining the reproducibility process and showcased that, with sufficient
effort and the right tools, experiments can indeed be made replicable by
&lt;em>anyone&lt;/em>.&lt;/p>
&lt;h2 id="results">Results&lt;/h2>
&lt;p>Throughout the evaluation, we observed that reproducibility could vary widely
based on the clarity and completeness of the documentation and the automation of
setup procedures. Artifacts that were structured with clear, detailed steps for
installation and execution tended to perform well in terms of replicability.&lt;/p>
&lt;p>From our evaluation, we derived a set of guidelines (intended as must-haves) and
best practices (recommended) for artifact reproducibility, which can be found below.&lt;/p>
&lt;p>Given our fascination with the potential of the Chameleon JupyterHub platform and its adjacent &lt;a href="https://www.chameleoncloud.org/experiment/share/" target="_blank" rel="noopener">Trovi&lt;/a> artifact repository, we decided to create
several templates that can be used as a starting point for authors to make integration
of their artifacts with the platform easier. In the design of these templates,
we made sure that artifacts structured according to our guidelines are
particularly easy to integrate.&lt;/p>
&lt;h3 id="guidelines">Guidelines&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Clear Documentation&lt;/strong>: Provide clear and detailed documentation for the artifact in the corresponding appendix, such that the artifact can be replicated without the need for additional information. For third-party software, it is acceptable to refer to the official documentation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Software Setup&lt;/strong>: Clearly specify the versions of all (necessary) software components used
in the creation of the artifact. This includes the operating system, libraries, and tools.
In particular, state all software setup steps needed to replicate the software environment.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Hardware Specifications&lt;/strong>: Specify the hardware the experiment was conducted on. Importantly,
state the architecture the experiments are intended to run on, and ensure that
any provided software (e.g. Docker images) is compatible with commonly available
architectures.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Expected Results&lt;/strong>: Always provide the expected outputs of the experiment, especially when run on different hardware, to make it easier for reviewers to assess the success of the replication.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Public Data&lt;/strong>: Publish the experiment data to a public repository, and make
sure the data is available for download to reviewers and readers, especially during
the evaluation period. Zenodo is a recommended repository for this purpose.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Automated Reproducibility&lt;/strong>: For long-running experiments, provide
progress output to the reviewer to ensure the experiment is running as expected.
Give an idea in the documentation of&lt;/p>
&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>how much time long-running steps in the reproduction will take&lt;/li>
&lt;li>what the progress output looks like or how frequently it is emitted&lt;/li>
&lt;/ul>
&lt;ol start="7">
&lt;li>&lt;strong>Sample Execution&lt;/strong>: Conduct a sample evaluation with hardware and software
as similar as possible to the intended reproduction environment.&lt;/li>
&lt;/ol>
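&lt;p>As an illustration of the Software Setup and Hardware Specifications guidelines, the following minimal sketch records the operating system, CPU architecture, Python version, and selected package versions so they can be shipped with an artifact. The function name &lt;code>describe_environment&lt;/code> and the package names in the usage comment are assumptions made for this example; they are not part of the AutoAppendix templates.&lt;/p>

```python
import json
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Collect OS, architecture, Python and package versions as a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": versions,
    }


def environment_report(packages=()):
    """Render the environment description as JSON for the appendix."""
    return json.dumps(describe_environment(packages), indent=2, sort_keys=True)


# Example (hypothetical package list):
#   print(environment_report(["numpy", "matplotlib"]))
```

&lt;p>Writing this report to a file next to the results gives reviewers a concrete record to compare against their own environment.&lt;/p>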
&lt;h3 id="best-practices">Best Practices&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Reproducible Environment&lt;/strong>:
Use a reproducible environment for the artifact. This can come in several forms:&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>&lt;strong>Containerization&lt;/strong>: Provide instructions for building the environment, or,
ideally, provide a ready-to-use image. For example, Docker, Singularity, or VirtualBox images can be used for this purpose.&lt;/li>
&lt;li>&lt;strong>Reproducible Builds&lt;/strong>: Package managers like &lt;a href="https://nixos.org/" target="_blank" rel="noopener">Nix&lt;/a> or &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> have recently spiked in popularity and allow their users to create reproducible environments, matching the exact software versions across different systems.&lt;/li>
&lt;/ul>
&lt;ol start="2">
&lt;li>
&lt;p>&lt;strong>Partial Automation&lt;/strong>: It often makes sense to break an experiment down into
smaller, more manageable steps. For Linux-based systems, bash scripts are particularly viable for this purpose. We recommend prefixing the scripts for each step with
a number, such that the order of execution is clear.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>X11 Availability&lt;/strong>: Usually, reviewers will not have access to a graphical user
interface on the system where the artifact is evaluated. If the artifact requires a
graphical user interface, provide a way to run the artifact without it. For example,
save &lt;code>matplotlib&lt;/code> plots to disk instead of showing them with &lt;code>plt.show()&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Experiment output&lt;/strong>: Do not provide output files of the experiment in your artifact,
unless explicitly intended. If provided output files are intended for comparison,
they should be marked as such (e.g. in their filename). Similarly, any output logs
or interactive outputs in Jupyter notebooks should not be part of the artifact, but
should instead be generated during the artifact evaluation.&lt;/p>
&lt;/li>
&lt;/ol>
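&lt;p>The numbered-script convention from the Partial Automation practice can be sketched as follows. This is a hedged example, not part of the templates: the file names are hypothetical, and the helper sorts by the numeric prefix so that &lt;code>10_plot.sh&lt;/code> runs after &lt;code>2_run.sh&lt;/code> rather than before it, as a plain alphabetical sort would do.&lt;/p>

```python
import re
import subprocess
from pathlib import Path

# A step script starts with digits followed by "_" or "-", e.g. "01_setup.sh".
STEP_PATTERN = re.compile(r"^(\d+)[_-]")


def ordered_steps(names):
    """Return the numbered script names sorted by their numeric prefix."""
    numbered = []
    for name in names:
        match = STEP_PATTERN.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def run_all(directory):
    """Run every numbered bash script in `directory`, stopping on failure."""
    folder = Path(directory)
    for name in ordered_steps(p.name for p in folder.glob("*.sh")):
        # Progress output tells the reviewer which step is running.
        print(f"[step] {name}", flush=True)
        subprocess.run(["bash", str(folder / name)], check=True)
```

&lt;p>Zero-padding the prefixes (&lt;code>01_&lt;/code>, &lt;code>02_&lt;/code>, ...) keeps the order clear in a plain directory listing as well.&lt;/p>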
&lt;h3 id="trovi-templates">Trovi Templates&lt;/h3>
&lt;p>Our templates share a common base that features
a &lt;em>central configuration file&lt;/em> for modifying the
Chameleon experiment parameters (such as node type). Building on this base, we provide three templates with sample experiments that each use different environments:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Docker template&lt;/strong>: This template is designed for containerized experiments and supports NVIDIA GPUs through the &lt;code>nvidia-container-toolkit&lt;/code> integration.&lt;/li>
&lt;li>&lt;strong>Nix template&lt;/strong>: Sets up the Nix package manager with a &lt;code>shell.nix&lt;/code> file that can be used to configure the environment.&lt;/li>
&lt;li>&lt;strong>Guix template&lt;/strong>: Installs the Guix package manager and executes a sample experiment from an existing reproducible paper that hinges on the reproducibility of the software environment.&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>In summary, the AutoAppendix project has been an insightful journey into the
complexities of artifact reproducibility. Our evaluations highlight both the
challenges and potential solutions for future reproducibility initiatives. By
following these essential guidelines and implementing best practices, we aim for the
research community to achieve higher standards of transparency and reliability
in scientific research and help to ensure that the results of experiments can be replicated by others.&lt;/p>
&lt;p>Thanks for following along with our progress! We’re excited to see the positive
impact these findings will have on the research community.&lt;/p>
&lt;p>If you are interested in the full project report, you can find it &lt;a href="https://drive.google.com/drive/folders/113OsxGAlfyvlJnvpH5zL2XD-8gE3CYyu?usp=sharing" target="_blank" rel="noopener">here&lt;/a>, together with the &lt;em>Trovi&lt;/em> templates.&lt;/p></description></item><item><title>Reflecting on the ScaleRep Project: Achievements and Insights</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240902-shuangliang/</link><pubDate>Mon, 02 Sep 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240902-shuangliang/</guid><description>&lt;p>Hello everyone,&lt;/p>
&lt;p>As we reach the conclusion of our ScaleRep project, I want to take a moment to reflect on the journey we’ve undertaken and the significant milestones we’ve achieved. Throughout this project, our primary focus was on identifying, reproducing, and analyzing scalability bugs in cloud systems such as Cassandra, HDFS, and Hadoop. Under the mentorship of Professor Yang Wang and Bogdan “Bo” Stoica, we have gained valuable insights into the complexities of scalability issues and their impact on large-scale distributed systems.&lt;/p>
&lt;h1 id="key-accomplishments">Key Accomplishments&lt;/h1>
&lt;p>Over the course of the project, we delved into various aspects of scalability bugs, reproducing some of the most challenging issues faced by cloud systems. One of our notable accomplishments was the successful reproduction and validation of developer fixes for several critical bugs in HDFS. These included:&lt;/p>
&lt;h2 id="1-throttling-bugs-in-hdfs">1. Throttling Bugs in HDFS:&lt;/h2>
&lt;p>We investigated HDFS-17087, where the absence of a throttler in &lt;code>DataXceiver#readBlock&lt;/code> led to unregulated data reads, causing potential performance degradation. By reproducing the bug and applying the developer’s patch, we were able to observe significant improvements in system stability.&lt;/p>
&lt;h2 id="2-reducing-datanode-load">2. Reducing DataNode Load:&lt;/h2>
&lt;p>HDFS-16386 was another crucial bug we worked on, which involved reducing the load on DataNodes when &lt;code>FsDatasetAsyncDiskService&lt;/code> was working. By analyzing the effects of high CPU and memory usage, we proposed and validated a solution that reduced the number of concurrent threads, ultimately improving the DataNode’s performance.&lt;/p>
&lt;h2 id="3-improving-log-throttling">3. Improving Log Throttling:&lt;/h2>
&lt;p>In HDFS-16872, we addressed excessive logging caused by unshared instances of &lt;code>LogThrottlingHelper&lt;/code>. By making &lt;code>LogThrottlingHelper&lt;/code> a static member, we were able to share throttling across instances, reducing unnecessary log entries and improving system efficiency.&lt;/p>
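&lt;p>The idea of sharing throttling state across instances can be illustrated with a small Python sketch. This is not the HDFS code (which is Java) and not the actual patch; the class &lt;code>LogThrottler&lt;/code>, its parameters, and the injectable clock are assumptions made purely for illustration.&lt;/p>

```python
import time


class LogThrottler:
    """Allow at most one log line per `min_interval` seconds, process-wide."""

    # Class-level (shared) state: every instance sees the same timestamp,
    # which plays the role of the static member in the fix described above.
    _last_emit = None

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock

    def should_log(self):
        """Return True if enough time has passed since any instance logged."""
        now = self.clock()
        last = LogThrottler._last_emit
        if last is None or now - last >= self.min_interval:
            LogThrottler._last_emit = now
            return True
        return False
```

&lt;p>If &lt;code>_last_emit&lt;/code> were an instance attribute instead, each new object would start with a fresh window and the throttle would have no effect on short-lived objects.&lt;/p>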
&lt;h1 id="insights-and-learnings">Insights and Learnings&lt;/h1>
&lt;h2 id="1-systematic-bug-reproduction">1. Systematic Bug Reproduction:&lt;/h2>
&lt;p>One of the most critical aspects of our work was developing a systematic approach to bug reproduction. This involved carefully setting up the environment, applying patches, and validating results through detailed monitoring and analysis. Our reproducible artifacts and investigation scripts will serve as a resource for future researchers and developers.&lt;/p>
&lt;h2 id="2-impact-of-throttling-mechanisms">2. Impact of Throttling Mechanisms:&lt;/h2>
&lt;p>Our exploration of throttling bugs highlighted the importance of accurate throttling mechanisms in maintaining system performance and stability. Small issues, such as incorrect data rate calculations, can have significant ripple effects on system behavior, emphasizing the need for precise and effective solutions.&lt;/p>
&lt;h2 id="3-collaboration-and-open-source-contribution">3. Collaboration and Open Source Contribution:&lt;/h2>
&lt;p>Working on an open-source project like ScaleRep underscored the importance of collaboration within the community. The bugs we analyzed and fixed not only improved the systems we worked on but also contributed to the broader effort of enhancing the reliability of cloud systems.&lt;/p>
&lt;h1 id="conclusion">Conclusion&lt;/h1>
&lt;p>As we wrap up the ScaleRep project, I am proud of the progress we have made and the contributions we have delivered to the open-source community. The knowledge and experience gained from this project will undoubtedly shape our future endeavors in the field of distributed systems and cloud computing. I am grateful for the guidance and support provided by Professor Yang Wang and Bogdan “Bo” Stoica throughout this journey.&lt;/p>
&lt;p>Thank you for following along, and I look forward to continuing to explore the future of scalable and reliable cloud systems!&lt;/p></description></item><item><title>Static and Interactive Visualization Capture</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20250301-aryas/</link><pubDate>Fri, 30 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20250301-aryas/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/">Arya Sarkar&lt;/a>, a machine learning engineer and researcher based in Kolkata, a city in eastern India dubbed the City of Joy.
During the summer of 2024, I worked closely with Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a> on the project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Reproducibility in Data Visualization&lt;/a>.
We explored multiple existing solutions, tested different strategies, and made great progress in capturing visualizations using a relatively uncommon method: embedding visualization meta-information as a JSON object in the final image file.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;p>Static Visualization Capture&lt;/p>
&lt;p>We successfully developed a method to capture static visualizations as .png files along with embedded metadata in a JSON format.
This approach enables seamless reproducibility of the visualization by storing all necessary metadata within the image file itself.
Our method supports both Matplotlib and Bokeh libraries and demonstrated near-perfect reproducibility, with only a minimal 1-2% pixel difference in cases where jitter (randomness) was involved.&lt;/p>
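&lt;p>One way to store JSON metadata inside a PNG file is a &lt;code>tEXt&lt;/code> chunk, which the PNG format reserves for keyword/text pairs. The sketch below uses only the Python standard library. It is an assumption about how such embedding could work, not the project's implementation; the keyword &lt;code>vis-metadata&lt;/code> and the metadata fields are hypothetical.&lt;/p>

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build a PNG chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while len(png) > pos:
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def embed_metadata(png, metadata, keyword="vis-metadata"):
    """Insert a tEXt chunk holding JSON right after the IHDR chunk."""
    text = json.dumps(metadata, sort_keys=True).encode("ascii")
    chunk = make_chunk(b"tEXt", keyword.encode("latin-1") + b"\x00" + text)
    ihdr_end = len(PNG_SIGNATURE) + 12 + 13  # IHDR always comes first
    return png[:ihdr_end] + chunk + png[ihdr_end:]


def extract_metadata(png, keyword="vis-metadata"):
    """Return the embedded JSON object, or None if it is absent."""
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            key, _, text = data.partition(b"\x00")
            if key == keyword.encode("latin-1"):
                return json.loads(text.decode("ascii"))
    return None


def tiny_png():
    """Create a 1x1 grayscale PNG used only to demonstrate the round trip."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    pixels = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", pixels) + make_chunk(b"IEND", b""))
```

&lt;p>Because image viewers ignore unknown text chunks, the file still displays normally while carrying the information needed to regenerate the plot.&lt;/p>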
&lt;p>Interactive Visualization Capture&lt;/p>
&lt;p>For interactive visualizations, our focus shifted to capturing state changes in Plotly visualizations on the web.
We developed a script that tracks user interactions (e.g., zoom, box, lasso, slider) using event listeners and automatically captures the visualization state as both image and metadata files.
This script also maintains a history of interactions to ensure reproducibility of all interaction states.&lt;/p>
&lt;p>The challenge of capturing web-based visualizations from platforms like ObservableHq remains, as iframe restrictions prevent direct access to SVG elements.
Further exploration is needed to create a more robust capture method for these environments.&lt;/p>
&lt;p align="center">
&lt;img src="./bokeh_interactive.png" alt="bokeh interactive capture" style="width: 80%; height: auto;">
&lt;/p>
&lt;h1 id="future-work">Future Work&lt;/h1>
&lt;p>We aim to package our interactive capture script into a Google Chrome extension.&lt;/p>
&lt;p>The extension would temporarily store interaction session files in the browser’s local storage.&lt;/p>
&lt;p>It would also let users download captured files as a zip archive, using base64 encoding for images.&lt;/p>
&lt;h1 id="conclusion">Conclusion&lt;/h1>
&lt;p>Over the summer, we made significant strides in enhancing data visualization reproducibility.
Our innovative approach to embedding metadata directly into visualization files offers a streamlined method for recreating static visualizations.
The progress in capturing interactive visualization states opens new possibilities for tackling a long-standing challenge in the field of reproducibility.&lt;/p></description></item><item><title>Final Blog: BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uci/benchmarkst/20240829-qianru/</link><pubDate>Thu, 29 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uci/benchmarkst/20240829-qianru/</guid><description>&lt;p>Hello! I&amp;rsquo;m Qianru! I have been contributing to the BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking project under the mentorship of Ziheng Duan. My project aims to provide a standardized, easily accessible evaluation framework for gene imputation in spatial transcriptomics.&lt;/p>
&lt;h1 id="motivation-and-overview">Motivation and Overview&lt;/h1>
&lt;p>The &amp;ldquo;BenchmarkST&amp;rdquo; project was driven by the need to address a critical challenge in spatial transcriptomics: the impact of sparse data on downstream tasks, such as spatial domain identification. Sparse data can significantly degrade the performance of these tasks. For example, in a 10X Visium dataset of human brain Dorsolateral Prefrontal Cortex (DLPFC), using the complete dataset with GraphST (a state-of-the-art clustering method) for clustering resulted in an ARI (Adjusted Rand Index) of 0.6347. However, when using only 20% of the data—a common scenario—the performance dropped dramatically to 0.1880. This stark difference highlights the importance of effective gene imputation, which can help restore the lost information and improve the accuracy of downstream analyses.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="fig1" srcset="
/report/osre24/uci/benchmarkst/20240829-qianru/fig1_hu72c585df7604f28a748aa64a85602fac_159578_1bdac9436ddd84b83023a2cd20d76fb3.webp 400w,
/report/osre24/uci/benchmarkst/20240829-qianru/fig1_hu72c585df7604f28a748aa64a85602fac_159578_8a97a3a52a0fad3fb5d2dbf596e883a9.webp 760w,
/report/osre24/uci/benchmarkst/20240829-qianru/fig1_hu72c585df7604f28a748aa64a85602fac_159578_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uci/benchmarkst/20240829-qianru/fig1_hu72c585df7604f28a748aa64a85602fac_159578_1bdac9436ddd84b83023a2cd20d76fb3.webp"
width="760"
height="496"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
To tackle this issue, the BenchmarkST project led to the creation of the Impeller package. This package provides a standardized, easily accessible evaluation framework for gene imputation in spatial transcriptomics, offering preprocessed datasets, reproducible evaluation methods, and flexible inference interfaces. It spans across different platforms, species, and organs, aiming to enhance the integrity and usability of spatial transcriptomics data.&lt;/p>
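&lt;p>For readers unfamiliar with the ARI scores quoted above, the Adjusted Rand Index compares two labelings of the same items and corrects for chance agreement. The following standard-library sketch implements the usual contingency-table formula; the function name is chosen for this example and is not part of the Impeller package.&lt;/p>

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the labelings trivially agree

    # Pairs placed together in both labelings, and in each one separately.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / total_pairs  # agreement expected by chance
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (both - expected) / (maximum - expected)
```

&lt;p>A score of 1 means the two clusterings are identical up to a renaming of labels, while values near 0 indicate agreement no better than chance.&lt;/p>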
&lt;h1 id="what-was-accomplished">What Was Accomplished&lt;/h1>
&lt;h2 id="development-of-the-impeller-package">Development of the Impeller Package&lt;/h2>
&lt;h4 id="data-aggregation-and-preprocessing">Data Aggregation and Preprocessing:&lt;/h4>
&lt;p>We aggregated and preprocessed spatial transcriptomic datasets from multiple platforms (10X Visium, StereoSeq, SlideSeqV2), species (human, mouse), and organs (Dorsolateral Prefrontal Cortex, olfactory bulb). These datasets are readily available for download within the package.&lt;/p>
&lt;h4 id="unified-evaluation-framework">Unified Evaluation Framework:&lt;/h4>
&lt;p>A reproducible framework was developed, integrating methods such as K-Nearest Neighbors (KNN) and the deep learning-based Impeller method, enabling users to easily evaluate the performance of different gene imputation techniques.&lt;/p>
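&lt;p>To give a feel for the KNN baseline, here is a minimal sketch of nearest-neighbour imputation for a single gene: each missing value is replaced by the mean of the &lt;code>k&lt;/code> spatially closest observed spots. This is an illustrative assumption about how a KNN baseline can work; it does not use or reproduce the Impeller package API, and the function and parameter names are hypothetical.&lt;/p>

```python
import math


def knn_impute(coords, values, k=3):
    """Fill None entries with the mean of the k nearest observed spots.

    coords: list of (x, y) spot positions
    values: expression of one gene per spot, None where it is missing
    """
    observed = [(c, v) for c, v in zip(coords, values) if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")

    imputed = []
    for position, value in zip(coords, values):
        if value is not None:
            imputed.append(value)
            continue
        # Sort observed spots by Euclidean distance to the missing spot.
        nearest = sorted(observed, key=lambda cv: math.dist(position, cv[0]))[:k]
        imputed.append(sum(v for _, v in nearest) / len(nearest))
    return imputed
```

&lt;p>Only originally observed values are used as neighbours, so imputed values never feed into other imputations.&lt;/p>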
&lt;h4 id="inference-interfaces">Inference Interfaces:&lt;/h4>
&lt;p>We provided interfaces that allow users to apply gene imputation on custom datasets, offering the flexibility to predict any gene in any cell, maximizing the utility for diverse research needs.&lt;/p>
&lt;h2 id="code-contributions-and-documentation">Code Contributions and Documentation&lt;/h2>
&lt;h4 id="repository">Repository:&lt;/h4>
&lt;p>All code related to the Impeller package has been committed to the &lt;a href="https://pypi.org/project/impeller/0.1.2/#files" target="_blank" rel="noopener">Impeller&lt;/a> repository.&lt;/p>
&lt;h4 id="link-to-versions">Link to Versions:&lt;/h4>
&lt;p>&lt;a href="https://pypi.org/project/impeller/0.1.2/#history" target="_blank" rel="noopener">Here&lt;/a> you can find all the versions made during the project, with detailed descriptions of each change.&lt;/p>
&lt;h4 id="readmemdhttpspypiorgprojectimpeller012description">&lt;a href="https://pypi.org/project/impeller/0.1.2/#description" target="_blank" rel="noopener">README.md&lt;/a>:&lt;/h4>
&lt;p>Detailed documentation on how to use the Impeller package, including installation instructions, usage examples, and explanations of the key components.&lt;/p></description></item><item><title>Final Blog: ML in Detecting and Addressing System Drift</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240829-joanna/</link><pubDate>Thu, 29 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240829-joanna/</guid><description>&lt;p>Hello! I&amp;rsquo;m Joanna! I have been contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/last">ML in Detecting and Addressing System Drift&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ray-andrew-sinurat/">Ray Andrew Sinurat&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sandeep-madireddy/">Sandeep Madireddy&lt;/a>. My project aims to design a pipeline to evaluate drift detection algorithms on system traces.&lt;/p>
&lt;h1 id="methodology">Methodology&lt;/h1>
&lt;p>Here is some background on my project: Model drift, or the degradation of model performance, is typically caused by data drift, which is a shift in the input distribution, and concept drift, which is a change in the relationship between input and output. This project focuses specifically on data drift, aiming to design a pipeline for evaluating drift detection algorithms on system traces. The goal is to benchmark different drift detection algorithms and have a better understanding of the features of system traces. The project is divided into two main parts: dataset construction and algorithm benchmarking.&lt;/p>
&lt;h3 id="part-1-dataset-construction">PART 1: Dataset Construction&lt;/h3>
&lt;p>To benchmark drift detection algorithms in system data, it&amp;rsquo;s important to recognize that system trace data is inherently different from other data types, often containing more noise, which can complicate detection efforts. Therefore, constructing a labeled dataset specific to system data is crucial. In our case, we utilize the Tencent I/O block trace data as the dataset. This raw data was processed to extract timestamps along with various features such as IOPS, write size ratio, and read/write ratio, which were then used to create a data drift dataset.&lt;/p>
&lt;p>I constructed this dataset by labeling segments of the trace data as either exhibiting drift or not. To identify where the drift occurs and to help construct the dataset, I employed several offline drift detection algorithms, including Kolmogorov-Smirnov, Cramer-von Mises, KL-Divergence, and Jensen-Shannon Distance.&lt;/p>
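&lt;p>As an example of one of these offline detectors, the sketch below computes the Jensen-Shannon distance between the histograms of two windows of a feature. It is a minimal standard-library illustration, not the project's code; the bin count, value range, and any decision threshold are assumptions that would need tuning on real traces.&lt;/p>

```python
import math


def histogram(values, lo, hi, bins):
    """Normalised histogram of `values` over [lo, hi] with equal-width bins."""
    if not values or not hi > lo:
        raise ValueError("need at least one value and hi greater than lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = int((v - lo) / width)
        counts[max(0, min(index, bins - 1))] += 1  # clamp out-of-range values
    return [c / len(values) for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y is positive whenever x is.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, mix) + 0.5 * kl(q, mix)))
```

&lt;p>With base-2 logarithms the distance lies between 0 (identical distributions) and 1 (no overlap), so a window pair can be labeled as drift when the distance exceeds a chosen threshold.&lt;/p>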
&lt;p>To enhance the accuracy of the drift detection, especially in the presence of noise common in trace data, I applied additional preprocessing steps such as Fourier transform and moving average. These techniques help to smooth the data, making it easier to detect true drift signals. Finally, a voting strategy was used in combination with post-processing methods to build and refine the final datasets.&lt;/p>
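&lt;p>The smoothing and voting steps can be sketched as follows. This is a simplified illustration under stated assumptions: a trailing moving average stands in for the preprocessing, and a strict majority of detectors is required for a drift label. The actual window sizes, voting rule, and post-processing used in the project are not specified here.&lt;/p>

```python
def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def majority_vote(flags_per_detector):
    """Combine per-segment drift flags from several detectors.

    flags_per_detector: one list of booleans per detector, all of equal
    length. A segment is labeled drift when more than half of the
    detectors flag it.
    """
    detectors = len(flags_per_detector)
    return [sum(column) * 2 > detectors for column in zip(*flags_per_detector)]
```

&lt;p>Requiring a strict majority means a single noisy detector cannot label a segment as drift on its own.&lt;/p>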
&lt;p>The first figure below illustrates the segments of IOPS where drift has been detected. The second figure shows the segments of data where no drift occurs.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Drift Data" srcset="
/report/osre24/anl/last/20240829-joanna/drift_hueed6613a6bb326df79ee6a6125caea30_218453_ed8c1284ad85bf6b4049e6c666e015b1.webp 400w,
/report/osre24/anl/last/20240829-joanna/drift_hueed6613a6bb326df79ee6a6125caea30_218453_8be8f041f7c86965792bf781e2489836.webp 760w,
/report/osre24/anl/last/20240829-joanna/drift_hueed6613a6bb326df79ee6a6125caea30_218453_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240829-joanna/drift_hueed6613a6bb326df79ee6a6125caea30_218453_ed8c1284ad85bf6b4049e6c666e015b1.webp"
width="715"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Non-Drift Data" srcset="
/report/osre24/anl/last/20240829-joanna/nondrift_hu71453c92de8e4df0dd4aefaf6b160e99_327249_ac2898dbe6747b2a53de6ee136def2e4.webp 400w,
/report/osre24/anl/last/20240829-joanna/nondrift_hu71453c92de8e4df0dd4aefaf6b160e99_327249_046f624ccca1c3537b820060909a7bd2.webp 760w,
/report/osre24/anl/last/20240829-joanna/nondrift_hu71453c92de8e4df0dd4aefaf6b160e99_327249_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240829-joanna/nondrift_hu71453c92de8e4df0dd4aefaf6b160e99_327249_ac2898dbe6747b2a53de6ee136def2e4.webp"
width="734"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="part-2-benchmark-drift-detection-algorithms">PART 2: Benchmark Drift Detection Algorithms&lt;/h3>
&lt;p>This part focuses on benchmarking the Jensen-Shannon and Wasserstein drift detection methods using system trace data. The evaluation metrics are categorized into three main areas:&lt;/p>
&lt;ol>
&lt;li>Detection Accuracy Metrics&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>True Positive Rate (Recall)&lt;/li>
&lt;li>True Negative Rate (Specificity)&lt;/li>
&lt;li>Precision&lt;/li>
&lt;li>F1-Score&lt;/li>
&lt;/ul>
&lt;ol start="2">
&lt;li>Detection Overhead Metrics&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>Time Taken: The computational time required to detect drifts, critical&lt;/li>
&lt;/ul>
&lt;ol start="3">
&lt;li>Stability Metrics&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>False Positive Rate&lt;/li>
&lt;li>False Negative Rate&lt;/li>
&lt;/ul>
&lt;p>(Additional) Comparative Analysis:&lt;/p>
&lt;ul>
&lt;li>Accuracy Across Different Features: How well the detection algorithms perform when applied to various features within the system trace data.&lt;/li>
&lt;/ul>
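&lt;p>The accuracy and stability metrics listed above all derive from the four confusion-matrix counts. A minimal sketch, with a helper name chosen for this example:&lt;/p>

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute drift-detection metrics from confusion-matrix counts.

    tp: drift segments correctly flagged      fp: non-drift segments flagged
    tn: non-drift segments correctly passed   fn: drift segments missed
    """
    def ratio(numerator, denominator):
        # Report 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "recall": ratio(tp, tp + fn),        # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }
```

&lt;p>Note that the false positive rate equals one minus specificity and the false negative rate equals one minus recall, so the stability metrics restate the accuracy metrics from the error side.&lt;/p>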
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Jensen-Shannon Distance Results" srcset="
/report/osre24/anl/last/20240829-joanna/js-result_hufb49199342183a3232a30a04b1d40959_183762_a7d269c0f217c0b79c79d4f011f54fd9.webp 400w,
/report/osre24/anl/last/20240829-joanna/js-result_hufb49199342183a3232a30a04b1d40959_183762_06f6c204e8a4457868c4b2bc43fb7c28.webp 760w,
/report/osre24/anl/last/20240829-joanna/js-result_hufb49199342183a3232a30a04b1d40959_183762_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240829-joanna/js-result_hufb49199342183a3232a30a04b1d40959_183762_a7d269c0f217c0b79c79d4f011f54fd9.webp"
width="760"
height="607"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Wasserstein Distance Results" srcset="
/report/osre24/anl/last/20240829-joanna/wd-result_hufb49199342183a3232a30a04b1d40959_190137_36d4aa25ff595624ac289c635f82a085.webp 400w,
/report/osre24/anl/last/20240829-joanna/wd-result_hufb49199342183a3232a30a04b1d40959_190137_efd489c01a8f75e3d059b819fc51eb25.webp 760w,
/report/osre24/anl/last/20240829-joanna/wd-result_hufb49199342183a3232a30a04b1d40959_190137_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240829-joanna/wd-result_hufb49199342183a3232a30a04b1d40959_190137_36d4aa25ff595624ac289c635f82a085.webp"
width="760"
height="607"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="discussion">Discussion&lt;/h1>
&lt;p>The results clearly demonstrate that the Jensen-Shannon distance method outperforms the Wasserstein distance method in detecting drift. Additionally, the write size ratio proves to be a more effective feature for representing the variations in the data, offering a more nuanced understanding of the underlying changes.&lt;/p>
&lt;h1 id="conclusion-and-next-steps">Conclusion and Next Steps&lt;/h1>
&lt;p>In conclusion, this project establishes a pipeline that encompasses data labeling, data processing, and the benchmarking of drift detection algorithms. It serves only as a first step in detecting drift in system data.&lt;/p>
&lt;p>There is significant potential for further improvement. Future work should focus on enhancing dataset construction by incorporating large language models (LLMs) and other advanced techniques to further clean and refine the datasets. Additionally, the evaluation of drift detection methods should be expanded beyond the current benchmarks, which only include two statistical methods. Incorporating additional statistical methods, as well as machine learning (ML) and deep learning (DL) approaches, could provide a more comprehensive analysis. Furthermore, exploring a broader range of evaluation metrics will ensure a more robust and accurate assessment of drift detection performance. These steps will help to advance the accuracy and reliability of drift detection in system trace data.&lt;/p>
&lt;h1 id="deliverables">Deliverables&lt;/h1>
&lt;p>The following are the deliverables of this project:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.chameleoncloud.org/experiment/share/384ee2bd-853c-427b-877b-3af2993fb502" target="_blank" rel="noopener">Trovi Artifact&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/JoannaCCJH/drift-detection-OSRE24" target="_blank" rel="noopener">Github Repository&lt;/a>: This repository contains the code for generating drift datasets with labels and notebooks with benchmarking results&lt;/li>
&lt;/ul></description></item><item><title>Final Blogpost: Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240828-triveni5/</link><pubDate>Wed, 28 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240828-triveni5/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Triveni, a Master&amp;rsquo;s student in Computer Science at Northern Illinois University (NIU). I&amp;rsquo;m excited to share my progress on the OSRE 2024 project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Categorize Differences in Reproduced Visualizations&lt;/a> focusing on data visualization reproducibility. Working under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>, I&amp;rsquo;ve made some significant strides and faced some interesting challenges.&lt;/p>
&lt;h1 id="reproducibility-in-data-visualization">Reproducibility in data visualization&lt;/h1>
&lt;p>Reproducibility is crucial in data visualization, ensuring that two visualizations accurately convey the same data. This is essential for maintaining transparency and trust in data-driven decision-making. When comparing two visualizations, the challenge is not just spotting differences but determining which differences are meaningful. Tools like OpenCV are often used for image comparison, but they may detect all differences, including those that do not impact the data&amp;rsquo;s interpretation. For example, slight shifts in labels might be flagged as differences even if the underlying data remains unchanged, making it challenging to assess whether the visualizations genuinely differ in terms of the information they convey.&lt;/p>
&lt;h1 id="a-breakthrough-with-chartdetective">A Breakthrough with ChartDetective&lt;/h1>
&lt;p>Among various tools like ChartOCR and ChartReader, ChartDetective proved to be the most effective. This tool enabled me to extract data from a range of visualizations, including bar charts, line charts, box plots, and scatter plots. To enhance its capabilities, I modified the codebase to capture pixel values alongside the extracted data and store both in a CSV file. This enhancement allowed for a direct comparison of data values and their corresponding pixel coordinates between two visualizations, focusing on meaningful differences that truly impact data interpretation.&lt;/p>
&lt;h1 id="example-comparing-two-bar-plots-with-chartdetective">Example: Comparing Two Bar Plots with ChartDetective&lt;/h1>
&lt;p>Consider two bar plots that visually appear similar but have slight differences in their data values. Using ChartDetective, I extracted the data and pixel coordinates from both plots and stored this information in a CSV file. The tool then compared these values to identify any discrepancies.&lt;/p>
&lt;p>For instance, in one bar plot, the heights of specific bars were slightly increased. By comparing the CSV files generated by ChartDetective, I was able to pinpoint these differences precisely. The final step involved highlighting these differences on one of the plots using OpenCV, making it clear where the visualizations diverged. This approach ensures that only meaningful differences, those that reflect changes in the data, are considered when assessing reproducibility.&lt;/p>
&lt;ul>
&lt;li>ChartDetective: SVG or PDF file of the visualization is uploaded to extract data.&lt;/li>
&lt;/ul>
&lt;p align="center">
&lt;img src="./barplot_chartdetective.png" alt="ChartDetective" style="width: 80%; height: auto;">
&lt;/p>
&lt;ul>
&lt;li>Data Extraction: Data values along with pixel details are stored in the CSV files.&lt;/li>
&lt;/ul>
&lt;p align="center">
&lt;img src="./barplots_pixels.png" alt="Data_Extraction" style="width: 80%; height: auto;">
&lt;/p>
&lt;ul>
&lt;li>Highlighting the differences: Differences are highlighted on one of the plots using OpenCV.&lt;/li>
&lt;/ul>
&lt;p align="center">
&lt;img src="./Highlighted_differences.png" alt="Highlighting the differences" style="width: 60%; height: auto;">
&lt;/p>
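&lt;p>The comparison step in this workflow can be sketched with the Python standard library. The column names &lt;code>label&lt;/code> and &lt;code>value&lt;/code> and the tolerance are assumptions for illustration; the CSV files produced by the modified ChartDetective may use a different layout.&lt;/p>

```python
import csv
import io


def load_values(csv_text, key="label", value="value"):
    """Parse CSV text into a mapping from bar label to data value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: float(row[value]) for row in reader}


def diff_values(original, reproduced, tolerance=1e-6):
    """Return labels whose values differ or that exist in only one plot.

    The result maps each differing label to (original, reproduced),
    with None standing in for a missing bar.
    """
    differences = {}
    for label in sorted(set(original) | set(reproduced)):
        a = original.get(label)
        b = reproduced.get(label)
        if a is None or b is None or abs(a - b) > tolerance:
            differences[label] = (a, b)
    return differences
```

&lt;p>Comparing extracted data values rather than raw pixels means that cosmetic shifts, such as a label moving slightly, do not register as differences.&lt;/p>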
&lt;h1 id="understanding-user-perspectives-on-reproducibility">Understanding User Perspectives on Reproducibility&lt;/h1>
&lt;p>To complement the technical analysis, I created a pilot survey to understand how users perceive reproducibility in data visualizations. The survey evaluates user interpretations of two visualizations and explores which visual parameters impact their decision-making. This user-centered approach is crucial because even minor differences in visual representation can significantly affect how data is interpreted and used.&lt;/p>
&lt;p>Pilot Survey Example:&lt;/p>
&lt;p>Pixel Differences: In one scenario, the height of two bars was altered slightly, introducing a noticeable yet subtle change.&lt;/p>
&lt;p>Label Swapping: In another scenario, the labels of two bars were swapped without changing their positions or heights.&lt;/p>
&lt;p align="center">
&lt;img src="./barchart_labels_swap.png" alt="Label Swapping" style="width: 80%; height: auto;">
&lt;/p>
&lt;p>Participants will be asked to evaluate the reproducibility of these visualizations, considering whether the differences impact their interpretation of the data. The goal is to determine which visual parameters, such as bar height or label positioning, users find most critical when assessing the similarity of visualizations.&lt;/p>
&lt;h1 id="future-work-and-conclusion">Future Work and Conclusion&lt;/h1>
&lt;p>Going forward, I plan to develop a proof of concept based on these findings and implement an extensive survey to further explore the impact of visual parameters on users&amp;rsquo; perceptions of reproducibility. Understanding this will help refine tools and methods for comparing visualizations, ensuring they not only look similar but also accurately represent the same underlying data.&lt;/p></description></item><item><title>Final Blogpost: Drift Management Strategies Benchmark</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240827-williamn/</link><pubDate>Sat, 24 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240827-williamn/</guid><description>&lt;h1 id="background">Background&lt;/h1>
&lt;p>Hello there! I&amp;rsquo;m William and this is my final blog for my proposal &amp;ldquo;Developing A Comprehensive Pipeline to Benchmark Drift Management Approaches&amp;rdquo; under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ray-andrew-sinurat/">Ray Andrew Sinurat&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sandeep-madireddy/">Sandeep Madireddy&lt;/a> under the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/last">LAST&lt;/a> project.&lt;/p>
&lt;p>If you&amp;rsquo;re not familiar with it, this project aims to address the issue of model aging, where machine learning (ML) models experience a decline in effectiveness over time due to environmental changes, known as drift. My goal is to design an extensible pipeline that evaluates and benchmarks the robustness of state-of-the-art algorithms in addressing these drifts.&lt;/p>
&lt;h1 id="deliverables">Deliverables&lt;/h1>
&lt;p>You can find my list of deliverables here:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/document/d/14tSmBndX1RBv_d3luRcqFDmbuMk6XsGOB8G7tzTcHnE/edit" target="_blank" rel="noopener">Final report&lt;/a>, this blog is a summarized version of my final report, so do take a look if you&amp;rsquo;d like to know more!&lt;/li>
&lt;li>&lt;a href="https://github.com/williamnixon20/osre-drift" target="_blank" rel="noopener">Github repository&lt;/a>, contains code as well as the raw experiment results.&lt;/li>
&lt;li>&lt;a href="https://www.chameleoncloud.org/experiment/share/e3ae5f07-4340-48c0-94e8-ba99ee2bf691" target="_blank" rel="noopener">Trovi artifact&lt;/a>&lt;/li>
&lt;/ul>
&lt;h1 id="evaluation">Evaluation&lt;/h1>
&lt;p>Here are some of the graphs that show the performance of every algorithm on the created datasets. For more graphs and figures, you can check out my final report:&lt;/p>
&lt;ul>
&lt;li>CIRCLE: AUE demonstrates stability, maintaining high accuracy even as the data drifts, which may be due to its ensemble nature. It is even more stable than the baseline retraining algorithms. Matchmaker also recovers quickly after drift, faster than RetrainWin, which may again be because it ranks its models and uses the highest-performing ones for inference. On the other hand, DriftSurf experiences several random drops in accuracy, indicating that it can be somewhat unstable.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Circle" srcset="
/report/osre24/anl/last/20240827-williamn/circle_hubbc4ef0a01f86beba0bcc28be93ed90c_288185_1f197cacfff5e655fb7250e99000891a.webp 400w,
/report/osre24/anl/last/20240827-williamn/circle_hubbc4ef0a01f86beba0bcc28be93ed90c_288185_f3747fe006d5ab30df409c38f9f518fd.webp 760w,
/report/osre24/anl/last/20240827-williamn/circle_hubbc4ef0a01f86beba0bcc28be93ed90c_288185_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240827-williamn/circle_hubbc4ef0a01f86beba0bcc28be93ed90c_288185_1f197cacfff5e655fb7250e99000891a.webp"
width="760"
height="481"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/li>
&lt;li>SINE: Similar to CIRCLE, AUE demonstrates stability throughout the dataset, maintaining high accuracy even as the data drifts. Matchmaker, however, struggled to adapt as quickly to such a sudden drift, needing several windows to recover from the drop. DriftSurf&amp;rsquo;s performance was notably better than the baselines: unlike them, it recovered fairly quickly after experiencing drift.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sine" srcset="
/report/osre24/anl/last/20240827-williamn/sine_huc62276117407137b51fc43d9c9e20c37_686961_16a2edab20232bdbb4c23e0fa37398dd.webp 400w,
/report/osre24/anl/last/20240827-williamn/sine_huc62276117407137b51fc43d9c9e20c37_686961_cc1b5a8e845047c50f95da38dbf5a262.webp 760w,
/report/osre24/anl/last/20240827-williamn/sine_huc62276117407137b51fc43d9c9e20c37_686961_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240827-williamn/sine_huc62276117407137b51fc43d9c9e20c37_686961_16a2edab20232bdbb4c23e0fa37398dd.webp"
width="760"
height="500"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/li>
&lt;li>CovCon: Matchmaker achieved the best accuracy on CovCon, as it is able to select the models most relevant to each incoming batch (the models trained on the most similar features), performing comparably to the retrain-window baseline. Most of the other algorithms suffered on this dataset, particularly AUE, whose performance here is only comparable to the rest of the algorithms and the baselines.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="CovCon" srcset="
/report/osre24/anl/last/20240827-williamn/covcon_hua1f3c6557283d861ec9482d15a04c8d4_668622_658cedab5102506c34376dd9e6fe748e.webp 400w,
/report/osre24/anl/last/20240827-williamn/covcon_hua1f3c6557283d861ec9482d15a04c8d4_668622_9bd5024236a880f53d616f4ed4a7294f.webp 760w,
/report/osre24/anl/last/20240827-williamn/covcon_hua1f3c6557283d861ec9482d15a04c8d4_668622_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240827-williamn/covcon_hua1f3c6557283d861ec9482d15a04c8d4_668622_658cedab5102506c34376dd9e6fe748e.webp"
width="760"
height="502"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/li>
&lt;li>IOAdmission: Performance on this dataset was led by AUE, which was able to maintain impressive stability amongst all of the algorithms used. This is followed closely by Matchmaker. The other algorithms used undergo a lot of fluctuations in accuracy.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="IOAdmission" srcset="
/report/osre24/anl/last/20240827-williamn/ioadm_hubb89d2e4deb0623f90d37221d48664dd_505735_6022f696d548c8dea5e78a1144b71820.webp 400w,
/report/osre24/anl/last/20240827-williamn/ioadm_hubb89d2e4deb0623f90d37221d48664dd_505735_b58b4d4753cc231ef4ada6a72fbb773a.webp 760w,
/report/osre24/anl/last/20240827-williamn/ioadm_hubb89d2e4deb0623f90d37221d48664dd_505735_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240827-williamn/ioadm_hubb89d2e4deb0623f90d37221d48664dd_505735_6022f696d548c8dea5e78a1144b71820.webp"
width="760"
height="500"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/li>
&lt;/ul>
&lt;h1 id="findings--discussion">Findings / Discussion&lt;/h1>
&lt;p>From the experiments conducted, the findings are as follows:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Matchmaker performed particularly well on the CovCon dataset. This may be due to its ability to choose the most relevant trained model from its ensemble at inference time. Its training time is also the best among the algorithms, especially considering that it keeps data for training an additional random forest model used to rank the models. However, its inference time was the longest of all the algorithms. This may be because, at inference time, it needs to traverse all of the leaf nodes of the random forest used for ranking (computing covariate shift).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>AUE performed particularly well on the CIRCLE and IOAdmission datasets, and it is quite competitive on the other datasets too. Its weighting function, which rewards highly relevant models and evicts less relevant ones, may be key. Its inference time is decent compared to the other algorithms: slower than most baselines and DriftSurf, but faster than Matchmaker. However, its training time was the longest of all the competitors, as it runs an expensive weighting function to weight, evict, or retrain models on every retraining.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>DriftSurf performed very similarly to the RetrainWindow baseline on almost all datasets, except for IOAdmission and SINE, where it did better. This may be because it maintains at most 2 models in every iteration, and as such, its performance was not competitive against the multi-model approaches used in Matchmaker and AUE. On the plus side, its inference time is comparable to the single-model baseline, with almost no inference overhead compared to most of the competitors. Another plausible explanation for the weaker performance is the lack of tuning of parameters such as the number of windows retained, the length of its reactive period, and its reactivity sensitivity threshold. Better performance could be achieved if these parameters were tuned further.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>These are some of the potential extensions for this project:&lt;/p>
&lt;ol>
&lt;li>Optimizing Matchmaker&amp;rsquo;s inference time: improving Matchmaker&amp;rsquo;s efficiency, especially in covariate shift ranking, can reduce inference time. Simplifying the random forest traversal could make Matchmaker faster without impacting performance.&lt;/li>
&lt;li>Extending the work to include other frameworks like TensorFlow or PyTorch, as the pipeline currently supports only scikit-learn base models.&lt;/li>
&lt;/ol>
&lt;p>Thank you for reading!&lt;/p></description></item><item><title>Reproducing and addressing Data Leakage issue : Duplicates in dataset</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240823-kyrillosishak/</link><pubDate>Fri, 23 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240823-kyrillosishak/</guid><description>&lt;p>Hello!&lt;/p>
&lt;p>In this blog post, I will explore a common issue in machine learning called data leakage, using an example from the paper:&lt;/p>
&lt;blockquote>
&lt;p>Benedetti, P., Perri, D., Simonetti, M., Gervasi, O., Reali, G., Femminella, M. (2020). Skin Cancer Classification Using Inception Network and Transfer Learning. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science(), vol 12249. Springer, Cham. &lt;a href="https://doi.org/10.1007/978-3-030-58799-4_39" target="_blank" rel="noopener">https://doi.org/10.1007/978-3-030-58799-4_39&lt;/a> &lt;a href="https://arxiv.org/pdf/2111.02402v1" target="_blank" rel="noopener">arXiv&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;h1 id="overview-of-the-paper">Overview of the Paper&lt;/h1>
&lt;p>In this paper, the authors use transfer learning on a pretrained convolutional neural network (CNN) to classify skin lesions in dermatoscopic images from the HAM10000 (Human Against Machine with 10,000 training images) dataset. The paper reports a final accuracy of 78.9% on the validation set.&lt;/p>
&lt;p>While this reported result appears to be impressive, there are concerns regarding the validity of this performance metric due to data leakage. Data leakage occurs when the model is trained or evaluated on data that it would not have access to during real-world deployment, leading to an overestimation of the model&amp;rsquo;s true performance.&lt;/p>
&lt;h1 id="identifying-data-leakage-in-the-original-paper">Identifying Data Leakage in the Original Paper&lt;/h1>
&lt;p>Upon closer inspection, it appears that the original experiment suffers from data leakage in two significant ways:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Duplicate Images in Training and Validation Sets:&lt;/p>
&lt;p>The HAM10000 dataset contains near-duplicate images of the same lesions in both the training and validation sets. This results in the model seeing very similar images during training and then again during validation. Consequently, the model&amp;rsquo;s performance is artificially inflated because it has already been &amp;ldquo;trained&amp;rdquo; on images similar to those in the validation set, making the task easier than it should be.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Lesions" srcset="
/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate_HAM10000_hu729e02c2ef4cc1a337c6f61174a87df8_81762_dd4057c4bb0e4dc6092a43699881c4f4.webp 400w,
/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate_HAM10000_hu729e02c2ef4cc1a337c6f61174a87df8_81762_95478285e99e3f18a8b724b5a0a3dbb5.webp 760w,
/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate_HAM10000_hu729e02c2ef4cc1a337c6f61174a87df8_81762_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate_HAM10000_hu729e02c2ef4cc1a337c6f61174a87df8_81762_dd4057c4bb0e4dc6092a43699881c4f4.webp"
width="620"
height="104"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Lesions2" srcset="
/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate2_HAM10000_hu1c922099c4dc23532306de6197bf4d86_99960_a0e705292db1f7f683a77cd92f29edc0.webp 400w,
/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate2_HAM10000_hu1c922099c4dc23532306de6197bf4d86_99960_fae997e88feee11018435aebd9ed6c88.webp 760w,
/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate2_HAM10000_hu1c922099c4dc23532306de6197bf4d86_99960_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240823-kyrillosishak/Near-duplicate2_HAM10000_hu1c922099c4dc23532306de6197bf4d86_99960_a0e705292db1f7f683a77cd92f29edc0.webp"
width="620"
height="104"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using the Validation Set for Early Stopping and Final Evaluation:&lt;/p>
&lt;p>Another critical issue is the use of the validation set for both early stopping and final model evaluation. Early stopping is a technique where training is halted when the model&amp;rsquo;s performance on a validation set no longer improves, preventing overfitting. However, if this same validation set is later used to evaluate the model&amp;rsquo;s final performance, it can lead to overfitting on the validation data itself, resulting in an overly optimistic estimate of model accuracy.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h1 id="our-reproduction-and-results">Our Reproduction and Results&lt;/h1>
&lt;p>To demonstrate the impact of these data leakage issues, we reproduced the experiment with corrected methodologies:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Corrected Data Split: We ensured that there were no duplicate images between the training and validation sets. This setup is crucial to simulate a realistic scenario where the model encounters completely unseen data during validation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Separate Validation and Test Sets: We introduced a distinct test set to evaluate the final model performance, independent of the data used for early stopping.&lt;/p>
&lt;/li>
&lt;/ul>
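&lt;p>A minimal sketch of a leakage-free split is shown here. It is an illustration under stated assumptions, not necessarily the exact procedure used in our reproduction: it assumes each image record carries an identifier for the lesion it depicts (the names &lt;code>image_id&lt;/code> and &lt;code>lesion_id&lt;/code> and the function &lt;code>grouped_split&lt;/code> are choices made for this example), and it assigns whole lesions, rather than single images, to the training, validation, and test sets.&lt;/p>

```python
import random
from collections import defaultdict


def grouped_split(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Split image records into train/val/test so no lesion spans two sets.

    `records` is a list of (image_id, lesion_id) pairs. The field names and
    the fractions are assumptions of this sketch.
    """
    # Collect all images that show the same lesion.
    by_lesion = defaultdict(list)
    for image_id, lesion_id in records:
        by_lesion[lesion_id].append(image_id)

    # Shuffle lesions (not images) so near-duplicates always stay together.
    lesions = sorted(by_lesion)
    random.Random(seed).shuffle(lesions)

    n_val = int(len(lesions) * val_frac)
    n_test = int(len(lesions) * test_frac)
    parts = {
        "val": lesions[:n_val],
        "test": lesions[n_val:n_val + n_test],
        "train": lesions[n_val + n_test:],
    }
    return {name: [img for les in group for img in by_lesion[les]]
            for name, group in parts.items()}


# Toy data: 20 lesions, each photographed twice.
records = [(f"img_{i}_{k}", f"lesion_{i}") for i in range(20) for k in range(2)]
splits = grouped_split(records, val_frac=0.2, test_frac=0.2)
print({name: len(images) for name, images in splits.items()})
```

&lt;p>In this sketch the validation part would drive early stopping and the test part would be touched only once, for the final accuracy. Note that the fractions apply to lesions, so the image counts per set can differ slightly from the nominal ratios.&lt;/p>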
&lt;p>&lt;strong>Results Comparison&lt;/strong>&lt;/p>
&lt;table>
&lt;tr>
&lt;td>&lt;/td>
&lt;td>Original results&lt;/td>
&lt;td>Our results&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>
Accuracy
&lt;/td>
&lt;td>
78.9%
&lt;/td>
&lt;td>
78.6%
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>
Number of epochs
&lt;/td>
&lt;td>
Approx. 42 epochs
&lt;/td>
&lt;td>
40 epochs
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>
Training size
&lt;/td>
&lt;td>
Unknown
&lt;/td>
&lt;td>
7000 samples
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>
Validation size
&lt;/td>
&lt;td>
478 samples
&lt;/td>
&lt;td>
478 samples
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>
Confusion matrix
&lt;/td>
&lt;td>
&lt;img src="https://raw.githubusercontent.com/kyrillosishak/re-SkinCancer/main/assets/paper's_results.jpeg" />
&lt;/td>
&lt;td>
&lt;img src="https://raw.githubusercontent.com/kyrillosishak/re-SkinCancer/main/assets/Our_results.jpeg" />
&lt;/td>
&lt;/tr>
&lt;/table>
&lt;h1 id="analysis-of-the-results">Analysis of the Results&lt;/h1>
&lt;p>While our reproduced accuracy of 78.6% is close to the original reported accuracy, it is based on a properly separated training and validation set, avoiding the data leakage pitfalls of the original paper. The slight drop in accuracy further highlights the overestimation of the original model&amp;rsquo;s performance due to data leakage.&lt;/p>
&lt;p>Moreover, using a separate test set for final evaluation provides a more reliable measure of the model&amp;rsquo;s ability to generalize to new, unseen data. The confusion matrices show that our model&amp;rsquo;s performance is consistent across different lesion classes, confirming the robustness of the evaluation.&lt;/p>
&lt;h1 id="conclusion">Conclusion&lt;/h1>
&lt;p>Data leakage is a common and often overlooked problem in applied machine learning, leading to misleading performance metrics and irreproducible results. By carefully examining and correcting these issues in our reproduction, we hope to provide a clearer understanding of the importance of proper data handling and validation practices.&lt;/p>
&lt;p>It is crucial for researchers and practitioners to be vigilant about data leakage and ensure that their models are trained, validated, and tested under realistic conditions. This not only ensures the credibility of their results but also enhances the real-world applicability of their models.&lt;/p>
&lt;p>Thank you for reading, and stay tuned for more insights on machine learning reproducibility!&lt;/p></description></item><item><title>Final blog: Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240822-architd/</link><pubDate>Thu, 22 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240822-architd/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone,&lt;/p>
&lt;p>I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University (IIT BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic Reproducibility of COMPSs Experiments through the Integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the Project&lt;/h2>
&lt;p>The project proposes to create a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="final-product">Final Product&lt;/h2>
&lt;p align="center">
&lt;img src="./logo.png" alt="Logo" style="width: 60%; height: auto;">
&lt;/p>
&lt;p>The basic workflow of the COMPSs Reproducibility Service can be explained as follows:&lt;/p>
&lt;ol>
&lt;li>The service takes the workflow path or link as the first argument from the user.&lt;/li>
&lt;li>The program shifts the execution to a separate sub-directory, &lt;code>reproducibility_service_{timestamp}&lt;/code>, to store the results from the reproducibility process.&lt;/li>
&lt;li>Two main flags are required:
&lt;ul>
&lt;li>&lt;strong>Provenance flag&lt;/strong>: If you want to generate the provenance of the workflow via the runcompss runtime.&lt;/li>
&lt;li>&lt;strong>New Dataset flag&lt;/strong>: If you want to reproduce the experiment with a new dataset instead of the one originally used.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>If there are any remote datasets, they are fetched into the sub-directory.&lt;/li>
&lt;li>The main work begins with parsing the metadata from &lt;code>ro-crate-metadata.json&lt;/code> and verifying the files present inside the dataset, as well as any files downloaded as remote datasets. This step generates a status table for the user to check if any files are missing or have modified sizes.&lt;/li>
&lt;/ol>
&lt;p align="center">
&lt;img src="./status_table.png" alt="Status Table" style="width: 70%; height: auto;">
&lt;/p>
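&lt;p>The verification in step 5 can be pictured with a short sketch. This is not the code of the service itself: the function name &lt;code>file_status&lt;/code> and the status strings are hypothetical, and the sketch assumes that file entities in &lt;code>ro-crate-metadata.json&lt;/code> are listed under &lt;code>@graph&lt;/code> with type &lt;code>File&lt;/code> and, optionally, a &lt;code>contentSize&lt;/code> value.&lt;/p>

```python
import json
import os
import tempfile


def file_status(crate_dir):
    """Return (path, status) rows for local File entities in the crate metadata."""
    with open(os.path.join(crate_dir, "ro-crate-metadata.json")) as fh:
        graph = json.load(fh)["@graph"]

    rows = []
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "File" not in types:
            continue
        if "://" in entity["@id"]:
            continue  # remote file; fetching it is a separate step
        path = os.path.join(crate_dir, entity["@id"])
        if not os.path.isfile(path):
            status = "missing"
        elif ("contentSize" in entity
              and int(entity["contentSize"]) != os.path.getsize(path)):
            status = "size changed"
        else:
            status = "ok"
        rows.append((entity["@id"], status))
    return rows


# Tiny self-made crate: one file is present, one is listed but absent.
with tempfile.TemporaryDirectory() as crate:
    with open(os.path.join(crate, "input.txt"), "w") as fh:
        fh.write("hello")
    metadata = {"@graph": [
        {"@id": "input.txt", "@type": "File", "contentSize": 5},
        {"@id": "gone.txt", "@type": "File", "contentSize": 10},
    ]}
    with open(os.path.join(crate, "ro-crate-metadata.json"), "w") as fh:
        json.dump(metadata, fh)
    rows = file_status(crate)
    for name, status in rows:
        print(name, status)
```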
&lt;ol start="6">
&lt;li>The final step is to transform the &lt;code>compss-command-line.txt&lt;/code> and all the paths specified inside it to match the local environment where the experiment will be reproduced. This includes:
&lt;ul>
&lt;li>Mapping the paths from the old machine to new paths inside the RO-Crate.&lt;/li>
&lt;li>Changing the runtime to &lt;code>runcompss&lt;/code> or &lt;code>enqueue_compss&lt;/code>, depending on whether the environment is a SLURM cluster.&lt;/li>
&lt;li>Detecting if the paths specified in the command line are for results, and redirecting them to new results inside the &lt;code>reproducibility_service_{timestamp}\Results&lt;/code> directory.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>After this, the service prompts the user to add any additional flags to the final command. Upon final verification, the command is executed via Python&amp;rsquo;s subprocess pipe.&lt;/li>
&lt;/ol>
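&lt;p>The command-line transformation in step 6 could look roughly like the sketch below. It is a simplified illustration rather than the implementation of the service: the function &lt;code>remap_command&lt;/code>, its &lt;code>path_map&lt;/code> and &lt;code>use_slurm&lt;/code> parameters, and every path in the example are hypothetical.&lt;/p>

```python
import shlex


def remap_command(command, path_map, use_slurm=False):
    """Rewrite a recorded COMPSs command line for a new machine.

    `path_map` maps old path prefixes to new ones. Both it and `use_slurm`
    are hypothetical inputs of this sketch.
    """
    tokens = shlex.split(command)
    # Swap the launcher depending on whether we run on a SLURM cluster.
    if tokens and tokens[0] in ("runcompss", "enqueue_compss"):
        tokens[0] = "enqueue_compss" if use_slurm else "runcompss"

    remapped = []
    for token in tokens:
        # Replace the first matching old prefix with its new location.
        for old, new in path_map.items():
            if token.startswith(old):
                token = new + token[len(old):]
                break
        remapped.append(token)
    return shlex.join(remapped)


recorded = ("enqueue_compss /home/alice/app/matmul.py "
            "/home/alice/data/in.csv /scratch/alice/out")
local = remap_command(
    recorded,
    {
        "/home/alice/app": "crate/application_sources",
        "/home/alice/data": "crate/dataset",
        "/scratch/alice/out": "reproducibility_service_demo/Results",
    },
)
print(local)
```

&lt;p>This sketch only rewrites tokens that begin with a known prefix, so a path embedded in a flag of the form &lt;code>--option=/old/path&lt;/code> would need extra handling.&lt;/p>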
&lt;p align="center">
&lt;img src="./end.png" alt="End Image" style="width: 50%; height: auto;">
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Logging System&lt;/strong>: All logs related to the Reproducibility Service are stored inside the &lt;code>reproducibility_service_{timestamp}\log&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>You can view the basic &lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service/blob/main/pseudocode.txt" target="_blank" rel="noopener">pseudocode&lt;/a> of the service.&lt;/p>
&lt;h2 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h2>
&lt;p>It&amp;rsquo;s been a long journey since I started this project, and now it&amp;rsquo;s finally coming to an end. I have learned a lot from this experience, from weekly meetings with my mentor to working towards long-term goals—it has all been thrilling. I would like to thank the OSRE community and my mentor for providing me with this learning opportunity.&lt;/p>
&lt;p>This is only version 1.0.0 of the Reproducibility Service. If I have time from my coursework, I would like to fix any bugs or improve the service further to meet user needs.&lt;/p>
&lt;p>However, the following issues still exist with the service and can be improved upon:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Third-party software dependencies&lt;/strong>: Automatic detection and loading of these dependencies on a SLURM cluster are not yet implemented. Currently, these must be handled manually by the user.&lt;/li>
&lt;li>&lt;strong>Support for workflows with &lt;code>data_persistence = False&lt;/code>&lt;/strong>: There is no support for workflows where all datasets are remote files.&lt;/li>
&lt;/ul>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service" target="_blank" rel="noopener">Reproducibility Service Repository&lt;/a>: This repository contains the main service along with guidelines on how to use it. The service will be integrated with the COMPSs official distribution in its next release.&lt;/li>
&lt;li>&lt;a href="https://www.chameleoncloud.org/appliances/121/" target="_blank" rel="noopener">Chameleon Appliance&lt;/a> : This is a single-node appliance with COMPSs 3.3.1 installed, so that anyone with access to Chameleon can reproduce experiments.&lt;/li>
&lt;/ul>
&lt;!-- - [Experiments Analysis](https://docs.google.com/spreadsheets/d/1W4CKqiYVPquSwXFRITbb1Hga1xcyv2_3DJIcq7JalZk/edit?gid=0#gid=0) : This report contains details of experiments I have reproduced using the Reproducibility Service on a SLURM cluster, a local machine, and a Chameleon appliance, along with observations. -->
&lt;h2 id="previous-blogs">Previous Blogs&lt;/h2>
&lt;p>Make sure to check out my other blogs to see how I started this project and the challenges I faced along the way:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/">First blog&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/">Mid-term blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Thank you for reading the blog, have a nice day!!&lt;/p></description></item><item><title>Final Blogpost: HDEval's LLM Benchmarking for HDL Design</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/</link><pubDate>Wed, 21 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/</guid><description>&lt;h1 id="introduction">Introduction&lt;/h1>
&lt;p>Hello everyone! I&amp;rsquo;m Ashwin Bardhwaj, an undergraduate student studying at UC Berkeley. As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd">Micro Architecture Santa Cruz (MASC)&lt;/a> my &lt;a href="https://drive.google.com/file/d/1Fnr85lqrTs7OBohfHfSZI2K3wZU3zJm0/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a> looks to create a suite of benchmark programs for &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a>.&lt;/p>
&lt;p>The goal of this project is to create large-scale Verilog programs in order to benchmark the capability of LLMs to develop HDL code. Throughout this project, I have created 3 large Verilog testbenches: a 3-Stage RISC-V processor, a Gameboy Emulator, and Sorts. The benchmark programs will lose their effectiveness if LLMs such as ChatGPT scrape GitHub repositories and learn from them. As a result, the code itself cannot be made public, and this post covers the test report for all 3 of these projects instead.&lt;/p>
&lt;h1 id="3-stage-risc-v-processor">3 Stage RISC V Processor&lt;/h1>
&lt;p>This is a pipelined RISC processor developed to handle RV32I instructions. A 3-stage processor will typically contain Fetch, Decode, and Execute stages. As a result, every instruction will take exactly 3 clock cycles. For this processor, instructions can be formatted into R, I (Load), S (Store), B (Cond), and J (Jump and Link) type instructions. Once a 32-bit instruction is fetched from the location in memory specified by the pc (Program Counter) register, it is sent to be decoded by the &amp;ldquo;decode unit&amp;rdquo;. By decoding an instruction, we can determine the exact operation code, the register locations of the 2 operands (rs1 and rs2), and the destination register (rd) to which the calculated result is written. After decoding, an activation flag is sent to the execution stage, which accesses the register file at addresses rs1 and rs2 to get the correct operand data. The data and operation are then sent to the ALU to compute the result based on the opcode. The result is then written back into the register file at the rd address, the program counter is incremented, and the next instruction is fetched.&lt;/p>
&lt;p>The prompts for each module in this processor have been generated and tested against the GPT 3 turbo and GPT 4o models as examples. In the RISC V tab of my test report, I have provided the exact prompts and the results after running them on MASC&amp;rsquo;s &lt;a href="https://github.com/masc-ucsc/hdlagent" target="_blank" rel="noopener">HDLAgent&lt;/a> tool, which can access the APIs of many LLMs.&lt;/p>
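&lt;p>To make the decode step concrete, here is a small software sketch of how the fields of an R-type instruction word can be pulled apart. It is written in Python purely for illustration and is not the Verilog used in the benchmark; the function name &lt;code>decode_r_type&lt;/code> is made up for this example, while the bit positions follow the standard RV32I R-type layout.&lt;/p>

```python
def decode_r_type(instr):
    """Split a 32-bit RV32I R-type instruction word into its fields.

    Each field is extracted with a right shift followed by a modulo,
    which keeps only the low bits of the shifted value.
    """
    return {
        "opcode": instr % 128,          # bits 6..0
        "rd": (instr >> 7) % 32,        # bits 11..7, destination register
        "funct3": (instr >> 12) % 8,    # bits 14..12
        "rs1": (instr >> 15) % 32,      # bits 19..15, first operand register
        "rs2": (instr >> 20) % 32,      # bits 24..20, second operand register
        "funct7": (instr >> 25) % 128,  # bits 31..25
    }


fields = decode_r_type(0x002081B3)  # encoding of: add x3, x1, x2
print(fields)
```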
&lt;h1 id="gameboy-emulator">Gameboy Emulator&lt;/h1>
&lt;p>The Gameboy Emulator is a Verilog implementation of the classic GameBoy console that was widely popular in the 1990s. The main aspects of the GameBoy that this project focused on were the Z-80 like CPU, memory objects like RAM, VRAM, and ROM, the PPU (Picture Processing Unit), and other peripherals. Instructions are given to the CISC (variable-length instruction) CPU, where they are decoded and executed according to the requirements of each specific instruction. In some cases, timing becomes a concern, and significant effort was made to ensure that instructions can be parsed and run predictably and effectively. Instructions from the ROM may take between 1 and 4 clock cycles to run depending on the requirements. For example, the instruction &amp;ldquo;LD B, HL&amp;rdquo;, which loads the data found at the 16-bit address given by registers H and L into register B, is a 2-cycle instruction. The first cycle decodes the HL address and fetches the data at that location, while the second cycle takes the fetched data and writes it into register B. This requires accurate timing control between different aspects of the GameBoy.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Gameboy Emulator Top Level Wave File" srcset="
/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_1c31333f2eab882478c68b3e4fe07ef4.webp 400w,
/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_afc571aac140f2cd4e9e117826b4bf3a.webp 760w,
/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_1c31333f2eab882478c68b3e4fe07ef4.webp"
width="760"
height="402"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The Picture Processing Unit is also an integral feature of the GameBoy. Three frames called Background, Window, and Sprite are combined into the classic GameBoy screens we know today. While the Background and Window data are consistently read from the VRAM after certain clock cycle times, the Sprite and sprite attributes are accessed using DMA (Direct Memory Access) from OAM (Object Attribute Memory). This reduces the CPU load and improves the speed of sprite data access.&lt;/p>
&lt;h1 id="deliverables">Deliverables&lt;/h1>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>HDEval Test Report&lt;/strong>: The &lt;a href="https://docs.google.com/spreadsheets/d/1vDh_k75h0sG8JGRDDZcdBM4AprVcw9l1/edit?usp=sharing&amp;amp;ouid=102173779464961795129&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">HDEval Test Report&lt;/a> contains the module prompts for each testbench, the results after testing on GPT 3 turbo and 4o, and test cases to ensure code correctness and reliability.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>HDEval Repo&lt;/strong>: &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a> contains the encrypted version of the yaml files that encapsulate the code, prompts, and additional data.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Given these benchmarks, it is important to track the abilities of LLMs to generate HDL code. Therefore, in addition to GPT 3-turbo and 4o, I would like these benchmarks to be applied to more models so that we can track their progress and stay informed about their effectiveness in HDL and hardware design.&lt;/p>
&lt;h1 id="previous-blogs">Previous Blogs&lt;/h1>
&lt;p>Please feel free to check out my previous blogs!&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240611-ashwinbardhwaj/">First Blog&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240718-ashwinbardhwaj/">Midterm Blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Thank you for reading!&lt;/p></description></item><item><title>Deriving Realistic Performance Benchmarks for Python Interpreters</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20240817-mrigankpawagi/</link><pubDate>Sat, 17 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20240817-mrigankpawagi/</guid><description>&lt;p>Hi, I am Mrigank. I am one of the &lt;em>Summer of Reproducibility&lt;/em> fellows for 2024, and I will be working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uutah/static-python-perf/">deriving realistic performance benchmarks for Python interpreters&lt;/a> with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a> from the University of Utah.&lt;/p>
&lt;h2 id="background-and-motivation">Background and Motivation&lt;/h2>
&lt;p>Recent work by Meta on a statically typed variant of Python, called Static Python, has shown immense promise in moving towards gradually typed languages without compromising on performance due to lack of complete soundness. Lu et al.&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup> provide an evaluation of Static Python and conclude that the enhancement in performance reported by Meta on their web servers for Instagram is reasonable and is not just the result of refactoring. In fact, the study notes that very little refactoring is typically required for converting existing Python programs to Static Python. However, this study depends on a limited model of the language and does not represent real-world software applications.&lt;/p>
&lt;p>In our project, we aim to create a realistic performance benchmark to reproduce performance improvements reported by Meta and to evaluate the performance of Static Python in real-world software applications. In addition, we will analyze partially-typed code to understand the performance implications of gradual typing in Python.&lt;/p>
&lt;h2 id="key-objectives">Key Objectives&lt;/h2>
&lt;p>We will use widely-used open-sourced applications to derive realistic performance benchmarks for evaluating Static Python. In particular, we will focus on projects that utilize the Python framework &lt;a href="https://www.djangoproject.com/" target="_blank" rel="noopener">Django&lt;/a>, which is also known to power the backend of Instagram. We plan to begin with &lt;a href="https://github.com/wagtail/wagtail" target="_blank" rel="noopener">Wagtail&lt;/a>, a popular CMS built on Django. We have also identified other potential projects like &lt;a href="https://github.com/zulip/zulip" target="_blank" rel="noopener">Zulip&lt;/a>, &lt;a href="https://github.com/makeplane/plane" target="_blank" rel="noopener">Plane&lt;/a> and &lt;a href="https://github.com/LibrePhotos/librephotos" target="_blank" rel="noopener">LibrePhotos&lt;/a>. These are all actively maintained projects with significantly large codebases.&lt;/p>
&lt;p>Further, we will analyze the performance of partially-typed code. This will be of value to the Python community as it will provide confidence in gradually moving towards Static Python for improving performance. We will make our benchmarks publicly available for the community to use, reproduce, and extend.&lt;/p>
&lt;h2 id="methodology">Methodology&lt;/h2>
&lt;h3 id="load-testing">Load Testing&lt;/h3>
&lt;p>For each project that we derive benchmarks from, we will design user pipelines that simulate real-world usage and implement them to create load tests using the open-sourced &lt;a href="https://github.com/locustio/locust" target="_blank" rel="noopener">Locust&lt;/a> framework. This will allow us to evaluate the performance of Static Python in real-world loads and scenarios. Locust can spawn thousands of users, each of which independently bombards the system with HTTP requests for a range of tasks that are defined in their user pipeline. We will host each project on a server (local or cloud) to run these load tests.&lt;/p>
&lt;p>We will profile each project to ensure that our tests cover different parts of the codebase and to identify performance bottlenecks. We can then focus on these bottlenecks while gradually typing the codebase.&lt;/p>
&lt;h3 id="gradual-typing">Gradual Typing&lt;/h3>
&lt;p>For typing the code in these projects, we will create two typed versions of each project in addition to the original untyped one: one with so-called &amp;ldquo;shallow&amp;rdquo; type annotations and another with &amp;ldquo;advanced&amp;rdquo; type annotations. The former is relatively easy to implement, and we can use tools like &lt;a href="https://github.com/Instagram/MonkeyType" target="_blank" rel="noopener">MonkeyType&lt;/a> to generate stubs that can be quickly verified manually. The latter is non-trivial and will require manual effort. We will then mix-and-match the three versions of each project (untyped, shallow, and advanced) to create different combinations of typed and untyped code. Note that this mix-and-match can be done at the module level and also at the function or class level.&lt;/p>
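&lt;p>As a rough illustration of module-level mix-and-match, the sketch below enumerates every assignment of a typing level to each module. The module names and the &lt;code>configurations&lt;/code> helper are hypothetical and are not taken from the project repository.&lt;/p>

```python
from itertools import product

LEVELS = ("untyped", "shallow", "advanced")


def configurations(modules):
    """Yield every assignment of a typing level to each module."""
    for combo in product(LEVELS, repeat=len(modules)):
        yield dict(zip(modules, combo))


modules = ["models.py", "views.py"]   # hypothetical module names
configs = list(configurations(modules))
print(len(configs))                   # 3 levels over 2 modules gives 9 configurations
```

&lt;p>The number of configurations grows as 3 to the power of the number of modules, so for a large codebase one would likely sample configurations rather than enumerate them all.&lt;/p>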
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>This is my first time working on performance-benchmarking and I am excited to pick up new skills in the process. I am also looking forward to interacting with people from the Python community, people from Meta&amp;rsquo;s Static Python team, and also with the maintainers of the projects we will be working on. I will be posting more updates on this project as we make progress. Stay tuned!&lt;/p>
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>Kuang-Chen Lu, Ben Greenman, Carl Meyer, Dino Viehland, Aniket Panse, and Shriram Krishnamurthi. Gradual soundness: Lessons from static python. &lt;em>The Art, Science, and Engineering of Programming&lt;/em>.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item><item><title>Midterm Report: Deriving Realistic Performance Benchmarks for Python Interpreters</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20240909-mrigankpawagi/</link><pubDate>Sat, 17 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20240909-mrigankpawagi/</guid><description>&lt;p>Hi, I am Mrigank. As a &lt;em>Summer of Reproducibility 2024&lt;/em> fellow, I am working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uutah/static-python-perf/20240817-mrigankpawagi/">deriving realistic performance benchmarks for Python interpreters&lt;/a> with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a> from the University of Utah. In this post, I will provide an update on the progress we have made so far.&lt;/p>
&lt;h2 id="creating-a-performance-benchmark">Creating a Performance Benchmark&lt;/h2>
&lt;p>We are currently focusing on applications built on top of Django, a widely used Python web framework. For our first benchmark, we chose &lt;a href="https://github.com/wagtail/wagtail" target="_blank" rel="noopener">Wagtail&lt;/a>, a popular content management system. We created a pipeline with Locust to simulate real-world load on the application. All of our work is open-sourced and available on our &lt;a href="https://github.com/utahplt/static-python-perf/blob/main/Benchmark/wagtail/locustfile.py" target="_blank" rel="noopener">GitHub repository&lt;/a>.&lt;/p>
&lt;p>This load-testing pipeline creates hundreds of users who independently create many blog posts on a Wagtail blog site. At the same time, thousands of users are spawned to view these blog posts. Wagtail does not have a built-in API and so it took some initial effort to figure out the endpoints to hit, which I did by inspecting the network logs in the browser while interacting with the Wagtail admin interface.&lt;/p>
&lt;p>A snapshot from a run of the load test with Locust is shown in the featured image above. This snapshot was generated by spawning users from 24 parallel Locust processes. This was done on a local server, and we plan to perform the same experiments on CloudLab soon.&lt;/p>
&lt;h2 id="profiling">Profiling&lt;/h2>
&lt;p>On running the load tests with a profiler, we found that the bottlenecks in the performance arose not from the Wagtail codebase but from the Django codebase. In particular, we identified three modules in Django that consumed the most time during the load tests: &lt;code>django.db.backends.sqlite3._functions&lt;/code>, &lt;code>django.utils.functional&lt;/code>, and &lt;code>django.views.debug&lt;/code>. &lt;a href="https://github.com/dibrinsofor" target="_blank" rel="noopener">Dibri&lt;/a>, a graduate student in Ben&amp;rsquo;s lab, is helping us add types to these modules.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>Based on these findings, we are now working on typing these modules to see if we can improve the performance of the application by using Static Python. Typing Django is a non-trivial task, and while there have been some efforts to do so, previous attempts like &lt;a href="https://github.com/typeddjango/django-stubs" target="_blank" rel="noopener">django-stubs&lt;/a> are incomplete for our purpose.&lt;/p>
&lt;p>We are also writing scripts to mix untyped, shallow-typed, and advanced-typed versions of a Python file, and run each mixed version several times to obtain a narrow confidence interval for the performance of each version.&lt;/p>
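&lt;p>One simple way to decide whether a version has been run often enough is sketched below. It assumes a normal approximation (z = 1.96 for roughly 95% confidence) and a hypothetical 5% relative-width target; neither choice comes from the project, and for a small number of runs a Student t quantile would be more appropriate.&lt;/p>

```python
import math
import statistics


def confidence_interval(samples, z=1.96):
    """Approximate confidence interval for the mean of repeated timings."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - z * sem, mean + z * sem


def needs_more_runs(samples, max_relative_width=0.05):
    """True while the interval is wider than the chosen fraction of the mean."""
    low, high = confidence_interval(samples)
    return (high - low) / statistics.mean(samples) > max_relative_width


timings = [10.0, 10.2, 9.8, 10.1, 9.9]   # made-up run times in seconds
low, high = confidence_interval(timings)
print(low, high, needs_more_runs(timings))
```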
&lt;p>We will be posting more updates as we make progress. Thank you for reading!&lt;/p></description></item><item><title>Final Blog: FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fep_bench/20240816-jaycezhu/</link><pubDate>Fri, 16 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fep_bench/20240816-jaycezhu/</guid><description>&lt;h2 id="background">Background&lt;/h2>
&lt;p>Hello, I’m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/lihaowen-jayce-zhu/">Lihaowen (Jayce) Zhu&lt;/a>, a 2024 SoR contributor for the FEP-Bench project, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a>. Before we start, let&amp;rsquo;s recap the goal of our project and our progress up to the midterm. The FEP-Bench project proposes to address the significant bottlenecks encountered during the feature engineering and preprocessing phase of machine learning, particularly the challenges posed by data retrieval from data lakes and computational inefficiencies in data operations. To address these challenges, we have collected basic information about various common datasets for different machine learning tasks, along with their corresponding preprocessing pipelines.&lt;/p>
&lt;h2 id="methodology">Methodology&lt;/h2>
&lt;p>Our goal is to improve the efficiency of the machine learning preprocessing pipeline and keep the training process of the Deep Learning model busy. To do so, we need to increase the preprocessing throughput, which is the feed rate from the preprocessing stage to the training stage. Following previous work, we take a different view of Deep Learning preprocessing pipelines: a pipeline can be split into two parts. The first part contains the steps that are run once (S1-Sm); we call it the “offline” part. The second part contains all remaining steps, which are run at every training iteration; we call it the “online” part. After the offline preprocessing steps, the output data is written back to disk. The online preprocessing steps then load that data from storage before performing the remaining operations. We can split the pipeline at any step, and each split point defines a preprocessing strategy. With this method, some strategies achieve a much higher final preprocessing throughput than others. Our project adopts this method to profile the performance of different strategies, with the goal of maximizing the final preprocessing throughput into training for a given pipeline. We want this to be an automatic process that does not ask the user for extra instructions or parameters.&lt;/p>
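&lt;p>The idea of treating each split point as a strategy can be sketched as follows. The step names, the helper functions, and the toy throughput function are hypothetical stand-ins for a real profiler, and the sketch ignores storage overhead.&lt;/p>

```python
def split_strategies(steps):
    """Return (offline, online) pairs for every split point, from fully
    online (nothing precomputed) to fully offline (everything precomputed)."""
    return [(steps[:i], steps[i:]) for i in range(len(steps) + 1)]


def best_strategy(steps, measure_throughput):
    """Pick the split whose measured throughput into training is highest."""
    return max(split_strategies(steps), key=lambda s: measure_throughput(*s))


def toy_throughput(offline, online):
    # Toy stand-in for profiling: every online step costs one unit of time.
    return 1.0 / (1 + len(online))


steps = ["decode", "transform", "augment", "log_mel"]   # hypothetical step names
strategies = split_strategies(steps)
print(len(strategies), best_strategy(steps, toy_throughput))
```

&lt;p>A real selection would replace &lt;code>toy_throughput&lt;/code> with measurements and would also weigh the storage cost of writing intermediate data back to disk.&lt;/p>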
&lt;h2 id="experiment">Experiment&lt;/h2>
&lt;p>Next, we ran the data preprocessing strategy experiment on the LibriSpeech dataset, an audio dataset for ML tasks such as Automatic Speech Recognition. The dataset is 6.3 GB in size with almost 30,000 samples. Each audio file is stored in the binary FLAC format. As a result, the first step of our preprocessing pipeline is decoding, which converts the binary data into arrays of floats. We then applied typical audio preprocessing steps: transformation (normalization, padding, extracting the loudest section) and augmentation (random cut, random audio shift, random mask, random added noise). Finally, the audio data is converted to a Log-Mel Spectrogram signal, which is commonly used in audio tasks such as Speech Recognition and Speaker Identification.&lt;/p>
&lt;p>We have benchmarked the throughput performance and storage overhead of all possible strategy split points, and have seen some trade-offs between them. Both storage overhead and throughput speed-up use the fully online method as the baseline. What we&amp;rsquo;ve observed from our results is that the speed-up keeps increasing when we put operations into the offline part, and the storage consumption is very low for the strategies after audio decoding. Also, we analysed the performance of individual methods of transformation and augmentation steps. We find that the speed-up performance is quite stable between 1.0 and 1.2 across these methods, but some methods can have a high storage overhead, like normalization and random noise.&lt;/p>
&lt;p>We also observed that dataset size can influence the preprocessing pipeline throughput. The throughput speed-up for 10,000 samples was almost double the speed-up for 5,000 samples, which suggests that a larger dataset may lead to a higher speed-up. We therefore asked whether every operation follows this pattern, or whether only certain operations gain throughput as the dataset grows, and measured the throughput speed-ups of every operation in the audio preprocessing pipeline at different dataset sizes. The results showed that only the audio decoding step gains a large increase in speed-up at larger dataset sizes. For transformation, augmentation, and LMS (Log-Mel Spectrogram), the throughput stays at a steady level. This indicates that only the audio decoding step becomes faster as the dataset size grows.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>In our work, we have built up a collection of common datasets and their preprocessing pipelines for different machine-learning tasks. For the audio dataset LibriSpeech, we have done experiments about the trade-offs between throughput speed-ups and storage overhead, and dataset sizes. We have found that speed-ups keep increasing when more and more operations are divided into the offline part. Only the audio decoding step can become faster and faster when the dataset size grows.&lt;/p>
&lt;h2 id="future-works">Future works&lt;/h2>
&lt;p>In the near future, we still want to find the optimal preprocessing strategy by profiling only a small part of the original enormous dataset. The second thing is that besides the audio dataset, we must expand the range of our experiments on other datasets and ML tasks. Finally, we need to implement our goal of building an automatic system that decides the optimal strategy of a preprocessing pipeline.&lt;/p></description></item><item><title>Final Blog: FSA - Benchmarking Fail-Slow Algorithms</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240814-xikangsong/</link><pubDate>Wed, 14 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240814-xikangsong/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! I hope you&amp;rsquo;re enjoying the summer as much as I am. I&amp;rsquo;m excited to join the SOR community as a 2024 contributor. My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/xikang-song/">Xikang Song&lt;/a>, and I&amp;rsquo;m thrilled to collaborate with mentors &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ruidan-li/">Ruidan Li&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kexin-pei/">Kexin Pei&lt;/a> on the FSA-Benchmark project. This project is dedicated to exploring and benchmarking various machine learning models to identify disks at high risk of fail-slow anomalies. Throughout this journey, we tested a broad range of algorithms, from traditional approaches to state-of-the-art techniques, using a robust evaluation system to compare their effectiveness.&lt;/p>
&lt;p>In the first half of the project, I focused on implementing and testing different machine learning models for detecting disks at high risk of fail-slow anomalies. This involved setting up initial models such as the Cost-Sensitive Ranking Model and Multi-Prediction Models, and beginning to explore LSTM networks for analyzing input disk data.&lt;/p>
&lt;p>In the second half, I built upon this foundation by refining the evaluation processes, exploring advanced models like PatchTST, and investigating the potential of large language models (LLMs) for detecting subtle fail-slow conditions in storage systems. This blog post will summarize the key achievements, findings, and comparisons with baseline models from this phase.&lt;/p>
&lt;h2 id="key-achievements">Key Achievements&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Comprehensive Benchmarking and Evaluation:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>I extended the benchmarking framework to evaluate multiple algorithms across 25 different data clusters on PERSEUS. This process involved generating and analyzing heatmaps that visualized the precision and recall of each model under various settings, providing a clear understanding of each approach&amp;rsquo;s strengths and limitations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Exploration of Advanced Machine Learning Models:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>LSTM Model:&lt;/strong> I implemented the Long Short-Term Memory (LSTM) model, specifically designed for sequential data, to capture temporal dependencies in disk performance metrics. This model was used to predict potential fail-slow anomalies by analyzing historical data. Using Mean Squared Error (MSE) as a risk indicator, the LSTM model outperformed baseline approaches like the Cost-Sensitive Ranking Model and Multi-Prediction Models, especially in clusters where latency patterns between faulty and normal disks were distinct, such as in Cluster_P. This resulted in higher precision and fewer false positives. However, in clusters with more complex and overlapping data distributions, like Cluster_L, the LSTM model&amp;rsquo;s performance diminished, similar to that of the baseline models.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>PatchTST Model:&lt;/strong> I also introduced and evaluated the PatchTST model, which is built on a transformer-based architecture known for its ability to handle sequential data by capturing long-range dependencies and intricate temporal patterns. Unlike traditional models, PatchTST processes time series data in segments or &amp;ldquo;patches,&amp;rdquo; enhancing its ability to predict disk behavior over extended periods. Like the LSTM model, PatchTST uses outlier MSE values to assess disk risk. In clusters with a clear separation between faulty and normal disks, PatchTST outperformed baseline models by effectively identifying faulty patterns. However, similar to the LSTM model, PatchTST encountered difficulties in clusters with significant data overlap, such as Cluster_L.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Investigation into Large Language Models (LLMs):&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>I explored the use of GPT-4o-mini for fail-slow detection. While large language models (LLMs) showed potential, particularly in reducing false positives and improving precision over baseline models, they did not consistently outperform specialized models like LSTM and PatchTST in this context. LLMs struggled with recall, especially as thresholds increased, revealing the challenges of adapting LLMs to time series data. This limitation arises because LLMs are primarily trained for natural language generation tasks, not for analyzing time series data. As a result, their ability to fully capture anomalies is limited. To improve their effectiveness, we need to develop methods that help LLMs better understand time series data. For example, incorporating statistical information about each disk’s performance could enhance LLMs&amp;rsquo; understanding, leading to better precision in fail-slow detection.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
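&lt;p>The use of prediction error as a risk indicator, mentioned for the LSTM and PatchTST models above, can be sketched as follows. The forecast is a hypothetical stand-in for model output, and the outlier rule (median plus k times the median absolute deviation) is an assumption made for this sketch, not necessarily the rule used in the project.&lt;/p>

```python
import statistics


def mse(actual, predicted):
    """Mean squared error between observed and predicted latency series."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def flag_risky_disks(errors, k=3.0):
    """Flag disks whose MSE lies more than k MADs above the median MSE."""
    values = list(errors.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    threshold = median + k * mad
    return sorted(disk for disk, err in errors.items() if err > threshold)


forecast = [1.0, 1.0, 1.0]                # hypothetical model prediction
observed = {                              # made-up latency samples per disk
    "disk_a": [1.0, 1.1, 0.9],
    "disk_b": [1.0, 1.0, 1.2],
    "disk_c": [1.0, 3.0, 5.0],
    "disk_d": [1.1, 0.9, 1.0],
}
errors = {disk: mse(series, forecast) for disk, series in observed.items()}
risky = flag_risky_disks(errors)
print(risky)
```

&lt;p>When faulty and normal disks have overlapping latency distributions, as described for Cluster_L, their errors overlap as well and a single threshold of this kind separates them poorly.&lt;/p>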
&lt;h2 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h2>
&lt;p>The work in this project demonstrated that while advanced machine learning models like LSTM and PatchTST offer significant potential for detecting fail-slow conditions, challenges remain in ensuring consistent performance across diverse clusters. Compared to baseline models, these advanced approaches generally provided better precision and recall, especially in clusters with distinct data patterns between faulty and normal disk performance time series. However, the persistent difficulties in more complex clusters indicate the need for further refinement.&lt;/p>
&lt;p>Moving forward, future work will focus on refining these models, particularly in improving their performance in challenging clusters like Cluster_L. Additionally, I plan to further explore techniques such as prompt engineering for LLMs to better tailor them for time series analysis and fail-slow detection tasks.&lt;/p>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Repository:&lt;/strong> All comprehensive analysis code and source code can be found in the &lt;a href="https://github.com/songxikang/FSA_BENCHMARK" target="_blank" rel="noopener">FSA_BENCHMARK GitHub Repository&lt;/a>.&lt;/li>
&lt;li>&lt;strong>Jupyter Notebook:&lt;/strong> A notebook to reproduce the experiments and benchmarks on Chameleon: &lt;a href="https://chameleoncloud.org/experiment/share/585c1fc0-924c-4501-b143-ad6476339aa8" target="_blank" rel="noopener">Chameleon Experiment Notebook&lt;/a>.&lt;/li>
&lt;li>&lt;strong>Final Report:&lt;/strong> Comprehensive algorithm performance evaluation for all methods in &lt;a href="https://docs.google.com/document/d/1NONl23sXK-qE4Krx3JwG7gCrNiNmaaW1t4WVzMmomLQ/edit?usp=sharing" target="_blank" rel="noopener">FSA-Benchmarking Final Report&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Data Leakage in Applied ML</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I have been working on reproducing the results from &lt;strong>Characterization of Term and Preterm Deliveries using Electrohysterograms Signatures&lt;/strong>. This paper aims to predict preterm birth using Support Vector Machine with RBF kernel. However, there is a major flaw in the methodology: &lt;strong>preprocessing on training and test set&lt;/strong>. This happens when preprocessing is performed on the entire dataset before splitting it into training and test sets.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sample produced from test and training set samples" srcset="
/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_687703a1dee465e80fb3dbe262dd5860.webp 400w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_42051adaf7804083284553c10ca73861.webp 760w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/leakage_hu7171e85c8455cc3219721a2e3b71a711_62548_687703a1dee465e80fb3dbe262dd5860.webp"
width="760"
height="589"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sample produced from training set samples" srcset="
/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_47e6397030251c1681ff92260f687641.webp 400w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_8bad9197813df4344757765d43878a56.webp 760w,
/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240813-shaivimalik/no_leakage_hu60ff986c558a17237e53708798334267_66856_47e6397030251c1681ff92260f687641.webp"
width="760"
height="594"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Reproducing the published results came with its own challenges, including updating EHG-Oversampling to extract meaningful features from EHG signals and finding optimal hyperparameters for the model. Through our work on reproducing the published results and creating toy example notebooks, we have been able to demonstrate that data leakage leads to overly optimistic measures of model performance and that models trained with data leakage fail to generalize to real-world data. In such cases, performance on the test set doesn&amp;rsquo;t translate to performance in the real world.&lt;/p>
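&lt;p>A minimal toy illustration of preprocessing on the entire dataset is shown below. It is not the pipeline from the paper: it uses a simple standardization step and made-up numbers to show how statistics computed before the split let the held-out sample influence the transformation.&lt;/p>

```python
import statistics


def fit_scaler(values):
    """Compute the mean and population standard deviation used for scaling."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, scaler):
    mean, std = scaler
    return [(v - mean) / std for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]   # toy feature; the last value is held out
train, test = data[:4], data[4:]

leaky_scaler = fit_scaler(data)      # leakage: the test point shapes the statistics
clean_scaler = fit_scaler(train)     # correct: statistics come from training data only

leaky_test = transform(test, leaky_scaler)
clean_test = transform(test, clean_scaler)
print(leaky_test, clean_test)
```

&lt;p>With the leaky scaler the held-out point looks far less unusual than it really is relative to the training data, which is one way leakage can make test performance look better than it would be on new data.&lt;/p>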
&lt;p>Next, I&amp;rsquo;ll be reproducing the results published in &lt;strong>Identification of COVID-19 Samples from Chest X-Ray Images Using Deep Learning: A Comparison of Transfer Learning Approaches&lt;/strong>.&lt;/p>
&lt;p>You can follow my work on the EHG paper &lt;a href="https://github.com/shaivimalik/medicine_preprocessing-on-entire-dataset" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Stay tuned for more insights on data leakage and updates on our progress!&lt;/p></description></item><item><title>Midterm Check-In: Progress on the AutoAppendix Project</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240803-kkrassni/</link><pubDate>Sat, 03 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240803-kkrassni/</guid><description>&lt;p>Hi all,&lt;/p>
&lt;p>I&amp;rsquo;m happy to share a quick update on the AutoAppendix project as we’re about
halfway through. We’ve made some steady progress on evaluating artifacts from SC24 papers, and we&amp;rsquo;re starting
to think about how we can use what we&amp;rsquo;ve learned to
improve the artifact evaluation process in the future.&lt;/p>
&lt;h2 id="what-weve-been-up-to">What We’ve Been Up To&lt;/h2>
&lt;p>As a quick reminder, the goal of our project is to develop a set of guidelines that
researchers can use to improve the reproducibility of their work. We&amp;rsquo;re focusing
on papers from the Supercomputing Conference 2024 that applied for an &amp;ldquo;Artifact Replicable&amp;rdquo; badge, and we&amp;rsquo;re
evaluating their artifacts to see how well the experiments can be replicated. As it was difficult to make assumptions about the exact outcomes of the project besides detailed experiment recreation, our main goal of this
midterm check-in is to share what insights we have gathered so far and to set the stage for the final outcomes.&lt;/p>
&lt;p>Our main task so far has been making a selection of submissions with experiments designed
for Chameleon Cloud, or those that could be easily adapted to run on Chameleon. As there were 45 submissions that applied
for an &amp;ldquo;Artifact Replicable&amp;rdquo; badge, it was not easy
to choose which ones to evaluate, but we managed to narrow
it down to 18 papers that we thought would be a good fit for our project.&lt;/p>
&lt;p>We&amp;rsquo;ve chosen to focus on papers that do not require
special hardware (like a specific supercomputer) or
complex network setups, as it would be difficult to
generalize the insights from these kinds of
experiments. Instead, we&amp;rsquo;ve been looking at those
that require only a &lt;em>single computation node&lt;/em>, and
could theoretically be run with the available hardware
on Chameleon.&lt;/p>
&lt;h2 id="observations-and-learning-points">Observations and Learning Points&lt;/h2>
&lt;p>At the moment, we&amp;rsquo;re about halfway through the
evaluation process. So far, we&amp;rsquo;ve noticed a range of
approaches to documenting and setting up computational
experiments. Even without looking at the appendices in
detail, it&amp;rsquo;s clear that there’s a lot of room for
standardization of the documentation format and software setup, which could make life easier for
everyone involved. This particularly applies to
software setups, which are often daunting to replicate,
especially when there are specific version requirements, version
incompatibilities or outright missing dependencies. Since the main goal of this
project is to develop a set of guidelines that
researchers can use to improve the reproducibility of
their work, suggesting a way to deal with software
versions and dependencies will be a key part of our
results.&lt;/p>
&lt;p>We’ve observed that submissions with well-structured and detailed appendices
tend to fare better in reproducibility checks. This includes those that utilized
containerization solutions like Docker, which encapsulate the computing
environment needed to run the experiments and thus
eliminate the need for installing specific software
packages. It’s these kinds of practices that we
think could be encouraged more broadly.&lt;/p>
&lt;h2 id="looking-ahead">Looking Ahead&lt;/h2>
&lt;p>The next steps are pretty exciting! We’re planning to use what we’ve learned to draft some
guidelines that could help future SC conference submissions be more consistent.
This might include templates or checklists that ensure all the necessary details
are covered.&lt;/p>
&lt;p>Additionally, we’re thinking about ways to automate some parts of the artifact
evaluation process. The goal here is to make it less labor-intensive and more
objective. A particularly promising route
to reproducible artifact evaluation is
Chameleon&amp;rsquo;s JupyterHub interface, which in combination with the &lt;em>Trovi&lt;/em>
artifact sharing platform makes it easy to share artifacts and allow interested
parties to reproduce the experiments with minimal effort. We are thus looking into ways to
utilize and contribute to these tools in a way that could benefit the broader research community.&lt;/p>
&lt;h2 id="wrapping-up">Wrapping Up&lt;/h2>
&lt;p>That’s it for now! We are working towards getting
as many insights as possible from the rest of the
artifact evaluations, and hopefully, by the end of this project, we’ll have some solid
recommendations and tools to show for it. Thanks for keeping up with our
progress, and I’ll be back with more updates as we move into the final stages of
our work.&lt;/p></description></item><item><title>[MidTerm] ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240801-imzahra/</link><pubDate>Thu, 01 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240801-imzahra/</guid><description>&lt;p>Hey there, scalability enthusiasts and fellow researchers! I’m excited to share my progress on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/osu/scalerep/">ScaleRep project&lt;/a> for SoR 2024 under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bogdan-bo-stoica/">Bogdan &amp;quot;Bo&amp;quot; Stoica&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a>. Here’s a glimpse into how we’re tackling scalability bugs in large-scale distributed systems.&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>Large-scale distributed systems are the backbone of modern computing, powering various applications and services. However, these systems often face challenges related to reliability and performance, particularly scalability bugs. These bugs manifest in large-scale deployments, causing issues such as system downtime, reduced responsiveness, and data loss. Traditional bug-finding methods fall short in detecting these bugs, which are triggered by factors like component count, system load, workload size, recovery protocol reliability, and intermediate failure magnitude.&lt;/p>
&lt;p>Our project, ScaleRep, aims to address these challenges by analyzing recent scalability issues from ten popular open-source large-scale systems. We are providing detailed accounts of bug reproduction experiences, identifying common challenges, and developing protocols for triggering and quantifying the impact of scalability bugs.&lt;/p>
&lt;h2 id="progress-highlights">Progress Highlights&lt;/h2>
&lt;p>So far, I have been working on the following bugs and have successfully uploaded some of them to Trovi. Here’s a brief overview of my progress:&lt;/p>
&lt;h3 id="bugs-worked-on">Bugs Worked On:&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-20614" target="_blank" rel="noopener">IGNITE-20614&lt;/a>&lt;/strong>: Uploaded to Trovi &lt;a href="https://www.chameleoncloud.org/experiment/share/9f045059-011e-4089-90d4-0f5845ef3c73" target="_blank" rel="noopener">Trovi Link&lt;/a>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-17407" target="_blank" rel="noopener">IGNITE-17407&lt;/a>&lt;/strong>: Uploaded to Trovi &lt;a href="https://www.chameleoncloud.org/experiment/share/9cfd42b7-c7c9-4b6b-a538-b6c496eb1bed" target="_blank" rel="noopener">Trovi Link&lt;/a>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-20692" target="_blank" rel="noopener">IGNITE-20692&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16600" target="_blank" rel="noopener">IGNITE-16600&lt;/a>&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16072" target="_blank" rel="noopener">IGNITE-16072&lt;/a>&lt;/strong>&lt;/li>
&lt;/ol>
&lt;h2 id="what-is-chameleon-and-trovi">What are Chameleon and Trovi?&lt;/h2>
&lt;p>&lt;strong>&lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon&lt;/a>&lt;/strong> is a configurable experimental environment for large-scale cloud research. It provides a platform for running and testing distributed systems at scale, allowing researchers to reproduce and study scalability issues in a controlled setting.&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://chameleoncloud.org/experiment/share/" target="_blank" rel="noopener">Trovi&lt;/a>&lt;/strong> is a platform that facilitates the sharing of reproducible artifacts. By uploading our bug reproduction artifacts to Trovi, we enable other researchers to easily reproduce scalability bugs, fostering collaboration and advancing the field of distributed systems research.&lt;/p>
&lt;h2 id="short-description-of-the-bugs">Short Description of the Bugs&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-20614" target="_blank" rel="noopener">IGNITE-20614&lt;/a>
This bug refers to an issue where the Ignite service grid experiences degradation or hangs under specific conditions related to service deployment and node restarts.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Root Causes&lt;/strong>: The root cause is a race condition during the deployment and undeployment of services in the service grid, particularly when nodes are restarted or when there is a significant amount of concurrent service deployment and undeployment activity.&lt;/p>
&lt;p>&lt;strong>Impact&lt;/strong>: The impact of this bug includes potential service grid hangs, degraded performance, and possible inability to deploy or undeploy services as expected, which can disrupt the overall operation of the Ignite cluster.&lt;/p>
&lt;p>&lt;strong>Fix&lt;/strong>: The fix involves adding proper synchronization mechanisms to handle concurrent service deployment and undeployment operations more gracefully, ensuring that race conditions are avoided.&lt;/p>
&lt;ol start="2">
&lt;li>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-17407" target="_blank" rel="noopener">IGNITE-17407&lt;/a>
This issue pertains to the incorrect behavior of the Ignite thin client protocol, particularly when dealing with binary objects and schema changes.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Root Causes&lt;/strong>: The root cause lies in the way the thin client handles binary object schema changes. The thin client was not correctly updating the schema cache, leading to inconsistencies and incorrect behavior when deserializing binary objects.&lt;/p>
&lt;p>&lt;strong>Impact&lt;/strong>: Users of the thin client may experience issues with binary object deserialization, leading to potential data corruption, incorrect query results, and overall application instability.&lt;/p>
&lt;p>&lt;strong>Fix&lt;/strong>: The fix involves updating the thin client protocol to properly handle schema changes by ensuring that the schema cache is correctly updated and synchronized with the server.&lt;/p>
&lt;ol start="3">
&lt;li>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-20692" target="_blank" rel="noopener">IGNITE-20692&lt;/a>
This bug is related to the performance degradation observed in the Ignite SQL engine when executing certain complex queries.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Root Causes&lt;/strong>: The root cause is identified as inefficient query planning and execution strategies for specific types of complex SQL queries, leading to excessive resource consumption and slow query performance.&lt;/p>
&lt;p>&lt;strong>Impact&lt;/strong>: Users running complex SQL queries may experience significant performance degradation, leading to slower response times, increased CPU and memory usage, and potentially impacting the overall performance of the Ignite cluster.&lt;/p>
&lt;p>&lt;strong>Fix&lt;/strong>: The fix involves optimizing the SQL query planner and executor to handle complex queries more efficiently, including better indexing strategies, improved query plan caching, and more effective resource management during query execution.&lt;/p>
&lt;ol start="4">
&lt;li>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16600" target="_blank" rel="noopener">IGNITE-16600&lt;/a>
This bug involves an issue with speed-based throttling in the checkpoint process, leading to possible starvation of the checkpoint thread under heavy load.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Root Causes&lt;/strong>: The root cause is the absence of proper mechanisms to wake up throttled threads when they no longer need to be throttled, resulting in unnecessary waiting and potential starvation of the checkpoint thread.&lt;/p>
&lt;p>&lt;strong>Impact&lt;/strong>: Under heavy load, the checkpoint process can be significantly delayed, leading to slower checkpoint completion times, increased risk of data loss, and overall degraded performance of the Ignite cluster.&lt;/p>
&lt;p>&lt;strong>Fix&lt;/strong>: The fix includes implementing methods to wake up throttled threads when they no longer need to be throttled (tryWakeupThrottledThreads and shouldThrottle), ensuring that the checkpoint process can proceed without unnecessary delays.&lt;/p>
&lt;ol start="5">
&lt;li>&lt;a href="https://issues.apache.org/jira/browse/IGNITE-16072" target="_blank" rel="noopener">IGNITE-16072&lt;/a>
This issue pertains to the incorrect handling of SQL queries involving NULL values in the Ignite SQL engine, leading to unexpected query results.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Root Causes&lt;/strong>: The root cause is an incorrect implementation of SQL semantics for handling NULL values in certain query conditions, particularly in the presence of complex joins and subqueries.&lt;/p>
&lt;p>&lt;strong>Impact&lt;/strong>: Users may experience incorrect query results when NULL values are involved, leading to potential data inconsistencies and incorrect application behavior.&lt;/p>
&lt;p>&lt;strong>Fix&lt;/strong>: The fix involves correcting the SQL engine&amp;rsquo;s implementation to properly handle NULL values according to the SQL standard, ensuring that queries involving NULL values produce the expected results.&lt;/p>
&lt;h2 id="whats-next">What&amp;rsquo;s Next?&lt;/h2>
&lt;h4 id="continued-bug-reproduction">Continued Bug Reproduction:&lt;/h4>
&lt;ul>
&lt;li>Focus on reproducing more scalability bugs&lt;/li>
&lt;/ul>
&lt;h4 id="documentation-of-challenges">Documentation of Challenges:&lt;/h4>
&lt;ul>
&lt;li>Break down specific challenges encountered during attempts to reproduce scalability bugs.&lt;/li>
&lt;li>Categorize challenges, including technical complexities, environmental dependencies, and lack of documentation in bug reports.&lt;/li>
&lt;/ul>
&lt;h4 id="finalizing-project-deliverables">Finalizing Project Deliverables:&lt;/h4>
&lt;ul>
&lt;li>Package artifacts using Jupyter notebook scripts for convenient replay of investigation steps.&lt;/li>
&lt;li>Upload the package to Trovi for replayable artifacts, enabling other researchers to easily reproduce scalability bugs for our benchmark applications.&lt;/li>
&lt;/ul>
&lt;h3 id="conclusion">Conclusion&lt;/h3>
&lt;p>The ScaleRep project has made significant strides in reproducing and benchmarking scalability bugs in large-scale distributed systems. By successfully reproducing and documenting scalability bugs, we are contributing valuable insights to the research community, aiding in the development of more robust distributed systems. The protocols and methodologies devised in this project will serve as valuable tools for researchers exploring similar issues.&lt;/p>
&lt;p>Stay tuned for more updates as we continue to tackle scalability bugs and improve the reliability and performance of large-scale distributed systems.&lt;/p></description></item><item><title>Mid-term Blog: Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/</link><pubDate>Mon, 29 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone! I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology (Banaras Hindu University), Varanasi, also known as IIT (BHU). As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the project:&lt;/h2>
&lt;p>The project proposes to create a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>It has been more than six weeks since the ReproducibilityService project began, and significant progress has been made. You can test the actual service from my GitHub repository: &lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service" target="_blank" rel="noopener">ReproducibilityService&lt;/a>. Let&amp;rsquo;s break down what the ReproducibilityService is capable of doing now:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Support for Reproducing Basic COMPSs Experiments&lt;/strong>: The RS program is now fully capable of reproducing basic COMPSs experiments with no third-party dependencies on any device with the COMPSs Runtime installed. Here&amp;rsquo;s how it works:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Getting the Crate&lt;/strong>: The RS program can accept the COMPSs workflow from the user either as a path to the crate or as a link from WorkflowHub. In either case, it creates a sub-directory for further execution named &lt;code>reproducibility_service_{timestamp}&lt;/code> and stores the workflow as &lt;code>reproducibility_service_{timestamp}/Workflow&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Address Mapping&lt;/strong>: The ro-crate contains &lt;code>compss_submission_command_line.txt&lt;/code>, which is the command originally used to execute the experiment. This command may include many paths such as &lt;code>runcompss flag1 flag2 ... flagn &amp;lt;main_workflow_file.py&amp;gt; input1 input2 ... inputn output&lt;/code>. The RS program maps all the paths for &lt;code>&amp;lt;main_workflow_file.py&amp;gt; input1 input2 ... inputn output&lt;/code> to paths inside the machine where we want to reproduce the experiment. The flags are dropped as they may be device-specific, and the service asks the user for any new flags they want to add to the COMPSs runtime.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Verifying Files&lt;/strong>: Before reproducing an experiment, it&amp;rsquo;s crucial to check whether the inputs or outputs have been tampered with. The RS program cross-verifies the &lt;code>contentSize&lt;/code> from the &lt;code>ro-crate-metadata.json&lt;/code> and generates warnings in case of any abnormalities.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Error Logging&lt;/strong>: In case of any problems during execution, the &lt;code>std_out&lt;/code> and &lt;code>std_err&lt;/code> are stored inside &lt;code>reproducibility_service_{timestamp}/log&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Results&lt;/strong>: If the experiment generates any results, the RS program stores them inside &lt;code>reproducibility_service_{timestamp}/Results&lt;/code>. If
provenance is also requested for the workflow, the resulting RO-Crate is stored in the same directory.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
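&lt;p>To make the &lt;strong>Verifying Files&lt;/strong> step more concrete, here is a minimal Python sketch of a size check. It is not the actual ReproducibilityService code: the function name &lt;code>check_content_sizes&lt;/code> is hypothetical, and the sketch assumes that each file entity in the &lt;code>@graph&lt;/code> array of &lt;code>ro-crate-metadata.json&lt;/code> has an &lt;code>@id&lt;/code> relative to the crate root and that entities identified by a URL are not stored locally.&lt;/p>
&lt;pre>&lt;code class="language-python">import json
import os


def check_content_sizes(crate_dir):
    """Return warning strings for files whose on-disk size differs from the metadata."""
    metadata_path = os.path.join(crate_dir, "ro-crate-metadata.json")
    with open(metadata_path, encoding="utf-8") as fh:
        graph = json.load(fh).get("@graph", [])

    warnings = []
    for entity in graph:
        # Only entities that declare a contentSize can be checked.
        if "contentSize" not in entity:
            continue
        entity_id = entity.get("@id", "")
        # Assumption: entities identified by a URL are remote and are skipped here.
        if "://" in entity_id:
            continue
        local_path = os.path.join(crate_dir, entity_id)
        if not os.path.isfile(local_path):
            warnings.append(f"missing file: {entity_id}")
            continue
        expected = int(entity["contentSize"])
        actual = os.path.getsize(local_path)
        if actual != expected:
            warnings.append(
                f"size mismatch for {entity_id}: expected {expected} bytes, found {actual}"
            )
    return warnings
&lt;/code>&lt;/pre>
&lt;p>An empty list means every checked file matches its recorded size. Returning warnings instead of raising an error mirrors the behaviour described above, where abnormalities produce warnings rather than stopping the run.&lt;/p>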
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="REPRODUCIBILITY SERVICE FLOWCHART" srcset="
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_4df9e9a771513277aaf5c7a4d8182666.webp 400w,
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_0b96071409b70d8356241465bf214510.webp 760w,
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_4df9e9a771513277aaf5c7a4d8182666.webp"
width="760"
height="267"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ol start="2">
&lt;li>&lt;strong>Support for Reproducing Remote Datasets&lt;/strong>: If a remote dataset is specified inside the metadata file, the RS program fetches the dataset from the specified link using &lt;code>wget&lt;/code>, stores the remote dataset inside the crate, and updates the path in the new command line it generates.&lt;/li>
&lt;/ol>
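&lt;p>The address mapping and remote dataset steps can be sketched together in Python. This is only an illustration under stated assumptions, not the RS program itself: the function names, the &lt;code>remote_dataset/&lt;/code> folder, and the rule that every token starting with a dash right after the launcher is a flag are all hypothetical. The sketch only builds the &lt;code>wget&lt;/code> argument list and does not download anything.&lt;/p>
&lt;pre>&lt;code class="language-python">import os
import shlex
from urllib.parse import urlparse


def plan_remote_fetch(url, crate_dir):
    """Return (wget argument list, local path) for a remote dataset; nothing is downloaded."""
    # Hypothetical layout: remote files go under a remote_dataset/ folder inside the crate.
    filename = os.path.basename(urlparse(url).path) or "dataset"
    local_path = os.path.join(crate_dir, "remote_dataset", filename)
    return ["wget", "-O", local_path, url], local_path


def rewrite_command(original_command, path_map, new_flags=()):
    """Drop the original launcher flags and remap every path argument."""
    tokens = shlex.split(original_command)
    # Skip the launcher itself and its flags (tokens that start with a dash).
    index = 1
    while index != len(tokens) and tokens[index].startswith("-"):
        index += 1
    # Arguments without an entry in path_map are kept unchanged.
    remapped = [path_map.get(token, token) for token in tokens[index:]]
    return [tokens[0], *new_flags, *remapped]
&lt;/code>&lt;/pre>
&lt;p>Keeping the result as a list of arguments, rather than a single string, avoids quoting problems when the new paths contain spaces.&lt;/p>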
&lt;h2 id="challenges-and-end-term-goals">Challenges and End-Term Goals&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Support for DATA_PERSISTENCE_FALSE&lt;/strong>: The RS program still needs to support crates with &lt;code>dataPersistence&lt;/code> set to false. After weeks of brainstorming ideas on how to implement this, we recently concluded that, since the majority of &lt;code>DATA_PERSISTENCE_FALSE&lt;/code> crates are run on SLURM clusters and the dataset that must be fetched in such a case resides somewhere inside the cluster, the RS program will support this case for such clusters. Currently, I am working with the Nord3v2 cluster to further enhance the functionality of ReproducibilityService.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chameleon Cluster Setup&lt;/strong>: I have made some progress towards creating a new COMPSs 3.3 Appliance on Chameleon to test the service. However, creating the cluster setup script needed for the service to run on a COMPSs 3.3.1 cluster to execute large experiments has been challenging.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Integrating with COMPSs Repository&lt;/strong>: After completing the support for &lt;code>dataPersistence&lt;/code> false cases, we aim to launch this service as a tool inside the &lt;a href="https://github.com/bsc-wdc/compss" target="_blank" rel="noopener">COMPSs repository&lt;/a>. This will be a significant milestone in my developer journey as it will be the first real-world project I have worked on, and I hope everything goes smoothly.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for the next blog!!&lt;/p></description></item><item><title>Final Blog: FetchPipe: Data Science Pipeline for ML-based Prefetching</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fetchpipe/20240918-peiranqin/</link><pubDate>Sat, 27 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fetchpipe/20240918-peiranqin/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello, I’m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/peiran-qin/">Peiran Qin&lt;/a>, a CS student at the University of Chicago. This summer I worked on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fetchpipe/">FetchPipe: Data Science Pipeline for ML-based Prefetching&lt;/a> under the mentorship of Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>. The FetchPipe project focuses on building a unified Python simulator and evaluating existing cache-eviction policies and ML-based prefetchers under this simulator. Through this project, we made the following contributions and gained several insights to share with the community:&lt;/p>
&lt;ol>
&lt;li>We built a simulator to evaluate various prefetchers under a unified framework, using production-level traces from Alibaba, Microsoft Research, and Tencent.&lt;/li>
&lt;li>Through the evaluation, we discovered several downsides of existing heuristic-based prefetchers.&lt;/li>
&lt;li>We drew several insights that can guide the design of future prefetchers.&lt;/li>
&lt;/ol>
&lt;h2 id="methodology">Methodology&lt;/h2>
&lt;p>In the first half of the SoR project, I mainly focused on &lt;strong>building the I/O prefetcher simulator&lt;/strong>. The simulator should mimic real OS-level prefetching as closely as possible. First, we developed a mechanism that mimics users sending I/O requests to the underlying system. Then, we simulated page division and memory management inside the system. Finally, we designed a sleep-based mechanism to mimic the I/O latency of backend storage. The resulting system simulates the data path of I/O requests and prefetching in real systems, and collects crucial metrics such as hit rate, total prefetched data, bandwidth usage, prefetch accuracy, and total cache evictions.&lt;/p>
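&lt;p>To give a feel for how such metrics can be collected, here is a deliberately small Python sketch. It is not the FetchPipe simulator: the class name &lt;code>PrefetchSimulator&lt;/code>, the LRU eviction policy, and the fixed-depth sequential prefetcher are stand-ins chosen for illustration, the capacity is assumed to be at least one page, and the sleep-based latency model is left out.&lt;/p>
&lt;pre>&lt;code class="language-python">from collections import OrderedDict


class PrefetchSimulator:
    """Page cache with LRU eviction and a fixed-depth sequential prefetcher."""

    def __init__(self, capacity, prefetch_depth):
        self.capacity = capacity
        self.prefetch_depth = prefetch_depth
        # Maps page number to True while the page is an unused prefetched page.
        self.cache = OrderedDict()
        self.requests = 0
        self.hits = 0
        self.prefetched = 0
        self.useful_prefetches = 0

    def _insert(self, page, from_prefetch):
        if page in self.cache:
            return
        if len(self.cache) == self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used page
        self.cache[page] = from_prefetch
        if from_prefetch:
            self.prefetched += 1

    def access(self, page):
        self.requests += 1
        if page in self.cache:
            self.hits += 1
            if self.cache[page]:
                # First use of a prefetched page counts as a useful prefetch.
                self.useful_prefetches += 1
                self.cache[page] = False
            self.cache.move_to_end(page)
        else:
            self._insert(page, from_prefetch=False)
            # On a miss, speculatively load the pages that follow.
            for offset in range(1, self.prefetch_depth + 1):
                self._insert(page + offset, from_prefetch=True)

    def metrics(self):
        return {
            "hit_rate": self.hits / self.requests if self.requests else 0.0,
            "prefetched_pages": self.prefetched,
            "prefetch_accuracy": (
                self.useful_prefetches / self.prefetched if self.prefetched else 0.0
            ),
        }
&lt;/code>&lt;/pre>
&lt;p>Feeding a purely sequential trace into this sketch yields a high hit rate and prefetch accuracy, while a trace of widely scattered pages yields prefetched pages that are never used, which is the kind of contrast the real metrics are meant to expose.&lt;/p>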
&lt;p>In the second half of the SoR project, I concentrated on the &lt;strong>evaluation of existing prefetchers&lt;/strong>. First, we surveyed the existing state-of-the-art prefetchers and divided them into two categories: (1) heuristic-based prefetchers and (2) ML-based prefetchers. Next, for each category, we picked several representative prefetchers and implemented them within our simulator. Then, we evaluated those prefetchers using over 600 production-level traces from Alibaba, Tencent, and Microsoft Research. Finally, we analyzed the performance of those prefetchers and discovered some interesting insights that might guide the design of future prefetchers.&lt;/p>
&lt;p>Finally, building on the achievements of the SoR project, I will continue working on this project with Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>. We are using the insights gained so far to build an I/O prefetcher that mitigates the downsides of existing prefetchers.&lt;/p>
&lt;h2 id="insights">Insights&lt;/h2>
&lt;p>Based on our experiments on the existing prefetchers, we would like to share the following insights:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Heuristic-based prefetchers, including Linux Readahead and the Stride prefetcher, rely on strict predefined rules and detect straightforward access patterns. However, those prefetchers are too conservative to recognize increasingly complex access patterns. In real-world applications in particular, sequential accesses are interleaved with random accesses, which adds a level of complexity that Linux Readahead and Stride prefetchers struggle to recognize.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Offline learning-based prefetchers learn the access patterns by training machine learning models on pre-collected historical access patterns. Thanks to the representational power of machine learning, these prefetchers excel at recognizing complex access patterns. However, their effectiveness is constrained by their dependence on the patterns encountered during offline training, making them less adaptable to previously unseen patterns in online scenarios. Moreover, because they do not rely on predefined prefetching rules, offline learning-based prefetchers are more prone to prefetching useless data, which causes cache pollution and extra pressure on backend storage.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We argue that a good prefetcher for today&amp;rsquo;s complex and changing workloads should have three properties: (1) Complexity-Recognition: the prefetcher should be able to recognize the complex access patterns of a complex workload. (2) Reliability: the prefetcher should reduce the likelihood of prefetching useless data and causing cache pollution. (3) Adaptability: the prefetcher should adapt itself to the changing workload.&lt;/p>
&lt;/li>
&lt;/ol>
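&lt;p>The first insight can be illustrated with a toy stride detector in Python. This is a simplified sketch and not the Stride prefetcher evaluated in the project: the function name &lt;code>stride_predictions&lt;/code> and the rule that a stride must repeat on two consecutive accesses before a prediction is made are assumptions chosen for illustration.&lt;/p>
&lt;pre>&lt;code class="language-python">def stride_predictions(addresses):
    """Return the predicted next address after each access, or None if no stride is confirmed."""
    predictions = []
    last_address = None
    last_stride = None
    for address in addresses:
        stride = None if last_address is None else address - last_address
        # A stride is confirmed only when it repeats on two consecutive accesses.
        if stride is not None and stride == last_stride:
            predictions.append(address + stride)
        else:
            predictions.append(None)
        last_address, last_stride = address, stride
    return predictions
&lt;/code>&lt;/pre>
&lt;p>On a clean sequential trace such as &lt;code>[0, 4, 8, 12]&lt;/code> the detector starts predicting from the third access onward. Once unrelated addresses are interleaved, as in &lt;code>[0, 500, 4, 37, 8, 912, 12]&lt;/code>, no stride ever repeats and the detector makes no prediction at all, even though a sequential stream is still hidden in the trace.&lt;/p>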
&lt;h2 id="future-works">Future Work&lt;/h2>
&lt;p>Based on the above insights, we are now designing our own prefetchers that can mitigate the downsides of existing prefetchers. We will make our code public after we finalize our design.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Through the SoR project, I delved into the research area of I/O prefetching by reproducing the related works, characterizing their performance, and designing our own prefetcher. We contribute to the community with a comprehensive simulator, evaluation results of related prefetchers, and insights that can guide the future prefetchers&amp;rsquo; design. In the future, I will continue working on the research area of prefetcher and keep making contributions.&lt;/p></description></item><item><title>Mid Term Blog: FetchPipe: Data Science Pipeline for ML-based Prefetching</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fetchpipe/20240727-peiranqin/</link><pubDate>Sat, 27 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fetchpipe/20240727-peiranqin/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello, I’m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/peiran-qin/">Peiran Qin&lt;/a>, a CS student at the University of Chicago, currently working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fetchpipe/">FetchPipe: Data Science Pipeline for ML-based Prefetching&lt;/a> under the mentorship of Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>. The FetchPipe project focuses on building a unified Python simulator and evaluating existing cache-eviction algorithms and ML-based prefetchers under this simulator.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Existing prefetching algorithms can be categorized into (a) heuristic-based methods such as the Linux readahead prefetcher and (b) machine learning-based methods like Long Short-Term Memory (LSTM) models. However, there is a research gap in comprehensively comparing all existing ML solutions, such as Leap and LSTM Prefetcher, under a consistent evaluation setup. To ensure the fairness of evaluations, it is essential to integrate all baselines and our prefetcher into a homogeneous evaluation environment. Additionally, there is a need to evaluate cache eviction algorithms under prefetching scenarios.&lt;/p>
&lt;p>Therefore, in this project, we aim to build a fair simulator, deploy state-of-the-art prefetchers and cache eviction algorithms onto this platform, and then evaluate them using comprehensive metrics. The state-of-the-art prefetchers we consider include Pythia (MICRO'21), SGDP (arXiv), and the Markov-Chain prefetcher. For cache eviction algorithms, we consider S3FIFO (SOSP'23) and SIEVE (NSDI'24). Our focus is on implementing these algorithms on our simulator and evaluating their performance using block storage datasets from Alibaba, Tencent, and MSR. Besides evaluating the prefetchers and eviction algorithms individually, we also aim to combine prefetchers with cache eviction algorithms to test overall performance.&lt;/p>
&lt;h2 id="current-progress">Current Progress&lt;/h2>
&lt;p>In the past one and a half months, I have focused on (1) implementing our Python simulator and (2) deploying state-of-the-art prefetchers and cache eviction algorithms on this simulator. The implementation phase is now complete. The detailed progress is as follows:&lt;/p>
&lt;ol>
&lt;li>The Python simulator for evaluating both ML-based and heuristic-based prefetchers and cache eviction algorithms is done.&lt;/li>
&lt;li>Collection of evaluation metrics, such as hit rate, total prefetched data, prefetch overhead, and prefetch accuracy, is implemented in the simulator.&lt;/li>
&lt;li>Three prefetchers, SGDP, Pythia, and Markov-Chain, are deployed on the simulator. SGDP is a graph neural network-based prefetcher, and Pythia is a reinforcement learning-based prefetcher.&lt;/li>
&lt;li>State-of-the-art heuristic-based eviction algorithms are implemented in the simulator, including S3FIFO and SIEVE.&lt;/li>
&lt;/ol>
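&lt;p>As an example of the kind of eviction policy deployed on the simulator, here is a simplified Python sketch of the SIEVE idea: objects stay in FIFO order, a hit only sets a visited bit, and a hand sweeps from the tail toward the head, clearing bits until it finds an unvisited object to evict. This is my own condensed rendering for illustration, not the code used in the project; it uses a plain list (so eviction is linear in the cache size), assumes a capacity of at least one, and the class name &lt;code>SieveCache&lt;/code> is hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">class SieveCache:
    """Simplified SIEVE: FIFO order, one visited bit per object, and a moving hand."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.order = []    # index 0 is the tail (oldest); the last element is the head
        self.visited = {}  # key to visited bit
        self.hand = 0      # index of the next eviction candidate

    def access(self, key):
        """Return True on a hit, False on a miss (the key is then inserted)."""
        if key in self.visited:
            self.visited[key] = True  # a hit only sets the bit; the object is not moved
            return True
        if len(self.order) == self.capacity:
            self._evict()
        self.order.append(key)        # new objects enter at the head
        self.visited[key] = False
        return False

    def _evict(self):
        i = self.hand
        # Sweep from the hand toward the head, clearing bits until an unvisited object is found.
        while self.visited[self.order[i]]:
            self.visited[self.order[i]] = False
            i = (i + 1) % len(self.order)  # wrap from the head back to the tail
        victim = self.order.pop(i)
        del self.visited[victim]
        # The next object toward the head now sits at index i; wrap if the head was removed.
        self.hand = i if i != len(self.order) else 0
        return victim
&lt;/code>&lt;/pre>
&lt;p>With a capacity of three, accessing A, B, C, A, D evicts B rather than A, because the hit on A set its visited bit and the hand skipped over it.&lt;/p>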
&lt;p>With the simulator and state-of-the-art ML-based prefetchers and eviction algorithms in place, the next steps are to (1) organize a large-scale dataset (including over 600 traces from real storage servers) for testing performance and (2) evaluate the implemented prefetchers and eviction algorithms on this dataset. Finally, I will analyze the evaluation results and provide insights from the experimental outcomes. For the ML-based prefetchers, I will analyze both ML-related metrics such as accuracy and F1-score, and system metrics such as hit rate and various overheads.&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>The biggest challenge is implementing existing prefetchers correctly and fairly. Since some state-of-the-art prefetchers are designed for DRAM prefetching, adapting them for SSD prefetching in the simulator is challenging. Additionally, the lack of source code for some works makes it difficult to reproduce their algorithms accurately based solely on their paper descriptions.&lt;/p></description></item><item><title>Improving Usability and Performance in cc-snapshot: My Midterm Update</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/cc-snapshot/20250724-zahratm/</link><pubDate>Wed, 24 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/cc-snapshot/20250724-zahratm/</guid><description>&lt;p>Hi! I&amp;rsquo;m Zahra Temori, a rising junior studying Computer Science at the University of Delaware. This summer, I’ve had the exciting opportunity to participate in the Chameleon Summer Reproducibility Program, where I’ve been working under the mentorship of Paul Marshall.
In this blog post, I’d love to share a midterm update on my project &lt;a href="https://github.com/ChameleonCloud/cc-snapshot" target="_blank" rel="noopener">cc-snapshot&lt;/a> and highlight what I’ve accomplished so far, what I’ve learned, and what’s coming next. It&amp;rsquo;s been a challenging but rewarding experience diving into real-world research and contributing to tools that help make science more reproducible!&lt;/p>
&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>CC-Snapshot is a powerful tool on the Chameleon testbed that enables users to package their customized environments for reproducibility and experiment replication. In research, reproducibility is essential. It allows scientists to run experiments consistently, share complete setups with others, and avoid environment-related errors. However, the current snapshotting mechanism has limitations that make it unreliable and inefficient, particularly in terms of usability and performance. These issues can slow down workflows and create barriers for users trying to reproduce results. Our goal is to improve both the usability and performance of the cc-snapshot tool. A more user-friendly and optimized system means that users can create and restore snapshots more quickly and easily, without needing to manually rebuild environments, ultimately saving time and improving reliability in scientific computing.&lt;/p>
&lt;h2 id="progress-so-far">Progress So Far&lt;/h2>
&lt;p>To structure the work, we divided the project into two main phases:&lt;/p>
&lt;ol>
&lt;li>Improving usability, and&lt;/li>
&lt;li>Optimizing performance.&lt;/li>
&lt;/ol>
&lt;p>I’ve nearly completed the first phase and have just started working on the second.&lt;/p>
&lt;h2 id="phase-one--usability-improvements">Phase One – Usability Improvements&lt;/h2>
&lt;p>The original version of the cc-snapshot tool had several usability challenges that made it difficult for users to interact with and for developers to maintain. These included a rigid interface, a lack of flexibility, and limited testing support, all of which made the tool harder to use and extend.
To address these, I worked on the following improvements:&lt;/p>
&lt;p>&lt;strong>Problem&lt;/strong>: The command-line interface was limited and inflexible. Users couldn’t easily control features or customize behavior, which limited their ability to create snapshots in different scenarios.&lt;/p>
&lt;p>&lt;strong>Solution&lt;/strong>: I enhanced the CLI by adding:&lt;/p>
&lt;ul>
&lt;li>A flag to disable automatic updates, giving users more control.&lt;/li>
&lt;li>A &lt;code>--dry-run&lt;/code> flag to simulate actions before actually running them, which is useful for testing and safety.&lt;/li>
&lt;li>Support for a custom source path, allowing snapshots of specific directories. This makes the tool much more useful for testing smaller environments.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Problem&lt;/strong>: The code lacked automated tests. Without tests, developers have to manually verify everything, which is time-consuming and error-prone.&lt;/p>
&lt;p>&lt;strong>Solution&lt;/strong>: I implemented a basic test suite and integrated it with GitHub Actions, so the tool is automatically tested on every pull request.&lt;/p>
&lt;p>&lt;strong>Problem&lt;/strong>: The tool didn’t follow a modular design. The logic was tightly coupled, making it hard to isolate or extend parts of the code.&lt;/p>
&lt;p>&lt;strong>Solution&lt;/strong>: I refactored the code by extracting key functions. This makes the code cleaner, easier to understand, and more maintainable in the long term.&lt;/p>
&lt;h2 id="next-steps--phase-two-performance-optimization">Next Steps – Phase Two: Performance Optimization&lt;/h2>
&lt;p>After improving the usability of the cc-snapshot tool, the next phase of the project focuses on addressing key performance bottlenecks. Currently, the snapshotting process can be slow and resource-intensive, which makes it less practical for frequent use, especially with large environments.&lt;/p>
&lt;p>&lt;strong>Problem 1: Slow Image Compression&lt;/strong>
The current implementation uses the qcow2 image format with zlib compression, which is single-threaded and often inefficient for large disk images. This leads to long snapshot creation times and high CPU usage.&lt;/p>
&lt;p>&lt;strong>Solution&lt;/strong>: I will benchmark and compare different compression strategies, specifically:&lt;/p>
&lt;ul>
&lt;li>qcow2 with no compression&lt;/li>
&lt;li>qcow2 with zstd compression, which is faster and multi-threaded&lt;/li>
&lt;li>raw image format, which has no compression but may benefit from simpler processing&lt;/li>
&lt;/ul>
&lt;p>These tests will help determine which method provides the best tradeoff between speed, size, and resource usage.&lt;/p>
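&lt;p>As a rough sketch of how these variants could be produced for benchmarking, the snippet below builds one &lt;code>qemu-img convert&lt;/code> command per strategy, plus the zlib baseline. The helper &lt;code>build_convert_commands&lt;/code>, the variant names, and the file names are hypothetical. It assumes a raw source image and a QEMU release that supports zstd for qcow2 (5.1 or later), and it is not how cc-snapshot itself invokes the tool.&lt;/p>

```python
import shlex


def build_convert_commands(source, out_prefix):
    """Return {variant: argv} for the compression strategies being compared.

    Hypothetical helper: paths and variant names are illustrative only.
    """
    base = ["qemu-img", "convert", "-f", "raw"]
    return {
        # Baseline: qcow2 compressed with the default zlib codec (-c).
        "qcow2-zlib": base + ["-O", "qcow2", "-c",
                              source, f"{out_prefix}-zlib.qcow2"],
        # qcow2 container without compression.
        "qcow2-none": base + ["-O", "qcow2",
                              source, f"{out_prefix}-none.qcow2"],
        # qcow2 compressed with zstd instead of zlib.
        "qcow2-zstd": base + ["-O", "qcow2", "-c",
                              "-o", "compression_type=zstd",
                              source, f"{out_prefix}-zstd.qcow2"],
        # Plain raw output, no container and no compression.
        "raw": base + ["-O", "raw", source, f"{out_prefix}.raw"],
    }


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for name, argv in build_convert_commands("disk.raw", "snapshot").items():
        print(f"{name}: {shlex.join(argv)}")
```

&lt;p>Keeping the commands as data makes it straightforward to time each variant with the same harness and to record the exact invocation next to the results.&lt;/p>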
&lt;p>&lt;strong>Problem 2: Suboptimal Storage Backend&lt;/strong>
Snapshots are currently uploaded to Glance, which can be slow and unreliable. Uploading large images can take several minutes, and this slows down the user workflow.&lt;/p>
&lt;p>&lt;strong>Solution&lt;/strong>: I will compare Glance with a potentially faster alternative, the Object Store. Smaller, compressed images may upload significantly faster to the Object Store (for example, 30 seconds versus 2 minutes). By measuring upload speeds and reliability, I can recommend a better default or optional backend for users.&lt;/p>
&lt;h2 id="how-i-will-measure-performance">How I will Measure Performance&lt;/h2>
&lt;p>To understand the impact of different strategies, I will try to collect detailed metrics across three stages:&lt;/p>
&lt;ol>
&lt;li>Image creation: How long it takes to build the image, depending on compression and format&lt;/li>
&lt;li>Image upload: How quickly the snapshot can be transferred to Glance or Object Store&lt;/li>
&lt;li>Instance boot time: How fast a new instance can start from that image (compressed formats must be decompressed)&lt;/li>
&lt;/ol>
&lt;p>I will run multiple tests for each scenario and record performance metrics like CPU usage, memory usage, disk throughput, and total time for each step. This will help identify the most efficient and practical configuration for real-world use.&lt;/p>
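&lt;p>A minimal sketch of the repeated-run timing described above, assuming each stage can be wrapped in a zero-argument callable. The helper name &lt;code>time_stage&lt;/code> is hypothetical, and only wall-clock time is captured here; CPU, memory, and disk throughput would need additional tooling.&lt;/p>

```python
import statistics
import time


def time_stage(stage, repeats=3):
    """Run `stage` (a zero-argument callable) `repeats` times.

    Returns a summary of the wall-clock durations in seconds.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        stage()
        durations.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "mean_s": statistics.mean(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would wrap image creation, upload, or boot.
    print(time_stage(lambda: sum(range(100_000)), repeats=5))
```

&lt;p>Reporting the minimum and maximum alongside the mean helps show how noisy each stage is across runs.&lt;/p>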
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Addressing the current usability and performance issues in cc-snapshot is essential to improving the overall user experience. By making the tool easier to use, faster, and more flexible, we can support researchers and developers who depend on reproducible computing for their work. So far, I’ve worked on enhancing the tool’s interface, adding testing support, and refactoring the codebase for better maintainability. In the next phase, I’ll be focusing on benchmarking different compression methods, image formats, and storage backends to improve speed and efficiency.
These improvements will help make cc-snapshot a more powerful and user-friendly tool for the scientific community.&lt;/p>
&lt;p>Stay tuned for the next update and thank you for following my journey!&lt;/p></description></item><item><title>Halfway Blog: FSA: Benchmarking Fail-Slow Algorithms</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240723-xikangsong/</link><pubDate>Tue, 23 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240723-xikangsong/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hi, I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/xikang-song/">Xikang Song&lt;/a>, a 2024 SoR contributor to the project, working with mentors &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ruidan-li/">Ruidan Li&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kexin-pei/">Kexin Pei&lt;/a>. Our FSA-Benchmark project is dedicated to exploring and benchmarking various machine learning models to identify disks at high risk of fail-slow anomalies. We will benchmark a range of machine learning algorithms, from traditional to advanced methods, and compare the results using a comprehensive evaluation system. This will provide a clear view of how machine learning impacts critical error detection in RAID systems.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Fail-slow issues in storage systems, where a disk operates at a significantly reduced speed without completely failing, are subtle: they can manifest as consistently higher latency compared to peer disks or as recurrent abnormal latency spikes. These issues are challenging to detect but can significantly degrade overall system performance over time. Fixed thresholds are ineffective because latency distributions vary across clusters, so a single threshold ends up too low for some clusters and too high for others, producing false alerts or missed anomalies. Therefore, we are enthusiastic about using machine learning models to analyze disk performance data, since they can learn the trends in the data and may provide better detection capabilities.&lt;/p>
&lt;h2 id="current-progress-and-challenges">Current Progress and Challenges&lt;/h2>
&lt;h3 id="algorithm-implementation">Algorithm Implementation:&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Cost-Sensitive Ranking Model&lt;/strong>: Inspired by the paper &amp;ldquo;Improving Service Availability of Cloud Systems by Predicting Disk Error&amp;rdquo; presented at the USENIX ATC &amp;rsquo;18 conference, this model ranks disks based on fail-slow risk.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Multi-Prediction Models&lt;/strong>: Drawing from &amp;ldquo;Improving Storage System Reliability with Proactive Error Prediction&amp;rdquo; presented at the USENIX ATC &amp;rsquo;17 conference, this approach uses multiple traditional machine learning models to evaluate disk health using diverse features. Various models were tested, with the Random Forest classifier proving most effective.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>LSTM Model&lt;/strong>: This model employs Long Short-Term Memory (LSTM) networks, trained on the first day&amp;rsquo;s data for each cluster and evaluated on data spanning all days. It captures temporal dependencies to accurately predict fail-slow anomalies over time.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="comprehensive-evaluation">Comprehensive Evaluation:&lt;/h3>
&lt;ol>
&lt;li>Collected outputs from all algorithms on Chameleon for Perseus data A to Y (25 clusters).&lt;/li>
&lt;li>Parsed the outputs through a comprehensive evaluation system, recording the true/false positives/negatives.&lt;/li>
&lt;li>Plotted heat maps to show precision and recall with different look-back days and alert threshold settings.&lt;/li>
&lt;li>Compared the performance across different clusters to draw conclusions.&lt;/li>
&lt;/ol>
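&lt;p>For reference, precision and recall follow directly from the recorded true/false positives and negatives. The snippet below is a minimal sketch; the function name and the counts are made up for illustration and are not results from our evaluation.&lt;/p>

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion counts.

    A metric whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up counts for one (look-back days, alert threshold) setting.
    p, r = precision_recall(tp=8, fp=2, fn=4)
    print(f"precision={p:.2f} recall={r:.2f}")
```

&lt;p>Computing one such pair per look-back and threshold combination gives the grid of values that a heat map can display.&lt;/p>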
&lt;h3 id="packaging-code">Packaging Code:&lt;/h3>
&lt;ul>
&lt;li>Packaged all the code into a Trovi Jupyter notebook, including the Chameleon server setup, to provide clear steps for running the code and reproducing the experiments. All algorithm testing and result parsing can be easily done here.&lt;/li>
&lt;/ul>
&lt;h3 id="challenges">Challenges&lt;/h3>
&lt;p>Initially, I was unsure how to evaluate the performance of different algorithms. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ruidan-li/">Ruidan Li&lt;/a> provided comprehensive guidance on collecting all the results uniformly and parsing them to gather true/false positives/negatives. This approach enabled us to derive meaningful metrics and plot heatmaps for precision and recall. I learned the scientific method of benchmarking performance, and I am grateful for the guidance.&lt;/p>
&lt;h2 id="future-steps">Future Steps&lt;/h2>
&lt;h3 id="further-investigation-of-advanced-algorithms">Further Investigation of Advanced Algorithms&lt;/h3>
&lt;p>We plan to explore advanced algorithms such as PatchTST. This will involve systematically collecting outputs and conducting comprehensive benchmarking to assess their performance in identifying fail-slow anomalies.&lt;/p>
&lt;h3 id="transition-to-large-language-models-llms">Transition to Large Language Models (LLMs)&lt;/h3>
&lt;p>Recognizing the limitations of traditional machine learning methods, we intend to transition to utilizing Large Language Models (LLMs). LLMs have demonstrated superior capabilities in understanding complex patterns and making accurate predictions. We anticipate that incorporating LLMs into our analysis will enhance our ability to detect and predict fail-slow anomalies more accurately, leading to better overall system reliability.&lt;/p></description></item><item><title>Exploring Throttling Bugs in HDFS: Reproducing Developer Fixes</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240722-shuangliang/</link><pubDate>Mon, 22 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240722-shuangliang/</guid><description>&lt;p>Scalability is a critical concern for large-scale distributed systems like the Hadoop Distributed File System (HDFS). Throttling bugs, which affect the system&amp;rsquo;s ability to manage data transfer rates effectively, can lead to performance issues and system instability. In my recent work, I focused on reproducing the effects of two specific throttling bugs in HDFS, which were fixed by developers. This blog provides an overview of these bugs and the process of reproducing their effects to validate the fixes.&lt;/p>
&lt;h1 id="hdfs-17087-missing-throttler-in-dataxceiverreadblock">HDFS-17087: Missing Throttler in DataXceiver#readBlock&lt;/h1>
&lt;p>One of the throttling bugs I explored was HDFS-17087. The DataXceiver#readBlock function in HDFS lacked a throttler, resulting in unregulated data reads. This absence could lead to potential performance degradation under heavy loads. The developer fixed this issue by adding a throttler to regulate the data transfer rate. In my work, I reproduced the bug and observed the system&amp;rsquo;s behavior both before and after applying the developer&amp;rsquo;s patch. The results showed a significant improvement in stability and performance post-fix.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./Figure1.png" alt="Figure 1" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="hdfs-17216-incorrect-data-rate-calculation">HDFS-17216: Incorrect Data Rate Calculation&lt;/h1>
&lt;p>Another crucial bug was HDFS-17216. The issue stemmed from the use of integer division in the getBytesPerSec function, which caused incorrect speed calculations and failed to trigger the throttle, resulting in overspeed. The developer addressed this by switching from integer to float for calculating the elapsed time, ensuring accurate speed measurements. I reproduced the conditions that highlighted the bug&amp;rsquo;s effects and compared the system&amp;rsquo;s performance with and without the fix. The post-fix results confirmed that the throttling mechanism worked correctly, effectively preventing overspeed.&lt;/p>
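&lt;p>To see why truncating the elapsed time matters, here is a small Python illustration (not the HDFS Java source). It assumes a rate function that converts elapsed milliseconds to seconds with integer division and falls back to returning the byte count when the result is zero. The function names, byte counts, and throttle limit are all made up for the example.&lt;/p>

```python
def bytes_per_sec_truncated(bytes_read, elapsed_ms):
    """Rate with elapsed time truncated to whole seconds (integer division)."""
    elapsed_s = elapsed_ms // 1000  # 200 ms becomes 0 s, 1500 ms becomes 1 s
    if elapsed_s == 0:
        # Assumed fallback: report the raw byte count as the rate.
        return bytes_read
    return bytes_read // elapsed_s


def bytes_per_sec_float(bytes_read, elapsed_ms):
    """Rate with elapsed time kept as fractional seconds."""
    elapsed_s = elapsed_ms / 1000.0
    if elapsed_s == 0:
        return float(bytes_read)
    return bytes_read / elapsed_s


if __name__ == "__main__":
    limit = 2_000_000  # hypothetical throttle limit in bytes per second
    bytes_read, elapsed_ms = 1_000_000, 200
    for fn in (bytes_per_sec_truncated, bytes_per_sec_float):
        rate = fn(bytes_read, elapsed_ms)
        print(f"{fn.__name__}: rate={rate:.0f} B/s, throttle={rate > limit}")
```

&lt;p>With a sub-second transfer, the truncated version reports a rate below the limit even though the real rate is well above it, so a throttle that relies on that value would not engage.&lt;/p>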
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./Figure2.png" alt="Figure 2" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="conclusion">Conclusion&lt;/h1>
&lt;p>Reproducing these throttling bugs and validating the developer fixes was a vital step in understanding their impact on HDFS&amp;rsquo;s scalability. The improvements observed in system stability and performance underscore the importance of accurate throttling mechanisms. This work contributes to the broader effort of maintaining robust and scalable distributed systems, ensuring they can handle increasing loads efficiently.&lt;/p></description></item><item><title>Trovi redesign process and low fidelity prototype in Figma</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleontroviredesign/20240722-aliciaem/</link><pubDate>Mon, 22 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleontroviredesign/20240722-aliciaem/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/alicia-esquivel-morel/">Alicia Esquivel Morel&lt;/a>, and I&amp;rsquo;m a graduate research assistant at the University of Missouri – Columbia, pursuing a PhD in Computer Science. This summer, I&amp;rsquo;m working on a project to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/trovi/">improve user experience reproducibility through a redesign of TROVI&lt;/a>, as part of the Summer of Reproducibility (SoR) program. I&amp;rsquo;m excited to be working with two fabulous mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kate-keahey/">Kate Keahey&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>.&lt;/p>
&lt;h2 id="research-reproducibility-with-a-trovi-redesign">Research Reproducibility with a TROVI Redesign&lt;/h2>
&lt;p>As researchers, we constantly face challenges replicating experiments due to limitations in current tools. &lt;a href="https://chameleoncloud.readthedocs.io/en/latest/technical/sharing.html" target="_blank" rel="noopener">TROVI&lt;/a>, a platform designed to facilitate experiment replication, can be hindered by hard-to-follow interfaces and difficulties integrating code and data. This leads to confusion and frustration.&lt;/p>
&lt;p>My SoR project tackles these issues by redesigning TROVI to enhance user experience reproducibility. Imagine a user-friendly platform where uploading code, sharing data, and collaborating with colleagues becomes easy and straightforward.&lt;/p>
&lt;h2 id="the-redesigns-goals">The Redesign&amp;rsquo;s Goals&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Enhanced User Experience:&lt;/strong> Inspired by user-friendly platforms like Google Colab, we&amp;rsquo;ll simplify TROVI&amp;rsquo;s interface for intuitive navigation and ease of use.&lt;/li>
&lt;li>&lt;strong>Uploads and Sharing:&lt;/strong> Uploading code and data, as well as collaborating with researchers, are key goals. Integration with platforms like GitHub will further streamline collaboration.&lt;/li>
&lt;li>&lt;strong>Continuous Improvement:&lt;/strong> A built-in feedback loop will allow users to provide input and suggestions, ensuring TROVI constantly evolves based on user needs.&lt;/li>
&lt;/ul>
&lt;h2 id="progress-i-have-made-so-far">Progress I have made so far&lt;/h2>
&lt;p>The first stage of my project began with conducting User Experience (UX) research and identifying user requirements for TROVI. I then conducted a literature review on reproducibility platforms to learn about efficient methodologies and platforms for reproducibility. This helped establish a clearer project scope. Additionally, I analyzed TROVI end-user feedback to understand redesign needs.&lt;/p>
&lt;p>In summary, during the first weeks of the project, I focused on research and requirements gathering, including the literature review on state-of-the-art reproducibility platforms. Before the midterm assessment, my work also involved the redesign process, prioritizing improved usability and user experience. I designed wireframes following the requirements and user feedback and later translated them into low-fidelity prototypes. Front-end and back-end considerations were also made, such as selecting a front-end framework (Vue.js) and a collaborative design tool (Figma).&lt;/p>
&lt;h2 id="what-do-i-plan-to-do-over-the-next-weeks">What do I plan to do over the next weeks?&lt;/h2>
&lt;p>During the next two weeks, I will address challenges encountered in the design process and make the necessary adjustments to ensure the success of the next steps of the project. A higher-fidelity prototype will be completed, including connections between the different objects and frames. This will facilitate the creation of a front-end with multiple flows in the prototype. Additionally, this will provide a preview of the end-user experience through the design process, without requiring the back-end to be functional or connected yet. I&amp;rsquo;m also investigating design tool API integrations to access TROVI&amp;rsquo;s APIs. This will let us access and isolate the properties of any TROVI artifact.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>I&amp;rsquo;m halfway through the redesign process. Next steps will include the integration of both the backend and frontend components to create a cohesive and functional system. We will also facilitate initial user interactions and testing to gather valuable feedback and ensure that the system meets the needs and expectations of end users.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In addition, as I progress, my focus will shift towards enhancing the user experience and refining the final product based on the feedback received. The final two weeks of the program will be dedicated to this critical phase, where I will implement user experience techniques and conduct thorough testing to polish the product. This period will involve close analysis and iteration to address any issues and optimize functionality.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>By the end of the program, I aim to deliver a functional and user-friendly product that not only meets the initial project goals but also exceeds user expectations.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Stay tuned to see how TROVI is built for reproducible research!!&lt;/strong>&lt;/p></description></item><item><title>Data Engineering and Automated Evaluation for OpenROAD's Chat Assistant: Midterm Update</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/</link><pubDate>Sun, 21 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/</guid><description>&lt;p>Hello everyone! We&amp;rsquo;ve reached the halfway point of our Google Summer of Code 2024 journey, and it&amp;rsquo;s time for an update on our project to build a conversational chat assistant for OpenROAD. Under the guidance of our mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>, we&amp;rsquo;re making significant strides in enhancing OpenROAD&amp;rsquo;s user support capabilities.&lt;/p>
&lt;h2 id="project-focus">Project Focus&lt;/h2>
&lt;p>My project focuses on two crucial aspects of our chat assistant:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Data Engineering&lt;/strong>: Ensuring our assistant has access to comprehensive and relevant information.&lt;/li>
&lt;li>&lt;strong>Evaluation&lt;/strong>: Developing robust methods to assess and improve the assistant&amp;rsquo;s performance.&lt;/li>
&lt;/ol>
&lt;p>The ultimate goal is to create a more responsive and accurate chat assistant capable of aiding users with troubleshooting, installation, and general queries about OpenROAD. I&amp;rsquo;m working in tandem with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, who is developing the RAG architecture for our assistant.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>Since our initial deployment, I&amp;rsquo;ve been concentrating on implementing automated evaluation systems for our RAG architecture. We&amp;rsquo;ve developed two primary evaluation methods:&lt;/p>
&lt;h3 id="basic-abbreviation-evaluation">Basic Abbreviation Evaluation&lt;/h3>
&lt;p>This method assesses the model&amp;rsquo;s ability to accurately identify and explain common abbreviations used within the OpenROAD community. It ensures that our assistant can effectively communicate using domain-specific terminology.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 1: Flow Chart of Basic Abbreviation Evaluation" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_7793f2944668d59749f48f3848acfba7.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_c0340ef0448a8f440bce5566986a10ef.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_7793f2944668d59749f48f3848acfba7.webp"
width="469"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Examples" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_f04196ec40b94ffced2a574cbd37ad44.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_1a776103bd42be9525343172ad16d2a2.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_f04196ec40b94ffced2a574cbd37ad44.webp"
width="760"
height="431"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="llm-judge-based-evaluation">LLM Judge-Based Evaluation&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 2: Flow Chart of LLM Judge-Based Evaluation" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_8dfc4bba33d8ad8d797f27f1c7a1eaaf.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_6ef7c0153c7e61298bbf98aa15f5d69d.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_8dfc4bba33d8ad8d797f27f1c7a1eaaf.webp"
width="689"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>For this more comprehensive evaluation, we:&lt;/p>
&lt;ol>
&lt;li>Prepared a dataset of question-answer pairs relevant to OpenROAD.&lt;/li>
&lt;li>Queried our model with these questions to generate answers.&lt;/li>
&lt;li>Employed LLMs (including GPT-4o and Gemini 1.5 Flash) to act as judges.&lt;/li>
&lt;li>Evaluated our model&amp;rsquo;s responses against ground truth answers.&lt;/li>
&lt;/ol>
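&lt;p>The four steps above can be sketched as a small loop with a pluggable judge. This is only an illustration: &lt;code>judge_accuracy&lt;/code> and &lt;code>keyword_judge&lt;/code> are hypothetical names, the keyword matcher is a stand-in for a real LLM judge, and the question-answer pairs are made up rather than taken from our dataset.&lt;/p>

```python
def judge_accuracy(examples, answer_fn, judge_fn):
    """Fraction of examples whose generated answer the judge accepts.

    examples:  list of (question, ground_truth) pairs
    answer_fn: maps a question to the answer of the system under test
    judge_fn:  maps (question, ground_truth, answer) to a boolean verdict
    """
    verdicts = [judge_fn(q, truth, answer_fn(q)) for q, truth in examples]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def keyword_judge(question, ground_truth, answer):
    # Stand-in for an LLM judge: accept answers containing the ground truth.
    return ground_truth.lower() in answer.lower()


if __name__ == "__main__":
    # Made-up question-answer pairs and canned model answers.
    examples = [
        ("What does DRC stand for?", "Design Rule Check"),
        ("What does CTS stand for?", "Clock Tree Synthesis"),
    ]
    canned = {
        "What does DRC stand for?": "DRC stands for design rule check.",
        "What does CTS stand for?": "I am not sure.",
    }
    print(judge_accuracy(examples, canned.get, keyword_judge))
```

&lt;p>Passing the judge in as a function keeps the loop unchanged when the judge is swapped for a call to a hosted model.&lt;/p>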
&lt;p>Here&amp;rsquo;s a glimpse of our early benchmark results:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Benchmark" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_06ea37525851a60dad5bd072a03cd329.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_d9a11b8b08e2634c01f9063cc78ab134.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_06ea37525851a60dad5bd072a03cd329.webp"
width="760"
height="701"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_f63055fd0281e09d0ef800e1e444c7f9.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_91c683a3ebadbf3ce5a21099a81b1836.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_f63055fd0281e09d0ef800e1e444c7f9.webp"
width="577"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="exploratory-data-analysis-eda-on-github-openroad-issues">Exploratory Data Analysis (EDA) on GitHub OpenROAD issues&lt;/h2>
&lt;p>To gather more data, I performed Exploratory Data Analysis (EDA) on GitHub OpenROAD issues using GitHub&amp;rsquo;s GraphQL API. This allowed us to:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Filter data based on parameters such as:&lt;/p>
&lt;ul>
&lt;li>Minimum number of comments&lt;/li>
&lt;li>Date range&lt;/li>
&lt;li>Mentioned PRs&lt;/li>
&lt;li>Open or closed status&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Structure the data, focusing on issues tagged with Build, Query, Installation, and Runtime.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Process the data into JSONL format with key fields including:&lt;/p>
&lt;ul>
&lt;li>&lt;code>url&lt;/code>: URL of the GitHub issue&lt;/li>
&lt;li>&lt;code>id&lt;/code>: Unique issue number&lt;/li>
&lt;li>&lt;code>title&lt;/code>: Issue title&lt;/li>
&lt;li>&lt;code>author&lt;/code>: Username of the issue creator&lt;/li>
&lt;li>&lt;code>description&lt;/code>: Initial issue description&lt;/li>
&lt;li>&lt;code>content&lt;/code>: Array of messages related to the issue&lt;/li>
&lt;li>&lt;code>category&lt;/code>: General category of the issue&lt;/li>
&lt;li>&lt;code>subcategory&lt;/code>: More specific category of the issue&lt;/li>
&lt;li>&lt;code>tool&lt;/code>: Relevant tools or components&lt;/li>
&lt;li>&lt;code>date&lt;/code>: Issue creation timestamp&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
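&lt;p>As a minimal sketch of the filtering and JSONL steps listed above, the snippet below keeps issues with a minimum number of messages and writes one JSON object per line using the listed fields. The helper names and the sample record are hypothetical, and treating the length of &lt;code>content&lt;/code> as the comment count is an assumption made for the example.&lt;/p>

```python
import io
import json

FIELDS = ["url", "id", "title", "author", "description", "content",
          "category", "subcategory", "tool", "date"]


def filter_issues(records, min_comments=0):
    """Keep issues whose `content` array has at least `min_comments` entries."""
    return [r for r in records if len(r.get("content", [])) >= min_comments]


def write_jsonl(records, stream):
    """Write one JSON object per line, keeping only the expected fields."""
    for record in records:
        row = {key: record.get(key) for key in FIELDS}
        stream.write(json.dumps(row) + "\n")


if __name__ == "__main__":
    # Placeholder record; values are illustrative, not real issue data.
    sample = [{
        "url": "https://example.com/issues/1",
        "id": 1,
        "title": "Example build failure",
        "author": "example-user",
        "description": "Build stops during configuration.",
        "content": ["first reply", "second reply"],
        "category": "Build",
        "subcategory": "Configuration",
        "tool": "example-tool",
        "date": "2024-01-01T00:00:00Z",
    }]
    buffer = io.StringIO()
    write_jsonl(filter_issues(sample, min_comments=1), buffer)
    print(buffer.getvalue(), end="")
```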
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 5: Sample structure of our processed JSONL data" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_fd103ea5ef1fa131b8bc806db99a24d1.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_c30d5d185fec144cfca686499f464f19.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_fd103ea5ef1fa131b8bc806db99a24d1.webp"
width="692"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>After curating this dataset, I analyzed the OpenROAD GitHub issues and visualized the distribution of issue categories as a pie chart.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 6: Distribution of OpenROAD issue types" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_d788906f5395b26ab2030fb056e45941.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_ebae2b4145d035c9521679314911236b.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_d788906f5395b26ab2030fb056e45941.webp"
width="760"
height="504"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 7: Breakdown of issues by specific OpenROAD tools" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_3af195a89fadc1379452709cdea50d22.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_e171fcc132e7c13ef62f2a192ed18b62.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_3af195a89fadc1379452709cdea50d22.webp"
width="760"
height="511"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="looking-ahead">Looking Ahead&lt;/h2>
&lt;p>As we move into the second half of the GSoC period, our plans include:&lt;/p>
&lt;ul>
&lt;li>Incorporating GitHub Discussions data into our knowledge base.&lt;/li>
&lt;li>Utilizing this expanded dataset to enhance our RAG architecture.&lt;/li>
&lt;li>Continually refining and improving our model&amp;rsquo;s performance based on evaluation results.&lt;/li>
&lt;/ul>
&lt;p>We&amp;rsquo;re excited about the progress we&amp;rsquo;ve made and look forward to delivering an even more capable and helpful chat assistant for the OpenROAD community. Stay tuned for more updates as we continue this exciting journey!&lt;/p></description></item><item><title>Halfway Through SoR24: Building a Scalable Performance Benchmarking Tool for Genomics Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240721-martinputra/</link><pubDate>Sun, 21 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240721-martinputra/</guid><description>&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>Hi! I&amp;rsquo;m Martin Putra, and I&amp;rsquo;m working on the &amp;ldquo;Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster&amp;rdquo; project under the supervision of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/in-kee-kim/">In Kee Kim&lt;/a>. We are building GenScale, a scalable benchmarking tool for genomics workloads which leverages an industrial-grade cluster manager and monitoring systems. GenScale will allow us to generate performance data under a setup that is representative of large-scale production settings. Ultimately, we hope GenScale and the datasets it produces will catalyze engagement between the computer systems and bioinformatics communities, thus accelerating the pace of discovery in both fields.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;p>We have built a prototype using Kubernetes as the cluster manager and Prometheus as the monitoring system. In its current state, the prototype can support an arbitrary number of compute nodes, owing to Kubernetes’ notable scaling capability. This provides a suitable environment for small- to mid-scale experiments. We leverage ChameleonCloud to provide the necessary computational and reproducibility infrastructure. The monitoring system supports cluster-level, node-level, and container-level metrics collection and failure detection. We integrated Grafana dashboards for visualizations.&lt;/p>
&lt;p>The prototype also supports the execution of user-defined workflows. During the design process, we considered integrating one of the existing workflow execution systems, such as &lt;a href="https://github.com/common-workflow-language/cwltool" target="_blank" rel="noopener">cwltool&lt;/a>, &lt;a href="https://www.nextflow.io" target="_blank" rel="noopener">Nextflow&lt;/a>, or &lt;a href="https://github.com/broadinstitute/cromwell" target="_blank" rel="noopener">Cromwell&lt;/a>. Each system has its own pros and cons when placed within the context of how we envision GenScale. However, we ultimately decided to build our own workflow execution system in order to provide maximum flexibility for the capabilities we plan to add in the future. For example, we believe it will be interesting to study how hardware heterogeneity affects the performance of each application in the workflow (a well-known workflow scheduling problem). Studying the problem requires the capability to schedule execution on specific machines. In addition, if we want to study contention, we may also need to execute on machines which are currently running specific workflows. While there are ways to do this with an existing workflow execution system on top of Kubernetes, we believe it will be greatly simplified if we build our own workflow execution system.&lt;/p>
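&lt;p>As a rough illustration of what scheduling a workflow step on a specific machine can look like on Kubernetes, the sketch below builds a Pod manifest with a nodeSelector on the standard kubernetes.io/hostname label. It is not GenScale code: the function name, step name, image, and node name are all hypothetical placeholders.&lt;/p>

```python
import json


def pinned_pod_manifest(app_name, image, node_name):
    """Build a Pod manifest that asks Kubernetes to run one workflow step
    on a specific node, selected by the kubernetes.io/hostname label."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": app_name, "labels": {"workflow-step": app_name}},
        "spec": {
            # The scheduler only places the Pod on a node whose labels match.
            "nodeSelector": {"kubernetes.io/hostname": node_name},
            "restartPolicy": "Never",
            "containers": [{"name": app_name, "image": image}],
        },
    }


# Hypothetical step, image, and node names, for illustration only.
manifest = pinned_pod_manifest(
    "bwa-mem", "example.org/genscale/bwa:latest", "worker-1"
)
print(json.dumps(manifest, indent=2))
```

&lt;p>The printed JSON could be saved to a file and submitted with kubectl apply. Studying contention would additionally require knowing which node is currently running a given workflow, which this sketch does not cover.&lt;/p>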
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre24/uga/genomicswf/20240721-martinputra/dnaseq-exec_time_proportion_hu2a99ec14fc56f180a344028699f1df1c_255588_8dedba866f2dae2e3c155c6037bb3c4c.webp 400w,
/report/osre24/uga/genomicswf/20240721-martinputra/dnaseq-exec_time_proportion_hu2a99ec14fc56f180a344028699f1df1c_255588_0134d07c43c3857435ab5c59f410ed7f.webp 760w,
/report/osre24/uga/genomicswf/20240721-martinputra/dnaseq-exec_time_proportion_hu2a99ec14fc56f180a344028699f1df1c_255588_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240721-martinputra/dnaseq-exec_time_proportion_hu2a99ec14fc56f180a344028699f1df1c_255588_8dedba866f2dae2e3c155c6037bb3c4c.webp"
width="760"
height="497"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">&lt;strong>Figure 1.&lt;/strong> Proportion of execution time for DNA Alignment applications, executed on Chameleon&amp;rsquo;s &lt;em>cascadelake_r&lt;/em> node with 1500MB paired-end input. &lt;strong>y-axis:&lt;/strong> proportion of application&amp;rsquo;s exec. time out of the whole workflow&amp;rsquo;s exec. time, &lt;strong>x-axis:&lt;/strong> top 10 applications accounting for 97% exec. time, sorted by proportion. Other applications are aggregated.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>We confirmed GenScale’s capability to produce useful data by executing a DNA alignment workflow and capturing its runtime resource usage. We use &lt;a href="https://github.com/NCI-GDC/gdc-dnaseq-cwl" target="_blank" rel="noopener">Genomics Data Commons’ (GDC) DNA alignment workflow&lt;/a> as a reference. It comprises 27 applications covering quality checks, read trimming, the alignment itself, indexing, and collection of various metrics. We wrote our own simplified version of the workflow by first analyzing the execution time &amp;amp; resource usage of each application, then choosing the 10 applications which together account for 97% of the workflow execution time. We took into account that containerization is the de facto standard for workflow execution in the bioinformatics community. Thus, we packaged each application as its own separate container, then hosted their Dockerfiles &amp;amp; containers in a private GitHub Container Registry (GHCR). We plan to make them public in the future. Our monitoring system is able to show resource usage in real time. We also built sidecar containers which use Unix’s pidstat to generate a CSV of core, memory, and storage utilization throughout each workflow’s execution. This will allow easier analysis and data sharing for GenScale’s users.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre24/uga/genomicswf/20240721-martinputra/bwa_picardwgs_picardvalidate-cpu_hu75193f344cb2afdf6e001b1bc5f51540_1054163_7dea08952ec6bc07cee0579c31500d17.webp 400w,
/report/osre24/uga/genomicswf/20240721-martinputra/bwa_picardwgs_picardvalidate-cpu_hu75193f344cb2afdf6e001b1bc5f51540_1054163_0a311fc327ad5f4a739e574c86795b70.webp 760w,
/report/osre24/uga/genomicswf/20240721-martinputra/bwa_picardwgs_picardvalidate-cpu_hu75193f344cb2afdf6e001b1bc5f51540_1054163_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240721-martinputra/bwa_picardwgs_picardvalidate-cpu_hu75193f344cb2afdf6e001b1bc5f51540_1054163_7dea08952ec6bc07cee0579c31500d17.webp"
width="760"
height="209"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">&lt;strong>Figure 2.&lt;/strong> CPU utilization pattern of &lt;a href="https://github.com/lh3/bwa" target="_blank" rel="noopener">BWA&lt;/a>, &lt;a href="https://gatk.broadinstitute.org/hc/en-us/articles/360037269351-CollectWgsMetrics-Picard" target="_blank" rel="noopener">Picard&amp;rsquo;s CollectWGSMetrics&lt;/a>, and &lt;a href="https://gatk.broadinstitute.org/hc/en-us/articles/360036854731-ValidateSamFile-Picard" target="_blank" rel="noopener">Picard&amp;rsquo;s ValidateSamFile&lt;/a> collected by &lt;em>GenScale&lt;/em>. &lt;strong>y-axis&lt;/strong>: &lt;em>(num. cores) x 100%&lt;/em>, &lt;strong>x-axis&lt;/strong>: time elapsed in seconds.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>One technical challenge is automating the creation of a Kubernetes cluster and keeping it alive. We believe GenScale’s users would be interested in the performance of workflows under dynamic cluster sizes, either due to intentional scaling or machine failures. While the current prototype supports creating a cluster with an arbitrary number of nodes, some steps still require a reboot when adding nodes. As a result, cluster creation and horizontal scaling are not yet fully automated. Keeping a cluster alive is also expensive. Since we use ChameleonCloud as our testbed, we have a choice of either keeping the cluster alive at the cost of significant service unit (SU) usage, or saving SUs by terminating our leases at the cost of rebuilding the cluster from scratch later. We chose a middle ground by keeping only Kubernetes’ control plane alive. The approach has worked well so far.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>For the remaining weeks, we plan to work on the second workflow, namely &lt;a href="https://github.com/NCI-GDC/gdc-rnaseq-cwl" target="_blank" rel="noopener">RNA Alignment&lt;/a>. We would also like to add simple user interfaces if time permits. Finally, we plan to package GenScale’s source code, container images, and sample benchmark results for the open-source community. We look forward to the second half of Summer of Reproducibility!&lt;/p></description></item><item><title>Midterm Blog: ML in Detecting and Addressing System Drift</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240721-joanna/</link><pubDate>Sun, 21 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240721-joanna/</guid><description>&lt;p>Hello! I&amp;rsquo;m Joanna! Over the past month, I have been contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/last">ML in Detecting and Addressing System Drift&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ray-andrew-sinurat/">Ray Andrew Sinurat&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sandeep-madireddy/">Sandeep Madireddy&lt;/a>. My project aims to design a pipeline to evaluate drift detection algorithms on system traces. The goal is to characterize different drifts, understand how they affect model performance, and evaluate the performance of state-of-the-art (SOTA) drift detection algorithms.&lt;/p>
&lt;h1 id="progress">Progress&lt;/h1>
&lt;p>Here is some background on my project: Model drift, or the degradation of model performance, is typically caused by data drift, which is a shift in the input distribution, and concept drift, which is a change in the relationship between input and output. The project aims to detect both data drifts and concept drifts, analyze these drifts, and try to improve model performance in computer systems.&lt;/p>
&lt;p>Over the past month, I’ve primarily been constructing a data drift dataset from the Tencent I/O block trace, which includes both drift and non-drift data. By combining offline drift detection algorithms such as Maximum Mean Discrepancy, Cramér-von Mises, and Kolmogorov-Smirnov, I am developing a dataset that contains segments with and without drifts for features such as IOPS (Input/Output Operations Per Second), read/write size ratio, write size, and other relevant performance metrics. The diagrams below illustrate the data segments identified with and without drifts, respectively.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Drift Data" srcset="
/report/osre24/anl/last/20240721-joanna/drift_hucf4bc0bd843fb60be6a646f8116e435c_702790_3e192a352fe3c303df3195f4c92fe970.webp 400w,
/report/osre24/anl/last/20240721-joanna/drift_hucf4bc0bd843fb60be6a646f8116e435c_702790_174dedff6f5c0b72b3d10b82cb9d1a86.webp 760w,
/report/osre24/anl/last/20240721-joanna/drift_hucf4bc0bd843fb60be6a646f8116e435c_702790_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240721-joanna/drift_hucf4bc0bd843fb60be6a646f8116e435c_702790_3e192a352fe3c303df3195f4c92fe970.webp"
width="760"
height="752"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Non-Drift Data" srcset="
/report/osre24/anl/last/20240721-joanna/nondrift_hue8c599cc81a10cc02482d676af5d2cf8_455606_3d133279bb8e32779c8b94396ec0ef5d.webp 400w,
/report/osre24/anl/last/20240721-joanna/nondrift_hue8c599cc81a10cc02482d676af5d2cf8_455606_c450f1394af9661bb2d86f0232d340d1.webp 760w,
/report/osre24/anl/last/20240721-joanna/nondrift_hue8c599cc81a10cc02482d676af5d2cf8_455606_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240721-joanna/nondrift_hue8c599cc81a10cc02482d676af5d2cf8_455606_3d133279bb8e32779c8b94396ec0ef5d.webp"
width="760"
height="757"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
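&lt;p>To make the idea of labeling segments with an offline two-sample test more concrete, here is a minimal sketch of the Kolmogorov-Smirnov statistic applied to a reference segment and a later segment. It is not the project pipeline: the function names, the synthetic IOPS-like values, and the 0.05 significance level are assumptions chosen for illustration.&lt;/p>

```python
import bisect
import random


def ks_statistic(reference, window):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    ref = sorted(reference)
    win = sorted(window)
    gap = 0.0
    # The gap between two step functions peaks at an observed value.
    for x in ref + win:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_win = bisect.bisect_right(win, x) / len(win)
        gap = max(gap, abs(cdf_ref - cdf_win))
    return gap


def has_drift(reference, window, c_alpha=1.358):
    """Flag drift when the statistic exceeds the large-sample critical value.
    c_alpha = 1.358 corresponds to a significance level of about 0.05."""
    n, m = len(reference), len(window)
    critical = c_alpha * ((n + m) / (n * m)) ** 0.5
    return ks_statistic(reference, window) > critical


# Synthetic IOPS-like values (not taken from any real trace).
rng = random.Random(0)
baseline = [rng.gauss(1000, 50) for _ in range(500)]
shifted = [rng.gauss(1300, 50) for _ in range(500)]
print(has_drift(baseline, shifted))
```

&lt;p>In practice a library implementation would normally be preferable, and combining several tests, as described above, reduces the chance that a single test mislabels a segment.&lt;/p>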
&lt;p>In addition to constructing the datasets, I have begun evaluating some online drift detection algorithms and designing metrics to assess their performance. I have tested the performance of online drift detection algorithms such as Online Maximum Mean Discrepancy and Online Cramér-von Mises under various settings, including different window lengths and sensitivity levels. The following diagrams illustrate the drift points detected for the IOPS feature under these different settings.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Evaluation" srcset="
/report/osre24/anl/last/20240721-joanna/evaluation_hu21317d2d7888d01b0cbee9e7a13940af_724895_51a49130183b2dfa7ad977d297aa0f3b.webp 400w,
/report/osre24/anl/last/20240721-joanna/evaluation_hu21317d2d7888d01b0cbee9e7a13940af_724895_a3c8bf9bdc160fc4323a73bc0ac837b7.webp 760w,
/report/osre24/anl/last/20240721-joanna/evaluation_hu21317d2d7888d01b0cbee9e7a13940af_724895_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240721-joanna/evaluation_hu21317d2d7888d01b0cbee9e7a13940af_724895_51a49130183b2dfa7ad977d297aa0f3b.webp"
width="760"
height="584"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Here are my plans for the next month:&lt;/p>
&lt;ul>
&lt;li>Complete the experiments on data drift and generate improved visualizations to summarize the performance of these online drift detection algorithms, including their overhead and accuracy over time.&lt;/li>
&lt;li>Characterize drifts by identifying the types of drifts that lead to model performance degradation.&lt;/li>
&lt;li>Evaluate drift detection algorithms in the context of concept drifts.&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for my future updates on this project!&lt;/p></description></item><item><title>Enabling VAA Execution: Environment and VAA Preparation and/or Reproducibility for Dynamic Bandwidth Allocation (CONCIERGE)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/edgerep/20240720-rafaelsw/</link><pubDate>Sat, 20 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/edgerep/20240720-rafaelsw/</guid><description>&lt;p>Hi there!&lt;/p>
&lt;p>I am Rafael Sinjunatha Wulangsih, a Telecommunication Engineering graduate from the Bandung Institute of Technology (ITB), Bandung, Indonesia. I&amp;rsquo;m currently contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/edgerep">&amp;ldquo;EdgeRep: Reproducing and benchmarking edge analytic systems&amp;rdquo;&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a> and Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/junchen-jiang/">Junchen Jiang&lt;/a>. You can find more details about the project proposal &lt;a href="https://drive.google.com/file/d/1GUMiglFqezOqEeQiMaL4QVgsXZOHYoEK/view?usp=drive_link" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>This project addresses the challenges posed by the massive deployment of edge devices, such as traffic or security cameras, in smart cities and other environments. In the previous Edgebench project, the team proposed a solution to dynamically allocate bandwidth and compute resources to video analytic applications (VAAs) running on edge devices. However, that project was limited to a single VAA, which may not represent the diverse applications running on edge devices. Therefore, the main goal of this project, &amp;ldquo;EdgeRep,&amp;rdquo; is to diversify the VAAs running on edge devices while utilizing a solution similar to that of the Edgebench project. EdgeRep aims to reproduce state-of-the-art self-adaptive VAAs (with seven candidates) and maintain self-adaptation in these video analytics pipelines. We will implement it ourselves if the video analytics applications do not support self-adaptation.&lt;/p></description></item><item><title>Halfway Through GSOC: Heterogeneous Graph Neural Networks for I/O Performance Bottleneck Diagnosis</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/</link><pubDate>Sat, 20 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/</guid><description>&lt;p>Hello, I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mahdi-banisharifdehkordi/">Mahdi Banisharifdehkordi&lt;/a>, a Ph.D. student in Computer Science at Iowa State University. I&amp;rsquo;m currently working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/aiio/">AIIO / Graph Neural Network&lt;/a> project under the guidance of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> and Suren Byna. 
Our project focuses on enhancing the AIIO framework to automatically diagnose I/O performance bottlenecks in high-performance computing (HPC) systems using Graph Neural Networks (GNNs).&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>Our primary goal is to tackle the persistent issue of I/O bottlenecks in HPC applications. Identifying these bottlenecks manually is often labor-intensive and prone to errors. By integrating GNNs into the AIIO framework, we aim to create an automated solution that can diagnose these bottlenecks with high accuracy, ultimately improving the efficiency and reliability of HPC systems.&lt;/p>
&lt;h1 id="progress-and-challenges">Progress and Challenges&lt;/h1>
&lt;p>Over the past few weeks, my work has been centered on developing a robust data pre-processing pipeline. This pipeline is crucial for converting raw I/O log data into a graph format suitable for GNN analysis. The data pre-processing involves extracting relevant features from Darshan I/O logs, which include job-related information and performance metrics. One of the main challenges has been dealing with the heterogeneity and sparsity of the data, which can affect the accuracy of our models. To address this, we&amp;rsquo;ve focused on using correlation analysis to identify and select the most relevant features, ensuring that the dataset is well-structured and informative for GNN processing.&lt;/p>
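&lt;p>A minimal sketch of correlation-based feature selection is shown below, assuming each counter is a numeric column and the target is a performance value per job. The counter names, values, and the 0.5 threshold are made up for illustration and are not taken from the AIIO dataset.&lt;/p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep counters whose absolute correlation with the target
    reaches the threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Toy counters for five hypothetical jobs.
counters = {
    "bytes_written": [1, 2, 3, 4, 5],
    "small_reads": [5, 4, 3, 2, 1],
    "noise": [3, 1, 3, 1, 3],
    "constant_flag": [7, 7, 7, 7, 7],
}
performance = [10, 20, 30, 40, 50]

selected = select_features(counters, performance)
print(sorted(selected))
```

&lt;p>Pearson correlation only captures linear relationships, so sparse or heavily skewed counters may need a transformation or a rank-based measure before a threshold like this is meaningful.&lt;/p>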
&lt;p>We&amp;rsquo;ve also started constructing the GNN model. The model is designed to capture the complex relationships between different I/O operations and their impact on system performance. This involves defining nodes and edges in the graph that represent job IDs, counter types, and their values. We explored different graph structures, including those that focus on counter types and those that incorporate more detailed information. While more detailed graphs offer better accuracy, they also require more computational resources.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Overview" srcset="
/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_efa6bf6f7434ca74fff6a35fcb540861.webp 400w,
/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_de1d11a65f3f46dfd75b1bc00e8e6406.webp 760w,
/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/overview_hu3b8a0374313d077c49f26c894c548b00_437453_efa6bf6f7434ca74fff6a35fcb540861.webp"
width="760"
height="566"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="current-achievements">Current Achievements&lt;/h1>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Data Pre-processing Pipeline&lt;/strong>: We have successfully developed and tested the pipeline to transform Darshan I/O logs into graph-structured data. This was a significant milestone, as it sets the foundation for all subsequent GNN modeling efforts.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>GNN Model Construction&lt;/strong>: The initial version of our GNN model has been implemented. This model is now capable of learning from the graph data and making predictions about I/O performance bottlenecks.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Correlation Analysis for Graph Structure Design&lt;/strong>: We have used correlation analysis on the dataset to understand the relationships between I/O counters. This analysis has been instrumental in designing a more effective graph structure, helping to better capture the dependencies and interactions critical for accurate performance diagnosis.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Correlation Analysis1" srcset="
/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_211eb50374f4febd5aee688644797792.webp 400w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_fd5992e42a60d6cb85be9cd136a5d93b.webp 760w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/correlation1_huf0ba9e5fcd08c89560bf3e668ac22994_763024_211eb50374f4febd5aee688644797792.webp"
width="760"
height="614"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Correlation Analysis2" srcset="
/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_b05324cc90f73bd1b2ff53c9d2d04ecb.webp 400w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_0115179de349c5834c2b3fc2636ecd23.webp 760w,
/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240720-mahdi/correlation2_hu550eeb7f303ef774f36732146058c5a3_277912_b05324cc90f73bd1b2ff53c9d2d04ecb.webp"
width="760"
height="309"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ol start="4">
&lt;li>&lt;strong>Training for Different Graph Structures&lt;/strong>: We are currently training our model using various graph structures to determine the most effective configuration for accurate I/O performance diagnosis. This ongoing process aims to refine our approach and improve the model&amp;rsquo;s predictive accuracy.&lt;/li>
&lt;/ol>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Looking ahead, we plan to focus on several key areas:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Refinement and Testing&lt;/strong>: We&amp;rsquo;ll continue refining the GNN model, focusing on improving its accuracy and efficiency. This includes experimenting with different graph structures and training techniques.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>SHAP Analysis&lt;/strong>: To enhance the interpretability of our model, we&amp;rsquo;ll incorporate SHAP (SHapley Additive exPlanations) values. This will help us understand the contribution of each feature to the model&amp;rsquo;s predictions, making it easier to identify critical factors in I/O performance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Documentation and Community Engagement&lt;/strong>: As we make progress, we&amp;rsquo;ll document our methods and findings, sharing them with the broader community. This includes contributing to open-source repositories and engaging with other researchers in the field.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>This journey has been both challenging and rewarding, and I am grateful for the support and guidance from my mentors and the community. I look forward to sharing more updates as we continue to advance this exciting project.&lt;/p></description></item><item><title>Optimizing Scientific Data Streaming: Developing Reproducible Benchmarks for High-Speed Memory-to-Memory Data Transfer over SciStream</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240720-kraislaik/</link><pubDate>Sat, 20 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240720-kraislaik/</guid><description>&lt;p>Hello there! I&amp;rsquo;m Acheme and I&amp;rsquo;m thrilled to share the progress on my project, &amp;ldquo;Optimizing Scientific Data Streaming: Developing Reproducible Benchmarks for High-Speed Memory-to-Memory Data Transfer over SciStream&amp;rdquo; under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joaquin-chung/">Joaquin Chung&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/flavio-castro/">Flavio Castro&lt;/a> under the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/scistream/">SciStream&lt;/a> project.&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>This project aims to develop SciStream-bench, a set of benchmarks and artifacts designed to precisely evaluate the performance of scientific streaming applications across diverse traffic patterns when running over the SciStream framework.&lt;/p>
&lt;h1 id="progress">Progress&lt;/h1>
&lt;p>One of the first steps in the project was to consult SciStream team members working at Argonne to identify use cases in scientific streaming applications and the typical traffic profiles they represent. The goal was to simulate these profiles using traffic generator tools and by configuring network resources on the FABRIC/Chameleon testbed. The following traffic profiles were identified to cover many use cases, including one of ESnet’s broad categories in integrated research workflows, “The Time-Sensitive Pattern”:&lt;/p>
&lt;ol>
&lt;li>Throughput intensive startup&lt;/li>
&lt;li>Intermittent burst of traffic for a duration of time&lt;/li>
&lt;li>Constant rate traffic&lt;/li>
&lt;li>Latency sensitive&lt;/li>
&lt;/ol>
&lt;p>Since data streaming applications have some unique requirements for optimum performance, the following metrics were selected as important for testing streaming performance.&lt;/p>
&lt;ol>
&lt;li>Latency&lt;/li>
&lt;li>Jitter&lt;/li>
&lt;li>Packet loss / message loss&lt;/li>
&lt;li>Throughput&lt;/li>
&lt;/ol>
&lt;p>Subsequently, about seventeen open-source traffic generator applications were identified and compared to find a few that can generate our defined traffic profiles and expose the desired performance metrics.
We ultimately settled on iperf3 and pvaPy (a scientific streaming application developed at Argonne National Lab).
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Traffic generator selection" srcset="
/report/osre24/anl/scistream/20240720-kraislaik/traffic_gen_table_hu1de203c8f0a7ec60f5933544490e9409_166049_e6d6efa6a70c4df7ca728d23dc563c54.webp 400w,
/report/osre24/anl/scistream/20240720-kraislaik/traffic_gen_table_hu1de203c8f0a7ec60f5933544490e9409_166049_26e55280da7d2ab0a44eba5e07d9d657.webp 760w,
/report/osre24/anl/scistream/20240720-kraislaik/traffic_gen_table_hu1de203c8f0a7ec60f5933544490e9409_166049_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240720-kraislaik/traffic_gen_table_hu1de203c8f0a7ec60f5933544490e9409_166049_e6d6efa6a70c4df7ca728d23dc563c54.webp"
width="760"
height="667"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>So far, we have developed the first set of benchmarking tools, which use iperf3 as the traffic generator with constant-rate and intermittent-burst profiles. The tools generate traffic, collect the metrics that iperf3 exposes (including throughput, jitter, and datagram loss), and save them to a CSV file for further analysis. A Jupyter notebook is used to set up a FABRIC slice and configure a four-node experiment suitable for benchmarking the SciStream base architecture. After running the experiments on the FABRIC nodes and collecting the results in a CSV file, we added cells to the Jupyter notebook to analyze the data.
The analysis includes the average, minimum, maximum, and standard deviation of each metric.&lt;/p>
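&lt;p>The kind of per-profile summary described above can be sketched with the Python standard library alone. The column names and numbers below are placeholders, not measurements from our experiments, and the actual CSV layout produced by the benchmarking tools may differ.&lt;/p>

```python
import csv
import io
import statistics

# Placeholder rows with round numbers, for illustration only.
SAMPLE_CSV = """profile,throughput_mbps
constant,100
constant,100
constant,100
burst,50
burst,150
burst,100
"""


def summarize(csv_text, metric):
    """Return mean, min, max, and sample standard deviation of one metric
    for each traffic profile found in the CSV text."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["profile"], []).append(float(row[metric]))
    return {
        profile: {
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
            # stdev needs at least two samples.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for profile, values in groups.items()
    }


summary = summarize(SAMPLE_CSV, "throughput_mbps")
for profile, stats in summary.items():
    print(profile, stats)
```

&lt;p>The same function can be called once per metric column, for example jitter or datagram loss, to produce a table like the ones shown in the figures.&lt;/p>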
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Average performance analysis" srcset="
/report/osre24/anl/scistream/20240720-kraislaik/average_analysis_hub26a1bd8be2bfada2c95935ad89433f1_90557_28252818b6ed679b5bf748d6df51a729.webp 400w,
/report/osre24/anl/scistream/20240720-kraislaik/average_analysis_hub26a1bd8be2bfada2c95935ad89433f1_90557_c45bee687ef4762a87c2455789146763.webp 760w,
/report/osre24/anl/scistream/20240720-kraislaik/average_analysis_hub26a1bd8be2bfada2c95935ad89433f1_90557_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240720-kraislaik/average_analysis_hub26a1bd8be2bfada2c95935ad89433f1_90557_28252818b6ed679b5bf748d6df51a729.webp"
width="760"
height="728"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Minimum performance analysis" srcset="
/report/osre24/anl/scistream/20240720-kraislaik/min_hue760cbb8515e83e802b6292f194d0407_90680_ab7516f0f11772bbbfee6b83998545a1.webp 400w,
/report/osre24/anl/scistream/20240720-kraislaik/min_hue760cbb8515e83e802b6292f194d0407_90680_031545108de74ea71be7a8c9f221286d.webp 760w,
/report/osre24/anl/scistream/20240720-kraislaik/min_hue760cbb8515e83e802b6292f194d0407_90680_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240720-kraislaik/min_hue760cbb8515e83e802b6292f194d0407_90680_ab7516f0f11772bbbfee6b83998545a1.webp"
width="760"
height="739"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Maximum performance analysis" srcset="
/report/osre24/anl/scistream/20240720-kraislaik/max_hud1db05134f576847a7f30efa01c43981_86991_8c9589d9aceb571572d05f6f1c20f03b.webp 400w,
/report/osre24/anl/scistream/20240720-kraislaik/max_hud1db05134f576847a7f30efa01c43981_86991_f2f60074ded7d8cf873fb29edc8ed917.webp 760w,
/report/osre24/anl/scistream/20240720-kraislaik/max_hud1db05134f576847a7f30efa01c43981_86991_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240720-kraislaik/max_hud1db05134f576847a7f30efa01c43981_86991_8c9589d9aceb571572d05f6f1c20f03b.webp"
width="760"
height="725"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="STD performance analysis" srcset="
/report/osre24/anl/scistream/20240720-kraislaik/std_hu03e1d4d764aea6b18d9fe63b88967d71_86805_0dedae6ede701f5dd0913e39862edcbb.webp 400w,
/report/osre24/anl/scistream/20240720-kraislaik/std_hu03e1d4d764aea6b18d9fe63b88967d71_86805_459f5f4ecf29042cd486cd7465f9e6b8.webp 760w,
/report/osre24/anl/scistream/20240720-kraislaik/std_hu03e1d4d764aea6b18d9fe63b88967d71_86805_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/scistream/20240720-kraislaik/std_hu03e1d4d764aea6b18d9fe63b88967d71_86805_0dedae6ede701f5dd0913e39862edcbb.webp"
width="760"
height="719"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="findings">Findings&lt;/h1>
&lt;p>From the experiments conducted so far, the findings are as follows:&lt;/p>
&lt;ul>
&lt;li>We could not properly simulate some of the traffic profiles we initially defined. For example, simulating a latency-sensitive traffic profile requires the ability to set timeouts in iperf3, which is not available at the moment.&lt;/li>
&lt;li>It is not straightforward to implement SciStream on the Chameleon testbed at the moment.&lt;/li>
&lt;li>iperf3 does not expose a latency metric, and its jitter computation is suspect.&lt;/li>
&lt;/ul>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Next, I will focus on pvaPy, following the same approach as for the iperf3-based benchmarking and analysis tools:&lt;/p>
&lt;ul>
&lt;li>Fully develop traffic generation and metric collection tools for pvaPy that cover the defined traffic profiles and expose the chosen metrics.&lt;/li>
&lt;li>Run an initial experiment, as was done for iperf3.&lt;/li>
&lt;li>Repeat both the iperf3- and pvaPy-based benchmarks in multiple scenarios (LAN, METRO, WAN), compare performance, and explain the results.&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for my final blog as I present deeper results and insights!&lt;/p></description></item><item><title>Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240719-triveni5/</link><pubDate>Fri, 19 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240719-triveni5/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Triveni, a Master&amp;rsquo;s student in Computer Science at Northern Illinois University (NIU). I&amp;rsquo;m excited to share my progress on the OSRE 2024 project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Categorize Differences in Reproduced Visualizations&lt;/a> focusing on data visualization reproducibility. Working under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>, I&amp;rsquo;ve made some significant strides and faced some interesting challenges.&lt;/p>
&lt;h2 id="initial-approach-and-challenges">Initial Approach and Challenges&lt;/h2>
&lt;p>I began my work by comparing original visualizations with reproduced ones using OpenCV for pixel-level comparison. This method helped highlight structural differences but also brought to light some challenges. Different versions of libraries rendered visualizations slightly differently, causing minor positional changes that didn&amp;rsquo;t affect the overall message but were still flagged as discrepancies.&lt;/p>
&lt;p>To address this, I experimented with machine learning models like VGG16, ResNet, and Detectron2. These models are excellent for general image recognition but fell short for our specific needs with charts and visualizations. The results were not as accurate as I had hoped, primarily because these models aren&amp;rsquo;t tailored to handle the unique characteristics of data visualizations.&lt;/p>
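&lt;p>To make the pixel-level problem described above concrete, here is a minimal sketch in plain Python. It is not the OpenCV code used in the project: images are represented as nested lists of grayscale values, and the function names and the tolerance value are assumptions chosen for illustration.&lt;/p>

```python
def diff_mask(original, reproduced, tolerance=10):
    """Mark pixels whose grayscale values differ by more than `tolerance`."""
    if len(original) != len(reproduced) or any(
        len(a) != len(b) for a, b in zip(original, reproduced)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [abs(a - b) > tolerance for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(original, reproduced)
    ]


def changed_fraction(mask):
    """Fraction of pixels flagged as different."""
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)
    return changed / total if total else 0.0


if __name__ == "__main__":
    # A dark vertical bar on a white background, drawn at column 1 ...
    original = [[255, 0, 255, 255] for _ in range(4)]
    # ... and the same bar rendered one pixel to the right.
    reproduced = [[255, 255, 0, 255] for _ in range(4)]
    print(changed_fraction(diff_mask(original, reproduced)))  # prints 0.5
```

&lt;p>A one-pixel shift of an otherwise identical bar flags half of this tiny image as changed, which is why small rendering differences between library versions show up as discrepancies in a purely pixel-based comparison.&lt;/p>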
&lt;h2 id="shifting-focus-to-chart-specific-models">Shifting Focus to Chart-Specific Models&lt;/h2>
&lt;p>Recognizing the limitations of general ML models, I shifted my focus to chart-specific models like ChartQA, ChartOCR, and ChartReader. These models are designed to understand and summarize chart data, making them more suitable for our goal of comparing visualizations based on the information they convey.&lt;/p>
&lt;h2 id="generating-visualization-variations-and-understanding-human-perception">Generating Visualization Variations and Understanding Human Perception&lt;/h2>
&lt;p>Another exciting development in my work has been generating different versions of visualizations. This will allow me to create a survey to collect human categorization of visualizations. By understanding how people perceive differences, whether in outliers, shapes, data points, or colors, we can gain insights into which parameters affect human interpretation of visualizations.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>Moving forward, I&amp;rsquo;ll continue to delve into chart-specific models to refine our comparison techniques. Additionally, the survey will provide valuable data on human perception, which can be used to improve our automated comparison methods. By combining these approaches, I hope to create a robust framework for reliable and reproducible data visualizations.&lt;/p>
&lt;p>I&amp;rsquo;m thrilled about the progress made so far and eager to share more updates with you all. Stay tuned for more insights and developments on this exciting journey!&lt;/p></description></item><item><title> Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/</link><pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/">Arya Sarkar&lt;/a>, a machine learning engineer and researcher based out of Kolkata, a city in Eastern India dubbed the City of Joy.
For the last month and a half I have been working closely with Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a> on the project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Reproducibility in Data Visualization&lt;/a>. I’m thrilled to be able to make my own little mark on this amazing project and aid in exploring solutions to capture visualizations in hopes of making reproducibility easier in this domain.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;p>The last month and a half has mostly been spent exploring the best possible solutions to facilitate the reproducibility of STATIC visualizations from local sources and/or the web.
We have taken inspiration from existing work in the domain and successfully captured the meta-information required to regenerate a visualization. The extracted metadata is saved into the generated .png figure of the visualization, so the figure can be reproduced as long as you have (a) the original dataset and (b) the generated .png of the visualization. All other information is stored inside the .png file as a JSON object and can be used to regenerate the original image with very high accuracy.&lt;/p>
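&lt;p>One way to store a JSON object inside a .png file is a PNG text chunk. The sketch below illustrates that idea with the Python standard library only; it is an assumption about the mechanism, not the project code. The chunk keyword (plot_metadata), the metadata fields, and the helper names are all hypothetical.&lt;/p>

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png_bytes):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while len(png_bytes) > pos:
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        yield chunk_type, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def embed_metadata(png_bytes, metadata, keyword=b"plot_metadata"):
    """Return a copy of the PNG with a tEXt chunk inserted right after IHDR."""
    text = json.dumps(metadata, sort_keys=True).encode("ascii")
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png_bytes):
        out.append(make_chunk(chunk_type, data))
        if chunk_type == b"IHDR":
            out.append(make_chunk(b"tEXt", keyword + b"\x00" + text))
    return b"".join(out)


def read_metadata(png_bytes, keyword=b"plot_metadata"):
    """Return the embedded JSON object, or None if the chunk is absent."""
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"tEXt":
            key, _, text = data.partition(b"\x00")
            if key == keyword:
                return json.loads(text.decode("ascii"))
    return None


def tiny_png():
    """A valid 1x1 grayscale PNG, used only to keep the example self-contained."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\xff")  # filter byte 0, then one white pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


if __name__ == "__main__":
    # Hypothetical metadata fields for a histogram.
    meta = {"kind": "hist", "column": "sepal_length", "bins": 10}
    print(read_metadata(embed_metadata(tiny_png(), meta)))
```

&lt;p>Because the text chunk travels inside the image file, the figure and the information needed to regenerate it cannot drift apart; only the dataset has to be supplied separately.&lt;/p>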
&lt;p>The problem, however, remains with visualizations that involve randomness, such as jitter. Capturing the randomness has not been 100% successful so far, and we are looking into options for capturing plots that contain randomness.&lt;/p>
&lt;p>The following images can be used to highlight some results from our reproducibility experiments:
Original Histogram using Matplotlib on the iris dataset:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="original_figure4" srcset="
/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_4d5ccda2a3e4409f5fb5bfccad4abae9.webp 400w,
/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_3d4477374e3469fd72bbb32675129816.webp 760w,
/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_4d5ccda2a3e4409f5fb5bfccad4abae9.webp"
width="760"
height="468"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Reproduced Histogram using metainformation from the original:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="reproduced_figure4" srcset="
/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_9ca3975509f66dbedf2746a253660ec4.webp 400w,
/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_ca77d573979d523935009285864d087b.webp 760w,
/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_9ca3975509f66dbedf2746a253660ec4.webp"
width="760"
height="490"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="the-next-steps">The next steps&lt;/h2>
&lt;p>We have already started looking into ways to capture visualizations from the web, i.e., from platforms such as ObservableHq, and to use these experiments as a transition into capturing interactive visualizations from the web.&lt;/p>
&lt;p>Capturing user interactions and all states in an interactive visualization can prove very useful, as it is a well-known pain point in the reproducibility community and a challenge that still needs to be solved. My next steps involve finding a solution to capture these interactive visualizations, especially those living on the web, and ensuring their reproducibility.&lt;/p></description></item><item><title>Halfway Through GSOC: My Experience and Learnings</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uci/benchmarkst/20240718-qianru/</link><pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uci/benchmarkst/20240718-qianru/</guid><description>&lt;p>Hello there! I&amp;rsquo;m Qianru, and this is my mid-term blog post for the 2024 Google Summer of Code. I am working on the BenchmarkST project, focusing on benchmarking gene imputation methods in spatial transcriptomics. My goal is to create a comprehensive, reproducible platform for evaluating these methods across various datasets and conditions.&lt;/p>
&lt;p>In this post, I will share some of the progress I have made so far, the challenges I have faced, and how I overcame them. I will also highlight some specific accomplishments and what I plan to do next.&lt;/p>
&lt;hr>
&lt;h3 id="achievements">Achievements:&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Developed the Python Package:&lt;/strong> I created the &amp;ldquo;Impeller&amp;rdquo; Python package, which includes tools for downloading example data, processing it, and training models. This package aims to standardize gene imputation tasks in spatial transcriptomics.&lt;/li>
&lt;li>&lt;strong>Example Data Integration:&lt;/strong> Successfully integrated various spatial transcriptomics datasets into the package for benchmarking purposes.&lt;/li>
&lt;li>&lt;strong>Benchmarking Framework:&lt;/strong> Established a framework for objective comparison of different gene imputation methodologies.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Python Package: Installation and Usage&lt;/strong>&lt;/p>
&lt;p>You can install the package using pip:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install Impeller
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Download Example Data&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">from Impeller import download_example_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">download_example_data&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Load and Process Data&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">from Impeller import load_and_process_example_data
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">data, val_mask, test_mask, x, &lt;span class="nv">original_x&lt;/span> &lt;span class="o">=&lt;/span> load_and_process_example_data&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Train Model&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">from Impeller import create_args, train
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">args&lt;/span> &lt;span class="o">=&lt;/span> create_args&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_l1_distance, test_cosine_sim, &lt;span class="nv">test_rmse&lt;/span> &lt;span class="o">=&lt;/span> train&lt;span class="o">(&lt;/span>args, data, val_mask, test_mask, x, original_x&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h3 id="challenges">Challenges:&lt;/h3>
&lt;p>Reproducing the results of various gene imputation methods was not an easy task. I faced several challenges along the way:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Lack of Standardized Data:&lt;/strong> Some methods had incomplete or missing code, making it difficult to reproduce their results accurately.&lt;/li>
&lt;li>&lt;strong>Reproducibility Issues:&lt;/strong> Successfully integrated various spatial transcriptomics datasets into the package for benchmarking purposes.&lt;/li>
&lt;li>&lt;strong>Resource Limitations:&lt;/strong> Running large-scale experiments required significant computational resources, which posed constraints on the project timeline.&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h3 id="future-work">Future Work:&lt;/h3>
&lt;p>Moving forward, I plan to:&lt;/p>
&lt;ol>
&lt;li>Extend the package&amp;rsquo;s functionalities to include more datasets and imputation methods.&lt;/li>
&lt;li>Enhance the benchmarking framework for more comprehensive evaluations.&lt;/li>
&lt;li>Collaborate with other researchers to validate and improve the package&amp;rsquo;s utility in the bioinformatics community.&lt;/li>
&lt;/ol>
&lt;hr>
&lt;p>I hope you found this update informative and interesting. If you have any questions or feedback, please feel free to contact me. Thank you for your attention and support!&lt;/p></description></item><item><title>Mid Term Blog: FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fep_bench/20240718-jaycezhu/</link><pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fep_bench/20240718-jaycezhu/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello, I’m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/lihaowen-jayce-zhu/">Lihaowen (Jayce) Zhu&lt;/a>, a 2024 SoR contributor for the FEP-Bench project, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a>. The FEP-Bench project proposes to address the significant bottlenecks encountered during the data preprocessing phase of machine learning, particularly the challenges posed by data retrieval from data lakes and computational inefficiencies in data operations. By exploring innovative caching, prefetching, and heuristic strategies, this proposal aims to optimize the preprocessing workflow, thereby enhancing efficiency and reducing the resources required by ML projects.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Our research project is set in the context of Deep Neural Networks. To train a DNN, we first need a large amount of data. All raw data must pass through a data preprocessing pipeline, which is specific to each ML task. In a typical preprocessing pipeline, the data is loaded from disk, converted to the correct format, transformed, and augmented before it is fed into the training stage. In common ML training tasks and datasets, the data preprocessing stage can consume almost 65% of the total training time. However, compared with the fast development of computing hardware such as GPUs and TPUs, the speed of data preprocessing pipelines has not improved nearly as much and cannot keep up with these hardware innovations, which leads to a bottleneck in the efficiency of Deep Neural Network training.&lt;/p>
&lt;p>The bottlenecks can be divided into two categories: the data side and the computation side. The data-side bottleneck is mainly caused by data transfer in the system, including data fetching, I/O-bound operations, large data sizes, and complex data formats. The computation-side bottleneck, in contrast, can arise during data preprocessing operations and data shuffling. In distributed machine learning training systems, gathering the distributed data can also lead to a computation-side bottleneck.&lt;/p>
&lt;h2 id="current-progress">Current Progress&lt;/h2>
&lt;p>In order to improve the efficiency of the machine learning preprocessing pipeline, we first need to understand and document the preprocessing workflows commonly used in machine learning, including pipelines of Natural Language Processing, Computer Vision, and Audio datasets. As a result, for the past month, we have built up a collection of common datasets for different machine learning tasks. The dataset types include NLP, CV, Audio, Linear Regression, Video and LiDAR. The machine learning job types are collected based on the dataset types, such as sentiment analysis for NLP, and image classification for CV. The data has either a structured or unstructured format. In addition, our collection contains the following attributes:&lt;/p>
&lt;ul>
&lt;li>Data/Sample size&lt;/li>
&lt;li>Typical preprocessing operations&lt;/li>
&lt;li>Preprocessing difficulty: hard/easy&lt;/li>
&lt;li>Input splittable&lt;/li>
&lt;li>Output reusable&lt;/li>
&lt;li>CPU/GPU/IO Bound&lt;/li>
&lt;li>Dataset and preprocessing links.&lt;/li>
&lt;/ul>
&lt;p>By collecting all this data, we can gain an overview of all common preprocessing pipelines in the current machine learning research field, and build up a solid basis for the next phase of our project, which requires hard work on benchmark profiling. For example, for the Audio datasets, we focus on the LibriSpeech dataset. It contains 1000 hours of speech sampled at 16kHz, making it one of the largest publicly available datasets for speech recognition tasks. The typical preprocessing steps of the LibriSpeech dataset include feature extraction, label to integer conversion, and padding.&lt;/p>
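&lt;p>Two of the preprocessing steps listed above, label-to-integer conversion and padding, can be sketched in a few lines of plain Python. This is a generic illustration under simple assumptions (character-level labels, 0 reserved for padding, hypothetical transcripts), not the pipeline profiled in the project; feature extraction is left out because it requires an audio library.&lt;/p>

```python
def build_vocab(transcripts):
    """Map each character to an integer id; 0 is reserved for padding."""
    chars = sorted({ch for text in transcripts for ch in text})
    return {ch: idx for idx, ch in enumerate(chars, start=1)}


def encode(text, vocab):
    """Convert a transcript into a list of integer labels."""
    return [vocab[ch] for ch in text]


def pad_batch(sequences, pad_value=0):
    """Right-pad all sequences to the length of the longest one.

    Returns the padded batch and the original lengths, which a model
    typically needs in order to ignore the padding.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    lengths = [len(seq) for seq in sequences]
    return padded, lengths


if __name__ == "__main__":
    transcripts = ["HELLO", "HI"]  # hypothetical transcripts
    vocab = build_vocab(transcripts)
    batch, lengths = pad_batch([encode(t, vocab) for t in transcripts])
    print(batch)    # prints [[2, 1, 4, 4, 5], [2, 3, 0, 0, 0]]
    print(lengths)  # prints [5, 2]
```

&lt;p>Padding is what makes variable-length samples stackable into one batch, and it is also a source of wasted computation when sequence lengths vary widely.&lt;/p>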
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>During the first phase of the project, I met a lot of challenges as I had not been exposed to topics similar to this project. The first big problem was that I needed to learn the concepts of some machine learning tasks from scratch, such as NLP, so that I could have a better understanding of the common datasets and pipelines. Also, I needed to deeply review a lot of different preprocessing pipelines for each machine learning task, to make the table more comprehensive.&lt;/p></description></item><item><title>Midterm Blogpost: Drift Management Strategies Benchmark</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/</link><pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/</guid><description>&lt;p>Hello there! I&amp;rsquo;m William and I&amp;rsquo;m thrilled to share the progress on my project, &amp;ldquo;Developing A Comprehensive Pipeline to Benchmark Drift Management Approaches&amp;rdquo; under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ray-andrew-sinurat/">Ray Andrew Sinurat&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sandeep-madireddy/">Sandeep Madireddy&lt;/a> under the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/last">LAST&lt;/a> project.&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>If you&amp;rsquo;re not familiar with it, this project aims to address the issue of model aging, where machine learning (ML) models experience a decline in effectiveness over time due to environmental changes, known as drift. My goal is to design an extensible pipeline that evaluates and benchmarks the robustness of state-of-the-art algorithms in addressing these drifts.&lt;/p>
&lt;h1 id="progress">Progress&lt;/h1>
&lt;p>So far, I&amp;rsquo;ve generated various synthetic datasets, which include:&lt;/p>
&lt;ul>
&lt;li>CIRCLE: This dataset contains two features x1, x2 drawn uniformly from the interval [0, 1]. Each data point is labeled according to the condition (x1 − c1)^2 + (x2 − c2)^2 &amp;lt;= r, where the center (c1, c2) and radius r of the circular decision boundary change gradually over time, introducing (gradual) concept drift.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Circle" srcset="
/report/osre24/anl/last/20240718-williamn/circlestream_hued24f09167f45c96aadcbb5d8e5dff3c_90965_f33edae3375218d9e5b25a95221be6a3.webp 400w,
/report/osre24/anl/last/20240718-williamn/circlestream_hued24f09167f45c96aadcbb5d8e5dff3c_90965_8c85ac190035be7be37c64958f68df3c.webp 760w,
/report/osre24/anl/last/20240718-williamn/circlestream_hued24f09167f45c96aadcbb5d8e5dff3c_90965_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/circlestream_hued24f09167f45c96aadcbb5d8e5dff3c_90965_f33edae3375218d9e5b25a95221be6a3.webp"
width="760"
height="315"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/li>
&lt;li>COVCON: This 2-dimensional dataset has covariate shift and concept drift. The decision boundary at each point is given by α ∗ sin(πx1) &amp;gt; x2. We use 10000 points (100 batches, 1000 points per batch). Covariate shift is introduced by changing the location of x1 and x2 (for batch t x1 and x2). Concept drift is introduced by alternating the value of α.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="CovCon" srcset="
/report/osre24/anl/last/20240718-williamn/covcon_hu4a191415f686125b6694a5c631bd0a53_84580_71acf32897baf4ad41a1f28aa45ddbe3.webp 400w,
/report/osre24/anl/last/20240718-williamn/covcon_hu4a191415f686125b6694a5c631bd0a53_84580_2b3b3648f56332fa89453ae08fff34a0.webp 760w,
/report/osre24/anl/last/20240718-williamn/covcon_hu4a191415f686125b6694a5c631bd0a53_84580_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/covcon_hu4a191415f686125b6694a5c631bd0a53_84580_71acf32897baf4ad41a1f28aa45ddbe3.webp"
width="760"
height="314"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/li>
&lt;li>SINE: This dataset contains two features x1, x2 drawn uniformly from the interval [0, 1]. In the first context, all points below the curve y = sin(x) are classified as positive. The class labels are flipped afterwards.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sine" srcset="
/report/osre24/anl/last/20240718-williamn/sinestream_hu58fb0770a1ec970143181b61b6554e14_112143_9cf339d9d956964ac3129aed212bbbb4.webp 400w,
/report/osre24/anl/last/20240718-williamn/sinestream_hu58fb0770a1ec970143181b61b6554e14_112143_f77fa8ca9a78666905791587c0ea3cd5.webp 760w,
/report/osre24/anl/last/20240718-williamn/sinestream_hu58fb0770a1ec970143181b61b6554e14_112143_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/sinestream_hu58fb0770a1ec970143181b61b6554e14_112143_9cf339d9d956964ac3129aed212bbbb4.webp"
width="760"
height="317"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/li>
&lt;/ul>
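&lt;p>The CIRCLE and SINE definitions above can be turned into small generators. The sketch below uses only the Python standard library and follows the labeling conditions as stated in the list; the batch sizes, the linear interpolation of the boundary, and the function names are assumptions, not the project code.&lt;/p>

```python
import math
import random


def circle_batch(n, center, radius, rng):
    """Label points by (x1 - c1)^2 + (x2 - c2)^2 not exceeding r, as stated above."""
    c1, c2 = center
    batch = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        inside = radius >= (x1 - c1) ** 2 + (x2 - c2) ** 2
        batch.append((x1, x2, int(inside)))
    return batch


def circle_stream(n_batches, batch_size, start, end, seed=0):
    """Yield batches while the boundary moves from `start` to `end`.

    `start` and `end` are (c1, c2, r) tuples; interpolating linearly
    between them is one simple way to obtain gradual concept drift.
    """
    rng = random.Random(seed)
    for t in range(n_batches):
        frac = t / (n_batches - 1) if n_batches > 1 else 0.0
        c1, c2, r = (a + frac * (b - a) for a, b in zip(start, end))
        yield circle_batch(batch_size, (c1, c2), r, rng)


def sine_batch(n, flipped, rng):
    """Points below y = sin(x1) are positive; `flipped` swaps the classes."""
    batch = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        positive = math.sin(x1) > x2
        if flipped:
            positive = not positive
        batch.append((x1, x2, int(positive)))
    return batch


if __name__ == "__main__":
    stream = circle_stream(5, 1000, (0.4, 0.5, 0.20), (0.6, 0.5, 0.25))
    for t, batch in enumerate(stream):
        positives = sum(label for _, _, label in batch)
        print("batch", t, "positive fraction", positives / len(batch))
```

&lt;p>Passing an explicit seed keeps the generated streams reproducible across runs, which matters when different drift-handling strategies are compared on the same data.&lt;/p>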
&lt;p>Additionally, I&amp;rsquo;ve also curated drifting data from the Tencent I/O block trace. These datasets will be used to benchmark model performance under different drift conditions.&lt;/p>
&lt;p>The pipeline can take a base scikit-learn model and evaluate its performance on these datasets prequentially. Here are some initial results for model performance on these drifting datasets under two strategies: never retraining, and retraining using the past 1 or 7 windows. As you can see, model performance degrades upon encountering extreme drift.&lt;/p>
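&lt;p>Prequential (test-then-train) evaluation scores each incoming batch with the current model before that batch is used for training. The sketch below is a dependency-free illustration of the idea, not the pipeline itself: the MajorityClassifier is a stand-in with a scikit-learn-like fit/predict interface, and the function name and window parameter are assumptions.&lt;/p>

```python
from collections import deque


class MajorityClassifier:
    """Stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def prequential(batches, make_model, window=None):
    """Test-then-train evaluation over a sequence of (X, y) batches.

    window=None: train once on the first batch and never retrain.
    window=k:    after scoring each batch, refit on the k most recent batches.
    Returns one accuracy value per batch, starting with the second batch.
    """
    history = deque(maxlen=window)
    model = None
    accuracies = []
    for X, y in batches:
        if model is not None:
            predictions = model.predict(X)
            correct = sum(int(p == t) for p, t in zip(predictions, y))
            accuracies.append(correct / len(y))
        history.append((X, y))
        if model is None or window is not None:
            train_X = [x for batch_X, _ in history for x in batch_X]
            train_y = [t for _, batch_y in history for t in batch_y]
            model = make_model().fit(train_X, train_y)
    return accuracies


if __name__ == "__main__":
    # Toy stream with an abrupt label flip after the second batch.
    stream = [([[0.1]] * 4, [1] * 4), ([[0.2]] * 4, [1] * 4),
              ([[0.3]] * 4, [0] * 4), ([[0.4]] * 4, [0] * 4)]
    print(prequential(stream, MajorityClassifier))            # prints [1.0, 0.0, 0.0]
    print(prequential(stream, MajorityClassifier, window=1))  # prints [1.0, 0.0, 1.0]
```

&lt;p>In this toy stream the model that never retrains stays wrong after the flip, while the one-window strategy recovers one batch later. Larger windows mix pre-drift and post-drift data, which is the trade-off the 1-window and 7-window settings probe.&lt;/p>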
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Circle" srcset="
/report/osre24/anl/last/20240718-williamn/featured_hu314786216917ab73ec11106e3fbdbfd6_47344_cd621ffcf8e0112adfbd5a4b18eed098.webp 400w,
/report/osre24/anl/last/20240718-williamn/featured_hu314786216917ab73ec11106e3fbdbfd6_47344_70229554177ef52d038b9d63d9ddea31.webp 760w,
/report/osre24/anl/last/20240718-williamn/featured_hu314786216917ab73ec11106e3fbdbfd6_47344_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/featured_hu314786216917ab73ec11106e3fbdbfd6_47344_cd621ffcf8e0112adfbd5a4b18eed098.webp"
width="760"
height="459"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="CovCon" srcset="
/report/osre24/anl/last/20240718-williamn/covconmodel_hud438b5726b48ee443ea38ddb26d42afb_81013_71acdd86c329186013de6ffb21b90cd6.webp 400w,
/report/osre24/anl/last/20240718-williamn/covconmodel_hud438b5726b48ee443ea38ddb26d42afb_81013_252db232f127d0cb909594060e49d831.webp 760w,
/report/osre24/anl/last/20240718-williamn/covconmodel_hud438b5726b48ee443ea38ddb26d42afb_81013_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/covconmodel_hud438b5726b48ee443ea38ddb26d42afb_81013_71acdd86c329186013de6ffb21b90cd6.webp"
width="760"
height="482"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sine" srcset="
/report/osre24/anl/last/20240718-williamn/sinemodel_hufe9ad7a15a34440d5b75d411ad8cc905_51948_e28e854413249c9d3e580bd86e496391.webp 400w,
/report/osre24/anl/last/20240718-williamn/sinemodel_hufe9ad7a15a34440d5b75d411ad8cc905_51948_04c9aca01ba0c5693cdfe730cc25bc07.webp 760w,
/report/osre24/anl/last/20240718-williamn/sinemodel_hufe9ad7a15a34440d5b75d411ad8cc905_51948_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240718-williamn/sinemodel_hufe9ad7a15a34440d5b75d411ad8cc905_51948_e28e854413249c9d3e580bd86e496391.webp"
width="760"
height="479"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h1 id="findings">Findings&lt;/h1>
&lt;p>From the experiments conducted so far, the findings are as follows:&lt;/p>
&lt;ul>
&lt;li>A model without retraining struggles to maintain performance when drift occurs.&lt;/li>
&lt;li>Retraining on data from previous windows when the concept drifts, whether abruptly (SINE) or gradually (CIRCLE), leads to poorer performance. This is especially evident in the retrain-window strategy, which incorporates data from up to 7 prior windows.&lt;/li>
&lt;li>However, retraining on previous data proves beneficial in cases of covariate shift (CovCon), allowing the model to better align with the evolving real-world feature distributions.&lt;/li>
&lt;/ul>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Now that the base template for the pipeline and the dataset curation are done, my focus going forward will be on:&lt;/p>
&lt;ul>
&lt;li>Implementing three advanced algorithms: AUE (Accuracy Updated Ensemble), MATCHMAKER, and Driftsurf, then integrating them into the pipeline.&lt;/li>
&lt;li>Enhancing the benchmarking process by adding more metrics and plots, such as training time and inference time, to better evaluate the strategies.&lt;/li>
&lt;li>Packaging the entire experiment into a Chameleon Trovi Artifact, ensuring ease of reproducibility and extension.&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for my final blog as I delve deeper into this project!&lt;/p></description></item><item><title>Midterm Blogpost: HDEval's LLM Benchmarking for HDL Design</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240718-ashwinbardhwaj/</link><pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240718-ashwinbardhwaj/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ashwin-bardhwaj/">Ashwin Bardhwaj&lt;/a>, an electrical engineering and computer science student based in San Diego, CA. For the past 6 weeks, I have been working closely with Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> on the &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a> project. The aim of this project is to create multiple project-sized HDL benchmarks to evaluate how well existing LLMs can generate Verilog/Chisel code. These benchmarks will include my own &amp;ldquo;golden&amp;rdquo; HDL implementation of each project as well as the corresponding English prompts to guide the LLM. I am excited to be able to work with these tools, which have the potential to become a valuable resource for HDL design. So far, I have successfully created the first benchmark, a pipelined 3-stage RISC-V core, and I am working through my second project, a Gameboy emulator.&lt;/p>
&lt;h2 id="risc-v-implementation">RISC-V Implementation&lt;/h2>
&lt;p>Over this past month and a half, I have completed my first benchmark, which focuses on creating, modeling, and testing a pipelined 3-stage RISC-V core. The core uses the fetch, decode, and execute structure and is functional for most RV32I instructions. I synthesized and simulated my Verilog using Icarus Verilog and displayed the waveforms in GTKWave. After development, a good portion of time was spent creating and tuning the English explanation of each Verilog module. After running these benchmark files through several LLM APIs, we compared the existing &amp;ldquo;golden&amp;rdquo; modules with the generated ones and noticed that more recent LLMs such as GPT-4o and Claude 3 perform much better at creating syntactically correct and efficient code.&lt;/p>
&lt;p>I have also created a tool that parses the Verilog and instruction files into the JSON structure needed to test various models.&lt;/p>
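&lt;p>As a rough illustration of what such a parsing step could look like, here is a minimal Python sketch. It is not the actual HDEval tool: the function name, the regular expression, and the JSON field names (name, prompt, golden) are all assumptions made for this example, and the regular expression only handles simple files with non-nested modules.&lt;/p>

```python
import json
import re

# Matches "module NAME ... endmodule"; enough for simple, non-nested files.
MODULE_RE = re.compile(r"\bmodule\s+(\w+)(.*?)\bendmodule\b", re.DOTALL)


def build_benchmark(verilog_source, prompts):
    """Pair each Verilog module with its English prompt.

    The output schema (name / prompt / golden) is hypothetical and only
    meant to illustrate the idea of a JSON benchmark entry.
    """
    entries = []
    for match in MODULE_RE.finditer(verilog_source):
        name = match.group(1)
        entries.append({
            "name": name,
            "prompt": prompts.get(name, ""),  # empty if no prompt was written
            "golden": match.group(0),         # full module text, used as reference
        })
    return entries


verilog = """
module adder(input [7:0] a, input [7:0] b, output [7:0] sum);
  assign sum = a + b;
endmodule
"""
prompts = {"adder": "Add two 8-bit inputs and output the 8-bit sum."}

benchmark = build_benchmark(verilog, prompts)
print(json.dumps(benchmark, indent=2))
```

&lt;p>Keeping the golden module and its prompt in the same record makes it straightforward to send the prompt to a model and compare the generated code against the reference afterwards.&lt;/p>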
&lt;h2 id="gameboy-emulator">Gameboy Emulator&lt;/h2>
&lt;p>I am also in the process of developing the second benchmark, which targets a Gameboy emulator. This will challenge the LLMs much more than the RISC-V project because apart from the custom CISC CPU, the model should also understand how to handle various other blocks of the hardware system including memory, picture processing unit (PPU), sound processing unit (SPU), various input/output systems like the buttons and cartridge, and interrupt handlers. As a result, it will challenge the model to understand the system as a whole when creating each individual module.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>As we continue on to the second half of the project, I will keep working on my Gameboy emulator. I have already fully developed and tested the Z80-esque CPU, DMA, and interrupt handler but still need to work on the display and sound interfaces. I will also continue to run these tests over a wider range of LLMs to get a better picture of which models and versions are best suited for HDL design, as well as the direction these models are heading in.&lt;/p></description></item><item><title>Halfway Through OSRE24: My Experience and Learnings</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240716-warmuth/</link><pubDate>Mon, 15 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240716-warmuth/</guid><description>&lt;p>Hello there! I’m Kilian Warmuth, a computer science student from Germany. This summer, I’m part of the 2024 Summer of Reproducibility (SoR) initiative. My project, &amp;ldquo;Reproducible Experiment Workflows in SLICES/pos,&amp;rdquo; aims to enhance reproducibility in scientific research, aligning with the FAIR principles (Findable, Accessible, Interoperable, Reusable).&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>The &amp;ldquo;Reproducible Experiment Workflows in SLICES/pos&amp;rdquo; project is part of the larger SLICES-RI initiative, designed to improve the reproducibility and reusability of large-scale experimental research. The project focuses on integrating the RO-Crate standard into the pos testbed to organize and document experiment results systematically. This integration will enhance the accessibility and comprehensibility of research findings, ensuring they adhere to the FAIR principles. Additionally, the project aims to improve the portability of pos experiments to the Chameleon testbed, facilitating collaboration and seamless execution across different research environments.&lt;/p>
&lt;h1 id="progress-and-challenges">Progress and Challenges&lt;/h1>
&lt;p>The first half of the project is done, marked by significant progress and learnings. My initial focus was on familiarizing myself with the pos framework and the RO-Crate standard. This foundational knowledge was crucial for the subsequent steps of restructuring the results folder and integrating automated RO-Crate generation into the pos framework.&lt;/p>
&lt;h2 id="key-achievements">Key Achievements:&lt;/h2>
&lt;ul>
&lt;li>Restructured Results Folder: The structure of the results folder has been redesigned to streamline navigation and enable systematic storage of result data.&lt;/li>
&lt;li>Automated RO-Crate Generation: Successfully integrated the basics of the RO-Crate standard into the pos framework, enabling the automated generation of comprehensive results documentation.&lt;/li>
&lt;li>Metadata Documentation: Added comprehensive documentation to the results data, including essential metadata such as author details, user scripts, and hardware information, enhancing reproducibility and interpretability.&lt;/li>
&lt;/ul>
&lt;h2 id="challenges-encountered">Challenges Encountered:&lt;/h2>
&lt;ul>
&lt;li>Balancing Automation with Flexibility: Ensuring that the automated generation of RO-Crates does not compromise the flexibility researchers need to customize their experiment documentation, while still meeting the complex requirements of a testbed.&lt;/li>
&lt;li>Complexity of Testbed Systems: Integrating the RO-Crate implementation into a complex system like a testbed has required deep dives into the testbed&amp;rsquo;s code base.&lt;/li>
&lt;/ul>
&lt;p>Despite these challenges, the progress made has been rewarding, laying a solid foundation for the next phase of the project.&lt;/p>
&lt;h1 id="learnings-and-skills-gained">Learnings and Skills Gained&lt;/h1>
&lt;p>&lt;strong>Understanding the Complexity of Testbeds&lt;/strong>: One of the key learnings from this project has been the realization that testbeds are complex systems. Despite their complexity, the process became manageable thanks to well-documented software and the invaluable support of top mentors who provided detailed answers to in-depth questions. Their guidance was crucial in navigating the challenges of the project.&lt;/p>
&lt;p>&lt;strong>Open Source Development in an Educational Environment&lt;/strong>: My experience in open source development has been enriched by working within an educational context. This skill is particularly important when adapting and simplifying code to ensure that users can follow along and gain a deeper understanding of the experiments, improving the quality of research experiments.&lt;/p>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>As we move into the second half of this project, our primary focus will be on enhancing the portability of pos experiments to the Chameleon testbed. Key tasks include:&lt;/p>
&lt;ul>
&lt;li>Fine-tune RO-Crate Implementation: Continue refining the RO-Crate integration so that it handles the complexities of testbed systems, such as special edge cases, more effectively.&lt;/li>
&lt;li>Enhance Portability: Refine the integration with Trovi, ensuring seamless upload and retrieval of experiment results across testbeds.&lt;/li>
&lt;li>Develop Introductory Examples: Create examples demonstrating the use of pos in various testbed environments to guide researchers.&lt;/li>
&lt;li>Execute and Analyze Experiments: Design and execute a complex network experiment on both SLICES/pos and Chameleon, validating and refining portability features.&lt;/li>
&lt;/ul>
&lt;p>These steps are crucial to achieving our goal of making pos experiments more accessible and reproducible across different research environments.&lt;/p>
&lt;h1 id="conclusion">Conclusion&lt;/h1>
&lt;p>Reflecting on the first half of my OSRE24 journey, I am incredibly grateful for the opportunity to work on the &amp;ldquo;Reproducible Experiment Workflows in SLICES/pos&amp;rdquo; project. The experience has been both challenging and rewarding, providing valuable insights into open-source development, machine learning techniques, and the creation of educational resources.&lt;/p>
&lt;p>As we move forward, I am excited about the coming weeks. The completion of the portability enhancements and the execution of complex experiments lie ahead, marking significant milestones in our project. The skills and lessons I have acquired will guide me in future endeavors.&lt;/p></description></item><item><title>Data leakage in applied ML: reproducing examples from genomics, medicine and radiology</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240701-shaivimalik/</link><pubDate>Mon, 01 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240701-shaivimalik/</guid><description>&lt;p>Hello everyone! I&amp;rsquo;m Shaivi Malik, a computer science and engineering student. I am thrilled to announce that I have been selected as a Summer of Reproducibility Fellow. I will be contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/nyu/data-leakage/">Data leakage in applied ML: reproducing examples of irreproducibility&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a>. You can find my proposal &lt;a href="https://drive.google.com/file/d/1WAsDif61O2fWgtkl75bQAnIcm2hryt8z/view?usp=sharing" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>This summer, we will reproduce studies from medicine, radiology and genomics. Through these studies, we&amp;rsquo;ll explore and demonstrate three types of data leakage:&lt;/p>
&lt;ol>
&lt;li>Pre-processing on train and test sets together&lt;/li>
&lt;li>Model uses features that are not legitimate&lt;/li>
&lt;li>Feature selection on training and test sets&lt;/li>
&lt;/ol>
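&lt;p>To make the first type in the list above concrete, here is a minimal Python sketch of pre-processing leakage. It is not taken from any of the papers being reproduced: the toy numbers and the helper names fit_scaler and transform are assumptions chosen for illustration. The leaky variant computes the scaling statistics from the training and test values together, while the clean variant uses the training values only.&lt;/p>

```python
import statistics


def fit_scaler(values):
    """Return (mean, standard deviation) computed from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, stdev):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / stdev for v in values]


# Toy data (made up): the test values come from a shifted distribution.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the scaler sees the test set, so test information reaches training.
leaky_mean, leaky_std = fit_scaler(train + test)

# Clean: the scaler is fitted on the training set only, then reused on test.
clean_mean, clean_std = fit_scaler(train)

leaky_test = transform(test, leaky_mean, leaky_std)
clean_test = transform(test, clean_mean, clean_std)

print("leaky mean:", leaky_mean, "clean mean:", clean_mean)
print("leaky test:", leaky_test)
print("clean test:", clean_test)
```

&lt;p>With the leaky scaler the shifted test values look far less unusual than they really are, which is one way this kind of error can lead to overly optimistic performance estimates.&lt;/p>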
&lt;p>For each paper, we will replicate the published results with and without the data leakage error, and present performance metrics for comparison. We will also provide explanatory materials and example questions to test understanding. All these resources will be bundled together in a dedicated repository for each paper.&lt;/p>
&lt;p>This project aims to address the need for accessible educational material on data leakage. These materials will be designed to be readily adopted by instructors teaching machine learning in a wide variety of contexts. They will be presented in a clear and easy-to-follow manner, catering to a broad range of backgrounds and raising awareness about the consequences of data leakage.&lt;/p>
&lt;p>Stay tuned for updates on my progress! You can follow me on &lt;a href="https://github.com/shaivimalik" target="_blank" rel="noopener">GitHub&lt;/a> and watch out for my upcoming blog posts.&lt;/p></description></item><item><title>FetchPipe: Data Science Pipeline for ML-based Prefetching</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fetchpipe/20240625-peiranqin/</link><pubDate>Tue, 25 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fetchpipe/20240625-peiranqin/</guid><description>&lt;p>Hello, I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/peiran-qin/">Peiran Qin&lt;/a>, a first-year Pre-Doctoral student in Computer Science at the University of Chicago. This summer I will focus on
the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fetchpipe/">FetchPipe: Data Science Pipeline for ML-based Prefetching&lt;/a> under the mentorship of Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>. This is my &lt;a href="https://docs.google.com/document/d/1Bq4tulf6bd9HuKyy3mxC-LRKwe9e7YAOVNYQNJTPsys/edit#heading=h.pwfhd8ioumbq" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;p>Caching and prefetching are integral components of modern storage systems, aimed at reducing I/O latency by utilizing faster but less dense memory for storing data that is accessed frequently. Traditional prefetching strategies, which primarily rely on heuristic-based methods, often fall short in performance, particularly in complex scenarios. In recent years, machine learning solutions have emerged as a promising alternative for these scenarios, offering the ability to learn and predict complicated data access patterns. However, each existing ML prefetcher may be biased toward different scenarios and distinct evaluation metrics. There is still a need to evaluate state-of-the-art machine-learning-based prefetchers comprehensively and fairly under a common evaluation framework with extensive performance metrics. That is my motivation for spending my summer on this interesting project!&lt;/p></description></item><item><title>Assessing the Computational Reproducibility of Jupyter Notebooks</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/depaul/20240618-nbrewer/</link><pubDate>Tue, 18 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/depaul/20240618-nbrewer/</guid><description>&lt;p>Like so many authors before me, my first reproducibility study and very first academic publication started with the age-old platitude, &amp;ldquo;Reproducibility is a cornerstone of the scientific method.&amp;rdquo; My team and I participated in a competition to replicate the performance improvements promised by a paper presented at last year&amp;rsquo;s Supercomputing conference. We weren&amp;rsquo;t simply re-executing the same experiment with the same cluster; instead, we were trying to confirm that we got similar results on a different cluster with an entirely different architecture.
From the very beginning, I struggled to wrap my mind around the many reasons for reproducing computational experiments, their significance, and how to prioritize them. All I knew was that there seemed to be a consensus that reproducibility is important to science and that the experience left me with more questions than answers.&lt;/p>
&lt;p>Not long after that, I started a job as a research software engineer at Purdue University, where I worked heavily with Jupyter Notebooks. I used notebooks and interactive components called widgets to create a web application, which I turned into a reusable template. Our team was enthusiastic about using Jupyter Notebooks to quickly develop web applications because the tools were accessible to the laboratory researchers who ultimately needed to maintain them. I was fortunate to receive the &lt;a href="https://bssw.io/fellows/nicole-brewer" target="_blank" rel="noopener">Better Scientific Software Fellowship&lt;/a> to develop tutorials to teach others how to use notebooks to turn their scientific workflows into web apps. I collected those and other resources and established the &lt;a href="https://www.jupyter4.science" target="_blank" rel="noopener">Jupyter4Science&lt;/a> website, a knowledgebase and blog about Jupyter Notebooks in scientific contexts. That site aims to improve the accessibility of research data and software.&lt;/p>
&lt;p>There seemed to be an important relationship between improved accessibility and reuse of research code and data and computational reproducibility, but I still had trouble articulating it. In pursuit of answers, I moved to sunny Arizona to pursue a History and Philosophy of Science degree. My research falls at the confluence of my prior experiences; I&amp;rsquo;m studying the reproducibility of scientific Jupyter Notebooks. I have learned that questions about reproducibility aren&amp;rsquo;t very meaningful without considering specific aspects such as who is doing the experiment and replication, the nature of the experimental artifacts, and the context in which the experiment takes place.&lt;/p>
&lt;p>I was fortunate to have found a mentor for the Summer of Reproducibility, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/">Tanu Malik&lt;/a>, who shares the philosophy that the burden of reproducibility should not solely rest on domain researchers who must develop other expertise. She and her lab have developed &lt;a href="https://github.com/depaul-dice/Flinc" target="_blank" rel="noopener">FLINC&lt;/a>, an application virtualization tool that improves the portability of computational notebooks. Her prior work demonstrated that FLINC provides efficient reproducibility of notebooks and takes significantly less time and space to execute and repeat notebook execution than Docker containers for the same notebooks. My work will expand the scope of this original experiment by adding more notebooks to FLINC&amp;rsquo;s test coverage and showing robustness across even more diverse computational tasks. We expect to show that infrastructural tools like FLINC improve the success rate of automated reproducibility.&lt;/p>
&lt;p>I&amp;rsquo;m grateful to both the Summer of Reproducibility program managers and my research mentor for this incredible opportunity to further my dissertation research in the context of meaningful collaboration.&lt;/p></description></item><item><title>Exploring Reproducibility in High-Performance Computing Publications with the Chameleon Cloud</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240615-kkrassni/</link><pubDate>Sat, 15 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tuwien/autoappendix/20240615-kkrassni/</guid><description>&lt;p>Hello everyone,&lt;/p>
&lt;p>I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/klaus-kra%C3%9Fnitzer/">Klaus Kraßnitzer&lt;/a> and am currently finishing up my Master&amp;rsquo;s degree at
the Technical University of Vienna. This summer, under the guidance of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sascha-hunold/">Sascha Hunold&lt;/a>,
I&amp;rsquo;m excited to dive into a project that aims to enhance reproducibility in
high-performance computing research.&lt;/p>
&lt;p>Our project, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/tuwien/autoappendix/">AutoAppendix&lt;/a>, focuses on the rigorous evaluation and potential
automation of Artifact Description (AD) and Artifact Evaluation (AE) appendices
from publications to this year&amp;rsquo;s &lt;a href="https://supercomputing.org/" target="_blank" rel="noopener">Supercomputing Conference (SC)&lt;/a>. Due to a sizeable
chunk of SC publications utilizing &lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon Cloud&lt;/a>, a
platform known for its robust and scalable experiment setups, the project will
focus on creating guidelines (and
potentially, software tools) that users of the Chameleon Cloud can utilize to
make their research more easily reproducible. You can learn more about the project
and read the full proposal &lt;a href="https://drive.google.com/file/d/1J9-Z0WSIqyJpnmd_uxtEm_m4ZIO87dBH/view?usp=drive_link" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>My fascination with open-source development and research reproducibility was sparked during my undergraduate studies and further nurtured by my role as a teaching assistant. Hands-on projects and academic courses, like those in chemistry emphasizing precise experimental protocols, have deeply influenced my approach to computational science.&lt;/p>
&lt;h2 id="project-objectives">Project Objectives&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Analyze and Automate&lt;/strong>: Assess current AE/AD appendices submitted for SC24, focusing on their potential for automation.&lt;/li>
&lt;li>&lt;strong>Develop Guidelines&lt;/strong>: Create comprehensive guidelines to aid future SC conferences in artifact submission and evaluation.&lt;/li>
&lt;li>&lt;strong>Build Tools (Conditionally)&lt;/strong>: Develop automation tools to streamline the evaluation process.&lt;/li>
&lt;/ol>
&lt;p>The ultimate aim of the project is to work towards a more efficient, transparent, and
reproducible research environment, and I&amp;rsquo;m committed to making it simpler for
researchers to demonstrate and replicate scientific work. I look forward to
sharing insights and progress as we move forward.&lt;/p>
&lt;p>Thanks for reading, and stay tuned for more updates!&lt;/p></description></item><item><title> Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240614-aryas/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240614-aryas/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/">Arya Sarkar&lt;/a> and I will be contributing to the research project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Reproducibility in Data Visualization&lt;/a>, with a focus on investigating and coming up with novel solutions to capture both static and dynamic visualizations from different sources. My project is titled Investigate Solutions for Capturing Visualizations and I am mentored by Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>.&lt;/p>
&lt;p>Open source has always piqued my interest, but I often found it hard to get started as a junior in university. I had spent a lot of time working with data visualizations but had never looked into the problem of reproducibility before this project. When I saw a plethora of unique and interesting projects during the contribution phase of OSRE-2024, I was confused at the beginning. However, the more I dived into this project and understood the significance of research in this domain for ensuring reproducibility, the more I found myself drawn towards it. I am glad to be presented with this amazing opportunity to work in the open-source space as a researcher in reproducibility.&lt;/p>
&lt;p>This project aims to investigate, augment, and/or develop solutions to capture visualizations that appear in formats including websites and Jupyter notebooks. We have a special interest in capturing the state of interactive visualizations and preserving the user interactions required to reach a certain visualization in an interactive environment to ensure reproducibility. &lt;a href="https://drive.google.com/file/d/1SGLd37zBjnAU-eYytr7mYzfselHgxvK1/view?usp=sharing" target="_blank" rel="noopener">My proposal can be viewed here!&lt;/a>&lt;/p></description></item><item><title>Data leakage in applied ML: reproducing examples of irreproducibility</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240614-kyrillosishak/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/nyu/data-leakage/20240614-kyrillosishak/</guid><description>&lt;p>Hello,&lt;/p>
&lt;p>I am Kyrillos Ishak, and I am happy to be part of SoR 2024. I am working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/nyu/data-leakage/">Data leakage in applied ML: reproducing examples of irreproducibility&lt;/a> project. My &lt;a href="https://drive.google.com/file/d/1u9FGQqxlPMhceKwS_NJxIhkIrQVGIp-0/view" target="_blank" rel="noopener">proposal&lt;/a> was accepted.&lt;/p>
&lt;p>I am excited to work with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a> as my mentors. The objective of the project is to develop educational resources that can be adjusted by professors/instructors to explain specific data leakage problems. This involves ensuring the reproducibility of certain research papers that contain data preprocessing issues, then fixing these issues to demonstrate how they can affect the results.&lt;/p>
&lt;p>Data leakage is a problem caused when information from outside the training dataset is used to create the model. This issue can lead to overly optimistic performance estimates and, ultimately, models that do not perform well on new, unseen data.&lt;/p>
&lt;p>Despite the importance of addressing data leakage, many people from fields not closely related to computer science are unfamiliar with it, even if they are aware of best practices for data preprocessing. Developing educational materials on this topic will greatly benefit them.&lt;/p>
&lt;p>I am excited to dive into the topic of data leakage in machine learning. Throughout the summer, I will be sharing regular updates and insightful blog posts on this subject. Stay tuned for more information!&lt;/p></description></item><item><title>Heterogeneous Graph Neural Networks for I/O Performance Bottleneck Diagnosis</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240614-mahdi/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240614-mahdi/</guid><description>&lt;p>Hello, I am &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mahdi-banisharifdehkordi/">Mahdi Banisharifdehkordi&lt;/a>, a Ph.D. student in Computer Science at Iowa State University, specializing in Artificial Intelligence. This summer, I will be working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/aiio/">AIIO / Graph Neural Network&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> and Suren Byna.&lt;/p>
&lt;p>High-Performance Computing (HPC) applications often face performance issues due to I/O bottlenecks. Manually identifying these bottlenecks is time-consuming and error-prone. My project aims to enhance the AIIO framework by integrating a Graph Neural Network (GNN) model to automatically diagnose I/O performance bottlenecks at the job level. This involves developing a comprehensive data pre-processing pipeline, constructing and validating a tailored GNN model, and rigorously testing the model&amp;rsquo;s accuracy using test cases from the AIIO dataset.&lt;/p>
&lt;p>Through this project, I seek to provide a sophisticated, AI-driven approach to understanding and improving I/O performance in HPC systems, ultimately contributing to more efficient and reliable HPC applications.&lt;/p></description></item><item><title>StatWrap: Automated Reproducibility Checklists Generation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240614-adi/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/statwrap/20240614-adi/</guid><description>&lt;p>Namaste🙏🏻! I am &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/adi-akhilesh-singh/">Adi Akhilesh Singh&lt;/a>, currently pursuing a degree in Computer Science and Engineering at IIT(BHU). This summer, I will be working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/northwestern/statwrap/">StatWrap: Automated Reproducibility Checklists Generation&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>. You can view my &lt;a href="https://drive.google.com/file/d/1xV7eHL9lIWGKueQJxBks6OB_rcXCr8JY/view?usp=sharing" target="_blank" rel="noopener">project proposal&lt;/a> for more details.&lt;/p>
&lt;p>My project aims to integrate customizable reproducibility checklists into StatWrap, using metadata and user input to automate their generation. The goal is to enhance the reproducibility of research projects by providing researchers with structured and comprehensive checklists to ensure their work is reproducible.&lt;/p>
&lt;p>Stay tuned for updates on my progress in the coming weeks! 🚀&lt;/p></description></item><item><title>LLM Assistant for OpenROAD - Data Engineering and Testing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/</guid><description>&lt;p>Hello! My name is Aviral Kaintura, and I will be contributing to &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a>, a groundbreaking open-source toolchain for digital integrated circuit automation (RTL to GDSII) during &lt;a href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener">GSoC 2024&lt;/a>.&lt;/p>
&lt;p>My project, &lt;a href="https://summerofcode.withgoogle.com/programs/2024/projects/J8uAFNCu" target="_blank" rel="noopener">LLM Assistant for OpenROAD - Data Engineering and Testing&lt;/a>, is jointly mentored by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>.&lt;/p>
&lt;p>The aim of this project is to develop a chat assistant to improve the user experience with OpenROAD. My focus will be on developing a well-curated dataset from OpenROAD&amp;rsquo;s knowledge base. This dataset will be fundamental for another project led by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, which involves building the chatbot&amp;rsquo;s architecture. It will be used for training and validating the model and ensuring efficient context retrieval to generate accurate user responses, aiding in troubleshooting, installation, and other common issues to reduce the maintainers&amp;rsquo; workload.&lt;/p>
&lt;p>In addition to dataset creation, I will be working on testing and evaluation. This includes developing metrics for model evaluation, incorporating both human and automated techniques.&lt;/p>
&lt;p>Our human evaluation framework will utilize chatbot feedback for valuable insights, enhancing the model and dataset. An automated batch testing application is also used to further enhance the evaluation process.&lt;/p>
&lt;p>Here is an early build of the evaluation framework.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Screenshots" srcset="
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_ccb0a69833aa5c774f30b616a038edd6.webp 400w,
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_25ece2ab19d666f60342ed2d6dcb217f.webp 760w,
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_ccb0a69833aa5c774f30b616a038edd6.webp"
width="760"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
By leveraging advanced data engineering and testing methodologies, we aim to build an assistant that combines high accuracy with optimal response times. Additionally, we will collaborate with research teams at NYU and ASU to contribute to the research on AI-based chat assistants for electronic design automation.&lt;/p>
&lt;p>I am thrilled to be part of this journey and look forward to making a meaningful impact on the OpenROAD project.&lt;/p>
&lt;p>Stay tuned for more updates on the project!&lt;/p></description></item><item><title>Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240613-triveni5/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240613-triveni5/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Triveni, a Master&amp;rsquo;s student in Computer Science at Northern Illinois University (NIU). When I came across the OSRE 2024 project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Categorize Differences in Reproduced Visualizations&lt;/a>, which focuses on data visualization reproducibility, I was excited because it aligned with my interest in data visualization. While my initial interest was in geospatial data visualization, the project&amp;rsquo;s goal of ensuring reliable visualizations across all contexts really appealed to me. So, I actively worked on understanding the project’s key concepts and submitted my &lt;a href="https://drive.google.com/file/d/1R1c23oUC7noZo5NrUzuDbjwo0OqbkrAK/view" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a> to join the project.&lt;/p>
&lt;h2 id="early-steps-and-challenges">Early Steps and Challenges:&lt;/h2>
&lt;p>I began working on the project on May 27th, three weeks ago. Setting up the local environment initially presented some challenges, but I persevered and successfully completed the setup process. The past few weeks have been spent exploring the complexities of reproducibility in visualizations, particularly focusing on capturing the discrepancies that arise when using different versions of libraries to generate visualizations. Working with Dr. David Koop as my mentor has been an incredible experience. Our weekly report meetings keep me accountable and focused. While exploring different algorithms and tools to compare visualizations can be challenging at times, it&amp;rsquo;s a fantastic opportunity to learn cutting-edge technologies and refine my problem-solving skills.&lt;/p>
&lt;h2 id="looking-ahead">Looking Ahead:&lt;/h2>
&lt;p>I believe this project can make a valuable contribution to the field of reproducible data visualization. By combining automated comparison tools with a user-centric interface, we can empower researchers and data scientists to make informed decisions about the impact of visualization variations. In future blog posts, I&amp;rsquo;ll share more about the specific tools and techniques being used, and how this framework will contribute to a more reliable and trustworthy approach to data visualization reproducibility.&lt;/p>
&lt;p>Stay tuned!&lt;/p>
&lt;p>I&amp;rsquo;m excited to embark on this journey and share my progress with all of you.&lt;/p></description></item><item><title>Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/</guid><description>&lt;p>Hello everyone
I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University, IIT (BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a> aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the project:&lt;/h2>
&lt;p>The project proposes to create a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="how-it-all-started">How it all started&lt;/h2>
&lt;p>This journey began amidst our college&amp;rsquo;s cultural fest, in which I was participating, just 15 days before the proposal submission deadline. Many of my friends had been working for months to get selected for GSoC. I didn’t think I could participate this year because I was late, so I thought, &amp;ldquo;Better luck next year.&amp;rdquo; But during the fest, I kept hearing about UC OSPO and that a senior had been selected within a month. So, I was in my room when my friend told me, &amp;ldquo;What&amp;rsquo;s the worst that can happen? Just apply,&amp;rdquo; and so I did. I chose this project and wrote my introduction in Slack without knowing much. After that, it&amp;rsquo;s history. I worked really hard for the next 10 days learning about the project, making the proposal, and got selected.&lt;/p>
&lt;h2 id="first-few-weeks">First few weeks:&lt;/h2>
&lt;p>I started the project a week early from June 24, and it’s been two weeks since. The start was a bit challenging since it required setting up a lot of things on my local machine. For the past few weeks, the majority of my time has been dedicated to learning about COMPSs, RO-Crate, and Chameleon, the three technologies this project revolves around. The interaction with my mentor has also been great. From the weekly report meetings to my daily bombardment of doubts, he has been really helpful.
It is my first time working with Chameleon or any cloud computing software, so it can be a bit overwhelming sometimes, but it is getting better with practice.&lt;/p>
&lt;p>Stay tuned for progress in the next blog!!&lt;/p></description></item><item><title>FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fep_bench/20240612-jaycezhu/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fep_bench/20240612-jaycezhu/</guid><description>&lt;p>Hello, I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/lihaowen-jayce-zhu/">Lihaowen (Jayce) Zhu&lt;/a>, currently pursuing my Master of Science in Computer Science at the University of Chicago. I will be spending my
summer working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fep_bench/">FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a>
and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/swami-sundararaman/">Swami Sundararaman&lt;/a>. Here is my &lt;a href="https://docs.google.com/document/d/1ta-AgK6Dom25OingMkIR1tRzd2Yk78PZa776Wb3oFQ8/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;p>The landscape of machine learning (ML) is profoundly impacted by the initial stages of feature engineering and data preprocessing. This phase, critical for the success of ML projects, is often the most time-consuming, representing about 80% of the effort in typical ML workflows. The FEP-Bench project proposes to address the significant bottlenecks encountered during this phase, particularly focusing on the challenges posed by data retrieval from data lakes and computational inefficiencies in data operations. By exploring innovative caching, prefetching, and heuristic strategies, this proposal aims to optimize the preprocessing workflow, thereby enhancing efficiency and reducing the required resources of ML projects.&lt;/p></description></item><item><title>First Steps in Enhancing User Experience Reproducibility through TROVI Redesign</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleontroviredesign/20240612-aliciaem/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleontroviredesign/20240612-aliciaem/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/alicia-esquivel-morel/">Alicia Esquivel Morel&lt;/a>, and I&amp;rsquo;m a graduate research assistant at the University of Missouri – Columbia, pursuing a PhD in Computer Science. This summer, I&amp;rsquo;m working on a project to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/trovi/">improve user experience reproducibility through a redesign of TROVI&lt;/a>, as part of the Summer of Reproducibility (SoR) program. Excited to be working with two fabulous mentors; &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kate-keahey/">Kate Keahey&lt;/a>, and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>.&lt;/p>
&lt;p>&lt;strong>Research Reproducibility with a TROVI Redesign&lt;/strong>&lt;/p>
&lt;p>Researchers constantly face challenges replicating experiments due to limitations in current tools. TROVI, a platform designed to facilitate experiment replication, can be hindered by hard-to-follow interfaces and difficulties integrating code and data. This leads to confusion and frustration.&lt;/p>
&lt;p>My SoR project tackles these issues by redesigning TROVI to enhance user experience reproducibility. Imagine a user-friendly platform where uploading code, sharing data, and collaborating with colleagues becomes effortless.&lt;/p>
&lt;p>&lt;strong>The Redesign&amp;rsquo;s Goals&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Enhanced User Experience:&lt;/strong> Inspired by user-friendly platforms like Google Colab, we&amp;rsquo;ll simplify TROVI&amp;rsquo;s interface for intuitive navigation and ease of use.&lt;/li>
&lt;li>&lt;strong>Uploads and Sharing:&lt;/strong> Uploading code and data, as well as collaborating with researchers, are key goals. Integration with platforms like GitHub will further streamline collaboration.&lt;/li>
&lt;li>&lt;strong>Continuous Improvement:&lt;/strong> A built-in feedback loop will allow users to provide input and suggestions, ensuring TROVI constantly evolves based on user needs.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>The Road Ahead&lt;/strong>&lt;/p>
&lt;p>We&amp;rsquo;re at the beginning of the redesign process. In the next blog post, I&amp;rsquo;ll describe the project&amp;rsquo;s specific goals and the deliverables you can expect.&lt;/p>
&lt;p>&lt;strong>Stay tuned to see how TROVI is built for reproducible research!!&lt;/strong>&lt;/p></description></item><item><title>FSA: Benchmarking Fail-Slow Algorithms</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240612-xikangsong/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/fsa-benchmarking/20240612-xikangsong/</guid><description>&lt;p>Hi everyone! I&amp;rsquo;m Xikang, a master&amp;rsquo;s student in CS at UChicago. As a part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/failslowalgorithms/">FSA benchmarking Project&lt;/a>, I&amp;rsquo;m thrilled to be a contributor to OSRE 2024, collaborating with Kexin Pei, Assistant Professor of Computer Science at UChicago, and Ruidan, a talented PhD student at UChicago.&lt;/p>
&lt;p>This summer, I will focus on integrating some advanced ML into our RAID slowdown analysis. Our aim is to assess whether LLMs can effectively identify RAID slowdown issues and to benchmark their performance against our current machine learning algorithms. We will test the algorithms on Chameleon Cloud and benchmark them.&lt;/p>
&lt;p>Additionally, we will explore optimization techniques to enhance our pipeline and improve response quality. We hope this research will be a starting point for future work, utilizing LLMs to overcome the limitations of existing algorithms and provide a comprehensive analysis that enhances RAID and other storage system performance.&lt;/p>
&lt;p>I&amp;rsquo;m excited to work with all of you and look forward to your suggestions.
If you are interested, here is my &lt;a href="https://docs.google.com/document/d/1KpodnahgQDNf1-05TF2BdYXiV0lT_oYEnC0oaatHRoc/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p></description></item><item><title>ML-Powered Problem Detection in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20240612-syed/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/chameleoncloud/20240612-syed/</guid><description>&lt;p>Hello, I am &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/syed-mohammad-qasim/">Syed Mohammad Qasim&lt;/a>, a PhD candidate in Electrical and Computer Engineering at Boston University. I will be spending my
summer working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/ml_detect_chameleon/">ML-Powered Problem Detection in Chameleon&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ayse-coskun/">Ayse Coskun&lt;/a>
and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-sherman/">Michael Sherman&lt;/a>.&lt;/p>
&lt;p>Currently, Chameleon Cloud monitors sites at the Texas Advanced Computing Center (TACC), University of Chicago,
Northwestern University, and Argonne National Lab. They collect metrics using Prometheus at each site and feed them
all to a central Mimir cluster. All the logs go to a central Loki, and Grafana is used to visualize and set alerts.
Chameleon currently collects around 3000 metrics. Manually reviewing and setting alerts on them is time-consuming
and labor-intensive. This project aims to help Chameleon operators monitor their systems more effectively and improve overall
reliability by creating an anomaly detection service that can augment the existing alerting framework.&lt;/p></description></item><item><title>OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/openmlec/202406012-jiajunmao/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uchicago/openmlec/202406012-jiajunmao/</guid><description>&lt;p>Hello, I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jiajun-mao/">Jiajun Mao&lt;/a>, a BS/MS student at the University of Chicago studying Computer Science. I will be spending this summer working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ornl/openmlec/">OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/meng-wang/">Meng Wang&lt;/a>
and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a>. Here is my &lt;a href="https://docs.google.com/document/d/1nYgNlGdl0jUgW8avpu671oRpMoxaZHZPwlDfBNXRVro/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;p>How to increase data’s durability and reliability while decreasing storage cost has always been an interesting research topic. In recent years, erasure-coded storage systems have been seen as strong candidates to replace replication for colder storage tiers. In the paper “Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers”, the authors used theory and simulation to explore how a multi-level erasure-coded system can outperform systems using single-level erasure codes in areas such as encoding throughput and network bandwidth consumed for repair, addressing a few pain points in adopting erasure-coded storage systems. I will be implementing the theoretical and simulation results of this paper by building on top of HDFS and ZFS, and benchmarking the system performance.&lt;/p>
&lt;p>The project aims to achieve the following:&lt;/p>
&lt;ul>
&lt;li>HDFS understanding the underlying characteristics of ZFS as the filesystem&lt;/li>
&lt;li>HDFS understanding the failure report from ZFS, and using new, MLEC-specific repair logic to execute parity repair&lt;/li>
&lt;li>ZFS accepting repair data from HDFS to repair a suspended pool caused by catastrophic data corruption&lt;/li>
&lt;/ul></description></item><item><title>Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240612-martinputra/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uga/genomicswf/20240612-martinputra/</guid><description>&lt;p>Hi! I&amp;rsquo;m Martin, and I will be working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uga/genomicswf/">Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/in-kee-kim/">In Kee Kim&lt;/a>. Our work is driven by the scale of computing systems that host data commons &amp;ndash; we believe that performance characterization of genomics workloads should be done &lt;em>rapidly&lt;/em> and at a &lt;em>scale&lt;/em> similar to production settings. &lt;a href="https://drive.google.com/file/d/1LmOpCKv09ZGKlkG6VNleWBZ792nIuVOf/view?usp=sharing" target="_blank" rel="noopener">Feel free to check our proposal&lt;/a> for more details!&lt;/p>
&lt;p>We propose &lt;em>GenScale&lt;/em>, a genomics workload benchmarking tool that can achieve both the scale and speed necessary for characterizing performance under large-scale settings. &lt;em>GenScale&lt;/em> will be built on top of an industrial-grade cluster manager (e.g. Kubernetes) and metrics collection &amp;amp; monitoring systems (e.g. Prometheus), and will support a comprehensive set of applications used in state-of-the-art genomics workflows. The initial version developed during this project will include DNA and RNA alignment workflows.&lt;/p>
&lt;p>Finally, we believe that open access and reproducible research will greatly accelerate the pace of scientific discovery. We aim to package our artefacts and generated datasets in ways that make them easy to replicate, analyze, and build upon. I personally look forward to learning from &amp;amp; contributing to the open source community!&lt;/p></description></item><item><title>Developing a Pipeline to Benchmark Drift Management Strategies</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240610-williamn/</link><pubDate>Mon, 10 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/anl/last/20240610-williamn/</guid><description>&lt;p>With guidance from mentors &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ray-andrew-sinurat/">Ray Andrew Sinurat&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sandeep-madireddy/">Sandeep Madireddy&lt;/a> under the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/last">LAST&lt;/a> project, I aim to develop a pipeline to benchmark the efficacy of various drift management algorithms.&lt;/p>
&lt;p>Despite the abundance of literature on this subject, reproducibility remains a challenge due to the lack of available source code. As such, by crafting this pipeline, I aim to create a standardized platform for researchers and practitioners to compare several state-of-the-art drift management approaches. Through rigorous testing and benchmarking, we seek to identify the most effective algorithms across a spectrum of drift scenarios, including gradual, sudden, and recurring drift.&lt;/p>
&lt;p>The final deliverable of this pipeline will be packaged into a Chameleon Trovi Artifact. The pipeline will also be made easily extensible to cater to additional datasets or any custom-made drift-mitigation methods. This is my &lt;a href="https://docs.google.com/document/d/1biPUKMiKrNSegPVFDIyhjKkYeyiyD4hYqQghdsaU4IE/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> for the project.&lt;/p>
&lt;p>See you around!&lt;/p></description></item><item><title>Reproducing and benchmarking scalability bugs hiding in cloud systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240610-shuangliang/</link><pubDate>Mon, 10 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/osu/scalerep/20240610-shuangliang/</guid><description>&lt;p>Hello there!&lt;/p>
&lt;p>I am Shuang Liang, a third-year student studying Computer and Information Science at The Ohio State University. My passion lies in cloud computing and high-performance computing, areas I have explored extensively during my academic journey. I have participated in various projects and competitions, which have honed my technical skills and deepened my interest in distributed systems.&lt;/p>
&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/osu/scalerep">ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems&lt;/a>, my &lt;a href="https://threadeater.github.io/files/Understanding_and_Addressing_Scalability_Bugs_in_Large_Scale_Distributed_Systems%20%281%29.pdf" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bogdan-bo-stoica/">Bogdan &amp;quot;Bo&amp;quot; Stoica&lt;/a> aims to tackle the critical challenges posed by scalability bugs in systems like Cassandra, HDFS, and Hadoop. These bugs can lead to severe operational issues such as system downtime and data loss, particularly as systems scale up.&lt;/p>
&lt;p>The project goals include systematically analyzing and documenting scalability bugs, developing protocols to effectively trigger and quantify the impact of these bugs, and creating reproducible artifacts and detailed investigation scripts to aid in bug analysis.&lt;/p>
&lt;p>Our project will involve rigorous bug report analysis, reproduction of scalability bugs, and a comparative study of system behaviors before and after bug fixes. We aim to develop methodologies that enhance the reliability and performance of large-scale distributed systems, providing valuable insights and resources to the open-source community.&lt;/p>
&lt;p>Stay tuned to explore the future of reliable and scalable distributed systems!&lt;/p></description></item><item><title>BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uci/benchmarkst/20240609-qianru/</link><pubDate>Sun, 09 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/uci/benchmarkst/20240609-qianru/</guid><description>&lt;p>Hello! My name is Qianru, and I will be working on a project to improve spatial transcriptomics during Google Summer of Code 2024. My project, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uci/benchmarkst/">Benchmarking Gene Imputation Methods for Spatial Transcriptomics&lt;/a>, is mentored by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> and &lt;a href="https://users.soe.ucsc.edu/~cormac/" target="_blank" rel="noopener">Cormac Flanagan&lt;/a>. The goal is to create a standard platform to evaluate methods for filling in missing gene data, which is a big challenge in spatial transcriptomics. &lt;a href="https://drive.google.com/file/d/1ydqGuuzpNgPpVUBvTiFvF1q7qV9gA_wm/view?usp=sharing" target="_blank" rel="noopener">My proposal can be viewed here!&lt;/a>&lt;/p>
&lt;p>Spatial transcriptomics lets us see where genes are active in tissues, giving us insight into how cells interact in their natural environment. However, current methods often miss some gene data, making it hard to get a complete picture. Gene imputation can help fill in these gaps.&lt;/p>
&lt;p>My project will:&lt;/p>
&lt;ul>
&lt;li>Create a benchmark dataset to standardize gene imputation tasks across different platforms, species, and organs.&lt;/li>
&lt;li>Compare various gene imputation methods to see how well they work in different scenarios.&lt;/li>
&lt;li>Develop a user-friendly Python package with tools for gene imputation to help researchers improve their data.&lt;/li>
&lt;/ul>
&lt;p>I&amp;rsquo;m excited to contribute to this project and help advance the field of spatial transcriptomics by making data analysis more accurate and comprehensive.&lt;/p></description></item><item><title>FEP-Bench: Benchmarking for Enhanced Feature Engineering and Preprocessing in Machine Learning</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fep_bench/</link><pubDate>Mon, 03 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fep_bench/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage systems, machine learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch, Bash scripting, Linux, Machine Learning modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a> (primary contact), &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/swami-sundararaman/">Swami Sundararaman&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/lihaowen-jayce-zhu/">Lihaowen (Jayce) Zhu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>In the realm of machine learning (ML), preprocessing of data is a critical yet often underappreciated phase, consuming approximately 80% of the time in common ML tasks. This extensive time consumption can be attributed to various challenges encountered from both data and computation perspectives.&lt;/p>
&lt;p>From the data side, one significant challenge is the slow retrieval of data from data lakes, which are storage repositories that hold a vast amount of raw data in its native format. However, the process of extracting this data can be slow, causing computation cycles to wait for data arrival and leading to delays in the entire preprocessing phase. Furthermore, the size of the data often exceeds the memory capacity of standard computing systems. This is a frequent occurrence in ML, as datasets are typically large and complex. Handling such large datasets requires sophisticated memory management techniques to ensure efficient preprocessing without overwhelming the system&amp;rsquo;s memory.&lt;/p>
&lt;p>On the computation side, a naive solution to data operations, especially aggregation, often leads to inefficiencies. These operations may require grouping a large chunk of data as a prerequisite before performing any actual computation. This grouping, without careful configuration and management, can trigger serious data shuffling, leading to extensive remote data movement when the data is distributed across various storage systems. Such data movement is not only time-consuming but also resource-intensive.&lt;/p>
&lt;p>To mitigate these challenges, there is a pressing need to design better caching, prefetching, and heuristic strategies for data preprocessing. The team aims to significantly reduce the time and resources required for preprocessing by optimizing data retrieval and computational processes.&lt;/p>
&lt;p>However, prior to the design and implementation of such a system, a systematic understanding of the preprocessing workflow is essential. Hence, throughout the program, the students will need to:&lt;/p>
&lt;ul>
&lt;li>Understand the current system used to preprocess data for ML training, for example, Hadoop or Spark.&lt;/li>
&lt;li>Collect the common datasets used for different types of ML models.&lt;/li>
&lt;li>Collect the typical operations used for preprocessing these datasets.&lt;/li>
&lt;li>Benchmark the performance of these operations in the existing frameworks under various experimental settings.&lt;/li>
&lt;li>Package the benchmark such that the team can later use it for reproduction or evaluation.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>A rolodex of commonly used datasets with their corresponding preprocessing operations and expected output formats/types&lt;/li>
&lt;li>A Chameleon Trovi package that preprocesses the datasets with a single-machine preprocessing framework like pandas&lt;/li>
&lt;li>A Chameleon Trovi package that preprocesses the datasets in an existing distributed computation framework like Hadoop or Spark&lt;/li>
&lt;/ul></description></item><item><title>SLICES/pos: Reproducible Experiment Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240517-warmuth/</link><pubDate>Fri, 17 May 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/tum/slices/20240517-warmuth/</guid><description>&lt;p>Servus everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Kilian Warmuth, currently pursuing my M.Sc. in Computer Science at the Technical University of Munich (TUM) after completing my B.Sc. in Computer Science at the same institution. Throughout my academic education, I have taken courses in Advanced Computer Networks, which have deepened my understanding and expertise in the field. I was involved in an interdisciplinary project where I created a testing toolchain for the packet generator MoonGen using the SLICES/pos testbed. This experience provided me with extensive hands-on exposure to pos, increasing my interest in reproducible testbeds and the enhancement of pos.&lt;/p>
&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/tum/slices">SLICES/pos: Reproducible Experiment Workflows&lt;/a> project, my &lt;a href="https://1drv.ms/b/s!AkZKU_K5p7iNnQfzdH2eXFsnKfdU?e=skZmXc" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sebastian-gallenmuller/">Sebastian Gallenmüller&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kate-keahey/">Kate Keahey&lt;/a>, and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/georg-carle/">Georg Carle&lt;/a>, aims to address the challenges of managing experiment results within the &lt;a href="https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/gallenmueller_scholz_conext2021.pdf" target="_blank" rel="noopener">pos framework&lt;/a>.&lt;/p>
&lt;p>The project leverages the RO-Crate open standard to organize result data systematically, enhancing accessibility and comprehensibility of research findings. We aim to improve experiment documentation for the pos testbed, providing clear setup and execution instructions to ensure reproducibility. Therefore we need to simplify the dissemination of research findings by automating the creation of RO-Crates, allowing researchers to focus on experiment design without needing to be familiar with RO-Crate standards. Implementing these standards will enhance the sharing of results by automating publication processes for open repositories, promoting transparency and collaboration.&lt;/p>
&lt;p>We also aim to enhance the portability of experiments across different testbeds, with a particular focus on the Chameleon Testbed. We will develop introductory examples demonstrating how to use pos in various testbed environments. Additionally, we will design and execute a portable complex network experiment based on SLICES/pos. To validate the portability enhancements, we will perform experiments on the Chameleon testbed. Finally, we will refine the portability of pos experiments within Chameleon to ensure seamless execution.&lt;/p>
&lt;p>Stay tuned to explore the future of reproducible testbeds!&lt;/p></description></item><item><title>HDEval: Benchmarking LLMs that Generate Verilog/Chisel Modules From Natural Language</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240611-ashwinbardhwaj/</link><pubDate>Tue, 14 May 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240611-ashwinbardhwaj/</guid><description>&lt;p>Hi everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Ashwin Bardhwaj, currently pursuing a bachelor&amp;rsquo;s degree in Electrical Engineering and Computer Science at UC Berkeley. I was recently involved in a project to implement a secure hardware encryption enclave in Verilog. That&amp;rsquo;s why I was excited to work with the MASC group to evaluate how existing generalized LLMs (such as ChatGPT 4 or StarCoder) can generate accurate Verilog/Chisel code from English and assist in the hardware development process.&lt;/p>
&lt;p>As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd">Micro Architecture Santa Cruz (MASC)&lt;/a>, my &lt;a href="https://drive.google.com/file/d/1Fnr85lqrTs7OBohfHfSZI2K3wZU3zJm0/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>, aims to create a suite of benchmark programs for &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a>.&lt;/p>
&lt;p>The deliverable of this project is a set of large HDL benchmarks, each with a corresponding set of prompts. Using Yosys to perform a Logic Equivalence Check, we can formally verify that the generated code exhibits the same behavior as the benchmark. In addition, we can consider the performance and resource utilization of the generated code as a metric.&lt;/p></description></item><item><title>(Re)Evaluating Artifacts for Understanding Resource Artifacts</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/reevaluating/</link><pubDate>Wed, 20 Mar 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/reevaluating/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Virtualization, Containerization, Profiling, Reproducibility&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C, Python, and DevOps experience&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large; 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/">Tanu Malik&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project aims to characterize computer-science-related artifacts that are either submitted to conferences or deposited in reproducibility hubs such as Chameleon. We aim to classify experiments into different types and understand the reproducibility requirements of this rich data set, possibly leading to a benchmark.
We will then study packaging requirements, especially those of distributed experiments, and aim to instrument a package archiver to reproduce a distributed experiment. Finally, we will use the learned experiment characteristics to develop a classifier that determines alternative resources where an experiment can be easily reproduced.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;p>Specific tasks include:&lt;/p>
&lt;ul>
&lt;li>A pipeline consisting of a set of scripts to characterize artifacts.&lt;/li>
&lt;li>Packaged artifacts and an analysis report with open-sourced data about the best guidelines for packaging with Chameleon.&lt;/li>
&lt;li>A classifier system based on artifact and resource characteristics.&lt;/li>
&lt;/ul></description></item><item><title>Auto Appendix</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/tuwien/autoappendix/</link><pubDate>Mon, 11 Mar 2024 14:48:10 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/tuwien/autoappendix/</guid><description>&lt;p>The SC Conference Series, a leading forum on High Performance Computing (HPC), supports scientific rigor through enhanced reproducibility of accepted papers.
To that end, all manuscripts submitted to the SC Technical Papers program must contain an Artifact Description.
Authors of accepted papers may request reproducibility badges, for which an Appendix describing the
Artifact Evaluation is required.&lt;/p>
&lt;p>In recent years, &lt;a href="https://www.chameleoncloud.org" target="_blank" rel="noopener">Chameleon&lt;/a> has facilitated SC&amp;rsquo;s reproducibility initiative by enabling authors to develop and share computational, reproducible artifacts through the Chameleon cloud.
The Chameleon platform helps authors and reviewers to easily share computational artifacts,
which are included in the papers&amp;rsquo; artifact appendices.&lt;/p>
&lt;p>The proposed project aims to assess all AD/AE appendices submitted for reproducibility badge requests. This evaluation will focus on AD/AE appendices that utilized the Chameleon cloud as the execution platform, examining their potential for automation.
Our aim is to evaluate the feasibility of fully automating various components of the appendices.
Students will engage directly with the chairs of the SC24 Reproducibility Initiative in this effort.&lt;/p>
&lt;h3 id="advancing-sc-conference-artifact-reproducibility-via-automation">&lt;strong>Advancing SC Conference Artifact Reproducibility via Automation&lt;/strong>&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Reproducibility&lt;/code> &lt;code>Reproducible Research&lt;/code> &lt;code>Artifact Evaluation&lt;/code> &lt;code>Open Science&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: HPC, Cloud computing, Chameleon, MPI, OpenMP, CUDA&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sascha-hunold/">Sascha Hunold&lt;/a>&lt;/li>
&lt;li>&lt;strong>Tasks&lt;/strong>:
&lt;ul>
&lt;li>Perform an analysis of the current limitations of AD/AE appendices submitted for Artifact Evaluation.&lt;/li>
&lt;li>Re-run the computational artifacts to identify areas for enhancement, with a primary objective of achieving full automation of Artifact Evaluation using the Chameleon cloud.&lt;/li>
&lt;li>Evaluate the existing automation capabilities of the Chameleon cloud.&lt;/li>
&lt;li>Develop a set of recommendations for structuring Computational Artifacts, aimed at benefiting future SC conferences.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>ML-Powered Problem Detection in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/ml_detect_chameleon/</link><pubDate>Wed, 06 Mar 2024 16:33:57 -0600</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/ml_detect_chameleon/</guid><description>&lt;p>Today’s Continuous Integration/Continuous Delivery (CI/CD) trends encourage
rapid design of software using a wide range of software components, followed by
frequent updates that are immediately deployed on the cloud. The complexity of
cloud systems along with the component diversity and break-neck pace of
development amplify the difficulty in identifying or fixing problems related to
performance, resilience, and security. Furthermore, existing approaches that
rely on human experts—e.g., methods involving manually-written
rules/scripts—have limited applicability to modern CI/CD processes, as they are
fragile, costly, and often not scalable. Consequently, there is growing
interest in applying machine learning (ML) based methods for identifying
vulnerabilities in code, non-compliant or otherwise problematic software, and
resilience problems in systems and networks. However, despite some success
stories in applying AI for cloud operations (e.g., in resource management),
much of cloud operations still relies on human-centric methods, which require
updates as the cloud undergoes CI/CD cycles. The goal of this summer project is
to explore methods of automation for the Chameleon Cloud to enable faster
detection and diagnosis of problems. Overall, the project will contribute to an
overarching vision of building an infrastructure that collects and synthesizes
cross-layer data from large-scale cloud systems, applying ML-powered methods to
automate cloud ops, and, further, making this data available to researchers
through coherent APIs and analytics engines.&lt;/p>
&lt;p>Currently, Chameleon uses runbooks as manual guides for operational tasks,
including routine maintenance and troubleshooting. However, these traditional
runbooks often fall short in dynamic and fast-paced CI/CD environments, as they
lack the flexibility to adapt to changes in software versions, deployment
configurations, and the unique challenges of emerging issues. To overcome these
challenges, the project will leverage ML to automate anomaly detection based on
telemetry data collected from Chameleon Cloud&amp;rsquo;s monitoring frameworks. This
method will not only facilitate rapid identification of performance anomalies
but also enable automated generation of runbooks. These runbooks can then offer
operators actionable steps to resolve issues, thereby making the
anomaly mitigation process more efficient. Furthermore, this approach supports
the automatic creation of targeted runbooks for newly generated support
tickets, enhancing response times and system reliability.&lt;/p>
&lt;p>Time permitting, using a collection of automated runbooks (each targeting a
specific problem), we will analyze support tickets, common problems, and their
frequency to offer insights and suggestions that help Chameleon Cloud plan its
roadmap and prioritize the fixes with the best return on investment.&lt;/p>
&lt;p>A key aspect of this summer project is enhancing the reproducibility of
experiments in the cloud and improving data accessibility. We plan to design
infrastructures and APIs so that the telemetry data that is essential for
anomaly detection and automated runbooks is systematically documented and made
available. We also aim to collect and share insights and modules on applying ML
for cloud operations, including ML pipelines, data labeling strategies, data
preprocessing techniques, and feature engineering. By sharing these insights,
we aim to promote best practices and support reproducible experiments on public
clouds, thus fostering future ML-based practices within the Chameleon Cloud
community and beyond. Time permitting, we will explore applying lightweight
privacy-preserving approaches on telemetry data as well.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Machine Learning&lt;/code>, &lt;code>Anomaly Detection&lt;/code>, &lt;code>Automated Runbooks&lt;/code>, &lt;code>Telemetry Data&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>:
&lt;ul>
&lt;li>Proficiency in Machine Learning: Understanding of ML algorithms for anomaly detection and automation.&lt;/li>
&lt;li>Cloud Computing Knowledge: Familiarity with CI/CD environments and cloud architectures.&lt;/li>
&lt;li>Programming Skills: Proficiency in languages such as Python, especially in cloud and ML contexts.&lt;/li>
&lt;li>Data Analysis: Ability to analyze telemetry data using data analytics tools and libraries.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/michael-sherman/">Michael Sherman&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>ReproNB: Reproducibility of Interactive Notebook Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/repronb/</link><pubDate>Mon, 26 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/repronb/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> HPC, MPI, distributed systems&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C++, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Difficult&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large; 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/">Tanu Malik&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Notebooks have gained wide popularity in scientific computing. A notebook is both a web-based interactive front end to program workflows and a lightweight container for sharing code and its output. Reproducing notebooks in different target environments, however, is a challenge. Notebooks do not share the computational environment in which they are executed. Consequently, despite being shareable, they are often not reproducible. We have developed &lt;a href="https://github.com/depaul-dice/Flinc" target="_blank" rel="noopener">FLINC&lt;/a> (see also &lt;a href="https://dice.cs.depaul.edu/pdfs/pubs/C31.pdf" target="_blank" rel="noopener">eScience'22 paper&lt;/a>) to address this problem. However, it currently does not support all forms of experiments, especially HPC experiments. In this project we will extend FLINC to HPC experiments. This will involve using recording and replaying mechanisms such as &lt;a href="https://kento.github.io/code/" target="_blank" rel="noopener">ReMPI&lt;/a> and &lt;a href="https://rr-project.org/" target="_blank" rel="noopener">rr&lt;/a> within FLINC.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;p>The project deliverable will be a set of HPC experiments that are packaged with FLINC and available on Chameleon.&lt;/p></description></item><item><title>SciStream-Rep: An Artifact for Reproducible Benchmarks of Scientific Streaming Applications</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/scistream/</link><pubDate>Mon, 26 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/scistream/</guid><description>&lt;p>&lt;a href="https://github.com/scistream/scistream-proto" target="_blank" rel="noopener">SciStream&lt;/a> is a framework and toolkit that attempts to tackle the problem of enabling high-speed (100+ Gbps), memory-to-memory data streaming in scientific environments. This task is particularly challenging because data producers (e.g., data acquisition applications on scientific instruments, simulations on supercomputers) and consumers (e.g., data analysis applications) may be in different security domains and thus require bridging of those domains. Furthermore, either producers, consumers, or both may lack external network connectivity and thus require traffic forwarding proxies. If you want to learn more, please take a look at our &lt;a href="https://dl.acm.org/doi/abs/10.1145/3502181.3531475" target="_blank" rel="noopener">HPDC'22 paper&lt;/a>.&lt;/p>
&lt;h3 id="scistream-rep-an-artifact-for-reproducible-benchmarks-of-scientific-streaming-applications">SciStream-Rep: An Artifact for Reproducible Benchmarks of Scientific Streaming Applications&lt;/h3>
&lt;p>&lt;strong>Project Idea Description:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Network Performance Testing, Benchmarking, Data Streaming, Reproducibility&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, Scripting, Linux, Containers, Networking, benchmark tools&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joaquin-chung/">Joaquin Chung&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/flavio-castro/">Flavio Castro&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project focuses on expanding the scope of testing SciStream’s architecture by incorporating a variety of traffic patterns based on real scientific applications. The goal is to understand how different traffic patterns influence the performance of memory-to-memory data streaming in scientific scenarios by creating artifacts for reproducible experiments. Additionally, the project will explore the use of different forwarding elements, such as Nginx and HAProxy, to assess their impact on data streaming efficiency and security.&lt;/p>
&lt;p>Reproducibility is especially difficult in shared network environments such as the Chameleon and FABRIC testbeds. We can expect similar results from two identical experiments only when the network conditions (external to our traffic) are similar for both. By creating reproducible artifacts for Chameleon and FABRIC, we can build statistical confidence in the measured results through multiple repetitions by other researchers.&lt;/p>
&lt;p>The Specific Tasks of the Project Include:&lt;/p>
&lt;ul>
&lt;li>Developing a set of benchmarks to measure the performance of scientific streaming applications across a broader range of traffic patterns.&lt;/li>
&lt;li>Creating a set of artifacts for generating traffic patterns typical of data streaming applications.&lt;/li>
&lt;li>Deploying various forwarding elements within the SciStream architecture for the Chameleon and FABRIC testbeds.&lt;/li>
&lt;li>Compiling a best practices document detailing the optimal configurations for SciStream.&lt;/li>
&lt;/ul>
&lt;h3 id="scistream-lb-a-dynamic-load-balancing-solution-using-programmable-network-devices">Scistream-LB: A Dynamic Load Balancing Solution Using Programmable network devices&lt;/h3>
&lt;p>&lt;strong>Project Idea Description:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Network Performance Testing, Data Streaming, Reproducibility, Programmable Data Planes&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python/Scripting, Linux, Docker/Containers, Networking fundamentals, Experience with OpenFlow/P4 programming&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joaquin-chung/">Joaquin Chung&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/flavio-castro/">Flavio Castro&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The aim of this project is to create a specialized forwarding element using OpenFlow (OF) or the P4 programming language, tailored to enhance the SciStream data plane. This new development seeks to enable a more flexible and hardware-based (and therefore more efficient) alternative to conventional software-based forwarding mechanisms like NGINX or HAProxy, specifically designed to support the needs of high-performance data streaming environments for scientific applications. The OF/P4 forwarding elements will be packaged as artifacts for reproducibility experiments in the Chameleon and FABRIC testbeds. Reproducibility is especially difficult in shared network environments such as these. We can expect similar results from two identical experiments only when the network conditions (external to our traffic) are similar for both. By creating reproducible artifacts for Chameleon and FABRIC, we can build statistical confidence in the measured results through multiple repetitions by other researchers.&lt;/p>
&lt;p>Specific tasks of the project include:&lt;/p>
&lt;ul>
&lt;li>Design and implementation of an OF/P4-based forwarding element that can be seamlessly integrated with the data plane of SciStream’s architecture.&lt;/li>
&lt;li>Forwarding logic that supports efficient and secure memory-to-memory data streaming.&lt;/li>
&lt;li>A set of benchmarks for evaluating the new forwarding element against traditional options, focusing on improvements in throughput, latency, and security.&lt;/li>
&lt;li>An investigation on the potential advantages of programmable network elements for detailed control over data streaming paths and security configurations.&lt;/li>
&lt;li>A package of the newly developed forwarding elements as artifacts for reproducibility experiments in Chameleon and FABRIC testbeds.&lt;/li>
&lt;/ul></description></item><item><title>Chameleon Trovi Redesign</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/trovi/</link><pubDate>Wed, 21 Feb 2024 13:43:55 -0600</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/trovi/</guid><description>&lt;p>&lt;a href="https://www.chameleoncloud.org/experiment/share" target="_blank" rel="noopener">Trovi&lt;/a> on
&lt;a href="https://www.chameleoncloud.org" target="_blank" rel="noopener">Chameleon&lt;/a> is an open-source service designed
to significantly enhance the &lt;a href="https://wordpress.cels.anl.gov/nimbusproject/wp-content/uploads/sites/116/2023/08/Reproducibility_On_Chameleon-3.pdf" target="_blank" rel="noopener">practical
reproducibility&lt;/a>
of computer science research. By allowing Chameleon users to upload, share, and
access packaged experiments and other research artifacts, Trovi aims to
streamline the process of replicating and building upon existing studies. This
capability is crucial in the scientific community, where the ability to
accurately reproduce research results is as fundamental to validating,
critiquing, and extending scientific findings as reading papers. The importance
of Trovi lies in its potential to serve as a centralized hub that facilitates
the exchange of valuable research outputs, promotes transparency, and fosters
collaboration among researchers. By improving the ease with which experiments
can be replicated and data can be shared, Trovi supports the advancement of
knowledge and innovation in the field of computer science, making it an
essential tool for researchers seeking to contribute to the development of
reproducible and robust scientific research.&lt;/p>
&lt;p>This project will focus on the evolution of Trovi. It will aim to enhance Trovi
as a tool to advance practical reproducibility in CS research. Students will
evaluate the most important use cases and enabling features necessary to
enhance Trovi&amp;rsquo;s functionality and user experience. With these design insights,
students will then create a robust interface that allows researchers to
integrate experiment code and data easily as packaged artifacts, similar to the
user-friendly design of Google Colab, and build off other users&amp;rsquo; artifacts to
create novel experiments, similar to the design of GitHub. Furthermore,
students will create comprehensive documentation with valuable insights into
what works well and what requires improvement, creating a dynamic feedback loop
to guide the ongoing redesign process. Lastly, students will actively
participate in designing webinars, creating and posting video tutorials, and
organizing academic events at the University of Chicago to showcase the work on
Trovi. This multifaceted project ensures a well-rounded experience and fosters
a collaborative learning environment.&lt;/p>
&lt;p>Each of the project ideas below focuses on a different aspect of the overall
goal to enhance Trovi as a tool for advancing practical reproducibility in
CS research. They are designed to offer a comprehensive approach,
from technical development to community engagement, ensuring a well-rounded
enhancement of the service.&lt;/p>
&lt;h3 id="user-interface-redesign-for-experiment-artifacts-sharing">&lt;strong>User Interface Redesign for Experiment Artifacts Sharing&lt;/strong>&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>User Interface Design&lt;/code> &lt;code>User Experience&lt;/code> &lt;code>Web Development&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: HTML/CSS, JavaScript, UX design principles&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Moderate to Hard&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium to Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>&lt;/li>
&lt;li>&lt;strong>Tasks&lt;/strong>:
&lt;ul>
&lt;li>Conduct user research to understand the needs and pain points of current
and potential Trovi users.&lt;/li>
&lt;li>Design wireframes and prototypes that incorporate user feedback and aim to
simplify the process of uploading, sharing, and reusing research artifacts.&lt;/li>
&lt;li>Implement the frontend redesign using a modern web framework to ensure
responsiveness and ease of use.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="packaged-artifacts-integration-system">&lt;strong>Packaged Artifacts Integration System&lt;/strong>&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Cloud Computing&lt;/code> &lt;code>Data Management&lt;/code> &lt;code>Web APIs&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, RESTful APIs, Docker, Git&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>&lt;/li>
&lt;li>&lt;strong>Tasks&lt;/strong>:
&lt;ul>
&lt;li>Develop a system that allows users to easily package and upload their
experimental code and data to Trovi.&lt;/li>
&lt;li>Create a standardized format or set of guidelines for packaging experiments
to ensure consistency and ease of use.&lt;/li>
&lt;li>Implement API endpoints that enable automated uploads, downloads, and
integration with other tools like GitHub or Zenodo.&lt;/li>
&lt;li>Test the system with real-world experiments to ensure reliability and ease
of integration.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="community-engagement-and-educational-materials">&lt;strong>Community Engagement and Educational Materials&lt;/strong>&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Educational Technology&lt;/code> &lt;code>Community Building&lt;/code> &lt;code>Content Creation&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Video Editing, Public Speaking, Event Planning&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Moderate&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>&lt;/li>
&lt;li>&lt;strong>Tasks&lt;/strong>:
&lt;ul>
&lt;li>Design and organize webinars that introduce Trovi and its new features to
the research community.&lt;/li>
&lt;li>Create engaging video tutorials that guide users through the process of
using Trovi for their research needs.&lt;/li>
&lt;li>Develop comprehensive documentation that covers both basic and advanced use
cases, troubleshooting, and tips for effective collaboration using Trovi.&lt;/li>
&lt;li>Organize academic events, such as workshops or hackathons, that encourage
the use of Trovi for collaborative research projects.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="feedback-loop-and-continuous-improvement-system">&lt;strong>Feedback Loop and Continuous Improvement System&lt;/strong>&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Software Engineering&lt;/code> &lt;code>Data Analysis&lt;/code> &lt;code>User Feedback&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, SQL, Data Visualization, Web Development&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Moderate&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mark-powers/">Mark Powers&lt;/a>&lt;/li>
&lt;li>&lt;strong>Tasks&lt;/strong>:
&lt;ul>
&lt;li>Implement a system within Trovi for collecting, storing, and analyzing user
feedback and usage data.&lt;/li>
&lt;li>Develop dashboards that visualize feedback trends and identify areas for
improvement.&lt;/li>
&lt;li>Create mechanisms for users to easily report bugs, request features, and
offer suggestions for the platform.&lt;/li>
&lt;li>Use the collected data to prioritize development efforts and continuously
update the platform based on user needs and feedback.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Data leakage in applied ML: reproducing examples of irreproducibility</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/nyu/data-leakage/</link><pubDate>Wed, 21 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/nyu/data-leakage/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> applied machine learning, data leakage, reproducibility&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, data analysis, machine learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>Data leakage &lt;a href="https://www.cell.com/patterns/pdfExtended/S2666-3899%2823%2900159-9" target="_blank" rel="noopener">has been identified&lt;/a> as a major cause of irreproducibility of a paper&amp;rsquo;s findings, when machine learning techniques are applied to problems in science. Data leakage includes errors such as:&lt;/p>
&lt;ul>
&lt;li>pre-processing before splitting into training/test sets&lt;/li>
&lt;li>feature selection before splitting into training/test sets&lt;/li>
&lt;li>duplicated data points in both training and test sets&lt;/li>
&lt;li>temporal leakage (e.g. shuffled K-fold cross validation with temporal data)&lt;/li>
&lt;li>group leakage (e.g. shuffled K-fold cross validation with data that has group structure)&lt;/li>
&lt;/ul>
&lt;p>and leads to an overly optimistic evaluation of model performance, such that the finding may no longer be the same when the error is corrected.&lt;/p>
&lt;p>Despite the seriousness of this problem, data leakage is often not covered in introductory machine learning courses, and many users of machine learning across varied science domains are unaware of it. Even those who have learned &amp;ldquo;rules&amp;rdquo; for avoiding data leakage (e.g. &amp;ldquo;never do feature selection on the test set&amp;rdquo;) may not understand the reasons for these &amp;ldquo;rules&amp;rdquo;, and how important they are for ensuring that the final result is valid and reproducible.&lt;/p>
&lt;p>The goal of this project is to create &lt;em>learning materials&lt;/em> demonstrating how instances of data leakage invalidate a result. These materials should be easily adoptable by instructors teaching machine learning in a wide variety of contexts, including those teaching a non-CS audience. To achieve this, the project proposes to re-implement published results that have been affected by data leakage, and package these implementations along with supporting material in a format suitable for use in classrooms and by independent learners. For each &amp;ldquo;irreproducible result&amp;rdquo;, the &amp;ldquo;package&amp;rdquo; should include -&lt;/p>
&lt;ul>
&lt;li>a re-implementation of the original result&lt;/li>
&lt;li>an explanation of the data leakage problem affecting the result, with an implementation of a &amp;ldquo;toy example&amp;rdquo; on synthetic data&lt;/li>
&lt;li>a re-implementation of the result without the data analysis error, to show how the finding is affected&lt;/li>
&lt;li>and examples of exam or homework questions that an instructor adopting this package may use to assess understanding.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Writing a successful proposal for this project&lt;/strong>&lt;/p>
&lt;p>A good proposal for this project should include, for at least a few of the &amp;ldquo;types&amp;rdquo; of data leakage mentioned above:&lt;/p>
&lt;ul>
&lt;li>a specific published result that could be used as an exemplar (you may find ideas among the review papers listed &lt;a href="https://reproducible.cs.princeton.edu/#rep-failures" target="_blank" rel="noopener">here&lt;/a>)&lt;/li>
&lt;li>a brief description of the details of the experiment that will reproduce that result (e.g. what data is used, what machine learning technique is used, what are the hyperparameters used for training)&lt;/li>
&lt;li>and an explanation of why this result is suitable for this use (it uses a publicly available dataset, a machine learning technique that is familiar and accessible to students in an introductory course, the paper has sufficient detail to reproduce the result, etc.)&lt;/li>
&lt;/ul>
&lt;p>The contributor will need to create learning materials that are written in a clear, straightforward, and concise manner, without unnecessary jargon. The proposal should show evidence of the contributor&amp;rsquo;s writing abilities.&lt;/p>
&lt;p>&lt;strong>GitHub link&lt;/strong>&lt;/p>
&lt;p>There is no pre-existing Git repository for this project - at the beginning of the summer, the contributor will create a new repository in the &lt;a href="https://github.com/teaching-on-testbeds/" target="_blank" rel="noopener">Teaching on Testbeds&lt;/a> organization, and the project materials will &amp;ldquo;live&amp;rdquo; there.&lt;/p>
&lt;p>To get a sense of the type of code you would be writing, here is an example of a learning module related to data leakage (however, it is not in the format described above): &lt;a href="https://colab.research.google.com/github/ffund/ml-notebooks/blob/master/notebooks/4-linear-regression-case-study-part-2.ipynb" target="_blank" rel="noopener">Beauty in the Classroom&lt;/a>&lt;/p>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&amp;ldquo;Packages&amp;rdquo; of learning materials for teaching about common types of data leakage&lt;/li>
&lt;li>&lt;a href="https://chameleoncloud.org/experiment/share/" target="_blank" rel="noopener">Trovi&lt;/a> artifacts for &amp;ldquo;playing back&amp;rdquo; each of the &amp;ldquo;packages&amp;rdquo;&lt;/li>
&lt;/ul></description></item><item><title>Evaluating congestion controls past and future</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/nyu/congestion-control/</link><pubDate>Wed, 21 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/nyu/congestion-control/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> computer networks, congestion control, reproducibility&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Bash scripting, Linux, computer network performance evaluation&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ashutosh-srivastava/">Ashutosh Srivastava&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>In computer networks, congestion control protocols play an outsize role in determining our experience with networked applications. New congestion control algorithms are regularly proposed by researchers to improve throughput and latency performance, adapt to new types of networks, and align more closely with the needs of new applications.&lt;/p>
&lt;p>However, our understanding of the benefits of a new congestion control protocol depends to a large extent on the evaluation - the network topology, the network delay and throughput, the type of flow, the type of competing traffic - and there is no single standard way to evaluate a congestion control protocol. The &lt;a href="https://pantheon.stanford.edu/static/pantheon/documents/pantheon-paper.pdf" target="_blank" rel="noopener">Pantheon&lt;/a> project (which is no longer supported) sought to fill this gap somewhat and address the problem of reproducibility of congestion control results, but their approach is not easily adapted to evaluation scenarios representative of new types of applications or networks. Nor is it capable of representing the evaluation scenarios in most published results related to congestion control.&lt;/p>
&lt;p>The goal of this project, therefore, is to create an evaluation suite for congestion control protocols that can be used to reproduce existing congestion control results in the academic literature, &lt;em>and&lt;/em> to evaluate new protocols under similar evaluation conditions, &lt;em>and&lt;/em> to be easily extended to new scenarios. An &amp;ldquo;evaluation scenario&amp;rdquo; includes:&lt;/p>
&lt;ul>
&lt;li>a Python notebook to realize the network topology on the FABRIC and/or Chameleon testbed, and configure the network characteristics,&lt;/li>
&lt;li>scripts to generate the data flow(s) needed for the evaluation,&lt;/li>
&lt;li>and scripts to capture data from the experiment and visualize the results.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Writing a successful proposal for this project&lt;/strong>&lt;/p>
&lt;p>To write a good proposal for this project, you should review the most influential papers on TCP congestion control, and especially those related to TCP protocols that are available in the Linux kernel.&lt;/p>
&lt;p>Use your findings to explain what your proposed evaluation suite will include (what network topologies, what flow generators), and justify this with reference to the academic literature. Also indicate which &lt;em>specific results&lt;/em> you expect to be able to reproduce using this suite (e.g. include figures from influential papers showing evaluation results! with citation, of course).&lt;/p>
&lt;p>You can also take advantage of existing open source code that reproduces a congestion control result, e.g. &lt;a href="https://github.com/sdatta97/imcbbrrepro" target="_blank" rel="noopener">Replication: When to Use and When Not to Use BBR&lt;/a>, or &lt;a href="https://github.com/ashutoshs25/bbr-dominance-experiments" target="_blank" rel="noopener">Some of the Internet may be heading towards BBR dominance: an experimental study&lt;/a>.&lt;/p>
&lt;p>&lt;strong>GitHub link&lt;/strong>&lt;/p>
&lt;p>There is no pre-existing Git repository for this project - at the beginning of the summer, the contributor will create a new repository for this project.&lt;/p>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&amp;ldquo;Packages&amp;rdquo; of evaluation scenarios that can be used to evaluate a congestion control algorithm implemented in the Linux kernel&lt;/li>
&lt;li>&lt;a href="https://chameleoncloud.org/experiment/share/" target="_blank" rel="noopener">Trovi&lt;/a> artifacts for realizing each evaluation scenario on Chameleon&lt;/li>
&lt;/ul></description></item><item><title>Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/</link><pubDate>Mon, 19 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Provenance, reproducibility, standards, image creation&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, JSON, Bash scripting, Linux, image creation and deployment&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>The &lt;a href="https://compss.bsc.es/" target="_blank" rel="noopener">COMPSs programming model&lt;/a> provides an interface for the programming of a
sequential application that is transformed into a workflow that, thanks to the COMPSs runtime, is later
scheduled on the available computing resources. Programming is enabled for different languages through
the use of bindings: Java, C/C++ and Python (named PyCOMPSs).
COMPSs is able to generate &lt;a href="https://compss-doc.readthedocs.io/en/stable/Sections/05_Tools/04_Workflow_Provenance.html" target="_blank" rel="noopener">Workflow Provenance information&lt;/a>
after the execution of an experiment. The generated artifact (code + data + recorded metadata)
enables the sharing of results through the use of tools such as the &lt;a href="https://workflowhub.eu/" target="_blank" rel="noopener">WorkflowHub portal&lt;/a>,
which can generate a DOI for the results so that they can be included as permanent references
in scientific papers.&lt;/p>
&lt;p>The format of the metadata generated in COMPSs experiments follows the &lt;a href="https://www.researchobject.org/ro-crate/" target="_blank" rel="noopener">RO-Crate specification&lt;/a>,
and, more specifically, two &lt;a href="https://www.researchobject.org/ro-crate/profiles.html" target="_blank" rel="noopener">profiles&lt;/a>:
the Workflow and Workflow Run Crate profiles. This metadata enables not only the sharing of results, but also their
reproducibility.&lt;/p>
&lt;p>This project proposes the creation of a service that enables the automatic reproducibility of COMPSs experiments
in the Chameleon infrastructure. The service will be able to get a COMPSs crate (artifact that follows the RO-Crate
specification), and, by parsing the available metadata, build a Chameleon compatible image for reproducing the
experiment in the testbed. Small modifications to the COMPSs RO-Crate are foreseen (i.e. the inclusion of third-party
software required by the application).&lt;/p>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Study the different environments and specifications (COMPSs, RO-Crate, Chameleon, Trovi, &amp;hellip;).&lt;/li>
&lt;li>Design the most appropriate integration, considering all the elements involved.&lt;/li>
&lt;li>Integrate reproducibility of basic PyCOMPSs experiments into Chameleon.&lt;/li>
&lt;li>Integrate reproducibility of complex PyCOMPSs experiments into Chameleon (i.e. with third-party software dependencies).&lt;/li>
&lt;/ul></description></item><item><title>BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uci/benchmarkst/</link><pubDate>Sat, 17 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uci/benchmarkst/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> bioinformatics, spatial transcriptomics, gene imputation, benchmarking, cross-platform/species analysis&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong>
&lt;ul>
&lt;li>Proficient in Python and/or R, commonly used in bioinformatics.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong>
&lt;ul>
&lt;li>Experience with statistical data analysis and machine learning models.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (not required but preferred):&lt;/strong>
&lt;ul>
&lt;li>Proficiency in bioinformatics and computational biology.&lt;/li>
&lt;li>Familiarity with spatial transcriptomics datasets and platforms.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours). Given the scope of integrating multi-platform, multi-species datasets and the complexity of benchmarking gene imputation methods, this project is substantial. It requires extensive data preparation, analysis, and validation phases, making it suitable for a larger time investment.&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>The orchestration of cellular life is profoundly influenced by the precise control of gene activation and silencing across different spatial and temporal contexts. Understanding these complex spatiotemporal gene expression patterns is vital for advancing our knowledge of biological processes, from development and disease progression to adaptation. While single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to profile gene expression across thousands of cells simultaneously, its requirement for cell dissociation strips away the critical spatial context, limiting our comprehension of cellular interactions within their native environments. Recent strides in spatial transcriptomics have started to bridge this gap by enabling spatially resolved gene expression measurements at single-cell or even sub-cellular resolutions. These advancements offer unparalleled opportunities to delineate the intricate tapestry of gene expression within tissues, shedding light on the dynamic interactions between cells and their surroundings.&lt;/p>
&lt;p>Despite these technological advances, a significant challenge remains: the datasets generated by spatial transcriptomic technologies are often incomplete, marred by missing gene expression values due to various technical and biological constraints. This limitation severely impedes our ability to fully interpret these rich datasets and extract meaningful insights from them. Gene imputation emerges as a pivotal solution to this problem, aiming to fill in these missing data points, thereby enhancing the resolution, quality, and interpretability of spatial transcriptomic datasets.&lt;/p>
&lt;p>Recognizing the critical importance of this task, there is a pressing need for a unified benchmarking platform that can facilitate the evaluation and comparison of gene imputation methods across a diverse array of samples, spanning multiple sampling platforms, species, and organs. Currently, the bioinformatics and spatial transcriptomics fields lack such a standardized framework, hindering progress and innovation. To address this gap, our project aims to establish a comprehensive gene imputation dataset that encompasses a wide range of conditions and parameters. We intend to reproduce known methods and assess their efficacy, providing a solid and reproducible foundation for future advancements in this domain.&lt;/p>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>A comprehensive, preprocessed benchmark dataset that spans multiple sampling platforms, species, and organs, aimed at standardizing gene imputation tasks in spatial transcriptomics.&lt;/li>
&lt;li>An objective comparison of state-of-the-art gene imputation methodologies, enhancing the understanding of their performance and applicability across diverse biological contexts.&lt;/li>
&lt;li>A user-friendly Python package offering a suite of gene imputation tools, designed to fulfill the research needs of the spatial transcriptomics community by improving data completeness and reproducibility.&lt;/li>
&lt;/ul></description></item><item><title>ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/osu/scalerep/</link><pubDate>Sat, 10 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/osu/scalerep/</guid><description>&lt;p>&lt;strong>Topics:&lt;/strong> Distributed systems, Scalability, Bug analysis, Bug reproducibility&lt;br>
&lt;strong>Skills:&lt;/strong> Java, Python, bash scripting, perf, Linux internals&lt;br>
&lt;strong>Difficulty:&lt;/strong> Hard&lt;br>
&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;br>
&lt;strong>Mentors:&lt;/strong> &lt;strong>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bogdan-bo-stoica/">Bogdan &amp;quot;Bo&amp;quot; Stoica&lt;/a> (contact person)&lt;/strong>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a>&lt;/p>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>Large-scale distributed systems are integral to the infrastructure of a wide range of applications and services.
The continuous evolution of these systems requires ongoing efforts to address inherent faults which span a variety of issues including availability, consistency, concurrency, configuration, durability, error-handling, integrity, performance, and security.
Recent developments in the field and the rise of cloud computing have been marked by a notable increase in the scale at which such systems operate.&lt;/p>
&lt;p>This increase in scale introduces specific challenges, particularly in terms of system reliability and performance.
As distributed systems expand beyond single machines, addressing the growing demands for computation, memory and storage becomes more difficult.
This underlying complexity leads to the emergence of scalability bugs — defects that surface in large-scale deployments, yet do not reveal themselves in a small-scale setting.&lt;/p>
&lt;p>To better understand scalability bugs, we set out to investigate a set of scalability issues documented over the last 5 years from 10 popular open-source large-scale systems.
These bugs have led to significant operational challenges, such as system downtime, reduced responsiveness, data loss, and data corruption.
Moreover, addressing them required extensive collaboration and problem-solving efforts among engineers and bug reporters, with discussions often spanning a month or more.&lt;/p>
&lt;p>We observed that traditional bug finding techniques are insufficient for detecting scalability bugs since these defects are triggered by a mixture of scale-related aspects not properly investigated by previous approaches.
These characteristics include the number of components involved, the system load and workload size, the reliability of recovery protocols, and the magnitude of intermediate failures.
Although previous research examined some of these aspects, it has typically done so either in isolation (individually), or without providing a comprehensive understanding of the fundamental bug patterns, symptoms, root causes, fixes, and, more importantly, how easily these bugs can be reproduced in-house.&lt;/p>
&lt;p>Therefore, the main goal of this project is to systematically understand, characterize, and document the challenges associated with scalability bugs at large.
Our approach is twofold: first, to analyze scalability bugs in terms of reproducibility, and second, to develop methodologies for triggering them and measuring their impact.
Specifically, we aim to:&lt;/p>
&lt;ol>
&lt;li>Provide detailed accounts of bug reproduction experiences for a diverse set of recently reported scalability bugs from our benchmark applications;&lt;/li>
&lt;li>Identify specific challenges that prevent engineers from reproducing certain scalability bugs and investigate how prevalent these obstacles are;&lt;/li>
&lt;li>Create a suite of protocols to effectively trigger and quantify the impact of scalability bugs, facilitating their investigation in smaller-scale environments.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>A set of Trovi replayable artifacts enabling other researchers to easily reproduce scalability bugs for our benchmark applications;&lt;/li>
&lt;li>A set of Jupyter notebooks that let others conveniently replay each step in our investigation;&lt;/li>
&lt;li>A detailed breakdown of the challenges faced when reproducing scalability bugs and how these obstacles differ from those related to more “traditional” types of bugs.&lt;/li>
&lt;/ul></description></item><item><title>GPEC: An Open Emulation Platform to Evaluate GPU/ML Workloads on Erasure Coding Storage</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lanl/gpec/</link><pubDate>Thu, 08 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lanl/gpec/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage Systems, Machine Learning, Erasure Coding&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python, PyTorch, Bash scripting, Linux, Erasure Coding, Machine Learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/meng-wang/">Meng Wang&lt;/a> (primary contact), &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Large-scale data centers store immense amounts of user data across a multitude of disks, necessitating redundancy strategies like erasure coding (EC) to safeguard against disk failures. Numerous research efforts have sought to assess the performance and durability of various erasure coding approaches, including single-level erasure coding, locally recoverable coding, and multi-level erasure coding.&lt;/p>
&lt;p>Despite the widespread adoption of erasure coding, a significant research gap exists regarding the performance of large-scale erasure-coded storage systems when exposed to machine learning (ML) workloads. While conventional practice often leans towards replication for enhanced performance, this project seeks to explore whether cost-effective erasure coding can deliver comparable performance. In this context, several fundamental questions remain unanswered, including:&lt;/p>
&lt;ul>
&lt;li>Can a typical erasure-coded storage system deliver sufficient throughput for ML training tasks?&lt;/li>
&lt;li>Can an erasure-coded storage system maintain low-latency performance for ML training and inference workloads?&lt;/li>
&lt;li>How do disk failure and subsequent repair impact the throughput and latency of ML workloads?&lt;/li>
&lt;li>What influence do various erasure coding design choices, such as chunk placement strategies and repair methods, have on the aforementioned performance metrics?&lt;/li>
&lt;/ul>
&lt;p>To address these questions, the most straightforward approach would involve running ML workloads on large-scale erasure coded storage systems within HPC data centers. However, this presents challenges for researchers and students due to limited access to expensive GPUs and distributed storage systems, especially when dealing with large-scale evaluations. Consequently, there is a need for a cost-effective evaluation platform.&lt;/p>
&lt;p>The objective of this project is to develop an open-source platform that facilitates cheap and reproducible evaluations of erasure-coded storage systems concerning ML workloads. This platform consists of two key components:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>GPU Emulator&lt;/strong>: This emulator is designed to simulate GPU performance for ML workloads. Development of the GPU emulator is near completion.&lt;/li>
&lt;li>&lt;strong>EC Emulator&lt;/strong>: This emulator is designed to simulate the performance characteristics of erasure-coded storage systems. It is still in the exploratory phase and requires further development.&lt;/li>
&lt;/ul>
&lt;p>The student&amp;rsquo;s responsibilities will include documenting the GPU emulator, progressing the development of the EC emulator, and packaging the experiments to ensure easy reproducibility. It is anticipated that this platform will empower researchers and students to conduct cost-effective and reproducible evaluations of large-scale erasure-coded storage systems in the context of ML workloads.&lt;/p>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Build an EC emulator to emulate the performance characteristics of large-scale erasure-coded storage systems&lt;/li>
&lt;li>Integrate the EC emulator with the ML workloads and the GPU emulator&lt;/li>
&lt;li>Conduct reproducible experiments to evaluate the performance of erasure-coded storage systems in the context of ML workloads&lt;/li>
&lt;li>Publish a Trovi artifact shared on Chameleon Cloud and a GitHub repository with open-source code&lt;/li>
&lt;/ul></description></item><item><title>LAST: Let’s Adapt to System Drift</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/last/</link><pubDate>Wed, 07 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/anl/last/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Computer systems, machine learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch, Bash scripting, Linux, Data Science and Machine Learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ray-andrew-sinurat/">Ray Andrew Sinurat&lt;/a> (primary contact), &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sandeep-madireddy/">Sandeep Madireddy&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The performance of computer systems is constantly evolving, a natural outcome of updating hardware, improving software, and encountering hardware quirks over time. At the same time, machine learning (ML) models are becoming increasingly popular. They are being used widely to address various challenges in computer systems, notably in speeding up decision-making. This speed is vital for a quick and flexible response, essential for meeting service-level agreements (SLAs). Yet, an interesting twist has emerged: like the computer systems they aid, ML models also experience a kind of &amp;ldquo;aging.&amp;rdquo; This results in a gradual decline in their effectiveness, a consequence of changes in their operating environment.&lt;/p>
&lt;p>The phenomenon of model &amp;ldquo;aging&amp;rdquo; is a ubiquitous occurrence across various domains, not limited merely to computer systems. This process of aging can significantly impact the performance of a model, emphasizing the critical importance of early detection mechanisms to maintain optimal functionality. In light of this, numerous strategies have been formulated to mitigate the aging of models. However, the generalizability and effectiveness of these strategies across diverse domains, particularly in computer systems, remain largely unexplored. This research aims to bridge this gap by designing and implementing a comprehensive data analysis pipeline. The primary objective is to evaluate the efficacy of various strategies through a comparative analysis, focusing on their performance in detecting and addressing model aging. To achieve a better understanding of this issue, the research will address the following pivotal questions:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Data-Induced Model Aging&lt;/strong>: What specific variations within the data can precipitate the aging of a model? Understanding the nature and characteristics of data changes that lead to model deterioration is crucial for developing effective prevention and mitigation strategies.&lt;/li>
&lt;li>&lt;strong>Efficacy of Aging Detection Algorithms&lt;/strong>: How proficient are the current algorithms in identifying the signs of model aging? Assessing the accuracy and reliability of these algorithms will provide insights into their practical utility in real-world scenarios.&lt;/li>
&lt;li>&lt;strong>Failure Points in Detection&lt;/strong>: In what scenarios or under what data conditions do the aging detection mechanisms fail? Identifying the limitations and vulnerabilities of these algorithms is vital for refining their robustness and ensuring comprehensive coverage.&lt;/li>
&lt;li>&lt;strong>Scalability and Responsiveness&lt;/strong>: How do these algorithms perform in terms of robustness and speed, particularly when subjected to larger datasets? Evaluating the scalability and responsiveness of the algorithms will determine their feasibility and effectiveness in handling extensive and complex datasets, a common characteristic in computer systems.&lt;/li>
&lt;/ul>
&lt;p>To better understand and prevent issues related to model performance, our approach involves analyzing various datasets, both system and non-system, that have shown notable changes over time. We aim to apply machine learning (ML) models to these datasets to assess the effects of these changes on model performance. Our goal is to leverage more advanced ML techniques to create new algorithms that address these challenges effectively. This effort is expected to contribute significantly to the community, enhancing the detection of model aging and improving model performance in computer systems.&lt;/p>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Run the pipeline on several computer-systems and non-computer-systems datasets&lt;/li>
&lt;li>A Trovi artifact for data preprocessing and model training shared on Chameleon Cloud&lt;/li>
&lt;li>A GitHub repository containing the pipeline source code&lt;/li>
&lt;/ul></description></item><item><title>Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/</link><pubDate>Tue, 06 Feb 2024 15:00:00 -0500</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/</guid><description>&lt;p>At the heart of evaluating reproducibility is a judgment about whether
two results are indeed
the same. This can be complicated in the context of data visualization due to
rapidly evolving technology and differences in how users perceive the results.
First, due to the rapid evolution of libraries including web technologies,
visualizations created in the past may look different when rendered in the future.
Second, as the goal of data visualization is communicating data to people,
different people may perceive visualizations in a different way.
Thus, when a reproduced visualization does not exactly match the original, judging
whether they are &amp;ldquo;similar enough&amp;rdquo; is complicated by these factors. For example,
changes in a colormap may be deemed minor by a computer but could lead people to different
understandings of the data. The goals of this research are to capture visualizations in a way that
allows their reproducibility to be evaluated and to develop methods to categorize the differences
when a reproduced visualization differs from the original.&lt;/p>
&lt;h3 id="investigate-solutions-for-capturing-visualizations">Investigate Solutions for Capturing Visualizations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Reproducibility, Data Visualization&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python and/or JavaScript, Data Visualization Tools&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of this project is to investigate, augment, and/or develop solutions to capture
visualizations that appear in formats including websites and Jupyter notebooks.
In &lt;a href="https://github.com/simprov/simprov" target="_blank" rel="noopener">past work&lt;/a>, we implemented methods
to capture thumbnails as users interacted with visualizations. Other solutions
can be used to capture interactive visualizations. We wish to understand
the feasibility of recording such visualizations and their utility in
evaluating reproducibility in the future.&lt;/p>
&lt;h5 id="specific-tasks">Specific tasks:&lt;/h5>
&lt;ul>
&lt;li>Evaluate tools for capturing static visualizations on the web&lt;/li>
&lt;li>Investigate tools for capturing dynamic visualizations on the web&lt;/li>
&lt;li>Investigate how data including code or metadata can be captured with visualizations&lt;/li>
&lt;li>Augment or develop tools to aid in capturing reproducible visualizations&lt;/li>
&lt;/ul>
&lt;h3 id="categorize-differences-in-reproduced-visualizations">Categorize Differences in Reproduced Visualizations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Reproducibility, Data Visualization&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python and/or JavaScript, Data Visualization Tools&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate/Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of this project is to organize types of differences in reproduced visualizations and create tools to detect them. Publications and computational notebooks record renderings of visualizations.
When they also include the code to reproduce the visualization, we can
regenerate it and compare the two renderings. Often, the reproduced visualization does
not match the original (see examples in this &lt;a href="https://arxiv.org/abs/2308.06894" target="_blank" rel="noopener">manuscript&lt;/a>).
This project seeks to categorize the types of differences
that can occur and to start understanding how they impact judgments of reproducibility.&lt;/p>
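&lt;p>As a starting point for comparing two renderings, here is a minimal sketch of a pixel-level difference measure. It assumes both images have already been decoded into equally sized grids of RGB tuples; the function name, the &lt;code>tolerance&lt;/code> parameter, and the toy images are illustrative assumptions, not part of any existing tool.&lt;/p>
&lt;pre>
```python
def changed_pixel_fraction(original, reproduced, tolerance=0):
    """Return the fraction of pixels whose RGB values differ by more than tolerance.

    Both images are lists of rows, each row a list of (r, g, b) tuples.
    This is a hypothetical helper for illustration only.
    """
    if len(original) != len(reproduced):
        raise ValueError("images must have the same height")
    total = 0
    changed = 0
    for row_a, row_b in zip(original, reproduced):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            # A pixel counts as changed if any channel differs by more than tolerance.
            if any(abs(a - b) > tolerance for a, b in zip(pixel_a, pixel_b)):
                changed += 1
    return changed / total if total else 0.0


# Toy 2x2 images: one pixel shifts from red to orange (a colormap-like change).
original = [[(255, 0, 0), (0, 0, 255)], [(0, 0, 255), (0, 0, 255)]]
reproduced = [[(255, 128, 0), (0, 0, 255)], [(0, 0, 255), (0, 0, 255)]]
fraction = changed_pixel_fraction(original, reproduced)
print(fraction)
```
&lt;/pre>
&lt;p>A measure like this reports how much changed but not whether the change alters what a reader takes away from the figure, which is the gap the categorization work is meant to address.&lt;/p>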
&lt;h5 id="specific-tasks-1">Specific tasks:&lt;/h5>
&lt;ul>
&lt;li>Evaluate and/or develop tools to compare two visualizations&lt;/li>
&lt;li>Evaluate the utility of artificial intelligence solutions&lt;/li>
&lt;li>Organize and categorize the detected differences&lt;/li>
&lt;li>Develop tools to determine the types or categories of differences present in two visualizations&lt;/li>
&lt;/ul></description></item><item><title>FSA: Benchmarking Fail-Slow Algorithms</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/failslowalgorithms/</link><pubDate>Tue, 06 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/failslowalgorithms/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage systems, machine learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch, Bash scripting, Linux, Machine Learning modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ruidan-li/">Ruidan Li&lt;/a> (primary contact), &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kexin-pei/">Kexin Pei&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>In the realm of modern applications, achieving not only low but also predictable response times is a critical requirement. Performance instability, even when it amounts to just a few milliseconds of delay, can result in violations of Service Level Objectives (SLOs). Redundancy at the RAID group level provides a layer of protection; however, the early identification of potential slowdowns or failures is paramount in minimizing their impact on overall system latency.&lt;/p>
&lt;p>Fail-Slow represents a unique type of fault within storage systems, characterized by the system&amp;rsquo;s ability to continue functioning while progressively deteriorating – its performance significantly drops below expected levels. Notably, fail-slow conditions are responsible for a considerable share of latency tails. Detecting fail-slow faults is particularly challenging, as they can be easily masked by the normal fluctuations in performance. Consequently, the identification of fail-slow faults is a critical area of research, demanding meticulous attention.&lt;/p>
&lt;p>Several strategies have been developed to address the fail-slow issue, yet the question of their broad applicability remains. We plan to implement and assess various existing fail-slow detection algorithms, examining their strengths and weaknesses. Our analysis will concentrate on key questions:&lt;/p>
&lt;ul>
&lt;li>How promptly can the algorithm identify a fail-slow symptom?&lt;/li>
&lt;li>What methods does the algorithm employ to accurately distinguish fail-slow incidents, thereby minimizing false negatives?&lt;/li>
&lt;li>Through what approach does the algorithm achieve the right sensitivity level to keep false positives in check?&lt;/li>
&lt;/ul>
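&lt;p>These three questions map onto three measurable quantities: detection delay, false negative rate, and false positive rate. The sketch below shows one way they might be computed. It assumes per-interval ground-truth labels and detector alarms are available as boolean lists; the function name and the toy trace are hypothetical, and the actual evaluation may define these metrics differently.&lt;/p>
&lt;pre>
```python
def evaluate_detector(labels, alarms):
    """Score per-interval alarms against ground-truth fail-slow labels.

    labels and alarms are equal-length lists of booleans, one per time interval.
    Returns (detection_delay, false_negative_rate, false_positive_rate).
    detection_delay is the number of intervals between the first true fail-slow
    interval and the first alarm at or after it (None if never detected).
    """
    if len(labels) != len(alarms):
        raise ValueError("labels and alarms must have the same length")
    pairs = list(zip(labels, alarms))
    true_pos = sum(1 for label, alarm in pairs if label and alarm)
    false_neg = sum(1 for label, alarm in pairs if label and not alarm)
    false_pos = sum(1 for label, alarm in pairs if alarm and not label)
    true_neg = sum(1 for label, alarm in pairs if not label and not alarm)

    positives = true_pos + false_neg
    negatives = false_pos + true_neg
    fnr = false_neg / positives if positives else 0.0
    fpr = false_pos / negatives if negatives else 0.0

    delay = None
    if True in labels:
        onset = labels.index(True)
        for offset, alarm in enumerate(alarms[onset:]):
            if alarm:
                delay = offset
                break
    return delay, fnr, fpr


# Toy trace: the device becomes fail-slow at interval 4; the hypothetical
# detector raises one spurious alarm at interval 1 and reacts at interval 6.
labels = [False, False, False, False, True, True, True, True]
alarms = [False, True, False, False, False, False, True, True]
delay, fnr, fpr = evaluate_detector(labels, alarms)
print(delay, fnr, fpr)
```
&lt;/pre>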
&lt;p>This evaluation aims to shed light on the effectiveness of current methodologies in detecting fail-slow faults, crucial for enhancing system reliability and performance.&lt;/p>
&lt;p>Building upon our evaluation of several fail-slow detection algorithms, our objective is to harness advanced machine learning (ML) models to develop a novel algorithm. This initiative seeks to address and potentially compensate for the identified weaknesses in existing methodologies. By focusing on the critical aspects of early detection, accurate differentiation, and optimal sensitivity, we aim to create a solution that reduces both false negatives and false positives, thereby enhancing overall system reliability. This approach represents a strategic effort to not only advance the current state of fail-slow detection but also to contribute significantly to the resilience and performance of storage systems.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>A Trovi artifact for the existing Fail-Slow detection algorithms on Chameleon Cloud&lt;/li>
&lt;li>A GitHub repository containing the full evaluation results&lt;/li>
&lt;li>A Google Colab notebook for quick replay&lt;/li>
&lt;/ul></description></item><item><title>OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ornl/openmlec/</link><pubDate>Mon, 05 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ornl/openmlec/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage Systems, Erasure Coding&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Java, Bash scripting, Linux, HDFS, ZFS, Erasure Coding&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/meng-wang/">Meng Wang&lt;/a> (&lt;a href="mailto:wangm12@uchicago.edu">Main contact person&lt;/a>) and Anjus George&lt;/li>
&lt;/ul>
&lt;p>Multi-Level Erasure Coding (MLEC), which performs erasure coding at both network and local levels, has seen large deployments in practice. Our recent research work has shown that MLEC can provide high durability with higher encoding throughput and less repair network traffic compared to other erasure coding methods. This makes MLEC particularly appealing for large-scale data centers, especially high-performance computing (HPC) systems.&lt;/p>
&lt;p>However, current MLEC systems often rely on straightforward design choices, such as Clustered/Clustered (C/C) chunk placement and the Repair-All (RALL) method for catastrophic local failures. Our recent simulations [1] have revealed the potential benefits of more complex chunk placement strategies like Clustered/Declustered (C/D), Declustered/Clustered (D/C), and Declustered/Declustered (D/D). Additionally, advanced repair methods such as Repair Failed Chunks Only (RFCO), Repair Hybrid (RHYB), and Repair Minimum (RMIN) have shown promise for improving durability and performance according to our simulations. Despite promising simulation results, these optimized design choices have not been implemented in real systems.&lt;/p>
&lt;p>In this project, we propose to develop open-source MLEC implementations in real systems, offering a range of design choices from simple to complex. Our approach leverages ZFS for local-level erasure coding and HDFS for network-level erasure coding, supporting both clustered and declustered chunk placement at each level. The student&amp;rsquo;s responsibilities include setting up HDFS on top of ZFS, configuring various MLEC chunk placements (e.g., C/D, D/C, D/D), and implementing advanced repair methods within HDFS and ZFS. The project will culminate in reproducible experiments to evaluate the performance of MLEC systems under different design choices.&lt;/p>
&lt;p>We will open-source our code and aim to provide valuable insights to the community on optimizing erasure-coded systems. Additionally, we will provide comprehensive documentation of our work and share Trovi artifacts on Chameleon Cloud to facilitate easy reproducibility of our experiments.&lt;/p>
&lt;p>[1] Meng Wang, Jiajun Mao, Rajdeep Rana, John Bent, Serkay Olmez, Anjus George, Garrett Wilson Ransom, Jun Li, and Haryadi S. Gunawi. Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers. In The International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’23), 2023.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Open-source MLEC implementations with a diverse range of design choices.&lt;/li>
&lt;li>Configuration setup for HDFS on top of ZFS, supporting various MLEC chunk placements.&lt;/li>
&lt;li>Implementation of advanced repair methods within HDFS and ZFS.&lt;/li>
&lt;li>Reproducible experiments to assess the performance of MLEC systems across distinct design choices.&lt;/li>
&lt;li>Comprehensive documentation of the project and the provision of shared Trovi artifacts on Chameleon Cloud for ease of reproducibility.&lt;/li>
&lt;/ul></description></item><item><title>EdgeRep: Reproducing and benchmarking edge analytic systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/edgerep/</link><pubDate>Fri, 02 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/edgerep/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> video analytics, machine learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch, Bash scripting, Linux, Machine Learning modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a> (contact person), &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/junchen-jiang/">Junchen Jiang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>With the flourishing of ideas like smart cities and smart manufacturing, a
massive number of edge devices (e.g., traffic or security cameras,
thermometers, flood sensors, etc.) are deployed and connected to the network.
These devices collect and analyze data across space and time, aiding
stakeholders like city governments and manufacturers in optimizing their plans
and operations. However, the sheer number of edge devices and the large amount
of communication among the devices and central servers raise significant
challenges in how to manage and schedule resources. This includes network
bandwidth between the devices and computing power on both edge devices and bare
metal servers, all to maintain the reliable service capability of running
applications.&lt;/p>
&lt;p>Moreover, given the limited resources available to edge devices, there&amp;rsquo;s an
emerging trend to reduce average compute and/or bandwidth usage. This is
achieved by leveraging the uneven distribution of interesting events with
respect to both time and space in the input data. This, in turn, introduces
further challenges in provisioning and managing the amount of resources
available to edge devices. The resource demands of running applications can
greatly depend on the input data, which is both dynamic and unpredictable.&lt;/p>
&lt;p>Keeping these challenges in mind, the team previously designed and implemented
a dynamic resource manager capable of understanding the applications and making
decisions based on this understanding at runtime. However, such a resource
manager has only been tested with a limited number and variety of video analytics
applications. Thus, through the OSRE24 project, we aim to:&lt;/p>
&lt;ul>
&lt;li>Collect a wide range of videos to form a comprehensive video dataset&lt;/li>
&lt;li>Reproduce other state-of-the-art self-adaptive video analytics applications&lt;/li>
&lt;li>Package the dataset and the applications and publish them on the Chameleon
Trovi site&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Collect a wide range of videos to form a comprehensive video dataset&lt;/li>
&lt;li>Reproduce other state-of-the-art self-adaptive video analytics applications&lt;/li>
&lt;li>Package the dataset and the applications and publish them on the Chameleon
Trovi site&lt;/li>
&lt;/ul></description></item><item><title>FEP-Bench: Benchmarks for understanding feature engineering and preprocessing bottlenecks</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ibm/fep-bench/</link><pubDate>Fri, 02 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ibm/fep-bench/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> storage system, scheduling, distributed system, machine learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch, Bash scripting, Linux, Machine Learning modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yuyang-roy-huang/">Yuyang (Roy) Huang&lt;/a> (contact person), &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/swami-sundararaman/">Swami Sundararaman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>In the realm of machine learning (ML), preprocessing of data is a critical yet
often underappreciated phase, consuming approximately 80% of the time in common
ML tasks. This extensive time consumption can be attributed to various
challenges encountered from both data and computation perspectives.&lt;/p>
&lt;p>From the data side, one significant challenge is the slow retrieval of data
from data lakes, which are storage repositories that hold a vast amount of raw
data in its native format. Extracting this data can be
slow, causing computation cycles to wait for data arrival and leading to delays
in the entire preprocessing phase. Furthermore, the size of the data often
exceeds the memory capacity of standard computing systems. This is a frequent
occurrence in ML, as datasets are typically large and complex. Handling such
large datasets requires sophisticated memory management techniques to ensure
efficient preprocessing without overwhelming the system&amp;rsquo;s memory.&lt;/p>
&lt;p>On the computation side, a naive implementation of data operations, especially
aggregation, often leads to inefficiencies. These operations may require
grouping a large chunk of data as a prerequisite before performing any actual
computation. This grouping, without careful configuration and management, can
trigger serious data shuffling, leading to extensive remote data movement when
the data is distributed across various storage systems. Such data movement is
not only time-consuming but also resource-intensive.&lt;/p>
&lt;p>To mitigate these challenges, there is a pressing need to design better
caching, prefetching, and heuristic strategies for data preprocessing. The team
aims to significantly reduce the time and resources required for preprocessing
by optimizing data retrieval and computational processes.&lt;/p>
&lt;p>However, prior to the design and implementation of such a system, a systematic
understanding of the preprocessing workflow is essential. Hence, throughout the
program, the students will need to:&lt;/p>
&lt;ul>
&lt;li>Understand the current system used to preprocess data for ML training, for
example, Hadoop or Spark.&lt;/li>
&lt;li>Collect the common datasets used for different types of ML models.&lt;/li>
&lt;li>Collect the typical operations used for preprocessing these datasets.&lt;/li>
&lt;li>Benchmark the performance of these operations in existing frameworks
under various experimental settings.&lt;/li>
&lt;li>Package the benchmark such that the team can later use it for reproduction or
evaluation.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand the current system used to preprocess data for ML training, for
example, Hadoop or Spark.&lt;/li>
&lt;li>Collect the common datasets used for different types of ML models.&lt;/li>
&lt;li>Collect the typical operations used for preprocessing these datasets.&lt;/li>
&lt;li>Benchmark the performance of these operations in existing frameworks
under various experimental settings.&lt;/li>
&lt;li>Package the benchmark such that the team can later use it for reproduction or evaluation.&lt;/li>
&lt;/ul></description></item><item><title>FetchPipe: Data Science Pipeline for ML-based Prefetching</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fetchpipe/</link><pubDate>Fri, 02 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uchicago/fetchpipe/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage systems, machine learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python, PyTorch, Bash scripting, Linux, Machine Learning modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniar-h.-kurniawan/">Daniar H. Kurniawan&lt;/a> (primary contact), Haryadi Gunawi&lt;/li>
&lt;/ul>
&lt;p>The contemporary landscape of high-performance servers, particularly those designed for data centers and AI/ML training, prominently features solid-state drives (SSDs) and spinning disks (HDDs) as primary storage devices. These components play a crucial role in shaping overall system performance, underscoring the importance of addressing and minimizing Input/Output (I/O) latency. This is particularly crucial given the widespread adoption of hybrid storage systems, where caching and prefetching strategies are instrumental in optimizing storage performance. Caching involves using faster but less dense memory to store frequently accessed data, while prefetching aims to reduce latency by fetching data from slower memory to cache before it is needed. Although both caching and prefetching present valid challenges, our primary emphasis is on the prefetching problem due to the inherent difficulty in predicting future access.&lt;/p>
&lt;p>Traditional prefetchers, dating back one to two decades, rely heavily on predefined rules based on logical block address (LBA) access sequences, which limits their adaptability to complex scenarios. For instance, the read-ahead prefetcher is confined to prefetching the next data item within a file for faster sequential access. To address this limitation, recent work has introduced learning-based methods, such as the Long Short-Term Memory (LSTM) techniques DeepPrefetcher and Delta LSTM, which model the LBA delta to cover a broader range of LBAs. However, these methods still struggle to achieve high accuracy when the workload pattern changes drastically. Some more sophisticated prefetchers can learn complex I/O access patterns using graph structures, but their computational cost makes them hard to deploy.&lt;/p>
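&lt;p>To make the idea of modeling LBA deltas concrete, the sketch below converts an access trace into deltas and predicts the next address from the most common recent delta. The frequency rule is a deliberately simple stand-in for a learned model and is not how DeepPrefetcher or Delta LSTM work; the function names, the &lt;code>history&lt;/code> parameter, and the toy trace are assumptions made for illustration.&lt;/p>
&lt;pre>
```python
from collections import Counter


def lba_deltas(lbas):
    """Convert an LBA access sequence into consecutive deltas."""
    return [b - a for a, b in zip(lbas, lbas[1:])]


def predict_next_lba(lbas, history=4):
    """Predict the next LBA from the most common delta in the recent history.

    This frequency-based rule stands in for a learned model (for example,
    an LSTM over deltas); it is a hypothetical baseline, not a published method.
    """
    deltas = lba_deltas(lbas)[-history:]
    if not deltas:
        # With no history, fall back to read-ahead: the next block.
        return lbas[-1] + 1
    most_common_delta, _count = Counter(deltas).most_common(1)[0]
    return lbas[-1] + most_common_delta


# Toy strided trace: every access jumps 8 blocks ahead.
trace = [100, 108, 116, 124, 132]
print(lba_deltas(trace))
print(predict_next_lba(trace))
```
&lt;/pre>
&lt;p>Working on deltas rather than absolute addresses lets one rule cover a strided pattern anywhere on the device, which plain read-ahead would miss.&lt;/p>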
&lt;p>In this project, our goal is to provide an end-to-end data science pipeline to empower the research on ML-based prefetchers. We believe that this pipeline is crucial for fostering active collaboration between the ML community and storage systems researchers. This collaboration aims to optimize existing ML-based prefetching solutions. Specifically, we will provide the dataset for training/testing and some samples of ML-based models that can further be developed by the community. Furthermore, we will also provide a setup for evaluating the ML model when deployed in storage systems.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Compile I/O traces from various open traces and open systems.&lt;/li>
&lt;li>Develop a pipeline for building ML-based prefetching solutions.&lt;/li>
&lt;li>Build a setup to evaluate the model in a real hybrid storage system.&lt;/li>
&lt;li>Publish a Trovi artifact shared on Chameleon Cloud and a GitHub repository&lt;/li>
&lt;/ul></description></item><item><title>Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uga/genomicswf/</link><pubDate>Fri, 02 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uga/genomicswf/</guid><description>&lt;p>&lt;strong>Project Idea description&lt;/strong>&lt;/p>
&lt;p>We aim to characterize the performance of genomic workflows on HPC clusters by conducting two research activities using a broad set of state-of-the-art genomic applications and open-source datasets.&lt;/p>
&lt;p>&lt;strong>Performance Benchmarking and Characterizing Genomic Workflows:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: High Performance Computing (HPC), Data Analysis, Scientific Workflows&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux, Python, Bash Scripting, Data Science Toolkit, Kubernetes, Container Orchestration, Genomics Applications (e.g. BWA, FastQC, Picard, GATK, STAR)&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/in-kee-kim/">In Kee Kim&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>In this activity, students will perform comprehensive performance measurements of genomic data processing on HPC clusters using state-of-the-art applications, workflows, and real-world datasets. They will collect and package datasets for I/O, memory, and compute utilization using industry-standard tools and best practices. Measurement will be done using Kubernetes container orchestration on a multi-node cluster to achieve scalability, with either a custom-made metrics collection system or integration of existing industry-standard tools (e.g., Prometheus).&lt;/p>
&lt;p>&lt;strong>Quantifying Performance Interference and Assessing Their Impact on Workflow Execution Time:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Machine Learning, Data Analysis, and Scientific Workflows and Computations&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux, Python, Bash Scripting, Data Science Toolkit, Kubernetes, Container Orchestration&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/in-kee-kim/">In Kee Kim&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>In this activity, students will measure the slowdown of various applications due to resource contention (e.g., CPU and I/O). Students will analyze whether an application is compute-bound, I/O-bound, or both, then analyze the correlation between resource utilization and execution time. Following that, students will assess the impact of per-application slowdown on the slowdown of a whole workflow. To the best of our knowledge, this will be the first study that systematically quantifies per-application interference when running genomics workflows on an HPC cluster.&lt;/p>
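&lt;p>The correlation step can be illustrated with a small sketch that computes slowdown relative to an uncontended run and the Pearson correlation between contention level and execution time. The measurements below are made-up placeholders, and the function and variable names are assumptions; real experiments would substitute collected metrics.&lt;/p>
&lt;pre>
```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) in (0, 1):
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical data: fraction of CPU taken by a co-located job, and the
# resulting execution time (seconds) of one workflow step.
cpu_contention = [0.0, 0.25, 0.5, 0.75]
runtime_seconds = [100.0, 110.0, 120.0, 130.0]

# Slowdown is execution time relative to the uncontended run.
slowdowns = [t / runtime_seconds[0] for t in runtime_seconds]
r = pearson_correlation(cpu_contention, runtime_seconds)
print(slowdowns)
print(round(r, 6))
```
&lt;/pre>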
&lt;p>For both subprojects, all experiments will be conducted in a reproducible manner (e.g., as a Trovi package or Chameleon VM images), and all code will be open-sourced (e.g., shared in a public GitHub repo).&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>:&lt;/p>
&lt;p>A GitHub repository and/or Chameleon VM image containing source code for application executions &amp;amp; metrics collection.
Jupyter notebooks and/or Trovi artifacts containing analysis and mathematical models for application resource utilization &amp;amp; the effects of data quality.&lt;/p></description></item><item><title>StatWrap</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/northwestern/statwrap/</link><pubDate>Wed, 24 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/northwestern/statwrap/</guid><description>&lt;p>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap&lt;/a> is a free and open-source assistive, non-invasive discovery and inventory tool to document research projects. It inventories project assets (e.g., code files, data files, manuscripts, documentation) and organizes information without additional input from the user. It also provides structure for users to add searchable and filterable notes connected to files to help communicate metadata about intent and analysis steps.&lt;/p>
&lt;p>At its core, StatWrap helps investigators identify and track changes in a research project as it evolves - which may affect reproducibility. For example: (1) people on the project can change over time, so processes may not be consistently executed due to transitions in employment; (2) data changes over time, due to accruing additional cases, adding new variables, or correcting mistakes in existing data; (3) software (e.g. used for data preparation and statistical analysis) evolves as it is edited, improved, and optimized; and (4) software can break or produce different results due to changes &amp;lsquo;under the hood&amp;rsquo; such as updates to statistical packages, compilers, or interpreters. StatWrap passively and actively documents these changes to support reproducibility.&lt;/p>
&lt;p>Additional information:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap home&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/stattag/statwrap" target="_blank" rel="noopener">StatWrap code (GitHub)&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="reproducibility-checklists">Reproducibility Checklists&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>reproducibility&lt;/code>, &lt;code>user interface&lt;/code>, &lt;code>checklists&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: JavaScript, React&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of this project is to develop support within StatWrap to generate customizable reproducibility checklists. The developer will use the metadata and user input collected by StatWrap to automatically generate checklists. This functionality will allow investigators to automatically generate a document indicating what practices they&amp;rsquo;ve followed to support reproducibility. Part of the project will involve surveying proposed reproducibility checklists and considering what to implement in StatWrap. This work will take a systematic approach to documenting reproducibility, much like PRISMA checklists for systematic reviews or CONSORT checklists for clinical trials.&lt;/p>
&lt;p>The specific tasks of the project include:&lt;/p>
&lt;ul>
&lt;li>Identify candidate reproducibility checklists to use as guides&lt;/li>
&lt;li>Create the data structure for configuring reproducibility checklists&lt;/li>
&lt;li>Display the reproducibility checklist in the user interface&lt;/li>
&lt;li>Store responses and comments to the checklist as provided by the user&lt;/li>
&lt;li>Generate a reproducibility checklist report from StatWrap&lt;/li>
&lt;/ul></description></item><item><title>StatTag: Connecting statistical software to Microsoft Word</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/northwestern/stattag/</link><pubDate>Mon, 22 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/northwestern/stattag/</guid><description>&lt;p>StatTag is a free, &lt;a href="https://github.com/stattag" target="_blank" rel="noopener">open-source&lt;/a> software plug-in for conducting reproducible research. It facilitates the creation of dynamic documents using Microsoft Word documents and statistical software, such as Stata, SAS, R, and Python. Users can use StatTag to embed statistical output (estimates, tables and figures) into a Word document and then with one click individually or collectively update output with a call to the statistical program.&lt;/p>
&lt;p>What makes StatTag different from other tools for creating dynamic documents is that it allows for statistical code to be edited directly from Microsoft Word. Using StatTag means that modifications to a dataset or analysis no longer require transcribing or re-copying results into a manuscript or table.&lt;/p>
&lt;p>StatTag works by interpreting specially formatted comments (&amp;ldquo;tags&amp;rdquo;) within a code file. StatTag then reads the code file, executes the code through the corresponding language interpreter, formats the results, and inserts them into the Word document as a field.&lt;/p>
&lt;p>There are versions of StatTag for both Microsoft Windows and macOS. Proposed projects here are specific to the Microsoft Windows version, which is developed in the C# programming language.&lt;/p>
&lt;p>&lt;strong>Additional Information:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://sites.northwestern.edu/stattag/" target="_blank" rel="noopener">StatTag homepage&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/stattag" target="_blank" rel="noopener">StatTag on GitHub&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://pubmed.ncbi.nlm.nih.gov/33215069/" target="_blank" rel="noopener">Welty et al., &amp;ldquo;Facilitating reproducible research through direct connection of data analysis with manuscript preparation: StatTag for connecting statistical software to Microsoft Word&amp;rdquo;&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="support-additional-programming-languages">Support Additional Programming Languages&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>reproducibility&lt;/code>, &lt;code>statistics&lt;/code>&lt;/li>
&lt;li>Skills: C# and one of: MATLAB, Octave, SQL, Julia&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Medium or large (175 or 350 hours)&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Following the same structure used for other language support in StatTag, develop support for a new programming language (suggested languages are provided, but applicants can propose others). This will include:&lt;/p>
&lt;ul>
&lt;li>Creating a Parser class to support StatTag-specific interpretation of results (e.g., identifying a line of code that is writing to a CSV file, then loading that CSV file)&lt;/li>
&lt;li>Creating an Automation class that manages communication with the supported programming language&amp;rsquo;s interpreter. Python support uses a Jupyter kernel, and both SAS and Stata support invoke DLLs directly.&lt;/li>
&lt;li>Integrating the language into the UI (e.g., allowing it to be a valid code file, adding the icon for the code file to the UI)&lt;/li>
&lt;li>Additional setup/configuration as needed (e.g., SQL support would require secure configuration for connecting to the database server).&lt;/li>
&lt;/ul>
&lt;p>Develop unit tests to demonstrate code is functioning. Create test scripts in the implemented language to exercise and demonstrate end-to-end execution.&lt;/p>
&lt;h3 id="process-tags-in-jupyter-notebooks">Process Tags in Jupyter Notebooks&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>reproducibility&lt;/code>, &lt;code>jupyter&lt;/code>&lt;/li>
&lt;li>Skills: C#, Jupyter Notebooks, Python&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Medium (175 hours)&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>StatTag currently has support for Python, and utilizes the Jupyter kernel to interact with Python. However, we currently do not fully support processing StatTag &amp;rsquo;tags&amp;rsquo; in a Jupyter notebook.&lt;/p>
&lt;p>Following the same structure used for RMarkdown integration in StatTag, develop support for Jupyter Notebooks in StatTag. StatTag should be able to:&lt;/p>
&lt;ul>
&lt;li>Take as input one or more Jupyter Notebooks&lt;/li>
&lt;li>Confirm that the Jupyter Notebook uses Python&lt;/li>
&lt;li>Identify StatTag formatted tags within the notebook&lt;/li>
&lt;li>Pass relevant code to the Python processor already implemented in StatTag&lt;/li>
&lt;/ul>
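&lt;p>The four steps above could be prototyped as follows before porting the logic to C#. This is only a sketch under stated assumptions: the marker string &lt;code>##ST-TAG:&lt;/code> is a hypothetical placeholder (the real tag syntax is defined by StatTag), and the function names are illustrative. The notebook fields it reads (&lt;code>metadata&lt;/code>, &lt;code>cells&lt;/code>, &lt;code>cell_type&lt;/code>, &lt;code>source&lt;/code>) are those of the standard notebook JSON format.&lt;/p>

```python
import json

# Hypothetical marker for illustration only; the real StatTag tag syntax
# is defined by StatTag itself and should be taken from its documentation.
TAG_MARKER = "##ST-TAG:"


def notebook_language(nb):
    """Return the notebook's declared language, or None if undeclared."""
    meta = nb.get("metadata", {})
    name = meta.get("language_info", {}).get("name")
    if name is None:
        name = meta.get("kernelspec", {}).get("language")
    return name


def tagged_code_cells(nb, marker=TAG_MARKER):
    """Collect the source of every code cell that contains the marker."""
    found = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if marker in source:
            found.append(source)
    return found


def collect_python_code(nb_text):
    """Parse notebook JSON text and return the tagged Python code cells."""
    nb = json.loads(nb_text)
    if notebook_language(nb) != "python":
        raise ValueError("notebook does not declare Python as its language")
    return tagged_code_cells(nb)
```

&lt;p>The strings returned by &lt;code>collect_python_code&lt;/code> are what would be handed to the existing Python processor; how that hand-off works inside StatTag is not shown here.&lt;/p>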
&lt;p>In addition, develop unit tests to demonstrate code is functioning as intended. Create test Jupyter Notebooks to exercise and demonstrate end-to-end execution.&lt;/p></description></item><item><title>SLICES/pos: Reproducible Experiment Workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/tum/slices/</link><pubDate>Sat, 06 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/tum/slices/</guid><description>&lt;p>&lt;a href="https://www.slices-ri.eu/" target="_blank" rel="noopener">SLICES-RI&lt;/a> is a European research initiative aiming to create a digital research infrastructure providing an experimental platform for the upcoming decades.
One of the main goals of this initiative is the creation of fully reproducible experiments.
The SLICES research infrastructure will consist of different experiment sites focusing on different research domains such as AI experiments, Cloud and HPC-driven experiments, or investigations on wireless networks.&lt;/p>
&lt;p>To achieve reproducibility, the research group on network architectures and services of the Technical University of Munich develops the &lt;a href="https://dl.acm.org/doi/10.1145/3485983.3494841" target="_blank" rel="noopener">SLICES plain orchestrating service (SLICES/pos)&lt;/a>.
This framework supports a fully automated structured experiment workflow.
The structure of this workflow acts as a template for the design of experiments.
Users that adhere to this template will create inherently reproducible experiments, a feature we call reproducible-by-design.&lt;/p>
&lt;p>The SLICES/pos framework currently exists in two versions:
(1) a fully managed pos deployment that uses the SLICES/pos framework to manage the entire testbed, and (2) a hosted SLICES/pos deployment.
The hosted SLICES/pos deployment is a temporary deployment that runs inside existing testbeds such as &lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon&lt;/a> or &lt;a href="https://cloudlab.us/" target="_blank" rel="noopener">CloudLab&lt;/a>.&lt;/p>
&lt;p>&lt;strong>Additional Information:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://dl.acm.org/doi/10.1145/3485983.3494841" target="_blank" rel="noopener">plain orchestrating service&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="support-additional-programming-languages">Support Additional Programming Languages&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>reproducibility&lt;/code>, &lt;code>statistics&lt;/code>&lt;/li>
&lt;li>Skills: Python&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Large (350 hours)&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sebastian-gallenmuller/">Sebastian Gallenmüller&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/georg-carle/">Georg Carle&lt;/a>, and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kate-keahey/">Kate Keahey&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Design a set of basic examples that demonstrate the usage of pos that can be executed on the SLICES/pos testbed in Munich and the Chameleon testbed.
This set of basic examples acts as a demonstration of pos&amp;rsquo; capabilities and as a tutorial for new users.
Based on these introductory examples, a more complex experiment shall be designed and executed, demonstrating the portability of the experiments between testbeds.
This experiment involves the entire experiment workflow consisting of the setup and configuration of the testbed infrastructure, the collection of measurement results, and finally, their evaluation and publication.
Multiple results of this experiment shall be created on different testbeds and hardware configurations.
The results of the experiments will differ depending on the different hardware platforms on which the experiment was executed.
These results shall be evaluated and analyzed to find a common connection between the different result sets of the experiments.&lt;/p>
&lt;ul>
&lt;li>Create introductory examples demonstrating the usage of pos&lt;/li>
&lt;li>Design and create a portable complex network experiment based on SLICES/pos&lt;/li>
&lt;li>Execute the experiment on different testbeds (Chameleon, SLICES/pos testbed)&lt;/li>
&lt;li>Analyze the reproduced experiment&lt;/li>
&lt;li>Automate the analysis of experimental results&lt;/li>
&lt;li>Derive a model describing the fundamental connections between different experiment executions&lt;/li>
&lt;/ul></description></item><item><title>Static Python Perf: Measuring the Cost of Sound Gradual Types</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uutah/static-python-perf/</link><pubDate>Sat, 06 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uutah/static-python-perf/</guid><description>&lt;p>Gradual typing is a solution to the longstanding tension between typed and
untyped languages: let programmers write code in any flexible language (such
as Python), equip the language with a suitable type system that can describe
invariants in part of a program, and use run-time checks to ensure soundness.&lt;/p>
&lt;p>For now, though, the cost of run-time checks can be enormous.
Order-of-magnitude slowdowns are common. This high cost is a main reason why
TypeScript is unsound by design &amp;mdash; its types are not trustworthy in order
to avoid run-time costs.&lt;/p>
&lt;p>Recently, a team at Meta built a gradually-typed variant of Python called
(&lt;em>drumroll&lt;/em>) Static Python. They report an incredible 4% increase in CPU
efficiency at Instagram thanks to the sound types in Static Python. This
kind of speedup is unprecedented.&lt;/p>
&lt;p>Other languages may want to follow the Static Python approach to gradual types,
but there are big reasons to doubt the Instagram numbers:&lt;/p>
&lt;ul>
&lt;li>the experiment code is closed source, and&lt;/li>
&lt;li>the experiment itself is not easily reproducible (even for Instagram!).&lt;/li>
&lt;/ul>
&lt;p>Static Python needs a rigorous, reproducible performance evaluation to test
whether it is indeed a fundamental advance for gradual typing.&lt;/p>
&lt;p>Related Work:&lt;/p>
&lt;ul>
&lt;li>Gradual Soundness: Lessons from Static Python
&lt;a href="https://programming-journal.org/2023/7/2/" target="_blank" rel="noopener">https://programming-journal.org/2023/7/2/&lt;/a>&lt;/li>
&lt;li>Producing Wrong Data Without Doing Anything Obviously Wrong!
&lt;a href="https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf" target="_blank" rel="noopener">https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf&lt;/a>&lt;/li>
&lt;li>On the Cost of Type-Tag Soundness
&lt;a href="https://users.cs.utah.edu/~blg/resources/pdf/gm-pepm-2018.pdf" target="_blank" rel="noopener">https://users.cs.utah.edu/~blg/resources/pdf/gm-pepm-2018.pdf&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="design-and-run-an-experiment">Design and Run an Experiment&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>performance&lt;/code>, &lt;code>cluster computing&lt;/code>, &lt;code>statistics&lt;/code>&lt;/li>
&lt;li>Skills: Python AST parsing, program generation, scripting, measuring performance&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Medium (175 hours)&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Design an experiment that covers the space of gradually-typed Static Python programs
in a fair way. Since every variable in a program can have up to 3 different types,
there are easily 3^20 possibilities in small programs &amp;mdash; far too many to measure
exhaustively.&lt;/p>
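&lt;p>To make the size of that space concrete, the sketch below counts the configurations for a program with 20 variables and draws a small reproducible sample from them. Uniform random sampling is an assumption chosen for illustration, not necessarily the sampling strategy this project will adopt, and the function names are hypothetical.&lt;/p>

```python
import random

TYPES_PER_VARIABLE = 3  # from the text: up to 3 types per variable


def count_configurations(num_variables, choices=TYPES_PER_VARIABLE):
    """Size of the full configuration space: choices ** num_variables."""
    return choices ** num_variables


def sample_configurations(num_variables, sample_size, seed=0,
                          choices=TYPES_PER_VARIABLE):
    """Draw distinct configurations uniformly at random.

    Each configuration is a tuple holding one type index per variable.
    """
    total = count_configurations(num_variables, choices)
    if sample_size > total:
        raise ValueError("sample_size exceeds the number of configurations")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    seen = set()
    # Rejection of duplicates via the set gives sampling without replacement.
    while len(seen) != sample_size:
        seen.add(tuple(rng.randrange(choices) for _ in range(num_variables)))
    return sorted(seen)
```

&lt;p>For 20 variables, &lt;code>count_configurations(20)&lt;/code> is 3,486,784,401, so even a sample of a few thousand configurations covers only a tiny fraction of the space; a fair experiment has to justify how that fraction is chosen.&lt;/p>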
&lt;p>Run the experiment on an existing set of benchmarks using a cluster such as CloudLab.
Manage the cluster machines across potentially dozens of reservations and combine
the results into one comprehensive view of Static Python performance.&lt;/p>
&lt;h3 id="derive-benchmarks-from-python-applications">Derive Benchmarks from Python Applications&lt;/h3>
&lt;ul>
&lt;li>Topics: &lt;code>types&lt;/code>, &lt;code>optimization&lt;/code>, &lt;code>benchmark design&lt;/code>&lt;/li>
&lt;li>Skills: Python&lt;/li>
&lt;li>Difficulty: Medium&lt;/li>
&lt;li>Size: Small to Large&lt;/li>
&lt;li>Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ben-greenman/">Ben Greenman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build or find realistic Python applications, equip them with rich types,
and modify them to run a meaningful performance benchmark. Running a benchmark
should produce timing information, and the timing should not be significantly
influenced by random variables, I/O actions, or system events.&lt;/p></description></item><item><title>These 4 new features will change the way you use OpenROAD</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/</link><pubDate>Sun, 29 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Welcome to the final blog post for my GSoC’23! Once again, my name is
Jack and I am working on the open-source electronic design automation
project OpenROAD. OpenROAD is a fast-growing, leading open-source
foundational application for semiconductor digital design, as evidenced
by our consistent star growth since inception. You may check us out
at this &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD/" target="_blank" rel="noopener">link&lt;/a>.
Allow me to share the four significant contributions I made in this GSoC
project.&lt;/p>
&lt;p>&lt;a href="https://star-history.com/#The-OpenROAD-Project/OpenROAD&amp;amp;Date" target="_blank" rel="noopener">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://api.star-history.com/svg?repos=The-OpenROAD-Project/OpenROAD&amp;amp;type=Date" alt="Star History Chart" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/a>&lt;/p>
&lt;h2 id="1-improving-ease-of-installation">1) Improving Ease of Installation&lt;/h2>
&lt;p>Firstly, OpenROAD is now able to support multiple operating systems.
This is essential as one of our primary goals is to democratise chip
implementation. And installation is often one of the hardest steps
to get right, so that was one of our priorities. Today, we have
provided options for different types of installation:&lt;/p>
&lt;ul>
&lt;li>&lt;em>Prebuilt binaries&lt;/em>: Local installations can often be riddled
with incompatibilities or unexpected bugs, and compilation can take a
long time. We sidestepped this by providing semi-regular updates to the
OpenROAD binary, reducing the time to installation.&lt;/li>
&lt;li>&lt;em>Docker&lt;/em>: Echoing previous concerns, we also enabled Docker installation
for 9 major operating systems. Docker is extremely flexible and runs
on many operating systems (as long as it is supported by Docker).&lt;/li>
&lt;/ul>
&lt;p>With these changes, we have observed a 10% reduction in the number of installation-related GitHub issues posted each week.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-1-supported-os-matrix">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic1" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_5ec6a03672875da1d114ed8b24e54d81.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_256594bafdfffa842322c55b991f1ae1.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_5ec6a03672875da1d114ed8b24e54d81.webp"
width="650"
height="608"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 1: Supported OS matrix
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h2 id="2-filling-missing-documentation">2) Filling Missing Documentation&lt;/h2>
&lt;p>Next, we have made considerable improvements to the documentation of
over 20 tools, introducing a consistent formatting style for each page.
We added default values and datatypes so that users can use the
tools with greater ease.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-2-helpful-documentation-defaults-and-datatype">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic2" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_f20854090d02e2c8c4eab994e275b52a.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_2d201fd5ada34b46714b076a84194e28.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_f20854090d02e2c8c4eab994e275b52a.webp"
width="691"
height="368"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 2: Helpful documentation defaults and datatype
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Rather than listing all arguments for a function in one common table,
we separated them into developer arguments and developer commands.
This makes our documentation more beginner-friendly to read
without alienating our technical userbase. We have also added sections
for example scripts and regression tests, to help onboard
newcomers to each tool of the flow.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-3-useful-developer-commands-example-scripts-and-regression-test-instructions">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic3" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_e3fcd088f5df4574a67cf6d097c9e73a.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_1ceeb7f590547f00904c173b5a084798.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_e3fcd088f5df4574a67cf6d097c9e73a.webp"
width="690"
height="670"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3: Useful developer commands, example scripts, and regression test instructions
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h2 id="3-extensible-documentation-framework">3) Extensible Documentation Framework&lt;/h2>
&lt;p>Thirdly, we have introduced extensible documentation frameworks.
Now, what do we mean by &lt;em>extensible&lt;/em>? It means we have created an
infrastructure which is easy to use for developers, and allows for
greater maintainability. Our goal is to create something that
requires minimal changes to add content for documentation.&lt;/p>
&lt;p>So, how did we do this?&lt;/p>
&lt;p>We introduced 4 initiatives. The first is the warning/error messages glossary.
We noticed that people were searching for error and warning messages,
but our documentation did not have them. So we added a page where all
the error/warning messages, along with the relevant source line numbers, are
generated automatically. On top of that, developers can add useful
debug information to help the end user.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-4-warningerror-messages-glossary">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic4" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_aa069c4f5f2d1682fc92525139f6d57c.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_881f56c79ec21ee86b422f9eb12ef3c8.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_aa069c4f5f2d1682fc92525139f6d57c.webp"
width="687"
height="348"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 4: Warning/Error messages glossary.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Next, we introduced automatically generated Doxygen pages, which
integrate nicely into our C++/Tcl source code framework. With this
automatic generation, developers only need to insert comments into
their source code, and Doxygen generates the documentation
for them.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-5-doxygen-pages">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic5" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_567ad1b2725278073bfe8cdf4d2dad6a.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_35b25ed8006816a0cd300dba6aedb4a3.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_567ad1b2725278073bfe8cdf4d2dad6a.webp"
width="760"
height="578"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 5: Doxygen pages.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Next, we introduced cloud-based packaging. It is important that our
framework can run in the cloud and in the ever-popular notebook
format. Our Colab-based notebook was created with this in mind, and
can be transferred to other notebook providers with some
modifications. Check out the notebooks here!&lt;/p>
&lt;p>
&lt;figure id="figure-figure-6-google-colab-can-now-run-openroad-scripts">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic6" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_0f20b3a36a05036a4602868c18f0da9b.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_125685c82e5be8372c2ae4b937fdd412.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_0f20b3a36a05036a4602868c18f0da9b.webp"
width="760"
height="321"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 6: Google Colab can now run OpenROAD scripts.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Lastly, we have the changelog workflow, which can be triggered manually.
For our open-source project, we have chosen not to do software releases.
This means it can be difficult to track the changes between commits.
Adding this workflow helps newcomers track the changes more easily,
month by month.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-7-sample-output-of-github-changelog">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic7" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_a13d29d9b1d8fe53307365f5dfd84d86.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_9baeb333eb95f59c9ac1004e0e9fd54c.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_a13d29d9b1d8fe53307365f5dfd84d86.webp"
width="760"
height="400"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 7: Sample output of github changelog
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h2 id="4-openroad-chatbot">4) OpenROAD Chatbot&lt;/h2>
&lt;p>Finally, we are also discussing the potential of creating a chatbot whose
purpose is to answer user queries. Our thinking is that there is a lot of
domain knowledge in Slack channels, GitHub repos, and so on, so why
not create an LLM-based chatbot? Stay tuned for updates!&lt;/p>
&lt;h2 id="personal-reflections">Personal Reflections&lt;/h2>
&lt;p>To me, the most valuable takeaway concerns code quality. Oftentimes,
we as coders tend to opt for the quickest solution and “hack” something
out. Hacking is fine as a proof of concept, but not for
long-term code development. Working on open-source projects like this,
I have learnt to avoid creating unnecessary files, to shorten the code,
and to optimise runtime. In doing our job, we also wish to make life
easier, not harder, for future developers.&lt;/p>
&lt;h2 id="final-words">Final Words&lt;/h2>
&lt;p>I would like to express my gratitude to my mentors Indira and Vitor for
their guidance and insight throughout the project, as well as the
OpenROAD dev team for their assistance. I would also like to thank the
Google Summer of Code organising committee and UCSC for creating such a
wonderful program. Being able to contribute to real open-source
projects with real needs is truly the best of both worlds for aspiring
programmers.&lt;/p></description></item><item><title>Final Blog Measuring Open-source Database Systems under TPC-C Benchmark with Unreported Settings</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/osu/missingsettings/20231030-ren.450/</link><pubDate>Wed, 25 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/osu/missingsettings/20231030-ren.450/</guid><description>&lt;p>In my final blog, I will first introduce the project, then describe the achievements after the midterm and summarize our experiments. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/osu/missingsettings">Measuring Research Prototypes under Unreported Settings&lt;/a> my &lt;a href="https://drive.google.com/file/d/1ouFre-qMDCL_LiH5jFNUCOI1yAYHdWcS/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/miao-yu/">Miao YU&lt;/a> aims to understand the impact of missing settings in artifact evaluation.&lt;/p>
&lt;p>In my &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/osu/missingsettings/20230802-ren.450/">midterm blog&lt;/a>, I used three parameters in the PostgreSQL configuration to test the performance of the TPC-C benchmark and obtained initial results on how each parameter separately affects throughput. After the midterm, I continued the experiments on four parameters (shared_buffers, min_wal_size, max_wal_size and effective_cache_size), testing more values and combining them to measure their joint effect on performance. These parameters relate to memory consumption, checkpoints and planner cost in the database server. You can refer to my previous blog for details.&lt;/p>
&lt;p>For the experiment, we continue to measure the throughput of the benchmark with scalefactor set to 10 and an increasing number of worker terminals. All database server settings keep their default values except the four parameters we chose to tune. For shared_buffers, we choose 6 values, from the initial 128MB up to 8GB. For each shared_buffers setting, effective_cache_size takes three values, from the initial 4GB up to 16GB. Next, for each effective_cache_size setting we tune min_wal_size and max_wal_size as a tuple: min_wal_size has two values and max_wal_size has four values, for 6 values in total. We run three rounds for each setting, record all three throughput numbers and calculate their average.&lt;/p>
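&lt;p>The size of this experiment can be worked out with a short sketch. Only the endpoints (128MB to 8GB, 4GB to 16GB), the value counts and the three rounds come from the description above; every intermediate value and every WAL pair in the code is an illustrative assumption, and the sketch reads the WAL settings as six (min_wal_size, max_wal_size) pairs. The names follow PostgreSQL&amp;rsquo;s configuration settings, where the buffer setting is spelled &lt;code>shared_buffers&lt;/code>.&lt;/p>

```python
from itertools import product

# Endpoints come from the text; all intermediate values are assumptions.
SHARED_BUFFERS = ["128MB", "512MB", "1GB", "2GB", "4GB", "8GB"]
EFFECTIVE_CACHE_SIZE = ["4GB", "8GB", "16GB"]
# Hypothetical (min_wal_size, max_wal_size) pairs: two distinct minimums,
# four distinct maximums, six pairs in total.
WAL_SIZES = [
    ("80MB", "1GB"),
    ("80MB", "2GB"),
    ("80MB", "4GB"),
    ("80MB", "8GB"),
    ("1GB", "4GB"),
    ("1GB", "8GB"),
]
ROUNDS = 3  # from the text: three rounds per setting


def configurations():
    """Yield one settings dict per point in the parameter grid."""
    for sb, ecs, (wal_min, wal_max) in product(
            SHARED_BUFFERS, EFFECTIVE_CACHE_SIZE, WAL_SIZES):
        yield {
            "shared_buffers": sb,
            "effective_cache_size": ecs,
            "min_wal_size": wal_min,
            "max_wal_size": wal_max,
        }


def average_throughput(samples):
    """Mean of the throughput numbers measured for one configuration."""
    return sum(samples) / len(samples)


grid = list(configurations())
total_runs = len(grid) * ROUNDS
```

&lt;p>Under these assumptions the grid holds 6 x 3 x 6 = 108 configurations, or 324 benchmark runs at three rounds each, which is why the experiment is costly in running time.&lt;/p>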
&lt;p>Based on the &lt;a href="https://docs.google.com/spreadsheets/d/12OeSwZGq2G4-YGY5BTH5uZbVcAaxcZqYhqciCaBiF2E/edit?usp=sharing" target="_blank" rel="noopener">results&lt;/a>, the observations from the midterm blog still hold. The throughput of the benchmark is affected by tuning shared_buffers and max_wal_size, while effective_cache_size and min_wal_size have no obvious effect for this benchmark. The improvement is limited once shared_buffers and max_wal_size reach a certain value.&lt;/p>
&lt;p>In our experiment, we only chose three possible parameters for one benchmark. Even so, the experiment is expensive in terms of running time, and there are more values of the above-mentioned parameters left to test. This also suggests that we may need to sample a subset of settings to generate observations that match those from a full, extensive artifact evaluation.&lt;/p></description></item><item><title>Public Artifact and Data Visualization: A Journey to Empower</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20231024-zjyhhhhh/</link><pubDate>Tue, 24 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20231024-zjyhhhhh/</guid><description>&lt;p>
Hola Amigos!
As we draw the curtains on our project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/intel/artifactviz">Public Artifact and Data Visualization&lt;/a>, we&amp;rsquo;re thrilled to present the incredible advancements we&amp;rsquo;ve achieved since our mid-term update. Our mission has been to foster a deeper understanding of data and empower users to make informed decisions. Let&amp;rsquo;s delve into the remarkable evolution of our project.&lt;/p>
&lt;h2 id="unveiling-new-functionalities">Unveiling New Functionalities&lt;/h2>
&lt;ol>
&lt;li>Modular Architecture: Your Way, Your Choice&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>At the core of our project is a modular architecture designed to cater to your unique preferences. We firmly believe that choice empowers users. Thus, we&amp;rsquo;ve given you the option to select between a Graphical User Interface (GUI) and a Command-Line Interface (CLI). It&amp;rsquo;s about providing a platform that adapts to your specific requirements and style of interaction.&lt;/li>
&lt;/ul>
&lt;ol start="2">
&lt;li>Real-time Backend Environment Monitoring: Data as it Happens&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>Real-time monitoring of backend environment data is at the heart of our project. It&amp;rsquo;s not just about collecting data; it&amp;rsquo;s about providing continuous insights into system performance. This feature empowers you to make real-time, data-driven decisions—an essential capability in today&amp;rsquo;s fast-paced computing landscape.&lt;/li>
&lt;/ul>
&lt;ol start="3">
&lt;li>Visualizing Environment Variables: Clarity Amidst Complexity&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>We&amp;rsquo;ve placed a strong emphasis on user-friendly data visualization. Our enhancements enable you to navigate through detected variables effortlessly and compare iterations within different buckets. The result is a visual representation of complex data, making it easier to comprehend and analyze.&lt;/li>
&lt;/ul>
&lt;ol start="4">
&lt;li>Predefined Monitoring Commands: Your Head Start&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>We understand that monitoring can be a daunting task. To simplify the process, we&amp;rsquo;ve introduced predefined monitoring commands such as mpstat and iostat. These templates serve as a launchpad for monitoring common system metrics, helping you get started quickly and efficiently.&lt;/li>
&lt;/ul>
&lt;ol start="5">
&lt;li>Comprehensive Customization: Tailoring the Experience&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>Recognizing that every user has unique needs, our platform now offers extensive documentation. This documentation serves as a guide, enabling users to fine-tune their monitoring commands. It&amp;rsquo;s about tailoring the platform to match your specific requirements and preferences. The power to customize is firmly in your hands.&lt;/li>
&lt;/ul>
&lt;ol start="6">
&lt;li>Import and Export Functionality: Seamless Collaboration&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>In an era where collaboration and data management are essential, we&amp;rsquo;ve introduced the capability to import and export environment data. This feature simplifies data management and supports collaborative efforts, making it easy to share monitoring data and conduct analysis across various environments.&lt;/li>
&lt;/ul>
&lt;h2 id="exploring-our-repositories">Exploring Our Repositories&lt;/h2>
&lt;p>
As mentioned earlier, we have completed the core functionalities of our platform, and we would love to have you try it out and provide us with valuable feedback. Here are the links to our repositories where you can explore and experiment with our platform:
&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://github.com/PublicExperimentDatabase/PublicExperimentGUI" target="_blank" rel="noopener">GUI Repository&lt;/a> and &lt;a href="https://github.com/PublicExperimentDatabase/PublicExperimentCLI" target="_blank" rel="noopener">CLI Repository&lt;/a>
&lt;ul>
&lt;li>The journey begins with a choice. Our repositories cater to a diverse range of user preferences. Inside the README.md file of the GUI repository, you&amp;rsquo;ll find meticulous installation instructions to guide you through setting up the Graphical User Interface (GUI). It&amp;rsquo;s your portal to a user-friendly experience.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://github.com/PublicExperimentDatabase/test-experiment" target="_blank" rel="noopener">Sample Repository&lt;/a>
&lt;ul>
&lt;li>For those eager to embark on their monitoring journey, our Sample Repository is a valuable resource. It provides scripts that not only enable you to run our program but also serve as templates. These templates are designed to simplify the monitoring of your own programs, tailored to your unique requirements.
​&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h2 id="project-demo">Project Demo&lt;/h2>
&lt;p>
To give you a glimpse of what our project can do, here are some demo images showcasing the capabilities and features of &amp;ldquo;Public Artifact and Data Visualization.&amp;rdquo;
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature1_hub6eb130c638c788b954d77fd05b17dc2_80420_39e93d5df25c8b9261ed5b60f3a49091.webp 400w,
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature1_hub6eb130c638c788b954d77fd05b17dc2_80420_435c9e662168ef7e029d1c36702fca84.webp 760w,
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature1_hub6eb130c638c788b954d77fd05b17dc2_80420_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature1_hub6eb130c638c788b954d77fd05b17dc2_80420_39e93d5df25c8b9261ed5b60f3a49091.webp"
width="760"
height="396"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
​
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature2_hu70b57a5e19005e11cc3a42881b456609_84702_df590742e12a23dea8d1f3414c9e5c16.webp 400w,
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature2_hu70b57a5e19005e11cc3a42881b456609_84702_b47182cd4c3ea07108c723e7c18875e4.webp 760w,
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature2_hu70b57a5e19005e11cc3a42881b456609_84702_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature2_hu70b57a5e19005e11cc3a42881b456609_84702_df590742e12a23dea8d1f3414c9e5c16.webp"
width="760"
height="396"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
​
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature3_hu17639a210c97ec1be7d726068aef2aa2_44169_6117bb9125bca9a4f63ad1631b5f7bcc.webp 400w,
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature3_hu17639a210c97ec1be7d726068aef2aa2_44169_3979a5588d47e6a37a482b5f2184d3af.webp 760w,
/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature3_hu17639a210c97ec1be7d726068aef2aa2_44169_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20231024-zjyhhhhh/feature3_hu17639a210c97ec1be7d726068aef2aa2_44169_6117bb9125bca9a4f63ad1631b5f7bcc.webp"
width="736"
height="656"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
​&lt;/p>
&lt;h2 id="thank-you-for-joining-us">Thank You for Joining Us&lt;/h2>
&lt;p>
We appreciate your support and participation in this journey of data visualization and empowerment. Our commitment to enhancing the world of data comprehension remains unwavering. As we mark the end of this chapter, we eagerly anticipate the exciting future that awaits in the realm of data visualization. The path doesn&amp;rsquo;t end here; it&amp;rsquo;s just the beginning of a new chapter in our collective exploration of data&amp;rsquo;s potential.
​&lt;/p></description></item><item><title>Final Blog on Teaching Computer Networks with Reproducible Research: Developing a 'classroom competition' for adaptive video delivery</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230820-srishti-j18/</link><pubDate>Fri, 20 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230820-srishti-j18/</guid><description>&lt;p>Hello Again!&lt;/p>
&lt;p>I&amp;rsquo;m excited to present my final blog post summarizing the progress and achievements made over the 2023 Summer of Reproducibility Fellowship. I will be sharing the work I&amp;rsquo;ve created for the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/edunet">Teaching Computer Networks with Reproducible Research: Developing a &amp;lsquo;classroom competition&amp;rsquo; for adaptive video delivery&lt;/a> project.&lt;/p>
&lt;h2 id="recap-of-the-journey">Recap of the Journey&lt;/h2>
&lt;p>In my &lt;a href="content/report/osre23/nyu/edunet/20230801-Srishti-j18">mid-term&lt;/a> evaluation, I discussed the initial milestones and challenges I encountered during this program. At that point, I studied the key figures from the research paper &amp;lsquo;&lt;a href="https://dl.acm.org/doi/10.1145/2491172.2491179" target="_blank" rel="noopener">Downton Abbey Without the Hiccups: Buffer-Based Rate Adaptation for HTTP Video Streaming&lt;/a>&amp;rsquo;. My primary objectives were to ensure compatibility with both Python 2 and Python 3 and to incorporate an &amp;lsquo;Estimated Download Rate&amp;rsquo; metric into the output file generated by the adaptive video client. Furthermore, I expanded the project to include two crucial visualizations: buffer occupancy vs. time and estimated download rate vs. time.&lt;/p>
&lt;h2 id="final-project-progress">Final Project Progress&lt;/h2>
&lt;p>In the final weeks of my internship, I worked towards my ultimate goal, which was to reproduce existing work and create a clear guide for future students. I aimed to enable them to build upon and improve this work. To achieve this, I created a new experiment using an existing one,&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/edunet/20230820-srishti-j18/feature1_hu9d52f6d5cbdd1ece23828e42c8b71316_147352_f279db1f4805fb171d3cff4ae4a908dc.webp 400w,
/report/osre23/nyu/edunet/20230820-srishti-j18/feature1_hu9d52f6d5cbdd1ece23828e42c8b71316_147352_9e5cf37b8721460bee97304092b3b9fa.webp 760w,
/report/osre23/nyu/edunet/20230820-srishti-j18/feature1_hu9d52f6d5cbdd1ece23828e42c8b71316_147352_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230820-srishti-j18/feature1_hu9d52f6d5cbdd1ece23828e42c8b71316_147352_f279db1f4805fb171d3cff4ae4a908dc.webp"
width="760"
height="442"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>which I titled &amp;ldquo;&lt;a href="https://github.com/Srishti-j18/adaptive-video/blob/68bd537a65eeec0f221ae095b35b18c1e8ffd2ef//notebooks/exec_policy.ipynb" target="_blank" rel="noopener">Compare Adaptive Video Policies&lt;/a>&amp;rdquo;&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/edunet/20230820-srishti-j18/featured_hu28a6c98f585340505adb453b1827e333_184681_9db9d6a3e27e1f9a70c791dbc5fb72d7.webp 400w,
/report/osre23/nyu/edunet/20230820-srishti-j18/featured_hu28a6c98f585340505adb453b1827e333_184681_12adfaacac4f310f07b71c83727dd13e.webp 760w,
/report/osre23/nyu/edunet/20230820-srishti-j18/featured_hu28a6c98f585340505adb453b1827e333_184681_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230820-srishti-j18/featured_hu28a6c98f585340505adb453b1827e333_184681_9db9d6a3e27e1f9a70c791dbc5fb72d7.webp"
width="760"
height="575"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>This experiment compares two policies: a rate-based (basic) policy and a
buffer-based (Netflix) policy. In the experiment, I covered the following key aspects:&lt;/p>
&lt;p>How Both Policies Work: I detailed the workings of both the rate-based and buffer-based policies, explaining how each policy selects the next bitrate, among other relevant information.&lt;/p>
&lt;p>Instructions for Execution of Policies: After conducting several experiments with different settings, I determined the most appropriate settings for this experiment. These settings have been added to the instructions for executing both policies, with a focus on ensuring similar &amp;ldquo;high&amp;rdquo; network rates, &amp;ldquo;low&amp;rdquo; data rates, similar durations of the &amp;ldquo;high&amp;rdquo; data rate before the
interruption, and similar durations of the &amp;ldquo;interruption.&amp;rdquo; This setup allows for an easy and clear comparison of the two policies.&lt;/p>
&lt;p>Discussion Part: In the discussion section, I addressed the differences that students can observe after conducting the experiment and visualising the graphs and videos.&lt;/p>
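&lt;p>To make the difference between the two policies concrete, here is a minimal sketch of how each one might select the next bitrate. It is not the code from the notebook: the function names, the bitrate ladder, and the reservoir and cushion values are illustrative assumptions, and the buffer-based function uses a simplified linear buffer-to-rate map that leaves out the hysteresis of the published algorithm.&lt;/p>

```python
def rate_based(bitrates, estimated_rate):
    """Pick the highest bitrate that the estimated download rate can sustain."""
    affordable = [b for b in bitrates if estimated_rate >= b]
    # Fall back to the lowest bitrate when even that exceeds the estimate.
    return max(affordable) if affordable else min(bitrates)


def buffer_based(bitrates, buffer_s, reservoir_s=10.0, cushion_s=20.0):
    """Map buffer occupancy (in seconds) to a bitrate, ignoring the rate estimate."""
    lowest, highest = min(bitrates), max(bitrates)
    if reservoir_s >= buffer_s:
        return lowest            # buffer nearly empty: play it safe
    if buffer_s >= reservoir_s + cushion_s:
        return highest           # buffer comfortably full: use the top rate
    # In between, grow the target rate linearly with the buffer level.
    fraction = (buffer_s - reservoir_s) / cushion_s
    target = lowest + fraction * (highest - lowest)
    return max(b for b in bitrates if target >= b)


if __name__ == "__main__":
    ladder_kbps = [235, 750, 1750, 3000]  # hypothetical bitrate ladder
    print(rate_based(ladder_kbps, estimated_rate=1000))
    print(buffer_based(ladder_kbps, buffer_s=20))
```

&lt;p>The rate-based choice reacts to every change in the throughput estimate, while the buffer-based choice changes only as the buffer fills or drains, which is the contrast the experiment is designed to expose.&lt;/p>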
&lt;p>In conclusion, I would like to thank my mentor, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a>, who has given me excellent guidance, and I would like to express my gratitude to OSRE23, where I have learned so much. This experience has been amazing for my personal and professional growth.&lt;/p></description></item><item><title>Final Blog on Using Reproducibility in Machine Learning Education</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231018-msaeed/</link><pubDate>Wed, 18 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231018-msaeed/</guid><description>&lt;p>Welcome back!&lt;/p>
&lt;p>In my final blog post for the 2023 Summer of Reproducibility Fellowship, I&amp;rsquo;ll be sharing my experiences and the materials I&amp;rsquo;ve created for the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education project&lt;/a>. As a quick reminder, my mentor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and I have been working on developing interactive open-source educational resources that teach reproducibility and reproducible research in machine learning. You can find my &lt;a href="https://drive.google.com/file/d/13HnCMZawpabiLdBoOiaJFF2mNXIPLCVJ/view?usp=sharing" target="_blank" rel="noopener">proposal here&lt;/a>.&lt;/p>
&lt;p>In this post, I&amp;rsquo;ll give you a rundown of my experience and share the materials I&amp;rsquo;ve created. If you haven&amp;rsquo;t checked out my previous blog posts, definitely take a look before diving into this one. Let&amp;rsquo;s get started!&lt;/p>
&lt;h2 id="why-is-this-project-important-">Why is this project important 🤔&lt;/h2>
&lt;p>Reproducibility is an essential aspect of scientific research, and it&amp;rsquo;s becoming increasingly important in the field of computer science. However, most efforts to promote reproducibility in education focus on students who are actively involved in research, leaving a significant gap in the curriculum for introductory courses. Our project aims to address this issue by incorporating reproducibility experiences into machine learning education.&lt;/p>
&lt;h2 id="why-reproducibility-matters-in-education-">Why Reproducibility Matters in Education 🎓&lt;/h2>
&lt;p>There are two primary reasons why we believe reproducibility belongs in the computer science classroom. Firstly, it allows students to experience the process of reproducing research firsthand, giving them a deeper understanding of the scientific method and its importance in the field. This exposure can inspire students to adopt reproducible practices in their future careers, contributing to a more transparent and reliable scientific community.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/eduml/20231018-msaeed/reproducibilityBenifits_hudb7baafb83412fd51973fc577da0863d_141778_0f4e9e6ba00e070430ccd90e09800a28.webp 400w,
/report/osre23/nyu/eduml/20231018-msaeed/reproducibilityBenifits_hudb7baafb83412fd51973fc577da0863d_141778_22ad37d3ef94bfc2aa93cf4ba651684e.webp 760w,
/report/osre23/nyu/eduml/20231018-msaeed/reproducibilityBenifits_hudb7baafb83412fd51973fc577da0863d_141778_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231018-msaeed/reproducibilityBenifits_hudb7baafb83412fd51973fc577da0863d_141778_0f4e9e6ba00e070430ccd90e09800a28.webp"
width="760"
height="207"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Source: Fund, Fraida. &amp;ldquo;We Need More Reproducibility Content Across the Computer Science Curriculum.&amp;rdquo; Proceedings of the 2023 ACM Conference on Reproducibility and Replicability. 2023.&lt;/em>&lt;/p>
&lt;p>Secondly, as shown in the figure, involving students in reproducibility efforts can have a significant impact on the reproducibility ecosystem itself. Students can create reproducibility artifacts, such as replicable experiments or data analysis, that can be used by other researchers, including authors and graduate students. Additionally, students can consume reproducibility artifacts created by the research community, provide feedback, and suggest improvements. Authors appreciate this type of engagement, as it adds value to their work and promotes open science.&lt;/p>
&lt;h2 id="focusing-on-machine-learning-">Focusing on Machine Learning 🧐&lt;/h2>
&lt;p>Given the growing interest in machine learning and its relevance to reproducibility, our project decided to focus on this area. Machine learning already has a strong culture of reproducibility, with initiatives like &lt;a href="https://paperswithcode.com/" target="_blank" rel="noopener">Papers with Code&lt;/a> and the &lt;a href="https://paperswithcode.com/rc2022" target="_blank" rel="noopener">ML Reproducibility Challenge&lt;/a>. These efforts encourage researchers to share their code and reproduce recent machine learning papers, validating their results. By leveraging these existing resources, we can create learning materials that utilize real-world examples and foster hands-on reproducibility experiences for students.&lt;/p>
&lt;h2 id="the-interactive-notebooks-">The Interactive Notebooks 📖&lt;/h2>
&lt;p>We have created two learning materials that focus on machine learning and reproducibility. &lt;strong>The first material&lt;/strong> looks at a paper titled &lt;a href="https://arxiv.org/abs/1910.08475" target="_blank" rel="noopener">&amp;ldquo;On Warm Starting Neural Network Training&amp;rdquo;&lt;/a> by Jordan T. Ash and Ryan P. Adams. This paper discusses warm-starting, which means initializing a new model with the weights of a model previously trained on a subset of the dataset instead of with random weights. The authors compare warm-started models with randomly initialized models and find that the warm-started models perform worse, as shown in the figure below.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/eduml/20231018-msaeed/figure1_hu9d945c7dbee9ef6ef608a89a33d817c5_76602_5815498dd015ebc84b00505c90a65354.webp 400w,
/report/osre23/nyu/eduml/20231018-msaeed/figure1_hu9d945c7dbee9ef6ef608a89a33d817c5_76602_df01d4772e731cee04ae4783ac0cc994.webp 760w,
/report/osre23/nyu/eduml/20231018-msaeed/figure1_hu9d945c7dbee9ef6ef608a89a33d817c5_76602_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231018-msaeed/figure1_hu9d945c7dbee9ef6ef608a89a33d817c5_76602_5815498dd015ebc84b00505c90a65354.webp"
width="760"
height="306"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
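&lt;p>For readers new to the term, the toy sketch below shows what warm-starting means mechanically. It is an illustration only and is not taken from the paper or from our notebooks: it fits a one-variable linear model with plain gradient descent, and every name and constant in it is an assumption made for the example. Because this problem is convex, both runs reach essentially the same solution; the gap reported in the paper is a neural-network phenomenon that this toy does not reproduce.&lt;/p>

```python
import random


def train(data, w, b, lr=0.1, epochs=300):
    """Gradient descent on mean squared error for the model y = w * x + b."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
full_data = [(x, 3.0 * x + 0.5 + rng.gauss(0, 0.1)) for x in xs]
first_half = full_data[:100]

# Stage 1: train on the subset of data that is available first.
w_half, b_half = train(first_half, w=rng.gauss(0, 1), b=0.0)

# Warm start: continue from the stage-1 weights once all data has arrived.
w_warm, b_warm = train(full_data, w=w_half, b=b_half)

# Cold start: discard the stage-1 weights and begin from a random point.
w_cold, b_cold = train(full_data, w=rng.gauss(0, 1), b=0.0)

print("warm:", w_warm, b_warm)
print("cold:", w_cold, b_cold)
```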
&lt;p>Our material takes students through the process of identifying the different claims made in the paper and finding the corresponding experiments that support them. They will also learn how to use open-source code and available data to reproduce these experiments and understand the computational complexity associated with reproducing each experiment. This material can be found on both &lt;a href="https://github.com/mohammed183/re_warm_start_nn/tree/main" target="_blank" rel="noopener">GitHub&lt;/a> and &lt;a href="https://chameleoncloud.org/experiment/share/5b5717df-9aa9-470f-b393-c1e189c008a8" target="_blank" rel="noopener">Chameleon&lt;/a>; on Chameleon you can run the material on the required resources.&lt;/p>
&lt;p>&lt;strong>The second material&lt;/strong> examines the paper &lt;a href="https://arxiv.org/abs/2010.11929" target="_blank" rel="noopener">&amp;ldquo;An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale&amp;rdquo;&lt;/a> by Dosovitskiy et al., which introduces a novel way of applying the transformer architecture, which was originally designed for natural language processing, to image recognition tasks. The paper shows that transformers can achieve state-of-the-art results on several image classification benchmarks, such as ImageNet, when trained on large-scale datasets as shown in the following table.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/eduml/20231018-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_7faf7451ff08ac87e7d12ab941c77f8e.webp 400w,
/report/osre23/nyu/eduml/20231018-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_5441e5e4c6ffed9b29244a3a3dcde852.webp 760w,
/report/osre23/nyu/eduml/20231018-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231018-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_7faf7451ff08ac87e7d12ab941c77f8e.webp"
width="760"
height="354"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Our material guides students through the process of understanding which claims can and cannot be validated based on the available datasets and how complex it can be to validate each claim. Additionally, they will learn how to use pre-trained models to replicate computationally expensive experiments. Again, this material can be found on both &lt;a href="https://github.com/mohammed183/re_vit/tree/main" target="_blank" rel="noopener">GitHub&lt;/a> and &lt;a href="https://chameleoncloud.org/experiment/share/8f0e34c5-d2c4-45be-8425-36686ad57650" target="_blank" rel="noopener">Chameleon&lt;/a>.&lt;/p>
&lt;p>Both materials are designed to be easy to understand and interactive, allowing students to engage with the content and gain a deeper understanding of the concepts. Instructors can use these materials to assess their students&amp;rsquo; understanding of machine learning and reproducibility.&lt;/p>
&lt;h2 id="reflecting-on-the-journey">Reflecting on the Journey&lt;/h2>
&lt;p>As we wrap up our journey of creating beginner-friendly learning materials for machine learning using reproducibility, it&amp;rsquo;s time to reflect on the rewarding experiences and valuable lessons learned along the way. Our deep dive into the world of machine learning and reproducibility not only enriched our knowledge but also provided us with an opportunity to contribute to the community at the &lt;strong>UC Open Source Symposium 2023&lt;/strong> at UCSC.&lt;/p>
&lt;p>The symposium was a memorable event where we presented our work in a poster session. The diversity of the audience, ranging from professors and researchers to students, added depth to our understanding through their valuable feedback and insights. It was intriguing to see the potential applications of our work in various contexts and its capacity to benefit the broader community.&lt;/p>
&lt;p>This project has been a personal journey of growth, teaching me much more than just machine learning and reproducibility. It honed my skills in collaboration, communication, and problem-solving. I learned to distill complex ideas into simple, accessible language and create engaging, interactive learning experiences. The most fulfilling part of this journey has been seeing our work come alive and realizing its potential to positively impact many people. The gratification that comes from creating something useful for others is unparalleled, and we are thrilled to share our materials with the world.&lt;/p>
&lt;p>Your time and interest in our work are greatly appreciated! Hope you enjoyed this blog!&lt;/p></description></item><item><title>Learning Machine Learning by Reproducing Vision Transformers</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231006-msaeed/</link><pubDate>Fri, 06 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231006-msaeed/</guid><description>&lt;p>Hello again!&lt;/p>
&lt;p>In this blog post, I will be discussing the second material I created for the 2023 Summer of Reproducibility Fellowship. As you may recall from my &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230601-msaeed">first post&lt;/a>, I am working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education&lt;/a> project with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> as my mentor. My goal is to create interactive open-source educational resources that teach reproducibility and reproducible research in machine learning (ML), as outlined in my &lt;a href="https://drive.google.com/file/d/13HnCMZawpabiLdBoOiaJFF2mNXIPLCVJ/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;p>In this post, I will share my second material with you and show how it can be used in a machine learning class to teach students about vision transformers and reproducibility at the same time. If you haven&amp;rsquo;t seen my first work, be sure to check out my &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230802-msaeed">previous blog post&lt;/a>. Without further ado, let&amp;rsquo;s dive in!&lt;/p>
&lt;h2 id="reproducing-an-image-is-worth-16x16-words-transformers-for-image-recognition-at-scale">Reproducing “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”&lt;/h2>
&lt;p>This material is a reproduction of Dosovitskiy et al.’s 2020 paper, &lt;a href="https://arxiv.org/abs/2010.11929" target="_blank" rel="noopener">“An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”&lt;/a>. This paper introduces the Vision Transformer (ViT), a novel architecture that applies the transformer model, originally designed for natural language processing tasks, to image recognition. The ViT model achieves state-of-the-art performance on several image classification benchmarks, demonstrating the potential of transformers for computer vision tasks.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/eduml/20231006-msaeed/ViT_hud9ed20979bb56dae4d8e9f4231875a17_383197_485ae1a0cccbdc73994be22901c125d5.webp 400w,
/report/osre23/nyu/eduml/20231006-msaeed/ViT_hud9ed20979bb56dae4d8e9f4231875a17_383197_f8af78acab4a91489ecff3308bc9c9c1.webp 760w,
/report/osre23/nyu/eduml/20231006-msaeed/ViT_hud9ed20979bb56dae4d8e9f4231875a17_383197_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231006-msaeed/ViT_hud9ed20979bb56dae4d8e9f4231875a17_383197_485ae1a0cccbdc73994be22901c125d5.webp"
width="760"
height="229"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
The figure illustrates the key idea behind ViT, which is to treat an image as a sequence of patches, similar to how a transformer treats a sentence as a sequence of words. Each patch is flattened into a vector and fed into the transformer encoder, which learns to capture the complex relationships between these patches. The resulting representation is then fed into an MLP head, which produces a final prediction for the image. This approach allows ViT to handle large input images and capture both global context and fine-grained details. ViT models can also be pre-trained on large datasets and fine-tuned on smaller datasets for improved performance.&lt;/p>
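&lt;p>The patch idea described above can be sketched in a few lines. This is a simplified illustration and not the code from the notebooks: it assumes a single-channel image stored as a nested list, and the function name &lt;code>image_to_patches&lt;/code> is hypothetical. A real ViT additionally handles colour channels and applies a learned linear projection and position embeddings to each flattened patch.&lt;/p>

```python
def image_to_patches(image, patch_size):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the grid of patches row by row, left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # A 4x4 toy image with pixel values 0..15, cut into 2x2 patches.
    toy = [[4 * r + c for c in range(4)] for r in range(4)]
    for patch in image_to_patches(toy, patch_size=2):
        print(patch)
```

&lt;p>With the 16x16 patches named in the paper&amp;rsquo;s title, a 224x224 image becomes a sequence of 14 x 14 = 196 patches.&lt;/p>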
&lt;p>To reproduce this paper, I followed a systematic approach to ensure reliable results:&lt;/p>
&lt;ul>
&lt;li>Critically analyze the paper&amp;rsquo;s qualitative and quantitative claims.&lt;/li>
&lt;li>Identify the necessary experiments to verify each claim.&lt;/li>
&lt;li>Determine the required data, code, and hyperparameters for each experiment.&lt;/li>
&lt;li>Utilize pre-trained models for validating claims that require high computational resources.&lt;/li>
&lt;li>Investigate resources shared by the authors, such as code, data, and models.&lt;/li>
&lt;li>Assess the feasibility of verifying different types of claims.&lt;/li>
&lt;li>Design new experiments for validating qualitative claims when certain models or datasets are unavailable.&lt;/li>
&lt;/ul>
&lt;p>I used &lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon&lt;/a> as my platform for conducting and documenting my reproduction experiments. Chameleon is a large-scale, reconfigurable experimental environment that supports computer science systems research. It enables users to create and share Jupyter notebooks capable of running Python code on Chameleon’s cloud servers. Running the notebooks on a GPU requires one with at least 24 GB of memory, which is available among the GPU types Chameleon offers.&lt;/p>
&lt;p>I have set up a &lt;a href="https://github.com/mohammed183/re_vit" target="_blank" rel="noopener">GitHub repository&lt;/a> where you can access all of my reproduction work. The repository contains interactive Jupyter notebooks that will help you learn more about machine learning and the reproducibility of machine learning research. These notebooks provide a hands-on approach to understanding the concepts and techniques presented in my reproduction work.&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>Reproducing a paper can be a challenging task, and I encountered several obstacles during the process, including:&lt;/p>
&lt;ul>
&lt;li>The unavailability of pretraining datasets and pretrained models&lt;/li>
&lt;li>Inexact or unspecified hyperparameters&lt;/li>
&lt;li>The need for expensive resources for some hyperparameters&lt;/li>
&lt;li>The use of different frameworks for baseline CNNs and Vision Transformers&lt;/li>
&lt;/ul>
&lt;p>These issues posed significant difficulties in replicating the following table, a key result from the Vision Transformer paper that demonstrates its superiority over prior state-of-the-art models.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/eduml/20231006-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_7faf7451ff08ac87e7d12ab941c77f8e.webp 400w,
/report/osre23/nyu/eduml/20231006-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_5441e5e4c6ffed9b29244a3a3dcde852.webp 760w,
/report/osre23/nyu/eduml/20231006-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20231006-msaeed/table1_hu639b2ac18dac1313dd35f10cc0ae8db7_237634_7faf7451ff08ac87e7d12ab941c77f8e.webp"
width="760"
height="354"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To overcome these challenges, I used the same models mentioned in the paper but pretrained on different datasets, experimented with various hyperparameter combinations to achieve the best results, and wrote my own code to ensure that both the baseline and Vision Transformer were fine-tuned using the same framework. I also faced other challenges, which I discussed in my notebooks along with the solutions I applied.&lt;/p>
&lt;h2 id="how-to-use-this-material">How to use this material?&lt;/h2>
&lt;p>This material consists of a series of notebooks that guide you through the paper, its claims, experiments, and results. You will learn how to analyze, interpret, and validate the authors&amp;rsquo; claims. To get started, I recommend briefly skimming the &lt;a href="https://arxiv.org/abs/2010.11929" target="_blank" rel="noopener">original paper&lt;/a> to gain an understanding of the main ideas and public information. This will help you see how the authors could have been more transparent and clear in certain sections. The notebooks provide clear instructions and explanations, as well as details on how I addressed any missing components.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>In this blog post, I&amp;rsquo;ve walked you through the contents of this material and the insights users can gain from it. This material is particularly intriguing as it replicates a paper that has significantly influenced the field of computer vision. The interactive nature of the material makes it not only educational but also engaging and enjoyable. I believe users will find this resource both fun and beneficial.&lt;/p>
&lt;p>I hope you found this post informative and interesting. If you have any questions or feedback, please feel free to contact me. Thank you for reading and stay tuned for more updates!&lt;/p></description></item><item><title>noWorkflow as an experiment management tool - Final Report</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230914-jesselima/</link><pubDate>Thu, 14 Sep 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230914-jesselima/</guid><description>&lt;p>This post describes our midterm work status and some achievements we
have made so far in our project
&lt;a href="https://docs.google.com/document/d/1YMtPjZXcgt5eplyxIgQE8IBpQIiRlB9eqVSQiIPhXNU/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>
for
&lt;a href="https://ospo.ucsc.edu/project/osre23/nyu/noworkflow" target="_blank" rel="noopener">noWorkflow&lt;/a>.&lt;/p>
&lt;p>For a friendlier introduction to our work, please refer to this
&lt;a href="https://github.com/jaglima/noworkflow_usecase/blob/main/README.md" target="_blank" rel="noopener">tutorial&lt;/a>.&lt;/p>
&lt;p>Our final code to merge is available in &lt;a href="https://github.com/jaglima/noworkflow/tree/sor_features" target="_blank" rel="noopener">this repository&lt;/a>.&lt;/p>
&lt;h2 id="different-ways-of-managing-experiments">Different ways of managing experiments&lt;/h2>
&lt;p>From our starting point at the midterm, and from our initial aspirations
for the SoR, we kept on track with the goal of adding features to
noWorkflow related to managing DS/ML experimental setups focusing on
reproducibility.&lt;/p>
&lt;p>With the emergence of AI across multiple fields in industry and
academia, the subject of reproducibility has become increasingly
relevant. In [1] we have an
interesting description of the sources of irreproducibility in Machine
Learning. All these sources are present at different stages during the
project's experimental phases and may even persist in production
environments, leading to the accumulation of technical debt
[2]. The problem of
irreproducibility is also discussed in [3] and
[4], which point out that
speed of delivery usually comes at the expense of
reproducibility, among other casualties.&lt;/p>
&lt;p>The CRISP-DM process as reviewed in
[5] demonstrates that Data
Science experiments follow a typical path of execution. In the same
manner, [3], [6], and
[7] point out that
Machine Learning pipelines are composed of well-defined layers (or
stages) throughout their lifecycle. The emergence of AI in real-world
applications highlighted the almost artisanal ways of creating and managing
analytical experiments and reinforced that there is room to make things
more efficient.&lt;/p>
&lt;p>In the search for possible approaches to the problem, we came across
several projects that aimed to address these issues. Not surprisingly,
multiple authors pursued the same goal, for instance [9] and
[10]. In these references,
and as confirmed in our own survey, we found everything from targeted
solutions for specific modeling steps to services aiming for end-to-end AIOps
management. Some are available as software packages, others as SaaS in
cloud environments. In general terms, all of them end up offering
features in different layers of the workflow (i.e. data, feature,
scoring, and evaluation) or with different conceptualizations of
reproducibility/replicability/repeatability, as noted by
[11]. On the one hand, this lack of
standards makes any assessment difficult. On the other hand, it suggests
a community that is still exploring a hot topic.&lt;/p>
&lt;p>Specifically for this project, our focus is on the initial stages of
computational scientific experiments. As studied in [8], in this
phase, experiments are i) implemented by people as prototypes, ii) with
minor focus on pipeline design and iii) in tools like Notebooks, which
mix documentation, visualization and code with no required sequential
structure. These three practices impact reproducibility and efficiency
and are prone to create technical debt. However, tools like noWorkflow
show huge potential in such scenarios. They are promising because they i)
demand a minimal setup to be functional, ii) work well with almost
nonexistent workflows, iii) require minimal additional intrusive code
among the experimental code and iv) integrate well with Notebooks, which
are the typical artifact in these experiments.&lt;/p>
&lt;p>According to its core team, the primary goal of noWorkflow is to
&amp;quot;...allow scientists to benefit from provenance data analysis even
when they don't use a workflow system.&amp;quot;. Unlike other tools,
&amp;quot;noWorkflow captures provenance from Python scripts without needing a
version control system or any other environment&amp;quot;. It is particularly
interesting when we are in the scenario described above, where we lack
any structured system at the beginning of experiments. In fact, after
going through the docs, we can verify that noWorkflow provides:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Command-line accessibility&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Seamless integration with Jupyter Notebooks&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Minimal setup requirements in your environment&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Elimination of the need for virtual machines or containers in its
setup&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Workflow-free operation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Open source license&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Framework-agnostic position&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Finally, in our research, we confirmed that there is an open spot in the
management of scientific experiments that needs to be occupied by
reproducibility. Provenance tools can help academic and industry
groups reach this goal, and this summer we focused on adding relevant
features that move noWorkflow in this direction.&lt;/p>
&lt;h2 id="different-tools-for-different-needs">Different tools for different needs&lt;/h2>
&lt;p>In our research phase, we didn't find any taxonomy that fully
accommodated our review of different categories of tools providing
reproducibility and experimental management. We therefore describe the
tools in the following categories (freely adapted from these online
references:
&lt;a href="https://ml-ops.org/content/mlops-principles" target="_blank" rel="noopener">[here]&lt;/a> and
&lt;a href="https://ambiata.com/blog/2020-12-07-mlops-tools/" target="_blank" rel="noopener">[here]&lt;/a>):&lt;/p>
&lt;p>&lt;strong>Data and Pipeline Versioning&lt;/strong>: Platforms dealing with ingestion,
processing, and exposure of features for model training and inference.
They enable collaboration and discoverability of already existing
Feature Sets across teams and organizations. They provide provenance
and lineage for data at different levels of complexity.&lt;/p>
&lt;p>&lt;strong>Metadata Stores/Experiment Trackers&lt;/strong>: They are specifically built to
store metadata about ML experiments and expose it to stakeholders. They
help with debugging, comparing, and collaborating on experiments. It is
possible to divide them into Experiment Trackers and a Model Registry.
Moreover, there are projects offering reproducibility features like
hyperparameter search, experiment versioning, etc. However, they demand
more robust workflows and are better suited for projects in the
production/monitoring phases.&lt;/p>
&lt;p>&lt;strong>Pipeline frameworks&lt;/strong>: They operate within the realm of production,
similar to Data Engineering workflows. Their usual goal is to allow any
ML/AI products to be served across a wide range of architectures, and
integrate all the low-hanging fruits along the way. For instance,
pipelines adding hyperparameter optimization tasks, experiment tracking
integrations, boilerplate containerized deployment, etc.&lt;/p>
&lt;p>&lt;strong>Deployment and Observability&lt;/strong>: They focus on deploying models for
real-time inference and monitoring model quality once they are deployed
in production. Their aim is to facilitate post-deployment control tasks
such as monitoring feature drifts, conducting A/B testing, facilitating
fast model shifts, and more.&lt;/p>
&lt;p>The most remarkable aspect of this survey is that there are different
tools for different phases in the life cycle of AI products. There are
tools like DVC and Pachyderm that are Metadata Stores, allowing
Experiment Tracking with features for tagging variables, as well as Data
and Pipeline tracking. They are the most similar tools to noWorkflow in
functionality. However, DVC has a more complex framework for
dealing with different 'types' of tags, and relies on command-line
tools to extract and analyze tagged variables. It also depends strongly
on Git and replicates Git's logic. Pachyderm requires a more
sophisticated setup at the start, relying on containers and a server. This
is an obstacle to small and lean prototypes, since it requires installing a
Docker image and brings all the friction of managing it.&lt;/p>
&lt;p>There are other tools, like MLFlow and Neptune, that position themselves as
Model Experiment Versioning tools with Monitoring and Deployment features.
They also have elements of pipeline frameworks, offering
boilerplate for seamless integration with cloud
platforms.&lt;/p>
&lt;p>Pipeline frameworks are a vast field. They include AWS SageMaker, Google Vertex,
DataRobot and Weights &amp;amp; Biases, among others. All of them offer features
that help in all categories, with a strong focus on offering the final user
as much automation as possible: automatic
parameter tuning, model selection, retraining, data lineage, metadata
storage, etc.&lt;/p>
&lt;p>Finally, Deployment and Observability frameworks are in the deployment
realm, which is another stage far removed from the prototypical phases of
experiments. They come into play when all experimental and
inferential processes are done, and there is an AI artifact that needs
to be deployed and monitored. Tools such as Seldon, H2O, and DataRobot do
this job, again with some features of hyperparameter tuning, pipeline
frameworks, and data and pipeline tracking.&lt;/p>
&lt;p>In light of this, when considering management and operation of
experiments, we have a reduced sample of alternatives. Among them,
Notebook integration and management are rare. Some of them rely on other
tools like Git or impose overhead on the coding/setup with reserved
keywords, tags and managerial workflows that hinder the process.&lt;/p>
&lt;p>At first sight, our &amp;quot;informal&amp;quot; taxonomy positions noWorkflow as a
Data/Pipeline Versioning tool and a Metadata Store/Experiment Tracker. It is
not a Pipeline Framework, which works like a building block, facilitating
the integration of artifacts at production stages. Nor is it a
Deployment and Observability framework, because those belong to the
post-deployment realm, which is another stage far removed from the
prototypical phases of experiments.&lt;/p>
&lt;h2 id="desiderata">Desiderata&lt;/h2>
&lt;p>As mentioned earlier, a typical workflow in DS/ML projects is well
described by CRISP-DM [5]
and precedes the deployment and production phases in the whole lifecycle
of DS/ML projects.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image1.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Fig 1: CRISP-DM example of trajectory through a data science project&lt;/p>
&lt;p>Briefly speaking, a workflow starts when a user creates a Jupyter
Notebook and starts writing code. Usually, the user imports or selects
data from a source, explores the features expected to have the
highest inference potential, tunes some parameters to set up
training, then trains the model and evaluates its predictive power through
different metrics. At this final step, we have delineated a trial. This
trial result can suggest further improvements and new hypotheses about
data, features, model types and hyperparameters. Then, we have a new
experiment in mind that will result in a new trial.&lt;/p>
&lt;p>When this process repeats multiple times, a researcher may end up with
different notebooks, each one storing a different experiment. Each
notebook has multiple hyperparameters, modeling choices and modeling
hypotheses. Alternatively, the experimenter may have a single notebook where
different experiments were executed, in a nonlinear order between the
cells. This latter case is pointed out in
[8], where Notebook flexibility
makes it difficult to understand which execution order resulted in a
specific output.&lt;/p>
&lt;p>In an ideal world, any researcher or team would benefit the most if
they could:&lt;/p>
&lt;p>a) In a running Notebook, retrieve all the operations
that contributed to the result of a variable of interest. In this
case, modifications applied to the inputs or to the order of
operations would be easily detectable, as would any
nonlinear execution that interferes with a control result.&lt;/p>
&lt;p>b) Compare trials after different experiments. After experimenting with
different hypotheses about hyperparameters, features or operation
order, the user should easily compare the history of two trials and
spot differences.&lt;/p>
&lt;p>c) Retrieve a target variable among different trials that were executed
in the context of an experiment. After proceeding with multiple
experimental trials, users should be able to compare the results,
whether or not they are stored in different Notebooks.&lt;/p>
&lt;p>d) Be as much &amp;quot;no workflow&amp;quot; as possible. All the previous requirements
should be possible with minimal code intervention, tags, reserved
words or any active coding effort.&lt;/p>
&lt;p>With these goals in mind, we worked on our deliverables and used the
experiment carried out by [12]
as a guideline to validate the new noWorkflow features.&lt;/p>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;p>In this section, we describe what we have implemented during this
summer.&lt;/p>
&lt;p>We started with tagging cells and variables and then navigating through
their pre-dependencies, that is, all other variables and function calls that
contributed to their final value. This was a fundamental step that allowed
us to build features that are really useful in day-to-day
practice.&lt;/p>
&lt;p>From the features of tagging a cell and tagging a variable, we evolved
to the following features (an interactive notebook is available here):&lt;/p>
&lt;ul>
&lt;li>&lt;em>backwards_deps('var_name', glanularity_level)&lt;/em> : returns a
dictionary storing operations/function calls and their associated
values that contributed to the final value of the tagged variable.
The glanularity_level argument sets whether the internal operations of the
functions are included or not.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image5.png" alt="backwards_deps" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
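&lt;p>The idea of walking back through everything that contributed to a variable can be illustrated with a small, self-contained sketch. This is not noWorkflow's implementation or API: the helper backward_dependencies and the deps dictionary are hypothetical, and the provenance graph is written by hand instead of being captured from a running notebook.&lt;/p>

```python
from collections import deque


def backward_dependencies(deps, target):
    """Return every name that contributed to `target`, nearest first."""
    seen = []
    queue = deque(deps.get(target, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already reached through another path
        seen.append(name)
        queue.extend(deps.get(name, []))
    return seen


# Hypothetical provenance: each name maps to the names it was computed from.
deps = {
    "accuracy": ["model", "X_test", "y_test"],
    "model": ["X_train", "y_train", "n_estimators"],
    "X_train": ["df"],
    "X_test": ["df"],
    "y_train": ["df"],
    "y_test": ["df"],
}

contributors = backward_dependencies(deps, "accuracy")
print(contributors)
```

&lt;p>The breadth-first order lists direct contributors before indirect ones, which roughly mirrors the question asked of a tagged variable: what fed into this value, starting from the closest operations.&lt;/p>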
&lt;ul>
&lt;li>
&lt;p>&lt;em>global_backwards_deps&lt;/em>('var_name', glanularity_level) : does the
same as backwards_deps, but across all tagging and
re-tagging events in the notebook. It allows retrieval of the
complete history of operations for a tagged variable across all executed cells in
the notebook.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>store_operations(trial_id, dictionary_ops)&lt;/em> : saves the current
trial in order to make further comparisons with other experiments.
The dictionaries aren't stored in the &lt;em>.noworkflow/db.sqlite&lt;/em>, but
in a shelve object named &lt;em>ops.db&lt;/em> in the current notebook local
folder.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>resume_trials()&lt;/em> : to support the management of experiments, the
user can see the trial_ids of all experiments stored in the ops.db
available for comparison/analysis.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>trial_intersection_diff(trial_id1, trial_id2)&lt;/em> : all
variables/function calls shared by two experiments have their scalar
values compared.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image2.png" alt="trial_intersection_diff" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
&lt;ul>
&lt;li>&lt;em>trial_diff(trial_id1, trial_id2)&lt;/em> : The values of variables and
function calls are shown in a diff file format, emphasizing the
order of operations. The goal here is to show where the order of
operations differed between the two experiments. Again, only
scalar values are shown. More complex data structures (matrices,
vectors, tensors, etc.) are only flagged as &lt;em>'complex_type'&lt;/em>.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image3.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
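&lt;p>An order-sensitive comparison of this kind can be approximated with the standard library. The sketch below is an assumption-laden illustration, not the code behind trial_diff: the render helper, the trial contents and the rule for deciding what counts as a scalar are all hypothetical.&lt;/p>

```python
import difflib


def render(ops):
    """Turn (name, value) pairs into text lines, hiding non-scalar values."""
    lines = []
    for name, value in ops:
        is_scalar = isinstance(value, (int, float, str, bool))
        lines.append(f"{name} = {value if is_scalar else 'complex_type'}")
    return lines


# Two made-up trials with the same operations in a different order.
trial_a = [("test_size", 0.2), ("scaler", "standard"), ("pca", [[1, 0], [0, 1]])]
trial_b = [("test_size", 0.2), ("pca", [[1, 0], [0, 1]]), ("scaler", "standard")]

diff = list(
    difflib.unified_diff(render(trial_a), render(trial_b), "trial_a", "trial_b", lineterm="")
)
print("\n".join(diff))
```

&lt;p>Because a unified diff works line by line, a reordered operation shows up as a removal in one place and an addition in another, which makes differences in execution order visible even when every value is identical.&lt;/p>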
&lt;ul>
&lt;li>&lt;em>var_tag_plot('var_name')&lt;/em> : Charts the evolution of a given
variable across multiple trials in the database. In this case, all
experiments stored in ops.db and tagged as &lt;em>target_var&lt;/em> have their
values plotted.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image4.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
&lt;ul>
&lt;li>&lt;em>var_tag_values('var_name') :&lt;/em> Provides access to a pandas DataFrame of
var_name entries with their corresponding values across different trials.&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image6.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
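&lt;p>To make the store-then-compare flow more concrete, here is a minimal sketch built only on Python's shelve module. It assumes each trial is a flat dictionary of scalar values. The function names store_trial, list_trials and intersection_diff, the file name ops_sketch and the numbers are all hypothetical, and none of this is the code shipped in noWorkflow.&lt;/p>

```python
import os
import shelve
import tempfile


def store_trial(db_path, trial_id, operations):
    """Persist one trial's {name: value} dictionary under its id."""
    with shelve.open(db_path) as db:
        db[trial_id] = operations


def list_trials(db_path):
    """Return the ids of all stored trials."""
    with shelve.open(db_path) as db:
        return sorted(db.keys())


def intersection_diff(db_path, id_a, id_b):
    """Compare only the names present in both trials; keep the ones that differ."""
    with shelve.open(db_path) as db:
        a, b = db[id_a], db[id_b]
    shared = sorted(set(a).intersection(b))
    return {name: (a[name], b[name]) for name in shared if a[name] != b[name]}


# Made-up trials stored in a throwaway directory.
db_path = os.path.join(tempfile.mkdtemp(), "ops_sketch")
store_trial(db_path, "trial_1", {"n_estimators": 100, "test_size": 0.2, "f1": 0.81})
store_trial(db_path, "trial_2", {"n_estimators": 300, "test_size": 0.2, "f1": 0.84, "pca": 3})

print(list_trials(db_path))
print(intersection_diff(db_path, "trial_1", "trial_2"))
```

&lt;p>Restricting the comparison to shared names keeps the output short when two trials tagged different variables, at the cost of hiding names that exist in only one of them.&lt;/p>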
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>As expected, we had unexpected findings along the way. Below, we
delve into the most significant challenges we had to face:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Jupyter notebooks allow a nonlinear execution of small parts of code
through cells. More than once, we had to agree on how to design
functionality to handle scenarios we had not anticipated.
One example was the backwards_deps() and global_backwards_deps()
functions. The latter function was born to cover the case where the
user wants all dependencies rather than only the local cell dependencies.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Despite the high quality of the current version of the package, the
project needs documentation, which slows down the analysis of any
new development. In this project, the aid of mentors was crucial at
some points where a deeper knowledge was needed.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What is the vocation of noWorkflow? At some points in the project,
we had to discuss forcing some kind of workflow on the user, which
would go against the philosophy of the project.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When working on comparing results, especially in DS/ML fields,
complex types arise. Numerical vectors, matrices, and tensors from
NumPy and other frameworks, as well as data frames, can't be
properly manipulated based on our current approach.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The dilemma of focusing on graphic visual features versus more
sophisticated APIs. More than once, we needed to choose between
making a visual add-on to Jupyter or implementing a more complete
API.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The current version of Jupyter support in noWorkflow doesn&amp;rsquo;t
integrate well with Jupyter Lab. Also, IPython itself keeps releasing
new versions, and noWorkflow needs to adapt to them.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="future-improvements">Future Improvements&lt;/h2>
&lt;p>Given our current achievements and the insights gained along the
project, we would highlight the following points as crucial future
roadmap improvements:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Add a complex type treatment for comparisons. Today, visualizing and
navigating through matrices, data frames, and tensors isn't possible
with noWorkflow, although users can do so by their own means.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Migrate the dictionaries storing sequences of operations from
shelve objects to a more efficient means of storage and retrieval.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make it easier for users to manage (store, retrieve, and navigate)
through different trials.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add graphical management instead of relying upon API calls only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Evolve the feature of tagging cells.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When tagging a model, save its binary representation to be recovered
in the future.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add the capability to track reads of local datasets.
Currently, it is possible to track changes in the name/path of the
dataset. However, any modification to the contents of a dataset is
not traceable.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="what-ive-learned">What I've learned&lt;/h2>
&lt;p>This was a great summer with two personal discoveries. The first was
my first formal contact with the subject of reproducibility. The second was
contributing fully to an Open Source project. In the research phase,
I got in touch with the state of the art in reproducibility
research and some of its nuances. In the Open Source contribution
experience, I was mentored by the core team of noWorkflow and
exercised all the skills required to build a high-quality software product.&lt;/p>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>I would like to thank the organizers of the Summer of Reproducibility for
providing this wonderful opportunity for interested people to engage with
Open Source software. Also, thanks to the core team of noWorkflow for
supporting me in doing this work.&lt;/p>
&lt;h2 id="bibliography">Bibliography&lt;/h2>
&lt;p>[1] O. E. Gundersen, K. Coakley, C. Kirkpatrick, and Y. Gil, &amp;ldquo;Sources
of irreproducibility in machine learning: A review,&amp;rdquo; &lt;em>arXiv preprint
arXiv:2204.07610&lt;/em>.&lt;/p>
&lt;p>[2] D. Sculley &lt;em>et al.&lt;/em>, &amp;ldquo;Machine Learning: The High Interest Credit
Card of Technical Debt,&amp;rdquo; in &lt;em>SE4ML: Software Engineering for Machine
Learning (NIPS 2014 Workshop)&lt;/em>,
2014.&lt;/p>
&lt;p>[3] P. Sugimura and F. Hartl, &amp;ldquo;Building a reproducible machine
learning pipeline,&amp;rdquo; &lt;em>arXiv preprint arXiv:1810.04570&lt;/em>,
2018.&lt;/p>
&lt;p>[4] D. Sculley &lt;em>et al.&lt;/em>, &amp;ldquo;Hidden technical debt in machine learning
systems,&amp;rdquo; &lt;em>Adv. Neural Inf. Process. Syst.&lt;/em>, vol. 28,
2015.&lt;/p>
&lt;p>[5] F. Martínez-Plumed &lt;em>et al.&lt;/em>, &amp;ldquo;CRISP-DM twenty years later: From
data mining processes to data science trajectories,&amp;rdquo; &lt;em>IEEE Trans. Knowl.
Data Eng.&lt;/em>, vol. 33, no. 8, pp. 3048&amp;ndash;3061,
2019.&lt;/p>
&lt;p>[6] N. A. Lynnerup, L. Nolling, R. Hasle, and J. Hallam, &amp;ldquo;A Survey on
Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on
Real-World Robots,&amp;rdquo; in &lt;em>Proceedings of the Conference on Robot
Learning&lt;/em>, L. P. Kaelbling, D. Kragic, and K. Sugiura, Eds., in
Proceedings of Machine Learning Research, vol. 100. PMLR, 30 Oct&amp;ndash;01
Nov 2020, pp. 466&amp;ndash;489.&lt;/p>
&lt;p>[7] A. Masood and A. Hashmi, &amp;ldquo;AIOps:
predictive analytics &amp;amp; machine learning in operations,&amp;rdquo; &lt;em>Cognitive
Computing Recipes: Artificial Intelligence Solutions Using Microsoft
Cognitive Services and TensorFlow&lt;/em>, pp. 359&amp;ndash;382,
2019.&lt;/p>
&lt;p>[8] J. F. Pimentel, L. Murta, V. Braganholo, and J. Freire,
&amp;ldquo;Understanding and improving the quality and reproducibility of Jupyter
notebooks,&amp;rdquo; &lt;em>Empirical Software Engineering&lt;/em>, vol. 26, no. 4, p. 65,
2021.&lt;/p>
&lt;p>[9] D. Kreuzberger, N. Kühl, and S. Hirschl, &amp;ldquo;Machine Learning
Operations (MLOps): Overview, Definition, and Architecture,&amp;rdquo; &lt;em>IEEE
Access&lt;/em>, vol. 11, pp. 31866&amp;ndash;31879,
2023.&lt;/p>
&lt;p>[10] N. Hewage and D. Meedeniya, &amp;ldquo;Machine learning operations: A
survey on MLOps tool support,&amp;rdquo; &lt;em>arXiv preprint arXiv:2202.10169&lt;/em>,
2022.&lt;/p>
&lt;p>[11] H. E. Plesser, &amp;ldquo;Reproducibility vs. replicability: a brief
history of a confused terminology,&amp;rdquo; &lt;em>Front. Neuroinform.&lt;/em>, vol. 11, p.
76, 2018.&lt;/p>
&lt;p>[12] Z. Salekshahrezaee, J. L. Leevy, and T. M. Khoshgoftaar, &amp;ldquo;The
effect of feature extraction and data sampling on credit card fraud
detection,&amp;rdquo; &lt;em>Journal of Big Data&lt;/em>, vol. 10, no. 1, pp. 1&amp;ndash;17,
2023.&lt;/p></description></item><item><title>Reproducible Evaluation of Multi-level Erasure Coding (Midterm)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230801-zhiyanw/</link><pubDate>Sat, 05 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230801-zhiyanw/</guid><description>&lt;p>Hi Everyone,&lt;/p>
&lt;p>I hope everything goes well! This is my second blog post for my project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/MultiLevelErasure">Reproducible Evaluation of Multi-level Erasure Coding&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a>, and Meng Wang. In summary, my project aims to build a platform to reproducibly evaluate the performance and durability of MLEC (Multi-Level Erasure Coding) for large-scale storage systems under different design configurations. The details are in this &lt;a href="https://docs.google.com/document/d/1dO1aING1QcSB---XklzUjNz0usVh7qWffVGC3GZq2AE/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;p>In the course of these few weeks, I&amp;rsquo;ve completed several tasks to achieve the aim of this project, including&lt;/p>
&lt;ul>
&lt;li>Literature Review&lt;/li>
&lt;li>Studying the Erasure Coding Simulator and Creating Reproducible Evaluations, with the following policies
&lt;ul>
&lt;li>Clustered/Declustered Local-level SLEC&lt;/li>
&lt;li>Clustered/Declustered Network-level SLEC&lt;/li>
&lt;li>MLEC with C/C, C/D, D/C, D/D configuration&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="literature-review">Literature Review&lt;/h2>
&lt;p>Prior to developing the simulator, my first step was to delve into the literature on distinct Erasure Coding policies. To understand a simulator for a complex policy such as MLEC, I wanted to start from the simpler EC policies and then extend my knowledge to more complex ones. Moreover, I also aimed to contrast the durability of MLEC with other comparable EC policies like LRC in my evaluations, making it vital to understand the implementation of these policies.&lt;/p>
&lt;p>Over the first week, I read several papers on different chunk placement policies for erasure coding, including LRC (Local Reconstruction Codes), CL-LRC (Combined Locality for Local Reconstruction Codes), SODP (Single-Overlap Declustered Parity), and MLEC (Multi-Level Erasure Coding). These papers offered a fundamental understanding of each policy, their respective advantages and drawbacks, and their practical usage in production environments.&lt;/p>
&lt;h2 id="simulator-reproduction">Simulator Reproduction&lt;/h2>
&lt;p>After gaining some understanding from the papers I read, I started to study the EC simulator by building the simulator myself. I got the MLEC simulator from the mentors. However, the simulator lacks documentation and guides, making it hard for others to reproduce evaluation results. The simulator is also complicated to understand, as it simulates various EC schemes, chunk placements, and rebuild policies, amounting to 13,000 LOC. Therefore, my goal is to understand the design and implementation details of the simulator, after which I will create guides for reproducible evaluations.&lt;/p>
&lt;p>I decided that the best way to fully understand the simulator was to rebuild it myself. The simulator is designed to mimic disk failures over the span of a year under varying chunk placement policies. Once successfully rebuilt, the simulator will enable me to assess the durability of MLEC in relation to other widely-used chunk placement policies. I followed the given simulator and rewrote it on my own in Python.&lt;/p>
&lt;p>Based on the skeleton of the given simulator, I first rebuilt a simple simulator that simulates SLEC (single-level erasure coding, in both local and network settings) with clustered parities. With the arguments given, the simulator can run an arbitrary number of iterations, each simulating disk failures over one year. The simulator then counts the iterations in which data loss occurs. The ratio of failed iterations to total executed iterations estimates the probability of data loss, and its complement gives the durability of the erasure coding policy. This simulation allows us to evaluate the durability of SLEC, laying the foundation for later evaluation of MLEC.&lt;/p>
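&lt;p>The iteration-counting estimate can be sketched as a small Monte Carlo loop. This is a heavily simplified illustration and not the mentors' simulator or my rebuilt one: it assumes a single clustered group of disks, failures arriving as a Poisson process at an annual failure rate afr, a fixed repair time, and data loss whenever more disks are down at once than there are parities. All function and parameter names are hypothetical.&lt;/p>

```python
import random

HOURS_PER_YEAR = 8760.0


def failure_times(rng, afr, repair_hours):
    """Failure instants (in hours) of one disk during a simulated year."""
    rate = afr / HOURS_PER_YEAR  # expected failures per hour
    times = []
    t = rng.expovariate(rate)
    while HOURS_PER_YEAR > t:
        times.append(t)
        # The disk can only fail again after it has been rebuilt.
        t += repair_hours + rng.expovariate(rate)
    return times


def year_has_data_loss(rng, disks, parities, afr, repair_hours):
    """True if more than `parities` disks are ever down at the same time."""
    events = []
    for _ in range(disks):
        for t in failure_times(rng, afr, repair_hours):
            events.append((t, 1))                  # disk goes down
            events.append((t + repair_hours, -1))  # disk is rebuilt
    down = 0
    for _, delta in sorted(events):
        down += delta
        if down > parities:
            return True
    return False


def estimate_durability(iterations, seed=0, **config):
    """Durability = 1 - (iterations with data loss / total iterations)."""
    rng = random.Random(seed)
    losses = sum(year_has_data_loss(rng, **config) for _ in range(iterations))
    return 1.0 - losses / iterations


durability = estimate_durability(2000, disks=10, parities=2, afr=0.05, repair_hours=24.0)
print(f"estimated durability: {durability:.4f}")
```

&lt;p>A plain loop like this needs a very large number of iterations before it observes any loss for highly durable configurations, which is why real simulators rely on additional techniques to reach many nines.&lt;/p>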
&lt;p>Next, I extended my simulator from local-level SLEC implementation by adding more policies. I began by introducing a network-level SLEC policy with clustered parities. This differs slightly from the local-level EC as it necessitates the consideration of factors like network bandwidth within the simulator.&lt;/p>
&lt;p>In addition, I have delved deeper into simulating declustered parities and successfully discovered a method to simulate disk failures. Basically, the simulator generates failures within a one-year timeframe and subsequently repairs them using priority queues. The disks associated with stripes experiencing the most failures are given the highest repair priority. With this construction, the simulator is capable of simulating local-level declustered parities, with the ability to specify parameters.&lt;/p>
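&lt;p>The repair ordering described above can be illustrated with Python's heapq module. This is only a sketch under assumed data structures: stripes are modelled as lists of disk ids, and the repair_order helper and the example layout are hypothetical, not taken from the simulator.&lt;/p>

```python
import heapq


def repair_order(stripes, failed_disks):
    """Yield (stripe_id, failed_count), most-damaged stripe first."""
    heap = []
    for stripe_id, disks in stripes.items():
        failed = len(set(disks).intersection(failed_disks))
        if failed:
            # heapq is a min-heap, so push the negated failure count.
            heapq.heappush(heap, (-failed, stripe_id))
    while heap:
        neg_failed, stripe_id = heapq.heappop(heap)
        yield stripe_id, -neg_failed


# Made-up declustered layout: each stripe spreads over four of eight disks.
stripes = {
    "s0": [0, 1, 2, 3],
    "s1": [2, 3, 4, 5],
    "s2": [4, 5, 6, 7],
    "s3": [0, 3, 5, 6],
}
order = list(repair_order(stripes, failed_disks={3, 5, 6}))
print(order)
```

&lt;p>Stripes with the same number of failed chunks are served in stripe-id order here, which is an arbitrary tie-break chosen for the sketch.&lt;/p>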
&lt;p>Upon successfully simulating local-level declustered parities, the construction of the simulator for network-level declustered parities was rather straightforward. I then validated it using the simulator and math models provided by the mentors. The results agree with each other, which supports the correctness of my understanding of the SLEC declustered placements. By implementing the simulator myself, I strengthened my understanding of erasure coding designs and the simulation techniques, which equipped me with a solid foundation to continue to reproduce MLEC simulations.&lt;/p>
&lt;p>Based on the knowledge gained from implementing SLEC simulators myself, I then reverse-engineered the MLEC simulator provided by the mentors from their MLEC paper. I chose to start from the simplest policy, which is clustered parities in both levels. After spending considerable time digging into the simulator source code, I was able to understand the simulation workflows, the different repair methods that it implements, and the splitting method that it uses to simulate high durabilities. I then revised my simulator based on my understanding. I also ran a few experiments using the same configuration setups as specified in the paper. The results agree well with those in the paper, which indicates that my reproduction work was successful.&lt;/p>
&lt;h2 id="technical-issues">Technical Issues&lt;/h2>
&lt;p>In the process of building the MLEC simulator, I&amp;rsquo;ve encountered many issues, both conceptual and technical. The mentors have been very helpful and responsive throughout, so I was able to make steady progress.&lt;/p>
&lt;h2 id="summary">Summary&lt;/h2>
&lt;p>Overall, I&amp;rsquo;ve rebuilt a Python simulator for various EC policies, and the simulator can successfully reproduce the results from the paper.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>My next step is to package the simulator into a reproducible Trovi artifact, so that others can reproduce evaluations of the performance and durability of various EC policies, in particular MLEC.&lt;/p></description></item><item><title>Mid Term Blog : Using Reproducibility in Machine Learning Education: Reproducibility with Incomplete Methodology Descriptions</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230804-indianspeedster/</link><pubDate>Fri, 04 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230804-indianspeedster/</guid><description>&lt;p>Hey,&lt;/p>
&lt;p>I am Shekhar and I am one of several students who are working on developing materials for reproducibility in machine learning education, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a>. My &lt;a href="https://drive.google.com/file/d/1rCzLGIJ8HYCVjY_MfndgrQjAQa2SQbqZ/view?usp=sharing" target="_blank" rel="noopener">Proposal&lt;/a> aims to develop interactive educational materials about reproducibility in machine learning, for use in graduate and undergraduate classes. Our goal is to help students and researchers (1) understand some of the challenges they may face when trying to reproduce someone else&amp;rsquo;s published result, and (2) in their own publications, to specify the methodology so that the result will be more easily reproduced by others.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>My work is inspired by my participation in the &lt;a href="https://paperswithcode.com/rc2022" target="_blank" rel="noopener">2022 Machine Learning Reproducibility Challenge&lt;/a>, where I was reproducing a result related to bias in hate speech classifiers. The paper seemed at first to have complete methodology details. However, when I tried to implement their approach based on the description of the paper, I realized some important details were missing - for example, in the part where they replaced swear words in the text with other words having similar meaning. I wasn&amp;rsquo;t able to identify the exact list of swear words they used, or what approach they followed if the selected replacement was also a swear word. The choices I made when the authors&amp;rsquo; approach was left ambiguous had a significant impact on the magnitude of the final result.&lt;/p>
&lt;h2 id="milestones-and-accomplishments">Milestones and Accomplishments&lt;/h2>
&lt;p>To inform researchers and students about this problem, I created a fictitious machine learning research paper, and a sequence of accompanying Python notebooks to highlight various choices that can be made to fill in the gaps, and explore how these choices can impact the overall results of the research. Our &amp;ldquo;research paper&amp;rdquo; is about the impact of data augmentation on few-shot learning for intent classification. We implemented a basic data augmentation strategy with synonym replacement using the HWU64 dataset and a BERT classifier, and the results suggest that synonym replacement as a data augmentation technique leads to only minor improvement in accuracy.
In the fictitious paper, we left some of the methodology details ambiguous. When reproducing the results using the accompanying notebooks, the reader follows a &amp;ldquo;Choose Your Own Adventure&amp;rdquo; format, selecting a path through a tree in which each node represents an ambiguous methodology detail and branches out to the different choices that can be made at that point. The leaf nodes represent the final results, providing insight into the magnitude of the differences resulting from each selection. Some of the choices that the reader makes are:&lt;/p>
&lt;ul>
&lt;li>what subset of the source dataset to use.&lt;/li>
&lt;li>some of the details of data pre-processing.&lt;/li>
&lt;li>some of the details of the synonym replacement data augmentation strategy.&lt;/li>
&lt;li>some training hyperparameters and the details of the hyperparameter search.&lt;/li>
&lt;/ul>
&lt;p>During the first phase of our project, we have implemented an initial draft of these notebooks, to explore various scenarios and see their impact on results. Next, we will further develop the interactive educational material around them.&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>During the first half of the project, I faced two main challenges. First, I had to come up with a hypothetical research scenario that was realistic, yet easy for students without much expertise to understand. Attaining the right balance was essential to make it engaging and educational. The second challenge was to deliberately leave some details unclear in a realistic way while ensuring that the choices based on that ambiguity had a significant impact on the results. Fortunately, I had the guidance and support of my mentor, which allowed me to successfully tackle these challenges.&lt;/p>
&lt;p>Throughout this project, I faced various challenges and obstacles, but it turned out to be an incredible learning experience. I had the opportunity to dive deep into the domains of few-shot learning and meta-learning, which were entirely new to me. Moreover, I was able to find ambiguous methodologies present in academic papers and explore diverse scenarios related to them. Looking ahead, I am eager to continue working on this project throughout the summer, as it promises further learning and personal growth.&lt;/p></description></item><item><title>Reproducible Analysis &amp; Models for Predicting Genomics Workflow Execution Time (Midterm Blog Post)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230803-charishulu/</link><pubDate>Thu, 03 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230803-charishulu/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uga/genomicswfmodels/">Reproducible Analysis &amp;amp; Models for Predicting Genomics Workflow Execution Time&lt;/a> project, our goal is to characterize the tools in genomic workflows in terms of system metrics and data quality, in order to build machine learning models that predict the elapsed time of genomic workflows. While Shayantan (another contributor) analyzed the data quality metrics, I contributed to the system metrics analysis. We are getting closer to that goal: we have collected datasets and completed some initial analysis.&lt;/p>
&lt;h2 id="steps">Steps&lt;/h2>
&lt;p>In this project, we selected the DNA-Seq pipeline as the workflow to be analyzed. This pipeline consists of four tools for processing single-end reads: BWA-mem, Samtool-view, Picard-SortSam, and Picard-MarkDuplicates. We executed each tool using various configurations and stored the system metrics for each execution. This required two steps:&lt;/p>
&lt;ul>
&lt;li>Step 1: Building the tools execution environment.&lt;/li>
&lt;li>Step 2: Developing a program to execute the tools under a set of configurations and automatically collect runtime parameters (e.g., CPU, RSS, VSZ, and I/O).&lt;/li>
&lt;/ul>
&lt;h2 id="execution-environment">Execution Environment&lt;/h2>
&lt;p>Tools are executed on Chameleon instances by submitting them through Slurm. The machine used for collecting system metrics is a Haswell instance of the Chameleon Texas server. This instance uses an Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz with the following specifications.&lt;/p>
&lt;table>
&lt;tr>
&lt;th>Number of CPUs&lt;/th>
&lt;td>48&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Number of threads per core&lt;/th>
&lt;td>2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Number of cores per socket&lt;/th>
&lt;td>12&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Number of sockets&lt;/th>
&lt;td>2&lt;/td>
&lt;/tr>
&lt;/table>
&lt;p>In this experiment, we use n+1 instances: n compute nodes and 1 master node. Each execution is done by submitting a job, which is a tool with a certain configuration, from the master node, and it is processed by one of the compute nodes. For the tools to be executed, we set up the master node as a common container shared over NFS. This common container stores the input files and the commands for executing the tools, so that all nodes can access them without having to download and install them separately.&lt;/p>
&lt;h2 id="executing-and-collecting-system-metrics">Executing and Collecting System Metrics&lt;/h2>
&lt;p>Tools are executed in various specific configurations by varying parameters such as input size, CPU allocation, memory allocation, and number of threads. For example, for BWA-mem we use 5 values for CPU allocation, 4 values for memory allocation, and 5 values for threads, across 10 different input files, so there are 5 x 4 x 5 x 10 = 1000 configuration combinations. Each configuration is executed 8 times, yielding 8000 data points. Configuration details can be seen in the following table.&lt;/p>
&lt;table>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>#repetitions&lt;/th>
&lt;th>#files&lt;/th>
&lt;th>#allocated CPU&lt;/th>
&lt;th>#allocated memory&lt;/th>
&lt;th>#threads&lt;/th>
&lt;th>total&lt;/th>
&lt;/tr>
&lt;tr>
&lt;th>BWA-mem&lt;/th>
&lt;td>8&lt;/td>
&lt;td>10&lt;/td>
&lt;td>2, 4, 8, 16, 32&lt;/td>
&lt;td>8, 16, 32, 64&lt;/td>
&lt;td>2, 4, 8, 16, 32&lt;/td>
&lt;td>8000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Samtool-view&lt;/th>
&lt;td>10&lt;/td>
&lt;td>10&lt;/td>
&lt;td>2, 4, 8, 16, 32&lt;/td>
&lt;td>8, 16, 32, 64&lt;/td>
&lt;td>-&lt;/td>
&lt;td>2000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Picard-SortSam&lt;/th>
&lt;td>10&lt;/td>
&lt;td>10&lt;/td>
&lt;td>2, 4, 8, 16, 32&lt;/td>
&lt;td>8, 16, 32, 64&lt;/td>
&lt;td>-&lt;/td>
&lt;td>2000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Picard-MarkDuplicates&lt;/th>
&lt;td>10&lt;/td>
&lt;td>10&lt;/td>
&lt;td>2, 4, 8, 16, 32&lt;/td>
&lt;td>8, 16, 32, 64&lt;/td>
&lt;td>-&lt;/td>
&lt;td>2000&lt;/td>
&lt;/tr>
&lt;/table>
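&lt;p>As a rough illustration of how the BWA-mem row of the table expands into individual runs, the sketch below enumerates every combination with Python. The parameter values come from the table; the file names and the &lt;code>bwa_mem_runs&lt;/code> helper are hypothetical and are not part of our actual scripts.&lt;/p>

```python
from itertools import product

# Values taken from the BWA-mem row of the configuration table.
CPUS = [2, 4, 8, 16, 32]
MEMORY = [8, 16, 32, 64]
THREADS = [2, 4, 8, 16, 32]
REPETITIONS = 8

# Hypothetical input file names; the post only says 10 files were used.
FILES = [f"input_{i:02d}.fastq" for i in range(10)]


def bwa_mem_runs():
    """Yield one dict per planned BWA-mem execution."""
    for cpu, mem, threads, path in product(CPUS, MEMORY, THREADS, FILES):
        for rep in range(REPETITIONS):
            yield {"cpu": cpu, "memory": mem, "threads": threads,
                   "file": path, "repetition": rep}


runs = list(bwa_mem_runs())
print(len(runs))  # 5 * 4 * 5 * 10 * 8 = 8000
```

&lt;p>Each dictionary could then be turned into one Slurm job submission; how that is done depends on the cluster setup and is left out here.&lt;/p>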
&lt;p>Meanwhile, to run the tools, we use the following commands:&lt;/p>
&lt;ul>
&lt;li>BWA-mem&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nv">$BWA&lt;/span> mem -t &lt;span class="nv">$threads&lt;/span> &lt;span class="nv">$REF_DIR&lt;/span>/hg19.fa &lt;span class="si">${&lt;/span>&lt;span class="nv">INPUT_DIR&lt;/span>&lt;span class="si">}&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>*.fastq &amp;gt; &lt;span class="si">${&lt;/span>&lt;span class="nv">OUTPUT_DIR&lt;/span>&lt;span class="si">}&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>.sam
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>Samtool-view&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nv">$SAMTOOLS&lt;/span> view &lt;span class="nv">$INPUT_DIR&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>.sam -Shb -o &lt;span class="nv">$OUTPUT_DIR&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>.bam
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>Picard-SortSam&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">java -jar &lt;span class="nv">$PICARD&lt;/span> SortSam &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">CREATE_INDEX&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">true&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">INPUT&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nv">$INPUT_DIR&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>.bam &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">OUTPUT&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nv">$OUTPUT_DIR&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>.bam &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">SORT_ORDER&lt;/span>&lt;span class="o">=&lt;/span>coordinate &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">VALIDATION_STRINGENCY&lt;/span>&lt;span class="o">=&lt;/span>STRICT
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>Picard-MarkDuplicates&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">java -jar &lt;span class="nv">$PICARD&lt;/span> MarkDuplicates &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">CREATE_INDEX&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">true&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">INPUT&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nv">$INPUT_DIR&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>.bam &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">OUTPUT&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nv">$OUTPUT_DIR&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>.bam &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">METRICS_FILE&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nv">$OUTPUT_DIR&lt;/span>/&lt;span class="si">${&lt;/span>&lt;span class="nv">sra_id&lt;/span>&lt;span class="si">}&lt;/span>_rmd.txt &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span>&lt;span class="nv">VALIDATION_STRINGENCY&lt;/span>&lt;span class="o">=&lt;/span>STRICT
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In Slurm, each job has a job ID, and the &lt;code>scontrol listpids&lt;/code> command shows the mapping from job ID to PID. Using this mapping, we can obtain system metrics for a job by reading the files under &lt;code>/proc/$PID&lt;/code>. These files report the CPU usage, physical memory, virtual memory, read bytes, and write bytes at a particular time. While collecting this data, we record these features along with a timestamp at 1-second intervals throughout the execution.&lt;/p>
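&lt;p>The sampling idea can be sketched in Python as follows. This is a minimal sketch rather than our actual collector: it assumes a Linux host, that the PID has already been looked up, and that the process files are readable. All function names are hypothetical; the field names (&lt;code>VmRSS&lt;/code>, &lt;code>VmSize&lt;/code>, &lt;code>read_bytes&lt;/code>, &lt;code>write_bytes&lt;/code>) are the standard ones exposed by the Linux proc filesystem.&lt;/p>

```python
import time
from pathlib import Path


def parse_status(text):
    """Extract resident (RSS) and virtual (VSZ) memory in kB from /proc/PID/status."""
    wanted = {"VmRSS": "rss_kb", "VmSize": "vsz_kb"}
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            out[wanted[key]] = int(value.split()[0])
    return out


def parse_io(text):
    """Extract cumulative read/write byte counters from /proc/PID/io."""
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("read_bytes", "write_bytes"):
            out[key] = int(value)
    return out


def parse_cpu_ticks(text):
    """Return utime + stime (clock ticks) from /proc/PID/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split only the part after the last ')'.
    fields = text[text.rindex(")") + 2:].split()
    return int(fields[11]) + int(fields[12])  # fields 14 and 15 of the file


def sample(pid):
    """Take one timestamped sample for a process; needs Linux and read access."""
    proc = Path("/proc") / str(pid)
    row = {"timestamp": time.time()}
    row.update(parse_status((proc / "status").read_text()))
    row.update(parse_io((proc / "io").read_text()))
    row["cpu_ticks"] = parse_cpu_ticks((proc / "stat").read_text())
    return row


def monitor(pid, interval=1.0):
    """Sample once per interval until the process disappears."""
    rows = []
    while (Path("/proc") / str(pid)).exists():
        try:
            rows.append(sample(pid))
        except OSError:
            break  # process exited between the check and the read
        time.sleep(interval)
    return rows
```

&lt;p>The CPU value here is a cumulative tick counter, so a usage percentage would have to be derived from the difference between consecutive samples.&lt;/p>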
&lt;h2 id="results">Results&lt;/h2>
&lt;p>We have also calculated the correlation of each feature with the elapsed time. For BWA-mem, the features with an absolute correlation above 0.5 are input size, average CPU usage, and output file size (in SAM format). For Samtool-view, they are input size, average CPU usage, and output size (in BAM format).
For Picard-SortSam, they are input size, write bytes, and BAM output size. For Picard-MarkDuplicates, they are input size and BAM output size.&lt;/p>
&lt;table>
&lt;tr>
&lt;th>Features\Tools&lt;/th>
&lt;th>BWA-mem&lt;/th>
&lt;th>Samtool-view&lt;/th>
&lt;th>Picard-SortSam&lt;/th>
&lt;th>Picard-MarkDuplicates&lt;/th>
&lt;/tr>
&lt;tr>
&lt;th>Allocated CPU&lt;/th>
&lt;td>-0.145&lt;/td>
&lt;td>-0.095&lt;/td>
&lt;td>-0.179&lt;/td>
&lt;td>-0.156&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Allocated physical memory&lt;/th>
&lt;td>-0.010&lt;/td>
&lt;td>-0.038&lt;/td>
&lt;td>-0.069&lt;/td>
&lt;td>0.132&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Input size&lt;/th>
&lt;td>&lt;b>0.583&lt;/b>&lt;/td>
&lt;td>&lt;b>0.651&lt;/b>&lt;/td>
&lt;td>&lt;b>0.937&lt;/b>&lt;/td>
&lt;td>&lt;b>0.922&lt;/b>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Threads&lt;/th>
&lt;td>-0.072&lt;/td>
&lt;td>-&lt;/td>
&lt;td>-&lt;/td>
&lt;td>-&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Average CPU&lt;/th>
&lt;td>&lt;b>-0.607&lt;/b>&lt;/td>
&lt;td>&lt;b>-0.567&lt;/b>&lt;/td>
&lt;td>-0.479&lt;/td>
&lt;td>-0.480&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Peak CPU&lt;/th>
&lt;td>-0.175&lt;/td>
&lt;td>0.174&lt;/td>
&lt;td>-0.170&lt;/td>
&lt;td>0.046&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Average RSS&lt;/th>
&lt;td>0.040&lt;/td>
&lt;td>0.034&lt;/td>
&lt;td>0.131&lt;/td>
&lt;td>0.182&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Peak RSS&lt;/th>
&lt;td>0.068&lt;/td>
&lt;td>0.046&lt;/td>
&lt;td>0.314&lt;/td>
&lt;td>0.175&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Average VSZ&lt;/th>
&lt;td>0.032&lt;/td>
&lt;td>-0.349&lt;/td>
&lt;td>-0.127&lt;/td>
&lt;td>0.090&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Peak VSZ&lt;/th>
&lt;td>0.048&lt;/td>
&lt;td>0.074&lt;/td>
&lt;td>-0.130&lt;/td>
&lt;td>0.088&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Write bytes&lt;/th>
&lt;td>0.037&lt;/td>
&lt;td>0.190&lt;/td>
&lt;td>&lt;b>0.735&lt;/b>&lt;/td>
&lt;td>0.244&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Read bytes&lt;/th>
&lt;td>-0.031&lt;/td>
&lt;td>0.109&lt;/td>
&lt;td>0.070&lt;/td>
&lt;td>0.110&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Output SAM size&lt;/th>
&lt;td>&lt;b>0.589&lt;/b>&lt;/td>
&lt;td>-&lt;/td>
&lt;td>-&lt;/td>
&lt;td>-&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Output BAM size&lt;/th>
&lt;td>-&lt;/td>
&lt;td>&lt;b>0.763&lt;/b>&lt;/td>
&lt;td>&lt;b>0.934&lt;/b>&lt;/td>
&lt;td>&lt;b>0.923&lt;/b>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;th>Output BAI size&lt;/th>
&lt;td>-&lt;/td>
&lt;td>-&lt;/td>
&lt;td>0.400&lt;/td>
&lt;td>0.399&lt;/td>
&lt;/tr>
&lt;/table>
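&lt;p>The selection of features by correlation can be sketched as below. The sketch assumes the Pearson coefficient, which the table does not state explicitly, and uses made-up numbers rather than our measurements; &lt;code>pearson&lt;/code> and &lt;code>strong_features&lt;/code> are hypothetical helper names.&lt;/p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strong_features(features, elapsed, threshold=0.5):
    """Return {name: r} for features whose |r| with elapsed time exceeds threshold."""
    scores = {name: pearson(values, elapsed) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) > threshold}


# Made-up numbers purely for illustration; these are not the measured data.
elapsed = [10.0, 20.0, 30.0, 40.0, 50.0]
features = {
    "input_size": [1.0, 2.0, 3.0, 4.0, 5.0],        # grows with elapsed time
    "average_cpu": [90.0, 80.0, 70.0, 60.0, 50.0],  # shrinks as time grows
    "peak_vsz": [3.0, 1.0, 4.0, 1.0, 3.0],          # mostly unrelated
}
print(sorted(strong_features(features, elapsed)))
```

&lt;p>Running the same function on subsets of the data, for example one subset per input file, would give per-group correlations.&lt;/p>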
&lt;h2 id="future-works">Future Works&lt;/h2>
&lt;p>For further work, we will analyze the correlation between elapsed time and the features whose scores are below an absolute value of 0.5. These features may in fact be correlated with the elapsed time but appear uncorrelated because the correlation was computed over the whole dataset. We therefore also need to calculate the feature correlations for the data grouped by input file. After that, we will build a machine learning model to predict elapsed time.&lt;/p></description></item><item><title>[FLASHNET]: Leveraging ML-augmented I/O in Linux</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230802-justin08784/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230802-justin08784/</guid><description>&lt;p>Hello everyone,&lt;/p>
&lt;p>This is my second blog post for SoR 2023. As you may recall from my &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230530-justin08784/">initial blogpost&lt;/a>, I am working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet/">Flashnet&lt;/a> project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>.&lt;/p>
&lt;p>I&amp;rsquo;ve been assigned two major tasks under Flashnet:&lt;/p>
&lt;ol>
&lt;li>Perform post-training quantization (PTQ) on existing Flashnet models&lt;/li>
&lt;li>Implement a rocksDB client (to interface with the Flashnet kernel) with 3-way replication&lt;/li>
&lt;/ol>
&lt;h2 id="task-1-perform-post-training-quantization-ptq-on-existing-flashnet-models">Task 1: Perform post-training quantization (PTQ) on existing Flashnet models&lt;/h2>
&lt;p>Since all of our models are currently built using the keras API, I decided to use the tensorflow-lite library, which supports direct conversion. Unfortunately, I encountered several persistent bugs while attempting to apply full-integer quantization on our binary neural network model:&lt;/p>
&lt;h3 id="shapedimension-distortion">Shape/dimension distortion:&lt;/h3>
&lt;p>Bug description: The quantized tflite model produces outputs of shape (8, 1), the same as the input shape, when the original model produces single-value outputs of shape (1, 1).&lt;/p>
&lt;p>Status: Resolved&lt;/p>
&lt;ul>
&lt;li>The original model has an input dimension of 8 for each input/x-value and there could be several inputs grouped in a single batch.&lt;/li>
&lt;li>Input/batch size is also determined implicitly in the normalization layer of the original model&lt;/li>
&lt;li>However, the &amp;ldquo;interpreter&amp;rdquo; in the quantized model runs inference one input at a time, so the batch size needs to be explicitly set to &amp;ldquo;1&amp;rdquo;, i.e. the shape of a single input, (1, 8)&lt;/li>
&lt;li>Doing so resolves the model distortion&lt;/li>
&lt;/ul>
&lt;h3 id="incorrect-y-value-range">Incorrect y-value range:&lt;/h3>
&lt;p>Bug description: There is no variation in the quantized model outputs (i.e. it spits out the same value for each input row)&lt;/p>
&lt;p>In the original model, each inference output is a floating point value between 0 and 1. Outputs also vary according to input. This output is rounded towards 0 or 1 using a 0.5 standard cutoff (i.e. x &amp;gt; 0.5 → x = 1). Since the quantized model condenses 32-bit floats into 8-bit integers, we should expect a similar variation in output values across an 8-bit integer range.&lt;/p>
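&lt;p>To make that expectation concrete, here is a small sketch of the usual affine mapping between real values and 8-bit integers, real = (q - zero_point) * scale. The scale and zero point below are assumptions chosen so that the [0, 1) output range covers the int8 range; a converted model reports its own values, which may differ.&lt;/p>

```python
# Assumed int8 output parameters, chosen so that [0, 1) maps onto [-128, 127].
SCALE = 1.0 / 256.0
ZERO_POINT = -128


def quantize(x, scale=SCALE, zero_point=ZERO_POINT):
    """Map a real value to int8 using q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range


def dequantize(q, scale=SCALE, zero_point=ZERO_POINT):
    """Recover the approximate real value: x = (q - zero_point) * scale."""
    return (q - zero_point) * scale


def predict_label(q):
    """Apply the 0.5 cutoff in the quantized domain."""
    return 1 if q > quantize(0.5) else 0


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    q = quantize(x)
    print(x, q, dequantize(q), predict_label(q))
```

&lt;p>Under these assumed parameters, different float outputs land on different integers, which is the variation the quantized model should show but currently does not.&lt;/p>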
&lt;p>Printing the quantized model weights, I discovered that weight burst/exploding gradients may be occurring during the quantization process, i.e. the values of the weights explode to infinity or vanish to 0 and are therefore unable to deliver any meaningful value. The likely consequence is that the inference output always equals the bias matrix (since the Wx term in y = Wx + B gets zeroed out).&lt;/p>
&lt;p>Status: Open&lt;/p>
&lt;ul>
&lt;li>Multiple potential causes were considered, without any success:
&lt;ul>
&lt;li>Improper quantization of inputs/outputs&lt;/li>
&lt;li>Insufficient training time/number of epochs&lt;/li>
&lt;li>Incompatible model type/structure&lt;/li>
&lt;li>Incompatible tensorflow-lite version&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>At this point, I concluded that tensorflow-lite is too bug-ridden for further attempts with the library to be worthwhile.&lt;/li>
&lt;/ul>
&lt;h2 id="task-2-implement-a-rocksdb-client-to-interface-with-the-flashnet-kernel-with-3-way-replication">Task 2: Implement a rocksDB client (to interface with the Flashnet kernel) with 3-way replication&lt;/h2>
&lt;p>rocksdb is an embedded database for key-value data. Our Flashnet team is currently implementing a Flashnet client in ceph, so they have tasked me with exploring an implementation in rocksdb as an alternative.&lt;/p>
&lt;p>I&amp;rsquo;ve started on this segment of the project only recently, so my current work is still in its formative stages. As of writing, I&amp;rsquo;ve been primarily concerned with setup of software (on a new chameleon instance), running toy db examples, and educating myself on basic terminology/rocksdb documentation.&lt;/p>
&lt;h2 id="future-work">Future work&lt;/h2>
&lt;p>I expect to continue working on Task 1 (do quantization from ground-up or use a different library) and Task 2 as detailed above. I also hope to implement a transformer-based model to supplement our existing suite of Flashnet models.&lt;/p></description></item><item><title>[Midterm] FlashNet: Towards Reproducible Continual Learning for Storage System</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/</guid><description>&lt;h2 id="mid-term-report">Mid-Term Report&lt;/h2>
&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet">FlashNet&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1EhJm3kqrpybOkpXiiRMfqVxGeKe9iIsh/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a> and &lt;strong>Daniar Kurniawan&lt;/strong>, aims to implement and optimize the FlashNet model in real-world storage systems using continual learning techniques. We focus on predicting I/O latency to decide whether or not an I/O should be failed over to another SSD. The following sections describe the work, the major milestones and accomplishments, and the challenges during the first half of the summer.&lt;/p>
&lt;h2 id="work-description-major-milestones-achieved-and-accomplishments">Work Description, Major Milestones Achieved, and Accomplishments&lt;/h2>
&lt;p>For the first half of the summer, I implemented a continual learning pipeline for the model and several drift detection algorithms. After that, I evaluated their effectiveness. Below is a detailed description of each subtask.&lt;/p>
&lt;h3 id="1-continual-learning-pipeline">1. Continual Learning pipeline&lt;/h3>
&lt;p>First, I designed the pipeline. As shown in the flowchart below, the pipeline contains 4 main modules: initial train, retrain, inference, and monitor.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pipeline Flowchart" srcset="
/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_2067e763ad30087275106bc5b2921a5a.webp 400w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_fcd6d4a25c164fcfc872329662c36fa5.webp 760w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/cl-pipeline_hubf4c27ce042fa200bb9ef46ed6f9b5dd_194399_2067e763ad30087275106bc5b2921a5a.webp"
width="760"
height="249"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The modules were first developed in Python using a linear regression model. It turned out that linear regression was not good enough and gave poor accuracy. To overcome this problem, I introduced more models and learning tasks.&lt;/p>
&lt;p>Hence, in the final implementation, we have random forest and neural network models for both regression and classification tasks. These models outperform linear regression. The pipeline has also been optimized.&lt;/p>
&lt;h3 id="2-drift-detection-algorithms">2. Drift detection algorithms&lt;/h3>
&lt;p>The model&amp;rsquo;s performance may degrade when recent I/Os have different characteristics from the data it was trained on, so the model needs to be retrained, and something has to trigger that retraining. The trigger can be as simple as a fixed period, or it can use a technique called drift detection. Retraining too often incurs a large computational overhead, while retraining too seldom lets performance degrade. Hence, we should build a good and reliable drift detection algorithm that can sense the presence of concept and covariate drift in recent data.&lt;/p>
&lt;p>To build a good algorithm, I first used heuristics derived from an understanding of how latency and throughput change over time. However, the results were not very good, so I have been relying on statistical tests as the drift detector instead. So far, the Kolmogorov-Smirnov test (commonly known as the KS test) is the best drift detector.&lt;/p>
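&lt;p>A two-sample KS test compares the empirical distribution of the data the model was trained on with that of a recent window. The sketch below is a plain-Python illustration, not the code used in the pipeline: it assumes a single numeric feature (latency), a significance level of about 0.05 via the standard large-sample critical value, and synthetic data. The function names are hypothetical.&lt;/p>

```python
import math
from bisect import bisect_right


def ks_statistic(reference, recent):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    rec = sorted(recent)
    gap = 0.0
    for x in ref + rec:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_rec = bisect_right(rec, x) / len(rec)
        gap = max(gap, abs(cdf_ref - cdf_rec))
    return gap


def drift_detected(reference, recent, c_alpha=1.358):
    """Flag drift when the statistic exceeds the large-sample critical value.

    c_alpha = 1.358 corresponds to a significance level of about 0.05.
    """
    n, m = len(reference), len(recent)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical


# Synthetic latencies (microseconds), for illustration only.
training_window = [100 + (i % 20) for i in range(200)]
similar_window = [100 + ((i * 7) % 20) for i in range(200)]
shifted_window = [150 + (i % 20) for i in range(200)]

print(drift_detected(training_window, similar_window))  # False
print(drift_detected(training_window, shifted_window))  # True
```

&lt;p>In a retraining loop, a positive result from a check like this would be the signal to trigger the retrain module.&lt;/p>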
&lt;h3 id="3-evaluation">3. Evaluation&lt;/h3>
&lt;p>The featured image in the headline of this blog, also shown below, is the result of the evaluation. I evaluated the models and drift detection algorithms using Cumulative Distribution Function (CDF) graph, to see if any tail cut is made.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Evaluation" srcset="
/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_4866e846612d96725d801519edf06392.webp 400w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_9203cd36fc4c6de03e02a799cd564f1d.webp 760w,
/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230807-rannnayy/featured_hua13ad1b86612ea35a1f0d083114566fc_25432_4866e846612d96725d801519edf06392.webp"
width="760"
height="396"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>During the implementation, I encountered several challenges, as follows.&lt;/p>
&lt;h3 id="1-choice-of-model">1. Choice of Model&lt;/h3>
&lt;p>Since we want to integrate the pipeline into real storage systems, we had to be mindful of the model choice. Classical machine learning models are lighter than deep learning models, but deep learning models offer higher accuracy and are therefore preferable on that front. Hence, I implemented both and examined their effectiveness.&lt;/p>
&lt;h3 id="2-choice-of-drift-detection-algorithm">2. Choice of Drift Detection Algorithm&lt;/h3>
&lt;p>The continual learning technique chosen for this task may require the model to be retrained, since the workload may change over time. The implication is that we need a condition that triggers the retraining. As training a model is costly, we need to retrain judiciously. Thus, we use a drift detection algorithm to detect whether or not retraining is needed.&lt;/p>
&lt;p>There are two types of drift detection algorithms: statistical tests and model-based drift detection. To minimize overhead, we picked statistical tests. Various algorithms are available; I picked 5 of them to implement and evaluate.&lt;/p>
&lt;h2 id="plan">Plan&lt;/h2>
&lt;p>For the second half of the summer, I am going to study Riak and create Chameleon Trovi artifact for deploying Riak in a cluster.&lt;/p></description></item><item><title>Introducing Levels of Reproduction and Replication in Machine Learning</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230802-msaeed/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230802-msaeed/</guid><description>&lt;p>Hello again,&lt;/p>
&lt;p>I am Mohamed Saeed and this is my second blog post for the 2023 Summer of Reproducibility Fellowship. As you may recall from my &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230601-msaeed">previous post&lt;/a>, I am working on the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education&lt;/a> project with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> as my mentor. My goal is to create interactive open educational resources that teach reproducibility and reproducible research in machine learning (ML) as I &lt;a href="https://drive.google.com/file/d/13HnCMZawpabiLdBoOiaJFF2mNXIPLCVJ/view?usp=sharing" target="_blank" rel="noopener">proposed&lt;/a>.&lt;/p>
&lt;p>In this post, I will share with you some of the progress I have made so far, as well as some of the challenges I have faced and how I overcame them. I will also highlight some of the specific accomplishments that I am proud of and what I plan to do next.&lt;/p>
&lt;h2 id="reproducing-on-warm-starting-neural-network-training">Reproducing &amp;ldquo;On Warm Starting Neural Network Training&amp;rdquo;&lt;/h2>
&lt;p>This material is a reproduction of the paper &lt;a href="https://arxiv.org/abs/1910.08475" target="_blank" rel="noopener">&amp;ldquo;On Warm Starting Neural Network Training&amp;rdquo;&lt;/a> by Jordan T. Ash and Ryan P. Adams (2020). This paper investigates the effect of warm-starting neural networks, which means using the weights of previous models trained on a subset of the data, to train on a new dataset that has more data.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/report/osre23/nyu/eduml/20230802-msaeed/warm_start_huf40f540ab6672b609385b58179d23d2a_3423296_0c5af6e4428dce728fe7a643b2b8e6d3.webp 400w,
/report/osre23/nyu/eduml/20230802-msaeed/warm_start_huf40f540ab6672b609385b58179d23d2a_3423296_f3e332c8b81d6d3146e54527a273bbfe.webp 760w,
/report/osre23/nyu/eduml/20230802-msaeed/warm_start_huf40f540ab6672b609385b58179d23d2a_3423296_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230802-msaeed/warm_start_huf40f540ab6672b609385b58179d23d2a_3423296_0c5af6e4428dce728fe7a643b2b8e6d3.webp"
width="760"
height="383"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
The figure illustrates how the new model uses the weights from the previous model as its initial values. This allows the new model to train on both the “Original” data, which it has already seen, and the new data, which it has not encountered before. In contrast, the randomly initialized model treats the entire data as unfamiliar and starts from scratch.&lt;/p>
&lt;p>The paper also shows that this method can lead to lower test accuracy than starting from scratch with random weights, even though the training loss is similar. The paper also proposes a simple way to improve the test accuracy of warm-starting by adding some noise to the previous weights.&lt;/p>
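&lt;p>The paper calls this trick &amp;ldquo;shrink and perturb&amp;rdquo;: the previous weights are scaled toward zero and a small amount of Gaussian noise is added before training continues. The sketch below shows the idea on a plain list of numbers; the shrink factor and noise level are arbitrary assumptions for illustration, and the function name is hypothetical.&lt;/p>

```python
import random


def shrink_and_perturb(weights, shrink=0.5, noise_std=0.01, seed=None):
    """Scale previous weights toward zero and add Gaussian noise to each one."""
    rng = random.Random(seed)
    return [shrink * w + rng.gauss(0.0, noise_std) for w in weights]


previous_weights = [0.8, -1.2, 0.05, 2.0]  # stand-in for a trained layer
warm_start = shrink_and_perturb(previous_weights, seed=0)
print(warm_start)
```

&lt;p>In a real network the same update would be applied to every parameter tensor before training on the enlarged dataset.&lt;/p>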
&lt;p>To reproduce this paper, I followed a systematic approach that ensured reliable results. This approach involved:&lt;/p>
&lt;ul>
&lt;li>Reading the paper and its main claims carefully.&lt;/li>
&lt;li>Finding out what resources the authors shared, such as code, data, and models.&lt;/li>
&lt;li>Looking for additional materials online that could help me save time and fill in the gaps left by the authors.&lt;/li>
&lt;li>Setting up the environment and dependencies needed to run the code smoothly.&lt;/li>
&lt;li>Writing code and updating any outdated functions that might cause errors.&lt;/li>
&lt;li>Running the code and verifying that it matched the results reported in the paper.&lt;/li>
&lt;li>Analyzing and interpreting the results and comparing them with the paper’s findings.&lt;/li>
&lt;/ul>
&lt;p>I used &lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon&lt;/a> as my platform for running and documenting my reproduction experiments. Chameleon is a large-scale, reconfigurable experimental platform that supports computer science systems research. It allows users to create and share Jupyter notebooks that can run Python code on Chameleon’s cloud servers.&lt;/p>
&lt;p>I created a &lt;a href="https://github.com/mohammed183/re_warm_start_nn" target="_blank" rel="noopener">GitHub repository&lt;/a> where you can find everything related to my reproduction work, in the form of interactive Jupyter notebooks that will help you learn more about machine learning and the reproducibility of machine learning research.&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>Reproducing a paper is not an easy task. I faced several challenges along the way. One of the biggest challenges was the lack of code and pretrained models from the authors. This is a common problem for many reproducibility projects. Fortunately, I found a previous reproducibility publication for this paper in the &lt;a href="https://rescience.github.io/bibliography/Kireev_2021.html" target="_blank" rel="noopener">ReScience journal&lt;/a>. I used some of their code and added some new functions and modifications to match the original paper’s descriptions. I encountered other challenges as well; I discuss them in the notebooks, along with the solutions I applied.&lt;/p>
&lt;h2 id="how-to-use-this-material">How to use this material?&lt;/h2>
&lt;p>This material is a series of notebooks that walk you through the paper and its claims, experiments, and results. You will learn how to analyze, explain, and validate the authors’ claims. To get started, I suggest you skim the &lt;a href="https://arxiv.org/abs/1910.08475" target="_blank" rel="noopener">original paper&lt;/a> briefly to get the main idea and see what information is publicly available. This will help you understand how the authors could have been clearer and more transparent in some sections. The notebooks give clear instructions and explanations, and describe how I dealt with the missing components. You can use this material for self-learning or as an assignment by hiding the final explanation notebook.&lt;/p>
&lt;h2 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h2>
&lt;p>In this blog post, I have shared with you some of my work on reproducing warm starting neural network training. I have learned a lot from this experience and gained a deeper understanding of reproducibility and reproducible research principles in ML.&lt;/p>
&lt;p>I am very happy with what I have achieved so far, but I still have more work to do. I am working on reproducing the &lt;a href="https://arxiv.org/abs/2010.11929" target="_blank" rel="noopener">Vision Transformer: An Image is Worth 16x16 Words&lt;/a> paper by Alexey Dosovitskiy et al. This time my approach is to use the available pretrained models provided by the authors to verify the claims made in the paper. However, there are some challenges that I face in reproducing the paper. For example, some of the datasets and code that the authors used are not publicly available, which makes it hard to replicate their experiments exactly. These challenges are common in reproducing research papers, especially in computer vision. Therefore, it is important to learn how to deal with them and find ways to validate some of the claims.&lt;/p>
&lt;p>I hope you enjoyed reading this blog post and found it informative and interesting. If you have any questions or feedback, please feel free to contact me. Thank you for your attention and stay tuned for more updates!&lt;/p></description></item><item><title>Midterm Blog Measuring Open-source Database Systems under TPC-C Benchmark with Unreported Settings</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/osu/missingsettings/20230802-ren.450/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/osu/missingsettings/20230802-ren.450/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/osu/missingsettings">Measuring Research Prototypes under Unreported Settings&lt;/a> my &lt;a href="https://drive.google.com/file/d/1ouFre-qMDCL_LiH5jFNUCOI1yAYHdWcS/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/miao-yu/">Miao YU&lt;/a> aims to understand the impact of missing settings in artifact evaluation.&lt;/p>
&lt;p>Based on our project proposal, the first step is to test the benchmark application on the targeted systems. We picked the open-source database system PostgreSQL as the target system and ran the TPC-C benchmark on it with every database server setting left at its default value. We measured throughput with the scale factor set to 10 while incrementing the number of worker terminals. We take these results as the baseline. To test more parameters and system settings, we need to choose a combination of parameters that yields optimal throughput.&lt;/p>
&lt;p>We use the online tool &lt;a href="https://pgtune.leopard.in.ua/#/" target="_blank" rel="noopener">PGTune&lt;/a>, which suggests PostgreSQL configuration values based on the hardware. We selected shared_buffers, min_wal_size/max_wal_size and effective_cache_size as the first set of parameters to measure. They relate to memory consumption, checkpoints and planner cost in the database server. According to the PostgreSQL &lt;a href="https://www.postgresql.org/docs/current/runtime-config.html" target="_blank" rel="noopener">official documentation&lt;/a>, shared_buffers sets the amount of memory the database server uses for shared memory buffers. max_wal_size sets the maximum size to let the WAL grow during automatic checkpoints. Larger settings for shared_buffers usually require a corresponding increase in max_wal_size, in order to spread out the process of writing large quantities of new or changed data over a longer period of time. effective_cache_size sets the planner&amp;rsquo;s assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely that index scans will be used, while a lower value makes it more likely that sequential scans will be used.&lt;/p>
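&lt;p>To make the parameters above concrete, the following Python sketch renders them as postgresql.conf lines for a given amount of RAM. The fractions used (a quarter of RAM for shared_buffers, three quarters for effective_cache_size) are common rules of thumb and the WAL sizes are placeholders; all of them are assumptions for illustration, not the values PGTune produced for our machine.&lt;/p>

```python
def postgres_settings(total_ram_gb, max_wal_gb=4, min_wal_gb=1):
    """Build a dict of the four parameters discussed above (sizes in GB).

    The RAM fractions are rule-of-thumb assumptions, not PGTune's output.
    """
    return {
        "shared_buffers": f"{max(1, total_ram_gb // 4)}GB",
        "effective_cache_size": f"{max(1, total_ram_gb * 3 // 4)}GB",
        "min_wal_size": f"{min_wal_gb}GB",
        "max_wal_size": f"{max_wal_gb}GB",
    }


def to_conf(settings):
    """Render the settings as postgresql.conf lines (name = value)."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


# Hypothetical server with 16 GB of RAM.
conf_text = to_conf(postgres_settings(total_ram_gb=16))
print(conf_text)
```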
&lt;p>We conducted the experiments by increasing each parameter in increments and comparing the resulting throughput across settings and against the baseline. Based on the results, the throughput of the benchmark with larger shared_buffers and max_wal_size is up to 1.5x the throughput under default settings. Tuning max_wal_size yields a larger improvement than tuning shared_buffers. Increasing effective_cache_size beyond the system default has no effect for this benchmark workload.&lt;/p>
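&lt;p>The sweep described above can be sketched in a few lines of Python: enumerate every combination of candidate values, benchmark each one, and report its throughput as a multiple of the baseline. The candidate values and the throughput numbers below are placeholders chosen for illustration, not the increments or measurements from our experiments.&lt;/p>

```python
from itertools import product

# Hypothetical candidate values; the exact increments are not listed here.
SHARED_BUFFERS = ["128MB", "1GB", "4GB"]
MAX_WAL_SIZE = ["1GB", "4GB", "16GB"]


def sweep_configs():
    """Yield one settings dict per combination of the candidate values."""
    for sb, wal in product(SHARED_BUFFERS, MAX_WAL_SIZE):
        yield {"shared_buffers": sb, "max_wal_size": wal}


def relative_throughput(measured_tps, baseline_tps):
    """Express a measured throughput as a multiple of the baseline."""
    if not baseline_tps > 0:
        raise ValueError("baseline throughput must be positive")
    return measured_tps / baseline_tps


configs = list(sweep_configs())
print(len(configs), "configurations to benchmark")
# Placeholder numbers, not measurements from this project.
print(relative_throughput(measured_tps=150.0, baseline_tps=100.0))
```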
&lt;p>There are more values of the above-mentioned parameters to test. Next, I will test those parameters with incremental values. Furthermore, we need to choose a combination of more parameters to get optimal throughput. Also, according to its description, the tuning tool may not generate optimal values for systems with very large memory. This requires us to test more parameters and values for better performance.&lt;/p></description></item><item><title>ScaleBugs: Reproducible Scalability Bugs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230802-boluwarinayinmode/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230802-boluwarinayinmode/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>As part of the ScaleBugs project, we have worked on building a dataset of reproducible scalability bugs. To achieve this, we go through existing bug reports for popular distributed systems, which include Cassandra, HDFS, Ignite, and Kafka. Workloads are designed to reproduce these scalability bugs by triggering some functionalities of the system under different configurations (e.g., different numbers of nodes), for which we then observe the impact on performance.&lt;/p>
&lt;p>So far we have worked on packaging, inside Docker containers, the buggy and fixed versions of each system, a runtime environment that ensures reproducibility, and the workloads used to trigger the symptoms of the bug. By packaging these together, we simplify deployment and testing. This enables us to switch between versions efficiently, which helps in identifying and comparing the bug&amp;rsquo;s behavior. For each system, we have carefully built a runtime environment that is consistent and reproducible. This approach ensures that each time we run tests or investigations, the conditions remain identical.&lt;/p>
&lt;h2 id="new-terms">New Terms&lt;/h2>
&lt;p>In order to make sense of the various bug reports, we had to learn some of the terminology associated with scalable systems:&lt;/p>
&lt;p>&lt;strong>Clusters&lt;/strong>: Clusters are groups of related or connected items, often found in various fields such as computer science, data analysis, or even social sciences. For example, in data analysis, clusters might represent groups of data points with similar characteristics, making it easier to understand patterns or trends in the data.&lt;/p>
&lt;p>&lt;strong>Cluster Membership&lt;/strong>: Cluster membership refers to the process of determining which items or entities belong to a particular cluster. This task can be done based on various criteria, such as similarity in attributes, spatial proximity, or shared characteristics.&lt;/p>
&lt;p>&lt;strong>Locks&lt;/strong>: In computer programming, locks are mechanisms used to manage access to shared resources, such as files, data structures, or hardware devices. When multiple processes or threads need to access a shared resource simultaneously, locks ensure that only one process or thread can access it at a time, preventing data corruption or conflicts.&lt;/p>
&lt;p>&lt;strong>Lock Contentions&lt;/strong>: Lock contention occurs when multiple processes or threads attempt to acquire the same lock simultaneously. When this happens, one process or thread must wait until the lock becomes available, leading to potential delays and reduced performance.&lt;/p>
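&lt;p>A small Python sketch of the two terms above: four threads update one shared counter, and a single lock serializes the updates. While one thread holds the lock the others wait, which is exactly the contention described. The counter, the thread count and the iteration count are arbitrary illustrative choices.&lt;/p>

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter; the lock serializes each update."""
    global counter
    for _ in range(times):
        with counter_lock:  # other threads wait here while the lock is held
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

&lt;p>The more work a thread does while holding the lock, the longer every other thread waits, which is why the bugs below focus on what runs inside the locked region.&lt;/p>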
&lt;p>&lt;strong>Critical Paths&lt;/strong>: In project management or process analysis, a critical path is the longest chain of dependent tasks that determines the overall duration of the project or process. Any delay in tasks along the critical path will directly impact the project&amp;rsquo;s completion time.&lt;/p>
&lt;p>&lt;strong>Tokens&lt;/strong>: Tokens can have various meanings depending on the context. In computer programming, tokens are the smallest units of source code recognized by a compiler or interpreter. In cryptography, tokens can represent digital certificates or authentication data used for secure communication. In Cassandra, which several of the bugs below involve, a token is a position on the hash ring that determines which range of data a node owns.&lt;/p>
&lt;p>&lt;strong>Nodes&lt;/strong>: In the context of network theory or graph theory, nodes are individual points or entities that form a network or graph. In a computer network, nodes can be devices like computers or routers, and in a social network, nodes can represent individuals or entities.&lt;/p>
&lt;p>&lt;strong>Peers&lt;/strong>: Peers are entities within a network that have the same status or capabilities. In peer-to-peer networks, each node can act as both a client and a server, enabling direct communication between nodes without relying on a central server.&lt;/p>
&lt;p>&lt;strong>Gossipers, Gossip Protocol&lt;/strong>: In distributed systems, gossipers are nodes that share information with each other using the gossip protocol. The gossip protocol involves randomly selecting peers and exchanging information in a decentralized manner, allowing information to spread quickly across the network.&lt;/p>
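&lt;p>To show how quickly gossip spreads, here is a toy Python simulation of a push-style gossip round: every node that already knows a rumor tells a few randomly chosen peers. It is a deliberately simplified model with assumed parameters (fanout, seed), not the gossip implementation of Cassandra or any other system.&lt;/p>

```python
import random


def gossip_rounds(num_nodes, fanout=1, seed=0):
    """Count rounds until a rumor that starts at node 0 reaches every node.

    Each round, every informed node picks `fanout` random peers and tells them.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while num_nodes > len(informed):
        for node in list(informed):
            peers = [p for p in range(num_nodes) if p != node]
            informed.update(rng.sample(peers, min(fanout, len(peers))))
        rounds += 1
    return rounds


print(gossip_rounds(num_nodes=64))
```

&lt;p>Because the informed set can at most double each round with a fanout of 1, 64 nodes need at least 6 rounds; random collisions usually add a few more.&lt;/p>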
&lt;p>&lt;strong>Threads&lt;/strong>: Threads are the smallest units of execution within a process in computer programming. Multiple threads can run concurrently within a single process, enabling multitasking and parallel processing. Threads can share the same resources within the process, making them more lightweight than separate processes. However, proper synchronization is essential to prevent data corruption or conflicts when multiple threads access shared resources.&lt;/p>
&lt;p>&lt;strong>Flush and Writes Contention&lt;/strong>: This refers to a situation where simultaneous operations involving data flushing (saving data to a storage medium) and data writing (updating or adding data) are causing conflicts or delays. This contention can arise when multiple processes or threads attempt to perform these operations concurrently, leading to performance bottlenecks or potential data integrity issues.&lt;/p>
&lt;h2 id="accomplishments">Accomplishments&lt;/h2>
&lt;p>We have been able to build docker containers for the following scalability bugs:&lt;/p>
&lt;p>&lt;strong>IGNITE 12087&lt;/strong>&lt;/p>
&lt;p>This bug stems from the fix for the IGNITE-5227 issue (another bug), which led to a significant decline in the performance of a particular operation. Before IGNITE-5227 was addressed, inserting 30,000 entries completed in roughly 1 second. After the fix, the same insertion of 30,000 entries took approximately 130 seconds, a slowdown of roughly 130 times.&lt;/p>
&lt;p>&lt;strong>CASSANDRA 14660&lt;/strong>&lt;/p>
&lt;p>This bug is related to how clusters work together and how a lock is causing conflicts with the critical path. The issue arises from a method call that uses O(Peers * Tokens) resources while contending for a lock, which is causing problems in the write path. The lock is used to protect cached tokens that are essential for determining the correct replicas. The lock is implemented as a synchronized block in the TokenMetadata class.&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>It was fixed by reducing the complexity of the operation to O(Peers), taking advantage of some properties of the token list and the data structure.&lt;/p>
&lt;p>&lt;strong>CASSANDRA 12281&lt;/strong>&lt;/p>
&lt;p>This bug is also related to how clusters work together and to lock contention. The issue arises when a specific method does O(Tokens^2) work while contending for a read lock. As reported, a cluster with around 300 nodes has around 300 * 256 = 76,800 tokens (assuming the default number of tokens per node), and a new member reportedly takes more than 30 minutes to join. This happens because the long execution time under the lock delays every gossip message, so the node never becomes active.&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>The lock was made finer-grained: the expensive function calls no longer take the problematic read lock and instead use a synchronized block that synchronizes on a specific field, which does the job much better.&lt;/p>
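&lt;p>The idea behind this kind of fix can be sketched in Python, although the real code is Java. In the toy class below, the expensive computation is guarded by its own lock on one field (the cache), and the shared ring lock is held only long enough to copy the data. The class, its fields and the sorting workload are hypothetical stand-ins, not Cassandra's TokenMetadata.&lt;/p>

```python
import threading


class TokenRing:
    """Toy stand-in for a token ring; not Cassandra's TokenMetadata."""

    def __init__(self, tokens):
        self._ring_lock = threading.Lock()    # guards the token list itself
        self._cache_lock = threading.Lock()   # guards only the cached result
        self._tokens = list(tokens)
        self._sorted_cache = None

    def add_token(self, token):
        with self._ring_lock:
            self._tokens.append(token)
        with self._cache_lock:
            self._sorted_cache = None         # invalidate the cache

    def sorted_tokens(self):
        """Expensive call: holds the ring lock only long enough to copy."""
        with self._cache_lock:
            if self._sorted_cache is None:
                with self._ring_lock:
                    snapshot = list(self._tokens)
                self._sorted_cache = sorted(snapshot)  # ring lock is free here
            return list(self._sorted_cache)


ring = TokenRing([30, 10, 20])
ring.add_token(5)
print(ring.sorted_tokens())
```

&lt;p>Threads that only need the ring lock, such as those handling gossip messages in the bug above, are no longer stuck behind the expensive call.&lt;/p>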
&lt;p>&lt;strong>HA16850&lt;/strong>&lt;/p>
&lt;p>This is a bug related to obtaining thread information in the JvmMetrics package. The original buggy version used MXBeans to obtain thread information. That call uses an underlying native implementation that holds a lock on threads, preventing thread creation or termination. This means that the more threads we have to obtain information for, the longer the function call holds the lock. As a result, the execution time scales linearly with the number of active threads, O(threads).&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>Developers utilized a ThreadGroup to keep track of obtaining metrics for threads. The result is that there is no lock held for every thread.&lt;/p>
&lt;p>&lt;strong>CA13923&lt;/strong>&lt;/p>
&lt;p>This issue revolves around conflicts between the &amp;ldquo;flush&amp;rdquo; and &amp;ldquo;writes&amp;rdquo; processes. The main problem is that during the &amp;ldquo;flush&amp;rdquo; process, a resource-intensive function called &amp;ldquo;getAddressRanges&amp;rdquo; is invoked. This function has a high computational cost and its complexity is O(Tokens^2). In other words, the time it takes to complete this function grows quickly as the number of &amp;ldquo;tokens&amp;rdquo; increases. This situation is causing challenges and delays in the overall process.&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>This function call affected many paths, so the developers made sure that getAddressRanges is no longer called in any critical path.&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>&lt;strong>Demanding Memory Requirements&lt;/strong>: Running certain builds consumes a significant amount of memory. This places a strain on system resources and can impact the overall performance and stability of the process.&lt;/p>
&lt;p>&lt;strong>Little Issues Impacting Execution&lt;/strong>: Often, seemingly minor details can obstruct the successful execution of a build. Resolving such issues requires thorough investigation and extensive research into similar problems faced by others in the past.&lt;/p>
&lt;p>&lt;strong>Complexities of Scalability Bugs&lt;/strong>: Identifying the underlying causes of scalability-related bugs is intricate. These bugs exhibit unique characteristics that can complicate the process of pinpointing and comprehending their root origins.&lt;/p>
&lt;h2 id="what-is-docker--for-those-who-dont-know-about-it-">What is Docker? (For those who don&amp;rsquo;t know about it)&lt;/h2>
&lt;p>Docker is a platform that facilitates the containerization of applications, leading to consistent and efficient deployment across diverse environments. Its benefits include portability, resource efficiency, isolation, and rapid development cycles. DockerHub complements Docker by providing a centralized hub for sharing and accessing container images, fostering collaboration and ease of use within the Docker ecosystem.&lt;/p>
&lt;p>More about docker &lt;a href="https://docs.docker.com/get-started/overview/" target="_blank" rel="noopener">https://docs.docker.com/get-started/overview/&lt;/a>&lt;/p></description></item><item><title>Mid-term blog post for Teaching Computer Networks with Reproducible Research: Developing a 'classroom competition' for adaptive video delivery</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230801-srishti-j18/</link><pubDate>Tue, 01 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230801-srishti-j18/</guid><description>&lt;p>Hello!&lt;/p>
&lt;p>I am Srishti Jaiswal and this is my second blog post for the 2023 Summer of Reproducibility Fellowship.&lt;/p>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>As I reach the halfway mark of my internship journey, I have had the incredible opportunity to work on a project that revolves around reproducing an adaptive video research result using cloud-based experimentation. This blog post delves into my exciting work so far, the significant milestones achieved, specific accomplishments to celebrate, and the challenges overcome. Utilizing CloudLab and FABRIC, I embarked on a journey to reproduce essential figures from the research paper &lt;a href="https://dl.acm.org/doi/10.1145/2491172.2491179" target="_blank" rel="noopener">Downton Abbey Without the Hiccups: Buffer-Based Rate Adaptation for HTTP Video Streaming&lt;/a>, ensure Python2 and Python3 compatibility and incorporate an Estimated Download Rate column in the log file produced by the video client. Let&amp;rsquo;s explore the details of this captivating internship experience.&lt;/p>
&lt;h2 id="major-milestones-reached">Major Milestones Reached&lt;/h2>
&lt;p>Here are the milestones we have reached so far:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Familiarity with the CloudLab and FABRIC testbeds: I learned how to run an adaptive video experiment, which is the jumping-off point for my project, on the CloudLab and FABRIC platforms.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Python2 and Python3 Compatibility: My first task was to port an existing open-source code base developed for Python2 (which is no longer supported) so that it can run in Python3.
The code now runs successfully in both versions for all the policies in the existing open-source code, i.e. Basic, Netflix and Sara.
This fixed &lt;a href="https://github.com/Srishti-j18/AStream/issues/1" target="_blank" rel="noopener">issue #1&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Estimated Download Rate for Basic Policy: To make it easier for users to understand and visualize how the adaptive video policy works, I added an additional metric, “Estimated Download Rate”, to the output file produced by the adaptive video client.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Graphing Buffer Occupancy and Estimated Download Rate: I extended the existing experiment to show two additional visualizations that are important for understanding how the adaptive video client works: buffer occupancy vs time and estimated download rate vs time.&lt;/p>
&lt;/li>
&lt;/ol>
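&lt;p>As an illustration of the Estimated Download Rate metric from the milestones above, here is one plausible way to compute it in Python: the average throughput over the last few downloaded segments. The function name, the window of three segments and the sample log are assumptions made for this sketch; the calculation in the actual video client may differ.&lt;/p>

```python
def estimated_download_rate(segment_sizes_bytes, download_times_s, window=3):
    """Average throughput in bits per second over the last `window` segments."""
    if len(segment_sizes_bytes) != len(download_times_s):
        raise ValueError("need one download time per segment")
    sizes = segment_sizes_bytes[-window:]
    times = download_times_s[-window:]
    total_time = sum(times)
    if not total_time > 0:
        return 0.0
    return 8.0 * sum(sizes) / total_time


# Hypothetical log: three 1 MB segments that took 2 s, 1 s and 1 s to fetch.
rate = estimated_download_rate([1000000, 1000000, 1000000], [2.0, 1.0, 1.0])
print(rate)
```

&lt;p>Logging this value next to the buffer occupancy for every segment is enough to draw both of the time-series plots mentioned in the last milestone.&lt;/p>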
&lt;h2 id="overcoming-challenges">Overcoming Challenges&lt;/h2>
&lt;p>I encountered several challenges throughout this project, especially as it was my first time working independently on a research paper as a third-year engineering student. However, with my mentor&amp;rsquo;s guidance and support, I persevered and learned to tackle each obstacle with determination.&lt;/p>
&lt;p>One significant challenge was porting the entire code from Python2 to Python3. This transition resulted in numerous errors, and I often found it challenging to pinpoint where the mistakes occurred. To overcome this, I adopted a step-by-step approach, fixing errors one by one and verifying them using Python2 for comparison.&lt;/p>
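&lt;p>Two patterns that typically come up in this kind of port are shown in the Python sketch below: importing a module that was renamed between versions, and decoding bytes received from the network into text. The helper names and the example URL are hypothetical and only illustrate the technique; they are not taken from the ported code base.&lt;/p>

```python
try:                                   # Python 3 location
    from urllib.parse import urlparse
except ImportError:                    # Python 2 fallback
    from urlparse import urlparse


def segment_name(url):
    """Return the last path component of a segment URL."""
    return urlparse(url).path.rsplit("/", 1)[-1]


def to_text(data):
    """Decode bytes from the network; leave text untouched."""
    if isinstance(data, bytes):
        return data.decode("utf-8")
    return data


print(segment_name("http://example.com/media/seg_0001.m4s"))
print(to_text(b"bitrate=4500"))
```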
&lt;p>Understanding the complex codebase was another hurdle that led to moments of feeling stuck in an infinite loop. But every time I faced such situations, I sought my mentor&amp;rsquo;s advice, and together, we made strategic changes to overcome these challenges.&lt;/p>
&lt;p>I am immensely grateful for my mentor&amp;rsquo;s expertise and support throughout this internship. Her guidance played a crucial role in helping me navigate through the challenges and grow both professionally and personally. I eagerly look forward to the rest of the journey, knowing that I can continue making meaningful contributions to this research project with her inspiring mentorship.&lt;/p>
&lt;h2 id="future-prospects">Future Prospects&lt;/h2>
&lt;p>As the second half of my internship approaches, I am eager to further refine and expand our experimentation.
Our main aim is to reproduce the existing work and provide a clear guide for other students to do the same. For this, I have to create a framework that helps them improve and build upon this work.&lt;/p>
&lt;p>I hope you enjoyed reading this blog post. If you have any questions or feedback, please feel free to contact me. Thank you for your attention and stay tuned for more updates!&lt;/p></description></item><item><title>Reproducible Analysis &amp; Models for Predicting Genomics Workflow Execution Time (Midterm Blog Post)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230801-shayantan/</link><pubDate>Tue, 01 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230801-shayantan/</guid><description>&lt;p>We are currently midway into the OSRE 2023 program and the following post lists the progress that I have made on the project so far.
As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uga/genomicswfmodels/">Reproducible Analysis &amp;amp; Models for Predicting Genomics Workflow Execution Time&lt;/a> project, our overall goal was to quantify the effect of sequence data quality on execution times. Toward that end, we decided to first identify suitable datasets from the two commonly available -omics data modalities: transcriptomics and genomics. Albrecht et al. [1] developed &lt;em>&lt;strong>seqQscorer&lt;/strong>&lt;/em> to automate the quality control step of NGS data analysis through predictive modeling. They have also published the list of ENCODE datasets used for training the models. The quality label is 0 for released files and 1 for revoked files. Based on the guidelines set forth by ENCODE&amp;rsquo;s Data Coordination Centre (DCC), scientists manually annotated the data comprehensively, and the resulting quality variable &amp;ldquo;status&amp;rdquo; was published to serve as an indication of the quality of the data. The following steps outline the process of generating the data table for building the machine learning models.&lt;/p>
&lt;ul>
&lt;li>Step 1: Programmatically accessed 86 (34 released; 34 revoked) RNA-seq files from the ENCODE database. All the FASTQ files were single-end.&lt;/li>
&lt;li>Step 2: Programmatically accessed 288 (144 released; 144 revoked) DNA-seq files from the ENCODE database. All the FASTQ files were paired-end.&lt;/li>
&lt;li>Step 3: Ran the STAR aligner for RNA-seq and the BWA aligner for DNA-seq. The resulting outputs contained the alignment times for both the &amp;ldquo;revoked&amp;rdquo; and &amp;ldquo;released&amp;rdquo; files.&lt;/li>
&lt;li>Step 4: Ran statistical tests to determine whether there are any significant differences in the runtimes of the two types of files.&lt;/li>
&lt;/ul>
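&lt;p>Step 4 can be sketched with a simple two-sided permutation test on the difference of mean runtimes, written in plain Python. This is only one possible choice of test and is an assumption for illustration; the alignment times below are placeholder values, not measurements from this project.&lt;/p>

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of mean runtimes."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Placeholder alignment times in seconds, not measurements from this project.
released = [120.0, 131.0, 118.0, 125.0, 129.0]
revoked = [122.0, 127.0, 119.0, 133.0, 124.0]
p_value = permutation_test(released, revoked, n_permutations=2000)
print(p_value)
```

&lt;p>A small p-value would suggest that released and revoked files differ in runtime by more than chance alone would explain.&lt;/p>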
&lt;p>Currently I am running the FastQC tool to extract data quality metrics for the same set of files discussed above. Once I have collected those metrics, I can start building regression models to determine whether data quality has any significant impact on execution time. The first step in a typical genomic analysis workflow is quality control of the raw data, a crucial step in removing low-quality data instances that may significantly affect the downstream analysis. Through our analysis we aim to develop a reproducible ML model that gives the user an estimate of the runtime based on the raw FASTQ file as input.&lt;/p>
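&lt;p>The simplest form such a regression model could take is a straight line fitted by ordinary least squares, sketched below in plain Python. The single quality metric and the runtimes are placeholder values used only to show the mechanics; the real models will use the collected quality metrics and may be more elaborate.&lt;/p>

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or 2 > n:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    return slope, mean_y - slope * mean_x


# Placeholder data: a per-file quality metric and its alignment time (s).
quality = [20.0, 25.0, 30.0, 35.0]
runtime = [200.0, 180.0, 160.0, 140.0]
slope, intercept = fit_line(quality, runtime)
print(slope, intercept)
```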
&lt;h2 id="references">References&lt;/h2>
&lt;p>[1] Albrecht, S., Sprang, M., Andrade-Navarro, M.A. &lt;em>et al.&lt;/em> seqQscorer: automated quality control of next-generation sequencing data using machine learning. &lt;em>Genome Biol&lt;/em> &lt;strong>22&lt;/strong>, 75 (2021). &lt;a href="https://doi.org/10.1186/s13059-021-02294-2" target="_blank" rel="noopener">https://doi.org/10.1186/s13059-021-02294-2&lt;/a>&lt;/p></description></item><item><title>[Mid-term] Capturing provenance into Data Science/Machine Learning workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230731-jesselima/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230731-jesselima/</guid><description>&lt;p>This post describes our midterm work status and some achievements we have done so far in &lt;a href="https://docs.google.com/document/d/1YMtPjZXcgt5eplyxIgQE8IBpQIiRlB9eqVSQiIPhXNU/edit#heading=h.nnxl1g16trg0" target="_blank" rel="noopener">the project&lt;/a> for the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/noworkflow/">noWorkflow&lt;/a> package.&lt;/p>
&lt;h4 id="the-initial-weeks">The initial weeks&lt;/h4>
&lt;p>I started doing a bibliographical review on reproducibility in the Data Science (DS) and Machine Learning (ML) realms. It was a new subject to me, and I aimed to build a more robust theoretical background in the field. Meanwhile, I took notes in &lt;a href="https://jaglima.github.io/" target="_blank" rel="noopener">this series of posts&lt;/a>.&lt;/p>
&lt;p>Then, as planned, I integrated with the current noWorkflow supporters in order to get a broader view of the project and their contributions. Additionally, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/juliana-freire/">Juliana Freire&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joao-felipe-pimentel/">João Felipe Pimentel&lt;/a>, and I set up a weekly one-hour meeting to keep track of my activities.&lt;/p>
&lt;h3 id="brainstormed-opportunities">Brainstormed opportunities&lt;/h3>
&lt;p>At the beginning of June, we also met with other project supporters to brainstorm about our initial proposal. From this meeting, we came up with a plan for how to technically approach a new noWorkflow feature for experiment management in Data Science and Machine Learning.&lt;/p>
&lt;p>In this brainstorm, we agreed that &lt;em>Jupyter Notebooks are, by far, the most common setup in DS/ML computational experiments. They have established themselves as the fundamental artifact by embedding code and text and by enabling execution and visualization. Entire experiments are created and kept in Jupyter notebooks until they are sent to production. The opportunity at hand is to integrate noWorkflow with Jupyter Notebooks&lt;/em>.
Then, our mid-term goal was adapted from the original plan of only selecting and executing a prototypical ML experiment. We added the goal of paving the way for providing a tagging feature for Notebook cells.&lt;/p>
&lt;p>More specifically, DS/ML experimental workflows usually have well-defined stages composed of &lt;em>data reading&lt;/em>, &lt;em>feature engineering&lt;/em>, &lt;em>model scoring&lt;/em>, and &lt;em>metrics evaluation&lt;/em>. In our dream space, the user would tag a cell in their experiment, enabling the capture of the tagged metadata into a database. This step serves the ultimate goal of facilitating comparisons, management, and even causal inference across different trials of a DS/ML experiment.&lt;/p>
&lt;h3 id="current-deliverables">Current deliverables&lt;/h3>
&lt;p>So, based on our plans, we created a separate table to store the metadata from cell tagging. This table stores the cell hash codes and the information needed to match the code executed within a cell. As a result, we can store tags and the activation ids of the cells, enabling us to identify the cell that contains a given stage of a DS/ML experiment.&lt;/p>
&lt;p>The second feature implemented was tagging a specific variable. Just as for a cell, it is now possible to stamp a given variable with a tag, keeping its name, id, and received value in this separate table.&lt;/p>
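&lt;p>To give a feel for what such a tag table could hold, here is a self-contained Python sketch using an in-memory SQLite database. The table name, the columns and the helper functions are hypothetical and chosen only for illustration; they are not noWorkflow's actual schema or API.&lt;/p>

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tag (
           id INTEGER PRIMARY KEY,
           kind TEXT NOT NULL,          -- 'cell' or 'variable'
           name TEXT NOT NULL,          -- stage name or variable name
           code_hash TEXT,              -- hash of the tagged cell's source
           activation_id INTEGER,       -- which execution produced it
           value TEXT                   -- captured value, for variables
       )"""
)


def tag_cell(stage, source, activation_id):
    """Record a tagged cell, identified by a hash of its source code."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO tag (kind, name, code_hash, activation_id) VALUES (?, ?, ?, ?)",
        ("cell", stage, digest, activation_id),
    )


def tag_variable(name, value, activation_id):
    """Record a tagged variable together with the value it received."""
    conn.execute(
        "INSERT INTO tag (kind, name, activation_id, value) VALUES (?, ?, ?, ?)",
        ("variable", name, activation_id, repr(value)),
    )


tag_cell("feature_engineering", "X = scale(raw)", activation_id=12)
tag_variable("learning_rate", 0.01, activation_id=13)
rows = conn.execute("SELECT kind, name, value FROM tag ORDER BY id").fetchall()
print(rows)
```

&lt;p>Keeping the tags in their own table means two trials can later be compared with a simple query on the tag names.&lt;/p>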
&lt;p>Finally, we worked on displaying the dependencies of a given variable. In this case, by tagging a given variable, we can display the other variables, values, and cells activated in its construction. Then, we can visualize the dependencies that contributed to its final value.&lt;/p>
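&lt;p>Conceptually, displaying the dependencies of a tagged variable is a graph traversal. The Python sketch below walks a small, hypothetical provenance graph and collects everything a variable depends on, directly or indirectly. The graph format and the function are assumptions for illustration, not noWorkflow's internal representation.&lt;/p>

```python
def dependencies(graph, target):
    """Return every variable that `target` depends on, directly or not.

    graph maps a variable name to the names used to compute it.
    """
    seen = []
    stack = list(graph.get(target, []))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.append(name)
            stack.extend(graph.get(name, []))
    return sorted(seen)


# Hypothetical provenance captured from a small notebook.
graph = {
    "score": ["model", "X_test"],
    "model": ["X_train", "learning_rate"],
    "X_train": ["raw"],
    "X_test": ["raw"],
}
print(dependencies(graph, "score"))
```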
&lt;p>For an overview of current developments, please refer to my &lt;a href="https://github.com/jaglima/noworkflow/tree/stage_tagging" target="_blank" rel="noopener">fork of the main project&lt;/a>.&lt;/p>
&lt;h3 id="challenges">Challenges&lt;/h3>
&lt;p>During this period, we had to make choices along the way. For instance, capturing the provenance of cells through tags is a different problem from tagging code chunks in scripts, so we decided to stick with tagging Notebook cells for now. We also opted to start by storing the metadata to enable comparisons between trials, rather than focusing on a sophisticated, user-friendly graphical cell tagging system, and to keep this metadata in a separate table in the database.&lt;/p>
&lt;h3 id="next-steps">Next steps&lt;/h3>
&lt;p>In the second half of the summer, our goal is to integrate these features in order to proceed with comparisons among experiments. Such comparisons would use the tagged variables as the hyperparameters of DS/ML experiments or as key variables to assess the experiments, such as errors or scores. As a result, we will be able to compare the results of two trials more accurately and in an easily reproducible way.&lt;/p></description></item><item><title>Improving Video Applications' Accuracy by Enabling The Use of Concierge</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230731-zharfanf/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230731-zharfanf/</guid><description>&lt;style>
p {
text-align: justify;
}
img {
display: block;
margin-left: auto;
margin-right: auto;
}
&lt;/style>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello, it&amp;rsquo;s me again, Faishal, a SoR project contributor for the edgebench project. For the past two months, my mentors and I have been working on improving the performance of our system. In this report, I would like to share what we have been working on.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Edgebench is a project that focuses on how to efficiently distribute resources (bandwidth and CPU usage) across several video applications. Today&amp;rsquo;s video applications process their video on a server, an approach known as edge computing. Over a WAN, bandwidth and compute are the greatest concerns because they are strictly limited.&lt;/p>
&lt;p>Consider the following case: suppose we have 3 video applications running, located in several areas across a city, and the total bandwidth allocated to them is fixed. Naively, we could divide the bandwidth evenly among all cameras in the system. The allocated bandwidth over time would then look like the following graph.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/baseline_alloc.png" alt="Baseline Allocation" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The allocations are fixed and won’t change. However, every video application has its own characteristics that determine how well it performs, as measured by its f1-score. Our task is to maintain a high average f1-score, so we need a new, accuracy-oriented solution. This is where the accuracy gradient&lt;a href="#acc">[1]&lt;/a> comes in.&lt;/p>
&lt;h2 id="system-design">System Design&lt;/h2>
&lt;p>In our current design, we need a resource allocator, which we call the concierge. The concierge determines how much bandwidth each video application (vap) in the system needs, and it performs this allocation at a predetermined time interval. This process is called profiling. During profiling, the concierge first asks every vap to calculate its f1-score on a given video segment when its bandwidth is increased by &lt;code>profile_delta&lt;/code>. The difference between this f1-score and the default f1-score is called &lt;code>f1_diff_high&lt;/code>. The concierge then asks the vap to reduce its bandwidth by &lt;code>profile_delta&lt;/code> and repeat the same process; this result is called &lt;code>f1_diff_low&lt;/code>. Both results are sent to the concierge for the next step. The concierge then computes the sensitivity, where sensitivity is&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://latex.codecogs.com/svg.image?&amp;amp;space;sensitivity[i]=f1%5c_diff%5c_high[i]-%5cSigma_%7bk=1%7d%5enf1%5c_diff%5c_low[k];k%5cneq&amp;amp;space;i&amp;amp;space;" alt="sensitivity[i] = f1_diff_high[i] - \Sigma_{k=1}^nf1_diff_low[k]; k \neq i" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>This equation tells us which video application will give the best f1-score improvement if we add bandwidth to one vap while reducing the bandwidth of the others. Based on this, the concierge gives bandwidth to the app with the highest sensitivity and takes it from the app with the lowest sensitivity.&lt;/p>
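&lt;p>As a minimal sketch of this step, the sensitivity formula and the resulting reallocation can be written in a few lines of Python. The function names &lt;code>sensitivities&lt;/code> and &lt;code>reallocate&lt;/code>, the sample numbers, and the choice to move exactly one profiling delta per round are illustrative assumptions, not the project code. The sketch also assumes that &lt;code>f1_diff_low&lt;/code> holds the f1-score lost when bandwidth is reduced.&lt;/p>

```python
from typing import List


def sensitivities(f1_diff_high: List[float], f1_diff_low: List[float]) -> List[float]:
    """sensitivity[i] = f1_diff_high[i] - sum of f1_diff_low[k] over all k != i."""
    total_low = sum(f1_diff_low)
    # Subtracting the app's own f1_diff_low from the total leaves the sum over k != i.
    return [high - (total_low - low) for high, low in zip(f1_diff_high, f1_diff_low)]


def reallocate(bandwidth: List[int], sens: List[float], delta: int) -> List[int]:
    """Move delta kbps from the least sensitive app to the most sensitive one.

    Assumption: exactly one delta is moved per profiling round.
    """
    winner = max(range(len(sens)), key=sens.__getitem__)
    loser = min(range(len(sens)), key=sens.__getitem__)
    updated = list(bandwidth)
    if winner != loser:
        updated[winner] += delta
        updated[loser] -= delta
    return updated


# Hypothetical profiling results for three apps (not measured data).
sample_sens = sensitivities([0.30, 0.02, 0.05], [0.20, 0.01, 0.03])
new_allocation = reallocate([400, 400, 400], sample_sens, 80)
print(new_allocation)
```

&lt;p>With these sample values the first app gains 80 kbps and the second loses 80 kbps, so the total stays at 1200 kbps.&lt;/p>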
&lt;h2 id="results">Results&lt;/h2>
&lt;p>As mentioned above, our main objective is to improve accuracy. However, two factors are taken into account: the improvement and the overhead it incurs. We first chose 3 dds apps&lt;a href="#dds">[2]&lt;/a> that we consider our ideal case. The following graphs show the profile of our ideal case.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/ideal_case.png" alt="Ideal Case" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>We can see that two of them have high sensitivity, especially at lower bandwidth, and one of them has low sensitivity. This is a perfect scenario, since we can take bandwidth from one app and give it to the app that has the highest sensitivity at that iteration. We run the experiment under the following setup:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nv">DATASETS&lt;/span>&lt;span class="o">=(&lt;/span>&lt;span class="s2">&amp;#34;&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;uav-1&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;coldwater&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;roppongi&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">MAX_BW&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="m">1200&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">PROFILING_DELTA&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="m">80&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">MI&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="m">5&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This setup uses a total bandwidth of 1200 kbps, so the bandwidth is initially distributed evenly (400 kbps per application). The profiling delta (&lt;code>PROFILING_DELTA&lt;/code>) is 80 kbps and the profiling interval (&lt;code>MI&lt;/code>) is 5 seconds.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/merged_ideal.png" alt="Merged Ideal" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">&lt;strong>Mode&lt;/strong>&lt;/th>
&lt;th style="text-align:center">&lt;em>DDS&lt;/em> &lt;br> (&lt;span style="color:blue">&lt;em>uav-1&lt;/em>&lt;/span>)&lt;/th>
&lt;th style="text-align:center">&lt;em>DDS&lt;/em> &lt;br> (&lt;span style="color:orange">&lt;em>coldwater&lt;/em>&lt;/span>)&lt;/th>
&lt;th style="text-align:center">&lt;em>DDS&lt;/em> &lt;br> (&lt;span style="color:green">&lt;em>roppongi&lt;/em>&lt;/span>)&lt;/th>
&lt;th style="text-align:center">Average&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">Baseline&lt;/td>
&lt;td style="text-align:center">0.042&lt;/td>
&lt;td style="text-align:center">0.913&lt;/td>
&lt;td style="text-align:center">0.551&lt;/td>
&lt;td style="text-align:center">0.502&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;strong>Concierge&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.542&lt;/td>
&lt;td style="text-align:center">0.854&lt;/td>
&lt;td style="text-align:center">0.495&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.63&lt;/strong> (&lt;span style="color:green">&lt;em>+25.5%&lt;/em>&lt;/span>)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>From the result, we managed to improve the average f1-score by about &lt;strong>0.13&lt;/strong> (from 0.502 to 0.63), or &lt;strong>25.5%&lt;/strong>. This is a very good result. There are a total of 10 videos in our dataset. For the next experiment, we first generate 6 combinations of dds apps. Note that in each combination one video is uav-1, since we know that it has the highest sensitivity. We run the experiment with 4 bandwidth scenarios &lt;strong>(1200, 1500, 1800, 2100)&lt;/strong> in kbps.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/only_uav_merged.png" alt="Only Uav-1" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The left figure depicts the average improvement of the concierge. The improvement decreases as the total bandwidth increases. The reason is that at higher bandwidth the sensitivity tends to be closer to 0, so the concierge does not reallocate. Overall, this confirms our previous result that, with the help of uav-1, the concierge can improve the f1-score by up to 0.1. In the next experiment, we randomly pick 3 dds videos out of the 10 videos and repeat this 10 times. We would like to see how the concierge performs without the help of uav-1.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/random_merged.png" alt="Random Merged" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>From the result, we still obtain an improvement. However, the average improvement is lower than in the previous experiment. The reason for this is discussed later.&lt;/p>
&lt;h3 id="overhead-measurement">Overhead Measurement&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/overhead_1.png" alt="Overhead Measurement" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Each graph above corresponds to one total-bandwidth setting. The experiment shows that a lower MI leads to higher overhead, since profiling runs more often than with a higher MI. Across the 4 graphs, lowering the MI is a significant trade-off because the improvement itself is not large. The highest improvement is at &lt;strong>1200kbps&lt;/strong>. Hence, for higher bandwidth, there is no need to profile too often.&lt;/p>
&lt;h2 id="discussion">Discussion&lt;/h2>
&lt;p>There are some limitations of our current design. If we look at the box plot in figure 5 above, we can see that there are some combinations where the improvement is negative.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/recovery_failed.png" alt="Failed Recovery" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The figure above depicts the profiling process at segment 6, which determines the bandwidth used at segment 7. Here we can see that the f1-score at that bandwidth for (&lt;span style="color:blue">&lt;em>jakarta&lt;/em>&lt;/span>) drops significantly. Our current design cannot address this issue yet, since we only consider the current video segment. We need to take into account not only the current segment but also the previous and future segments.&lt;/p>
&lt;p>Regarding the overhead, we are aware that a 50% overhead is still too high. We might try a dynamic &lt;code>MI&lt;/code> or skip profiling for certain videos when it is not necessary.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Despite the limitations above, this report shows that the concierge is generally capable of improving the f1-score. Further updates will be presented in the final report.&lt;/p>
&lt;h2 id="references">References&lt;/h2>
&lt;p>&lt;a id="acc">[1]&lt;/a> &lt;a href="https://drive.google.com/file/d/1U_o0IwYcBNF98cb5K_h56Nl-bQJSAtMj/view?usp=sharing" target="_blank" rel="noopener">https://drive.google.com/file/d/1U_o0IwYcBNF98cb5K_h56Nl-bQJSAtMj/view?usp=sharing&lt;/a> &lt;br>
&lt;a id="dds">[2]&lt;/a> Kuntai Du, Ahsan Pervaiz, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, and Junchen Jiang. 2020. Server-driven video streaming for deep learning inference. In Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication. 557–570.&lt;/p></description></item><item><title>Mid-term blog post for Public Artifact Data and Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20230731-zjyhhhhh/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20230731-zjyhhhhh/</guid><description>&lt;p>Over the past few weeks, our platform development has been progressing steadily, and we are excited to share the milestones we have achieved so far. As planned in our &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20230617-zjyhhhhh">introductory blog&lt;/a>, we have successfully laid the groundwork for the platform with the guidance and support of our mentor.&lt;/p>
&lt;h2 id="milestones-and-accomplishments">Milestones and Accomplishments&lt;/h2>
&lt;p>Here are some of the key functionalities we have implemented so far:&lt;/p>
&lt;ol>
&lt;li>Modular Architecture: We successfully designed the platform with a modular architecture, separating the Graphical User Interface (GUI) and Command-Line Interface (CLI) functionalities. This modularity allows users to interact with the platform in their preferred way.&lt;/li>
&lt;li>Experiment and Bucket Creation: Users can now create experiments, buckets (for storing different implementations of experiments), and iterations using either the GUI or CLI.&lt;/li>
&lt;li>Real-time Backend Environment Monitoring: Through the command line interface, users have the capability to control the monitoring of backend environment data, allowing for real-time tracking and analysis of important metrics.&lt;/li>
&lt;li>Visualizing Environment Variables: Users can now visualize detected environment variables on the platform. Moreover, they can compare iterations within different buckets and gain more insights by observing the timeseries data, such as CPU usage, in a graphical format.&lt;/li>
&lt;/ol>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>In the early stages of designing our platform, we encountered significant challenges at the system design level. One of the most daunting obstacles we faced was devising an effective method to monitor backend environment variables. To tackle this obstacle, we engaged in extensive discussions and sought guidance from our mentor. After careful consideration, we decided to adopt a multi-process approach to monitor the backend environment variables effectively. Specifically, we devised a meticulous strategy of creating a separate process in the background for each specific metric we needed to monitor. By allocating a dedicated process to each metric, we ensured a streamlined and efficient monitoring process.&lt;/p>
&lt;p>Currently, we are facing a challenge related to monitoring metrics. Since different users have varying monitoring requirements, it is impractical for us to manually write monitoring solutions for each user. To address this issue, we are actively working on implementing a pluggable design that allows users to configure their own monitoring preferences.&lt;/p>
&lt;p>Our approach involves providing users with the flexibility to define their custom configuration files or write monitoring programs following our documented guidelines. This way, users can specify the specific metrics they wish to monitor and tailor the monitoring process to their individual needs.&lt;/p>
&lt;h2 id="try-it-out">Try it Out!&lt;/h2>
&lt;p>As mentioned earlier, we have completed the core functionalities of our platform, and we would love to have you try it out and provide us with valuable feedback. Here are the links to our repositories where you can explore and experiment with our platform:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://github.com/PublicExperimentDatabase/PublicExperimentGUI" target="_blank" rel="noopener">GUI Repository&lt;/a> and &lt;a href="https://github.com/PublicExperimentDatabase/PublicExperimentCLI" target="_blank" rel="noopener">CLI Repository&lt;/a>
&lt;ul>
&lt;li>In the README.md file of GUI repo, you will find detailed installation instructions to set up the Graphical User Interface (GUI). Follow the steps provided to get started with our platform.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://github.com/PublicExperimentDatabase/test-experiment" target="_blank" rel="noopener">Sample Repository&lt;/a>
&lt;ul>
&lt;li>In this repository, we have included scripts that allow you to run our program. Additionally, you can use these scripts as templates to monitor your own programs according to your specific requirements.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;p>We welcome you to take the platform for a test drive and feel free to raise any issues you encounter during the installation process. Your feedback is invaluable to us, as it helps us identify and address any potential installation challenges and improve the user experience.&lt;/p></description></item><item><title>Enhancing Drift Detection through Fine-Tuning Llama2</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/anl/perfdrift/20230730-kangrui/</link><pubDate>Sun, 30 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/anl/perfdrift/20230730-kangrui/</guid><description>&lt;p>Greetings everyone, I&amp;rsquo;m Kangrui. Over the past few weeks, we&amp;rsquo;ve dedicated our efforts and have consequently made significant progress in our drift detection methods. Now, I&amp;rsquo;m excited to present to you a detailed elaboration on how we prompted and fine-tuned Llama2 to efficiently carry out the drift detection task.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;h3 id="why-llm-in-drift-detection-method">Why LLM in drift detection method?&lt;/h3>
&lt;p>The use of large language models (LLMs) in drift detection methods presents several benefits that make them a prominent solution in this domain.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Rapid Development:&lt;/strong> LLMs are in the vanguard of technological advancement. This field is evolving rapidly with continuous enhancements in model architecture, training techniques, and data handling. With every new version, these models are showing an increasing capacity to understand and generate human-like text, pushing the limits of what is achievable in Natural Language Processing (NLP) and Artificial Intelligence (AI) as a whole.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Superior Performance:&lt;/strong> Traditional drift detection methodologies such as Page-Hinkley, EDDM, and HDDM have their merits and have found success in numerous scenarios. Even Deep Learning (DL) techniques, like training a predictive model based on error rates, have made significant strides in the field. However, when handling complex, high-dimensional, and real-time data, LLMs have demonstrated exceptional results. They are not only able to effectively predict and respond to drifts but also adapt to new trends more swiftly. Our experiments using LLMs like GPT-3.5-turbo have yielded impressive results, notably outperforming other methods.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="GPT-3.5-turbo Performance" srcset="
/report/osre23/anl/perfdrift/20230730-kangrui/gpt-3.5-performance_hudb1929583c62f83e6182026371c0950a_147441_986c57531b096aac2ea5604c7942efed.webp 400w,
/report/osre23/anl/perfdrift/20230730-kangrui/gpt-3.5-performance_hudb1929583c62f83e6182026371c0950a_147441_534b4ca0b9e767d820ed9b45d754db9f.webp 760w,
/report/osre23/anl/perfdrift/20230730-kangrui/gpt-3.5-performance_hudb1929583c62f83e6182026371c0950a_147441_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/anl/perfdrift/20230730-kangrui/gpt-3.5-performance_hudb1929583c62f83e6182026371c0950a_147441_986c57531b096aac2ea5604c7942efed.webp"
width="760"
height="303"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;em>Fig. 1: Concept drifts detected by GPT-3.5-turbo in Cori dataset&lt;/em>&lt;/p>
&lt;ol start="3">
&lt;li>&lt;strong>Flexibility:&lt;/strong> One of the major advantages of using LLMs is their flexibility in dealing with different types of input and output. In contrast to traditional methods, which are confined to single feature concept drift detection and can only process numerical values, LLMs can handle a range of input types including text, numbers, and more complex data structures. This capability allows them to detect multi-feature concept drifts, thereby broadening the scope and complexity of problems they can tackle. Moreover, the generation capability of LLMs can provide rich and detailed output, facilitating more comprehensive insights into the detected drifts.&lt;/li>
&lt;/ol>
&lt;h3 id="why-llama2-in-drift-detection-method">Why Llama2 in drift detection method?&lt;/h3>
&lt;p>Llama2 presents a series of advantages that make it an excellent choice for applying LLMs in drift detection. Here&amp;rsquo;s a breakdown of the key reasons:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Performance Guarantee:&lt;/strong> As a newly released model, Llama2 has undergone extensive development and testing, providing a reliable guarantee of performance. It represents the cutting edge in AI technology, having benefited from the latest research and advancements in language model design.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Accessibility Guarantee:&lt;/strong> One significant advantage of Llama2 is that it is open-source. It is readily accessible on HuggingFace, which also provides a range of mature tools to fine-tune and deploy the model.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Flexibility for Fine-Tuning:&lt;/strong> Llama2 comes in different sizes, such as 7B, 13B, and 70B parameters, which allows for flexibility in model selection based on the task&amp;rsquo;s requirements and computational resources.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="data">Data&lt;/h2>
&lt;h3 id="dataset">Dataset&lt;/h3>
&lt;p>In our study, we employed &lt;a href="https://github.com/alipsgh/data-streams" target="_blank" rel="noopener">Synthetic data streams&lt;/a> for the fine-tuning of Llama2. Synthetic data streams serve as an invaluable resource for controlled experiments in the domain of drift detection. These curated datasets encompass varied types of drifts, providing us with the capability to assess the efficacy of our detection algorithms under diverse scenarios.&lt;/p>
&lt;p>Here is a brief introduction to the synthetic datasets we used:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Sine1 &amp;amp; Sine2:&lt;/strong> These datasets induce abrupt concept drift within a two-dimensional feature space. The classification rule, a sine function, dictates the instance labels, which are flipped at every drift point.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Mixed:&lt;/strong> This dataset, characterized by its combination of numeric and boolean features, uses a composite classification rule. The abrupt concept drift is simulated via a periodic reversal of class labels.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Stagger:&lt;/strong> This categorical dataset incorporates abrupt concept drift by periodically altering the classification rules tied to the features.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Circles &amp;amp; LED:&lt;/strong> These datasets are designed to simulate gradual concept drift. In Circles, the classification of instances is determined by their spatial relation to specific circles. LED imitates a seven-segment digit display, introducing drift by interchanging the pertinent attributes.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>Typically, the synthetic datasets contain 100,000 or 1,000,000 instances. A concept drift occurs every 25,000 or 33,333 instances, and each drift is either abrupt (with a drifting period of 50 instances) or gradual (with a drifting period of 500 instances).&lt;/p>
&lt;h3 id="data-preprocessing-and-metrics">Data Preprocessing and Metrics&lt;/h3>
&lt;p>Given the token limit of Llama2 and the specific requirements of our project, we needed to transform the data into an appropriate format.&lt;/p>
&lt;p>As such, we processed each data stream into three sections: the &amp;lsquo;undrifted&amp;rsquo; period, the &amp;lsquo;drifting&amp;rsquo; period, and the &amp;lsquo;drifted&amp;rsquo; period. All instances in each section were randomly and independently drawn from the original data stream, summing up to a maximum of 100 instances. The number of instances for the undrifted and drifted periods ranged from 20 to 50, and for the drifting period, it ranged from 10 to 20.&lt;/p>
&lt;p>For instance, let&amp;rsquo;s consider a dataset containing 100,000 instances where the concept drift occurs every 25,000 instances, causing abrupt concept drift. To format a data point, we could draw 20 to 50 instances from the first 25,000 as the undrifted period. Then, we could draw 10 to 20 instances from the 25,001st to 25,050th instance as the drifting period. Finally, we would draw 10 to min(100 - num(undrifted period) - num(drifting period), 50) from the 25,051st to 50,050th instance as the drifted period. This newly formatted data stream would then be fed into Llama2.&lt;/p>
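&lt;p>The sampling in this example could be sketched as follows. This is a hedged illustration rather than the preprocessing code we actually ran: the function name &lt;code>sample_windows&lt;/code>, its parameters, and the use of 0-based indices are assumptions, and the count bounds are taken from the worked example above.&lt;/p>

```python
import random
from typing import Dict, List


def sample_windows(drift_start: int, drift_len: int, segment_len: int,
                   rng: random.Random, budget: int = 100) -> Dict[str, List[int]]:
    """Pick sorted 0-based stream indices for the three periods.

    drift_start: index of the first drifting instance (25000 in the example).
    drift_len:   length of the drifting period (50 for abrupt drift).
    segment_len: number of instances available after the drifting period.
    budget:      maximum total number of sampled instances.
    """
    n_before = rng.randint(20, 50)   # undrifted period
    n_during = rng.randint(10, 20)   # drifting period
    # Remaining budget is at least 30, so the lower bound of 10 is always valid.
    n_after = rng.randint(10, min(budget - n_before - n_during, 50))

    after_start = drift_start + drift_len
    return {
        "before": sorted(rng.sample(range(0, drift_start), n_before)),
        "during": sorted(rng.sample(range(drift_start, after_start), n_during)),
        "after": sorted(rng.sample(range(after_start, after_start + segment_len), n_after)),
    }


# Abrupt drift at the 25,001st instance, as in the example above.
windows = sample_windows(25000, 50, 25000, random.Random(0))
print({name: len(indices) for name, indices in windows.items()})
```

&lt;p>Seeding the generator keeps a formatted data point reproducible; the three index lists can then be concatenated in order to build the stream that is fed to the model.&lt;/p>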
&lt;p>We also included some additional information to assist Llama2&amp;rsquo;s inference process. A typical data point in our processed dataset includes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;before_period&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">31&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;transition_period&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">32&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">38&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;after_period&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">39&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">59&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;before_index&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">196&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">19963&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;transition_index&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">20002&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">20030&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;after_index&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">20310&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">39984&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;meta&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Dataset: MIXED&lt;/span>&lt;span class="se">\n\t&lt;/span>&lt;span class="s2">v&amp;#39;s type is nominal, range is (&amp;#39;False&amp;#39;, &amp;#39;True&amp;#39;)&lt;/span>&lt;span class="se">\n\t&lt;/span>&lt;span class="s2">w&amp;#39;s type is nominal, range is (&amp;#39;False&amp;#39;, &amp;#39;True&amp;#39;)&lt;/span>&lt;span class="se">\n\t&lt;/span>&lt;span class="s2">x&amp;#39;s type is numeric&lt;/span>&lt;span class="se">\n\t&lt;/span>&lt;span class="s2">y&amp;#39;s type is numeric&lt;/span>&lt;span class="se">\n\t&lt;/span>&lt;span class="s2">class&amp;#39;s type is nominal, range is (&amp;#39;p&amp;#39;, &amp;#39;n&amp;#39;)&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;data_stream&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>From this dictionary, the &amp;ldquo;meta&amp;rdquo; and &amp;ldquo;data_stream&amp;rdquo; entries are fed into Llama2. The &amp;ldquo;transition_period&amp;rdquo; serves as the criterion: if Llama2&amp;rsquo;s answer lies within the &amp;ldquo;transition_period&amp;rdquo;, we deem it correct.&lt;/p>
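&lt;p>The correctness criterion can be sketched as a small check. The helper names &lt;code>parse_index&lt;/code> and &lt;code>is_correct&lt;/code> are hypothetical, and two details are assumptions: that both ends of the &amp;ldquo;transition_period&amp;rdquo; are inclusive, and that the first integer in the model response is taken as its answer.&lt;/p>

```python
import re
from typing import Optional, Sequence


def parse_index(response: str) -> Optional[int]:
    """Return the first integer found in the model response, if any."""
    match = re.search(r"\d+", response)
    return int(match.group()) if match else None


def is_correct(response: str, transition_period: Sequence[int]) -> bool:
    """True when the predicted index lies within the transition period.

    Assumption: both bounds of transition_period are inclusive.
    """
    index = parse_index(response)
    if index is None:
        return False
    start, end = transition_period
    return end >= index >= start


# Using the transition period from the data point above.
print(is_correct("35", [32, 38]))
print(is_correct("The pattern shifts at index 40.", [32, 38]))
```

&lt;p>Treating an unparseable response as incorrect is a design choice of this sketch; a stricter evaluation could instead reject any response that is not a bare index.&lt;/p>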
&lt;h2 id="llama2">Llama2&lt;/h2>
&lt;h3 id="inference">Inference&lt;/h3>
&lt;p>We experimented with three variations of prompts during the inference phase.&lt;/p>
&lt;p>&lt;strong>Prompt Version 1:&lt;/strong>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">[INST] &amp;lt;&amp;lt;SYS&amp;gt;&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> You are a helpful, respectful, and honest assistant. Always provide the most helpful responses possible while ensuring safety. Ensure that your responses are socially unbiased, positive, and free from harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. If a question lacks coherence or sense, explain why instead of providing incorrect information. If you are uncertain about an answer, refrain from sharing false information.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;&amp;lt;/SYS&amp;gt;&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Your task is to identify the index in a given data stream where the relationship between the features and labels begins to change. The data stream is formatted as a list, with each element being a two-element list: the first represents the features (also a list), and the second is the label. If your answer is &amp;#39;x&amp;#39;, it indicates that the data pattern starts shifting at the xth data point in the stream.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Here&amp;#39;s an example of the data&amp;#39;s metadata: Dataset: SINE1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> x&amp;#39;s type is numeric
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> y&amp;#39;s type is numeric
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> class&amp;#39;s type is nominal, range is (&amp;#39;p&amp;#39;, &amp;#39;n&amp;#39;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> The given data stream is: [[[0.7, 0.07], &amp;#39;p&amp;#39;], [[0.45, 0.78], &amp;#39;n&amp;#39;], ..., [[0.64, 0.45], &amp;#39;n&amp;#39;]]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Your task is to respond with a single index. No additional information is required.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[/INST]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Prompt Version 2:&lt;/strong>&lt;/p>
&lt;p>The same as Prompt 1, but with a specific range for the index response:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">Please provide an index ranging from 0 to 96. No additional information is required.
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Prompt Version 3:&lt;/strong>&lt;/p>
&lt;p>This prompt uses an instruction-input-output design, which we adopted for fine-tuning:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">Below is an instruction paired with an input that provides further context. Write a response that appropriately completes the request.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">### Instruction:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Identify the index in a given data stream where the relationship between features and labels begins to change. The data stream is formatted as a list, each element being a two-element list: the first represents the features (also a list), and the second is the label. For instance, if the response is &amp;#39;x&amp;#39;, it means that the data pattern starts shifting at the xth data point in the stream. Only respond with an index, no further information is necessary.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">### Input:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Meta Data:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Dataset: SINE1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> x&amp;#39;s type is numeric
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> y&amp;#39;s type is numeric
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> class&amp;#39;s type is nominal, range is (&amp;#39;p&amp;#39;, &amp;#39;n&amp;#39;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Data stream:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[[[0.7, 0.07], &amp;#39;p&amp;#39;], [[0.45, 0.78], &amp;#39;n&amp;#39;], .., [[0.64, 0.45], &amp;#39;n&amp;#39;]]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">### Response:
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Despite minor differences between Prompt Version 1 and Version 2, both suggested by Meta, the results varied significantly, a topic we will delve into in the following section. Prompt Version 3, employing the instruction-input-output structure, was used during our fine-tuning process.&lt;/p>
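&lt;p>As a rough sketch of how Prompt Version 3 could be assembled from a record&amp;rsquo;s &amp;ldquo;meta&amp;rdquo; and &amp;ldquo;data_stream&amp;rdquo; entries, consider the following. The function name build_prompt_v3 is hypothetical and the exact whitespace of the real prompts may differ; only the template text is taken from the prompt shown above.&lt;/p>

```python
INSTRUCTION = (
    "Identify the index in a given data stream where the relationship between "
    "features and labels begins to change. The data stream is formatted as a "
    "list, each element being a two-element list: the first represents the "
    "features (also a list), and the second is the label. For instance, if the "
    "response is 'x', it means that the data pattern starts shifting at the xth "
    "data point in the stream. Only respond with an index, no further "
    "information is necessary."
)


def build_prompt_v3(meta, data_stream):
    """Fill the instruction-input-output template with one record."""
    return (
        "Below is an instruction paired with an input that provides further "
        "context. Write a response that appropriately completes the request.\n"
        "### Instruction:\n"
        f"{INSTRUCTION}\n\n"
        "### Input:\n"
        "Meta Data:\n"
        f"{meta}\n"
        "Data stream:\n"
        f"{data_stream}\n\n"
        "### Response:\n"
    )


# Example record, shortened to two data points for illustration.
meta = (
    "Dataset: SINE1\n\tx's type is numeric\n\ty's type is numeric\n"
    "\tclass's type is nominal, range is ('p', 'n')\n"
)
stream = [[[0.7, 0.07], "p"], [[0.45, 0.78], "n"]]
print(build_prompt_v3(meta, stream))
```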
&lt;h3 id="fine-tuning">Fine-Tuning&lt;/h3>
&lt;p>We utilized the tools provided by &lt;a href="https://github.com/facebookresearch/llama-recipes" target="_blank" rel="noopener">llama-recipes&lt;/a> to fine-tune Llama2. The key command used to initiate the fine-tuning process is illustrated below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">python llama_finetuning.py --use_peft &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --peft_method lora &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --quantization &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --model_name meta-llama/Llama-2-13b-chat-hf &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --output_dir ./fine_tuned_model/Llama-2-13b-chat-hf-test_finetune &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --dataset alpaca_dataset &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --batch_size_training &lt;span class="m">40&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --num_epochs &lt;span class="m">1&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A brief explanation of the parameters:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">--use_peft: This flag indicates the use of the Parameter-Efficient Fine-Tuning (PEFT) method. PEFT allows us to fine-tune the model more efficiently.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--peft_method lora: Here, we specify that the LoRA (Low-Rank Adaptation) method should be used for PEFT.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--quantization: The quantization flag reduces the memory footprint of the model by loading its weights at reduced precision.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--dataset alpaca_dataset: Specifies the dataset setting used for fine-tuning, in this case, the &amp;#39;alpaca_dataset&amp;#39; indicates the instruction-input-output structure for fine-tuning.
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="results">Results&lt;/h2>
&lt;p>The performance of various models and prompt versions is depicted in Fig. 2.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="All Performance" srcset="
/report/osre23/anl/perfdrift/20230730-kangrui/performance_plot_hu026976f577cb17db71cb82cd3675225d_101027_f4b54b1d163428a3bbdd2373c5e7d6c6.webp 400w,
/report/osre23/anl/perfdrift/20230730-kangrui/performance_plot_hu026976f577cb17db71cb82cd3675225d_101027_ba09d14d8674a9735bf9bb60ce301dae.webp 760w,
/report/osre23/anl/perfdrift/20230730-kangrui/performance_plot_hu026976f577cb17db71cb82cd3675225d_101027_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/anl/perfdrift/20230730-kangrui/performance_plot_hu026976f577cb17db71cb82cd3675225d_101027_f4b54b1d163428a3bbdd2373c5e7d6c6.webp"
width="760"
height="608"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;em>Fig. 2: Performance comparison of different models and prompt versions.&lt;/em>&lt;/p>
&lt;p>It is evident from the results that the design of the prompt has a significant impact on Llama2&amp;rsquo;s performance. Furthermore, due to computational resource constraints, we have only managed to fine-tune Llama2 on a portion of our dataset (approximately 1,000 instances). The entire training set consists of 19,000 instances, and the test set includes 5,000 instances. Despite these limitations, a performance increase is noticeable after fine-tuning.&lt;/p></description></item><item><title>GPU Emulator for Easy Reproducibility of DNN Training -- Interim Blog Post</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/</link><pubDate>Sun, 30 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;h4 id="motivation">Motivation&lt;/h4>
&lt;p>The growing popularity of Deep Neural Networks has resulted in a substantial increase in demand for Graphics Processing Units (GPUs). GPUs are crucial for conducting matrix computations in DNN training and inference. However, they are expensive to purchase for personal use, and the limited availability of GPU resources in public research clouds like Chameleon further exacerbates the issue. This scarcity of resources can cause delays in DNN-related research projects. Therefore, building an emulator can ameliorate the trouble of reserving GPUs, and the emulator can be modified to gather the profiles needed for optimization much quicker.&lt;/p>
&lt;h4 id="overture">Overture&lt;/h4>
&lt;p>The following sections introduce the completed tasks and the details of each. The contents are summarized briefly and present only the necessary information. We finished the following tasks:&lt;/p>
&lt;ul>
&lt;li>Literature Review&lt;/li>
&lt;li>Emulator implementation:
&lt;ul>
&lt;li>Time Profiling&lt;/li>
&lt;li>Pinned Memory&lt;/li>
&lt;li>Inter-GPUs Computation&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Reproducing Figures&lt;/li>
&lt;/ul>
&lt;p>I will introduce them and the importance of each one.&lt;/p>
&lt;h2 id="tasks--reason">Tasks + Reason&lt;/h2>
&lt;h4 id="literature-review">Literature Review&lt;/h4>
&lt;p>While waiting for the measurements, I started reading other GPU-related papers, especially those about GPU schedulers. We found that besides emulating computation and transfer time, we should also emulate the GPU memory profile in order to reproduce some other papers. Fortunately, it’s doable. In fact, without actually using a GPU, we can emulate many aspects of the GPU beyond its timing. I found several papers that are reproducible in theory, but they use TensorFlow while my current work targets PyTorch. Therefore, I need to keep looking for ones that use PyTorch.&lt;/p>
&lt;p>Afterwards, we reviewed more papers on GPU scheduling published between 2018 and 2023 to see whether we could reproduce their figures. We went over 150 papers, searching for ones with a PyTorch implementation and an accompanying GitHub repository. We managed to find about 15 papers built in PyTorch, 6 of which were published on GitHub.&lt;/p>
&lt;p>We found the paper “CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs” and its GitHub page. The paper has three badges of “Artifacts Available, Evaluated, and Reproduced.” The paper’s content is implemented in PyTorch which means we can probably emulate this paper’s result with the emulator we already have by adding more features. We have started testing out to see if we can set up a similar environment and reproduce the experiments in the paper. After checking out the reproducibility of the paper, we will try to reproduce it using our emulator, and we might add new features to our emulator during this process.&lt;/p>
&lt;p>Firstly, I tried to reproduce the figures in the paper “CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs”, but stopped after a considerable number of attempts because the README was incomplete and too hard to follow. I started from the paper’s GitHub repository. From reading the paper, I understood that GNN training differs from regular deep learning training because of its input irregularity, and that CoGNN’s algorithm helps schedule jobs onto machines more effectively. However, when I tried to install the software as described in their environment README in order to reproduce the figures, I ran into many dependency issues, and barely any of the required packages installed successfully. The README in the software module was also unclear about how to run the experiments, and following the experiment setup did not give me the expected results. After struggling to complete even one suggested experiment, we eventually decided to abandon this paper and move on to others, which reminded me again of the importance of reproducibility.&lt;/p>
&lt;p>Secondly, we found another paper, “Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent”. After reading the paper, we figured that its main focus was on allocating node resources (CPU, GPU) to jobs dispatched by the Kubernetes scheduler, so that there would be less GPU fragmentation and a higher resource utilization rate. The paper used a simulator to simulate a large number of nodes and run the jobs in simulation. I successfully ran the experiments demonstrated in the repo and even created a smaller sample so that we could get results faster, because their original experiment takes 1020 times which will take about a month. However, when we dug deeper into their paper, we soon realized that their emulator is not a “real” one. Although it is built on Kubernetes, the part they used to create the figures is a pure simulator, and therefore doesn’t fit our goal of emulating only the GPU-related parts while running the other parts on a real system.&lt;/p>
&lt;h5 id="reason">Reason:&lt;/h5>
&lt;p>The purpose is to figure out which papers can be reproduced using the emulator, and what other features are needed for the emulator to work.&lt;/p>
&lt;h4 id="emulator-implementation">Emulator implementation&lt;/h4>
&lt;h5 id="time-profiling">Time Profiling&lt;/h5>
&lt;p>I did the performance profiling of different GPUs, which included CPU-to-GPU data transfer time and GPU computation time. These two elements will always be rather constant on GPUs so they can be easily emulated by profiling first and then utilized in the emulation. We did it for 6 different GPUs including k80, rtx6000, m40, a100pcie, v100, and p100.&lt;/p>
&lt;p>After collecting the performance profiles of a few types of GPU nodes, I implemented the first naive version of the emulator. I used the recorded profile and the sleep() function to represent the amount of time each step takes. Because the time also varies with the command given, some simple arithmetic was implemented too. Although the emulator runs on a CPU node, it still yields the time profile of a GPU, just as a real GPU node would.&lt;/p>
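&lt;p>The idea of replaying a recorded profile with sleep() can be sketched as below. This is a simplified illustration, not the emulator&amp;rsquo;s actual code: the profile values are placeholders rather than measurements, the names PROFILE, emulated_step_times and emulate_step are hypothetical, and scaling linearly with batch size is an assumption standing in for the &amp;ldquo;simple arithmetic&amp;rdquo; mentioned above.&lt;/p>

```python
import time

# Placeholder per-sample costs in seconds; NOT measured values.
PROFILE = {
    "p100": {"transfer_per_sample": 2e-5, "compute_per_sample": 1e-4},
}


def emulated_step_times(gpu, batch_size, profile=PROFILE):
    """Return (transfer_seconds, compute_seconds) for one batch.

    Assumption: both costs scale linearly with the batch size.
    """
    entry = profile[gpu]
    transfer = entry["transfer_per_sample"] * batch_size
    compute = entry["compute_per_sample"] * batch_size
    return transfer, compute


def emulate_step(gpu, batch_size, sleep=time.sleep):
    """Stand in for one training step on a CPU-only node by sleeping."""
    transfer, compute = emulated_step_times(gpu, batch_size)
    sleep(transfer)  # replaces the CPU-to-GPU data transfer
    sleep(compute)   # replaces the GPU computation
    return transfer + compute


elapsed = emulate_step("p100", batch_size=128)
print(f"emulated step took about {elapsed:.5f} s")
```

&lt;p>Passing the sleep function as a parameter makes the sketch easy to test without actually waiting.&lt;/p>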
&lt;h5 id="reason-1">Reason:&lt;/h5>
&lt;p>The time profile collected can be compared with Data Wait Time to conduct research on minimizing pipeline stall across different GPUs and models.&lt;/p>
&lt;h5 id="pinned-memory">Pinned Memory&lt;/h5>
&lt;p>Pin memory threads: GPU-based PyTorch uses such threads to copy data from SHM to pinned memory, but CPU-based PyTorch doesn’t. Therefore, I need to implement an emulation of the pin memory threads. Fortunately, the data copy time is predictable. I have already found that pin memory time depends on the batch size and has little to do with the number of workers or the model type. I still need to find out whether it depends on the GPU node, which I assume it does not at this point.&lt;/p>
&lt;p>While implementing the features, we first emulated the CPU-to-GPU transfer time and GPU computation time for the p100 GPU based on the profiled information. Another CUDA behavior that requires emulation is copying data from shared memory to pinned memory. To emulate it, we measured and emulated the time for copying such data. However, the emulator did not behave exactly like the real GPU, because we only emulated the time cost of using pinned_memory and not its memory cost. To resolve this, we wrote a CPython module that manually allocates page-locked memory (which behaves the same as CUDA’s pinned_memory). With this mechanism in place, the emulator had its fundamental functions and properly mimicked CUDA’s behaviors.&lt;/p>
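&lt;p>Because the copy time appears to depend mainly on the batch size, one simple way to emulate it is to interpolate between profiled batch sizes. The sketch below assumes this approach; the sample values are placeholders rather than measurements, and the names PIN_MEM_SAMPLES and pin_mem_time are hypothetical.&lt;/p>

```python
from bisect import bisect_left

# Placeholder (batch_size, seconds) pairs sorted by batch size; NOT measured values.
PIN_MEM_SAMPLES = [(32, 0.004), (64, 0.008), (128, 0.016), (256, 0.032)]


def pin_mem_time(batch_size, samples=PIN_MEM_SAMPLES):
    """Estimate the pinned-memory copy time by linear interpolation.

    Batch sizes outside the profiled range are clamped to the nearest sample.
    """
    sizes = [size for size, _ in samples]
    pos = bisect_left(sizes, batch_size)
    if pos == 0:
        return samples[0][1]
    if pos == len(samples):
        return samples[-1][1]
    (x0, y0), (x1, y1) = samples[pos - 1], samples[pos]
    return y0 + (y1 - y0) * (batch_size - x0) / (x1 - x0)


print(pin_mem_time(96))
```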
&lt;h5 id="reason-2">Reason:&lt;/h5>
&lt;p>After collecting the GPU profile, I compared the emulator with the actual GPU and noticed some differences in IO time, meaning the emulation-based PyTorch and the actual GPU-based PyTorch behaved differently.&lt;/p>
&lt;h5 id="inter-gpus-computation">Inter-GPUs Computation&lt;/h5>
&lt;p>We worked on the emulation of inter-GPU computation time in order to emulate Figure 9 in the DNN stall paper. This is one of the influential factors in multi-GPU training, so we decided to first figure out how to implement this feature. As claimed in the paper, the larger the batch size, the less time it took to update the model; the smaller the batch size, the larger the overheads. However, our current emulator would give the same computation time, since we have not added features to emulate inter-GPU behaviors. The first step was to reserve a lease with 2 GPUs and observe the effect of using multiple GPUs on computation time. We found that there was a small amount of overhead when running two GPUs instead of 1 GPU on the p100 node. My job was to find out where and how these overheads arose and find ways to emulate them in order to reproduce Figure 9. We used resnet18, 4 workers, and 10 batches to separately run batch size 128 with 1 GPU (Group A) and batch size 256 with 2 GPUs (Group B). With our current emulator, both experiments would take the same computation time to finish 1 batch. However, we saw that the computation time of Group B was longer than that of Group A, meaning there was some overhead in computation time. I then dug into the PyTorch source code and successfully identified one of the factors contributing to the overhead.&lt;/p>
&lt;h5 id="reason-3">Reason:&lt;/h5>
&lt;p>To make the emulator more complete, so that it can provide accurate emulation even when using more than 1 GPU on a machine.&lt;/p>
&lt;h4 id="reproducing-figures">Reproducing Figures&lt;/h4>
&lt;p>After implementing the emulator, we managed to use it to reproduce Figures 3, 4, 5, and 6 in the paper &lt;a href="https://vldb.org/pvldb/vol14/p771-mohan.pdf">“Analyzing and Mitigating Data Stalls in DNN Training”&lt;/a> after a series of experiments and testing. Some environments in the paper were not the same as what we ran in the past week, but the general patterns matched the expected hypotheses and measurements. We double-checked all the data and figures produced and found that our prototype meets our expectations, so it was time to look for other papers to reproduce to make the emulator more interesting.
The original and reproduced figures are shown below; the patterns reflect our expected results:
Original Figure 3:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="original_figure3" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure3_hu4825ad7506f6235ff41682e84b760224_101661_b55e965312579f5be79be0c6d21c853a.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure3_hu4825ad7506f6235ff41682e84b760224_101661_def01bfdab18fc7d262d08c4c2388828.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure3_hu4825ad7506f6235ff41682e84b760224_101661_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure3_hu4825ad7506f6235ff41682e84b760224_101661_b55e965312579f5be79be0c6d21c853a.webp"
width="710"
height="399"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Reproduced Figure 3:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="reproduced_figure3" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure3_hu7ba68c9ce0c1781ae4d515ec33f3be68_90176_8b936fcb3ddfc9bb3592d7628c1f8641.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure3_hu7ba68c9ce0c1781ae4d515ec33f3be68_90176_7e97d9d32a7ef0dc9e6fd3817d59f028.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure3_hu7ba68c9ce0c1781ae4d515ec33f3be68_90176_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure3_hu7ba68c9ce0c1781ae4d515ec33f3be68_90176_8b936fcb3ddfc9bb3592d7628c1f8641.webp"
width="676"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Original Figure 4:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="original_figure4" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure4_huf7912859d14522ea127e57f50e29e6e8_97339_981ac3e3445c520fd3934870e4eddab4.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure4_huf7912859d14522ea127e57f50e29e6e8_97339_d783b7ce2dba6c9c3d988fc836f9f0ca.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure4_huf7912859d14522ea127e57f50e29e6e8_97339_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure4_huf7912859d14522ea127e57f50e29e6e8_97339_981ac3e3445c520fd3934870e4eddab4.webp"
width="687"
height="390"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Reproduced Figure 4:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="reproduced_figure4" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4_hubf1eec72de057c79eda887bdba386155_82657_05469d8674723f8da1bb12f8f7e3e989.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4_hubf1eec72de057c79eda887bdba386155_82657_60c661b4eb6240ac2765c6540bcaf26c.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4_hubf1eec72de057c79eda887bdba386155_82657_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4_hubf1eec72de057c79eda887bdba386155_82657_05469d8674723f8da1bb12f8f7e3e989.webp"
width="695"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="reproduced_figure4a" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4a_huce40b0ef347fbe8c5dda7cd64b89a82a_83431_6b8b84d9f461863c0223d3d2992f1557.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4a_huce40b0ef347fbe8c5dda7cd64b89a82a_83431_732e04a2227ba54fb9fb64e0dbe90717.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4a_huce40b0ef347fbe8c5dda7cd64b89a82a_83431_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure4a_huce40b0ef347fbe8c5dda7cd64b89a82a_83431_6b8b84d9f461863c0223d3d2992f1557.webp"
width="687"
height="701"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Original Figure 5:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="original_figure5" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure5_hua788af616c21c63d77c26ece571c44f2_48006_c74d4ac38263a1767beef877888c7a0e.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure5_hua788af616c21c63d77c26ece571c44f2_48006_01b1945b0dc6c47dee7de7ade64b50bc.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure5_hua788af616c21c63d77c26ece571c44f2_48006_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure5_hua788af616c21c63d77c26ece571c44f2_48006_c74d4ac38263a1767beef877888c7a0e.webp"
width="549"
height="299"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Reproduced Figure 5:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="reproduced_figure5" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure5_hu19c2897350abb8738bf9073d8758691e_57335_fdba37e325a306223f6a391e9a69a4ac.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure5_hu19c2897350abb8738bf9073d8758691e_57335_8c9f78e75740fb2681a32c842527250b.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure5_hu19c2897350abb8738bf9073d8758691e_57335_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure5_hu19c2897350abb8738bf9073d8758691e_57335_fdba37e325a306223f6a391e9a69a4ac.webp"
width="719"
height="750"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Original Figure 6:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="original_figure6" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure6_hu28381534680632c929008cb2ca5db00c_55137_a24c7a3b53c18cb5a6f22a52536fd86f.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure6_hu28381534680632c929008cb2ca5db00c_55137_5f6250df82bf4852c73c925bc3934b14.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure6_hu28381534680632c929008cb2ca5db00c_55137_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/original_figure6_hu28381534680632c929008cb2ca5db00c_55137_a24c7a3b53c18cb5a6f22a52536fd86f.webp"
width="527"
height="304"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Reproduced Figure 6:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="reproduced_figure6" srcset="
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure6_hu932abfdff2db3be7f25d6ea6fa38efc4_57111_1c46178f0a8f8ccef8efa61f5fe40809.webp 400w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure6_hu932abfdff2db3be7f25d6ea6fa38efc4_57111_97e1110894b3975f840ad24bcbc0df12.webp 760w,
/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure6_hu932abfdff2db3be7f25d6ea6fa38efc4_57111_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230730-haoranwu/reproduced_figure6_hu932abfdff2db3be7f25d6ea6fa38efc4_57111_1c46178f0a8f8ccef8efa61f5fe40809.webp"
width="760"
height="626"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h5 id="reason-4">Reason:&lt;/h5>
&lt;p>Our original goal was to reproduce papers, so reproducing figures is a good step toward that.&lt;/p>
&lt;h2 id="summary--coming-future">Summary + Coming Future&lt;/h2>
&lt;p>We will keep working to complete the emulator and figure out the exact mechanisms needed for the implementation. We will also look for more features and see whether better ones can be added to the emulator.&lt;/p></description></item><item><title>Building extensions between Python libraries for Biotechnology laboratories</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/</link><pubDate>Fri, 28 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/</guid><description>&lt;p>Hello again! This is Luiza, a GSoC contributor for the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop">LabOp&lt;/a> Project.
My task is to build bridges between programming languages for Biotechnology Laboratory automation.&lt;/p>
&lt;p>When talking about life sciences, reproducibility is an issue at most research centers. Biotechnology-focused laboratories usually have their own protocols, developed in house for their own applications. Researchers rely on such protocols to perform their experiments and collect data, but when it comes to sharing those protocols and performing them in different laboratories, many difficulties arise. Whether due to a lack of equipment or reagents, or even a different order of execution, replicating a protocol in another laboratory is a challenge. To address this issue, LabOp was developed to represent a protocol and convert it into as many forms as possible, so it can be executed by humans and by machines.&lt;/p>
&lt;p>PylabRobot and PyHamilton also come into the picture: these libraries make it possible to write protocols for Hamilton robots (and, for PylabRobot, Tecan machines as well), but they share the limitation of only representing laboratory protocols at a low level, with the user having to write every single command in Python for the protocol to be executed. Thus I’m currently developing an extension for LabOp protocols to be converted into PylabRobot/PyHamilton scripts. This way the researcher can write protocols for robot execution in human-friendly terms.&lt;/p>
&lt;figure id="figure-behaviourspecialization-for-liquid-handling-class">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="BehaviourSpecialization for Liquid Handling class" srcset="
/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_7dea1005b9355831aab4fd48906afaec.webp 400w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_67bd573e81d4a87cd9d10cf5cb216d81.webp 760w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_7dea1005b9355831aab4fd48906afaec.webp"
width="760"
height="436"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
BehaviourSpecialization for Liquid Handling class
&lt;/figcaption>&lt;/figure>
&lt;p>The first step is building a correspondence spreadsheet from a hello-world protocol written in both libraries (LabOp and PylabRobot). This lets us establish an equivalence between the functions, parameters and default commands of the two libraries, as well as their structure. The spreadsheet will guide the conversion of the liquid handling steps from their representation in LabOp to their representation in PylabRobot.&lt;/p>
&lt;p>The second step is to create a file that performs the conversion. In this file I will define a labware map, which is essentially a dictionary translating LabOp resource names into labware IDs recognizable by PylabRobot&amp;rsquo;s &amp;ldquo;resource&amp;rdquo; classes, and a BehaviourSpecialization class that converts LabOp actions into operations of PylabRobot&amp;rsquo;s Liquid Handler class, which coordinates the commands sent from the script to the machines (see the featured images).&lt;/p>
&lt;figure id="figure-dictionary-for-labop-to-pylabrobot-container-correspondence">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Dictionary for LabOp to Pylabrobot container correspondence" srcset="
/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_76e3dd3c112ca74ef8e3b7459123e154.webp 400w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_8337c1f75572828ec38252d4fdee0f96.webp 760w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_76e3dd3c112ca74ef8e3b7459123e154.webp"
width="760"
height="465"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Dictionary for LabOp to Pylabrobot container correspondence
&lt;/figcaption>&lt;/figure>
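&lt;p>As a rough illustration of the two pieces described above, the sketch below pairs a labware map with a specialization class. It is a minimal sketch in plain Python and does not use the LabOp or PylabRobot APIs: every name in it (&lt;code>LABWARE_MAP&lt;/code>, &lt;code>SketchSpecialization&lt;/code>, the action dictionaries and the emitted command tuples) is a hypothetical stand-in chosen for this example.&lt;/p>

```python
# Hypothetical labware map: protocol-level container names -> robot-side
# resource identifiers. Both sides are made-up placeholders.
LABWARE_MAP = {
    "sample_plate": "plate_96_well_01",
    "reagent_reservoir": "trough_60ml_01",
    "tip_rack": "tips_300ul_01",
}


class SketchSpecialization:
    """Translates high-level protocol actions into low-level commands.

    Commands are collected as plain tuples instead of being sent to a
    robot, so the translation can be inspected without any hardware.
    """

    def __init__(self, labware_map):
        self.labware_map = labware_map
        self.commands = []

    def resolve(self, name):
        # Fail loudly if the protocol names a container we cannot map.
        if name not in self.labware_map:
            raise KeyError(f"no labware mapping for {name!r}")
        return self.labware_map[name]

    def transfer(self, action):
        # One high-level transfer becomes an aspirate/dispense pair.
        source = self.resolve(action["source"])
        target = self.resolve(action["destination"])
        volume = action["volume_ul"]
        self.commands.append(("aspirate", source, volume))
        self.commands.append(("dispense", target, volume))

    def convert(self, actions):
        # Dispatch each action to the handler registered for its type.
        handlers = {"transfer": self.transfer}
        for action in actions:
            handlers[action["type"]](action)
        return self.commands


protocol = [
    {"type": "transfer", "source": "reagent_reservoir",
     "destination": "sample_plate", "volume_ul": 50},
]
commands = SketchSpecialization(LABWARE_MAP).convert(protocol)
for command in commands:
    print(command)
```

&lt;p>Keeping the name lookup in one dictionary means a new deck layout only requires editing the map, while new action types only require a new handler method.&lt;/p>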
&lt;p>Then we move on to the protocol that will be tested on the Hamilton machines. This is a plasmid purification protocol that is usually performed by hand, one sample at a time. Hamilton robots do not have this limitation, as they can handle many samples at once in a single protocol execution. The robot that will run this protocol has two modules that are not yet supported by PylabRobot’s extensions: a pressure pump module and an on-deck heater shaker. I’ll be implementing these modules in PylabRobot based on their default commands in PyHamilton, and will run the protocol on a Hamilton Starlet unit.&lt;/p>
&lt;p>The steps of the protocol have been decoupled to facilitate pilot testing. They are as follows:&lt;/p>
&lt;ul>
&lt;li>Liquid handling: good to go&lt;/li>
&lt;li>Pressure pump module: requires adjustments&lt;/li>
&lt;li>Plate grippers (necessary to move the plasmid plate from one module to another): require adjustments&lt;/li>
&lt;li>On-deck heater shaker: good to go&lt;/li>
&lt;/ul>
&lt;p>The first pilot tests of the protocol will be run with water instead of plasmid to verify that all the steps run smoothly. Once that is out of the way, we will perform the protocol with dirty plasmids that require purification (which is what the protocol is for). Success will be measured by sequencing the plasmid (if possible), performing gel electrophoresis and measuring the absorbance of the DNA.&lt;/p>
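&lt;p>For the absorbance measurement, a common way to summarise a reading is the A260/A280 ratio together with a concentration estimate. The sketch below assumes the widely used conventions that an A260 of 1.0 corresponds to about 50 ng/µL of double-stranded DNA and that a ratio near 1.8 indicates DNA with little protein contamination. The function names, the example readings and the acceptance window are illustrative assumptions, not values from this project.&lt;/p>

```python
def summarize_absorbance(a260, a280, dilution_factor=1.0):
    """Return (concentration in ng/uL, A260/A280 ratio) for dsDNA."""
    if a280 == 0:
        raise ValueError("A280 must be non-zero to compute a ratio")
    # Convention: an A260 of 1.0 is roughly 50 ng/uL double-stranded DNA.
    concentration = a260 * 50.0 * dilution_factor
    ratio = a260 / a280
    return concentration, ratio


def looks_pure(ratio, low=1.7, high=2.0):
    # Assumed acceptance window around the usual target of about 1.8.
    return high >= ratio >= low


# Made-up readings, for illustration only.
concentration, ratio = summarize_absorbance(a260=0.5, a280=0.27)
print(f"{concentration:.1f} ng/uL, A260/A280 = {ratio:.2f}")
print("within assumed purity window:", looks_pure(ratio))
```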
&lt;p>The goal of these tests is to gather data on the effectiveness of the protocol and its execution on the machine, thus confirming that it is in fact a useful mechanism for DNA purification.&lt;/p></description></item><item><title>Uncovering Actionable Insights using ReadTheDocs Analytics</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/</link><pubDate>Thu, 27 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello again! This is Jack, a GSoC contributor for the OpenROAD Project.
My task is to update and optimise the documentation to encourage user
adoption and engagement.&lt;/p>
&lt;p>For open-source repo maintainers, &lt;a href="https://readthedocs.org/" target="_blank" rel="noopener">readthedocs&lt;/a>
is a godsend. One of its more underrated features is the
search and traffic analytics it provides, covering up to &lt;strong>90 days&lt;/strong> for &lt;code>Community&lt;/code> tier
users. This is awesome, because ReadTheDocs is &amp;ldquo;always free for open source
and community projects&amp;rdquo;.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Why are analytics important?&lt;/p>
&lt;p>Analytics are great as a &lt;em>proxy&lt;/em> indicator for documentation engagement.
For instance, high traffic to a page could reflect how popular the tool is,
or it could mean the tool is unclear, so people need
more visits to the page to understand its usage. Either way,
heavy traffic indicates that the page deserves
extra care.&lt;/p>
&lt;p>In what follows we aim to provide a quick tutorial as well as
list out some of the actionable insights we uncovered in the
OpenROAD/OpenROAD-flow-scripts documentation project.&lt;/p>
&lt;h2 id="preamble">Preamble&lt;/h2>
&lt;p>To download the analytics raw &lt;code>csv&lt;/code> files, refer to this
&lt;a href="https://docs.readthedocs.io/en/stable/analytics.html" target="_blank" rel="noopener">website&lt;/a>.&lt;/p>
&lt;p>You should also have the following packages installed: &lt;code>pandas&lt;/code>, &lt;code>numpy&lt;/code>, &lt;code>matplotlib&lt;/code>, &lt;code>scipy&lt;/code>.&lt;/p>
&lt;h2 id="traffic-analytics">Traffic Analytics&lt;/h2>
&lt;p>Traffic analytics are easy to understand.
They come in the format &lt;code>Date&lt;/code>, &lt;code>Version&lt;/code>, &lt;code>Path&lt;/code>, &lt;code>Views&lt;/code> as follows:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read_csv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;ta_or.csv&amp;#39;&lt;/span>&lt;span class="p">)[::&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reset_index&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">drop&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="p">()[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">head&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-1-loading-traffic-analytics-dataframe">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Load traffic analytics DF" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_0fe44b789026339d8a488b67e455af49.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_c34649440686784f502a8fa245519fe8.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_0fe44b789026339d8a488b67e455af49.webp"
width="420"
height="345"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 1: Loading traffic analytics dataframe
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>The raw data is not all that informative.
Let us aggregate the data to obtain the weekly views.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">copy&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to_datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to_timedelta&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">7&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">unit&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;d&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="s1">&amp;#39;Path&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Grouper&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">freq&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;W&amp;#39;&lt;/span>&lt;span class="p">)])[&lt;/span>&lt;span class="s1">&amp;#39;Views&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">()&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">reset_index&lt;/span>&lt;span class="p">()&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">sort_values&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Path&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s1">&amp;#39;/index.html&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-2-aggregated-weekly-traffic">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Aggregated weekly traffic" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_2356d16291dbea694b0bc9c05693ffe8.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_cf13de62f49742cd0e76c661feea93ed.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_2356d16291dbea694b0bc9c05693ffe8.webp"
width="243"
height="393"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 2: Aggregated weekly traffic
&lt;/figcaption>&lt;/figure>
&lt;/p>
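&lt;p>To sanity-check the weekly totals without pandas, the same grouping can be sketched with the standard library. This is a minimal sketch on made-up rows: &lt;code>weekly_views&lt;/code> and the sample data are assumptions for illustration, and it labels each week by its Monday, so the week labels may differ from those produced by the pandas grouper above.&lt;/p>

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_views(rows):
    """Sum views per (path, week), where a week is keyed by its Monday."""
    totals = defaultdict(int)
    for day, path, views in rows:
        d = date.fromisoformat(day)
        # weekday() is 0 for Monday, so this steps back to the week start.
        monday = d - timedelta(days=d.weekday())
        totals[(path, monday.isoformat())] += views
    return dict(totals)


# Made-up rows in (Date, Path, Views) form.
rows = [
    ("2023-05-01", "/index.html", 10),  # Monday
    ("2023-05-03", "/index.html", 5),
    ("2023-05-07", "/index.html", 2),   # Sunday, same week
    ("2023-05-08", "/index.html", 7),   # following Monday
    ("2023-05-02", "/search.html", 4),
]
result = weekly_views(rows)
for key in sorted(result):
    print(key, result[key])
```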
&lt;p>Note that &lt;code>/index.html&lt;/code> can be replaced with any page path
of interest. To list all the page paths in this
dataset, use:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Path&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unique&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-3-unique-paths-in-dataset">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Unique paths" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_94dd6b47fa834b3c36ea619deffd3a6a.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_f50b03560ab266073e2dee2fa7a04e51.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_94dd6b47fa834b3c36ea619deffd3a6a.webp"
width="591"
height="538"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3: Unique paths in dataset
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>With these neat data in our arsenal, let us do some plotting!
For the visualisation, we have chosen to use the traffic aggregated
on a daily scale. On top of this, we also plot a linear
best-fit line of all the points to track the trendline over time.&lt;/p>
&lt;p>The code below shows how to plot the top 20 pages.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">plot_views&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">numPages&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">20&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Groupby Path, sum views&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pathResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Path&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sort_values&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ascending&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">fig&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">ax&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">subplots&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numPages&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">figsize&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">fig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tight_layout&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numPages&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">key&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pathResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">temp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Path&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">key&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_xticks&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">arange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">90&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="c1"># this line is to not clutter the x-axis too much.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Views&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_title&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># linear regression&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bestfit&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">stats&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linregress&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">equation&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s2">&amp;#34;x + &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">plot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">poly1d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">polyfit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">))),&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">equation&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;upper right&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-4-top-20-pages-by-daily-view-counts-in-descending-order">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Top 20 plots" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_208fbbf3fe9f3d6b7b48a8f44d65e70b.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_523ed86a22800eb3addad7738facd6cc.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_208fbbf3fe9f3d6b7b48a8f44d65e70b.webp"
width="379"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 4: Top 20 pages by daily view counts (in descending order)
&lt;/figcaption>&lt;/figure>
&lt;/p>
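&lt;p>The dashed trendline is an ordinary least-squares fit of views against the day index. As a minimal sketch of what that fit computes, the hypothetical helper &lt;code>fit_line&lt;/code> below implements the closed-form slope and intercept in plain Python on made-up numbers. It is meant for checking intuition on small inputs rather than as a replacement for &lt;code>stats.linregress&lt;/code>.&lt;/p>

```python
def fit_line(ys):
    """Least-squares slope and intercept of ys against 0, 1, 2, ..."""
    n = len(ys)
    if n in (0, 1):
        raise ValueError("need at least two points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up daily view counts for five consecutive days.
slope, intercept = fit_line([10, 8, 9, 6, 5])
print(f"{slope:.2f}x + {intercept:.2f}")
```

&lt;p>A negative slope means the fitted daily view count falls as the day index grows, which is how the gradient in the legend should be read.&lt;/p>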
&lt;p>Also, we can aggregate the total views by day to plot daily traffic:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">plot_daily_traffic&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Groupby Date, sum views&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">fig&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">figure&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">figsize&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dateResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">values&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xticks&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">arange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">90&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Views&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">title&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Traffic by Day&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># linear regression&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bestfit&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">stats&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linregress&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">equation&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s2">&amp;#34;x + &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">plot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">poly1d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">polyfit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">))),&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">equation&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;upper right&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-5-daily-aggregated-traffic">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Daily aggregated traffic" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_f1cfe4f85a6f52b10851153e3759601f.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_be83d71fe2635b895829f733ef678a4f.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_f1cfe4f85a6f52b10851153e3759601f.webp"
width="760"
height="503"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 5: Daily aggregated traffic
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h3 id="key-trends">Key Trends:&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>There is a weekly cyclical pattern: average view counts rise
from Monday to Friday, then fall off on weekends.
This is most evident in the pages &lt;code>/index.html&lt;/code> and &lt;code>/main/README.html&lt;/code>.
It could be attributed to the standard Mon-Fri work or study week.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>According to the gradient of the best-fit line for Figure 2,
there seems to be a slow decline in traffic for the OpenROAD docs.
A gradient of -0.77 views per day translates roughly to a decline of 22 views
per month. The small decline could be attributed to the higher traffic
from 19-29 March 2023, the dates for the
&lt;a href="https://openroaddesigncontest.org/" target="_blank" rel="noopener">OpenROAD 7nm design contest&lt;/a>.
Contests are always good for driving traffic.&lt;/p>
&lt;/li>
&lt;/ul>
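&lt;p>The two observations above can be checked numerically. The snippet below is a minimal sketch using only the standard library: the view counts are made-up illustrative data, and &lt;code>mean_views_by_weekday&lt;/code> and &lt;code>monthly_change&lt;/code> are hypothetical helper names, not part of the original analysis. It averages views per weekday to expose the weekly cycle, and converts a per-day gradient into a per-month figure. The result depends on the month length assumed: a 28-day month gives about 22 views, a 30-day month about 23.&lt;/p>

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAY_NAMES = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']


def mean_views_by_weekday(daily_views):
    """Average the daily view counts separately for each weekday."""
    buckets = defaultdict(list)
    for day, views in daily_views.items():
        buckets[WEEKDAY_NAMES[day.weekday()]].append(views)
    return {name: mean(values) for name, values in buckets.items()}


def monthly_change(gradient_per_day, days_per_month=30):
    """Convert a best-fit gradient in views/day into views/month."""
    return gradient_per_day * days_per_month


# Illustrative data only: two weeks starting Monday 6 March 2023,
# with 100 views on each weekday and 40 on each weekend day.
start = date(2023, 3, 6)
daily_views = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    daily_views[day] = 40 if day.weekday() in (5, 6) else 100

weekday_means = mean_views_by_weekday(daily_views)
print(weekday_means)

# The same gradient under two assumed month lengths.
print(round(monthly_change(-0.77, days_per_month=28), 2))
print(round(monthly_change(-0.77, days_per_month=30), 2))
```

&lt;p>With real data, &lt;code>daily_views&lt;/code> would be built from the daily aggregates computed earlier rather than generated by hand.&lt;/p>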
&lt;h3 id="actionable-insights">Actionable insights:&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Top pages are usually landing pages: &lt;code>index.html&lt;/code>, &lt;code>main/README.html&lt;/code>, &lt;code>main/src/README.html&lt;/code>. We thus prioritised making these pages more readable and concise.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>These are followed by the tutorials page &lt;code>/tutorials/index.html&lt;/code> and &lt;code>/search.html&lt;/code>. Because the tutorials page is so prominent, we moved the tutorials link higher up the left navigation sidebar. We also added search tips to help users obtain better search results. More about search in the next section.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Next, as OpenROAD consists of 20 tools, traffic analytics helps us decide the order in which to update them: &lt;code>ifp&lt;/code>, &lt;code>gui&lt;/code>, &lt;code>odb&lt;/code>, &lt;code>ppl&lt;/code>, &lt;code>sta&lt;/code>, &lt;code>grt&lt;/code>, &lt;code>mpl&lt;/code>, &lt;code>gpl&lt;/code>, &lt;code>rsz&lt;/code>, &lt;code>rcx&lt;/code>, &lt;code>pdn&lt;/code>, &lt;code>cts&lt;/code>, &lt;code>psm&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="search-analytics">Search Analytics&lt;/h2>
&lt;p>Search analytics come as three columns: &lt;code>Date&lt;/code>, &lt;code>Query&lt;/code>, &lt;code>TotalResults&lt;/code>.
Unlike in traffic analytics, &lt;code>TotalResults&lt;/code> does not refer to the search count
for the query that day; it is the total number of results
returned by that query on that day. A separate aggregation is still needed
to obtain the final count.&lt;/p>
&lt;p>Firstly, let us load the dataset and perform a groupby on the column &lt;code>Date&lt;/code>
to obtain the daily count aggregates.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read_csv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;sa_or.csv&amp;#39;&lt;/span>&lt;span class="p">)[::&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reset_index&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">drop&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">rename&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">columns&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s1">&amp;#39;Created Date&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;Total Results&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;TotalResults&amp;#39;&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="p">()[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dateResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TotalResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">count&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dateResults&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-6-code-output-for-daily-aggregated-search-counts">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Daily count code" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_303764681c719b59422e8ac4adff87d5.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_ae0b89dd9a05f1d083e0a5caf434a1c6.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_303764681c719b59422e8ac4adff87d5.webp"
width="390"
height="231"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 6: Code output for daily aggregated search counts.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Now we are ready to plot the daily aggregated searches. This represents
the number of times a search was performed on the documentation website.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">plot_daily_searches&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dateResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TotalResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">count&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">values&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xticks&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">arange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">90&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;# Times Searched&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">title&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Search count by day&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># linear regression&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bestfit&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">stats&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linregress&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">equation&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s2">&amp;#34;x + &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">plot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">poly1d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">polyfit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">))),&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">equation&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;upper right&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-7-daily-aggregated-search-counts">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Final search analytics graph" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_dfcee10fa9be516c148eb11ac3598591.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_2bfda1034e5a343c34c529e62f8279ba.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_dfcee10fa9be516c148eb11ac3598591.webp"
width="760"
height="507"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 7: Daily aggregated search counts
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>We can also look separately at queries that return zero results.
In other words, we are interested in the terms people are curious about
but that are not currently covered by our documentation.
Think of it as on-site search engine optimisation.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">zeroResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TotalResults&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">zeroResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">zeroResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Query&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">count&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sort_values&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ascending&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s1">All 0 results queries (desc)&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">zeroResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tolist&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Example output is as follows:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">[&amp;#39;autotuner&amp;#39;, &amp;#39;tdms&amp;#39;, &amp;#39;*macro*&amp;#39;, &amp;#39;rtlmp_max_inst&amp;#39;, &amp;#39;get_property&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;check_setup&amp;#39;, &amp;#39;centos&amp;#39;, &amp;#39;initialize_padring&amp;#39;, &amp;#39;core_utilization&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;pin_access&amp;#39;, &amp;#39;read_libraries&amp;#39;, &amp;#39;config&amp;#39;, &amp;#39;eco&amp;#39;, &amp;#39;rpt&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;improve_placement&amp;#39;, &amp;#39;define_process_corner&amp;#39;, &amp;#39;global_place&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;report_worst_slack&amp;#39;, &amp;#39;max_phi_cof&amp;#39;, &amp;#39;report_power&amp;#39;, &amp;#39;get_pins&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;registerfile&amp;#39;, &amp;#39;set_global_routing&amp;#39;, &amp;#39;prebuilt&amp;#39;, &amp;#39;env&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;repair_clock_inverters&amp;#39;, &amp;#39;set_thread_count&amp;#39;, &amp;#39;report_&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;partition_design&amp;#39;, &amp;#39;place_cell&amp;#39;, &amp;#39;blockage&amp;#39;, &amp;#39;partitionmgr&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;nmos&amp;#39;, &amp;#39;tuner&amp;#39;, &amp;#39;write_sdf&amp;#39;, &amp;#39;place_density&amp;#39;, &amp;#39;place_pins_args&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;size_cell&amp;#39;, &amp;#39;*macor*&amp;#39;, &amp;#39;repair_clock_inverter&amp;#39;, &amp;#39;misk&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;readhaty&amp;#39;, &amp;#39;readhat&amp;#39;, &amp;#39;obstruct&amp;#39;, &amp;#39;odbpy&amp;#39;, &amp;#39;openpdn&amp;#39;, &amp;#39;openram&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;placement_cfg&amp;#39;, &amp;#39;read_macro_placement&amp;#39;, &amp;#39;output_drc&amp;#39;, &amp;#39;positon&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;pct&amp;#39;, &amp;#39;qrctechtable&amp;#39;, &amp;#39;qrctechfile&amp;#39;, &amp;#39;qrctech&amp;#39;, &amp;#39;qrc&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;properly covered&amp;#39;, &amp;#39;precision innovations&amp;#39;, &amp;#39;repeater&amp;#39;, &amp;#39;&amp;#34;rcx-0487&amp;#34;&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;report_worst&amp;#39;, &amp;#39;report_area&amp;#39;, &amp;#39;report_clock_properties&amp;#39;, &amp;#39;skywater&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;study&amp;#39;, &amp;#39;sv&amp;#39;, &amp;#39;synth&amp;#39;, &amp;#39;synth_hierarchical&amp;#39;, &amp;#39;systemverilog&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;tdm&amp;#39;, &amp;#39;tdms_place&amp;#39;, &amp;#39;triton&amp;#39;, &amp;#39;ungroup&amp;#39;, &amp;#39;verilog_files&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;wrc&amp;#39;, &amp;#39;write_lef&amp;#39;, &amp;#39;write_partition_verilog&amp;#39;, &amp;#39;שואם&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;si2&amp;#39;, &amp;#39;sever&amp;#39;, &amp;#39;setrc&amp;#39;, &amp;#39;rtl_macro&amp;#39;, &amp;#39;report_dcalc&amp;#39;, &amp;#39;report_design&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;report_design_info&amp;#39;, &amp;#39;report_instance&amp;#39;, &amp;#39;report_slews&amp;#39;, &amp;#39;resize&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;rtlmp&amp;#39;, &amp;#39;set_power_activity&amp;#39;, &amp;#39;rtree&amp;#39;, &amp;#39;run_all&amp;#39;, &amp;#39;run_all.tcl&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;sc&amp;#39;, &amp;#39;set_all_input_output_delays&amp;#39;, &amp;#39;set_io_pin_constraints&amp;#39;, &amp;#39;metis&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;lefdef&amp;#39;, &amp;#39;make_result_file&amp;#39;, &amp;#39;macro_placement_cfg&amp;#39;, &amp;#39;clock__details&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;clocks__details&amp;#39;, &amp;#39;combinational&amp;#39;, &amp;#39;config.mk&amp;#39;, &amp;#39;coord&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;core_margin&amp;#39;, &amp;#39;db_process_node&amp;#39;, &amp;#39;dbblocjs&amp;#39;, &amp;#39;dbdatabase&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;dbr&amp;#39;, &amp;#39;dbrt&amp;#39;, &amp;#39;dbrttree&amp;#39;, &amp;#39;debian&amp;#39;, &amp;#39;define_pin_shape&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;densiy&amp;#39;, &amp;#39;desgin&amp;#39;, &amp;#39;diff_file&amp;#39;, &amp;#39;clk_period&amp;#39;, &amp;#39;clk_io_ptc&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;cdl&amp;#39;, &amp;#39;analog&amp;#39;, &amp;#39;./env.sh&amp;#39;, &amp;#39;178&amp;#39;, &amp;#39;6_final&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;6_final.odb&amp;#39;, &amp;#39;_placement&amp;#39;, &amp;#39;abat&amp;#39;, &amp;#39;add_stripe&amp;#39;, &amp;#39;arch&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;ccs&amp;#39;, &amp;#39;binaries&amp;#39;, &amp;#39;bookshelf&amp;#39;, &amp;#39;buff_cell&amp;#39;, &amp;#39;buildwithdocker&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;busbitchars&amp;#39;, &amp;#39;buschar&amp;#39;, &amp;#39;captable&amp;#39;, &amp;#39;directoryobject&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;disallow_one_site_gaps&amp;#39;, &amp;#39;distribute&amp;#39;, &amp;#39;is_port&amp;#39;, &amp;#39;hierarch&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;hop&amp;#39;, &amp;#39;hyper&amp;#39;, &amp;#39;initialie_flooorplan&amp;#39;, &amp;#39;initialize_flooorplan&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;instance_count&amp;#39;, &amp;#39;is_chip&amp;#39;, &amp;#39;lean&amp;#39;, &amp;#39;gui_final&amp;#39;, &amp;#39;lec&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;*def*&amp;#39;, &amp;#39;limitation&amp;#39;, &amp;#39;lyp&amp;#39;, &amp;#39;maco&amp;#39;, &amp;#39;macro_pin&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;macro_place&amp;#39;, &amp;#39;harness&amp;#39;, &amp;#39;gui.py&amp;#39;, &amp;#39;dont&amp;#39;, &amp;#39;fill_cell&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;dreamplace&amp;#39;, &amp;#39;em&amp;#39;, &amp;#39;enable_dpo&amp;#39;, &amp;#39;energy&amp;#39;, &amp;#39;env.sh&amp;#39;, &amp;#39;erc&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;export&amp;#39;, &amp;#39;findmaste&amp;#39;, &amp;#39;grt_layer_adjustments&amp;#39;, &amp;#39;findmaster&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;freepdk45&amp;#39;, &amp;#39;gdt&amp;#39;, &amp;#39;global_&amp;#39;, &amp;#39;global_place_db&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;global_placementy&amp;#39;, &amp;#39;graph&amp;#39;, &amp;#39;갲&amp;#39;]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In our case, the problems behind these zero-result queries fall roughly
into one of these categories:&lt;/p>
&lt;ul>
&lt;li>Missing documentation: either the parameter or the functionality is not documented.&lt;/li>
&lt;li>Typo: the user has the right keyword but did not type it correctly. We therefore provide search &lt;a href="https://openroad-flow-scripts.readthedocs.io/en/latest/user/FAQS.html#how-do-i-get-better-search-results" target="_blank" rel="noopener">tips&lt;/a>, such as using the fuzziness operator &lt;code>~N&lt;/code> for better matches.&lt;/li>
&lt;/ul>
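&lt;p>Sorting zero-result queries into these two categories can be partly automated. The snippet below is a rough sketch, not the method used in this post: it relies on &lt;code>difflib.get_close_matches&lt;/code> from the Python standard library, and the vocabulary list, the function name &lt;code>classify_zero_result_query&lt;/code>, and the 0.75 cutoff are all assumptions made for illustration. A query that closely resembles a documented term is flagged as a likely typo. Anything else is treated as a candidate for missing documentation.&lt;/p>

```python
from difflib import get_close_matches

# Hypothetical vocabulary of documented keywords. In practice this would be
# collected from the documentation itself.
documented_terms = [
    'macro', 'density', 'design', 'position',
    'initialize_floorplan', 'global_placement', 'find_master',
]


def classify_zero_result_query(query, vocabulary, cutoff=0.75):
    """Return ('typo', suggestion) if the query resembles a documented term,
    otherwise ('missing', None)."""
    # Drop the wildcard characters some users wrap around their query.
    cleaned = query.strip('*').lower()
    matches = get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    if matches:
        return 'typo', matches[0]
    return 'missing', None


# A few queries taken from the zero-result list above.
for query in ['*macor*', 'densiy', 'desgin', 'initialize_flooorplan', 'qrctechfile']:
    print(query, classify_zero_result_query(query, documented_terms))
```

&lt;p>The cutoff trades false positives against missed typos, so the output is best treated as a shortlist for manual review rather than a final verdict.&lt;/p>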
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;p>ReadTheDocs could also be linked with
&lt;a href="https://analytics.google.com/analytics/web/provision/#/provision" target="_blank" rel="noopener">Google Analytics&lt;/a>,
but this remains for more advanced users.&lt;/p>
&lt;p>Another rich source of information for open-source maintainers
is GitHub issues, the platform where users directly discuss
their problems. Documentation engagement can be tracked there
with metrics such as installation issues per week,
or the user-issue retention rate, which tracks the number of users
that continue to file issues after their first.&lt;/p>
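&lt;p>As a rough illustration of the two metrics mentioned above, the sketch below computes them from a small list of made-up issue records. The record layout, the &lt;code>installation&lt;/code> label, and the function names are assumptions for this example. Real data would have to be exported from GitHub first, and that step is not shown here.&lt;/p>

```python
from collections import Counter
from datetime import date

# Hypothetical issue records: (author, date opened, labels).
issues = [
    ('alice', date(2023, 7, 3), {'installation'}),
    ('bob', date(2023, 7, 4), {'installation'}),
    ('alice', date(2023, 7, 12), {'bug'}),
    ('carol', date(2023, 7, 13), {'installation'}),
    ('alice', date(2023, 7, 20), {'docs'}),
]


def installation_issues_per_week(records, label='installation'):
    """Count issues carrying `label`, keyed by (ISO year, ISO week)."""
    counts = Counter()
    for _author, opened, labels in records:
        if label in labels:
            iso = opened.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


def returning_users(records):
    """Return the authors who filed at least one issue after their first."""
    per_author = Counter(author for author, _opened, _labels in records)
    return sorted(author for author, n in per_author.items() if n >= 2)


weekly = installation_issues_per_week(issues)
returning = returning_users(issues)
all_authors = {author for author, _opened, _labels in issues}
retention_rate = len(returning) / len(all_authors)

print(weekly)
print(returning, round(retention_rate, 2))
```

&lt;p>Here the retention rate is expressed as the share of issue authors who came back at least once. The raw count of returning users, as described above, is simply &lt;code>len(returning)&lt;/code>.&lt;/p>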
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>This post showcases the amount of insight one can gather from parsing
traffic and search analytics. It also provides useful Python functions
that can be applied to the analytics dataset for fast prototyping
and experimentation. If you are a contributor to open-source projects,
try uncovering some insights for your doc pages today!&lt;/p></description></item><item><title>Halfway Through GSOC: My Experience and Learnings</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edumle/20230718-kokoedwin/</link><pubDate>Mon, 17 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edumle/20230718-kokoedwin/</guid><description>&lt;p>Hello there! I&amp;rsquo;m Jonathan Edwin, all the way from the beautiful archipelago of Indonesia. This year, I got the exciting chance to jump on board the 2023 Summer of Reproducibility initiative. It&amp;rsquo;s been quite the adventure! Right now, I&amp;rsquo;m pouring my energy into a fascinating project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education&lt;/a> project. I&amp;rsquo;m thrilled to be able to make my own little mark on it.&lt;/p>
&lt;p>For those of you who are not familiar with what I&amp;rsquo;m working on, let me shed some light. My project, as part of the &amp;ldquo;Using Reproducibility in Machine Learning Education&amp;rdquo; initiative under the guidance of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a>, focuses on creating educational resources that center around reproducing some key machine learning techniques. These include Cutout data augmentation, U-Net, and Siamese networks, to name a few. The end product will be a series of interactive Jupyter notebooks that provide step-by-step guidance for students, helping them not only understand these complex models but also gain hands-on experience in achieving research reproducibility.&lt;/p>
&lt;p>&lt;strong>Progress and Challenges&lt;/strong>&lt;/p>
&lt;p>Embarking on this project, I dove headfirst into the world of Cutout data augmentation, immersing myself in the many experiments outlined in the foundational paper. This initial study proved to be an intricate blend of multiple datasets, two network architectures, and a performance evaluation of models with and without Cutout data augmentation. Additionally, it included the exploration of these models in combination with other data augmentation techniques.&lt;/p>
&lt;p>One of our main objectives has been to help students visualize how the model interacts with the data, and for this, we&amp;rsquo;ve been leveraging a tool called Grad-CAM. The initial paper provided a rich landscape for exploration and learning, leading us to segment our journey into five interactive Jupyter notebooks - Introduction, CutOut, ResNet, WideResNet, Regularization, and Grad-CAM.&lt;/p>
&lt;p>I&amp;rsquo;m excited to share that, as we&amp;rsquo;ve hit the mid-term milestone, I&amp;rsquo;ve managed to make significant strides and completed the notebooks up to the WideResNet section. It&amp;rsquo;s been a journey full of learning and growth, overcoming various challenges along the way - understanding the intricacies of the experiments, deconstructing complex architectures, and distilling all this into digestible, interactive notebooks for students. Despite the challenges, the process has been incredibly rewarding. As we gear up for the next half of the project, I&amp;rsquo;m eager to tackle the remaining sections and share my work with the community.&lt;/p>
&lt;p>&lt;strong>Learnings and Skills Gained&lt;/strong>&lt;/p>
&lt;p>&lt;em>&lt;strong>Embracing the Iterative Process of Open Source Development&lt;/strong>&lt;/em>: My initial foray into open source development had me writing and running code in one environment, then copying parts of it to another environment and pushing it from there to GitHub. This occasionally led to mistakes during the code migration. However, I&amp;rsquo;ve since learned to write or change a little bit of code, run the new version directly from GitHub, catch errors, and improve. In open source development, the end goal is to ensure everything works flawlessly, even if it involves several iterations. This is especially true considering the code from GitHub might directly run on platforms like Chameleon or Google Colab.&lt;/p>
&lt;p>&lt;em>&lt;strong>Understanding the Distinction between Reproducing Experiments and Crafting Educational Content&lt;/strong>&lt;/em>: There&amp;rsquo;s a stark difference between merely reproducing an experiment from a research paper and creating an educational resource around that experiment. The former generally involves cloning and running the code, verifying it against the claims in the paper with minimal modifications. The latter, however, necessitates adapting and simplifying the code, regardless of the learner&amp;rsquo;s skill level, to ensure their comprehension. It&amp;rsquo;s about carefully guiding learners through each step for a more profound understanding.&lt;/p>
&lt;p>&lt;em>&lt;strong>The Power of &amp;lsquo;Show, Don’t Tell&amp;rsquo;&lt;/strong>&lt;/em>: This priceless lesson was imparted by my mentor, Ms. Fraida Fund. Rather than telling me what to do when I erred or needed to learn something new, she demonstrated the correct way first-hand. This hands-on approach made understanding far easier. This principle is also reflected in the creation of our notebooks. For instance, we chose to include the Grad-CAM notebook. Although not directly referenced in the paper, it offers students a clear visual understanding of the impact of the Cutout technique, embodying the &amp;ldquo;show, don’t tell&amp;rdquo; philosophy.&lt;/p>
&lt;p>&lt;strong>Next Steps&lt;/strong>&lt;/p>
&lt;p>As we step into the second half of this thrilling journey, our primary goal is to complete the remaining sections of our Cutout project. We&amp;rsquo;re setting our sights on the final notebook - Grad-CAM. The Grad-CAM notebook will offer a visual exploration of how our models interpret and interact with data, thereby solidifying the students&amp;rsquo; understanding of Cutout data augmentation. So, stay tuned for more as we plunge into these fascinating topics!&lt;/p>
&lt;p>&lt;strong>Conclusion&lt;/strong>&lt;/p>
&lt;p>Looking back, my time with the Summer of Reproducibility initiative has been nothing short of a profound learning experience. Working on the &amp;ldquo;Using Reproducibility in Machine Learning Education&amp;rdquo; project has been both challenging and rewarding, and I am incredibly grateful for this opportunity.&lt;/p>
&lt;p>I&amp;rsquo;ve gained valuable insights into open-source development, delved deeper into the intricacies of machine learning techniques, and experienced firsthand the transformative power of a &amp;lsquo;show, don&amp;rsquo;t tell&amp;rsquo; teaching approach. Moreover, I&amp;rsquo;ve learned that the creation of educational resources requires a delicate balance between preserving the essence of original research and adapting it to foster easy understanding.&lt;/p>
&lt;p>As we press forward, I&amp;rsquo;m excited about the prospects of the coming weeks. The completion of the Grad-CAM notebook lies ahead, marking the final pieces of our Cutout project. Beyond this project, the skills and lessons I&amp;rsquo;ve acquired during this initiative will undoubtedly guide me in future endeavours.&lt;/p>
&lt;p>I can confidently say that my GSOC journey has been a remarkable chapter in my growth as a developer and researcher. Here&amp;rsquo;s to more learning, more coding, and more breakthroughs in the future!&lt;/p></description></item><item><title>Reproducible Analysis &amp; Models for Predicting Genomics Workflow Execution Time</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230712-shayantan/</link><pubDate>Wed, 12 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230712-shayantan/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uga/genomicswfmodels/">Reproducible Analysis &amp;amp; Models for Predicting Genomics Workflow Execution Time&lt;/a> my &lt;a href="https://drive.google.com/file/d/1N81dqvdTDcKjz5WDAUCdf5yi1BNR9Au6/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/in-kee-kim/">In Kee Kim&lt;/a>, Martin Putra and collaborator &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/charis-christopher-hulu/">Charis Christopher Hulu&lt;/a> (another OSRE fellow) aims to analyze large-scale sequencing datasets in order to gain insights on how ‘input quality’ affects genomic workflows’ execution times.&lt;br>
Recent advancements in Next-Generation Sequencing (NGS) technologies have resulted in massive amounts of nucleotide sequence data and automated genomic workflows to streamline analysis and data interpretation. The success of NGS-driven research has also led to a sudden increase in data of varying size and complexity, making it more time-consuming for researchers to test hypotheses. Analyzing
high-throughput genomic data requires a step-by-step execution of dedicated tools - also known as workflows. The first step toward the execution of a typical genomic analysis workflow is quality control
of the raw data - a crucial step in removing low-quality data instances that may significantly impact the downstream analysis. Prior work in this area has suggested that the runtimes of genomic workflows get affected due to qualitative differences in the data. Additionally, there is very little consensus on what constitutes “input quality” regarding data from large genomic experiments. In this proposal, we hypothesize that genomic data quality significantly impacts the genomic workflows’ execution time. We aim to leverage machine learning techniques to extract predictive features from quality control tools that robustly predict workflow execution time.&lt;/p></description></item><item><title>Public Artifact Data and Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20230617-zjyhhhhh/</link><pubDate>Sat, 17 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/intel/artifactviz/20230617-zjyhhhhh/</guid><description>&lt;p>Hello! 
As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/intel/artifactviz">Public Artifact Data and Visualization&lt;/a> our proposals (&lt;a href="https://drive.google.com/file/d/1egIQDLMQ5eV7Uc-S55-GTiSXdmrC3_Pj/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jiayuan-zhu/">Jiayuan Zhu&lt;/a> and &lt;a href="https://drive.google.com/file/d/1Gf68Pz8v3YjcQ1sWkS9n2hnl7_lsme2l/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/krishna-madhwani/">Krishna Madhwani&lt;/a>) under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjo-vahldiek-oberwagner/">Anjo Vahldiek-Oberwagner&lt;/a> aims to design a system that allows researchers to conveniently record and compare the environmental information, such as CPU utilization, of different iterations and versions of code during an experiment.&lt;/p>
&lt;p>In academic experiments, there is often a need to compare results and performance between different iterations and versions. This comparative analysis helps researchers evaluate the impact of different experimental parameters and algorithms on the results and enables them to optimize experimental design and algorithm selection. However, to conduct effective comparative analysis, it is essential to record and compare environmental information, alongside the experimental data. This information provides valuable insights into the factors that may influence the observed outcomes.&lt;/p>
&lt;p>Through this summer, we aim to develop a system that offers a streamlined interface, enabling users to effortlessly monitor their running programs using simple command-line commands. Moreover, our system will feature a user-friendly dashboard where researchers can access historical runtime information and visualize comparisons between different iterations. The dashboard will present comprehensive graphs and charts, facilitating the analysis of trends and patterns in the environmental data.&lt;/p></description></item><item><title>Reproducible Analysis &amp; Models for Predicting Genomics Workflow Execution Time</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230616-charishulu/</link><pubDate>Fri, 16 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uga/genomicswfmodels/20230616-charishulu/</guid><description>&lt;p>Hi! I&amp;rsquo;m Charis, an undergraduate student in the IT and Big Data Analytics program at the Calvin Institute of Technology. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uga/genomicswfmodels/">Reproducible Analysis &amp;amp; Models for Predicting Genomics Workflow Execution Time&lt;/a> my &lt;a href="https://drive.google.com/file/d/1dFkC2A0HUVaWd6NpCbTjRZVfYxQ7jRxJ/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/in-kee-kim/">In Kee Kim&lt;/a> and &lt;strong>Martin Putra&lt;/strong> aims to gain insight into features that are highly correlated with execution times of genomics workflows and build machine learning models for predicting workflow execution time.&lt;/p>
&lt;p>Genomics workflows exhibit a long-tail pattern in their execution times. According to the previous project team&amp;rsquo;s findings, approximately 2% of genomics workflows had a median execution time of up to 15%, resulting in weeks of execution. Interestingly, it was observed that input quality plays a role in these execution time differences. Therefore, we will analyze features such as the quality of input data as well as the amount of resources allocated in the execution of genomics workflows to find features that correlate with execution time. Based on these features we will build a machine learning model that can predict the execution time of genomics workflows.&lt;/p>
&lt;p>By collaborating with Shayantan Banerjee (another contributor) who will study data quality, I will study the system metrics of genomics workflows both at workflow-level and tool-level. Metrics will be collected by running genomics workflows using the Slurm workload manager under various resource allocation conditions. Genomics workflows will be executed on Chameleon clusters of different sizes.&lt;/p></description></item><item><title>GPU Emulator for Easy Reproducibility of DNN Training</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230613-haoranwu/</link><pubDate>Tue, 13 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/utexas/gpuemulator/20230613-haoranwu/</guid><description>&lt;p>Hi! I’m Haoran Wu, a third year at the University of Chicago majoring in Economics and Computer Science. With my &lt;a href="https://docs.google.com/document/d/1CcNbvbNAmY0XkV9ckjHnILdMh92h1wqLUYqpT6qIsZY/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, I’m working on the &lt;a href="https://ospo.ucsc.edu/project/osre23/utexas/gpuemulator" target="_blank" rel="noopener">GPU Emulator for Easy Reproducibility of DNN Training&lt;/a> project with Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vijay-chidambaram/">Vijay Chidambaram&lt;/a>. A Deep Neural Network (DNN) is an advanced artificial neural network that employs multiple layers to process intricate patterns and relationships within data. It finds applications in various fields such as image and speech recognition, natural language processing, and predictive modeling. The layers in a DNN progressively extract higher-level features from raw input data, enabling the network to learn and generalize patterns effectively.&lt;/p>
&lt;p>The growing popularity of Deep Neural Networks has resulted in a substantial increase in demand for Graphics Processing Units (GPUs). GPUs are crucial for conducting matrix computations in DNN training and inference. However, they are expensive to purchase for personal use, and the limited availability of GPU resources in public research clouds like Chameleon further exacerbates the issue. This scarcity of resources can cause delays in DNN-related research projects.&lt;/p>
&lt;p>Nevertheless, not all DNN research experiments require the use of a GPU. System researchers, for instance, may be primarily interested in performance profiles and not necessarily in the accuracy of training or inference. These researchers might focus on optimizing the storage layer and data loading of DNN training. In such cases, a GPU emulator that accurately replicates GPU behavior without needing a physical GPU can fulfill their requirements. By utilizing a GPU emulator, system researchers can evaluate their system optimizations&amp;rsquo; performance without competing for limited GPU resources in the cloud, thereby avoiding unnecessary delays in their research progress. Our work will eventually be open source and benefit the community.&lt;/p></description></item><item><title>Using Reproducibility in Machine Learning Education</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230605-kokoedwin/</link><pubDate>Mon, 05 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230605-kokoedwin/</guid><description>&lt;p>I am Jonathan Edwin from Indonesia, and I am thrilled to be involved in the 2023 Summer of Reproducibility initiative, where I am contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education&lt;/a> project.&lt;/p>
&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education&lt;/a> my &lt;a href="https://drive.google.com/file/d/1UEIKfZuPwJ88fMQ1-109vzpA7r4-7ehG/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> aims to develop educational resources focusing on reproducing and replicating fundamental machine-learning techniques, such as Cutout data augmentation, U-Net, and Siamese networks. The project aims to provide students with a hands-on learning experience that enhances their understanding of the models and their underlying principles while imparting valuable skills in ensuring research reproducibility.
The project will involve the creation of a series of interactive Jupyter notebooks covering the selected papers, guiding students through reproducing results, and focusing on best practices for ensuring reproducibility. Upon completion, the notebooks will provide a comprehensive and accessible learning experience for students while emphasizing the importance of reproducibility in machine learning education.
The proposal also identifies potential challenges associated with the project and proposes solutions to address them. Challenges include incompatibilities between the original code and current frameworks or environments, difficulty in reproducing the exact results due to factors such as randomness or lack of specific details in the paper, and ensuring that the interactive elements in the Jupyter Notebooks are engaging and effective in teaching reproducibility concepts.&lt;/p></description></item><item><title>FlashNet: Towards Reproducible Continual Learning for Storage System</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230604-rannnayy/</link><pubDate>Sun, 04 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230604-rannnayy/</guid><description>&lt;p>Hello! I&amp;rsquo;m Rani, a third-year undergraduate student at Institut Teknologi Bandung majoring in Informatics. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet">FlashNet&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1EhJm3kqrpybOkpXiiRMfqVxGeKe9iIsh/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a> and &lt;strong>Daniar Kurniawan&lt;/strong> aims to implement and optimize the FlashNet model in real-world storage systems using continual learning techniques.&lt;/p>
&lt;p>In real-world workloads, the I/O stream changes over time. As a result, read and write performance varies and introduces tail latency. We would like to predict the latency of I/O reads in order to cut the tail and improve the system&amp;rsquo;s performance. This project focuses on improving the FlashNet pipeline and making its machine learning models adaptable.&lt;/p>
&lt;p>During the summer, we plan to implement the continual learning pipeline using the machine learning models built earlier in the project. Continual learning depends on the ability to trigger retraining on its own, so we will implement, evaluate, and test several drift detection algorithms. We will also build a visualization platform to evaluate and monitor the performance of the models. Lastly, we plan to create Chameleon Trovi artifacts to demonstrate our experiments and make these implementations available and reproducible to the public.&lt;/p></description></item><item><title>Introducing Levels of Reproduction and Replication in Machine Learning</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230601-msaeed/</link><pubDate>Thu, 01 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230601-msaeed/</guid><description>&lt;p>Greetings everyone,&lt;/p>
&lt;p>I am Mohamed Saeed and I am delighted to be part of the 2023 Summer of Reproducibility program, where I am contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education&lt;/a> project.&lt;/p>
&lt;p>My &lt;a href="https://drive.google.com/file/d/13HnCMZawpabiLdBoOiaJFF2mNXIPLCVJ/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> was accepted, and I am fortunate to have &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> as my mentor. The objective of my project is to develop highly interactive open educational resources that can be utilized by instructors teaching graduate or undergraduate machine learning courses. These resources will focus on integrating instruction on reproducibility and reproducible research principles.&lt;/p>
&lt;p>Understanding and practicing reproducibility in machine learning (ML) research is of utmost importance in today&amp;rsquo;s scientific and technological landscape. Reproducibility ensures the reliability, transparency, and credibility of ML findings and discoveries. By learning the principles of reproducibility, students at different levels can validate research results, test newly introduced methodologies, and understand the level of reproducibility of a piece of research.&lt;/p>
&lt;p>My contribution will involve developing interactive educational resources that encompass code examples, writing exercises, and comprehensive explanations of key concepts of reproducing ML research. These resources will be carefully crafted to assist students at various levels of expertise. Our aim is for these resources to be widely adopted by instructors teaching graduate or undergraduate machine learning courses, as they seek to enhance the understanding of reproducibility and reproducible research principles.&lt;/p>
&lt;p>I think this is a great opportunity to learn more about ML research reproducibility. I&amp;rsquo;ll be posting regular updates and informative blogs throughout the summer, so stay tuned!&lt;/p></description></item><item><title>ScaleBugs: Reproducible Scalability Bugs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230601-boluwarinayinmode/</link><pubDate>Thu, 01 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230601-boluwarinayinmode/</guid><description>&lt;p>Hello! As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucdavis/scalebugs/">ScaleBugs&lt;/a> project, our proposals (&lt;a href="https://drive.google.com/file/d/17iANa5ei_gguZsGGwR1sfPHOoJysnNsf/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/goodness-ayinmode/">Goodness Ayinmode&lt;/a> and &lt;a href="https://drive.google.com/file/d/199ZsiWHXsLYbSJ896vaf8tjrYs23P5xN/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zahra-nabila-maharani/">Zahra Nabila Maharani&lt;/a>) under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/cindy-rubio-gonzalez/">Cindy Rubio González&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>, and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/hao-nan-zhu/">Hao-Nan Zhu&lt;/a> aim to build a dataset of reproducible scalability bugs by analyzing bug reports from popular distributed systems like Cassandra, HDFS, Ignite, and Kafka. 
For each bug report, we will analyze whether the reported bug is influenced by the scale of the operation, such as the number of nodes being used or the number of requests. The resulting dataset will consist of bug artifacts containing the buggy and fixed versions of the affected system, a reproducible runtime environment, and workload shell scripts designed to demonstrate bug symptoms under different scales. These resources will help support research and development efforts in addressing scalability issues and optimizing system performance.&lt;/p></description></item><item><title>Reproducible Evaluation of Multi-level Erasure Coding</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230531-zhiyanw/</link><pubDate>Wed, 31 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ornl/multilevelerasure/20230531-zhiyanw/</guid><description>&lt;p>Hi! My name is Alex, an undergraduate student at the University of Chicago. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/MultiLevelErasure">Reproducible Evaluation of Multi-level Erasure Coding&lt;/a>, my &lt;a href="https://docs.google.com/document/d/1dO1aING1QcSB---XklzUjNz0usVh7qWffVGC3GZq2AE/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a> aims to build a platform to reproducibly evaluate the performance and durability of MLEC (Multi-Level Erasure Coding) for large-scale storage systems under different design configurations.&lt;/p>
&lt;p>To provide some context, Erasure Coding (EC) is a common approach to protect data from disk failures. Data centers nowadays increasingly use Multi-Level Erasure Coding (MLEC), a newly developed erasure coding method that aims to deal with the drawbacks of Single-Level Erasure Coding (SLEC). Despite its increasing popularity, there have not been many systematic studies to analyze and evaluate MLEC, which is the focus of this project.&lt;/p>
&lt;p>The evaluation will primarily be conducted through simulations, since modifying configurations in a real large-scale system is costly and impractical. The expected deliverables of this project will be:&lt;/p>
&lt;ul>
&lt;li>An MLEC simulator that can reproducibly simulate different configurations of the MLEC system, e.g. coding parameter selection, chunk placement scheme, repair method choice, etc.&lt;/li>
&lt;li>An analysis of the performance and durability tradeoffs between different MLEC design choices based on the evaluation results from the simulation&lt;/li>
&lt;li>Reproduced SLEC evaluation results using existing SLEC simulators&lt;/li>
&lt;li>A comparison between MLEC and SLEC on performance and durability tradeoffs&lt;/li>
&lt;li>Well-written documents and detailed guides on how to reproduce the evaluation results&lt;/li>
&lt;/ul>
&lt;p>Our plan is to build the simulator throughout the summer. We hope our simulator and evaluation results can provide designers of large-scale storage systems with valuable insights on choosing the most appropriate erasure coding configuration per their needs.&lt;/p></description></item><item><title>[FLASHNET]: Leveraging ML-augmented I/O in Linux</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230530-justin08784/</link><pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/flashnet/20230530-justin08784/</guid><description>&lt;p>Hi! I&amp;rsquo;m Justin, an undergraduate at the University of Chicago. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet">Flashnet&lt;/a> project my &lt;a href="https://drive.google.com/file/d/1gsNaYUYOgdN2ilpyPOmI7jjLeoZh219J/view" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of
&lt;strong>Daniar Kurniawan&lt;/strong> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a> aims to port the Flashnet model into the Linux kernel.&lt;/p>
&lt;p>In this attempt, I will borrow architecture/design choices from LAKE (to take advantage of its integration of ML-focused hardware acceleration in the kernel) and evaluation criteria from LinnOS to test for model inference accuracy. I also plan to support latency &amp;ldquo;bucket&amp;rdquo; inference output to improve accuracy. Ultimately, my goal is to gain further insight into best practices for integrating ML models into real-life operating systems like Linux and to inform general design choices for the Flashnet pipeline.&lt;/p></description></item><item><title>Reproduce and benchmark self-adaptive edge applications under dynamic resource management</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230530-zharfanf/</link><pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/uchicago/edgebench/20230530-zharfanf/</guid><description>&lt;p>Hello there!&lt;/p>
&lt;p>I am Faishal Zharfan, a senior studying Telecommunication Engineering at Bandung Institute of Technology (ITB) in Bandung, Indonesia. I&amp;rsquo;m currently part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/edgebench/">Edgebench&lt;/a> project under the mentorship of Yuyang Huang (see my &lt;a href="https://drive.google.com/file/d/1u3UsCQZ40erpPmyoyn8DEVqH5Txmvvkz/view?usp=drive_link" target="_blank" rel="noopener">proposal&lt;/a>). The main goal of this project is to reproduce and benchmark self-adaptive video applications using the proposed solution.&lt;/p>
&lt;p>The topic I&amp;rsquo;m working on, &amp;ldquo;Reproduce and benchmark self-adaptive edge applications under dynamic resource management&amp;rdquo;, also known as Edgebench, is led by Prof. Junchen Jiang and Yuyang Huang. Edgebench focuses on how to distribute resources (bandwidth and CPU usage) efficiently across several video applications. Today&amp;rsquo;s video applications process their video on a server, an approach known as edge computing. Over a WAN, bandwidth and compute are strictly limited, which makes them the greatest concern in edge computing. We could distribute the bandwidth evenly across the cameras, but each camera&amp;rsquo;s bandwidth and compute needs are different, so we need another solution. The recently proposed solution is called &amp;ldquo;accuracy gradient&amp;rdquo;: it tells us how much bandwidth an application needs at a given time to achieve higher accuracy. The idea is to allocate more bandwidth to the applications with the largest F1-score improvement and to reduce bandwidth for those whose F1-score would not drop significantly. As a result, we end up with a higher total F1-score.&lt;/p>
&lt;p>Throughout this summer, we plan to implement the &amp;ldquo;accuracy gradient&amp;rdquo; and test several baselines to compare against it. We are currently implementing the latency measurement: the solution introduces some overhead, so latency must be taken into account.&lt;/p></description></item><item><title>Automatic Cluster Performance Shifts Detection Toolkit</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/anl/perfdrift/20230527-kangrui/</link><pubDate>Sat, 27 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/anl/perfdrift/20230527-kangrui/</guid><description>&lt;p>Hi! I am Kangrui, a Pre-doc student at the University of Chicago. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/anl/perfdrift">Automatic Cluster Performance Shifts Detection Toolkit&lt;/a> my &lt;a href="https://drive.google.com/file/d/1AxpgWLzF3oKTFlD8q6JYS35CxxJ6c76X/view?usp=share_link" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;strong>Sandeep Madireddy&lt;/strong> and &lt;strong>Ray Andrew&lt;/strong> aims to design a real-time performance shift detection algorithm for high-performance computing clusters, ensuring minimal overheads.&lt;/p>
&lt;p>This project focuses on developing a real-time performance shift detection algorithm tailored to heterogeneous workloads, aiming to promptly inform administrators about performance changes. The primary goal is to design an algorithm that efficiently detects shifts in real-time, with minimal system overheads.&lt;/p>
&lt;p>In addition to algorithm development, we plan to enhance the Darshan toolkit&amp;rsquo;s functionality by integrating our algorithm, offering users early performance shift detection. This integration will aid administrators in making informed system utilization and scheduling decisions.&lt;/p>
&lt;p>To promote transparency and reproducibility, we&amp;rsquo;ll encapsulate our findings, scripts, and profiling data in a Jupyter notebook, shared in particular through Chameleon Trovi, so that other researchers can reproduce our experiments easily.&lt;/p>
&lt;p>Looking ahead, we plan to expand the algorithm&amp;rsquo;s applicability to cater to diverse HPC workloads and infrastructures. Other areas of interest include its use in detecting shifts in financial markets or monitoring IoT data streams. Further refinement of our algorithm, to reduce overheads and improve real-time detection capabilities, is also a part of our future endeavours. This task may involve evaluating various shift detection methods and noise filtering techniques.&lt;/p></description></item><item><title>Using Reproducibility in Machine Learning Education: Reproducibility with Incomplete Methodology Descriptions</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230527-indianspeedster/</link><pubDate>Sat, 27 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/eduml/20230527-indianspeedster/</guid><description>&lt;p>Hey,&lt;/p>
&lt;p>I am Shekhar, one of several students working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml">Using Reproducibility in Machine Learning Education&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a>. My &lt;a href="https://drive.google.com/file/d/1rCzLGIJ8HYCVjY_MfndgrQjAQa2SQbqZ/view?usp=sharing" target="_blank" rel="noopener">Proposal&lt;/a> aims to develop interactive educational materials about reproducibility in machine learning, for use in graduate and undergraduate classes. My project is inspired by my experience in the &lt;a href="https://paperswithcode.com/rc2022" target="_blank" rel="noopener">Machine Learning Reproducibility Challenge&lt;/a>, where I found that a major challenge for reproducibility was that some details were left ambiguous in the paper I was trying to reproduce. For my project, I will develop an interactive tutorial that demonstrates how, when methodology details are not fully specified in a publication, someone trying to reproduce the result has to make choices that may not match the authors’, and how these choices affect whether or not the final result is validated.&lt;/p></description></item><item><title>Update OpenROAD Documentation and Tutorials</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230526-luarss/</link><pubDate>Fri, 26 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230526-luarss/</guid><description>&lt;p>Hi! I am Jack, a Master&amp;rsquo;s student at the National University of Singapore. 
In GSoC 2023, I will be undertaking the project entitled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/openroad">Update OpenROAD Documentation and Tutorials&lt;/a> to improve the user experience and documentation of this exciting open-source RTL-to-GDSII framework, jointly mentored by &lt;strong>Indira Iyer Almeida&lt;/strong> and &lt;strong>Vitor Bandeira&lt;/strong>. Check out my proposal &lt;a href="https://drive.google.com/file/d/1_R4zDe2N05LtAsvDKa3w6C98GvIZ8HAI/view?usp=sharing" target="_blank" rel="noopener">here!&lt;/a>&lt;/p>
&lt;p>This project aims to review and update missing documentation and tutorials in OpenROAD-flow-scripts. A key focus will be on increasing ease-of-setup by updating documentation, setup scripts and docker-based commands. Next, we will also update documentation for the following OpenROAD components: Makefile flow variable, distributed detailed routing, Hier-RTLMP, Autotuner. If time permits, cloud enablement will be implemented, alongside notebook-based packaging to further increase ease of adoption.&lt;/p></description></item><item><title>Advancing Reproducible Science through Open Source Laboratory Protocols as Software</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230621-luhesketh/</link><pubDate>Thu, 25 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230621-luhesketh/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>My name is Luiza, and I am an eighth-semester BSc Biological Sciences student from São Paulo, Brazil. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop">LabOP&lt;/a> working group, my &lt;a href="https://docs.google.com/document/d/1pJ7UIATZYASXjbLdUosvq08QkhPNTFxZFId9dapNp-o/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dan-bryce/">Dan Bryce&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tim-fallon/">Tim Fallon&lt;/a> aims to build a converter that takes ordinary laboratory protocols and translates them into machine-executable protocols. This is possible thanks to LabOP&amp;rsquo;s versatility in representing what a laboratory protocol should look like. I&amp;rsquo;ll be testing this specialization on Hamilton machines, which are well suited to scaling up experiments.&lt;/p>
&lt;p>Biotechnology laboratories commonly face the problem that protocols are difficult to share and to adapt for machine execution. Laboratory protocols are critical to biological research and development, yet complicated to communicate and reproduce across projects, investigators, and organizations. While many attempts have been made to address this challenge, there is currently no available protocol representation that is unambiguous enough for precise interpretation and automation, yet simultaneously abstract enough to enable reuse and adaptation.&lt;/p>
&lt;p>With LabOP, a protocol can be converted in multiple ways, depending on whether the researcher needs automation or human execution, which allows flexibility in how experiments are run. I&amp;rsquo;ll be building a specialization that translates protocols so that they can be executed by Hamilton machines.&lt;/p></description></item><item><title>Measuring Open-source Database Systems under TPC-C Benchmark with Unreported Settings</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/osu/missingsettings/20230526-ren.450/</link><pubDate>Thu, 25 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/osu/missingsettings/20230526-ren.450/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/osu/missingsettings">Measuring Research Prototypes under Unreported Settings&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1ouFre-qMDCL_LiH5jFNUCOI1yAYHdWcS/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/miao-yu/">Miao YU&lt;/a> aims to understand the impact of missing settings in artifact evaluation.&lt;/p>
&lt;p>The project plans to measure the impact of different missing settings for open-source database systems, such as MySQL and PostgreSQL, particularly under the TPC-C benchmark. This requires running experiments on popular settings that are not reported and fixing any problems that arise during the experiments for the target systems. The project will compare the performance characteristics and analyze the impact of missing settings on the performance of the target systems.&lt;/p></description></item><item><title>Verify the reproducibility of an experiment</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230524-jesselima/</link><pubDate>Wed, 24 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230524-jesselima/</guid><description>&lt;p>Hello everyone,
my name is Jesse and I&amp;rsquo;m proud to be a fellow in the 2023 Summer of Reproducibility program, contributing to the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/noworkflow">noWorkflow&lt;/a> project.&lt;/p>
&lt;p>My &lt;a href="https://docs.google.com/document/d/1YMtPjZXcgt5eplyxIgQE8IBpQIiRlB9eqVSQiIPhXNU/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> was accepted under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joao-felipe-pimentel/">João Felipe Pimentel&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/juliana-freire/">Juliana Freire&lt;/a>. It aims to
map and test provenance capture in typical Data Science and Machine Learning experiments.&lt;/p>
&lt;h4 id="what">What&amp;hellip;&lt;/h4>
&lt;p>Although much can be said about what reproducibility means, the ability to replicate results in day-to-day Data Science and Machine Learning experiments can pose a significant challenge for individuals, companies and research centers. This challenge becomes even more pronounced with the emergence of analytics and AI, where scientific methodologies are extensively applied on an industrial scale. Reproducibility then assumes a key role in the productivity and accountability expected from Data Scientists, Machine Learning Engineers, and other roles engaged in ML/AI projects.&lt;/p>
&lt;h4 id="how">How&amp;hellip;&lt;/h4>
&lt;p>In the day-to-day, the pitfalls of non-reproducibility appear at different points of the experiment lifecycle. These challenges arise when multiple experiments need to be managed for an individual or a team of scientists. In a typical experiment workflow, reproducibility appears in different steps of the process:&lt;/p>
&lt;ul>
&lt;li>The need to track the provenance of datasets.&lt;/li>
&lt;li>The need to manage changes in hypothesis tests.&lt;/li>
&lt;li>Addressing the management of system hardware and OS setups.&lt;/li>
&lt;li>Dealing with outputs from multiple experiments, including the results of various model trials.&lt;/li>
&lt;/ul>
&lt;p>In academic environments, these issues can result in mistakes and inaccuracies. In companies, they can lead to inefficiencies and technical debts that are difficult to address in the future.&lt;/p>
&lt;h4 id="finally">Finally&amp;hellip;&lt;/h4>
&lt;p>I believe this is a great opportunity to explore the emergence of two hot topics: AI and reproducibility! I will share more updates here throughout the summer and hope we can learn a lot together!&lt;/p></description></item><item><title>Teaching Computer Networks with Reproducible Research: Developing a 'classroom competition' for adaptive video delivery</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230524-srishti-j18/</link><pubDate>Tue, 23 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/edunet/20230524-srishti-j18/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/edunet">Teaching Computer Networks with Reproducible Research project&lt;/a>, my &lt;a href="https://drive.google.com/file/d/1EI0Zhh6YFwufEZ-53VWwhTOyJUuw7-Rf/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> aims to develop a classroom competition for adaptive video delivery policies, leveraging an existing open-source reproducible result. The competition will challenge students to extend the original work and design their own adaptive policies for head-to-head competition against their classmates. The project will involve packaging the existing result for easy reproducibility and building on it by implementing other adaptive video policies from the literature, developing different network settings for evaluating student submissions, and creating an evaluation framework for scoring submissions based on various criteria (so that the competition remains fair and unbiased).
The deliverables include a functional submission and evaluation process, an evaluation framework, and documentation and materials for course instructors to use in the classroom.&lt;/p></description></item><item><title>Reproducible Evaluation of Multipath Network Protocols</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/farmingdale/multipath/</link><pubDate>Thu, 16 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/farmingdale/multipath/</guid><description>&lt;p>Lead Mentor: &lt;a href="mailto:aydini@farmingdale.edu">Ilknur Aydin&lt;/a>&lt;/p>
&lt;p>As mobile devices with dual WiFi and cellular interfaces become widespread, network protocols have been developed that utilize the availability of multiple paths. However, the relative effectiveness of these protocols is highly dependent on the characteristics of the network (including the relationship between the two paths, which are often not independent). Researchers typically evaluate a multipath protocol for a small set of network scenarios, which vary from one publication to the next. It is therefore difficult to get a good picture of how different protocols perform in a range of settings.&lt;/p>
&lt;h3 id="framework-for-repeatable-direct-comparison-of-multipath-transport-protocols">Framework for repeatable, direct comparison of multipath transport protocols&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Computer networks, wireless systems&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux, networking, data analysis and visualization, writing&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Large&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s)&lt;/strong>: &lt;a href="mailto:aydini@farmingdale.edu">Ilknur Aydin&lt;/a> and &lt;a href="mailto:ffund@nyu.edu">Fraida Fund&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>In single-path congestion control, the &lt;a href="https://pantheon.stanford.edu/" target="_blank" rel="noopener">Pantheon&lt;/a> work created a reference set of executable benchmarks that researchers could use to evaluate novel congestion control designs against existing work in a wide range of scenarios. This project seeks to achieve something similar for multipath protocols, using publicly available networking testbeds like &lt;a href="https://fabric-testbed.net/" target="_blank" rel="noopener">FABRIC&lt;/a>. For this project, the participant will:&lt;/p>
&lt;ul>
&lt;li>Prepare a set of network benchmarks for multipath protocols, using live network links, real link traces, and emulated scenarios&lt;/li>
&lt;li>Develop an experiment using the benchmarks to evaluate existing multipath protocol implementations&lt;/li>
&lt;li>Prepare materials that researchers can use to evaluate novel multipath protocols against the others in the benchmark&lt;/li>
&lt;/ul></description></item><item><title>Is Reproducibility Enough? Understanding the Impact of Missing Settings in Artifact Evaluation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/osu/missingsettings/</link><pubDate>Wed, 08 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/osu/missingsettings/</guid><description>&lt;p>While Artifact Evaluation tries to ensure that the evaluation results in a paper are reproducible, it leaves one question: How about experiment settings NOT reported by the paper? Such “missing settings” may create multiple problems: 1) sometimes the artifacts simply do not work under these missing settings, creating problems when a later work needs to compare to an earlier work under these settings; 2) sometimes the artifacts do not perform well under these missing settings, which may create a bias during the evaluation; 3) to improve the artifact to work under these missing settings, sometimes one needs to re-design the system, which may change the results of the original experiments.&lt;/p>
&lt;p>In this project, we plan to understand the impact of this problem: On the necessity side, how would these missing settings affect the conclusions of the original work? On the feasibility side, how much effort does it require to carry out extensive experiments? We plan to answer these questions by reproducing prior works, running them on popular settings that are not reported by these works, and fixing problems if any.&lt;/p>
&lt;h3 id="measuring-research-prototypes-under-unreported-settings">Measuring Research Prototypes under Unreported Settings&lt;/h3>
&lt;p>&lt;strong>Topics:&lt;/strong> reproducibility, databases, key-value stores, DNN training&lt;br>
&lt;strong>Skills:&lt;/strong> Java/Python, Linux, TPC/YCSB&lt;br>
&lt;strong>Difficulty:&lt;/strong> Medium&lt;br>
&lt;strong>Size:&lt;/strong> 350 hours&lt;br>
&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yang-wang/">Yang Wang&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/miao-yu/">Miao YU&lt;/a>&lt;br>
&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/xueyuan-ren/">Xueyuan Ren&lt;/a>&lt;/p>
&lt;p>The student will first pick one or a few systems she is interested in. She will then try to reproduce their reported results. If successful, she will further try to measure these systems under previously unreported settings. During the procedure, she will need to diagnose and fix any problems that may show up. Finally, she will analyze whether the original conclusions still hold under these new settings and whether fixing any problems will change the performance characteristics of the target systems.&lt;/p></description></item><item><title>noWorkflow</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/noworkflow/</link><pubDate>Tue, 07 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/noworkflow/</guid><description>&lt;p>The &lt;a href="https://github.com/gems-uff/noworkflow" target="_blank" rel="noopener">noWorkflow&lt;/a> project aims at allowing scientists to benefit from provenance data analysis even when they don&amp;rsquo;t use a workflow system. It also aims to spare them from using naming conventions to store files that originated in previous executions. Currently, when such conventions are not used, the result and intermediate files are overwritten by every new execution of the pipeline.&lt;/p>
&lt;p>noWorkflow was developed in Python, and it is currently able to capture provenance of Python scripts using Software Engineering techniques such as abstract syntax tree (AST) analysis, reflection, and profiling, to collect provenance without the need of a version control system or any other environment.&lt;/p>
&lt;p>At the time of writing, the main version of noWorkflow is in the 2.0-alpha branch. We intend to release it before the summer.&lt;/p>
&lt;h3 id="verify-the-reproducibility-of-an-experiment">Verify the reproducibility of an experiment&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Reproducibility&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, SQL or SQLAlchemy ORM&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joao-felipe-pimentel/">João Felipe Pimentel&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/juliana-freire/">Juliana Freire&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Implement an algorithm to compare the provenance from two (or more) trials (i.e., executions of an experiment) to check their reproducibility. The provenance stored in the relational (sqlite) database by noWorkflow 2 contains intermediate variable values from a trial. These values could be compared to check how much or where executions deviate from each other.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Compare trials of the same script (Medium)&lt;/li>
&lt;li>Estimate how much one trial deviates from another (Medium)&lt;/li>
&lt;li>Consider different scripts and execution flows (Large)&lt;/li>
&lt;li>Indicate which parts of the scripts are not reproducible (Large)&lt;/li>
&lt;/ul>
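&lt;p>As a rough illustration of the first two tasks, the sketch below compares two trials and reports which variables deviate. It assumes each trial&amp;rsquo;s intermediate values have already been exported into a plain dictionary mapping variable names to values; that export step, the function names, and the sample values are all hypothetical and do not reflect noWorkflow&amp;rsquo;s actual database schema or API.&lt;/p>

```python
import math


def values_match(a, b, rel_tol=1e-9):
    """Compare two recorded values, tolerating tiny float differences."""
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


def compare_trials(trial_a, trial_b):
    """Return (deviating variable names, fraction of variables that deviate)."""
    names = sorted(set(trial_a) | set(trial_b))
    missing = object()  # sentinel for a variable absent from one trial
    deviating = [
        name for name in names
        if not values_match(trial_a.get(name, missing), trial_b.get(name, missing))
    ]
    fraction = len(deviating) / len(names) if names else 0.0
    return deviating, fraction


# Hypothetical exported values from two executions of the same script.
trial_1 = {"seed": 42, "accuracy": 0.91, "n_rows": 1000}
trial_2 = {"seed": 42, "accuracy": 0.87, "n_rows": 1000, "dropped": 3}
deviating, fraction = compare_trials(trial_1, trial_2)
print(deviating, fraction)
```

&lt;p>A variable that exists in only one trial counts as a deviation here; a real comparison would likely also need to align values by position in the execution flow rather than by name alone.&lt;/p>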
&lt;h3 id="control-levels-of-provenance-collection">Control levels of provenance collection&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Log experiments&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joao-felipe-pimentel/">João Felipe Pimentel&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/juliana-freire/">Juliana Freire&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jesse-lima/">Jesse Lima&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Add support for different levels of provenance collection in noWorkflow 2. Currently, noWorkflow 2 collects Python construct evaluations and all the dependencies among the evaluations. However, this collection is inefficient, since some of the collected provenance may not be necessary for end-users. In this project, it is desirable to provide ways to temporarily disable the provenance collection and to manually indicate the provenance in this situation.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Disable the collection inside specific functions (through decorators?)&lt;/li>
&lt;li>Disable the collection inside specific regions of the code (through with statements?)&lt;/li>
&lt;li>Collect only function activations in a region, instead of all variable dependencies&lt;/li>
&lt;li>Disable the collection of specific modules&lt;/li>
&lt;li>Design a DSL to express general dependencies for parts of the code where the collection is disabled&lt;/li>
&lt;/ul>
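&lt;p>The decorator and with-statement ideas in the first two tasks could look roughly like the sketch below. The &lt;code>Collector&lt;/code> class and its &lt;code>paused&lt;/code> and &lt;code>skip&lt;/code> methods are hypothetical stand-ins invented for illustration; they are not part of noWorkflow.&lt;/p>

```python
import functools
from contextlib import contextmanager


class Collector:
    """Toy stand-in for a provenance collector (hypothetical, not noWorkflow's API)."""

    def __init__(self):
        self.enabled = True
        self.records = []

    def record(self, name, value):
        if self.enabled:
            self.records.append((name, value))

    @contextmanager
    def paused(self):
        """Disable collection inside a with block, restoring the old state after."""
        previous = self.enabled
        self.enabled = False
        try:
            yield
        finally:
            self.enabled = previous

    def skip(self, func):
        """Decorator: disable collection while func runs."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self.paused():
                return func(*args, **kwargs)
        return wrapper


collector = Collector()


@collector.skip
def noisy_helper(x):
    collector.record("inner", x)  # ignored: collection is paused here
    return x * 2


collector.record("a", 1)
with collector.paused():
    collector.record("b", 2)  # ignored: inside a paused region
result = noisy_helper(3)
collector.record("result", result)
print(collector.records)
```

&lt;p>Restoring the previous state in a &lt;code>finally&lt;/code> clause keeps nested paused regions and exceptions from leaving collection switched off by accident.&lt;/p>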
&lt;h3 id="upgrade-noworkflow-collection-to-support-new-python-constructs">Upgrade noWorkflow collection to support new Python constructs&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Log experiments&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joao-felipe-pimentel/">João Felipe Pimentel&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/juliana-freire/">Juliana Freire&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Implement new AST transformations for provenance collection. While noWorkflow 2 works for newer Python versions, most of its implementation was targeted at Python 3.7. Newer Python versions have new constructs in which the provenance is ignored.&lt;/p>
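&lt;p>To make the idea concrete, here is a minimal sketch of such a transformation for one construct added after Python 3.7, the assignment expression (&lt;code>:=&lt;/code>, Python 3.8). It uses only the standard &lt;code>ast&lt;/code> module; the &lt;code>capture&lt;/code> hook and the &lt;code>WalrusCapture&lt;/code> class are hypothetical names chosen for illustration, and noWorkflow&amp;rsquo;s own transformer is organized differently.&lt;/p>

```python
import ast

captured = []


def capture(name, value):
    """Hook run after the wrapped expression is evaluated; passes the value through."""
    captured.append((name, value))
    return value


class WalrusCapture(ast.NodeTransformer):
    """Rewrite (name := expr) into (name := capture("name", expr))."""

    def visit_NamedExpr(self, node):
        self.generic_visit(node)  # transform nested assignment expressions first
        node.value = ast.Call(
            func=ast.Name(id="capture", ctx=ast.Load()),
            args=[ast.Constant(value=node.target.id), node.value],
            keywords=[],
        )
        return node


source = "total = (n := 4) * 10"
tree = WalrusCapture().visit(ast.parse(source))
ast.fix_missing_locations(tree)  # new nodes need line and column info
namespace = {"capture": capture}
exec(compile(tree, "example", "exec"), namespace)
print(captured, namespace["total"])
```

&lt;p>The rewritten code behaves like the original, but each assignment expression now reports its target name and value to the hook, which is where a dependency record could be created.&lt;/p>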
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Identify which AST construct implementations are missing&lt;/li>
&lt;li>Design AST transformations to execute functions before and after the evaluation of the constructs&lt;/li>
&lt;li>Create the dependencies for the new constructs&lt;/li>
&lt;/ul></description></item><item><title>ScaleBugs: Reproducible Scalability Bugs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucdavis/scalebugs/</link><pubDate>Tue, 07 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucdavis/scalebugs/</guid><description>&lt;p>Scalable systems lay essential foundations of the modern information industry. HPC data centers tend to have hundreds to thousands of nodes in their clusters. The use of “extreme-scale” distributed systems has given birth to a new type of bug: scalability bugs. As the name suggests, scalability bugs manifest depending on the scale of a run, and thus symptoms may only be observable in large-scale deployments, but not in small or medium deployments. For example, &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-6127" target="_blank" rel="noopener">Cassandra-6127&lt;/a> is a scalability bug detected in the popular distributed database Cassandra. The bug causes unnecessary CPU usage; however, the symptom is not observed unless ~1000 nodes are deployed. This demonstrates the main challenge of studying scalability bugs: they are extremely challenging to reproduce without deploying the system at a large scale.&lt;/p>
&lt;p>In this project, our goal is to build a dataset of &lt;strong>reproducible&lt;/strong> scalability bugs. To achieve this, we will go through the existing bug reports for popular distributed systems, which include Cassandra, HDFS, Ignite, and Kafka. For each bug report, we determine if the reported bug depends on the scale of the run, such as the number of nodes utilized. With the collected scale-dependent bugs, we then will craft the workload to reproduce those scalability bugs. Our workloads will be designed to trigger some functionalities of the system under different configurations (e.g., different numbers of nodes), for which we will observe the impact on performance. For example, a successful reproduction should be able to show the performance drop along with an increasing number of nodes.&lt;/p>
&lt;h3 id="building-a-dataset-of-reproducible-scalability-bugs">Building a Dataset of Reproducible Scalability Bugs&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Scalability systems, bug patterns, reproducibility, bug dataset&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux Shell, Docker, Java, Python&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/cindy-rubio-gonzalez/">Cindy Rubio González&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/hao-nan-zhu/">Hao-Nan Zhu&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/goodness-ayinmode/">Goodness Ayinmode&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zahra-nabila-maharani/">Zahra Nabila Maharani&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The student will build a dataset of reproducible scalability bugs. Each bug artifact in the dataset will contain (1) the buggy and fixed versions of the scalability system, (2) a runtime environment that ensures reproducibility, and (3) a workload shell script that could demonstrate the symptoms of the bug under different scales.&lt;/p>
&lt;h4 id="specific-tasks">Specific Tasks&lt;/h4>
&lt;ul>
&lt;li>Work with the mentors to understand the context of the project.&lt;/li>
&lt;li>Learn the background of scalability systems.&lt;/li>
&lt;li>Inspect the bug reports from Apache JIRA and identify scale-dependent bugs.&lt;/li>
&lt;li>Craft shell scripts to trigger the exact scalability bug described by the bug report.&lt;/li>
&lt;li>Organize the reproducible scalability bugs and write documentation to build the code
and trigger the bug.&lt;/li>
&lt;/ul></description></item><item><title>LabOP - an open specification for laboratory protocols, that solves common interchange problems stemming from variations in scale, labware, instruments, and automation.</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop/</link><pubDate>Mon, 06 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop/</guid><description>
&lt;h3 id="project-idea-1-software-hardware-and-wetware-building-labop-with-simultaneous-language--protocol-development--test-executions">Project idea 1: Software, hardware, and wetware building LabOP with simultaneous language &amp;amp; protocol development &amp;amp; test executions&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Software standard development, Laboratory automation, Biology&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Semantic Web Technologies (RDF, OWL), interest to think about describing biological &amp;amp; chemical laboratory processes&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong>
&lt;ol>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tim-fallon/">Tim Fallon&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dan-bryce/">Dan Bryce&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h4 id="about-the-laboratory-open-protocol-language-labop">About: The Laboratory Open Protocol Language (LabOP)&lt;/h4>
&lt;p>&lt;strong>See link: &lt;a href="https://bioprotocols.github.io/labop/" target="_blank" rel="noopener">https://bioprotocols.github.io/labop/&lt;/a>&lt;/strong>&lt;/p>
&lt;p>LabOP is an &lt;em>open&lt;/em> specification for laboratory protocols, that solves common interchange problems stemming from variations in scale,
labware, instruments, and automation. LabOP was built from the ground-up to support protocol interchange. It provides an extensible
library of protocol primitives that capture the control and data flow needed for simple calibration and culturing protocols to
industrial control.&lt;/p>
&lt;h5 id="software-ecosystem">Software Ecosystem&lt;/h5>
&lt;p>LabOP&amp;rsquo;s rich representation underpins an ecosystem of several powerful software tools, including:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.github.com/bioprotocols/labop" target="_blank" rel="noopener">labop&lt;/a>: the Python LabOP library, which supports:
&lt;ul>
&lt;li>&lt;em>Programming&lt;/em> LabOP protocols in Python,&lt;/li>
&lt;li>&lt;em>Serialization&lt;/em> of LabOP protocols conforming to the LabOP RDF specification,&lt;/li>
&lt;li>&lt;em>Execution&lt;/em> in the native LabOP semantics (rooted in the UML activity model),&lt;/li>
&lt;li>&lt;em>Specialization&lt;/em> of protocols to 3rd-party protocol formats (including Autoprotocol, OpenTrons, and human-readable formats), and&lt;/li>
&lt;li>&lt;em>Integration&lt;/em> with instruments (including OpenTrons OT2, Echo, and SiLA-based automation).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.github.com/bioprotocols/laboped" target="_blank" rel="noopener">laboped&lt;/a>: the web-based LabOP Editor, which supports:
&lt;ul>
&lt;li>&lt;em>Programming&lt;/em> LabOP protocols quickly with low-code visual scripts,&lt;/li>
&lt;li>&lt;em>Storing&lt;/em> protocols on the cloud,&lt;/li>
&lt;li>&lt;em>Exporting&lt;/em> protocol specializations for use in other execution frameworks.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="about-the-bioprotocols-working-group">About the Bioprotocols Working Group&lt;/h4>
&lt;p>The Bioprotocols Working Group is an open community organization developing a free and open standard for representation of biological
protocols.&lt;/p>
&lt;p>To join the Bioprotocols Working Group:&lt;/p>
&lt;ul>
&lt;li>Join the community mailing list at: &lt;a href="https://groups.google.com/g/bioprotocols" target="_blank" rel="noopener">https://groups.google.com/g/bioprotocols&lt;/a>&lt;/li>
&lt;li>Join the &lt;code>#collab-bioprotocols&lt;/code> channel on the &lt;a href="https://bitsinbio.org/" target="_blank" rel="noopener">Bits in Bio&lt;/a> Slack.&lt;/li>
&lt;/ul>
&lt;h5 id="leadership">Leadership&lt;/h5>
&lt;p>&lt;em>Elected Term: August 24th, 2022 - August 23rd, 2023&lt;/em>&lt;/p>
&lt;p>&lt;strong>Chair:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dan-bryce/">Dan Bryce&lt;/a> (SIFT)&lt;/p>
&lt;p>&lt;strong>Finance Committee:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="mailto:jeremy.cahill@metamerlabs.io">Jeremy Cahill (Metamer Labs)&lt;/a>&lt;/li>
&lt;li>&lt;a href="mailto:mark.doerr@uni-greifswald.de">Mark Doerr (University of Greifswald)&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tim-fallon/">Tim Fallon&lt;/a> (UCSD)&lt;/li>
&lt;/ul>
&lt;h5 id="governance">Governance&lt;/h5>
&lt;p>&lt;em>Approved by community vote on August 16th, 2022&lt;/em>&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://bioprotocols.github.io/labop/about#Governance" target="_blank" rel="noopener">https://bioprotocols.github.io/labop/about#Governance&lt;/a>&lt;/strong>&lt;/p>
&lt;h5 id="mission">Mission:&lt;/h5>
&lt;p>The Bioprotocols Working Group is an open community organization developing free and open standards for representation of biological
protocols. In support of that goal, the organization also develops tools and practices and works with other organizations to
facilitate dissemination and adoption of these standards.&lt;/p>
&lt;p>As an organization, the Bioprotocols Working Group holds the following values:&lt;/p>
&lt;ul>
&lt;li>The standards developed by the community should be available under permissive free and open licenses.&lt;/li>
&lt;li>Technical decisions of the community should be made following open and inclusive processes.&lt;/li>
&lt;li>The community is strengthened by fostering a culture of diversity and inclusion, in which all constructive participants feel
comfortable making their voices heard.&lt;/li>
&lt;/ul></description></item><item><title>GPU Emulator for Easy Reproducibility of DNN Training</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/utexas/gpuemulator/</link><pubDate>Sun, 05 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/utexas/gpuemulator/</guid><description>&lt;p>Deep Neural Networks (DNN) have achieved success in many machine learning (ML) tasks including image recognition, video classification and natural language processing. Nonetheless, training DNN models is highly computation intensive and usually requires running complex computations on GPUs, while GPU is a very expensive and scarce resource. Therefore, many research works on DNN training are delayed because of the lack of access to GPUs. However, many research prototypes don&amp;rsquo;t require GPUs but only the performance profiles of GPUs. For example, research on DNN training storage systems doesn’t need to run real computations on GPUs, but only needs to know how much time each GPU computation will take. Meanwhile, GPU performance in DNN training is predictable and reproducible, as every batch of training performs a deterministic sequence of mathematical operations on a fixed number of data.&lt;/p>
&lt;p>Therefore, in this project we seek to build a GPU emulator platform on PyTorch to easily reproduce DNN training without using real GPUs. We will measure the performance profiles of GPU computations for different models, GPU types, and batch sizes. Based on the measured GPU performance profiles, we will build a platform to emulate the GPU behaviors and reproduce DNN training using CPUs only. We will make the platform and the measurements open-source, allowing other researchers to reproduce the performance measurements and easily conduct research on DNN training systems. We will also encourage the community to enrich the database by adding GPU performance measurements for their own models and GPU types. We will be the first one to build and release this kind of GPU emulator for DNN training, and we believe researchers and the community can benefit a lot from it, especially after more and more GPU performance profiles are added by the community.&lt;/p>
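&lt;p>As a rough illustration of the idea above, the sketch below replays profiled step times instead of running real GPU computations. It is a minimal sketch under stated assumptions, not the project's platform: the &lt;code>PROFILE&lt;/code> table, its keys, its placeholder numbers, and the function names are all hypothetical, and plain Python stands in for the PyTorch integration.&lt;/p>

```python
import time

# Hypothetical profile table: (model, gpu_type, batch_size) -> seconds per training step.
# The numbers are placeholders for illustration, not measurements.
PROFILE = {
    ("resnet50", "gpu-a", 32): 0.010,
    ("resnet50", "gpu-a", 64): 0.018,
}


def emulated_step(model, gpu_type, batch_size, sleep=time.sleep):
    """Stand in for one GPU training step by waiting for its profiled duration."""
    duration = PROFILE[(model, gpu_type, batch_size)]
    sleep(duration)  # no real computation happens; only the timing is reproduced
    return duration


def emulate_epoch(model, gpu_type, batch_size, num_batches, sleep=time.sleep):
    """Replay num_batches steps and return the total emulated compute time."""
    total = 0.0
    for _ in range(num_batches):
        total += emulated_step(model, gpu_type, batch_size, sleep)
    return total
```

&lt;p>Passing a custom &lt;code>sleep&lt;/code> callable (an assumption of this sketch) lets a storage or data-loading experiment record the emulated durations without actually waiting.&lt;/p>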
&lt;h3 id="building-a-platform-to-emulate-gpu-performance-in-dnn-training">Building a platform to emulate GPU performance in DNN training&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> DNN training, reproducibility, GPU emulator, performance measurement&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Linux, Python, PyTorch, deep learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vijay-chidambaram/">Vijay Chidambaram&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/yeonju-ro/">Yeonju Ro&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haoran-wu/">Haoran Wu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The student will measure GPU performance profiles for different models and GPU types and, based on these profiles, build a platform that emulates GPU behavior so that DNN training can be reproduced easily. The GPU performance measurements should be open-source and reproducible so that other researchers can verify the results and add GPU profiles for their own needs.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Study and get familiar with the PyTorch DNN training pipelines&lt;/li>
&lt;li>Measure GPU performance profiles for different DNN models and GPU types&lt;/li>
&lt;li>Based on the GPU performance measurements, build a platform to emulate the GPU behaviors and reproduce DNN training without using real GPUs&lt;/li>
&lt;li>Organize and document the code to make it reproducible for the community&lt;/li>
&lt;/ul></description></item><item><title>Reproduce and benchmark self-adaptive edge applications under dynamic resource management</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/edgebench/</link><pubDate>Thu, 02 Feb 2023 00:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/edgebench/</guid><description>&lt;p>With the rise of ideas like smart cities and smart manufacturing, a massive number of edge devices (e.g., traffic or security cameras, thermometers, and flood sensors) are deployed and connected to the network to collect and analyze data across space and time and to help stakeholders such as city governments or manufacturers optimize their plans and operations. Such a large number of edge devices, and the large amount of communication among the devices or with central servers, raises a big challenge: how to manage and schedule resources (i.e., network bandwidth between the devices and/or computing power on both edge devices and bare-metal servers) so that the running applications can provide a reliable service. Furthermore, because the resources available to edge devices are limited, there is a growing trend to reduce average compute and/or bandwidth usage by leveraging the uneven distribution of interesting events across both time and space in the input data. This brings further challenges for provisioning and managing the resources available to the edge devices, as the running applications&amp;rsquo; resource demands can depend greatly on the input data, which is both dynamic and unpredictable.&lt;/p>
&lt;p>With these challenges in mind, the team previously designed and implemented a dynamic resource manager that understands the applications and makes decisions based on that understanding at run time. This understanding rests on a key insight: applications show different magnitudes of performance improvement or degradation in response to changes in the amount of available resources, depending on the input data and on how many resources they currently have. We define these magnitudes as the applications&amp;rsquo; sensitivities. However, the resource manager has only been tested with a limited number and variety of video analytic applications. Hence, through the OSRE23 project, we aim to:&lt;/p>
&lt;ol>
&lt;li>reproduce other state-of-the-art self-adaptive video analytic applications,&lt;/li>
&lt;li>integrate the reproduced applications into the resource manager framework,&lt;/li>
&lt;li>compare the performance with and without the resource manager.&lt;/li>
&lt;/ol>
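&lt;p>To make the notion of sensitivity more concrete, the sketch below estimates it as a finite difference and picks the application that would gain the most from one more unit of resource. This is only an illustrative assumption about how sensitivity could be computed: the function names, the performance curves, and the allocations are hypothetical and are not taken from the team's resource manager.&lt;/p>

```python
def sensitivity(perf, allocation, delta=1.0):
    """Finite-difference estimate of performance gain per extra unit of resource."""
    return (perf(allocation + delta) - perf(allocation)) / delta


def pick_most_sensitive(apps, allocations, delta=1.0):
    """Return the name of the app expected to benefit most from delta more resource."""
    return max(apps, key=lambda name: sensitivity(apps[name], allocations[name], delta))


# Hypothetical performance curves (e.g. accuracy as a function of bandwidth share).
apps = {
    "detector": lambda r: 1.0 - 1.0 / (1.0 + r),  # saturates as resources grow
    "tracker": lambda r: 0.1 * r,                  # keeps improving linearly
}
allocations = {"detector": 4.0, "tracker": 4.0}
```

&lt;p>With these made-up curves, the saturating application is the more sensitive one at small allocations and the linear one at larger allocations, which mirrors the point that sensitivity depends on how many resources an application currently has.&lt;/p>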
&lt;h3 id="reproducebenchmark-the-self-adaptive-video-analytic-applications-performance-under-dynamic-resource-management">Reproduce/benchmark the self-adaptive video analytic applications&amp;rsquo; performance under dynamic resource management&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Benchmark, Reproducibility, Video analytics, Machine Learning, Resource Management&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch, TensorFlow&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:junchenj@uchicago.edu">Junchen Jiang&lt;/a>, &lt;a href="mailto:yuyangh@uchicago.edu">Yuyang Huang&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/faishal-zharfan/">Faishal Zharfan&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Integrate various types of video analytic applications into the aforementioned dynamic resource manager and reproduce/benchmark the applications&amp;rsquo; performance.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Reproduce state-of-the-art video analytic applications&lt;/li>
&lt;li>Integrate such applications into the resource manager framework&lt;/li>
&lt;li>Benchmark video analytic applications&lt;/li>
&lt;li>Analyze the benchmarked performance results&lt;/li>
&lt;/ul></description></item><item><title>FlashNet: Towards Reproducible Data Science for Storage System</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet/</link><pubDate>Thu, 02 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uchicago/flashnet/</guid><description>&lt;p>The Data Storage Research Vision 2025, organized in an NSF workshop, calls for more “AI for storage” research. However, performing ML-for-storage research can be a daunting task for new storage researchers: they must know both the storage side and the ML side, as if studying two different fields at the same time. This project aims to answer these questions:&lt;/p>
&lt;ol>
&lt;li>How can we encourage data scientists to look into storage problems?&lt;/li>
&lt;li>How can we create a transparent platform that allows such decoupling?&lt;/li>
&lt;li>Within the storage/ML community can we create two collaborative communities, the storage engineers and the storage data scientists?&lt;/li>
&lt;/ol>
&lt;p>In the ML/Deep Learning community, the large ImageNet benchmarks have spurred research in image recognition. Similarly, we would like to provide benchmarks for fostering storage research in ML-based per-I/O latency prediction. Therefore, we present FlashNet, a reproducible data science platform for storage systems. As a first step, we use I/O latency prediction as a case study, so FlashNet has been built for I/O latency prediction tasks. With FlashNet, data engineers can collect the I/O traces of various devices. Data scientists can then train ML models to predict the I/O latency based on those traces. All traces, results, and code will be shared in the FlashNet training ground platform, which utilizes Chameleon Trovi for better reproducibility.&lt;/p>
&lt;p>In this project, we plan to improve the modularity of the FlashNet pipeline and develop the Chameleon Trovi packages. We will also continue to improve the performance of our binary-class and multiclass classifiers and test them on the new production traces that we collected from the SNIA IOTA public trace repository. Finally, we will optimize the deployment of our continual-learning mechanism and test it in a cloud system environment. To the best of our knowledge, we are building the world&amp;rsquo;s first end-to-end data science platform for storage systems.&lt;/p>
&lt;h3 id="building-flashnet-platform">Building FlashNet Platform&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage systems, reproducibility, machine learning, continual learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C++, Python, PyTorch, experience with machine learning pipelines&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/justin-shin/">Justin Shin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/maharani-ayu-putri-irawan/">Maharani Ayu Putri Irawan&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build an open-source platform to enable collaboration between storage and ML communities, specifically to provide a common platform for advancing data science research for storage systems. The platform will be able to reproduce and evaluate different ML models/architecture, dataset patterns, data preprocessing techniques, and various feature engineering strategies.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Reproduce the FlashNet evaluation results from prior works.&lt;/li>
&lt;li>Build and improve FlashNet components based on the existing blueprint.&lt;/li>
&lt;li>Collect and analyze the FlashNet evaluation results.&lt;/li>
&lt;/ul></description></item><item><title>Reproducible Analysis &amp; Models for Predicting Genomics Workflow Execution Time</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uga/genomicswfmodels/</link><pubDate>Thu, 02 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/uga/genomicswfmodels/</guid><description>&lt;p>A high-throughput workflow execution system is needed to continuously gain insights from the increasingly abundant genomics data. However, genomics workflows often have long execution times (e.g., hours to days) due to their large input files. This characteristic complicates the management of systems for genomics workflow execution. Furthermore, based on our observation of a large-scale genomics data processing platform, ~2% of genomics workflows exhibit a tail behavior that multiplies their execution time by up to 15x relative to the median, resulting in weeks of execution.&lt;/p>
&lt;p>On the other hand, input files for genomic workflows often vary in quality due to differences in how they are collected. Prior works suggested that these quality differences can affect genomics workflow execution time. Yet, to the best of our knowledge, input quality has never been accounted for in the design of a high-throughput workflow execution system. Even worse, there does not appear to be a consensus on what constitutes ‘input quality,’ at least from a computer systems perspective.&lt;/p>
&lt;p>In this project, we seek to analyze a huge dataset from a large-scale genomics processing platform in order to gain insights on how ‘input quality’ affects genomic workflows’ execution times. Following that, we will build machine learning (ML) models for predicting workflow execution time, in particular those which exhibit tail behavior. We believe these insights and models can become the foundation for designing a novel tail-resilient genomics workflow execution system. Along the way, we will ensure that each step of our analysis is reproducible (e.g., in the form of Jupyter notebooks) and make all our ML models open-source (e.g., in the form of pre-trained models). We sincerely hope our work can offload some burdens commonly faced by operators of systems for genomics and, at the same time, benefit future researchers who work on the intersection of computer systems and genomics.&lt;/p>
&lt;h3 id="analyze-genomics-data-quality--build-exec-time-prediction-models">Analyze genomics data quality &amp;amp; build exec. time prediction models&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> genomics, data analysis, machine learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Linux, Python, Matplotlib, Pandas/Numpy, any ML library&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/in-kee-kim/">In Kee Kim&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/charis-christopher-hulu/">Charis Christopher Hulu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Analyze a large-scale trace of genomics workflow execution along with metrics from various genomics alignment tools (e.g., FastQC, Picard, and GATK metrics) and find features that
correlate the most with workflow execution time and its tail behavior. Then, based on the results, we will build ML models that accurately predict genomic workflows’ execution times.&lt;/p>
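&lt;p>The feature search described above could start from something as simple as ranking candidate metrics by their correlation with execution time. The sketch below does this with a hand-written Pearson correlation. It is an assumption about one possible first step, not the project's analysis: the feature names and values used with it are hypothetical, and a real analysis would likely rely on Pandas/NumPy as listed in the skills.&lt;/p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_features(features, exec_times):
    """Sort feature names by absolute correlation with execution time, strongest first."""
    scores = {name: pearson(values, exec_times) for name, values in features.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

&lt;p>Ranking by absolute value keeps strongly negative correlations near the top as well. Correlation alone does not capture tail behavior, so this would only be a starting point for proposing features.&lt;/p>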
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Acquire basic understanding of genomics data processing &amp;amp; workflow execution (will be guided by the mentor)&lt;/li>
&lt;li>Reproduce past analysis &amp;amp; models built by prior members of the project&lt;/li>
&lt;li>Propose features from FastQC/Picard/GATK metrics that can be used as a predictor for execution time and tail behavior&lt;/li>
&lt;li>Write a brief analysis as to why those features might work&lt;/li>
&lt;li>Build ML models for predicting execution time&lt;/li>
&lt;li>Package the analysis in the form of Jupyter notebooks&lt;/li>
&lt;li>Package the models in a reloadable format (e.g., pickle)&lt;/li>
&lt;/ul></description></item><item><title>Reproducible Evaluation of Multi-level Erasure Coding</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/multilevelerasure/</link><pubDate>Thu, 02 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ornl/multilevelerasure/</guid><description>&lt;p>Massive storage systems rely heavily on erasure coding (EC) to protect data from drive failures and provide data durability. Existing storage systems mostly adopt single-level erasure coding (SLEC) to protect data, either performing EC at the network level or performing EC at the local level. However, both SLEC approaches have limitations, as network-only SLEC introduces heavy network traffic overhead, and local-only SLEC cannot tolerate rack failures.&lt;/p>
&lt;p>Accordingly, some data centers are starting to use multi-level erasure coding (MLEC), a hybrid approach that performs EC at both the network level and the local level. However, prior EC research and evaluations mostly focused on SLEC, and it remains an open question how MLEC compares to SLEC in terms of durability, capacity overhead, encoding throughput, network traffic, and other overheads.&lt;/p>
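&lt;p>Of the dimensions listed above, capacity overhead is the easiest to work out by hand. The sketch below computes it for a single-level (k + p) code and for a two-level scheme, assuming that every chunk produced by the network-level code is itself encoded by the local-level code. The function names and the example parameters are illustrative assumptions, not configurations from the project.&lt;/p>

```python
from fractions import Fraction


def slec_overhead(k, p):
    """Extra raw capacity per unit of user data for a (k + p) code."""
    return Fraction(p, k)


def mlec_overhead(k_net, p_net, k_loc, p_loc):
    """Overhead when a (k_net + p_net) network code is layered over a (k_loc + p_loc) local code."""
    # Raw bytes stored per user byte multiply across the two levels.
    stored = Fraction(k_net + p_net, k_net) * Fraction(k_loc + p_loc, k_loc)
    return stored - 1
```

&lt;p>For example, a (10 + 2) code has an overhead of 1/5, while layering (10 + 2) over (17 + 3) gives (12/10) x (20/17) - 1 = 7/17. Durability and network traffic need failure and repair modeling and are not captured by this arithmetic.&lt;/p>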
&lt;p>Therefore, in this project we seek to build a platform to evaluate the durability and overheads of MLEC. The platform will allow us to evaluate dozens of EC strategies in many dimensions including recovery strategies, chunk placement choices, various parity schemes, etc. To the best of our knowledge, there is no other evaluation platform like what we propose here. We seek to make the platform open-source and the evaluation reproducible, allowing future researchers to benefit from it and conduct more research on MLEC.&lt;/p>
&lt;h3 id="building-a-platform-to-evaluate-mlec">Building a platform to evaluate MLEC&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage systems, reproducibility, erasure coding, evaluation&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Linux, C, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjus-george/">Anjus George&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zhiyan-alex-wang/">Zhiyan &amp;quot;Alex&amp;quot; Wang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build a platform to evaluate the durability and overheads of MLEC. The platform will be able to evaluate different EC strategies in various dimensions including repair strategies, chunk placement choices, parity schemes, etc. Analyze the evaluation results.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Reproduce the SLEC evaluation results from prior SLEC evaluation tools&lt;/li>
&lt;li>Based on prior SLEC evaluation tools, build a platform to evaluate the durability and overheads of MLEC under various EC strategies&lt;/li>
&lt;li>Collect and analyze the MLEC evaluation results&lt;/li>
&lt;/ul></description></item><item><title>Automatic Cluster Performance Shifts Detection Toolkit</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/anl/perfdrift/</link><pubDate>Wed, 01 Feb 2023 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/anl/perfdrift/</guid><description>&lt;p>High-performance computing (HPC) clusters typically suffer from performance degradation over time. The heterogeneous nature of clusters and the inevitable defects in various infrastructure layers make performance inside them harder to predict. On the other hand, when software upgrades or similar events happen, we might also observe performance improvement or degradation even though nothing changes in the hardware. Due to these uncertainties, it is necessary to notify administrators early of changes in cluster performance within a specific time window to inform scheduling decisions and increase cluster utilization.&lt;/p>
&lt;p>We are targeting HPC clusters that cater to heterogeneous, compute- and I/O-intensive workloads, ranging from scientific simulation to AI model training, that have a high degree of parallelization. In this scenario, we plan to use the Darshan open-source toolkit (&lt;a href="https://github.com/darshan-hpc/darshan" target="_blank" rel="noopener">https://github.com/darshan-hpc/darshan&lt;/a>) as the data collection and profiling tool for designing our performance drift algorithms. Furthermore, we will possibly incorporate the distribution shift detection into Darshan so that it can notify HPC system administrators.&lt;/p>
&lt;p>Our goal is to show the efficacy of our algorithm by plotting the profiling data, after it has been processed by our algorithm, with the specific time windows where performance shifts happened. Finally, we will package all our profiling data and experiment scripts in Jupyter notebooks, in particular on Chameleon Trovi, to help others reproduce our experiments.&lt;/p>
&lt;p>Through this research, we seek to contribute the following:&lt;/p>
&lt;ul>
&lt;li>Designing an algorithm to detect performance shifts in HPC clusters that can be adapted for heterogeneous workloads&lt;/li>
&lt;li>Detecting performance shifts in real time without introducing significant overhead into the system&lt;/li>
&lt;li>Contributing to Darshan so that it can automatically detect performance changes while profiling the clusters&lt;/li>
&lt;/ul>
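&lt;p>One purely statistical way to flag a performance shift is to compare the distribution of a metric in consecutive time windows. The sketch below uses the two-sample Kolmogorov-Smirnov statistic for that comparison. It is a minimal sketch under stated assumptions: the choice of test, the fixed non-overlapping windows, the threshold, and the function names are all illustrative and do not describe the project's algorithm or any Darshan API.&lt;/p>

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def detect_shifts(samples, window, threshold):
    """Return start indices of windows whose distribution differs from the previous window."""
    shifts = []
    for start in range(window, len(samples) - window + 1, window):
        previous = samples[start - window:start]
        current = samples[start:start + window]
        if ks_statistic(current, previous) > threshold:
            shifts.append(start)
    return shifts
```

&lt;p>Here &lt;code>samples&lt;/code> would be a time-ordered series of one profiled metric, such as per-job I/O throughput. A fixed threshold is the simplest choice; a real detector would more likely derive it from a significance level and adapt the window size to the workload.&lt;/p>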
&lt;h3 id="automatic-and-adaptive-performance-shifts-detection">Automatic and Adaptive Performance Shifts Detection&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Statistical Machine Learning, Deep Learning, and High-Performance Computing (HPC)&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C++, Python, Statistics, good to have: Machine Learning, Deep learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> Sandeep Madireddy (&lt;a href="https://www.anl.gov/profile/sandeep-r-madireddy" target="_blank" rel="noopener">https://www.anl.gov/profile/sandeep-r-madireddy&lt;/a>, &lt;a href="http://www.mcs.anl.gov/~smadireddy/" target="_blank" rel="noopener">http://www.mcs.anl.gov/~smadireddy/&lt;/a> ), Ray Andrew Sinurat (&lt;a href="https://rayandrew.me" target="_blank" rel="noopener">https://rayandrew.me&lt;/a>)&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kangrui-wang/">Kangrui Wang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The specific tasks for the student are:&lt;/p>
&lt;ul>
&lt;li>Collaborate and work with mentors to understand the goal of this project.&lt;/li>
&lt;li>Implement distribution shift detection using purely statistical or machine/deep learning methods&lt;/li>
&lt;li>Deploy the algorithm and try to see its efficacy in the clusters.&lt;/li>
&lt;li>Package this experiment to make it easier for others to reproduce&lt;/li>
&lt;/ul></description></item><item><title>OpenROAD - An Open-Source, Autonomous RTL-GDSII Flow for VLSI Designs (2023)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/openroad/</link><pubDate>Wed, 01 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/openroad/</guid><description>&lt;p>The &lt;a href="https://theopenroadproject.org" target="_blank" rel="noopener">OpenROAD&lt;/a> project is a non-profit, DARPA-funded and Google sponsored project committed to creating low-cost and innovative Electronic Design Automation (EDA) tools and flows for IC design. Our mission is to democratize IC design, break down barriers of cost and access and mitigate schedule risk through native and open source innovation and collaboration with ecosystem partners. &lt;a href="https://github.com/The-OpenROAD-Project" target="_blank" rel="noopener">OpenROAD&lt;/a> provides an autonomous, no-human-in-the-loop, 24-hour, RTL-GDSII flow for fast ASIC design exploration, QoR estimation and physical implementation for a range of technologies above 12 nm. We welcome a diverse community of designers, researchers, enthusiasts, software engineers and entrepreneurs to use and contribute to OpenROAD and make a far-reaching impact. OpenROAD has been used in &amp;gt; 600 tapeouts across a range of ASIC applications with a rapidly growing and diverse user community.&lt;/p>
&lt;h3 id="enhance-openroad-gui-flow-manager">Enhance OpenROAD GUI Flow Manager&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>GUI&lt;/code>, &lt;code>Visualization&lt;/code>, &lt;code>User Interfaces&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Qt&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="mailto:ethanmoon@google.com">Ethan Mahintorabi&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop custom features for analysis and visualizations in the &lt;a href="https://openroad.readthedocs.io/en/latest/main/src/gui/README.html" target="_blank" rel="noopener">OpenROAD GUI&lt;/a> to support native and third-party flows. These include &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>, &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a>, and other third-party flows. Create documentation (commands, developer guide notes, and tutorials) to show GUI usage for supported flows.&lt;/p>
&lt;h3 id="profile-and-tune-openroad-flow-for-runtime-improvements">Profile and tune OpenROAD flow for Runtime improvements&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>OpenROAD-flow-scripts&lt;/code>, &lt;code>Flow Manager&lt;/code>, &lt;code>Runtime Optimization&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge about Computational resource optimization, Cloud-based computation, Basic VLSI design and tools knowledge&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="mailto:ethanmoon@google.com">Ethan Mahintorabi&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Test, analyze, and develop verifiable and reproducible strategies to improve run times in &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>. These include optimizing computational resources in the cloud and tuning algorithmic and design flow parameters. Create test plans using existing or new designs to show runtime improvements.&lt;/p>
&lt;h3 id="update-openroad-documentation-and-tutorials">Update OpenROAD Documentation and Tutorials&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Documentation&lt;/code>, &lt;code>Tutorials&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design flow, tcl, shell scripts, Documentation, Markdown&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Review the documentation and tutorials in &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>, update them, and fill in what is missing for existing and new features. For reference, see this example tutorial: &lt;a href="https://openroad-flow-scripts.readthedocs.io/en/latest/tutorials/FlowTutorial.html" target="_blank" rel="noopener">https://openroad-flow-scripts.readthedocs.io/en/latest/tutorials/FlowTutorial.html&lt;/a>.&lt;/p>
&lt;h3 id="lef-and-liberty-model-testing">LEF and Liberty Model Testing&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Testing&lt;/code>, &lt;code>LEF&lt;/code>, &lt;code>LIB&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design, lef and lib model abstracts, tcl, shell scripts, Verilog, Layout&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Test the accuracy of generated LIB and LEF models for signoff in &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a> for flat and hierarchical design flows. Build test cases to validate and add to the regression suite.&lt;/p></description></item><item><title>Teaching Computer Networks with Reproducible Research</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/edunet/</link><pubDate>Wed, 18 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/edunet/</guid><description>&lt;p>Lead Mentor: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a>&lt;/p>
&lt;p>In the field of computer networks and wireless communication systems, the availability of open access networking and cloud computing testbeds (&lt;a href="https://portal.geni.net/" target="_blank" rel="noopener">GENI&lt;/a>, &lt;a href="https://cloudlab.us/" target="_blank" rel="noopener">CloudLab&lt;/a>, &lt;a href="https://chameleoncloud.org/" target="_blank" rel="noopener">Chameleon&lt;/a>, &lt;a href="https://fabric-testbed.net/" target="_blank" rel="noopener">FABRIC&lt;/a>, and others) has been transformative in promoting reproducible research &lt;em>and&lt;/em> in making high-quality experiential learning available to students and educators at a wide range of colleges and universities. This project seeks to unite research and education use of these testbeds by developing new ways of using reproducible research to teach computer networks and related topics.&lt;/p>
&lt;h3 id="bringing-foundational-results-into-the-classroom">Bringing foundational results into the classroom&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Computer networks, reproducibility, education&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux, writing&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and TBD&lt;/li>
&lt;/ul>
&lt;p>To make foundational results from computer networks more concrete, this project seeks to reproduce a selection of key results and package them for use as interactive classroom demonstrations. (An example of a &amp;ldquo;foundational&amp;rdquo; result might be the result from the 1980s that motivates congestion control by showing how &lt;a href="http://dx.doi.org/10.1016/0169-7552%2889%2990019-6" target="_blank" rel="noopener">congestion collapse occurs when the network is under heavy load&lt;/a>.) This involves:&lt;/p>
&lt;ul>
&lt;li>Reproducing the original results on an open-access testbed&lt;/li>
&lt;li>Packaging the materials for use as a classroom demo, with interactive elements&lt;/li>
&lt;li>Creating assessment questions and sample &amp;ldquo;solutions&amp;rdquo; related to the materials, that instructors may use in homework assignments or exams&lt;/li>
&lt;/ul>
&lt;h3 id="developing-a-classroom-competition-for-adaptive-video-delivery-policies">Developing a &amp;ldquo;classroom competition&amp;rdquo; for adaptive video delivery policies&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Computer networks, adaptive video, reproducibility, education&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux, Python, writing&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and TBD&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/srishti-jaiswal/">Srishti Jaiswal&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>A carefully designed competition can be a fun and exciting way for students to challenge themselves and gain &amp;ldquo;ownership&amp;rdquo; of a new topic. This project builds on an existing open source &lt;a href="https://witestlab.poly.edu/blog/adaptive-video-reproducing/" target="_blank" rel="noopener">reproducible result&lt;/a> for adaptive video delivery, and will challenge students to extend this work and design their own adaptive video policies for head-to-head competition against their classmates. This includes:&lt;/p>
&lt;ul>
&lt;li>Packaging the result to make it easier for students to reproduce and then build on the original work&lt;/li>
&lt;li>Implementing other adaptive video policies from the literature, so that students can use them as a baseline&lt;/li>
&lt;li>Developing different network settings (using live link traces and emulated link patterns) in which student submissions may be evaluated&lt;/li>
&lt;li>Developing an evaluation framework for scoring student submissions on different criteria and in different network settings, and making the results available in a leaderboard format&lt;/li>
&lt;/ul></description></item><item><title>Using Reproducibility in Machine Learning Education</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml/</link><pubDate>Wed, 18 Jan 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/eduml/</guid><description>&lt;p>Lead Mentor: &lt;a href="mailto:ffund@nyu.edu">Fraida Fund&lt;/a>&lt;/p>
&lt;p>The computer science and engineering classroom is an essential part of the reproducibility &amp;ldquo;ecosystem&amp;rdquo; - because of its broad reach and potential for big impact, and because for many students, the classroom is their first exposure to research in their field. For machine learning in particular, reproducibility is an important element of the research culture, and can be a valuable part of any introductory or advanced course in the field. These projects will develop highly interactive open educational resources that may be adopted by instructors of graduate or undergraduate machine learning courses to incorporate more instruction about reproducibility and reproducible research.&lt;/p>
&lt;h3 id="introducing-levels-of-reproduction-and-replication-in-ml">Introducing &amp;ldquo;levels&amp;rdquo; of reproduction and replication in ML&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Machine learning, reproducibility, education&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, machine learning, writing&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and TBD&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohamed-saeed/">Mohamed Saeed&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>In machine learning, replicating a published result to confirm the validity of the experimental results and the broader conclusions of the paper can take several forms, with increasing levels of effort:&lt;/p>
&lt;ul>
&lt;li>using authors&amp;rsquo; code and pre-trained weights to run the model on the same benchmarks as the original paper,&lt;/li>
&lt;li>training a model using authors&amp;rsquo; code and published hyperparameters,&lt;/li>
&lt;li>training a model using authors&amp;rsquo; code and a new hyperparameter search,&lt;/li>
&lt;li>validating the authors&amp;rsquo; code e.g. with unit tests, in addition to training,&lt;/li>
&lt;li>re-implementing the model,&lt;/li>
&lt;li>designing additional experiments to validate that the suggested mechanism is in fact responsible for the result,&lt;/li>
&lt;li>and more.&lt;/li>
&lt;/ul>
&lt;p>This project will develop interactive materials (using one or more exemplar published results) to illustrate and to highlight relevant aspects and pitfalls of each of these &amp;ldquo;levels&amp;rdquo; of reproduction and replication.&lt;/p>
&lt;h3 id="packaging-existing-reproducible-results-for-the-ml-classroom">Packaging existing reproducible results for the ML classroom&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Machine learning, reproducibility, education&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, machine learning, writing&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/fraida-fund/">Fraida Fund&lt;/a> and TBD&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/shekhar/">Shekhar&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jonathan-edwin/">Jonathan Edwin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal is to make it easier for instructors to expose students to state-of-the-art research in the classroom. This project will work with an existing set of recent reproducible results in machine learning, and will package them for easier consumption by students and more effective use in the classroom. This may include, but is not necessarily limited to:&lt;/p>
&lt;ul>
&lt;li>Re-validating the result and re-packaging along with computational environment on an open access testbed&lt;/li>
&lt;li>Creating tutorial material around the result, including interactive visualizations to demonstrate key elements of the work&lt;/li>
&lt;li>Creating one-click demos for applying the model/technique to a new test sample&lt;/li>
&lt;li>Curating test samples to highlight important advantages and limitations of the result&lt;/li>
&lt;li>Creating assessment questions and sample &amp;ldquo;solutions&amp;rdquo; that instructors may use to &amp;ldquo;assign&amp;rdquo; the work to students&lt;/li>
&lt;/ul></description></item><item><title>Public Artifact Data and Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/intel/artifactviz/</link><pubDate>Mon, 09 Jan 2023 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/intel/artifactviz/</guid><description>&lt;p>Reproducibility and Artifact Evaluation efforts have focused on reproducing the results, but not necessarily on storing, visualizing and making the results accessible. This set of projects builds the initial building blocks to log, capture, and visualize experiments.&lt;/p>
&lt;h3 id="experiment-log">Experiment Log&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Provide tools to log experiments&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Simple&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjo-vahldiek-oberwagner/">Anjo Vahldiek-Oberwagner&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a client- and server-side tool to start, stop, and timestamp an experiment. Document each iteration of the experiment and create a database to visualize the log of experiments.&lt;/p>
&lt;h3 id="capture-hwsw-state--continuous-monitoring">Capture HW/SW state &amp;amp; continuous monitoring&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Record initial state&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjo-vahldiek-oberwagner/">Anjo Vahldiek-Oberwagner&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Provide simple tools to gather the initial state of each experimental machine and its connected devices, configurations, software versions, &amp;hellip; Upload this state into the experiment log database and visualize the recorded data. Ideally, provide a diff function between experimental runs.&lt;/p>
&lt;p>In a second step, monitor the machine’s state during execution. This includes network, memory, CPU, and general OS statistics.&lt;/p>
&lt;h3 id="record-and-visualize-experimental-results">Record and visualize experimental results&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Record results in various formats and visualize them&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/anjo-vahldiek-oberwagner/">Anjo Vahldiek-Oberwagner&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jiayuan-zhu/">Jiayuan Zhu&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/krishna-madhwani/">Krishna Madhwani&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Description: Experiments generate results in various formats (e.g., CSV, JSON, text files, …). The goal of this project is to provide tools to extract common formats, connect the results to the experiment log, and visualize them. Ideally, the tools would also allow comparing different experimental runs. Initially, the project could dump its results into a Prometheus instance (&lt;a href="https://prometheus.io/" target="_blank" rel="noopener">https://prometheus.io/&lt;/a>), which would later be made available for everyone to explore the data.&lt;/p></description></item><item><title>Package Management &amp; Reproducibility</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/packaging/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/packaging/</guid><description>&lt;p>Project ideas related to reproducibility and package management, especially as they relate to &lt;em>store type package managers&lt;/em> (&lt;a href="http://nixos.org/" target="_blank" rel="noopener">NixOS&lt;/a>, &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> or &lt;a href="https://spack.io/" target="_blank" rel="noopener">Spack&lt;/a>).&lt;/p>
&lt;p>Lead Mentor: &lt;a href="https://users.soe.ucsc.edu/~fmzakari" target="_blank" rel="noopener">Farid Zakaria&lt;/a> &lt;a href="mailto:fmzakari@ucsc.edu">mailto:fmzakari@ucsc.edu&lt;/a>&lt;/p>
&lt;h3 id="investigate-the-dynamic-linking-landscape">Investigate the dynamic linking landscape&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Operating Systems&lt;/code> &lt;code>Compilers&lt;/code> &lt;code>Linux&lt;/code> &lt;code>Package Management&lt;/code> &lt;code>NixOS&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Experience with systems programming and Linux familiarity&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate to Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:fmzakari@ucsc.edu">Farid Zakaria&lt;/a> &amp;amp; &lt;a href="https://people.llnl.gov/scogland1" target="_blank" rel="noopener">Tom Scogland&lt;/a> &lt;a href="mailto:scogland1@llnl.gov">mailto:scogland1@llnl.gov&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Dynamic linking as specified in the ELF file format has gone unchallenged since its invention. With many new package management models that eschew the filesystem hierarchy standard (i.e. Nix, Guix and Spack), many of the idiosyncrasies that define the way in which libraries are discovered are no longer useful and are potentially harmful.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Continue development on &lt;a href="https://github.com/fzakaria/shrinkwrap" target="_blank" rel="noopener">Shrinkwrap&lt;/a>, a tool to make dynamic library loading simpler and more robust.&lt;/li>
&lt;li>Evaluate its effectiveness across a wide range of binaries.&lt;/li>
&lt;li>Upstream contributions to &lt;a href="http://nixos.org/" target="_blank" rel="noopener">NixOS&lt;/a> or &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> to leverage the improvement when suitable.&lt;/li>
&lt;li>Investigate alternative improvements to dynamic linking by writing a dynamic linker &amp;ldquo;loader wrapper&amp;rdquo; to explore new ideas.&lt;/li>
&lt;/ul></description></item></channel></rss>