<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Tanu Malik | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/index.xml" rel="self" type="application/rss+xml"/><description>Tanu Malik</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/avatar_hue28c9bfabc98d1ab550d34812e796d6e_5958_270x270_fill_q75_lanczos_center.jpg</url><title>Tanu Malik</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/</link></image><item><title>Assessing the Computational Reproducibility of Jupyter Notebooks</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/depaul/20240618-nbrewer/</link><pubDate>Tue, 18 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/depaul/20240618-nbrewer/</guid><description>&lt;p>Like so many authors before me, I opened my first reproducibility study and very first academic publication with the age-old platitude, &amp;ldquo;Reproducibility is a cornerstone of the scientific method.&amp;rdquo; My team and I participated in a competition to replicate the performance improvements promised by a paper presented at last year&amp;rsquo;s Supercomputing conference. We weren&amp;rsquo;t simply re-executing the same experiment on the same cluster; instead, we were trying to confirm that we got similar results on a different cluster with an entirely different architecture. From the very beginning, I struggled to wrap my mind around the many reasons for reproducing computational experiments, their significance, and how to prioritize them. 
All I knew was that there seemed to be a consensus that reproducibility is important to science and that the experience left me with more questions than answers.&lt;/p>
&lt;p>Not long after that, I started a job as a research software engineer at Purdue University, where I worked heavily with Jupyter Notebooks. I used notebooks and interactive components called widgets to create a web application, which I turned into a reusable template. Our team was enthusiastic about using Jupyter Notebooks to quickly develop web applications because the tools were accessible to the laboratory researchers who ultimately needed to maintain them. I was fortunate to receive the &lt;a href="https://bssw.io/fellows/nicole-brewer" target="_blank" rel="noopener">Better Scientific Software Fellowship&lt;/a> to develop tutorials that teach others how to use notebooks to turn their scientific workflows into web apps. I collected those and other resources and established the &lt;a href="https://www.jupyter4.science" target="_blank" rel="noopener">Jupyter4Science&lt;/a> website, a knowledge base and blog about Jupyter Notebooks in scientific contexts. That site aims to improve the accessibility of research data and software.&lt;/p>
&lt;p>There seemed to be an important relationship between computational reproducibility and the improved accessibility and reuse of research code and data, but I still had trouble articulating it. In pursuit of answers, I moved to sunny Arizona to pursue a History and Philosophy of Science degree. My research falls at the confluence of my prior experiences: I&amp;rsquo;m studying the reproducibility of scientific Jupyter Notebooks. I have learned that questions about reproducibility aren&amp;rsquo;t very meaningful without considering specific aspects such as who is performing the experiment and the replication, the nature of the experimental artifacts, and the context in which the experiment takes place.&lt;/p>
&lt;p>I was fortunate to have found a mentor for the Summer of Reproducibility, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/">Tanu Malik&lt;/a>, who shares the philosophy that the burden of reproducibility should not rest solely on domain researchers, who would otherwise have to develop expertise outside their own field. She and her lab have developed &lt;a href="https://github.com/depaul-dice/Flinc" target="_blank" rel="noopener">FLINC&lt;/a>, an application virtualization tool that improves the portability of computational notebooks. Her prior work demonstrated that FLINC reproduces notebooks efficiently, requiring significantly less time and space to execute and re-execute notebooks than Docker containers do for the same notebooks. My work will expand the scope of this original experiment by adding more notebooks to FLINC&amp;rsquo;s test coverage, demonstrating robustness across even more diverse computational tasks. We expect to show that infrastructural tools like FLINC improve the success rate of automated reproducibility.&lt;/p>
&lt;p>I&amp;rsquo;m grateful to both the Summer of Reproducibility program managers and my research mentor for this incredible opportunity to further my dissertation research in the context of meaningful collaboration.&lt;/p></description></item><item><title>(Re)Evaluating Artifacts for Understanding Resource Artifacts</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/reevaluating/</link><pubDate>Wed, 20 Mar 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/reevaluating/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Virtualization, Containerization, Profiling, Reproducibility&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C, Python, and DevOps experience.&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large; 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/">Tanu Malik&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project aims to characterize computer-science-related artifacts that are either submitted to conferences or deposited in reproducibility hubs such as Chameleon. We aim to classify experiments into different types and understand the reproducibility requirements of this rich data set, possibly leading to a benchmark.
We will then examine packaging requirements, especially for distributed experiments, and aim to instrument a package archiver to reproduce a distributed experiment. Finally, we will use the learned experiment characteristics to develop a classifier that determines alternative resources where an experiment can be easily reproduced.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;p>Specific tasks include:&lt;/p>
&lt;ul>
&lt;li>A pipeline consisting of a set of scripts to characterize artifacts.&lt;/li>
&lt;li>Packaged artifacts and an analysis report, with open-sourced data, on the best guidelines for packaging using Chameleon.&lt;/li>
&lt;li>A classifier system based on artifact and resource characteristics.&lt;/li>
&lt;/ul></description></item><item><title>ReproNB: Reproducibility of Interactive Notebook Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/repronb/</link><pubDate>Mon, 26 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/depaul/repronb/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> HPC, MPI, distributed systems&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C++, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Difficult&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large; 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tanu-malik/">Tanu Malik&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Notebooks have gained wide popularity in scientific computing. A notebook is both a web-based interactive front-end to program workflows and a lightweight container for sharing code and its output. Reproducing notebooks in different target environments, however, is a challenge. Notebooks do not share the computational environment in which they are executed; consequently, despite being shareable, they are often not reproducible. We have developed &lt;a href="https://github.com/depaul-dice/Flinc" target="_blank" rel="noopener">FLINC&lt;/a> (see also the &lt;a href="https://dice.cs.depaul.edu/pdfs/pubs/C31.pdf" target="_blank" rel="noopener">eScience'22 paper&lt;/a>) to address this problem. However, it currently does not support all forms of experiments, in particular HPC experiments. In this project we will extend FLINC to HPC experiments. This will involve using record-and-replay mechanisms such as &lt;a href="https://kento.github.io/code/" target="_blank" rel="noopener">ReMPI&lt;/a> and &lt;a href="https://rr-project.org/" target="_blank" rel="noopener">rr&lt;/a> within FLINC.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;p>The project deliverable will be a set of HPC experiments that are packaged with FLINC and available on Chameleon.&lt;/p></description></item></channel></rss>