<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Arya Sarkar | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/index.xml" rel="self" type="application/rss+xml"/><description>Arya Sarkar</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/avatar_hu9d1c110d95f71b128727cf00460e8803_618693_270x270_fill_q75_lanczos_center.jpg</url><title>Arya Sarkar</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/</link></image><item><title>Static and Interactive Visualization Capture</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20250301-aryas/</link><pubDate>Fri, 30 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20250301-aryas/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/">Arya Sarkar&lt;/a>, a machine learning engineer and researcher based in Kolkata, a city in eastern India dubbed the City of Joy.
During the summer of 2024, I worked closely with Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a> on the project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Reproducibility in Data Visualization&lt;/a>.
We explored several existing solutions, tested different strategies, and made strong progress on capturing visualizations using a relatively uncommon method: embedding the visualization's meta-information as a JSON object in the final image file.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;p>Static Visualization Capture&lt;/p>
&lt;p>We successfully developed a method to capture static visualizations as .png files along with embedded metadata in a JSON format.
This approach enables seamless reproducibility of the visualization by storing all necessary metadata within the image file itself.
Our method supports both Matplotlib and Bokeh libraries and demonstrated near-perfect reproducibility, with only a minimal 1-2% pixel difference in cases where jitter (randomness) was involved.&lt;/p>
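&lt;p>The pixel difference quoted above can be made concrete with a small comparison routine. The report does not say how the figure was computed, so the following is only a minimal sketch under one assumption: the percentage is the share of pixel positions whose colour values differ. The function name &lt;code>pixel_difference_percent&lt;/code> and the &lt;code>tolerance&lt;/code> parameter are hypothetical, and images are modelled as plain nested lists of RGB tuples so that no imaging library is needed.&lt;/p>

```python
def pixel_difference_percent(original, reproduced, tolerance=0):
    """Return the percentage of pixel positions that differ between two images.

    Both images are lists of rows, and each row is a list of (r, g, b) tuples.
    A pixel counts as different when any channel differs by more than
    `tolerance`. This metric is an assumption, not the project's own measure.
    """
    if len(original) != len(reproduced):
        raise ValueError("images must have the same height")
    total = 0
    differing = 0
    for row_a, row_b in zip(original, reproduced):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if any(abs(a - b) > tolerance for a, b in zip(pixel_a, pixel_b)):
                differing += 1
    if total == 0:
        raise ValueError("images must not be empty")
    return 100.0 * differing / total


# Hypothetical data: a 10x10 white image and a copy with two changed pixels,
# standing in for two jittered points that landed in different places.
white = (255, 255, 255)
original = [[white for _ in range(10)] for _ in range(10)]
reproduced = [list(row) for row in original]
reproduced[2][3] = (0, 0, 255)
reproduced[7][8] = (0, 0, 255)

print(pixel_difference_percent(original, reproduced))
```

&lt;p>With two changed pixels out of one hundred, the sketch reports a 2% difference. A non-zero &lt;code>tolerance&lt;/code> would ignore small anti-aliasing differences, which may or may not be desirable depending on how strict the comparison should be.&lt;/p>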
&lt;p>Interactive Visualization Capture&lt;/p>
&lt;p>For interactive visualizations, our focus shifted to capturing state changes in Plotly visualizations on the web.
We developed a script that tracks user interactions (e.g., zoom, box, lasso, slider) using event listeners and automatically captures the visualization state as both image and metadata files.
This script also maintains a history of interactions to ensure reproducibility of all interaction states.&lt;/p>
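&lt;p>The bookkeeping behind such an interaction history can be sketched as follows. The actual script runs in the browser and relies on event listeners; this Python sketch only models the idea of logging each interaction and replaying the log to recover any earlier state. The class &lt;code>InteractionHistory&lt;/code>, its methods, and the state keys are all hypothetical and are not taken from the project code.&lt;/p>

```python
import json
from dataclasses import dataclass, field


@dataclass
class InteractionHistory:
    """Append-only log of interactions on top of an initial view state."""

    initial_state: dict
    events: list = field(default_factory=list)

    def record(self, kind, changes):
        """Store one interaction (for example a zoom) and what it changed."""
        self.events.append(
            {"step": len(self.events) + 1, "kind": kind, "changes": dict(changes)}
        )

    def state_at(self, step):
        """Rebuild the view state after the first `step` interactions."""
        state = dict(self.initial_state)
        for event in self.events[:step]:
            state.update(event["changes"])
        return state

    def to_json(self):
        """Serialize the whole session so it can be saved next to the image."""
        return json.dumps(
            {"initial_state": self.initial_state, "events": self.events}, indent=2
        )


# Hypothetical session: one zoom followed by one lasso selection.
history = InteractionHistory({"xaxis.range": [0, 10], "selection": None})
history.record("zoom", {"xaxis.range": [2, 5]})
history.record("lasso", {"selection": [3, 7, 9]})

print(history.state_at(1))
print(history.to_json())
```

&lt;p>Replaying the log from the initial state keeps every intermediate view recoverable, at the cost of re-applying all earlier events whenever a state is requested.&lt;/p>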
&lt;p>The challenge of capturing web-based visualizations from platforms like ObservableHq remains, as iframe restrictions prevent direct access to SVG elements.
Further exploration is needed to create a more robust capture method for these environments.&lt;/p>
&lt;p align="center">
&lt;img src="./bokeh_interactive.png" alt="bokeh interactive capture" style="width: 80%; height: auto;">
&lt;/p>
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;p>We aim to package our interactive capture script into a Google Chrome extension.&lt;/p>
&lt;p>We also plan to temporarily store interaction session files in the browser’s local storage.&lt;/p>
&lt;p>Finally, we want to let users download the captured files as a zip archive, using base64 encoding for images.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Over the summer, we made significant strides in improving data visualization reproducibility.
Our innovative approach to embedding metadata directly into visualization files offers a streamlined method for recreating static visualizations.
The progress in capturing interactive visualization states opens new possibilities for tackling a long-standing challenge in the field of reproducibility.&lt;/p></description></item><item><title> Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/</link><pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/">Arya Sarkar&lt;/a>, a machine learning engineer and researcher based in Kolkata, a city in eastern India dubbed the City of Joy.
For the last month and a half, I have been working closely with Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a> on the project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Reproducibility in Data Visualization&lt;/a>. I’m thrilled to make my own small mark on this amazing project and to help explore solutions for capturing visualizations, in the hope of making reproducibility easier in this domain.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;p>The last month and a half has mostly been spent exploring the best possible solutions to facilitate the reproducibility of static visualizations from local sources and/or the web.
We have taken inspiration from existing work in the domain and successfully captured the meta-information required to regenerate a visualization. The extracted metadata is saved into the generated .png figure of the visualization, so the visualization can be reproduced as long as you have (a) the original dataset and (b) the generated .png of the visualization. All other information is stored inside the .png file as a JSON object and can be used to regenerate the original image with very high accuracy.&lt;/p>
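&lt;p>To illustrate how a JSON object can travel inside a .png file, here is a minimal sketch that uses only the Python standard library. It is not the project's code: it writes the metadata into a standard PNG &lt;code>tEXt&lt;/code> chunk, which is one common way to attach text to a PNG, and the report does not state which mechanism the project uses. The helper names &lt;code>tiny_png&lt;/code>, &lt;code>embed_metadata&lt;/code>, and &lt;code>read_metadata&lt;/code>, the key &lt;code>vis_metadata&lt;/code>, and the example fields are all hypothetical.&lt;/p>

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, then a CRC over type + data."""
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png():
    """Return a valid 1x1 grayscale PNG that stands in for a saved figure."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # one filter byte plus one pixel
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


def embed_metadata(png, metadata, key="vis_metadata"):
    """Insert a tEXt chunk holding the JSON right after the IHDR chunk."""
    text = json.dumps(metadata)  # ASCII-only output by default
    payload = key.encode("latin-1") + b"\x00" + text.encode("latin-1")
    # IHDR is always the first chunk: 4 length + 4 type + 13 data + 4 CRC bytes.
    ihdr_end = len(PNG_SIGNATURE) + 4 + 4 + 13 + 4
    return png[:ihdr_end] + _chunk(b"tEXt", payload) + png[ihdr_end:]


def read_metadata(png, key="vis_metadata"):
    """Walk the chunks and decode the first tEXt chunk stored under `key`."""
    pos = len(PNG_SIGNATURE)
    while len(png) > pos:
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            name, _, text = data.partition(b"\x00")
            if name == key.encode("latin-1"):
                return json.loads(text.decode("latin-1"))
        pos += 12 + length  # length field + type + data + CRC
    raise KeyError(key)


# Hypothetical metadata for a histogram; the dataset itself stays external.
metadata = {"library": "matplotlib", "plot": "hist", "column": "sepal_length", "bins": 10}
stamped = embed_metadata(tiny_png(), metadata)
print(read_metadata(stamped))
```

&lt;p>Because the metadata lives in an ancillary chunk, image viewers still display the file normally, while a reproduction tool can read the JSON back and rebuild the plot from the original dataset.&lt;/p>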
&lt;p>The problem, however, remains with visualizations that involve randomness, such as jitter. Capturing the randomness has not been fully successful so far, and we are looking into options to ensure the capture of plots that contain randomness.&lt;/p>
&lt;p>The following images show some results from our reproducibility experiments.
Original Histogram using Matplotlib on the iris dataset:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="original_figure4" srcset="
/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_4d5ccda2a3e4409f5fb5bfccad4abae9.webp 400w,
/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_3d4477374e3469fd72bbb32675129816.webp 760w,
/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/original_histogram_hua04132746cb0ed26b86c32673b823c8f_29642_4d5ccda2a3e4409f5fb5bfccad4abae9.webp"
width="760"
height="468"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Reproduced Histogram using meta-information from the original:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="reproduced_figure4" srcset="
/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_9ca3975509f66dbedf2746a253660ec4.webp 400w,
/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_ca77d573979d523935009285864d087b.webp 760w,
/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240718-aryas/Reproduced_histogram_hub205e2d6c877abb784c35befc8616823_26597_9ca3975509f66dbedf2746a253660ec4.webp"
width="760"
height="490"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="the-next-steps">The next steps&lt;/h2>
&lt;p>We have already started looking into ways to capture visualizations from the web, that is, from platforms such as ObservableHq, and will use these experiments to transition into capturing interactive visualizations from the web.&lt;/p>
&lt;p>Capturing user interactions and all states of an interactive visualization would be very useful, as it is a well-known pain point in the reproducibility community and a challenge that still needs to be solved. My next steps involve finding a way to capture these interactive visualizations, especially those living on the web, and ensuring their reproducibility.&lt;/p></description></item><item><title> Reproducibility in Data Visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240614-aryas/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/niu/repro-vis/20240614-aryas/</guid><description>&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arya-sarkar/">Arya Sarkar&lt;/a> and I will be contributing to the research project titled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/niu/repro-vis/">Reproducibility in Data Visualization&lt;/a>, with a focus on investigating and coming up with novel solutions to capture both static and dynamic visualizations from different sources. My project is titled Investigate Solutions for Capturing Visualizations and I am mentored by Prof. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-koop/">David Koop&lt;/a>.&lt;/p>
&lt;p>Open source has always piqued my interest, but as a junior in university I often found it hard to get started. I had spent a lot of time working with data visualizations but had never looked into the problem of reproducibility before this project. When I saw the plethora of unique and interesting projects during the contribution phase of OSRE-2024, I was confused at first. However, the more I dug into this project and understood the significance of research in this domain to ensure reproducibility, the more I found myself drawn to it. I am glad to have this amazing opportunity to work in the open-source space as a researcher in reproducibility.&lt;/p>
&lt;p>This project aims to investigate, augment, and/or develop solutions to capture visualizations that appear in formats including websites and Jupyter notebooks. We have a special interest in capturing the state of interactive visualizations and preserving the user interactions required to reach a certain visualization in an interactive environment to ensure reproducibility. &lt;a href="https://drive.google.com/file/d/1SGLd37zBjnAU-eYytr7mYzfselHgxvK1/view?usp=sharing" target="_blank" rel="noopener">My proposal can be viewed here!&lt;/a>&lt;/p></description></item></channel></rss>