<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Raül Sirvent | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/index.xml" rel="self" type="application/rss+xml"/><description>Raül Sirvent</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/avatar_hud738f276fdadbe21bdf5cd2996bc7298_552976_270x270_fill_lanczos_center_3.png</url><title>Raül Sirvent</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/</link></image><item><title>Final blog: Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240822-architd/</link><pubDate>Thu, 22 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240822-architd/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone,&lt;/p>
&lt;p>I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University (IIT BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic Reproducibility of COMPSs Experiments through the Integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the Project&lt;/h2>
&lt;p>The project proposes to create a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="final-product">Final Product&lt;/h2>
&lt;p align="center">
&lt;img src="./logo.png" alt="Logo" style="width: 60%; height: auto;">
&lt;/p>
&lt;p>The basic workflow of the COMPSs Reproducibility Service can be explained as follows:&lt;/p>
&lt;ol>
&lt;li>The service takes the workflow path or link as the first argument from the user.&lt;/li>
&lt;li>The program shifts the execution to a separate sub-directory, &lt;code>reproducibility_service_{timestamp}&lt;/code>, to store the results from the reproducibility process.&lt;/li>
&lt;li>Two main flags are required:
&lt;ul>
&lt;li>&lt;strong>Provenance flag&lt;/strong>: If you want to generate the provenance of the workflow via the runcompss runtime.&lt;/li>
&lt;li>&lt;strong>New Dataset flag&lt;/strong>: If you want to reproduce the experiment with a new dataset instead of the one originally used.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>If there are any remote datasets, they are fetched into the sub-directory.&lt;/li>
&lt;li>The main work begins with parsing the metadata from &lt;code>ro-crate-metadata.json&lt;/code> and verifying the files present inside the dataset, as well as any files downloaded as remote datasets. This step generates a status table for the user to check if any files are missing or have modified sizes.&lt;/li>
&lt;/ol>
&lt;p align="center">
&lt;img src="./status_table.png" alt="Status Table" style="width: 70%; height: auto;">
&lt;/p>
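&lt;p>The verification in step 5 can be sketched roughly as follows. This is a minimal illustration, not the service&amp;rsquo;s actual code: the function name &lt;code>check_crate_files&lt;/code> and the status labels are hypothetical, and the sketch only checks &lt;code>File&lt;/code> entities whose &lt;code>@id&lt;/code> is a relative path and that record a &lt;code>contentSize&lt;/code>.&lt;/p>

```python
import json
import os


def check_crate_files(crate_dir):
    """Compare each file's size on disk with the contentSize recorded in the crate."""
    metadata_path = os.path.join(crate_dir, "ro-crate-metadata.json")
    with open(metadata_path, encoding="utf-8") as handle:
        graph = json.load(handle)["@graph"]

    rows = []
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        # Only File entities that record a size can be checked.
        if "File" not in types or "contentSize" not in entity:
            continue
        # Remote entities (URLs) are skipped in this sketch.
        if "://" in entity["@id"]:
            continue
        path = os.path.join(crate_dir, entity["@id"])
        if not os.path.isfile(path):
            status = "MISSING"
        elif os.path.getsize(path) != int(entity["contentSize"]):
            status = "SIZE MODIFIED"
        else:
            status = "OK"
        rows.append((entity["@id"], status))
    return rows
```

&lt;p>Each returned pair corresponds to one row of a status table like the one shown above.&lt;/p>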
&lt;ol start="6">
&lt;li>The final step is to transform the command stored in &lt;code>compss_submission_command_line.txt&lt;/code>, and all the paths specified inside it, to match the local environment where the experiment will be reproduced. This includes:
&lt;ul>
&lt;li>Mapping the paths from the old machine to new paths inside the RO-Crate.&lt;/li>
&lt;li>Changing the runtime to &lt;code>runcompss&lt;/code> or &lt;code>enqueue_compss&lt;/code>, depending on whether the environment is a SLURM cluster.&lt;/li>
&lt;li>Detecting whether paths specified in the command line are for results, and redirecting them to the &lt;code>reproducibility_service_{timestamp}/Results&lt;/code> directory.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>After this, the service prompts the user to add any additional flags to the final command. Upon final verification, the command is executed via Python&amp;rsquo;s subprocess pipe.&lt;/li>
&lt;/ol>
&lt;p align="center">
&lt;img src="./end.png" alt="End Image" style="width: 50%; height: auto;">
&lt;/p>
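&lt;p>Steps 6 and 7 could look roughly like the sketch below. It rests on assumptions that are not taken from the service itself: a SLURM cluster is detected by checking whether &lt;code>sbatch&lt;/code> is on the &lt;code>PATH&lt;/code>, and the function names and log file names are hypothetical.&lt;/p>

```python
import os
import shlex
import shutil
import subprocess


def pick_runtime(which=shutil.which):
    """Return enqueue_compss on a SLURM cluster and runcompss elsewhere.

    Assumption: a machine counts as a SLURM cluster when sbatch is on PATH.
    """
    return "enqueue_compss" if which("sbatch") else "runcompss"


def build_command(runtime, user_flags, app_args):
    """Join the runtime, the flags typed by the user, and the remapped arguments."""
    return [runtime, *shlex.split(user_flags), *app_args]


def run_and_log(command, log_dir):
    """Run the command, saving stdout and stderr under log_dir (hypothetical file names)."""
    os.makedirs(log_dir, exist_ok=True)
    out_path = os.path.join(log_dir, "std_out.log")
    err_path = os.path.join(log_dir, "std_err.log")
    with open(out_path, "w") as out, open(err_path, "w") as err:
        completed = subprocess.run(command, stdout=out, stderr=err, check=False)
    return completed.returncode
```

&lt;p>Passing the command as a list rather than a single shell string avoids quoting problems when paths contain spaces.&lt;/p>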
&lt;ul>
&lt;li>&lt;strong>Logging System&lt;/strong>: All logs related to the Reproducibility Service are stored inside &lt;code>reproducibility_service_{timestamp}/log&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>You can view the basic &lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service/blob/main/pseudocode.txt" target="_blank" rel="noopener">pseudocode&lt;/a> of the service.&lt;/p>
&lt;h2 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h2>
&lt;p>It&amp;rsquo;s been a long journey since I started this project, and now it&amp;rsquo;s finally coming to an end. I have learned a lot from this experience; from weekly meetings with my mentor to working towards long-term goals, it has all been thrilling. I would like to thank the OSRE community and my mentor for providing me with this learning opportunity.&lt;/p>
&lt;p>This is only version 1.0.0 of the Reproducibility Service. If my coursework leaves me time, I would like to fix any bugs and improve the service further to meet user needs.&lt;/p>
&lt;p>However, the following issues still exist with the service and can be improved upon:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Third-party software dependencies&lt;/strong>: Automatic detection and loading of these dependencies on a SLURM cluster are not yet implemented. Currently, these must be handled manually by the user.&lt;/li>
&lt;li>&lt;strong>Support for workflows with &lt;code>data_persistence = False&lt;/code>&lt;/strong>: There is no support for workflows where all datasets are remote files.&lt;/li>
&lt;/ul>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service" target="_blank" rel="noopener">Reproducibility Service Repository&lt;/a>: This repository contains the main service along with guidelines on how to use it. The service will be integrated with the COMPSs official distribution in its next release.&lt;/li>
&lt;li>&lt;a href="https://www.chameleoncloud.org/appliances/121/" target="_blank" rel="noopener">Chameleon Appliance&lt;/a> : This is a single-node appliance with COMPSs 3.3.1 installed, so that anyone with access to Chameleon can reproduce experiments.&lt;/li>
&lt;/ul>
&lt;h2 id="previous-blogs">Previous Blogs&lt;/h2>
&lt;p>Make sure to check out my other blogs to see how I started this project and the challenges I faced along the way:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/">First blog&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/">Mid-term blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Thank you for reading the blog, have a nice day!!&lt;/p></description></item><item><title>Mid-term Blog: Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/</link><pubDate>Mon, 29 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone,&lt;/p>
&lt;p>I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University (IIT BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the project:&lt;/h2>
&lt;p>The project proposes to create a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>It has been more than six weeks since the ReproducibilityService project began, and significant progress has been made. You can test the actual service from my GitHub repository: &lt;a href="https://github.com/Minimega12121/COMPSs-Reproducibility-Service" target="_blank" rel="noopener">ReproducibilityService&lt;/a>. Let&amp;rsquo;s break down what the ReproducibilityService is capable of doing now:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Support for Reproducing Basic COMPSs Experiments&lt;/strong>: The RS program is now fully capable of reproducing basic COMPSs experiments with no third-party dependencies on any device with the COMPSs Runtime installed. Here&amp;rsquo;s how it works:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Getting the Crate&lt;/strong>: The RS program can accept the COMPSs workflow from the user either as a path to the crate or as a link from WorkflowHub. In either case, it creates a sub-directory for further execution named &lt;code>reproducibility_service_{timestamp}&lt;/code> and stores the workflow as &lt;code>reproducibility_service_{timestamp}/Workflow&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Address Mapping&lt;/strong>: The RO-Crate contains &lt;code>compss_submission_command_line.txt&lt;/code>, which holds the command originally used to execute the experiment. This command may include many paths, as in &lt;code>runcompss flag1 flag2 ... flagn &amp;lt;main_workflow_file.py&amp;gt; input1 input2 ... inputn output&lt;/code>. The RS program maps the paths for &lt;code>&amp;lt;main_workflow_file.py&amp;gt; input1 input2 ... inputn output&lt;/code> to paths on the machine where the experiment is being reproduced. The original flags are dropped because they may be device-specific, and the service asks the user for any new flags to pass to the COMPSs runtime.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Verifying Files&lt;/strong>: Before reproducing an experiment, it&amp;rsquo;s crucial to check whether the inputs or outputs have been tampered with. The RS program cross-verifies the &lt;code>contentSize&lt;/code> from the &lt;code>ro-crate-metadata.json&lt;/code> and generates warnings in case of any abnormalities.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Error Logging&lt;/strong>: In case of any problems during execution, the &lt;code>std_out&lt;/code> and &lt;code>std_err&lt;/code> are stored inside &lt;code>reproducibility_service_{timestamp}/log&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Results&lt;/strong>: If the experiment generates any results, the RS program stores them inside &lt;code>reproducibility_service_{timestamp}/Results&lt;/code>. If the user
also asks for the provenance of the workflow, the RO-Crate generated by that run is stored here as well.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
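&lt;p>The address mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, files are matched to their new location by file name alone, and flags are assumed to be single tokens starting with &lt;code>-&lt;/code> (a flag written as two tokens would leave its value behind).&lt;/p>

```python
import os
import shlex


def index_crate(crate_dir):
    """Map each file name found under crate_dir to its full path."""
    index = {}
    for root, _dirs, files in os.walk(crate_dir):
        for name in files:
            index.setdefault(name, os.path.join(root, name))
    return index


def remap_command(original_command, crate_dir):
    """Drop the old runtime and flags, then point known files at the crate."""
    tokens = shlex.split(original_command)
    index = index_crate(crate_dir)
    remapped = []
    for token in tokens[1:]:  # tokens[0] is the original runtime
        if token.startswith("-"):
            continue  # flags may be device-specific, so they are dropped
        name = os.path.basename(token.rstrip("/"))
        # Tokens that match no file in the crate are kept unchanged.
        remapped.append(index.get(name, token))
    return remapped
```

&lt;p>Arguments that are not files in the crate, such as numeric parameters or output locations, pass through untouched in this sketch.&lt;/p>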
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="REPRODUCIBILITY SERVICE FLOWCHART" srcset="
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_4df9e9a771513277aaf5c7a4d8182666.webp 400w,
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_0b96071409b70d8356241465bf214510.webp 760w,
/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240729-architd/RS_chart_hu1a952b7a4697c53cd74822153911f260_56808_4df9e9a771513277aaf5c7a4d8182666.webp"
width="760"
height="267"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ol start="2">
&lt;li>&lt;strong>Support for Reproducing Remote Datasets&lt;/strong>: If a remote dataset is specified inside the metadata file, the RS program fetches the dataset from the specified link using &lt;code>wget&lt;/code>, stores the remote dataset inside the crate, and updates the path in the new command line it generates.&lt;/li>
&lt;/ol>
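&lt;p>A remote dataset fetch along these lines might look like the sketch below. The function names are hypothetical, and the local file name is assumed to be the last component of the URL path; only the use of &lt;code>wget&lt;/code> comes from the description above.&lt;/p>

```python
import os
import subprocess
from urllib.parse import urlparse


def wget_command(url, dest_dir):
    """Build the wget invocation that saves url inside dest_dir."""
    return ["wget", "-q", "-P", dest_dir, url]


def local_path_for(url, dest_dir):
    """Path the downloaded file is expected to have (assumes a plain file URL)."""
    return os.path.join(dest_dir, os.path.basename(urlparse(url).path))


def fetch_remote_dataset(url, dest_dir):
    """Download url into dest_dir and return the path to use in the new command."""
    os.makedirs(dest_dir, exist_ok=True)
    subprocess.run(wget_command(url, dest_dir), check=True)
    return local_path_for(url, dest_dir)
```

&lt;p>The returned path is what would replace the remote reference in the regenerated command line.&lt;/p>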
&lt;h2 id="challenges-and-end-term-goals">Challenges and End-Term Goals&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Support for DATA_PERSISTENCE_FALSE&lt;/strong>: The RS program still needs to support crates with &lt;code>dataPersistence&lt;/code> set to false. After weeks of brainstorming, we recently concluded that, since the majority of &lt;code>DATA_PERSISTENCE_FALSE&lt;/code> crates are run on SLURM clusters and the datasets they need are located somewhere inside the cluster, the RS program will support this case for such clusters. Currently, I am working with the Nord3v2 cluster to further enhance the functionality of ReproducibilityService.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chameleon Cluster Setup&lt;/strong>: I have made some progress towards creating a new COMPSs 3.3 Appliance on Chameleon to test the service. However, creating the cluster setup script needed for the service to run on a COMPSs 3.3.1 cluster to execute large experiments has been challenging.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Integrating with COMPSs Repository&lt;/strong>: After completing the support for &lt;code>dataPersistence&lt;/code> false cases, we aim to launch this service as a tool inside the &lt;a href="https://github.com/bsc-wdc/compss" target="_blank" rel="noopener">COMPSs repository&lt;/a>. This will be a significant milestone in my developer journey as it will be the first real-world project I have worked on, and I hope everything goes smoothly.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Stay tuned for the next blog!!&lt;/p></description></item><item><title>Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/intel/20240612-architd/</guid><description>&lt;p>Hello everyone
I&amp;rsquo;m Archit from India, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University (IIT BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/">Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1qY-uipQZPox144LD4bs05rn3islfcjky/view" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>, aims to develop a service that facilitates the automated replication of COMPSs experiments within the Chameleon infrastructure.&lt;/p>
&lt;h2 id="about-the-project">About the project:&lt;/h2>
&lt;p>The project proposes to create a service that can take a COMPSs crate (an artifact adhering to the RO-Crate specification) and, through analysis of the provided metadata, construct a Chameleon-compatible image for replicating the experiment on the testbed.&lt;/p>
&lt;h2 id="how-it-all-started">How it all started&lt;/h2>
&lt;p>This journey began amidst our college&amp;rsquo;s cultural fest, in which I was participating, just 15 days before the proposal submission deadline. Many of my friends had been working for months to get selected for GSoC. I didn’t think I could participate this year because I was late, so I thought, &amp;ldquo;Better luck next year.&amp;rdquo; But during the fest, I kept hearing about UC OSPO and that a senior had been selected within a month. So, I was in my room when my friend told me, &amp;ldquo;What&amp;rsquo;s the worst that can happen? Just apply,&amp;rdquo; and so I did. I chose this project and wrote my introduction in Slack without knowing much. After that, it&amp;rsquo;s history. I worked really hard for the next 10 days learning about the project, making the proposal, and got selected.&lt;/p>
&lt;h2 id="first-few-weeks">First few weeks:&lt;/h2>
&lt;p>I started the project a week early from June 24, and it’s been two weeks since. The start was a bit challenging since it required setting up a lot of things on my local machine. For the past few weeks, the majority of my time has been dedicated to learning about COMPSs, RO-Crate, and Chameleon, the three technologies this project revolves around. The interaction with my mentor has also been great. From the weekly report meetings to my daily bombardment of questions, he has been really helpful.
It is my first time working with Chameleon or any cloud computing platform, so it can be a bit overwhelming sometimes, but it is getting better with practice.&lt;/p>
&lt;p>Stay tuned for progress in the next blog!!&lt;/p></description></item><item><title>Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/</link><pubDate>Mon, 19 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/bsc/ro-crate-compss/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Provenance, reproducibility, standards, image creation&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, JSON, Bash scripting, Linux, image creation and deployment&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/raul-sirvent/">Raül Sirvent&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>The &lt;a href="https://compss.bsc.es/" target="_blank" rel="noopener">COMPSs programming model&lt;/a> provides an interface for the programming of a
sequential application that is transformed in a workflow that, thanks to the COMPSs runtime, is later
scheduled in the available computing resources. Programming is enabled for different languages through
the use of bindings: Java, C/C++ and Python (named PyCOMPSs).
COMPSs is able to generate &lt;a href="https://compss-doc.readthedocs.io/en/stable/Sections/05_Tools/04_Workflow_Provenance.html" target="_blank" rel="noopener">Workflow Provenance information&lt;/a>
after the execution of an experiment. The generated artifact (code + data + recorded metadata)
enables the sharing of results through the use of tools such as the &lt;a href="https://workflowhub.eu/" target="_blank" rel="noopener">WorkflowHub portal&lt;/a>,
that provides the capacity of generating a DOI of the results to include them as permanent references
in scientific papers.&lt;/p>
&lt;p>The format of the metadata generated in COMPSs experiments follows the &lt;a href="https://www.researchobject.org/ro-crate/" target="_blank" rel="noopener">RO-Crate specification&lt;/a>,
and, more specifically, two &lt;a href="https://www.researchobject.org/ro-crate/profiles.html" target="_blank" rel="noopener">profiles&lt;/a>:
the Workflow and Workflow Run Crate profiles. This metadata enables not only the sharing of results, but also their
reproducibility.&lt;/p>
&lt;p>This project proposes the creation of a service that enables the automatic reproducibility of COMPSs experiments
in the Chameleon infrastructure. The service will be able to take a COMPSs crate (an artifact that follows the RO-Crate
specification) and, by parsing the available metadata, build a Chameleon-compatible image for reproducing the
experiment on the testbed. Small modifications to the COMPSs RO-Crate are foreseen (i.e. the inclusion of third-party
software required by the application).&lt;/p>
&lt;p>&lt;strong>Project Deliverables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Study the different environments and specifications (COMPSs, RO-Crate, Chameleon, Trovi, &amp;hellip;).&lt;/li>
&lt;li>Design the most appropriate integration, considering all the elements involved.&lt;/li>
&lt;li>Integrate the reproducibility of basic PyCOMPSs experiments in Chameleon.&lt;/li>
&lt;li>Integrate the reproducibility of complex PyCOMPSs experiments in Chameleon (i.e. those with third-party software dependencies).&lt;/li>
&lt;/ul></description></item></channel></rss>