<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Jesse Lima | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jesse-lima/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jesse-lima/index.xml" rel="self" type="application/rss+xml"/><description>Jesse Lima</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jesse-lima/avatar_hu806a0d82385bb8c23201bf9c6584a8d4_450987_270x270_fill_q75_lanczos_center.jpg</url><title>Jesse Lima</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jesse-lima/</link></image><item><title>noWorkflow as an experiment management tool - Final Report</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230914-jesselima/</link><pubDate>Thu, 14 Sep 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230914-jesselima/</guid><description>&lt;p>This post describes our final work status and the achievements we
have made in our project
&lt;a href="https://docs.google.com/document/d/1YMtPjZXcgt5eplyxIgQE8IBpQIiRlB9eqVSQiIPhXNU/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>
for
&lt;a href="https://ospo.ucsc.edu/project/osre23/nyu/noworkflow" target="_blank" rel="noopener">noWorkflow&lt;/a>.&lt;/p>
&lt;p>For a friendlier introduction to our work, please refer to this
&lt;a href="https://github.com/jaglima/noworkflow_usecase/blob/main/README.md" target="_blank" rel="noopener">tutorial&lt;/a>.&lt;/p>
&lt;p>Our final code to merge is available in &lt;a href="https://github.com/jaglima/noworkflow/tree/sor_features" target="_blank" rel="noopener">this repository&lt;/a>.&lt;/p>
&lt;h2 id="different-ways-of-managing-experiments">Different ways of managing experiments&lt;/h2>
&lt;p>From our starting point at the midterm, and from our initial aspirations
for the SoR, we kept on track with the goal of adding features to
noWorkflow related to managing DS/ML experimental setups focusing on
reproducibility.&lt;/p>
&lt;p>With the emergence of AI across multiple fields in industry and
academia, the subject of reproducibility has become increasingly
relevant. In [1] we find an
interesting description of the sources of irreproducibility in Machine
Learning. All these sources are present at different stages of a
project's experimental phases and may even persist in production
environments, leading to the accumulation of technical debt
[2]. The problem of
irreproducibility is also discussed in [[3],
[4]], which point out that the
velocity of deliverables usually comes at the expense of
reproducibility, among other casualties.&lt;/p>
&lt;p>The CRISP-DM process, as reviewed in
[5], demonstrates that Data
Science experiments follow a typical path of execution. In the same
manner, [[3], [6],
[7]] point out that
Machine Learning pipelines are composed of well-defined layers (or
stages) throughout their lifecycle. The emergence of AI in real-world
applications has stressed the almost artisanal ways of creating and managing
analytical experiments and reinforced that there is room to make things
more efficient.&lt;/p>
&lt;p>In the search for possible approaches to the problem, we came across
several projects that aim to address these issues. Not surprisingly,
multiple authors pursue the same goal, for instance [[9],
[10]]. In these references,
and confirmed in our survey, we found everything from solutions targeting
specific steps in modeling to services aiming for end-to-end AIOps
management. Some are available as software packages, others as SaaS in
cloud environments. In general terms, all of them end up offering
features in different layers of the workflow (i.e., data, feature,
scoring, and evaluation) or with different conceptualizations of
reproducibility/replicability/repeatability, as noted by
[11]. On one hand, this lack of
standards makes any assessment difficult. On the other hand, it suggests
a community in an exploratory phase of a hot topic.&lt;/p>
&lt;p>Specifically for this project, our focus is on the initial stages of
computational scientific experiments. As studied in [8], in this
phase experiments are i) implemented by people as prototypes, ii) with
minor focus on pipeline design, and iii) in tools like Notebooks, which
mix documentation, visualization, and code with no required sequential
structure. These three practices impact reproducibility and efficiency
and are prone to create technical debt. However, tools like noWorkflow
show huge potential in such scenarios. noWorkflow is promising because it
i) demands minimal setup to be functional, ii) works well with almost
nonexistent workflows, iii) requires minimal intrusive code alongside
the experimental code, and iv) integrates well with Notebooks, which are
the typical artifact in these experiments.&lt;/p>
&lt;p>According to its core team, the primary goal of noWorkflow is to
&amp;quot;...allow scientists to benefit from provenance data analysis even
when they don't use a workflow system.&amp;quot;. Unlike other tools,
&amp;quot;noWorkflow captures provenance from Python scripts without needing a
version control system or any other environment&amp;quot;. It is particularly
interesting when we are in the scenario described above, where we lack
any structured system at the beginning of experiments. In fact, after
going through the docs, we can verify that noWorkflow provides:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Command-line accessibility&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Seamless integration with Jupyter Notebooks&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Minimal setup requirements in your environment&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Elimination of the need for virtual machines or containers in its
setup&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Workflow-free operation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Open source license&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Framework-agnostic position&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Finally, our research confirmed that there is an open gap in the
management of scientific experiments that reproducibility tooling needs
to fill. Provenance tools can help academic and industry groups toward
this goal, and this summer we focused on adding relevant features to
push noWorkflow in this direction.&lt;/p>
&lt;h2 id="different-tools-for-different-needs">Different tools for different needs&lt;/h2>
&lt;p>In our research phase, we didn't find any taxonomy that fully
accommodated our review of the different categories of tools providing
reproducibility and experiment management. So we describe some
tools in the following categories (freely adapted from these online
references
&lt;a href="https://ml-ops.org/content/mlops-principles" target="_blank" rel="noopener">[here]&lt;/a> and
&lt;a href="https://ambiata.com/blog/2020-12-07-mlops-tools/" target="_blank" rel="noopener">[here]&lt;/a>):&lt;/p>
&lt;p>&lt;strong>Data and Pipeline Versioning&lt;/strong>: Platforms dealing with ingesting,
processing, and exposing features for model training and inference.
They enable collaboration and discoverability of existing Feature Sets
across teams and organizations, and provide provenance and lineage for
data at different levels of complexity.&lt;/p>
&lt;p>&lt;strong>Metadata Stores/Experiment Trackers&lt;/strong>: These are specifically built to
store metadata about ML experiments and expose it to stakeholders. They
help with debugging, comparing, and collaborating on experiments, and
can be divided into Experiment Trackers and Model Registries.
Moreover, some projects offer reproducibility features like
hyperparameter search, experiment versioning, etc. However, they demand
more robust workflows and are better suited for projects in the
production/monitoring phases.&lt;/p>
&lt;p>&lt;strong>Pipeline frameworks&lt;/strong>: They operate within the realm of production,
similar to Data Engineering workflows. Their usual goal is to allow any
ML/AI product to be served across a wide range of architectures,
integrating the low-hanging fruit along the way: for instance,
hyperparameter optimization tasks, experiment tracking
integrations, boilerplate containerized deployment, etc.&lt;/p>
&lt;p>&lt;strong>Deployment and Observability&lt;/strong>: They focus on deploying models for
real-time inference and monitoring model quality once they are deployed
in production. Their aim is to facilitate post-deployment control tasks
such as monitoring feature drift, conducting A/B testing, enabling
fast model swaps, and more.&lt;/p>
&lt;p>The most remarkable finding of this survey is that there are different
tools for different phases of the AI product life cycle. Tools
like DVC and Pachyderm are Metadata Stores, allowing
Experiment Tracking with variable-tagging features, as well as Data
and Pipeline tracking. They are the tools most similar to noWorkflow in
functionality. However, DVC has a more complex framework for
dealing with different 'types' of tags, and relies on command-line
tools to extract and analyze tagged variables. It also depends strongly
on Git and replicates Git's logic. Pachyderm requires a more
sophisticated setup at the start, relying on containers and a server,
which is an obstacle for small, lean prototypes: it requires installing
a Docker image, with all the friction of managing it.&lt;/p>
&lt;p>There are other tools, like MLflow and Neptune, that position themselves as
Model Experiment Versioning with Monitoring and Deployment features.
They also have elements of pipeline frameworks, offering full
integration and boilerplate for seamless use with cloud
platforms.&lt;/p>
&lt;p>Pipelines are a vast field. Examples include AWS SageMaker, Google Vertex AI,
DataRobot, and Weights &amp;amp; Biases, among others. All of them offer features
across every category, with a strong focus on exposing all the
automation that can be offered to the final user: automatic
parameter tuning, model selection, retraining, data lineage, metadata
storage, etc.&lt;/p>
&lt;p>Finally, Deployment and Observability frameworks live in the deployment
realm, another stage far removed from the prototypical phases of
experiments. They come into the scene when all experimental and
inferential processes are done and there is an AI artifact that needs
to be deployed and monitored. Tools like Seldon, H2O, and DataRobot do
this job, again with some features for hyperparameter tuning, pipeline
frameworks, and data and pipeline tracking.&lt;/p>
&lt;p>In light of this, when considering the management and operation of
experiments, we have a reduced sample of alternatives. Among them,
Notebook integration/management is rare. Some rely on other
tools like Git, or impose an overhead on coding/setup with reserved
keywords, tags, and managerial workflows that hinder the process.&lt;/p>
&lt;p>At first sight, our &amp;quot;informal&amp;quot; taxonomy positions noWorkflow as a
Data/Pipeline Versioning and Metadata Store/Experiment Tracker tool. It is
not a Pipeline Framework, which works like a building block facilitating
the integration of artifacts at production stages. Nor is it a
Deployment and Observability framework, since those live in the
post-deployment realm, another stage far removed from the
prototypical phases of experiments.&lt;/p>
&lt;h2 id="desiderata">Desiderata&lt;/h2>
&lt;p>As mentioned earlier, a typical workflow in DS/ML projects is well
described by CRISP-DM [5]
and precedes the deployment and production phases in the overall lifecycle
of DS/ML projects.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image1.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Fig 1: CRISP-DM example of trajectory through a data science project&lt;/p>
&lt;p>Briefly speaking, a workflow starts when a user creates a Jupyter
Notebook and starts writing code. Usually, they import or select
data from a source, explore the features expected to have the
highest inference potential, tune some parameters to set up
training, then train the model and evaluate its predictive power through
different metrics. At this final step, we have delineated a trial. The
trial result can suggest further improvements and new hypotheses about
data, features, model types, and hyperparameters. Then we have a new
experiment in mind that will result in a new trial.&lt;/p>
&lt;p>When this process repeats multiple times, a researcher may end up with
multiple notebooks, each storing a different experiment. Each
notebook has multiple hyperparameters, modeling choices, and modeling
hypotheses. Alternatively, the experimenter may have a single notebook where
different experiments were executed in a nonlinear order across the
cells. The latter case is pointed out in
[8], where Notebook flexibility
makes it difficult to understand which execution order resulted in a
specific output.&lt;/p>
&lt;p>Ideally, any researcher or team would benefit most if
they could:&lt;/p>
&lt;p>a) In a running Notebook, retrieve all the operations
that contributed to the result of a variable of interest. In this
case, modifications applied to the inputs or to the order of
operations would be easily detectable, as would any
nonlinear execution that interferes with a control result.&lt;/p>
&lt;p>b) Compare trials after different experiments. After experimenting with
different hypotheses about hyperparameters, features or operation
order, the user should easily compare the history of two trials and
spot differences.&lt;/p>
&lt;p>c) Retrieve a target variable across different trials executed
in the context of an experiment. After proceeding with multiple
experimental trials, users should be able to compare results
stored in different Notebooks (or even outside Notebooks).&lt;/p>
&lt;p>d) Be as &amp;quot;no workflow&amp;quot; as possible. All the former requisites
should be possible with minimal code intervention, tags, reserved
words, or any other active coding effort.&lt;/p>
&lt;p>With these goals in mind, we worked on our deliverables and used the
experiment carried out by [12]
as a guideline to validate the new noWorkflow features.&lt;/p>
&lt;h2 id="deliverables">Deliverables&lt;/h2>
&lt;p>In this section, we describe what we implemented during the
summer.&lt;/p>
&lt;p>We started with tagging cells and variables and then navigating through
their pre-dependencies, that is, all other variables and function calls that
contributed to their final values. This was a fundamental step that allowed
us to evolve toward features that are really useful in day-to-day
practice.&lt;/p>
&lt;p>From the features of tagging a cell and tagging a variable, we evolved
to the following features (an interactive notebook is available here):&lt;/p>
&lt;ul>
&lt;li>&lt;em>backwards_deps('var_name', granularity_level)&lt;/em> : returns a
dictionary storing the operations/function calls, and their associated
values, that contributed to the final value of the tagged variable.
granularity_level sets whether the internal operations of the
functions are included.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image5.png" alt="backwards_deps" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
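To make the idea concrete, here is a dependency-free toy sketch of the kind of mapping a backwards_deps-style lookup returns: every operation feeding the tagged variable, with its value. The recorder, operation names, and values are all made up for illustration and are not noWorkflow's actual implementation (which captures provenance transparently, without explicit `record` calls, and supports a granularity level this sketch omits).

```python
# Toy provenance recorder: each operation that executes is logged with its
# value, so we can later ask which operations fed a variable of interest.
ops_log = []

def record(op_name, value):
    ops_log.append((op_name, value))
    return value

# A tiny "experiment": every step is logged as it runs.
raw = record("load_data", [1, 2, 3])
scaled = record("scale", [x * 10 for x in raw])
score = record("evaluate", sum(scaled) / len(scaled))

# A backwards_deps('score')-style result would be shaped like this:
deps = {op: value for op, value in ops_log}
```

In the real feature no explicit logging is needed; noWorkflow collects these operation/value pairs from the notebook's captured provenance.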
&lt;ul>
&lt;li>
&lt;p>&lt;em>global_backwards_deps('var_name', granularity_level)&lt;/em> : does the
same as backwards_deps, but across all tagging and
re-tagging events in the notebook. It allows retrieval of the
complete operation history of a tagged variable across all executed cells in
the notebook.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>store_operations(trial_id, dictionary_ops)&lt;/em> : saves the current
trial in order to make further comparisons with other experiments.
The dictionaries aren't stored in &lt;em>.noworkflow/db.sqlite&lt;/em>, but
in a shelve object named &lt;em>ops.db&lt;/em> in the notebook's local
folder.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>resume_trials()&lt;/em> : to support experiment management, the
user can list the trial_ids of all experiments stored in ops.db
that are available for comparison/analysis.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>trial_intersection_diff(trial_id1, trial_id2)&lt;/em> : all mutual
variables/function_calls between two experiments have their scalar
values compared.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image2.png" alt="trial_intersection_diff" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
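The shelve-backed storage described above can be sketched in plain Python. The function bodies, trial ids, and tag names below are illustrative stand-ins, not the code we merged; the sketch only shows the mechanics of keeping trial dictionaries in an ops.db shelve file and intersecting two trials' scalar values.

```python
import os
import shelve
import tempfile

# Trials live in a shelve file (like ops.db), keyed by trial id.
db_path = os.path.join(tempfile.mkdtemp(), "ops.db")

def store_operations(trial_id, dictionary_ops):
    # Persist one trial's operations/values for later comparison.
    with shelve.open(db_path) as db:
        db[str(trial_id)] = dictionary_ops

def resume_trials():
    # List the trial ids available for comparison/analysis.
    with shelve.open(db_path) as db:
        return sorted(db.keys())

def trial_intersection_diff(id1, id2):
    # Pair up the values of variables common to both trials.
    with shelve.open(db_path) as db:
        a, b = db[str(id1)], db[str(id2)]
    return {k: (a[k], b[k]) for k in a if k in b}

store_operations(1, {"rf_score": 0.91, "pca_components": 3})
store_operations(2, {"rf_score": 0.87, "pca_components": 3})
diff = trial_intersection_diff(1, 2)
```

Shelve keeps the setup workflow-free: no database server, just a file next to the notebook.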
&lt;ul>
&lt;li>&lt;em>trial_diff(trial_id1, trial_id2)&lt;/em> : the values of variables and
function calls are exhibited in a diff file format, emphasizing the
order of operations. The goal is to show where the order of operations
differed between the two experiments. Again, only
scalar values are exhibited; more complex data structures (matrices,
vectors, tensors, etc.) are only signaled as &lt;em>'complex_type'&lt;/em>.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image3.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
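Conceptually, trial_diff can be approximated with the standard library's difflib: render each trial's ordered operation/value pairs as text lines, collapse non-scalar values to the 'complex_type' placeholder, and take a unified diff. The trial contents below are invented for illustration; this is a sketch of the idea, not the merged implementation.

```python
import difflib

def render(trial):
    # One text line per operation; non-scalars become 'complex_type'.
    lines = []
    for op, value in trial:
        shown = value if isinstance(value, (int, float, str)) else "complex_type"
        lines.append(f"{op} = {shown}")
    return lines

# Two hypothetical trials with a changed parameter and reordered steps.
trial_a = [("pca_components", 3), ("X_train", [[0.1, 0.2]]), ("rf_score", 0.91)]
trial_b = [("X_train", [[0.1, 0.2]]), ("pca_components", 5), ("rf_score", 0.87)]

diff = list(difflib.unified_diff(render(trial_a), render(trial_b), lineterm=""))
```

The resulting diff lines make both the parameter change and the reordering visible at a glance.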
&lt;ul>
&lt;li>&lt;em>var_tag_plot('var_name')&lt;/em> : charts the evolution of a given
variable across multiple trials in the database. In this case, all
experiments stored in ops.db with a variable tagged as &lt;em>target_var&lt;/em>
have their values plotted.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image4.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/blockquote>
&lt;ul>
&lt;li>&lt;em>var_tag_values('var_name')&lt;/em> : provides access to a pandas DataFrame
of var_name entries with their corresponding values across different trials.&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./images/media/image6.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
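The last two features share one underlying step: for a given tag, gather its value from every stored trial. A dependency-free sketch of that step follows (the real var_tag_values returns a pandas DataFrame, and var_tag_plot charts it; the trials and tag names here are made up):

```python
# Hypothetical store of trials, as dictionaries of tagged values.
trials = {
    1: {"rf_score": 0.91, "pca_components": 3},
    2: {"rf_score": 0.87, "pca_components": 5},
    3: {"rf_score": 0.93, "pca_components": 4},
}

def var_tag_values(tag):
    # One row per trial that defines the tag, in trial-id order.
    return [
        {"trial_id": tid, tag: ops[tag]}
        for tid, ops in sorted(trials.items())
        if tag in ops
    ]

rows = var_tag_values("rf_score")
```

var_tag_plot would then simply chart these rows: trial_id on one axis, the tagged value on the other.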
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>As expected, we had unexpected findings along the way. Below, we
delve into the most significant challenges we faced:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Jupyter notebooks allow nonlinear execution of small pieces of code
through cells. More than once, we had to align on how to design
functionality to handle unexpected scenarios. One example was the
backwards_deps() and global_backwards_deps() functions: the latter
was born to cover the case where the user wants all dependencies
rather than only the local cell dependencies.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Despite the high quality of the current version of the package, the
project lacks documentation, which slows down the analysis of any
new development. In this project, the aid of the mentors was crucial at
points where deeper knowledge was needed.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What is noWorkflow's vocation? At some points in the project,
we had to discuss forcing some kind of workflow on the user, which
would go against the philosophy of the project.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When working on comparing results, especially in DS/ML fields,
complex types arise. Numerical vectors, matrices, and tensors from
NumPy and other frameworks, as well as data frames, can't be
properly manipulated based on our current approach.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The dilemma of focusing on graphic visual features versus more
sophisticated APIs. More than once, we needed to choose between
making a visual add-on to Jupyter or implementing a more complete
API.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The current version of Jupyter support in noWorkflow doesn&amp;rsquo;t
integrate well with JupyterLab. Moreover, IPython itself keeps
releasing new versions, and noWorkflow needs to adapt to them.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="future-improvements">Future Improvements&lt;/h2>
&lt;p>Given our current achievements and the insights gained along the
project, we would highlight the following points as crucial future
roadmap improvements:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Add complex-type treatment for comparisons. Today, visualizing and
navigating through matrices, data frames, and tensors isn't possible
with noWorkflow, although users can do so by their own means.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Migrate the dictionaries storing sequences of operations from
shelve objects to a more efficient storage and retrieval mechanism.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make it easier for users to manage (store, retrieve, and navigate)
through different trials.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add graphical management instead of relying upon API calls only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Evolve the feature of tagging cells.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When tagging a model, save its binary representation to be recovered
in the future.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add the capability of tracking local dataset reads.
Currently, it is possible to track changes in the name/path of the
dataset; however, modifications to the integrity of a dataset are
not traceable.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="what-ive-learned">What I've learned&lt;/h2>
&lt;p>This was a great summer with two personal discoveries. The first was
my first formal contact with the subject of Reproducibility. The second was
fully contributing to an Open Source project. In the research phase,
I got in touch with the state of the art of reproducibility
research and some of its nuances. In the Open Source contributing
experience, I was mentored by the core team of noWorkflow and
exercised all the skills required to build a high-quality software product.&lt;/p>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>I would like to thank the Summer of Reproducibility organization for
providing this wonderful opportunity for interested people to engage with
Open Source software. Thanks also to the core team of noWorkflow for
supporting me in this work.&lt;/p>
&lt;h2 id="bibliography">Bibliography&lt;/h2>
&lt;p>[1] [O. E. Gundersen, K. Coakley, C. Kirkpatrick, and Y. Gil, &amp;ldquo;Sources
of irreproducibility in machine learning: A review,&amp;rdquo; &lt;em>arXiv preprint
arXiv:2204.07610&lt;/em>.]&lt;/p>
&lt;p>[2] [D. Sculley &lt;em>et al.&lt;/em>, &amp;ldquo;Machine Learning: The High Interest Credit
Card of Technical Debt,&amp;rdquo; in &lt;em>SE4ML: Software Engineering for Machine
Learning (NIPS 2014 Workshop)&lt;/em>,
2014.]&lt;/p>
&lt;p>[3] [P. Sugimura and F. Hartl, &amp;ldquo;Building a reproducible machine
learning pipeline,&amp;rdquo; &lt;em>arXiv preprint arXiv:1810.04570&lt;/em>,
2018.]&lt;/p>
&lt;p>[4] [D. Sculley &lt;em>et al.&lt;/em>, &amp;ldquo;Hidden technical debt in machine learning
systems,&amp;rdquo; &lt;em>Adv. Neural Inf. Process. Syst.&lt;/em>, vol. 28,
2015.]&lt;/p>
&lt;p>[5] [F. Martínez-Plumed &lt;em>et al.&lt;/em>, &amp;ldquo;CRISP-DM twenty years later: From
data mining processes to data science trajectories,&amp;rdquo; &lt;em>IEEE Trans. Knowl.
Data Eng.&lt;/em>, vol. 33, no. 8, pp. 3048&amp;ndash;3061,
2019.]&lt;/p>
&lt;p>[6] [N. A. Lynnerup, L. Nolling, R. Hasle, and J. Hallam, &amp;ldquo;A Survey on
Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on
Real-World Robots,&amp;rdquo; in &lt;em>Proceedings of the Conference on Robot
Learning&lt;/em>, L. P. Kaelbling, D. Kragic, and K. Sugiura, Eds., in
Proceedings of Machine Learning Research, vol. 100. PMLR, 30 Oct--01
Nov 2020, pp. 466&amp;ndash;489.]&lt;/p>
&lt;p>[7] [A. Masood, A. Hashmi, A. Masood, and A. Hashmi, &amp;ldquo;AIOps:
predictive analytics &amp;amp; machine learning in operations,&amp;rdquo; &lt;em>Cognitive
Computing Recipes: Artificial Intelligence Solutions Using Microsoft
Cognitive Services and TensorFlow&lt;/em>, pp. 359&amp;ndash;382,
2019.]&lt;/p>
&lt;p>[8] [J. F. Pimentel, L. Murta, V. Braganholo, and J. Freire,
&amp;ldquo;Understanding and improving the quality and reproducibility of Jupyter
notebooks,&amp;rdquo; &lt;em>Empirical Software Engineering&lt;/em>, vol. 26, no. 4, p. 65,
2021.]&lt;/p>
&lt;p>[9] [D. Kreuzberger, N. Kühl, and S. Hirschl, &amp;ldquo;Machine Learning
Operations (MLOps): Overview, Definition, and Architecture,&amp;rdquo; &lt;em>IEEE
Access&lt;/em>, vol. 11, pp. 31866&amp;ndash;31879,
2023.]&lt;/p>
&lt;p>[10] [N. Hewage and D. Meedeniya, &amp;ldquo;Machine learning operations: A
survey on MLOps tool support,&amp;rdquo; &lt;em>arXiv preprint arXiv:2202.10169&lt;/em>,
2022.]&lt;/p>
&lt;p>[11] [H. E. Plesser, &amp;ldquo;Reproducibility vs. replicability: a brief
history of a confused terminology,&amp;rdquo; &lt;em>Front. Neuroinform.&lt;/em>, vol. 11, p.
76, 2018.]&lt;/p>
&lt;p>[12] [Z. Salekshahrezaee, J. L. Leevy, and T. M. Khoshgoftaar, &amp;ldquo;The
effect of feature extraction and data sampling on credit card fraud
detection,&amp;rdquo; &lt;em>Journal of Big Data&lt;/em>, vol. 10, no. 1, pp. 1&amp;ndash;17,
2023.]&lt;/p></description></item><item><title>[Mid-term] Capturing provenance into Data Science/Machine Learning workflows</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230731-jesselima/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230731-jesselima/</guid><description>&lt;p>This post describes our midterm work status and some achievements we have made so far in &lt;a href="https://docs.google.com/document/d/1YMtPjZXcgt5eplyxIgQE8IBpQIiRlB9eqVSQiIPhXNU/edit#heading=h.nnxl1g16trg0" target="_blank" rel="noopener">the project&lt;/a> for the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/noworkflow/">noWorkflow&lt;/a> package.&lt;/p>
&lt;h4 id="the-initial-weeks">The initial weeks&lt;/h4>
&lt;p>I started doing a bibliographical review on reproducibility in the Data Science (DS) and Machine Learning (ML) realms. It was a new subject to me, and I aimed to build a more robust theoretical background in the field. Meanwhile, I took notes in &lt;a href="https://jaglima.github.io/" target="_blank" rel="noopener">this series of posts&lt;/a>.&lt;/p>
&lt;p>Then, as planned, I integrated with the current noWorkflow supporters in order to get a broader view of the project and their contributions. Additionally, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/juliana-freire/">Juliana Freire&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joao-felipe-pimentel/">João Felipe Pimentel&lt;/a>, and I set up a weekly one-hour schedule to keep track of my activities.&lt;/p>
&lt;h3 id="brainstormed-opportunities">Brainstormed opportunities&lt;/h3>
&lt;p>At the beginning of June, we also met with other project supporters to brainstorm about our initial proposal. From this meeting, we came up with a plan for how to technically approach a new noWorkflow feature for Data Science and Machine Learning experiment management.&lt;/p>
&lt;p>In this brainstorm, we aligned that &lt;em>Jupyter Notebooks are, by far, the most frequent setup in DS/ML computational experiments. They have established themselves as the fundamental artifact by embedding code and text and enabling execution and visualization. Entire experiments are created and kept in Jupyter notebooks until they are sent to production. The opportunity at hand is to integrate noWorkflow with Jupyter Notebooks&lt;/em>.
Our mid-term goal was then adapted from the original plan of only selecting and executing a prototypical ML experiment: we added the goal of paving the way for a tagging feature for Notebook cells.&lt;/p>
&lt;p>More specifically, DS/ML experimental workflows usually have well-defined stages composed of &lt;em>data reading&lt;/em>, &lt;em>feature engineering&lt;/em>, &lt;em>model scoring&lt;/em>, and &lt;em>metrics evaluation&lt;/em>. In our dream space, the user would tag a cell in their experiment, enabling the capture of the tagged metadata into a database. This step integrates the ultimate goal of facilitating comparisons, management, and even causal inference across different trials of a DS/ML experiment.&lt;/p>
&lt;h3 id="current-deliverables">Current deliverables&lt;/h3>
&lt;p>So, based on our plans, we created a separate table to store the metadata from cell tagging. This table stores the cell hash codes and information to match the code executed within a cell. As a result, we can store tags and the activation ids of the cells, enabling us to identify the cell containing a given stage of a DS/ML experiment.&lt;/p>
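To picture what such a table holds, here is a hypothetical sketch: the table and column names (cell_tags, tag, cell_hash, activation_id) are illustrative guesses at the shape of the metadata, not the actual schema we added to noWorkflow.

```python
import sqlite3

# Illustrative cell-tagging metadata table: a tag is linked to the cell's
# hash code and the activation id of the execution that ran it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell_tags ("
    " tag TEXT, cell_hash TEXT, activation_id INTEGER)"
)
conn.execute(
    "INSERT INTO cell_tags VALUES ('feature_engineering', 'a1b2c3', 42)"
)

# Looking up a tag recovers which cell (and which execution) it marks.
row = conn.execute(
    "SELECT cell_hash, activation_id FROM cell_tags"
    " WHERE tag = 'feature_engineering'"
).fetchone()
```

With the hash and activation id in hand, the tagged cell's captured provenance can be matched back to a stage of the experiment.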
&lt;p>The second feature implemented was tagging a specific variable. As with cells, it is now possible to stamp a given variable with a tag, keeping its name, id, and received value in the same separate table.&lt;/p>
&lt;p>Finally, we worked on displaying the dependencies of a given variable. In this case, by tagging a given variable, we can display the other variables, values, and cells activated in its construction. Then, we can visualize the dependencies that contributed to its final value.&lt;/p>
&lt;p>For an overview of current developments, please refer to my &lt;a href="https://github.com/jaglima/noworkflow/tree/stage_tagging" target="_blank" rel="noopener">fork of the main project&lt;/a>.&lt;/p>
&lt;h3 id="challenges">Challenges&lt;/h3>
&lt;p>During this period, we had to make choices along the way. For instance, capturing the provenance of cells through tags is a different solution from tagging code chunks in scripts; we decided to stick with tagging Notebook cells for now. We also opted to start by storing the metadata that enables comparisons between trials rather than focusing on a sophisticated, user-friendly graphical cell-tagging system, and to keep this metadata stored in a separate table in the database.&lt;/p>
&lt;h3 id="next-steps">Next steps&lt;/h3>
&lt;p>In the second half of the summer, our goal is to integrate these features in order to proceed with comparisons among experiments. Such comparisons would use the tagged variables as the hyperparameters of DS/ML experiments or key variables to assess the experiments, such as errors or scores. As a result, we will be able to compare the results of two trials in a more accurate, and easily reproducible experiment.&lt;/p></description></item><item><title>Verify the reproducibility of an experiment</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230524-jesselima/</link><pubDate>Wed, 24 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/nyu/noworkflow/20230524-jesselima/</guid><description>&lt;p>Hello everyone,
my name is Jesse and I&amp;rsquo;m proud to be a fellow in this 2023 Summer of Reproducibility program, contributing to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/nyu/noworkflow">noWorkflow&lt;/a> project.&lt;/p>
&lt;p>My &lt;a href="https://docs.google.com/document/d/1YMtPjZXcgt5eplyxIgQE8IBpQIiRlB9eqVSQiIPhXNU/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> was accepted under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joao-felipe-pimentel/">João Felipe Pimentel&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/juliana-freire/">Juliana Freire&lt;/a> and aims to
map and test the capture of provenance in typical Data Science and Machine Learning experiments.&lt;/p>
&lt;h4 id="what">What&amp;hellip;&lt;/h4>
&lt;p>Although much can be said about what reproducibility means, the ability to replicate results in day-to-day Data Science and Machine Learning experiments can pose a significant challenge for individuals, companies, and research centers. This challenge becomes even more pronounced with the emergence of analytics and AI, where scientific methodologies are extensively applied on an industrial scale. Reproducibility thus assumes a key role in the productivity and accountability expected from Data Scientists, Machine Learning Engineers, and other roles engaged in ML/AI projects.&lt;/p>
&lt;h4 id="how">How&amp;hellip;&lt;/h4>
&lt;p>In the day-to-day, the pitfalls of non-reproducibility appear at different points of the experiment lifecycle. These challenges arise when multiple experiments need to be managed for an individual or a team of scientists. In a typical experiment workflow, reproducibility appears in different steps of the process:&lt;/p>
&lt;ul>
&lt;li>The need to track the provenance of datasets.&lt;/li>
&lt;li>The need to manage changes in hypothesis tests.&lt;/li>
&lt;li>Addressing the management of system hardware and OS setups.&lt;/li>
&lt;li>Dealing with outputs from multiple experiments, including the results of various model trials.&lt;/li>
&lt;/ul>
&lt;p>In academic environments, these issues can result in mistakes and inaccuracies. In companies, they can lead to inefficiencies and technical debt that is difficult to address in the future.&lt;/p>
&lt;h4 id="finally">Finally&amp;hellip;&lt;/h4>
&lt;p>I believe this is a great opportunity to explore the convergence of these two hot topics, AI and reproducibility! I will share more updates here throughout the summer and hope we can learn a lot together!&lt;/p></description></item></channel></rss>