<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>OS | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/os/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/os/index.xml" rel="self" type="application/rss+xml"/><description>OS</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Wed, 05 Nov 2025 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>OS</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/os/</link></image><item><title>Final Report for Smart Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20251105-sam_huang/</link><pubDate>Wed, 05 Nov 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20251105-sam_huang/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Creating the software environment a piece of code needs to run is a significant challenge in software development. Given open-source software intended for research, setting up the dependencies required to run it can take significant manual effort, and existing automation methods struggle with the complexity of managing diverse languages, dependencies, and hardware. In Smart Environments, I created EnvAgent, a general multi-agent framework designed to automate the construction of executable environments for reproducing research prototypes from top-tier conferences and journals. While reproducibility has become a growing concern in the research community, environment setup remains time-consuming, error-prone, and often poorly documented.&lt;/p>
&lt;p>To assess this capability, a new benchmark, EnvBench, was created, containing 54 popular projects across seven languages. Results show that EnvAgent dramatically improves environment construction compared to current agents (+16.2%). The system also shows initial promise in dynamically adjusting cloud-based hardware resources to match the code’s needs.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="EnvGym Cover" srcset="
/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp 400w,
/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_d39b2369b5df80ffa715197c993f0681.webp 760w,
/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20251105-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp"
width="760"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="method">Method&lt;/h2>
&lt;h3 id="envagent">EnvAgent&lt;/h3>
&lt;p>The EnvAgent I created during my time at OSRE utilizes a multi-agent workflow to automatically build software execution environments. The process is structured into three phases: preparation, construction, and refinement.&lt;/p>
&lt;p>Phase 1 (Preparation): Specialized agents collect information about the software repository – its structure, relevant files, and the host system’s hardware specifications (CPU, memory, etc.). This data is then used by a planning agent to generate a detailed, step-by-step instruction set for creating a functional Dockerfile.&lt;/p>
&lt;p>Phase 2 (Construction): Two agents work in tandem: one generates or modifies the Dockerfile based on the plan, while the other executes the Dockerfile within an isolated container, capturing any errors.&lt;/p>
&lt;p>Phase 3 (Refinement): A final agent analyzes the container execution data, identifying areas for improvement in the Dockerfile. This process repeats until a stable, executable environment is achieved.&lt;/p>
&lt;p>To improve efficiency, EnvAgent incorporates rule-based tools for predictable tasks like directory setup and log management, reducing the need for complex agent reasoning. This combination of intelligent agents and automated routines (&amp;ldquo;scaffolding&amp;rdquo;) ensures a robust and adaptive system.&lt;/p>
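&lt;p>As a rough sketch, the three-phase loop can be modeled as a few cooperating functions. All helper names below (make_plan, write_dockerfile, run_in_container, refine_plan) are illustrative stand-ins, not EnvAgent&amp;rsquo;s actual API; in the real system these steps are LLM-driven agents:&lt;/p>

```python
def make_plan(repo_info, hardware):
    # Phase 1 (Preparation): produce a step-by-step plan. Here the plan is
    # deliberately incomplete so the refinement loop below has work to do.
    return ["WORKDIR /app", "COPY . /app"]

def write_dockerfile(plan):
    # Construction agent: render the plan as a Dockerfile.
    return "\n".join(plan)

def run_in_container(dockerfile):
    # Stand-in for executing the Dockerfile in an isolated container;
    # here we only check for a base image and report an error otherwise.
    ok = dockerfile.startswith("FROM ")
    return {"ok": ok, "errors": [] if ok else ["missing base image"]}

def refine_plan(plan, errors):
    # Phase 3 (Refinement): patch the plan based on captured errors.
    if "missing base image" in errors:
        return ["FROM python:3.11-slim"] + plan
    return plan

def build_environment(repo_info, hardware, max_rounds=20):
    plan = make_plan(repo_info, hardware)
    for _ in range(max_rounds):
        # Phase 2 (Construction): write the Dockerfile, then execute it.
        dockerfile = write_dockerfile(plan)
        result = run_in_container(dockerfile)
        if result["ok"]:
            return dockerfile
        plan = refine_plan(plan, result["errors"])
    return None
```

&lt;p>A real run would replace run_in_container with an actual docker build and let the refinement agent reason over the full build log rather than a fixed error string.&lt;/p>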
&lt;h3 id="enveval-benchmark">EnvEval Benchmark&lt;/h3>
&lt;p>In addition to the agent, a significant contribution is the manual curation of a benchmark that measures the quality of generated environments. EnvEval is a benchmark specifically designed to assess environment setup quality across 54 carefully curated open-source repositories, chosen from both Chameleon reproducible artifacts and the Multi-SWE-bench dataset. EnvEval contains JSON rubrics that can be used to automatically determine the quality of constructed environments.&lt;/p>
&lt;p>Each rubric is divided into three parts, corresponding to three major objectives that a successfully constructed environment should have:&lt;/p>
&lt;ol>
&lt;li>Structure: Checks for basic directory structure, file presence, and environment variables.&lt;/li>
&lt;li>Configuration: Asks &amp;ldquo;Is this configured?&amp;rdquo; and checks whether dependencies have been correctly configured.&lt;/li>
&lt;li>Functionality: Asks &amp;ldquo;Is this usable?&amp;rdquo; and runs actual tests to verify that the expected functionality is present.&lt;/li>
&lt;/ol>
&lt;p>There are many tests in each category, and their weights are adjusted based on their importance.&lt;/p>
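&lt;p>A minimal sketch of how such a weighted rubric could be evaluated; the category weights and check names below are invented for illustration and are not taken from EnvEval:&lt;/p>

```python
import json

# Hypothetical JSON rubric covering the three categories described above.
# Each check is recorded as 1 (passed) or 0 (failed).
rubric = json.loads("""
{
  "structure":     {"weight": 0.2, "checks": {"src dir exists": 1, "env vars set": 0}},
  "configuration": {"weight": 0.3, "checks": {"deps installed": 1}},
  "functionality": {"weight": 0.5, "checks": {"unit tests pass": 1, "demo runs": 1}}
}
""")

def score(rubric):
    # Weighted average of per-category pass rates, scaled to 0-100.
    total = 0.0
    for category in rubric.values():
        checks = category["checks"]
        passed = sum(checks.values()) / len(checks)
        total += category["weight"] * passed
    return round(100 * total, 2)
```

&lt;p>Keeping the rubric as plain JSON means the grading step is fully automatic and reproducible across runs.&lt;/p>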
&lt;h2 id="evaluation">Evaluation&lt;/h2>
&lt;h3 id="baseline-systems">Baseline Systems&lt;/h3>
&lt;p>The study compared EnvAgent to two established automated code generation systems: one utilizing Anthropic’s advanced reasoning models and the other employing OpenAI’s code-focused models. These systems were chosen for their strong performance in creating software code and their prevalence in automated engineering processes. Both baselines were given full access to the target software repositories and complete details about the host system’s hardware.&lt;/p>
&lt;h3 id="evaluation-metrics">Evaluation Metrics&lt;/h3>
&lt;p>The performance of EnvAgent was assessed using three key metrics: the ability to create working environments, the quality of those environments, and a single combined score. Results showed that EnvAgent significantly outperformed the baselines, achieving a 33.91% improvement in the final overall score and reaching 74.01, versus 30.10 for the best baseline. This suggests EnvAgent produced more functional environments and ensured greater accuracy through extensive testing.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>The process of creating the necessary software environments for code agents is a major hurdle in scaling up research and development, and the task still relies heavily on manual labor. To address this, a new system, EnvAgent, was created to automatically build these environments using intelligent agents that understand dependencies. A new benchmark, EnvBench, was also developed to assess the system’s effectiveness. Preliminary results demonstrate a significant improvement: EnvAgent achieved a 33.91% increase in success rates compared to existing automated agents, representing a substantial step towards more efficient and reproducible research.&lt;/p>
&lt;h1 id="thank-you">Thank you!&lt;/h1>
</description></item><item><title>Midterm for Smart Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/</link><pubDate>Thu, 24 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/</guid><description>&lt;h2 id="what-is-envgym">What is EnvGym?&lt;/h2>
&lt;p>EnvGym is a general multi-agent framework designed to automate the construction of executable environments for reproducing research prototypes from top-tier conferences and journals. While reproducibility has become a growing concern in the research community, the process of setting up environments remains time-consuming, error-prone, and often poorly documented.&lt;/p>
&lt;p>EnvGym addresses this gap by leveraging LLM-powered agents to analyze project instructions, resolve dependencies, configure execution environments, and validate results—thereby reducing human overhead and improving reproducibility at scale.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="EnvGym Cover" srcset="
/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp 400w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_d39b2369b5df80ffa715197c993f0681.webp 760w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/cover_hue02fdf353b4e99cf1af213026c4f6804_1815797_30e3b2194be140fa608780847e6c7fa1.webp"
width="760"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;h3 id="new-tools">New Tools&lt;/h3>
&lt;p>Initially, our agent had access to only one tool: the command line. This constrained the agent’s ability to decompose complex tasks and respond flexibly to failures. Over the last few weeks, we introduced a modular tool system, enabling the agent to handle specific subtasks more effectively.&lt;/p>
&lt;p>The new toolset includes:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>dockerrun: Executes Dockerfiles.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>hardware_checking, hardware_adjustment: Tailor builds to available resources.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>history_manager, stats: Track historical data for improvement and reproducibility.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>planning: Generates high-level execution plans.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>summarize: Interprets build results to adjust subsequent iterations.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>writing_docker_initial, writing_docker_revision: Generate and refine Dockerfiles.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>While some of these tools, such as dockerrun, run simple programmatic scripts, others, such as planning, are more complex and use LLMs themselves.&lt;/p>
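&lt;p>The modular tool system can be sketched as a simple registry that maps tool names to callables; the tool bodies below are placeholder stubs for illustration, not EnvGym&amp;rsquo;s real implementations:&lt;/p>

```python
import os

# Registry of named tools the agent can dispatch to.
TOOLS = {}

def tool(name):
    """Decorator that registers a callable under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("hardware_checking")
def hardware_checking(state):
    # Record available resources so builds can be tailored to them.
    state["cpus"] = os.cpu_count()
    return state

@tool("planning")
def planning(state):
    # In EnvGym this tool calls an LLM; here it returns a fixed stub plan.
    state["plan"] = ["FROM python:3.11-slim"]
    return state

def dispatch(name, state):
    # The agent invokes a subtask by name and threads shared state through.
    return TOOLS[name](state)
```

&lt;p>Adding a new capability, say dockerrun, then only requires registering another function; the agent loop itself stays unchanged.&lt;/p>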
&lt;h3 id="agent-re-architecture-moving-beyond-codex">Agent Re-Architecture: Moving Beyond Codex&lt;/h3>
&lt;p>We transitioned away from OpenAI&amp;rsquo;s Codex agent implementation. While powerful, Codex&amp;rsquo;s framework was overly reliant on its CLI frontend, which added unnecessary complexity and limited customizability for our research context.&lt;/p>
&lt;p>We implemented our own lightweight, customizable agent pipeline that integrates LLM-based planning with iterative execution. Conceptually, the agent executes the following loop:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Repo Scanning&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Hardware Check&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Planning &amp;amp; Initial Dockerfile Generation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Docker Execution&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Progress Summarization &amp;amp; Adjustment&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Iterative Dockerfile Refinement (up to 20 rounds)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Success Check &amp;amp; Logging&lt;/p>
&lt;/li>
&lt;/ul>
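&lt;p>The &amp;ldquo;Progress Summarization &amp;amp; Adjustment&amp;rdquo; step of the loop above can be sketched as a log classifier that maps common build failures to Dockerfile fixes; the error patterns and fixes below are illustrative examples, not EnvGym&amp;rsquo;s actual rules:&lt;/p>

```python
import re

# Hypothetical mapping from build-log error patterns to Dockerfile fixes.
FIXES = [
    (r"ModuleNotFoundError: No module named '(\w+)'",
     lambda m: "RUN pip install {}".format(m.group(1))),
    (r"(\w[\w.-]*): command not found",
     lambda m: "RUN apt-get install -y {}".format(m.group(1))),
]

def summarize(build_log):
    # Return a suggested Dockerfile line for the first recognized failure.
    for pattern, fix in FIXES:
        m = re.search(pattern, build_log)
        if m:
            return fix(m)
    return None  # unrecognized failure: fall back to LLM reasoning
```

&lt;p>In the actual agent, unrecognized failures are where the LLM-based summarize tool earns its keep; the rule table only handles the predictable cases.&lt;/p>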
&lt;p>This new agent design is easier to control, extend, and debug—aligning better with the needs of reproducibility research.&lt;/p>
&lt;h3 id="prompt-engineering">Prompt Engineering&lt;/h3>
&lt;p>For each tool that requires an LLM to function, we created a set of custom prompts that outline the task and break down the goals. For instance, the prompt used in summarize differs from the one in planning, allowing us to optimize the behavior of the LLM agents per context.&lt;/p>
&lt;h3 id="performance-gains">Performance Gains&lt;/h3>
&lt;p>With these improvements, EnvGym now successfully replicates 9 repositories, surpassing our baseline Codex agent, which struggled with the same set. We’ve observed more reliable planning, better handling of edge-case dependencies, and faster convergence in iterative Dockerfile revisions.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;h3 id="granular-evaluation-metric">Granular Evaluation Metric&lt;/h3>
&lt;p>We plan to adopt a tree-structured rubric-based evaluation, inspired by PaperBench. Instead of binary success/failure, each repo will be assigned a reproducibility score from 0–100.&lt;/p>
&lt;p>Key tasks include:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Rubric Design: Define a hierarchical rubric with criteria like dependency resolution, test success rate, runtime match, etc.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Manual Annotation: Build a dataset of ground-truth rubrics for a subset of repos to calibrate our automatic judge.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Judge Implementation: Develop an LLM-based judge function that takes (i) rubric and (ii) environment state, and returns a reproducibility score.&lt;/p>
&lt;/li>
&lt;/ul>
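&lt;p>As a sketch of the tree-structured rubric idea (the weights and criteria below are invented for illustration), a 0&amp;ndash;100 reproducibility score can be computed by recursively aggregating weighted leaf results:&lt;/p>

```python
def score(node):
    # Leaves are pass/fail criteria; inner nodes aggregate their children
    # by normalized weight, so the root yields a value in [0, 1].
    if "children" not in node:
        return 1.0 if node["passed"] else 0.0
    total_weight = sum(c["weight"] for c in node["children"])
    return sum(c["weight"] * score(c) for c in node["children"]) / total_weight

# Hypothetical rubric tree: an "environment builds" branch and a
# heavier-weighted "tests pass" leaf.
rubric = {
    "children": [
        {"weight": 2, "children": [
            {"weight": 1, "passed": True},   # dependency resolution
            {"weight": 1, "passed": True},   # build succeeds
        ]},
        {"weight": 3, "passed": False},      # test success rate
    ],
}

reproducibility_score = round(100 * score(rubric))
```

&lt;p>An LLM judge would fill in the per-leaf passed values from the rubric and the observed environment state; the aggregation itself stays deterministic.&lt;/p>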
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example of a rubric tree" srcset="
/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_ae181d659b85544bd98fa2bbdbe0c09d.webp 400w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_700416bce638eba7acc49573f12b11b0.webp 760w,
/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250724-sam_huang/rubric-tree_hu9020427fa0020bc8ab99a7f01a351cd0_70521_ae181d659b85544bd98fa2bbdbe0c09d.webp"
width="557"
height="497"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
Source: Starace, Giulio, et al. &amp;ldquo;PaperBench: Evaluating AI&amp;rsquo;s Ability to Replicate AI Research.&amp;rdquo; arXiv preprint arXiv:2504.01848 (2025).&lt;/p>
&lt;p>This will make EnvGym suitable for benchmarking. We will run our new method and obtain a score to compare with baseline methods!&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>EnvGym has made strong progress toward automating reproducibility in computational research. Through modularization, agentic design, and prompt optimizations, we’ve surpassed existing baselines and laid the groundwork for even more improvement.&lt;/p>
&lt;p>The upcoming focus on metrics and benchmarking will elevate EnvGym from a functional prototype to a standardized reproducibility benchmark tool and also quantitatively prove that our new agentic method is better than existing tools such as Codex. Excited for what&amp;rsquo;s to come!&lt;/p>
</description></item><item><title>EnvGym – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/envgym/</link><pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/envgym/</guid><description>&lt;p>Hello, my name is Yiming Cheng. I am a pre-doctoral researcher in Computer Science at the University of Chicago. I&amp;rsquo;m excited to be working with the Summer of Reproducibility and the Chameleon Cloud community as a project leader. My project is &lt;a href="https://github.com/eaminc/envgym" target="_blank" rel="noopener">EnvGym&lt;/a>, which focuses on developing an AI-driven system to automatically generate and configure reproducible computing environments based on natural language descriptions from artifact descriptions, Trovi artifacts, and research papers.&lt;/p>
&lt;p>The complexity of environment setup often hinders reproducibility in scientific computing. My project aims to bridge the knowledge gap between experiment authors and reviewers by translating natural language requirements into actionable, reproducible configurations using AI and NLP techniques.&lt;/p>
&lt;h3 id="project-overview">Project Overview&lt;/h3>
&lt;p>EnvGym addresses fundamental reproducibility barriers by:&lt;/p>
&lt;ul>
&lt;li>Using AI to translate natural language environment requirements into actionable configurations&lt;/li>
&lt;li>Automatically generating machine images deployable on bare metal and VM instances&lt;/li>
&lt;li>Bridging the knowledge gap between experiment authors and reviewers&lt;/li>
&lt;li>Standardizing environment creation across different hardware platforms&lt;/li>
&lt;/ul>
&lt;h3 id="june-10--june-16-2025">June 10 – June 16, 2025&lt;/h3>
&lt;p>Getting started with the project setup and initial development:&lt;/p>
&lt;ul>
&lt;li>I began designing the NLP pipeline architecture to parse plain-English descriptions (e.g., &amp;ldquo;I need Python 3.9, CUDA 11, and scikit-learn&amp;rdquo;) into structured environment &amp;ldquo;recipes&amp;rdquo;&lt;/li>
&lt;li>I set up the initial project repository and development environment&lt;/li>
&lt;li>I met with my mentor Prof. Kexin Pei to discuss the project roadmap and technical approach&lt;/li>
&lt;li>I started researching existing artifact descriptions from conferences and Trovi to understand common patterns in environment requirements&lt;/li>
&lt;li>I began prototyping the backend environment builder logic that will convert parsed requirements into machine-image definitions&lt;/li>
&lt;li>I explored Chameleon&amp;rsquo;s APIs for provisioning servers and automated configuration&lt;/li>
&lt;/ul>
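&lt;p>As a toy illustration of that parsing step (a real pipeline would use an LLM or a trained NLP model; this regex version only handles simple &amp;ldquo;name version&amp;rdquo; phrases), the example request can be turned into a structured recipe like so:&lt;/p>

```python
import re

def parse_request(text):
    # Turn "I need Python 3.9, CUDA 11, and scikit-learn" into a dict
    # mapping package names to requested versions ("latest" if unspecified).
    recipe = {}
    body = re.sub(r"^\s*I need\s+", "", text, flags=re.I)
    for part in re.split(r",\s*(?:and\s+)?|\s+and\s+", body):
        m = re.match(r"([A-Za-z][\w-]*)\s*([\d][\w.]*)?$", part.strip())
        if m:
            recipe[m.group(1).lower()] = m.group(2) or "latest"
    return recipe
```

&lt;p>The resulting recipe is what the backend environment builder would consume when generating a machine-image definition.&lt;/p>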
&lt;h3 id="next-steps">Next Steps&lt;/h3>
&lt;ul>
&lt;li>Continue developing the NLP component for requirement parsing&lt;/li>
&lt;li>Implement the core backend logic for environment generation&lt;/li>
&lt;li>Begin integration with Chameleon Cloud APIs&lt;/li>
&lt;li>Start building the user interface for environment specification&lt;/li>
&lt;/ul>
&lt;p>This is an exciting and challenging project that combines my interests in AI systems and reproducible research. I&amp;rsquo;m looking forward to building a system that will help researchers focus on their science rather than struggling with environment setup issues.&lt;/p>
&lt;p>Thanks for reading, I will keep you updated as I make progress on EnvGym!&lt;/p></description></item><item><title>Smart Environments – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250616-sam_huang/</link><pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uchicago/smart_environments/20250616-sam_huang/</guid><description>&lt;p>Hi everyone, I&amp;rsquo;m Sam! I&amp;rsquo;m excited to be working with the Argonne National Laboratory and SoR this summer on Smart Environments. Have you ever encountered a great opensource project and wanted to run it or use it locally, only to find that it&amp;rsquo;s such a headache to set up all the dependencies? Maybe your system version wasn&amp;rsquo;t correct, or a piece of software was outdated, or the dependencies were incompatible with something you had already on your machine?&lt;/p>
&lt;p>In comes EnvGym to save the day! We want EnvGym to be an agent that would help reproduce opensource projects by automatically setting up the environmental dependencies required to get them running. That&amp;rsquo;s what I will be working on for the rest of the summer! To make EnvGym work, we will be leveraging LLM agents to tackle the problem. We will use EnvGym to read documentations, understand code structures, run commands to set up environments, and reflectively react to any errors and warnings.&lt;/p>
&lt;p>To build EnvGym, I have the following to-do&amp;rsquo;s in mind:&lt;/p>
&lt;ul>
&lt;li>Building a dataset that includes repos to be reproduced&lt;/li>
&lt;li>Establishing a baseline using current methods&lt;/li>
&lt;li>Implementing the actual EnvGym algorithm&lt;/li>
&lt;li>Testing EnvGym against baseline performance and iteratively improving it&lt;/li>
&lt;li>Deploying EnvGym to real-world use cases and gathering feedback&lt;/li>
&lt;/ul>
&lt;p>Here is the repo that we are working on:
&lt;a href="https://github.com/EaminC/EnvGym/tree/main" target="_blank" rel="noopener">https://github.com/EaminC/EnvGym/tree/main&lt;/a>&lt;/p>
&lt;p>More updates to come, thanks for reading!&lt;/p></description></item><item><title>Smart Environments – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/envgym/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/envgym/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>The complexity of environment setup and the expertise required to configure specialized software stacks can often hinder efforts to reproduce important scientific achievements in HPC and systems studies. Researchers often struggle with incomplete or ambiguous artifact descriptions that make assumptions about &amp;ldquo;common knowledge&amp;rdquo; that is actually specific domain expertise. When trying to reproduce experiments, reviewers may spend excessive time debugging environment inconsistencies rather than evaluating the actual research. These challenges are compounded when experiments need to run on different hardware configurations.&lt;/p>
&lt;p>This project seeks to address these fundamental reproducibility barriers by using AI to translate natural language environment requirements often used in papers or artifact descriptions into actionable, reproducible configurations—bridging the knowledge gap between experiment authors and reviewers while standardizing environment creation across different hardware platforms. We will develop an AI-driven system that automatically generates and configures reproducible computing environments based on artifact descriptions from conferences, Trovi artifacts on the &lt;a href="https://chameleoncloud.org">Chameleon&lt;/a> testbed, and other reliable sources for scientific experiment code and associated documentation. Leveraging Natural Language Processing (NLP), the system will allow researchers to describe desired environments in plain English, then map those descriptions onto predefined configuration templates. By simplifying environment creation and ensuring reproducibility, the system promises to eliminate duplicate setup efforts, accelerate research workflows, and promote consistent experimentation practices across diverse hardware.&lt;/p>
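&lt;p>As a rough sketch of the template-mapping step, a parsed recipe could be rendered into a predefined configuration template roughly as follows; the template and recipe keys here are illustrative, and a real system would target Chameleon machine-image definitions rather than a bare Dockerfile:&lt;/p>

```python
# Hypothetical configuration template with slots for the parsed recipe.
TEMPLATE = """\
FROM nvidia/cuda:{cuda}-runtime-ubuntu22.04
RUN apt-get install -y python{python}-venv
RUN pip install {packages}
"""

def render(recipe):
    # Everything that is not a platform-level slot becomes a pip package.
    packages = " ".join(p for p in recipe if p not in ("python", "cuda"))
    return TEMPLATE.format(cuda=recipe["cuda"], python=recipe["python"],
                           packages=packages)
```

&lt;p>Keeping the templates predefined, and letting the AI fill only the slots, is what makes the resulting configurations reproducible rather than free-form.&lt;/p>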
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Working Prototype: A system that automatically generates machine images deployable on bare metal and VM instances, based on user-provided requirements.&lt;/li>
&lt;li>Comprehensive Documentation: Detailed user manuals, guides, and best practices tailored to researchers, ensuring a smooth adoption process.&lt;/li>
&lt;li>Live Demo: A demonstration environment (e.g., a web app or Jupyter notebook) that shows how to request, configure, and launch reproducible cloud environments on both hardware profiles.&lt;/li>
&lt;li>Long-Term Impact: Building blocks for future AI-driven automation of cloud infrastructure, reducing human error and enabling fast, repeatable research pipelines.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Topics&lt;/strong>: Reproducibility, AI &amp;amp; NLP, Cloud Computing, DevOps and Automation&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Machine Learning / AI: Familiarity with NLP methods to interpret user requirements.&lt;/li>
&lt;li>Python: Primary language for backend services and cloud interactions.&lt;/li>
&lt;li>Cloud API Integration: Experience with OpenStack or similar APIs to provision and configure images on both bare metal and virtual machines.&lt;/li>
&lt;li>DevOps: Automated environment configuration, CI/CD workflows, and containerization.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Large&lt;/p>
&lt;p>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/paul-marshall/">Paul Marshall&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Requirement Gathering &amp;amp; NLP Design
&lt;ul>
&lt;li>Research the specific needs of researchers building experimental setups.&lt;/li>
&lt;li>Design an NLP pipeline to parse plain-English descriptions (e.g., “I need Python 3.9, CUDA 11, and scikit-learn”) into environment “recipes.”&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Backend Environment Builder
&lt;ul>
&lt;li>Implement logic that converts parsed user requirements into machine-image definitions for bare metal and VM instances.&lt;/li>
&lt;li>Integrate with Chameleon’s APIs to provision servers, install software, and run configuration validation automatically.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Front-End &amp;amp; User Experience
&lt;ul>
&lt;li>Develop an intuitive web or CLI interface that researchers can use to capture experiment environment requirements.&lt;/li>
&lt;li>Provide real-time status updates during environment setup, along with meaningful error messages and quick-start templates.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Testing &amp;amp; Validation
&lt;ul>
&lt;li>Conduct end-to-end tests using diverse software stacks (e.g., HPC libraries, machine learning frameworks) on bare metal and VM instances.&lt;/li>
&lt;li>Ensure reproducibility by re-creating the same environment multiple times and comparing configurations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Documentation &amp;amp; Demonstration
&lt;ul>
&lt;li>Produce user-facing documentation, including tutorials and best practices for researchers who frequently run experiments on Chameleon Cloud.&lt;/li>
&lt;li>Create a short live demo or screencast showcasing how to configure an environment for a specific research workflow.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Smart Environments – An AI System for Reproducible Custom Computing Environments</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/smart-environments/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uchicago/smart-environments/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>The complexity of environment setup and the expertise required to configure specialized software stacks can often hinder efforts to reproduce important scientific achievements in HPC and systems studies. Researchers often struggle with incomplete or ambiguous artifact descriptions that make assumptions about &amp;ldquo;common knowledge&amp;rdquo; that is actually specific domain expertise. When trying to reproduce experiments, reviewers may spend excessive time debugging environment inconsistencies rather than evaluating the actual research. These challenges are compounded when experiments need to run on different hardware configurations.&lt;/p>
&lt;p>This project seeks to address these fundamental reproducibility barriers by using AI to translate natural language environment requirements often used in papers or artifact descriptions into actionable, reproducible configurations—bridging the knowledge gap between experiment authors and reviewers while standardizing environment creation across different hardware platforms. We will develop an AI-driven system that automatically generates and configures reproducible computing environments based on artifact descriptions from conferences, Trovi artifacts on the &lt;a href="https://chameleoncloud.org">Chameleon&lt;/a> testbed, and other reliable sources for scientific experiment code and associated documentation. Leveraging Natural Language Processing (NLP), the system will allow researchers to describe desired environments in plain English, then map those descriptions onto predefined configuration templates. By simplifying environment creation and ensuring reproducibility, the system promises to eliminate duplicate setup efforts, accelerate research workflows, and promote consistent experimentation practices across diverse hardware.&lt;/p>
&lt;h2 id="key-outcomes">Key Outcomes&lt;/h2>
&lt;ul>
&lt;li>Working Prototype: A system that automatically generates machine images deployable on bare metal and VM instances, based on user-provided requirements.&lt;/li>
&lt;li>Comprehensive Documentation: Detailed user manuals, guides, and best practices tailored to researchers, ensuring a smooth adoption process.&lt;/li>
&lt;li>Live Demo: A demonstration environment (e.g., a web app or Jupyter notebook) that shows how to request, configure, and launch reproducible cloud environments on both hardware profiles.&lt;/li>
&lt;li>Long-Term Impact: Building blocks for future AI-driven automation of cloud infrastructure, reducing human error and enabling fast, repeatable research pipelines.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Topics&lt;/strong>: Reproducibility, AI &amp;amp; NLP, Cloud Computing, DevOps and Automation&lt;/p>
&lt;p>&lt;strong>Skills&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Machine Learning / AI: Familiarity with NLP methods to interpret user requirements.&lt;/li>
&lt;li>Python: Primary language for backend services and cloud interactions.&lt;/li>
&lt;li>Cloud API Integration: Experience with OpenStack or similar APIs to provision and configure images on both bare metal and virtual machines.&lt;/li>
&lt;li>DevOps: Automated environment configuration, CI/CD workflows, and containerization.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/p>
&lt;p>&lt;strong>Size&lt;/strong>: Large&lt;/p>
&lt;p>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/paul-marshall/">Paul Marshall&lt;/a>&lt;/p>
&lt;p>&lt;strong>Tasks&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Requirement Gathering &amp;amp; NLP Design
&lt;ul>
&lt;li>Research the specific needs of researchers building experimental setups.&lt;/li>
&lt;li>Design an NLP pipeline to parse plain-English descriptions (e.g., “I need Python 3.9, CUDA 11, and scikit-learn”) into environment “recipes.”&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Backend Environment Builder
&lt;ul>
&lt;li>Implement logic that converts parsed user requirements into machine-image definitions for bare metal and VM instances.&lt;/li>
&lt;li>Integrate with Chameleon’s APIs to provision servers, install software, and run configuration validation automatically.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Front-End &amp;amp; User Experience
&lt;ul>
&lt;li>Develop an intuitive web or CLI interface that researchers can use to capture experiment environment requirements.&lt;/li>
&lt;li>Provide real-time status updates during environment setup, along with meaningful error messages and quick-start templates.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Testing &amp;amp; Validation
&lt;ul>
&lt;li>Conduct end-to-end tests using diverse software stacks (e.g., HPC libraries, machine learning frameworks) on bare metal and VM instances.&lt;/li>
&lt;li>Ensure reproducibility by re-creating the same environment multiple times and comparing configurations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Documentation &amp;amp; Demonstration
&lt;ul>
&lt;li>Produce user-facing documentation, including tutorials and best practices for researchers who frequently run experiments on Chameleon Cloud.&lt;/li>
&lt;li>Create a short live demo or screencast showcasing how to configure an environment for a specific research workflow.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item></channel></rss>