<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>uc | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/uc/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/uc/index.xml" rel="self" type="application/rss+xml"/><description>uc</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Thu, 05 Feb 2026 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>uc</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/uc/</link></image><item><title>NETAI: AI-Powered Network Anomaly Detection and Diagnostics Platform</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/netai/</link><pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/netai/</guid><description>&lt;p>NETAI (Network AI) is an AI-powered network anomaly detection and diagnostics platform for the National Research Platform (NRP). This project combines Kubernetes-native LLM integration, network performance monitoring, and predictive analytics to create an intelligent assistant for network operators. Students will work with cutting-edge technologies including Large Language Models (LLMs), Kubernetes, perfSONAR network measurements, time-series analysis, and containerized AI/ML workloads, while contributing to real-world applications in network operations and diagnostics.&lt;/p>
&lt;p>The project involves developing a &lt;strong>Kubernetes chatbot&lt;/strong> that leverages NRP&amp;rsquo;s managed LLM service (providing access to models like Qwen3-VL, GLM-4.7, and GPT-OSS) to help network operators understand complex network behaviors, diagnose anomalies, and receive natural language explanations of network issues. Students will integrate perfSONAR measurement data with traceroute path analysis to create an interactive network topology visualization, and develop &lt;strong>AI/ML models&lt;/strong> for predictive network performance analysis using NRP&amp;rsquo;s GPU resources.&lt;/p>
&lt;p>In addition, students will gain hands-on experience with &lt;strong>fine-tuning LLMs&lt;/strong> on historical network diagnostics data, developing &lt;strong>time-series forecasting models&lt;/strong> for network metrics, and implementing &lt;strong>anomaly detection&lt;/strong> using deep learning techniques. The entire AI/ML pipeline will be containerized and deployed as Kubernetes workloads, utilizing GPU-enabled pods for model training and inference, ensuring scalability and seamless integration with existing NRP infrastructure.&lt;/p>
&lt;p>The platform builds upon existing network diagnostics capabilities, combining end-to-end throughput measurements with detailed traceroute data to enable operators to visualize network paths, identify performance bottlenecks, and understand relationships between metrics and underlying infrastructure. The AI enhancement will provide predictive capabilities, automated incident reporting, and intelligent recommendations for network remediation strategies.&lt;/p>
&lt;h3 id="netai--llm-integration--kubernetes-chatbot">NETAI / LLM Integration &amp;amp; Kubernetes Chatbot&lt;/h3>
&lt;p>The proposed work includes developing a &lt;strong>Kubernetes-native chatbot&lt;/strong> that integrates with NRP&amp;rsquo;s managed LLM service to provide intelligent network diagnostics assistance. Students will create a conversational interface that can answer questions about network performance, explain anomalies in natural language, and suggest remediation strategies. They will fine-tune LLMs on historical network diagnostics data, test results, and traceroute information to create domain-specific assistants. Students will implement &lt;strong>RESTful APIs&lt;/strong> for chatbot interactions, develop &lt;strong>prompt engineering&lt;/strong> strategies for network diagnostics, and create &lt;strong>context-aware responses&lt;/strong> that incorporate real-time network telemetry. The chatbot will be deployed as Kubernetes services, utilizing GPU pods for inference and integrating with the existing diagnostics platform.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Large Language Models, Kubernetes, Chatbots, Natural Language Processing, Network Diagnostics, API Development&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Kubernetes, LLM APIs (Qwen3-VL, GLM-4.7, GPT-OSS), Prompt Engineering, REST APIs, Docker, GPU Computing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="netai--network-anomaly-detection-models">NETAI / Network Anomaly Detection Models&lt;/h3>
&lt;p>The proposed work includes developing &lt;strong>deep learning models&lt;/strong> for network anomaly detection using historical perfSONAR and traceroute data. Students will create models that can identify slow links, high packet loss, excessive retransmits, and failed network tests automatically. They will implement &lt;strong>anomaly detection algorithms&lt;/strong> using techniques such as autoencoders, LSTM networks, and transformer architectures. Students will train models on NRP&amp;rsquo;s GPU clusters using historical network telemetry stored in SQLite databases, develop &lt;strong>feature engineering&lt;/strong> pipelines for network metrics, and create &lt;strong>real-time inference services&lt;/strong> deployed as Kubernetes workloads. The models will be integrated into the diagnostics platform to provide automated anomaly detection alongside the interactive visualization.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Deep Learning, Anomaly Detection, Time-Series Analysis, Network Monitoring, Model Training, GPU Computing&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch/TensorFlow, scikit-learn, Pandas, NumPy, SQLite, Kubernetes, GPU Pods, MLOps&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
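&lt;p>As a point of comparison for the deep learning models described above, a simple statistical baseline can flag obviously abnormal samples. The sketch below is not the project's method: it swaps in a robust z-score (median and median absolute deviation) in place of autoencoders or LSTMs, and the sample data, function names, and threshold are all assumptions made for illustration.&lt;/p>

```python
from statistics import median


def robust_zscores(values: list[float]) -> list[float]:
    """Score each sample by its distance from the median, scaled by the MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the samples equal the median, so the MAD cannot
        # scale the scores; this sketch then reports no anomalies.
        return [0.0 for _ in values]
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    return [(v - center) / (1.4826 * mad) for v in values]


def flag_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indices whose absolute robust z-score exceeds the threshold."""
    scores = robust_zscores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Hypothetical throughput samples in Gbit/s for one link; index 5 is a sharp drop.
throughput_gbps = [9.4, 9.5, 9.3, 9.6, 9.4, 2.1, 9.5, 9.4]
anomalies = flag_anomalies(throughput_gbps)
print(anomalies)
```

&lt;p>A baseline like this may be useful mainly as a sanity check: a learned model that cannot beat it on historical data probably needs better features or more training data.&lt;/p>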
&lt;h3 id="netai--predictive-analytics--forecasting">NETAI / Predictive Analytics &amp;amp; Forecasting&lt;/h3>
&lt;p>The proposed work includes developing &lt;strong>predictive models&lt;/strong> that can forecast network performance degradation and identify patterns in network anomalies before they impact users. Students will create &lt;strong>time-series forecasting models&lt;/strong> for network metrics such as throughput, latency, and packet loss, using techniques like ARIMA, Prophet, and deep learning-based forecasting. They will implement &lt;strong>few-shot learning approaches&lt;/strong> to adapt models to new network topologies and measurement patterns, develop &lt;strong>early warning systems&lt;/strong> for potential network issues, and create &lt;strong>automated incident report generation&lt;/strong> using LLMs. Students will leverage NRP&amp;rsquo;s GPU resources for training forecasting models and deploy them as Kubernetes services for real-time predictions integrated with the diagnostics dashboard.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Time-Series Forecasting, Predictive Analytics, Machine Learning, Network Performance, Early Warning Systems, LLM Integration&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch/TensorFlow, Prophet, ARIMA, Pandas, NumPy, Time-Series Analysis, Kubernetes, GPU Computing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
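&lt;p>The early warning idea above can be sketched with a very small forecasting baseline. This example uses simple exponential smoothing as a stand-in for ARIMA, Prophet, or deep learning forecasters; the latency samples, the smoothing factor, and the alert limit are hypothetical values chosen only to show the shape of the logic.&lt;/p>

```python
def exponential_smoothing(series: list[float], alpha: float = 0.5) -> float:
    """Return the one-step-ahead forecast from simple exponential smoothing."""
    if not series:
        raise ValueError("series must contain at least one sample")
    level = series[0]
    for value in series[1:]:
        # Blend the newest sample with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


def early_warning(series: list[float], limit: float, alpha: float = 0.5) -> bool:
    """Warn when the forecast for the next sample exceeds the limit."""
    return exponential_smoothing(series, alpha) > limit


# Hypothetical round-trip latency samples in milliseconds, trending upward.
latency_ms = [20.0, 22.0, 26.0, 34.0, 50.0]
forecast = exponential_smoothing(latency_ms)
print(forecast, early_warning(latency_ms, limit=35.0))
```

&lt;p>A higher smoothing factor reacts faster to recent samples at the cost of more false alarms, which is the kind of trade-off a real early warning system would need to tune per metric.&lt;/p>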
&lt;h3 id="netai--kubernetes-deployment--infrastructure">NETAI / Kubernetes Deployment &amp;amp; Infrastructure&lt;/h3>
&lt;p>The proposed work includes setting up &lt;strong>Kubernetes-based infrastructure&lt;/strong> for deploying the entire NETAI platform, including LLM services, ML models, and the diagnostics dashboard. Students will create &lt;strong>Helm charts&lt;/strong> for deploying containerized AI/ML workloads, configure &lt;strong>GPU-enabled pods&lt;/strong> for model training and inference, and implement &lt;strong>persistent storage&lt;/strong> solutions for maintaining historical network telemetry. They will develop &lt;strong>GitLab CI/CD pipelines&lt;/strong> for automated testing and deployment, set up &lt;strong>monitoring and observability&lt;/strong> using Prometheus and Grafana for tracking model performance and resource usage, and create &lt;strong>scalable deployment strategies&lt;/strong> that leverage NRP&amp;rsquo;s distributed computing resources. Students will also integrate the platform with existing perfSONAR infrastructure and ensure seamless operation within the NRP cluster.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Kubernetes, DevOps, CI/CD, GPU Computing, Container Orchestration, Infrastructure as Code, Monitoring&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Kubernetes, Helm, GitLab CI/CD, Prometheus, Grafana, Docker, GPU Pods, Persistent Storage, Infrastructure Automation&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="project-resources">Project Resources&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>National Research Platform&lt;/strong>: &lt;a href="https://nrp.ai/" target="_blank" rel="noopener">https://nrp.ai/&lt;/a>&lt;/li>
&lt;li>&lt;strong>NRP LLM Service&lt;/strong>: &lt;a href="https://nrp.ai/documentation/userdocs/ai/llm-managed/" target="_blank" rel="noopener">https://nrp.ai/documentation/userdocs/ai/llm-managed/&lt;/a>&lt;/li>
&lt;li>&lt;strong>perfSONAR&lt;/strong>: &lt;a href="https://www.perfsonar.net/" target="_blank" rel="noopener">https://www.perfsonar.net/&lt;/a>&lt;/li>
&lt;li>&lt;strong>MaDDash&lt;/strong>: &lt;a href="https://github.com/esnet/maddash" target="_blank" rel="noopener">https://github.com/esnet/maddash&lt;/a>&lt;/li>
&lt;li>&lt;strong>Network Monitoring Documentation&lt;/strong>: &lt;a href="https://nrp.ai/documentation/" target="_blank" rel="noopener">https://nrp.ai/documentation/&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>This project addresses critical gaps in network performance monitoring for the National Research Platform by integrating AI/ML capabilities with existing perfSONAR-based diagnostics. The platform combines end-to-end network measurements with detailed path-level analysis, enhanced by intelligent AI assistants that can help operators understand complex network behaviors and predict potential issues. By leveraging NRP&amp;rsquo;s managed LLM service and GPU resources, students will create a Kubernetes-native system that scales across the distributed research network infrastructure, providing both real-time diagnostics and predictive analytics to improve network reliability and performance for researchers nationwide.&lt;/p></description></item><item><title>VINE: Precision Agriculture Data Platform &amp; Digital Twin</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/vine/</link><pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/vine/</guid><description>&lt;p>VINE (Vineyard Intelligence Network &amp;amp; Environment) is an AI/ML research project focused on precision agriculture using the &lt;strong>National Research Platform (NRP)&lt;/strong>. This project leverages the innovative demonstration at Iron Horse Vineyards to study how AI and machine learning can optimize agricultural practices through data-driven insights. Students will work with cutting-edge AI/ML technologies, distributed computing on NRP, and large-scale data analysis, while contributing to real-world applications in sustainable agriculture and climate adaptation.&lt;/p>
&lt;p>The project involves &lt;strong>AI/ML research&lt;/strong> using agricultural data from Iron Horse Vineyards, leveraging the computational resources of the &lt;strong>National Research Platform&lt;/strong> for training and deploying machine learning models. Students will work with agricultural datasets including sensor data, multi-spectral drone imagery, and historical records, developing models for predictive analytics, computer vision, and time-series forecasting. The integration of &lt;strong>NRP&amp;rsquo;s distributed infrastructure&lt;/strong> enables scalable AI research that can process these datasets in large volumes.&lt;/p>
&lt;p>Students will gain hands-on experience with &lt;strong>AI/ML model development&lt;/strong> for agricultural applications, learning how to analyze multi-spectral drone imagery, process time-series sensor data, and build predictive models for irrigation scheduling, pest detection, and harvest timing. They will train and deploy models on &lt;strong>NRP&amp;rsquo;s Kubernetes clusters&lt;/strong> and utilize &lt;strong>GPU resources&lt;/strong> for deep learning workloads. The project emphasizes using &lt;strong>distributed computing&lt;/strong> on NRP to scale AI/ML experiments and create open, shareable datasets for collaborative research.&lt;/p>
&lt;p>The platform builds upon the success demonstrated at Iron Horse Vineyards, where AI-driven analytics have shown potential for &lt;strong>10% water use reduction&lt;/strong> and improved yield optimization. This project aims to advance AI/ML research in precision agriculture by utilizing NRP&amp;rsquo;s computational capabilities, creating reproducible research that can benefit the broader agricultural and research communities.&lt;/p>
&lt;h3 id="vine--data-pipeline--integration">VINE / Data Pipeline &amp;amp; Integration&lt;/h3>
&lt;p>The proposed work includes building &lt;strong>data pipelines&lt;/strong> to ingest, process, and prepare agricultural data from Iron Horse Vineyards and other sources for AI/ML research. Students will develop pipelines to collect sensor data (soil moisture, temperature, CO2, weather), multi-spectral drone imagery, and historical agricultural records. They will create &lt;strong>data validation and quality assurance&lt;/strong> processes, implement &lt;strong>data preprocessing&lt;/strong> for ML model training, and develop &lt;strong>data integration&lt;/strong> workflows that connect agricultural datasets with NRP computational resources. Students will also work on &lt;strong>data sharing&lt;/strong> mechanisms to make processed datasets available for the research community.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data Engineering, Time-Series Data, Data Preprocessing, Data Sharing, ML Data Pipelines&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Pandas, NumPy, Data Validation, REST APIs, Docker, Kubernetes, Data Processing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="vine--aiml-models-for-agricultural-analytics-on-nrp">VINE / AI/ML Models for Agricultural Analytics on NRP&lt;/h3>
&lt;p>The proposed work includes developing and training &lt;strong>machine learning models&lt;/strong> for agricultural applications using the &lt;strong>National Research Platform (NRP)&lt;/strong>. Students will create models for &lt;strong>predictive irrigation scheduling&lt;/strong> based on soil moisture, weather forecasts, and historical data. They will develop &lt;strong>computer vision models&lt;/strong> for analyzing multi-spectral drone imagery to detect plant health, identify pests, and estimate yield. Students will also work on &lt;strong>time-series forecasting&lt;/strong> models for predicting harvest timing and optimizing resource allocation. The project will involve training models on &lt;strong>NRP&amp;rsquo;s GPU clusters&lt;/strong>, utilizing distributed training capabilities, and deploying models for real-time inference. Students will leverage agricultural datasets for training and validation, and contribute model outputs and insights for the research community.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Machine Learning, Computer Vision, Time-Series Analysis, Predictive Analytics, Agricultural AI, Distributed Training&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch/TensorFlow, scikit-learn, OpenCV, Pandas, NumPy, MLOps, NRP Kubernetes, GPU Computing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="vine--digital-twin--ai-driven-visualization">VINE / Digital Twin &amp;amp; AI-Driven Visualization&lt;/h3>
&lt;p>The proposed work includes creating &lt;strong>AI-enhanced digital twin&lt;/strong> systems for agricultural sites using computational resources on NRP. Students will develop &lt;strong>3D visualization&lt;/strong> systems (potentially using Omniverse or similar platforms) to represent vineyards and farms, integrate &lt;strong>AI model predictions&lt;/strong> into the digital twin for real-time insights, and create &lt;strong>interactive dashboards&lt;/strong> for monitoring and analysis. They will implement &lt;strong>spatial data processing&lt;/strong> using ML models to map sensor locations and readings to geographic coordinates, and develop &lt;strong>AI-driven simulation capabilities&lt;/strong> for testing different agricultural strategies (irrigation patterns, planting layouts, etc.) before implementation. Students will deploy visualization services on &lt;strong>NRP infrastructure&lt;/strong> and integrate with agricultural data sources for real-time updates.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Digital Twin, AI-Enhanced Visualization, GIS, Spatial Data, ML-Driven Simulation, Real-Time Systems&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, 3D Graphics (Omniverse/Unity/Blender), GIS tools, WebGL, React/Three.js, ML Integration, NRP Deployment&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="vine--web-dashboard--nrp-integration-platform">VINE / Web Dashboard &amp;amp; NRP Integration Platform&lt;/h3>
&lt;p>The proposed work includes building a &lt;strong>comprehensive web dashboard&lt;/strong> for visualizing agricultural data, AI model predictions, and research insights. Students will develop a &lt;strong>full-stack web application&lt;/strong> using modern frameworks (React, Flask/FastAPI) deployed on the &lt;strong>National Research Platform (NRP)&lt;/strong>. The dashboard will display real-time sensor readings, historical trends from agricultural datasets, AI model predictions, and digital twin visualizations. Students will create &lt;strong>API endpoints&lt;/strong> that integrate with &lt;strong>NRP computational resources&lt;/strong> and agricultural data sources, implement &lt;strong>role-based access control&lt;/strong> for researchers, and enable &lt;strong>data export/sharing&lt;/strong> with the broader research community. The platform will support &lt;strong>interactive data exploration&lt;/strong> tools and provide programmatic access to AI/ML models running on NRP.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Full-Stack Web Development, Data Visualization, API Development, NRP Deployment, ML Model Serving&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> React, Flask/FastAPI, PostgreSQL, D3.js/Plotly, Bootstrap/Tailwind CSS, REST APIs, Kubernetes, NRP APIs&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="project-resources">Project Resources&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>National Research Platform&lt;/strong>: &lt;a href="https://nrp.ai/" target="_blank" rel="noopener">https://nrp.ai/&lt;/a>&lt;/li>
&lt;li>&lt;strong>Iron Horse Vineyards Project&lt;/strong>: &lt;a href="https://gitlab.nrp-nautilus.io/ihv" target="_blank" rel="noopener">https://gitlab.nrp-nautilus.io/ihv&lt;/a>&lt;/li>
&lt;li>&lt;strong>Omniverse Integration&lt;/strong>: &lt;a href="https://gitlab.nrp-nautilus.io/omniverse" target="_blank" rel="noopener">https://gitlab.nrp-nautilus.io/omniverse&lt;/a>&lt;/li>
&lt;li>&lt;strong>CENIC Network&lt;/strong>: &lt;a href="https://cenic.org/" target="_blank" rel="noopener">https://cenic.org/&lt;/a>&lt;/li>
&lt;li>&lt;strong>CENIC Precision Agriculture Blog&lt;/strong>: &lt;a href="https://nrp.ai/cenic-precision-agriculture-2025" target="_blank" rel="noopener">https://nrp.ai/cenic-precision-agriculture-2025&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>This project builds upon the successful demonstration at Iron Horse Vineyards, where CENIC, UC San Diego, and partners have created a living laboratory for precision agriculture. The VINE project focuses on &lt;strong>AI/ML research&lt;/strong> using the &lt;strong>National Research Platform (NRP)&lt;/strong> for computational resources. By leveraging NRP&amp;rsquo;s distributed infrastructure and GPU clusters, students can train and deploy sophisticated ML models for agricultural applications. The project works with agricultural datasets from Iron Horse Vineyards and aims to create open, shareable datasets for the research community. This approach creates a scalable, reproducible framework for AI/ML research in precision agriculture that can benefit researchers, educators, and practitioners nationwide.&lt;/p></description></item><item><title>AI Data Readiness Inspector (AIDRIN)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/aidrin/</link><pubDate>Fri, 30 Jan 2026 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/aidrin/</guid><description>&lt;p>Garbage In, Garbage Out (GIGO) is a widely cited maxim in computer science that applies across domains, including Artificial Intelligence (AI). Because data is the fuel for AI, models trained on low-quality or biased data are often ineffective. Computer scientists who use AI invest considerable time and effort in preparing data for it.&lt;/p>
&lt;p>&lt;a href="https://arxiv.org/pdf/2406.19256" target="_blank" rel="noopener">AIDRIN&lt;/a> (AI Data Readiness INspector) is a framework that provides a quantifiable assessment of data readiness for AI processes, covering a broad range of dimensions from the literature. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, to evaluate data. Furthermore, AIDRIN uses metrics specific to assessing AI data, such as feature importance, feature correlations, class imbalance, fairness, privacy, and compliance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles. AIDRIN provides visualizations and reports to assist data scientists in further investigating data readiness.&lt;/p>
&lt;h3 id="aidrin-multiple-file-formats">AIDRIN Multiple File Formats&lt;/h3>
&lt;p>The proposed work will improve the AIDRIN framework by (1) adding support for new file formats such as Zarr, ROOT, and HDF5, and (2) allowing users to provide custom data ingestion mechanisms.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>data readiness&lt;/code>, &lt;code>AI&lt;/code>, &lt;code>data analysis&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, C/C++, data analysis, good communicator&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/suren-byna/">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Drishti</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/drishti/</link><pubDate>Fri, 30 Jan 2026 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/drishti/</guid><description>&lt;p>&lt;a href="https://github.com/hpc-io/drishti" target="_blank" rel="noopener">Drishti&lt;/a> is a novel interactive web-based analysis framework to visualize I/O traces, highlight bottlenecks, and help understand the I/O behavior of scientific applications. Drishti aims to fill the gap between the trace collection, analysis, and tuning phases. The framework contains an interactive I/O trace analysis component that lets end-users visually inspect their applications&amp;rsquo; I/O behavior, focus on areas of interest, and get a clear picture of common root causes of I/O performance bottlenecks. Based on the automatic detection of I/O performance bottlenecks, the framework maps numerous common, well-known bottlenecks to solution recommendations that users can implement.&lt;/p>
&lt;h3 id="drishti-comparisons-and-heatmaps">Drishti Comparisons and Heatmaps&lt;/h3>
&lt;p>The proposed work will include investigating and building a solution for comparing two I/O trace files and finding the differences between them (similar to a &lt;code>diff&lt;/code>), covering both the analysis and visualization components. It will also explore adding further metrics and counters, such as Darshan heatmaps, to the analysis and visualization components of the framework.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O&lt;/code>, &lt;code>HPC&lt;/code>, &lt;code>data analysis&lt;/code>, &lt;code>visualization&lt;/code>, &lt;code>profiling&lt;/code>, &lt;code>tracing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, data analysis, performance profiling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>EnergyAPI: An End-to-End API for Energy-Aware Forecasting and Scheduling</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/energy-api/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/energy-api/</guid><description>&lt;p>Over the past decades, electricity demand has increased steadily, driven by structural shifts such as the electrification of transportation and, more recently, the rapid expansion of artificial intelligence (AI). Power grids have responded by expanding generation capacity, integrating renewable energy sources such as solar and wind, and deploying demand-response mechanisms. However, the current pace of demand growth is increasingly outstripping grid expansion, leading to integration delays, greater reliance on behind-the-meter consumption, and rising operational complexity.&lt;/p>
&lt;p>To mitigate the environmental and socioeconomic impacts of electricity consumption, large consumers such as cloud data centers and electric vehicle (EV) charging infrastructures are increasingly participating in demand-response programs. These programs provide consumers with grid signals indicating favorable periods for electricity usage, such as when energy is cheapest or has the lowest carbon intensity. Consumers can then shift workloads across time and location to better align with grid conditions and their own operational constraints. A key challenge, however, is the online nature of this problem: operators must make real-time decisions without full knowledge of future grid conditions. While forecasting and optimization techniques exist, their effectiveness depends heavily on workload characteristics, such as whether tasks are delay-tolerant cloud jobs or EV charging sessions with route and deadline constraints.&lt;/p>
&lt;p>This project proposes the design and implementation of a modular, extensible API for energy-aware workload scheduling. The API will ingest grid signals alongside workload Service Level Objectives (SLOs) and operational requirements, and produce execution plans that adapt to changing grid conditions. It will support multiple pluggable scheduling strategies and heuristics, enabling developers to compare real-time and forecast-based approaches across different workload classes. By providing a reusable, open-source interface for demand-response-aware scheduling, this project aims to lower the barrier for developers to integrate energy-aware decision-making into distributed systems and applications.&lt;/p>
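&lt;p>To make the scheduling idea above concrete, the following minimal sketch picks a start time for a delay-tolerant job from a forecasted grid signal. It is one possible heuristic, not the API this project will define: the function name, the hourly carbon intensity values, and the deadline semantics are assumptions made for illustration.&lt;/p>

```python
def best_start_hour(carbon_forecast: list[float], duration: int, deadline: int) -> int:
    """Return the start hour whose window has the lowest total carbon intensity.

    The job occupies `duration` consecutive hours and must finish by `deadline`,
    an exclusive hour index into the forecast.
    """
    horizon = min(deadline, len(carbon_forecast))
    if duration > horizon or 0 >= duration:
        raise ValueError("job cannot fit before the deadline")
    last_start = horizon - duration
    # Total forecasted intensity for every feasible start hour.
    totals = [sum(carbon_forecast[s:s + duration]) for s in range(last_start + 1)]
    # min() returns the earliest start on ties, which favors finishing sooner.
    return min(range(len(totals)), key=totals.__getitem__)


# Hypothetical hourly carbon intensity forecast in gCO2/kWh.
carbon_gco2_kwh = [420.0, 380.0, 210.0, 190.0, 230.0, 400.0]
start = best_start_hour(carbon_gco2_kwh, duration=2, deadline=6)
print(start)
```

&lt;p>Tightening the deadline shrinks the set of feasible windows, which shows how a workload's SLO can limit how much of a favorable grid period the scheduler is able to use.&lt;/p>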
&lt;h3 id="building-an-end-to-end-service-for-energy-forecasting-and-scheduling">Building an End-to-End Service for Energy Forecasting and Scheduling&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Databases&lt;/code> &lt;code>Machine Learning&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, command line tools (bash), SQL (MySQL or SQLite), FastAPI, time-series analysis, basic machine learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/abel-souza/">Abel Souza&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a containerized, end-to-end platform consisting of a backend, API, and web-based frontend for collecting, estimating, and visualizing real-time and forecasted electrical grid signals. These signals include electricity demand, prices, energy production, grid saturation, and carbon intensity. The system will support scalable data ingestion, region-specific forecasting models, and interactive visualizations to enable energy-aware application development and analysis.&lt;/p>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Study electrical grid signals and demand-response data sources (e.g., demand, price, carbon intensity, grid saturation) and identify their requirements for real-time and forecast-based consumption planning.&lt;/li>
&lt;li>Design and implement a relational data model for storing historical, real-time, and forecasted grid signals.&lt;/li>
&lt;li>Ingest and validate grid signal data into a MySQL or SQLite database, ensuring data quality and time alignment across regions.&lt;/li>
&lt;li>Implement baseline time-series forecasting models for grid signals (e.g., demand, price, or carbon intensity), with support for region-specific configurations.&lt;/li>
&lt;li>Query the European Network of Transmission System Operators for Electricity (ENTSO-E) and Energy Information Administration (EIA) APIs to collect grid data.&lt;/li>
&lt;li>Develop a RESTful API that exposes both raw and forecasted grid signals for use by energy-aware applications and schedulers.&lt;/li>
&lt;li>Build a web-based user interface to visualize historical trends, forecasts, and regional differences in grid conditions.&lt;/li>
&lt;li>Implement an interactive choropleth map to display spatial variations in grid signals such as carbon intensity and electricity prices.&lt;/li>
&lt;li>Design an extensible architecture that allows different regions to plug in custom forecasting models or heuristics.&lt;/li>
&lt;li>Containerize the backend, API, and frontend components using Docker to enable reproducible deployment and easy integration by external users.&lt;/li>
&lt;/ul></description></item><item><title>Environmental NeTworked Sensor (ENTS)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/</guid><description>&lt;h3 id="ents-i-usability-improvements-for-visualization-dashboard">ENTS I: Usability improvements for visualization dashboard&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Data Visualization Dashboard" srcset="
/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp 400w,
/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_baae6484e015277af7b09e866b6869f5.webp 760w,
/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp"
width="760"
height="759"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data Visualization, Backend, Frontend, UI/UX, Analytics&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> React, JavaScript, Python, SQL, Git&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> Flask, Docker, CI/CD, AWS, Authentication&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:alevy1@ucsc.edu">Alec Levy&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Environmental NeTworked Sensor (ENTS) platform, formerly the Open Sensing Platform (OSP), implements a data visualization website for monitoring microbial fuel cell sensors (see &lt;a href="https://github.com/jlab-sensing/ENTS-backend" target="_blank" rel="noopener">GitHub&lt;/a>). The mission is to scale up the current platform to support other researchers or citizen scientists in integrating their novel sensing hardware or microbial fuel cell sensors for monitoring and data analysis. Currently deployed sensors measure soil moisture, temperature, current, and voltage in outdoor settings. The software half of the project focuses on building upon our existing visualization web platform and adding features to support the mission. A live version of the website is available &lt;a href="https://dirtviz.jlab.ucsc.edu/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Below is a list of project ideas that would be beneficial to the ENTS project. You are not limited to the following projects, and we encourage new ideas that enhance the platform:&lt;/p>
&lt;ul>
&lt;li>Drag and drop charts functionality&lt;/li>
&lt;li>Creation of unique charts by users (with unique equations)&lt;/li>
&lt;li>Customizable options of charts (color, line width, datapoint/line style, axis labels)&lt;/li>
&lt;li>Exportable charts (with customizable options)&lt;/li>
&lt;li>Saving layouts via URL&lt;/li>
&lt;/ul>
&lt;h3 id="ents-ii-migration-to-tockos">ENTS II: Migration to TockOS&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ENTS in the wild" srcset="
/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_c34f23edec4789d86dcf04482fa38282.webp 400w,
/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_8a4ed9b7cf50d0c7493779c714094459.webp 760w,
/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_c34f23edec4789d86dcf04482fa38282.webp"
width="760"
height="369"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Embedded system, operating system&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> Rust, C/C++, Git, GitHub&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> STM32 HAL, Python&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The current firmware is a bare-metal implementation built on
STM hardware abstraction layer (HAL) drivers. We are
interested in porting the firmware to an operating system (OS)
to allow for additional functionality that supports environmental data logging.
&lt;a href="https://tockos.org/" target="_blank" rel="noopener">TockOS&lt;/a>, the OS we plan to use, is an embedded operating system designed for
running multiple concurrent, mutually distrustful applications on low-memory
and low-power microcontrollers. TockOS allows for OTA
updates, dynamic app loading, hardware multiplexing, and more. We envision
multiple users sharing ENTS hardware that provides communication and
measurement capabilities. Sharing hardware in this way reduces the initial cost
of deploying wireless sensor networks.&lt;/p>
&lt;p>The TockOS kernel is written in &lt;a href="https://rust-lang.org/" target="_blank" rel="noopener">Rust&lt;/a> to enhance
security. Userspace apps can be written in C, C++, or Rust. Development
will be done through a remote development server to access the hardware. See
the following repos for the current status of the project:&lt;/p>
&lt;ul>
&lt;li>Userspace library: &lt;a href="https://github.com/jlab-sensing/libtock-c" target="_blank" rel="noopener">libtock-c&lt;/a>&lt;/li>
&lt;li>Kernel: &lt;a href="https://github.com/jlab-sensing/tock" target="_blank" rel="noopener">tock&lt;/a>&lt;/li>
&lt;li>Baremetal: &lt;a href="https://github.com/jlab-sensing/ENTS-node-firmware" target="_blank" rel="noopener">ENTS-node-firmware&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scope of work:&lt;/p>
&lt;ul>
&lt;li>Writing kernel peripheral drivers.
&lt;ul>
&lt;li>Done entirely in Rust.&lt;/li>
&lt;li>Low-level understanding of the microcontroller.&lt;/li>
&lt;li>Basic kernel functionality knowledge.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Porting baremetal components to userland apps.
&lt;ul>
&lt;li>Involves porting STM HAL calls to TockOS syscalls.&lt;/li>
&lt;li>Primarily done in C.&lt;/li>
&lt;li>Understanding of syscalls.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Reproducible CXL Emulation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucmerced/cxl_emu/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucmerced/cxl_emu/</guid><description>&lt;p>Compute Express Link (CXL) is an emerging memory interconnect standard that enables shared, coherent memory across CPUs, accelerators, and multiple hosts, unlocking new possibilities in hyperscale, HPC, and disaggregated systems. However, because access to real multi-host CXL hardware is limited, it is difficult for researchers and students to experiment with, evaluate, and reproduce results on advanced CXL topologies.
&lt;a href="https://github.com/cxl-emu/OCEAN" target="_blank" rel="noopener">OCEAN&lt;/a> (Open-source CXL Emulation At Hyperscale) is a full-stack CXL emulation platform built on QEMU that enables detailed emulation of CXL 3.0 memory systems, including multi-host shared memory pools, coherent fabric topologies, and latency modeling. This project will create reproducible experiment pipelines, automated deployment workflows, and user-friendly tutorials so that others can reliably run and extend CXL emulation experiments without requiring specialized hardware.&lt;/p>
&lt;h3 id="reproducible-cxl-emulation-for-multi-host-memory-systems">Reproducible CXL Emulation for Multi-Host Memory Systems&lt;/h3>
&lt;p>Streamline multi-host CXL emulation without specialized hardware.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>CXL emulation&lt;/code> &lt;code>Memory Systems&lt;/code> &lt;code>Reproducibility&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Virtualization (QEMU), Scripting, Performance Modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrafi@ucmerced.edu">Mujahid Al Rafi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Create automated deployment scripts and configuration templates for OCEAN-based CXL emulation topologies (single-host and multi-host).&lt;/li>
&lt;li>Develop a standardized experiment harness for running memory performance benchmarks (e.g., OSU micro-benchmarks, STREAM-style tests) in emulated CXL environments.&lt;/li>
&lt;li>Build reproducible experiment pipelines that others can run to evaluate latency, bandwidth, and scaling properties of CXL memory systems.&lt;/li>
&lt;li>Produce tutorials, documentation, and reproducibility artifacts to guide new users through setup, execution, and analysis.&lt;/li>
&lt;li>Package and contribute all scripts, configurations, and documentation back to the OCEAN open-source repository.&lt;/li>
&lt;/ul>
&lt;h3 id="exploring-security-and-isolation-in-cxl-based-memory-systems">Exploring Security and Isolation in CXL-Based Memory Systems&lt;/h3>
&lt;p>Investigate security and isolation properties of CXL-based memory systems using software emulation.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>CXL Systems&lt;/code> &lt;code>Security&lt;/code> &lt;code>Memory Isolation&lt;/code> &lt;code>Side Channel&lt;/code> &lt;code>Emulation&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Virtualization (QEMU), Scripting, Computer Architecture, Security&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrafi@ucmerced.edu">Mujahid Al Rafi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Study the CXL memory model and fabric architecture to identify potential security and isolation risks in multi-host shared memory environments (e.g., contention, timing variation, and resource interference).&lt;/li>
&lt;li>Set up multi-host or multi-VM CXL emulation environments using OCEAN that mimic realistic multi-tenant deployments.&lt;/li>
&lt;li>Design and implement reproducible micro-benchmarks to measure timing, bandwidth contention, or observable interference through shared CXL memory pools.&lt;/li>
&lt;li>Analyze how fabric configuration choices (e.g., topology, latency injection, memory partitioning, or allocation policies) affect isolation and leakage behavior.&lt;/li>
&lt;li>Explore and prototype mitigation strategies—such as memory partitioning, throttling, or policy-driven allocation—and evaluate their effectiveness using the emulation platform.&lt;/li>
&lt;/ul></description></item><item><title>Network Simulation Bridge • Enabling Interactive Network Models</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/nsb-network-models/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/nsb-network-models/</guid><description>&lt;p>The Network Simulation Bridge &amp;ndash; &lt;a href="https://github.com/nsb-ucsc/nsb" target="_blank" rel="noopener">NSB&lt;/a> &amp;ndash; is a network co-simulation framework that bridges together applications and network simulators. It enables students, researchers, and developers to prototype their applications and systems on simulated networks. It consists of a message server and client endpoint interfaces which together form a bridge, routing application message payloads through the network simulator. NSB is designed to be extensible through modular interfaces that serve to allow users to contribute new features and modules that suit evolving and emerging use cases. NSB is developed to be application-, network simulator-, and platform-agnostic so that users and developers are empowered to integrate any application front-end with any network simulator back-end, providing versatility and flexibility when used alongside other tools in larger systems and applications.&lt;/p>
&lt;p>NSB was created in-house by the &lt;a href="https://inrg.engineering.ucsc.edu/" target="_blank" rel="noopener">Inter-Networking Research Group&lt;/a> and is now being developed into a more full-featured open-source tool and ecosystem in partnership with the &lt;a href="https://ucsc-ospo.github.io/" target="_blank" rel="noopener">UCSC OSPO&lt;/a> and as part of the &lt;a href="https://www.nsf.gov/funding/opportunities/pose-pathways-enable-open-source-ecosystems" target="_blank" rel="noopener">NSF Pathways to Enable Open-Source Ecosystems&lt;/a> program. In this transition to a more polished and feature-rich product, the next phase of NSB development will involve the engineering of new quality-of-life features, testing and iteration of the core tool itself, and user-centric refinement via implementation in interdisciplinary system models.&lt;/p>
&lt;h3 id="develop-a-user-centric-website-for-nsb">Develop a User-Centric Website for NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>Dynamic Updates&lt;/code> &lt;code>UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> web development experience, good communication, (HTML/CSS), (JavaScript)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a clean and welcoming landing page and website for the project. The organization needs to reflect the needs of both users and potential project contributors. This website will be the first impression for people new to the project and should&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project and the expected needs of the users.&lt;/li>
&lt;li>Port relevant documentation and tutorials from the &lt;a href="https://github.com/nsb-ucsc/nsb" target="_blank" rel="noopener">repository page&lt;/a>, ensuring updates in the repository are reflected in the website.&lt;/li>
&lt;li>Study existing open source product websites and draw insights to include in our own design.&lt;/li>
&lt;li>Design the structure of the website according to best practices in open-source, visual, and accessibility design.&lt;/li>
&lt;li>Include visual content that showcases NSB integration and testimonials (if applicable).&lt;/li>
&lt;/ul>
&lt;h3 id="improve-the-user-experience-of-nsb">Improve the User Experience of NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Software Engineering&lt;/code> &lt;code>User-Centric Development&lt;/code> &lt;code>Visualization&lt;/code> &lt;code>UI/UX&lt;/code> &lt;code>Documentation&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> package management, toolchain implementation, process automation, technical writing, (visualization), (bash), (Python), (C++)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Our goal has always been to keep NSB streamlined and out of the way of the users and developers. In line with that, we want our tool to be easily available and installable, and we want the experience of using it to feel minimal and non-intrusive while providing sufficient observability of NSB&amp;rsquo;s internals for those who want it.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors and potential users to identify aspects of the user experience that can be refined for a better quality-of-life experience.&lt;/li>
&lt;li>Verify and iterate on existing software packaging methods for NSB to ensure that tool setup is stress-free.&lt;/li>
&lt;li>Refine and update existing documentation and tutorials to reflect improvements in the setup, installation, and usage processes.&lt;/li>
&lt;li>Work with mentors and other contributors to work backwards from what the user wants to see to design the user interface.&lt;/li>
&lt;li>Work with other contributors (see below) to develop a &lt;em>Network-in-a-Box&lt;/em> experience with NSB.&lt;/li>
&lt;/ul>
&lt;h3 id="create-a-network-in-a-box-experience-with-nsb">Create a &lt;em>Network-in-a-Box&lt;/em> Experience with NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Software Engineering&lt;/code> &lt;code>Simulation&lt;/code> &lt;code>System Modeling&lt;/code> &lt;code>System Design&lt;/code> &lt;code>Visualization&lt;/code> &lt;code>UI/UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software integration and interfacing, toolchain implementation, process automation, C++, (visualization), (LLM-enabled code generation), (technical writing)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>NSB was originally designed for networking graduate students to interface with application-layer programs. But since then, there&amp;rsquo;s been more of an appetite for a simpler &lt;em>network-in-a-box&lt;/em> approach that would allow users to quickly deploy baseline or generated network simulations that are ready for use with NSB.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Learn how to use one of the major open-source network simulators (&lt;a href="https://www.nsnam.org/" target="_blank" rel="noopener">ns3&lt;/a> or &lt;a href="https://omnetpp.org/" target="_blank" rel="noopener">OMNeT++&lt;/a>).&lt;/li>
&lt;li>Work with mentors in designing a simpler, minimal user experience of operating NSB.&lt;/li>
&lt;li>Develop tools to automatically create network simulations given input parameters (type of network, number of nodes, description of infrastructure).&lt;/li>
&lt;li>Create documentation aimed at new users.&lt;/li>
&lt;li>Implement or embed network visualizations to enrich the user experience.&lt;/li>
&lt;/ul>
&lt;h3 id="implement-networked-system-models-to-evaluate-quality-of-nsb">Implement Networked System Models to Evaluate Quality of NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>System Modeling&lt;/code> &lt;code>Simulation&lt;/code> &lt;code>System Design&lt;/code> &lt;code>Software Development&lt;/code> &lt;code>Product Testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software integration, good communication, qualitative research, (proficiency in Python and/or C++), (processing scientific and technical literature)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>NSB is a relatively new tool and has not been extensively tested outside of the core contributors, who know a bit too much about the tool. We need to better understand what the external user and contributor experience will be like, and the best way to do that is to start developing with NSB to build models of connected systems, e.g., sensor networks, smart homes, and smart farms.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Research academic literature and relevant works to identify relevant distributed applications to model.&lt;/li>
&lt;li>Work with mentors and collaborators to plan implementation of selected system models.&lt;/li>
&lt;li>Track and report issues and concerns in quality-of-life experiences, critical errors, or difficulties.&lt;/li>
&lt;li>Work with mentors and contributors to address issues and concerns.&lt;/li>
&lt;li>Refine and update existing documentation and tutorials to reflect improvements in the setup, installation, and usage processes.&lt;/li>
&lt;li>Work with other contributors (see below) in reviewing and cross-referencing model implementations.&lt;/li>
&lt;/ul>
&lt;h3 id="model-autonomous-vehicle-networks-to-drive-new-feature-development-in-nsb">Model Autonomous Vehicle Networks to Drive New Feature Development in NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>System Modeling&lt;/code> &lt;code>Simulation&lt;/code> &lt;code>System Design&lt;/code> &lt;code>Software Development&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> requirement-based software design, message parsing interfaces, server-client communication, (proficiency in Python and/or C++), (processing scientific and technical literature)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>NSB today serves its named purpose &amp;ndash; message relaying. However, modeling complex systems can sometimes involve synchronizing other simulation features, like &lt;em>mobility&lt;/em> when dealing with vehicle networks. A generic layer for synchronizing user-defined features across endpoints would be a powerful, enabling feature in NSB. In the process, we may also uncover opportunities for improving the NSB developer experience.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Research academic literature and relevant works to identify and design potential autonomous vehicle network models.&lt;/li>
&lt;li>Work with mentors and collaborators to iterate on system designs to ensure they serve the purpose of furthering NSB development.&lt;/li>
&lt;li>Help mentors design and develop the &lt;em>new&lt;/em> feature-synchronization capability in NSB, driven by the autonomous vehicle system model.&lt;/li>
&lt;li>Develop and iterate on feature synchronization, using mobility as the synchronized feature.&lt;/li>
&lt;li>Create documentation and tutorials to serve as resources for future users, contributors, and developers.&lt;/li>
&lt;li>Work with other contributors (see above) in reviewing and cross-referencing model implementations.&lt;/li>
&lt;/ul></description></item><item><title>Scenic: A Language for Design and Verification of Autonomous Cyber-Physical Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/scenic/</link><pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/scenic/</guid><description>&lt;p>&lt;a href="https://scenic-lang.org/" target="_blank" rel="noopener">Scenic&lt;/a> is a probabilistic programming language for the design and verification of autonomous cyber-physical systems like self-driving cars.
Scenic allows users to define &lt;em>scenarios&lt;/em> for testing or training their system by putting a probability distribution on the system&amp;rsquo;s environment: the positions, orientations, and other properties of objects and agents, as well as their behaviors over time.
Sampling these scenarios and running them in a simulator yields synthetic data which can be used to train or test a system.
Since Scenic was released open-source in 2019, our group and many others in academia have used Scenic to find, diagnose, and fix bugs in autonomous cars, aircraft, robots, and other kinds of systems.
In industry, it is being used by companies including Boeing, Meta, Deutsche Bahn, and Toyota in domains spanning autonomous driving, aviation, household robotics, railways, maritime, and virtual reality.&lt;/p>
&lt;p>Our long-term goal is for Scenic to become a widely used common representation and toolkit supporting the entire design lifecycle of AI-based cyber-physical systems.
Towards this end, we have many summer projects available, ranging from adding new application domains to working on the Scenic compiler and sampler:&lt;/p>
&lt;ol>
&lt;li>Extensions to the Scenic driving domain&lt;/li>
&lt;li>Interfacing Scenic to new simulators&lt;/li>
&lt;li>Scenic distribution visualizer&lt;/li>
&lt;/ol>
&lt;p>See the sections below for details.&lt;/p>
&lt;h3 id="extensions-to-the-scenic-driving-domain">Extensions to the Scenic Driving Domain&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Autonomous Driving&lt;/code> &lt;code>3D modeling&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python; basic vector geometry&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scenic scenarios written to test autonomous vehicles use the &lt;a href="https://docs.scenic-lang.org/en/latest/modules/scenic.domains.driving.html" target="_blank" rel="noopener">driving domain&lt;/a>, a Scenic library defining driving-specific concepts including cars, pedestrians, roads, lanes, and intersections.
The library extracts information about road networks, such as the shapes of lanes, from files in the standard &lt;a href="https://www.asam.net/standards/detail/opendrive/" target="_blank" rel="noopener">OpenDRIVE&lt;/a> format.&lt;/p>
&lt;p>There are several potential goals of this project, including:&lt;/p>
&lt;ul>
&lt;li>Supporting importing complex object information from simulators like CARLA.&lt;/li>
&lt;li>Extending the domain to incorporate additional metadata, such as highway entrances and exits.&lt;/li>
&lt;li>Fixing various bugs and limitations that exist in the driving domain (e.g. &lt;a href="https://github.com/BerkeleyLearnVerify/Scenic/issues/274" target="_blank" rel="noopener">Issue #274&lt;/a> and &lt;a href="https://github.com/BerkeleyLearnVerify/Scenic/issues/295" target="_blank" rel="noopener">Issue #295&lt;/a>).&lt;/li>
&lt;/ul>
&lt;h3 id="interfacing-scenic-to-new-simulators">Interfacing Scenic to New Simulators&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Simulation&lt;/code> &lt;code>Autonomous Driving&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scenic is designed to be &lt;a href="https://docs.scenic-lang.org/en/latest/new_simulator.html" target="_blank" rel="noopener">easily interfaced to new simulators&lt;/a>.
Depending on student interest, we could pick a simulator which would open up new kinds of applications for Scenic and write an interface for it.
Some possibilities include:&lt;/p>
&lt;ul>
&lt;li>The &lt;a href="https://github.com/tier4/AWSIM" target="_blank" rel="noopener">AWSIM&lt;/a> driving simulator (to allow testing the &lt;a href="https://autoware.org/" target="_blank" rel="noopener">Autoware&lt;/a> open-source autonomous driving software stack)&lt;/li>
&lt;li>The &lt;a href="https://www.ipg-automotive.com/solutions/product-portfolio/carmaker/" target="_blank" rel="noopener">CarMaker&lt;/a> driving simulator&lt;/li>
&lt;/ul>
&lt;p>The goal of the project would be to create an interface between Scenic and the new simulator and write scenarios demonstrating it.
If time allows, we could do a case study on a realistic system for publication at an academic conference.&lt;/p>
&lt;h3 id="tool-to-visualize-scenario-distributions">Tool to Visualize Scenario Distributions&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Visualization&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python; basic visualization and graphics&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>A Scenic scenario represents a distribution over scenes, but it can be difficult to interpret what exactly this distribution represents. Being able to visualize this distribution would be helpful for understanding and reasoning about scenarios.&lt;/p>
&lt;p>The goal of this project would be to build on an existing prototype for visualizing these distributions, and to create a tool that can be used by the wider Scenic community.&lt;/p></description></item><item><title>CauST: Causal Gene Intervention for Robust Spatial Domain Identification</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/caust/</link><pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/caust/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> spatial transcriptomics, spatial domain identification, causal inference, gene intervention&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python (PyTorch preferred)&lt;/li>
&lt;li>&lt;strong>Machine Learning:&lt;/strong> causal inference, representation learning, clustering&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong> spatial transcriptomics preprocessing and evaluation (ARI, cross-slice generalization)&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (preferred):&lt;/strong> spatial transcriptomics, scRNA-seq, gene perturbation analysis&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/lijinghua-zhang/">Lijinghua Zhang&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Spatial domain identification is a core task in spatial transcriptomics (ST), aiming to segment tissue sections into biologically meaningful regions based on spatially resolved gene expression profiles. These spatial domains often correspond to anatomical layers, functional niches, or microenvironmental states, and are widely used as the basis for downstream biological interpretation.&lt;/p>
&lt;p>Despite strong empirical performance, most existing spatial domain identification methods rely on &lt;strong>purely correlational gene signals&lt;/strong>. Genes are selected or weighted based on association with spatial patterns, without distinguishing whether they &lt;em>causally drive&lt;/em> domain formation or merely reflect downstream or confounded effects. As a result, current models often suffer from limited robustness and poor generalization across tissue sections or donors.&lt;/p>
&lt;h3 id="problem-correlation-driven-gene-usage-and-limited-generalization">&lt;strong>Problem: Correlation-Driven Gene Usage and Limited Generalization&lt;/strong>&lt;/h3>
&lt;p>In standard pipelines, gene expression features are typically used wholesale or filtered using heuristic criteria (e.g., highly variable genes). However, many genes that are strongly correlated with spatial domains are not causally responsible for domain structure. Including such non-causal or confounded genes can:&lt;/p>
&lt;ul>
&lt;li>Reduce robustness across slices and donors&lt;/li>
&lt;li>Obscure true domain-driving biological signals&lt;/li>
&lt;li>Limit interpretability of spatial domain assignments&lt;/li>
&lt;/ul>
&lt;p>Empirically, domain identification performance often degrades substantially in cross-slice or cross-donor evaluation settings, underscoring the need for causally informed feature selection.&lt;/p>
&lt;h3 id="proposed-solution-caust">&lt;strong>Proposed Solution: CauST&lt;/strong>&lt;/h3>
&lt;p>This project proposes &lt;strong>CauST&lt;/strong>, a &lt;strong>Causal Gene Intervention framework&lt;/strong> for robust spatial domain identification.&lt;/p>
&lt;p>CauST aims to identify &lt;strong>domain-driving genes&lt;/strong> by estimating their causal influence on spatial domain assignments via &lt;strong>in-silico gene interventions&lt;/strong>. Instead of relying on observational correlations, CauST approximates counterfactual gene knockouts by perturbing individual gene expressions while controlling for confounding factors.&lt;/p>
&lt;p>In addition, CauST leverages &lt;strong>cross-slice invariance&lt;/strong> as a practical criterion for causal gene discovery, prioritizing genes whose effects on spatial domain identification remain stable across tissue sections and donors.&lt;/p>
&lt;p>By filtering or reweighting genes based on estimated causal influence, CauST aims to improve the robustness, generalizability, and interpretability of spatial domain identification models.&lt;/p>
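&lt;p>The intervention idea can be illustrated with a minimal sketch. It is not the CauST implementation: the nearest-centroid &lt;code>assign_domains&lt;/code> function stands in for a real domain identification model, a knockout is approximated by setting one gene to zero, confounder control is omitted entirely, and all function names and toy values are assumptions made for illustration.&lt;/p>

```python
def assign_domains(expr, centroids):
    """Assign each spot to its nearest centroid (stand-in for a real domain model)."""
    labels = []
    for spot in expr:
        dists = [sum((x - c) ** 2 for x, c in zip(spot, centroid)) for centroid in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def knockout_effect(expr, centroids, gene):
    """Fraction of spots whose domain label changes when `gene` is set to zero."""
    baseline = assign_domains(expr, centroids)
    knocked = [[0.0 if j == gene else x for j, x in enumerate(spot)] for spot in expr]
    perturbed = assign_domains(knocked, centroids)
    changed = sum(1 for a, b in zip(baseline, perturbed) if a != b)
    return changed / len(expr)


def invariant_scores(slices, centroids, n_genes):
    """Per gene: (mean effect, spread across slices). A small spread suggests a stable effect."""
    scores = {}
    for gene in range(n_genes):
        effects = [knockout_effect(expr, centroids, gene) for expr in slices]
        scores[gene] = (sum(effects) / len(effects), max(effects) - min(effects))
    return scores


# Toy data (hypothetical): gene 0 separates the two domains, gene 1 does not.
centroids = [[0.0, 1.0], [4.0, 1.0]]
slice_a = [[0.2, 1.1], [3.9, 0.8], [4.1, 1.2], [0.1, 0.9]]
slice_b = [[0.3, 1.0], [3.8, 1.0], [0.0, 1.3], [4.2, 0.7]]
scores = invariant_scores([slice_a, slice_b], centroids, n_genes=2)
```

&lt;p>In this toy setting, genes with a large mean effect and a small spread across slices would be kept or upweighted, which mirrors the cross-slice invariance criterion described above.&lt;/p>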
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Causal Gene Effect Estimation&lt;/strong>
&lt;ul>
&lt;li>Design in-silico intervention strategies to estimate gene-level causal effects on spatial domain assignments.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Invariant Effect Analysis&lt;/strong>
&lt;ul>
&lt;li>Identify genes with stable effects across tissue sections or donors.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Causal Gene Filtering&lt;/strong>
&lt;ul>
&lt;li>Develop filtering or reweighting schemes based on estimated causal influence.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Integration with Existing Methods&lt;/strong>
&lt;ul>
&lt;li>Integrate CauST into state-of-the-art spatial domain identification pipelines.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Evaluation and Validation&lt;/strong>
&lt;ul>
&lt;li>Benchmark robustness, cross-slice generalization, and interpretability on public spatial transcriptomics datasets.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>CauST Framework Implementation&lt;/strong>
&lt;ul>
&lt;li>Open-source Python implementation compatible with common spatial transcriptomics toolchains.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Causal Gene Benchmarks&lt;/strong>
&lt;ul>
&lt;li>Quantitative evaluation of causal gene filtering and its impact on domain identification.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Visualization Tools&lt;/strong>
&lt;ul>
&lt;li>Tools for visualizing gene interventions, causal scores, and spatial effects.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials&lt;/strong>
&lt;ul>
&lt;li>Clear examples enabling adoption of CauST by the broader community.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>CauST introduces a causally grounded perspective to spatial domain identification by explicitly modeling gene-level interventions. By shifting from correlation-driven gene usage to causal gene selection, this project improves robustness, generalizability, and biological interpretability in spatial transcriptomics analysis. CauST has the potential to serve as a foundational framework for integrating causal reasoning into spatial omics representation learning.&lt;/p></description></item><item><title>Agent4Target: An Agent-based Evidence Aggregation Toolkit for Therapeutic Target Identification</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/agent4target/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/agent4target/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> therapeutic target identification, drug discovery, evidence aggregation, AI agents, biomedical knowledge integration&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python; experience with modern ML tooling preferred&lt;/li>
&lt;li>&lt;strong>Machine Learning / AI:&lt;/strong> agent-based systems, workflow orchestration, weak supervision (basic), representation learning&lt;/li>
&lt;li>&lt;strong>Software Engineering:&lt;/strong> modular system design, APIs, CLI tools, documentation&lt;/li>
&lt;li>&lt;strong>Biomedical Knowledge (preferred):&lt;/strong> familiarity with drug–target databases (e.g., PHAROS, DepMap, Open Targets)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Identifying and prioritizing high-quality therapeutic targets is a foundational yet challenging task in drug discovery. Modern target identification relies on aggregating heterogeneous evidence from multiple sources, including genetic perturbation screens, disease associations, chemical biology, and biomedical literature. These evidence sources are highly fragmented, noisy, and heterogeneous in both format and reliability.&lt;/p>
&lt;p>While large language models and AI agents have recently shown promise in automating scientific workflows, many existing approaches focus on end-to-end prediction or conversational interfaces. Such systems are often difficult to reproduce, extend, or integrate into existing research pipelines, limiting their practical adoption by the biomedical community.&lt;/p>
&lt;p>This project proposes &lt;strong>Agent4Target&lt;/strong>, an &lt;strong>agent-based evidence aggregation toolkit&lt;/strong> that reframes therapeutic target identification as a &lt;strong>structured, modular workflow&lt;/strong>. Instead of using agents for free-form reasoning, Agent4Target employs agents as &lt;strong>orchestrated components&lt;/strong> that systematically collect, normalize, score, and explain evidence supporting candidate therapeutic targets.&lt;/p>
&lt;p>The goal is to deliver a &lt;strong>reusable, open-source toolchain&lt;/strong> that can be integrated into diverse drug discovery workflows, independent of any single downstream prediction model or publication.&lt;/p>
&lt;hr>
&lt;h3 id="key-idea-and-technical-approach">&lt;strong>Key Idea and Technical Approach&lt;/strong>&lt;/h3>
&lt;p>Agent4Target models target identification as a multi-stage, agent-driven pipeline, coordinated by a central orchestrator:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Evidence Collector Agents&lt;/strong>&lt;br>
Specialized agents retrieve target-level evidence from heterogeneous sources, such as:&lt;/p>
&lt;ul>
&lt;li>Genetic perturbation and dependency data (e.g., DepMap)&lt;/li>
&lt;li>Target annotation and development status (e.g., PHAROS)&lt;/li>
&lt;li>Disease association scores (e.g., Open Targets)&lt;/li>
&lt;li>Automatically summarized literature evidence&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Normalization &amp;amp; Scoring Agent&lt;/strong>&lt;br>
Collected evidence is converted into a unified, structured schema using typed data models (e.g., JSON / Pydantic).&lt;br>
This agent performs:&lt;/p>
&lt;ul>
&lt;li>Evidence normalization across sources&lt;/li>
&lt;li>Confidence-aware scoring and aggregation&lt;/li>
&lt;li>Optional weighting or calibration strategies&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Explanation Agent&lt;/strong>&lt;br>
Rather than free-text generation, this agent produces &lt;strong>structured explanations&lt;/strong> that explicitly link scores to supporting evidence, enabling transparency and interpretability for downstream users.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Workflow Orchestrator&lt;/strong>&lt;br>
A lightweight orchestration layer (e.g., LangGraph or a state-machine-based controller) manages agent execution, dependencies, and failure handling, ensuring reproducibility and extensibility.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>This modular design allows individual agents to be replaced, extended, or reused without altering the overall system.&lt;/p>
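&lt;p>A minimal sketch of how the normalization, scoring, and explanation stages might fit together is shown below. It uses standard-library dataclasses instead of Pydantic to stay self-contained. The source labels, score ranges, field names, and the confidence-weighted mean are all illustrative assumptions, not the project&amp;rsquo;s actual schema or the real scales of any database.&lt;/p>

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Evidence:
    target: str        # identifier of the candidate target
    source: str        # hypothetical source label, e.g. "screen"
    raw_score: float   # score on the source's own scale
    confidence: float  # reliability weight in [0, 1], set by the collector


# Assumed (low, high) raw-score ranges per source; placeholders only.
SOURCE_RANGES = {"screen": (-2.0, 0.0), "association": (0.0, 1.0)}


def normalize(ev):
    """Map a raw score onto [0, 1] using the assumed range for its source."""
    low, high = SOURCE_RANGES[ev.source]
    scaled = (ev.raw_score - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def aggregate(evidence):
    """Confidence-weighted mean of normalized scores, with a structured explanation."""
    if not evidence:
        raise ValueError("no evidence collected")
    total_weight = sum(ev.confidence for ev in evidence)
    score = sum(normalize(ev) * ev.confidence for ev in evidence) / total_weight
    explanation = [{**asdict(ev), "normalized": normalize(ev)} for ev in evidence]
    return {"score": score, "evidence": explanation}


def run_pipeline(target, collectors):
    """Tiny orchestrator: run each collector in order, then aggregate."""
    evidence = []
    for collect in collectors:
        evidence.extend(collect(target))
    return aggregate(evidence)


# Hypothetical collectors returning fixed values instead of querying real services.
collectors = [
    lambda t: [Evidence(t, "screen", -1.0, 0.5)],
    lambda t: [Evidence(t, "association", 0.8, 1.0)],
]
result = run_pipeline("GENE1", collectors)
```

&lt;p>Because each explanation entry carries the raw score, the normalized score, and the confidence weight, a downstream user can trace the aggregate score back to its supporting evidence.&lt;/p>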
&lt;hr>
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Design a Modular Agent-based Architecture&lt;/strong>
&lt;ul>
&lt;li>Define clear interfaces for evidence collection, normalization, scoring, and explanation agents.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Implement a Standardized Evidence Schema&lt;/strong>
&lt;ul>
&lt;li>Develop a unified data model for heterogeneous target-level evidence.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Build a Reproducible Orchestration Framework&lt;/strong>
&lt;ul>
&lt;li>Implement a deterministic, inspectable workflow for agent coordination.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Deliver a Community-Ready Toolkit&lt;/strong>
&lt;ul>
&lt;li>Provide CLI tools, example notebooks, and clear documentation to support adoption.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Benchmark and Case Studies&lt;/strong>
&lt;ul>
&lt;li>Demonstrate the toolkit on representative target identification scenarios using public datasets.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Open-Source Agent4Target Codebase&lt;/strong>
&lt;ul>
&lt;li>A well-documented Python package with modular agent components.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Command-Line Interface (CLI)&lt;/strong>
&lt;ul>
&lt;li>Tools for running end-to-end evidence aggregation pipelines.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Standardized Output Schema&lt;/strong>
&lt;ul>
&lt;li>Machine-readable evidence summaries suitable for downstream modeling.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Example Notebooks and Benchmarks&lt;/strong>
&lt;ul>
&lt;li>Demonstrations of usage and performance on real-world target identification tasks.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation&lt;/strong>
&lt;ul>
&lt;li>Installation guides, extension tutorials, and developer documentation.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>Agent4Target provides a practical bridge between AI agents and real-world drug discovery workflows. By emphasizing structured evidence aggregation, reproducibility, and interpretability, this project enables researchers to systematically reason about therapeutic targets rather than relying on opaque, end-to-end models. The resulting toolkit can serve as a foundation for future work in AI-assisted drug discovery, weak supervision, and biomedical knowledge integration.&lt;/p></description></item><item><title>HistoMoE: A Histology-Guided Mixture-of-Experts Framework for Gene Expression Prediction</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/histomoe/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/histomoe/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> computational pathology, spatial transcriptomics, gene expression prediction, mixture-of-experts, multimodal learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python; experience with PyTorch preferred&lt;/li>
&lt;li>&lt;strong>Machine Learning:&lt;/strong> CNNs / vision encoders, mixture-of-experts, multimodal representation learning&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong> handling large-scale histology image patches and gene expression matrices&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (preferred):&lt;/strong> familiarity with spatial transcriptomics or scRNA-seq data&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Histology imaging is one of the most widely available data modalities in biomedical research and clinical practice, capturing rich morphological information about tissues and disease states. In parallel, spatial transcriptomics (ST) technologies provide spatially resolved gene expression measurements, enabling unprecedented insights into tissue organization and cellular heterogeneity. However, the high cost and limited accessibility of ST experiments remain a major barrier to their widespread adoption.&lt;/p>
&lt;p>Predicting gene expression directly from histology images offers a promising alternative, enabling molecular-level inference from routinely collected pathology data. Existing approaches typically rely on a single global model that maps image embeddings to gene expression profiles. While effective to some extent, these models struggle to capture the strong organ-, tissue-, and cancer-specific heterogeneity that underlies gene expression patterns.&lt;/p>
&lt;p>This project proposes &lt;strong>HistoMoE&lt;/strong>, a &lt;strong>histology-guided mixture-of-experts (MoE) framework&lt;/strong> that explicitly models biological heterogeneity by learning &lt;strong>specialized expert models&lt;/strong> for different cancer types or organs, and dynamically routing histology image patches to the most relevant experts.&lt;/p>
&lt;h3 id="key-idea-and-technical-approach">&lt;strong>Key Idea and Technical Approach&lt;/strong>&lt;/h3>
&lt;p>As illustrated in the figure above, HistoMoE integrates multiple data modalities and learning components:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Vision Encoder&lt;/strong>&lt;br>
Histology image patches are encoded into high-dimensional visual representations using a convolutional or transformer-based vision backbone.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Text / Metadata Encoder&lt;/strong>&lt;br>
Sample-level metadata (e.g., tissue type, organ, disease context) is encoded using a lightweight text or embedding model.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Gating Network&lt;/strong>&lt;br>
A gating network jointly considers image and metadata embeddings to infer routing weights over multiple &lt;strong>cancer- or organ-specific expert models&lt;/strong>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Expert Models&lt;/strong>&lt;br>
Each expert specializes in modeling gene expression patterns for a specific biological context (e.g., CCRCC, COAD, LUAD), producing patch-level gene expression predictions.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>By explicitly modeling biological structure through expert specialization, HistoMoE aims to improve both &lt;strong>prediction accuracy&lt;/strong> and &lt;strong>interpretability&lt;/strong>, allowing researchers to understand which biological experts drive each prediction.&lt;/p>
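&lt;p>The routing step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the HistoMoE architecture: the gate and the experts are single linear maps, the embeddings are hand-written two-element vectors, and every name and weight below is hypothetical.&lt;/p>

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def moe_predict(image_emb, meta_emb, gate_weights, experts):
    """Route one patch: the gate sees image + metadata, experts see the image embedding."""
    gate_input = image_emb + meta_emb  # list concatenation
    weights = softmax(matvec(gate_weights, gate_input))
    expert_outputs = [matvec(expert, image_emb) for expert in experts]
    n_genes = len(expert_outputs[0])
    prediction = [
        sum(w * out[g] for w, out in zip(weights, expert_outputs))
        for g in range(n_genes)
    ]
    return prediction, weights


# Toy inputs (hypothetical): a one-hot metadata vector selects the first context.
image_emb = [1.0, 2.0]
meta_emb = [1.0, 0.0]
gate_weights = [[0.0, 0.0, 5.0, 0.0], [0.0, 0.0, 0.0, 5.0]]  # one row per expert
experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]  # two experts, two genes
prediction, weights = moe_predict(image_emb, meta_emb, gate_weights, experts)
```

&lt;p>Returning the routing weights alongside the prediction is what makes the interpretability analysis possible: the weights show which expert dominated each patch-level prediction.&lt;/p>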
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Design and Implement the HistoMoE Framework&lt;/strong>
&lt;ul>
&lt;li>Build a modular MoE architecture with pluggable vision encoders, gating networks, and expert models.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Multimodal Routing and Expert Specialization&lt;/strong>
&lt;ul>
&lt;li>Explore how image features and metadata jointly inform expert selection.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Benchmarking and Evaluation&lt;/strong>
&lt;ul>
&lt;li>Compare HistoMoE against single-model baselines on multiple cancer and organ-specific spatial transcriptomics datasets.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Interpretability Analysis&lt;/strong>
&lt;ul>
&lt;li>Analyze expert routing behavior to reveal biologically meaningful patterns.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Open-Source HistoMoE Codebase&lt;/strong>
&lt;ul>
&lt;li>Well-documented Python implementation with training, evaluation, and visualization tools.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Benchmark Results&lt;/strong>
&lt;ul>
&lt;li>Quantitative comparisons demonstrating improvements over non-expert baselines.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Visualization and Analysis Tools&lt;/strong>
&lt;ul>
&lt;li>Tools for inspecting expert usage, routing weights, and gene-level predictions.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials&lt;/strong>
&lt;ul>
&lt;li>Clear instructions and examples to enable adoption by the research community.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>HistoMoE introduces an expert-system perspective to histology-based gene expression prediction, bridging morphological and molecular representations through biologically informed specialization. By combining multimodal learning with mixture-of-experts modeling, this project advances the interpretability and accuracy of computational pathology methods and contributes toward scalable, cost-effective alternatives to spatial transcriptomics experiments.&lt;/p></description></item><item><title>StaR: A Stability-Aware Representation Learning Framework for Spatial Domain Identification</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/star/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/star/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> spatial transcriptomics, spatial domain identification, representation learning, model robustness&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python; PyTorch experience preferred&lt;/li>
&lt;li>&lt;strong>Machine Learning:&lt;/strong> representation learning, clustering, robustness and stability analysis&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong> spatial transcriptomics preprocessing and evaluation (ARI, clustering metrics)&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (preferred):&lt;/strong> familiarity with spatial transcriptomics or scRNA-seq data&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Spatial domain identification is a fundamental task in spatial transcriptomics (ST), aiming to partition tissue sections into biologically meaningful regions based on spatially resolved gene expression profiles. These spatial domains often correspond to distinct anatomical structures, cellular compositions, or functional microenvironments, and serve as a critical foundation for downstream biological analysis.&lt;/p>
&lt;p>Despite rapid methodological progress, &lt;strong>most existing spatial domain identification methods are highly sensitive to random initialization&lt;/strong>. In practice, simply changing the random seed can lead to substantially different clustering results and large performance fluctuations, even when using identical hyperparameters and datasets. This instability severely undermines the reliability, reproducibility, and interpretability of spatial transcriptomics analyses.&lt;/p>
&lt;h3 id="problem-seed-sensitivity-and-unstable-representations">&lt;strong>Problem: Seed Sensitivity and Unstable Representations&lt;/strong>&lt;/h3>
&lt;p>Empirical evidence shows that state-of-the-art spatial domain identification models can exhibit substantial performance variance across random seeds. For example, the Adjusted Rand Index (ARI) may vary from relatively strong performance (e.g., ARI ≈ 0.65) to noticeably degraded yet still reasonable outcomes (e.g., ARI ≈ 0.50) solely due to different random initializations.&lt;/p>
&lt;p>By systematically evaluating models across &lt;strong>hundreds to thousands of random seeds&lt;/strong>, we observe that:&lt;/p>
&lt;ul>
&lt;li>Model performance landscapes are highly &lt;strong>rugged&lt;/strong>, with sharp cliffs and isolated high-performing regions.&lt;/li>
&lt;li>Standard training objectives implicitly favor brittle representations that are not robust to small perturbations in initialization or optimization trajectories.&lt;/li>
&lt;/ul>
&lt;p>These observations suggest that instability is not a peripheral issue, but rather a &lt;strong>structural limitation of current representation learning approaches&lt;/strong> for spatial transcriptomics.&lt;/p>
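&lt;p>A seed sweep of this kind can be sketched as follows. The ARI is computed from its standard pair-counting definition using only the standard library. The &lt;code>toy_model&lt;/code> function is a hypothetical stand-in that randomly corrupts labels, so the numbers it produces say nothing about any real method.&lt;/p>

```python
import random
from collections import Counter
from math import comb
from statistics import mean, pstdev


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from pair counts of the contingency table."""
    n = len(labels_true)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def seed_sweep(run_model, labels_true, seeds):
    """Run the model once per seed and summarize the spread of ARI scores."""
    scores = [adjusted_rand_index(labels_true, run_model(seed)) for seed in seeds]
    return {"mean": mean(scores), "std": pstdev(scores), "min": min(scores), "max": max(scores)}


def toy_model(seed, labels_true, noise=0.3):
    """Hypothetical model: replaces each label with a random one with probability `noise`."""
    rng = random.Random(seed)
    return [rng.randrange(2) if noise > rng.random() else y for y in labels_true]


labels_true = [0] * 10 + [1] * 10
summary = seed_sweep(lambda seed: toy_model(seed, labels_true), labels_true, range(20))
```

&lt;p>Reporting the minimum, maximum, and standard deviation alongside the mean makes seed sensitivity visible instead of hiding it behind a single best run.&lt;/p>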
&lt;h3 id="proposed-solution-star">&lt;strong>Proposed Solution: StaR&lt;/strong>&lt;/h3>
&lt;p>This project proposes &lt;strong>StaR&lt;/strong>, a &lt;strong>Stability-Aware Representation Learning framework&lt;/strong> designed to explicitly address seed sensitivity in spatial domain identification.&lt;/p>
&lt;p>The core idea of StaR is to &lt;strong>learn representations that are robust to perturbations in model parameters and training dynamics&lt;/strong>, rather than optimizing solely for peak performance under a single random seed. Concretely, StaR introduces controlled noise or perturbations into the training process and encourages consistency across multiple perturbed model instances, guiding the model toward flatter and more stable regions of the parameter space.&lt;/p>
&lt;p>By prioritizing stability during representation learning, StaR aims to produce embeddings that:&lt;/p>
&lt;ul>
&lt;li>Yield consistent spatial domain assignments across random seeds&lt;/li>
&lt;li>Maintain competitive or improved clustering accuracy&lt;/li>
&lt;li>Better reflect underlying biological structure&lt;/li>
&lt;/ul>
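&lt;p>One possible form of such a consistency term is sketched below, assuming a linear encoder and Gaussian noise on its weights. This is an illustration of the general idea only: the actual StaR objective is not specified here, and the function names, the noise model, and the squared-distance penalty are all assumptions.&lt;/p>

```python
import random


def encode(weights, x):
    """Linear encoder: one embedding dimension per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def perturb(weights, sigma, rng):
    """Return a copy of the weights with independent Gaussian noise added."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]


def consistency_loss(weights, batch, sigma, n_perturb, seed=0):
    """Mean squared distance between clean and perturbed embeddings."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_perturb):
        noisy = perturb(weights, sigma, rng)
        for x in batch:
            clean, shifted = encode(weights, x), encode(noisy, x)
            total += sum((a - b) ** 2 for a, b in zip(clean, shifted))
            count += 1
    return total / count


# Toy encoder and batch (hypothetical values).
weights = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7]]
batch = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0]]
loss_no_noise = consistency_loss(weights, batch, sigma=0.0, n_perturb=4)
loss_noisy = consistency_loss(weights, batch, sigma=0.5, n_perturb=4)
```

&lt;p>In training, a term like this would typically be added to the task loss with a weighting coefficient, so that the optimizer is penalized for settling in regions where small parameter changes move the embeddings substantially.&lt;/p>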
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Characterize Instability in Existing Methods&lt;/strong>
&lt;ul>
&lt;li>Systematically quantify seed sensitivity across popular spatial domain identification models.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Develop Stability-Aware Training Objectives&lt;/strong>
&lt;ul>
&lt;li>Design perturbation-based or consistency-driven losses that encourage robust representations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Integrate StaR into Existing Pipelines&lt;/strong>
&lt;ul>
&lt;li>Apply StaR to widely used spatial transcriptomics workflows with minimal architectural changes.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Evaluation and Benchmarking&lt;/strong>
&lt;ul>
&lt;li>Evaluate StaR using clustering metrics (e.g., ARI) and stability metrics across multiple datasets and random seeds.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Biological Validation&lt;/strong>
&lt;ul>
&lt;li>Assess whether stability-aware representations preserve biologically meaningful spatial patterns.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>StaR Framework Implementation&lt;/strong>
&lt;ul>
&lt;li>An open-source Python implementation compatible with common spatial transcriptomics toolchains.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Stability Benchmarks&lt;/strong>
&lt;ul>
&lt;li>Comprehensive evaluations demonstrating reduced performance variance across seeds.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Visualization Tools&lt;/strong>
&lt;ul>
&lt;li>Tools for visualizing performance landscapes, stability surfaces, and spatial domain consistency.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials&lt;/strong>
&lt;ul>
&lt;li>Clear examples enabling researchers to adopt StaR in their own analyses.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>StaR addresses a critical yet underexplored challenge in spatial transcriptomics: &lt;strong>model instability and poor reproducibility&lt;/strong>. By shifting the focus from single-run performance to stability-aware representation learning, this project improves the reliability and trustworthiness of spatial domain identification methods. StaR has the potential to become a foundational component in robust spatial transcriptomics pipelines and to inspire broader adoption of stability-aware principles in biological representation learning.&lt;/p></description></item><item><title>MedJEPA: Self-Supervised Medical Image Representation Learning with JEPA</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/medjepa/</link><pubDate>Mon, 19 Jan 2026 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/medjepa/</guid><description>&lt;h3 id="project-description">Project Description&lt;/h3>
&lt;p>Medical image analysis is fundamental to modern healthcare, enabling disease diagnosis, treatment planning, and patient monitoring across diverse clinical applications. In radiology and pathology, deep learning models support automated detection of abnormalities, tumor segmentation, and diagnostic assistance. Medical imaging modalities including X-rays, CT scans, MRI, ultrasound, and histopathology slides generate vast amounts of unlabeled data that could benefit from self-supervised representation learning. Clinical applications include cancer detection and staging, cardiovascular disease assessment, neurological disorder diagnosis, and infectious disease screening. In drug discovery and clinical research, analyzing medical images helps evaluate treatment efficacy, predict patient outcomes, and identify biomarkers for disease progression. Telemedicine and point-of-care diagnostics benefit from AI-powered image analysis that extends expert-level interpretation to underserved regions. However, medical imaging faces unique challenges: limited labeled datasets due to expensive expert annotation, patient privacy concerns restricting data sharing, domain shift across different imaging equipment and protocols, and the need for models that generalize across hospitals and populations.&lt;/p>
&lt;p>Traditional medical image analysis relies heavily on supervised learning with manually annotated labels, creating bottlenecks due to the scarcity and cost of expert annotations. Existing self-supervised methods applied to medical imaging often employ complex training procedures with numerous heuristics (momentum encoders, stop-gradients, teacher-student architectures, and carefully tuned augmentation strategies) that may not translate well across different medical imaging modalities and clinical contexts. These approaches struggle with domain-specific challenges such as subtle pathological features, high-resolution images, 3D volumetric data, and the need for interpretable representations that clinicians can trust. To address these challenges, we propose MedJEPA: Self-Supervised Medical Image Representation Learning with Joint-Embedding Predictive Architecture, which leverages the theoretically grounded LeJEPA framework for 2D medical images and V-JEPA principles for medical video and volumetric data, creating a unified, scalable, and heuristics-free approach specifically tailored for medical imaging applications.&lt;/p>
&lt;p>By utilizing the principled JEPA frameworks with objectives like Sketched Isotropic Gaussian Regularization (SIGReg), MedJEPA eliminates complex training heuristics while learning clinically meaningful representations from unlabeled medical images. Unlike conventional self-supervised methods that require extensive hyperparameter tuning and may not generalize across medical imaging modalities, MedJEPA provides a clean, theoretically motivated framework with minimal hyperparameters that adapts to diverse medical imaging contexts, from chest X-rays to histopathology slides to cardiac MRI sequences. The learned representations can support downstream tasks including disease classification, lesion detection, organ segmentation, and survival prediction, while requiring significantly fewer labeled examples for fine-tuning. This approach democratizes access to state-of-the-art medical AI by enabling effective learning from the vast amounts of unlabeled medical imaging data available in hospital archives, addressing the annotation bottleneck that has limited progress in medical AI.&lt;/p>
&lt;h3 id="project-objectives">Project Objectives&lt;/h3>
&lt;p>Aligned with the vision of the 2026 Open Source Research Experience (OSRE), this project aims to apply Joint-Embedding Predictive Architecture (JEPA) frameworks to medical image representation learning, addressing the critical challenge of learning from limited labeled medical data. Medical imaging generates enormous amounts of unlabeled data, but supervised learning approaches are bottlenecked by the scarcity and cost of expert annotations. Existing self-supervised methods often rely on complex heuristics that don&amp;rsquo;t generalize well across diverse medical imaging modalities, equipment vendors, and clinical protocols.&lt;/p>
&lt;p>This project will leverage the theoretically grounded LeJEPA framework for 2D medical images (X-rays, histopathology slides, fundus images) and V-JEPA principles for temporal and volumetric medical data (cardiac MRI sequences, CT scans, surgical videos). The core challenge lies in adapting these heuristics-free, stable frameworks to medical imaging&amp;rsquo;s unique characteristics: subtle pathological features requiring fine-grained representations, high-resolution images demanding efficient processing, domain shift across hospitals and equipment, and the need for interpretable features that support clinical decision-making. The learned representations will be evaluated on diverse downstream clinical tasks including disease classification, lesion detection, organ segmentation, and prognosis prediction, with emphasis on few-shot learning scenarios that reflect real-world annotation constraints. Below is an outline of the methodologies and models that will be developed in this project.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Step 1: Medical Data Preparation&lt;/strong>:
Develop data processing pipelines for diverse medical imaging modalities, implementing DICOM/NIfTI parsing, standardized preprocessing, and efficient data loading for self-supervised pre-training.
Prepare 2D medical image datasets:
Chest X-rays: ChestX-ray14, MIMIC-CXR, CheXpert for lung disease detection
Histopathology: Camelyon16/17 (breast cancer), PCam (patch-level classification)
Retinal imaging: EyePACS, APTOS (diabetic retinopathy), Messidor
Dermatology: HAM10000, ISIC (skin lesion classification)
Prepare 3D volumetric and temporal medical data:
CT scans: LIDC-IDRI (lung nodules), Medical Segmentation Decathlon datasets
MRI sequences: BraTS (brain tumors), ACDC (cardiac MRI), UK Biobank cardiac videos
Medical video: Surgical procedure videos, endoscopy recordings, ultrasound sequences
Implement medical imaging-specific preprocessing: intensity normalization, resolution standardization, handling of multi-channel medical images (different MRI sequences, RGB histopathology), and privacy-preserving anonymization.
Design masking strategies appropriate for medical imaging: spatial masking for 2D images, volumetric masking for 3D scans, temporal masking for sequences, and anatomy-aware masking that respects organ boundaries.
Create data loaders supporting high-resolution medical images, 3D volumes, and multi-modal inputs (e.g., multiple MRI sequences).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 2: JEPA Model Implementation for Medical Imaging&lt;/strong>:
Implement LeJEPA for 2D medical images:
Adapt joint-embedding predictive architecture for medical image characteristics (high resolution, subtle features, domain-specific patterns)
Apply Sketched Isotropic Gaussian Regularization (SIGReg) to learn clinically meaningful embedding distributions
Maintain single trade-off hyperparameter and heuristics-free training for reproducibility across medical imaging centers
Support various encoder architectures: Vision Transformers for global context, ConvNets for local features, hybrid approaches
Extend to V-JEPA for medical video and volumetric data:
Spatiotemporal encoding for cardiac MRI sequences, surgical videos, and time-series medical imaging
Temporal prediction objectives for understanding disease progression and treatment response
3D volume processing for CT and MRI scans with efficient memory management
Multi-slice and multi-sequence learning for comprehensive medical imaging contexts
Develop medical domain-specific enhancements:
Multi-scale representation learning to capture both fine-grained pathological details and global anatomical context
Interpretability mechanisms: attention visualization, feature attribution, and embedding space analysis for clinical validation
Robustness to domain shift: training strategies that generalize across different scanners, protocols, and institutions
Privacy-preserving training considerations compatible with medical data regulations (HIPAA, GDPR)
Implement efficient training infrastructure:
Support for distributed training across multiple GPUs for large medical imaging datasets
Memory-efficient processing of high-resolution images and 3D volumes
Checkpoint management and model versioning for clinical deployment pipelines
Minimal-code implementation (≈50-100 lines) demonstrating framework simplicity&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 3: Evaluation &amp;amp; Safety Validation&lt;/strong>:
Disease Classification Tasks:
Multi-label chest X-ray classification: 14 pathology classes on ChestX-ray14, MIMIC-CXR
Diabetic retinopathy grading: 5-class classification on EyePACS, APTOS
Skin lesion classification: 7-class classification on HAM10000
Brain tumor classification: glioma grading on BraTS dataset
Evaluate with linear probing, few-shot learning (5-shot, 10-shot), and full fine-tuning
Lesion Detection and Segmentation:
Lung nodule detection on LIDC-IDRI dataset
Tumor segmentation on Medical Segmentation Decathlon tasks
Polyp detection in colonoscopy videos
Cardiac structure segmentation in MRI sequences
Clinical Prediction Tasks:
Survival prediction from histopathology slides
Disease progression prediction from longitudinal imaging
Treatment response assessment from pre/post imaging pairs
Few-Shot and Low-Data Regime Evaluation:
Systematic evaluation with 1%, 5%, 10%, 25%, 50% of labeled training data
Comparison against supervised baselines and ImageNet pre-training
Analysis of annotation efficiency: performance vs. number of labeled examples required&lt;/p>
&lt;/li>
&lt;/ul>
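&lt;p>The spatial masking mentioned in Step 1 can be illustrated with a minimal sketch. It is not taken from LeJEPA, V-JEPA, or any MedicalJEPA code: the helper names &lt;code>random_patch_mask&lt;/code> and &lt;code>split_context_target&lt;/code>, the 4x4 patch grid, and the 50% mask ratio are all hypothetical choices made for illustration. It only shows uniform random masking of a 2D patch grid; volumetric, temporal, or anatomy-aware masking would need additional inputs.&lt;/p>

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=0):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    Hypothetical helper: uniform random masking only. Anatomy-aware or
    volumetric masking would need segmentation maps or a third axis.
    """
    if mask_ratio > 1.0 or 0.0 > mask_ratio:
        raise ValueError("mask_ratio must lie in [0, 1]")
    total = grid_h * grid_w
    n_masked = round(total * mask_ratio)
    rng = random.Random(seed)  # seeded so the mask is reproducible
    masked = set(rng.sample(range(total), n_masked))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]


def split_context_target(mask):
    """Split patch coordinates into visible context and masked targets."""
    context, target = [], []
    for r, row in enumerate(mask):
        for c, is_masked in enumerate(row):
            (target if is_masked else context).append((r, c))
    return context, target


mask = random_patch_mask(grid_h=4, grid_w=4, mask_ratio=0.5, seed=0)
context, target = split_context_target(mask)
print(len(context), "context patches,", len(target), "target patches")
```

&lt;p>In a JEPA-style setup, the encoder would see only the context patches and a predictor would be asked to match the embeddings of the target patches; that part is omitted here.&lt;/p>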
&lt;h3 id="project-deliverables">Project Deliverables&lt;/h3>
&lt;p>This project will deliver three components: software implementation, clinical evaluation, and practical deployment resources. The software implementing MedicalJEPA will be hosted on GitHub as an open-access repository with modular code supporting multiple medical imaging modalities (2D images, 3D volumes, videos), pre-trained model checkpoints on major medical imaging datasets (chest X-rays, histopathology, MRI), training and evaluation scripts with medical imaging-specific preprocessing pipelines, privacy-preserving training implementations compatible with clinical data regulations, and comprehensive documentation including tutorials for medical AI researchers and clinicians. The evaluation results will include benchmarks on 10+ medical imaging datasets across diverse modalities and clinical tasks, few-shot learning analysis demonstrating annotation efficiency gains, cross-institutional validation studies showing robustness to domain shift, interpretability visualizations enabling clinical validation of learned representations, and detailed comparisons against supervised baselines and existing medical self-supervised methods.&lt;/p>
&lt;h3 id="neurohealth">MedicalJEPA&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Self-Supervised Medical Image Representation Learning with JEPA&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Proficiency in Python, PyTorch, GitHub, JEPA&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/linsey-pang/">Linsey Pang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="references">References:&lt;/h3>
&lt;ul>
&lt;li>LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics - Randall Balestriero and Yann LeCun, arXiv 2025&lt;/li>
&lt;li>Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA) - Adrien Bardes et al., arXiv 2024&lt;/li>
&lt;li>Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture - Mahmoud Assran et al., CVPR 2023 (I-JEPA)&lt;/li>
&lt;li>ChestX-ray14: Hospital-Scale Chest X-Ray Database - &lt;a href="https://nihcc.app.box.com/v/ChestXray-NIHCC" target="_blank" rel="noopener">https://nihcc.app.box.com/v/ChestXray-NIHCC&lt;/a>&lt;/li>
&lt;li>Medical Segmentation Decathlon - &lt;a href="http://medicaldecathlon.com/" target="_blank" rel="noopener">http://medicaldecathlon.com/&lt;/a>&lt;/li>
&lt;li>MIMIC-CXR Database - &lt;a href="https://physionet.org/content/mimic-cxr/" target="_blank" rel="noopener">https://physionet.org/content/mimic-cxr/&lt;/a>&lt;/li>
&lt;li>The Cancer Imaging Archive (TCIA) - &lt;a href="https://www.cancerimagingarchive.net/" target="_blank" rel="noopener">https://www.cancerimagingarchive.net/&lt;/a>&lt;/li>
&lt;li>UK Biobank Imaging Study - &lt;a href="https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/imaging-data" target="_blank" rel="noopener">https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/imaging-data&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>NeuroHealth: AI-Powered Health Assistant</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/neurohealth/</link><pubDate>Mon, 19 Jan 2026 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/neurohealth/</guid><description>&lt;h3 id="project-description">Project Description&lt;/h3>
&lt;p>[NeuroHealth] Intelligent health assistance systems are increasingly essential for improving healthcare accessibility, patient engagement, and clinical decision support. In primary care and preventive medicine, AI assistants help users understand symptoms, schedule appropriate appointments, and receive preliminary health guidance. Telemedicine applications include triage support, appointment scheduling optimization, and patient education based on health inquiries. In chronic disease management, these systems provide medication reminders, lifestyle recommendations, and timely alerts for medical follow-ups. Healthcare navigation applications include finding appropriate specialists, understanding treatment options, and coordinating care across multiple providers. In wellness and preventive care, intelligent assistants enhance health literacy by delivering personalized health information, screening recommendations, and proactive health management strategies. By leveraging natural language understanding and medical knowledge integration, these systems enhance healthcare access, reduce unnecessary emergency visits, and empower users to make informed health decisions across diverse populations.
Traditional health information systems often provide generic responses that fail to account for individual health contexts, medical history, and personal circumstances. Existing symptom checkers and health chatbots primarily rely on rule-based logic or simple decision trees, limiting their ability to understand nuanced health inquiries, reason about complex symptom patterns, or provide contextually appropriate guidance. These systems struggle with interpreting ambiguous descriptions, adapting to users&amp;rsquo; health literacy levels, and generating personalized recommendations that account for individual medical constraints and preferences. To address these challenges, we propose NeuroHealth: AI-Powered Health Assistant, which leverages Large Language Models (LLMs) to create an intelligent conversational agent that synthesizes user health inquiries, symptom descriptions, and contextual information into actionable, personalized health guidance and appointment recommendations.
By integrating LLM-based medical reasoning with structured clinical knowledge bases, NeuroHealth enhances symptom interpretation, appointment routing, and health education delivery. Unlike conventional systems that provide static responses from predetermined templates, NeuroHealth dynamically understands user intent, asks clarifying questions, assesses urgency levels, and generates appropriate recommendations—whether scheduling a doctor appointment, suggesting self-care measures, or directing users to emergency services. This fusion of LLM intelligence with validated medical knowledge enables a more accessible, adaptive, and helpful health assistance platform, bridging the gap between users seeking health information and appropriate medical care.&lt;/p>
&lt;h3 id="project-objectives">Project Objectives&lt;/h3>
&lt;p>Aligned with the vision of the 2026 Open Source Research Experience (OSRE), this project aims to develop an AI-Powered Health Assistant (NeuroHealth) to improve healthcare accessibility and patient engagement through intelligent conversational guidance. Healthcare systems face significant challenges in providing timely, personalized health information and connecting patients with appropriate care resources. Traditional symptom checkers and health information systems often deliver generic, rule-based responses that fail to account for individual contexts and struggle with natural language understanding.
To address these limitations, this project will leverage Large Language Models (LLMs) to create an intelligent health assistant that understands user health inquiries, interprets symptom descriptions, assesses urgency, and provides personalized recommendations including doctor appointment suggestions, self-care guidance, and healthcare navigation support. The core challenge lies in designing NeuroHealth as a safe, accurate, and user-friendly system capable of natural conversation, medical knowledge retrieval, and appropriate response generation while maintaining clinical safety guardrails. Unlike conventional health chatbots that follow rigid conversation flows, NeuroHealth will reason over user inputs, ask clarifying questions, and dynamically adapt responses based on context, resulting in more helpful, accurate, and appropriate health assistance. Below is an outline of the methodologies and models that will be developed in this project.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Step 1: Data Collection &amp;amp; Knowledge Base Construction&lt;/strong>:
Develop a comprehensive medical knowledge base integrating validated health information sources, symptom databases, condition descriptions, and appointment routing guidelines.
Collect and curate conversational health inquiry datasets from public medical Q&amp;amp;A forums, symptom checker logs, and healthcare chatbot interactions to create training and evaluation data.
Design structured representations for symptoms, conditions, urgency levels, and appointment recommendations to enable effective retrieval and reasoning.
Extract common health inquiry patterns, symptom descriptions, and user intent categories to inform conversation flow design.
Data sources can include public medical knowledge bases such as MedlinePlus, Mayo Clinic health information, clinical practice guidelines, and synthetic patient inquiry scenarios based on common healthcare use cases.
Implement data validation mechanisms to ensure medical accuracy and clinical safety compliance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 2: Model Development&lt;/strong>:
Design and implement an LLM-based conversational health assistant that integrates medical knowledge retrieval with natural language understanding and generation.
Develop a Retrieval-Augmented Generation (RAG) architecture that grounds LLM responses in validated medical information sources, reducing hallucination risks and ensuring factual accuracy.
Create prompt engineering strategies and reasoning frameworks that enable the system to: interpret symptom descriptions, assess urgency levels, ask appropriate clarifying questions, and generate personalized health guidance.
Implement a multi-component architecture including: intent recognition, symptom extraction, urgency assessment, appointment recommendation generation, and response formatting modules.
Develop clinical safety guardrails that detect high-risk scenarios requiring immediate medical attention and provide appropriate emergency guidance.
Design conversation management strategies that maintain context across multi-turn dialogues and adapt to users&amp;rsquo; health literacy levels.
The baseline architecture can leverage state-of-the-art models such as GPT-4, Claude, or open-source alternatives like Llama, Qwen, combined with medical knowledge retrieval systems.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 3: Evaluation &amp;amp; Safety Validation&lt;/strong>:
Benchmark NeuroHealth against existing symptom checkers and health chatbots, evaluating on metrics including response accuracy, appropriateness of appointment recommendations, urgency assessment precision, and user satisfaction.
Conduct human evaluation studies with healthcare professionals to assess clinical safety, response quality, and appropriateness of medical guidance.
Perform adversarial testing to identify potential failure modes, unsafe responses, or inappropriate recommendations under edge cases.
Conduct ablation studies to analyze the impact of retrieval-augmented generation, safety guardrails, and conversation management strategies on system performance.
Evaluate system performance across diverse health inquiry types including acute symptoms, chronic condition management, preventive care questions, and healthcare navigation requests.
Assess response quality across different user demographics and health literacy levels to ensure equitable access.
Optimize inference efficiency and response latency for real-time conversational interaction across web and mobile platforms.&lt;/p>
&lt;/li>
&lt;/ul>
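&lt;p>The retrieval-grounded flow and the safety guardrail described in Step 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the three-sentence knowledge base, the &lt;code>RED_FLAGS&lt;/code> phrases, and every function name are hypothetical, word overlap stands in for a real retrieval index, and no LLM is called. The sketch stops at building the grounded prompt.&lt;/p>

```python
import re

# Tiny illustrative knowledge base; a real system would index vetted sources.
KNOWLEDGE_BASE = [
    "Mild seasonal allergies can cause sneezing and itchy eyes.",
    "A tension headache often improves with rest and hydration.",
    "A sprained ankle is usually managed with rest, ice, and elevation.",
]

# Hypothetical red-flag phrases that skip the LLM and trigger emergency guidance.
RED_FLAGS = ("chest pain", "trouble breathing", "severe bleeding")


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Rank documents by word overlap with the query (stand-in for a vector index)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens.intersection(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def assess_urgency(message):
    """Rule-based guardrail: any red-flag phrase marks the message as an emergency."""
    lowered = message.lower()
    return "emergency" if any(flag in lowered for flag in RED_FLAGS) else "routine"


def build_prompt(message):
    """Return (urgency, prompt); prompt is None when the guardrail fires."""
    urgency = assess_urgency(message)
    if urgency == "emergency":
        return urgency, None
    context = "\n".join(retrieve(message, KNOWLEDGE_BASE))
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"User: {message}"
    )
    return urgency, prompt


print(build_prompt("I have a headache and feel tired"))
print(build_prompt("I have chest pain"))
```

&lt;p>Running the guardrail before retrieval means a high-risk message never reaches the generation step, which is one possible way to keep emergency handling deterministic.&lt;/p>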
&lt;h3 id="project-deliverables">Project Deliverables&lt;/h3>
&lt;p>This project will deliver three components: model development, evaluation and validation, and interactive demonstration. The software implementing the NeuroHealth system will be hosted on GitHub as an open-access repository with comprehensive documentation, deployment guides, and API specifications. The evaluation results, including benchmark comparisons against existing systems, clinical safety assessments, and user study findings, will be published alongside the GitHub repository. An interactive demo showcasing the conversational interface, symptom interpretation capabilities, and appointment recommendation generation will be provided to illustrate real-world application scenarios.&lt;/p>
&lt;h3 id="neurohealth">NeuroHealth&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: AI-Powered Health Assistant&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Proficiency in Python, GitHub, LLMs&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/linsey-pang/">Linsey Pang&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="references">References:&lt;/h3>
&lt;ul>
&lt;li>Large Language Models in Healthcare - Singhal et al., Nature 2023&lt;/li>
&lt;li>Med-PaLM: Large Language Models for Medical Question Answering - Singhal et al., arXiv 2022&lt;/li>
&lt;li>Capabilities of GPT-4 on Medical Challenge Problems - Nori et al., arXiv 2023&lt;/li>
&lt;li>MedlinePlus Medical Encyclopedia - &lt;a href="https://medlineplus.gov/" target="_blank" rel="noopener">https://medlineplus.gov/&lt;/a>&lt;/li>
&lt;li>Clinical Practice Guidelines Database - &lt;a href="https://www.guidelines.gov/" target="_blank" rel="noopener">https://www.guidelines.gov/&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>LMS Toolkit</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/lms-toolkit/</link><pubDate>Tue, 13 Jan 2026 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/lms-toolkit/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/lms-toolkit" target="_blank" rel="noopener">EduLinq LMS Toolkit&lt;/a> is a suite of tools used by several courses at UCSC
to interact with LMSs (e.g., Canvas) from the command line or Python.
A &lt;a href="https://en.wikipedia.org/wiki/Learning_management_system" target="_blank" rel="noopener">Learning Management System&lt;/a> (LMS) is a system that institutions use to manage courses, assignments, students, and grades.
The most popular LMSs are
&lt;a href="https://en.wikipedia.org/wiki/Instructure#Canvas" target="_blank" rel="noopener">Canvas&lt;/a>,
&lt;a href="https://en.wikipedia.org/wiki/Blackboard_Learn" target="_blank" rel="noopener">Blackboard&lt;/a>,
&lt;a href="https://en.wikipedia.org/wiki/Moodle" target="_blank" rel="noopener">Moodle&lt;/a>,
and &lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">Brightspace&lt;/a>.
These tools can be very helpful, especially from an administrative standpoint, but can be hard to interact with.
They can be especially difficult when instructors and TAs want to do something that is not explicitly supported by their built-in GUIs
(e.g., when an instructor wants to use a special grading policy).
The LMS Toolkit project is an effort to create a single suite of command-line tools (along with a Python interface)
to connect to all the above mentioned LMSs in a simple and uniform way.
So, not only can instructors and TAs easily access and modify the data held in an LMS (like a student&amp;rsquo;s grades),
but they can also do it the same way on any LMS.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the LMS Toolkit.&lt;/p>
&lt;p>Currently, the LMS Toolkit supports Canvas, Moodle, and Blackboard.
However, the degree of support for each LMS varies.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2026 should fill out &lt;a href="https://forms.gle/Mr4YR3N35pWDb4uz7" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., do not comment on an issue to say that you want to tackle it as part of OSRE/GSoC 2026).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="advanced-lms-support">Advanced LMS Support&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The LMS Toolkit already has basic read-write support for many core pieces of LMS functionality (e.g., working with grades and assignments).
However, there are still many more features that can be supported such as
&lt;a href="https://github.com/edulinq/lms-toolkit/issues/17" target="_blank" rel="noopener">group management&lt;/a>,
&lt;a href="https://github.com/edulinq/lms-toolkit/issues/7" target="_blank" rel="noopener">quiz management&lt;/a>,
&lt;a href="https://github.com/edulinq/lms-toolkit/issues/10" target="_blank" rel="noopener">quiz statistics&lt;/a>,
and &lt;a href="https://github.com/edulinq/lms-toolkit/issues/19" target="_blank" rel="noopener">assignment statuses&lt;/a>.&lt;/p>
&lt;p>The task for this project is to choose a set of advanced features
(not limited to those features mentioned above),
design an LMS-agnostic way to support those features,
and implement those features.
The flexibility in which features are chosen accounts for the variable size of this project.&lt;/p>
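&lt;p>One way to picture an LMS-agnostic design for a feature such as group management is an abstract interface with one adapter per LMS. The sketch below is not the LMS Toolkit&amp;rsquo;s actual API: the &lt;code>Group&lt;/code> schema, &lt;code>GroupBackend&lt;/code>, &lt;code>InMemoryBackend&lt;/code>, and the payload keys (&lt;code>gid&lt;/code>, &lt;code>title&lt;/code>, &lt;code>users&lt;/code>) are hypothetical, and the in-memory adapter stands in for a real Canvas or Moodle adapter without making any HTTP calls.&lt;/p>

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """LMS-neutral view of a student group (hypothetical schema)."""
    id: str
    name: str
    member_emails: tuple


class GroupBackend(ABC):
    """What each LMS adapter would implement for group management."""

    @abstractmethod
    def list_groups(self, course_id):
        """Return a list of Group objects for the course."""


class InMemoryBackend(GroupBackend):
    """Stand-in for a real LMS adapter; performs no HTTP calls."""

    def __init__(self, raw_groups):
        # raw_groups mimics backend-specific payloads keyed by course id.
        self._raw_groups = raw_groups

    def list_groups(self, course_id):
        # Translate the backend-specific payload into the neutral Group type.
        return [
            Group(id=str(raw["gid"]), name=raw["title"],
                  member_emails=tuple(sorted(raw["users"])))
            for raw in self._raw_groups.get(course_id, [])
        ]


def group_sizes(backend, course_id):
    """Caller code depends only on GroupBackend, never on a specific LMS."""
    return {group.name: len(group.member_emails)
            for group in backend.list_groups(course_id)}


backend = InMemoryBackend({
    "cse-101": [{"gid": 7, "title": "Team A",
                 "users": ["b@example.edu", "a@example.edu"]}],
})
print(group_sizes(backend, "cse-101"))
```

&lt;p>With this shape, supporting another LMS would mean writing one more adapter, while code such as &lt;code>group_sizes&lt;/code> stays unchanged.&lt;/p>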
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>GitHub Issues
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/17" target="_blank" rel="noopener">Group Management&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/7" target="_blank" rel="noopener">Quiz Management&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/10" target="_blank" rel="noopener">Quiz Statistics&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/19" target="_blank" rel="noopener">Assignment Statuses&lt;/a>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="new-lms-support-brightspace">New LMS Support: Brightspace&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of the LMS Toolkit is to provide a single interface for all LMSs.
&lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">D2L Brightspace&lt;/a> is one of the more popular LMSs.
Naturally, the LMS Toolkit wants to support Brightspace as well.
However, a challenge in supporting Brightspace is that it is not open source (unlike Canvas and Moodle).
Therefore, support and testing on Brightspace may be very challenging.&lt;/p>
&lt;p>The task for this project is to add basic support for the Brightspace LMS.
It is not necessary to support all the same features that are supported for other LMSs,
but at least the core features of score and assignment management should be implemented.
The closed-source nature of Brightspace makes this a challenging and uncertain project.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">Brightspace Wiki Page&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/23" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Lynx Grader</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/autograder/</link><pubDate>Tue, 13 Jan 2026 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/autograder/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/autograder-server" target="_blank" rel="noopener">EduLinq Lynx Grader&lt;/a> (also referred to as &amp;ldquo;autograder&amp;rdquo;) is an open source tool used by several courses at UCSC
to safely and quickly grade programming assignments.
Grading student code is something that may seem simple at first (you just need to run their code!),
but quickly becomes exceedingly complex as you get more into the details.
Specifically, grading a student&amp;rsquo;s code securely while providing the &amp;ldquo;last mile&amp;rdquo; service of getting code from students
and sending results to instructors/TAs and the course&amp;rsquo;s LMS (e.g., Canvas) can be very difficult.
The Lynx Grader provides all of this in a free and open source project.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the Lynx Grader.&lt;/p>
&lt;p>As an open source project, there are endless opportunities for development, improvements, and collaboration.
Here, we highlight some specific projects that will work well in the summer mentorship setting.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2026 should fill out &lt;a href="https://forms.gle/Mr4YR3N35pWDb4uz7" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., do not comment on an issue to say that you want to tackle it as part of OSRE/GSoC 2026).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="llm-detection">LLM Detection&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>AI/ML&lt;/code> &lt;code>LLM&lt;/code> &lt;code>Research&lt;/code> &lt;code>Backend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, systems, data munging, go, docker&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>As &lt;a href="https://en.wikipedia.org/wiki/Large_language_model" target="_blank" rel="noopener">Large Language Model (LLM)&lt;/a> tools like ChatGPT become more common and powerful,
instructors need tools to help determine if students are the actual authors of the code they submit.
More classical instances of plagiarism are often discovered by code similarity tools like &lt;a href="https://theory.stanford.edu/~aiken/moss/" target="_blank" rel="noopener">MOSS&lt;/a>.
However, these tools are not sufficient for detecting code written not by a student,
but by an AI model like &lt;a href="https://en.wikipedia.org/wiki/ChatGPT" target="_blank" rel="noopener">ChatGPT&lt;/a> or &lt;a href="https://en.wikipedia.org/wiki/GitHub_Copilot" target="_blank" rel="noopener">GitHub Copilot&lt;/a>.&lt;/p>
&lt;p>The task for this project is to create a system that provides a score indicating the system&amp;rsquo;s confidence that a given piece of code was written by an AI tool and not a student.
This will supplement the existing code analysis tools in the Lynx Grader.
There are many approaches to completing this task that will be considered.
A more software-development-oriented approach can consist of leveraging existing systems to create a production-ready system,
whereas a more research approach can consist of creating a novel approach complete with a paper and experiments.&lt;/p>
&lt;p>There has been &lt;a href="https://github.com/anvichip/AI-code-detection-ML/blob/main/experiment/report.md" target="_blank" rel="noopener">previous work on this issue&lt;/a>,
where a student surveyed existing solutions, collected initial datasets, and ran exploratory experiments on possible directions.
This project would build off of this previous work.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server" target="_blank" rel="noopener">Repository for Lynx Grader Server&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/140" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="code-analysis-gui">Code Analysis GUI&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Frontend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, frontend, data munging, js, css, go&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Lynx Grader has existing functionality to analyze the code in a student&amp;rsquo;s submission for malicious content.
More relevant to this project, the Lynx Grader can also run a pairwise similarity analysis across all submitted code.
This is how most existing software plagiarism systems detect offending code.
The existing infrastructure provides detailed statistics on code similarity,
but does not currently have a visual way to display this data.&lt;/p>
&lt;p>The task for this project is to create a web GUI using the Lynx Grader REST API
to display the results of a code analysis.
The size of this project depends on how many of the existing features are going to be supported by the web GUI.&lt;/p>
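&lt;p>To make the idea of a pairwise analysis concrete, here is a minimal sketch of the kind of data a GUI would need to rank and display. It is not the Lynx Grader&amp;rsquo;s actual algorithm or API shape: the Jaccard token overlap, the function names, and the sample submissions are all assumptions chosen for illustration.&lt;/p>

```python
from itertools import combinations


def token_set(source: str) -> set[str]:
    """Split source text into a set of whitespace-delimited tokens."""
    return set(source.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a.union(b)
    if not union:
        return 1.0  # two empty submissions are treated as identical
    return len(a.intersection(b)) / len(union)


def pairwise_scores(submissions: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of submissions exactly once."""
    tokens = {name: token_set(text) for name, text in submissions.items()}
    return {
        (left, right): jaccard(tokens[left], tokens[right])
        for left, right in combinations(sorted(tokens), 2)
    }


def most_similar(scores: dict[tuple[str, str], float], limit: int = 3):
    """Rank pairs from most to least similar, as a results table might."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical submissions, keyed by student name.
submissions = {
    "alice": "x = 1 print x",
    "bob": "x = 1 print x",
    "carol": "y = 2",
}
scores = pairwise_scores(submissions)
top = most_similar(scores, limit=1)
```

&lt;p>A real GUI would read the scores from the REST API instead of computing them, but the same sort-and-truncate step is a plausible starting point for a &amp;ldquo;most suspicious pairs&amp;rdquo; view.&lt;/p>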
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">Repository for Lynx Grader Web GUI&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/142" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/blob/main/internal/model/analysis.go#L78" target="_blank" rel="noopener">Pairwise Code Analysis Type&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-py/blob/v0.6.16/tests/api/testdata/courses/assignments/analysis/courses_assignments_submissions_analysis_pairwise_wait.json" target="_blank" rel="noopener">Sample API Data&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="web-gui">Web GUI&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Frontend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, frontend, js, css&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Lynx Grader contains dozens of &lt;a href="https://github.com/edulinq/autograder-server/blob/main/resources/api.json" target="_blank" rel="noopener">API endpoints&lt;/a>,
most directly representing a piece of functionality exposed to the user.
All of these features are exposed in the &lt;a href="https://github.com/edulinq/autograder-py" target="_blank" rel="noopener">Lynx Grader&amp;rsquo;s Python Interface&lt;/a>.
However, the Python interface is a purely command-line interface.
And although command-line interfaces are objectively (read: subjectively) the best,
a web GUI would be more accessible to a wider audience.
The autograder already has a &lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">web GUI&lt;/a>,
but it does not cover all the features available in the Lynx Grader.&lt;/p>
&lt;p>The task for this project is to augment the Lynx Grader&amp;rsquo;s web GUI with more features.
Specifically, add support for more tools used to create and administer courses.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">Repository for Lynx Grader Web GUI&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/61" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/blob/main/resources/api.json" target="_blank" rel="noopener">Lynx Grader API Endpoints&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-py" target="_blank" rel="noopener">Lynx Grader&amp;rsquo;s Python Interface&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Quiz Composer</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/quiz-composer/</link><pubDate>Tue, 13 Jan 2026 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/quiz-composer/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">EduLinq Quiz Composer&lt;/a> (also called the &amp;ldquo;Quiz Generator&amp;rdquo;) is a tool used by several courses at UCSC
to create and maintain platform-agnostic quizzes (including exams and worksheets).
Knowledge assessments like quizzes, exams, and tests are a core part of the learning process for many courses.
However, maintaining banks of questions, collaborating on new questions, and converting quizzes to new formats can use up a lot of time,
taking time away from actually working on improving course materials.
The Quiz Composer helps by providing a single text-based format that can be stored in a repository and &amp;ldquo;compiled&amp;rdquo; into many different formats including:
HTML, LaTeX, PDF, Canvas, GradeScope, and QTI.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the Quiz Composer.&lt;/p>
&lt;p>As an open source project, there are endless opportunities for development, improvements, and collaboration.
Here, we highlight some specific projects that will work well in the summer mentorship setting.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2026 should fill out &lt;a href="https://forms.gle/Mr4YR3N35pWDb4uz7" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., comment on an issue that you want to tackle it as a part of OSRE/GSoC 2026).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="canvas-import">Canvas Import&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Quiz Composer houses quizzes and quiz questions in a simple and unambiguous format based
on &lt;a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON&lt;/a> and &lt;a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown&lt;/a> (specifically, the &lt;a href="https://commonmark.org" target="_blank" rel="noopener">CommonMark specification&lt;/a>).
This allows the Quiz Composer to unambiguously create versions of the same quiz in many different formats.
However, creating a quiz in the Quiz Composer format can be a daunting task for those not familiar with JSON or Markdown.
Instead, it would be easier for people to import quizzes from another format into the Quiz Composer format,
and then edit it as they see fit.
Unfortunately, not all other quiz formats, namely Canvas in this case, are unambiguous.&lt;/p>
&lt;p>The task for this project is to implement the functionality of importing quizzes from Canvas to the standard Quiz Composer format.
The ambiguous nature of Canvas quizzes makes this task non-trivial,
and adds an additional element of design decisions to this task.
It will be impossible to import quizzes 100% correctly,
but we want to be able to get close enough that most people can import their quizzes without issue.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer/issues/27" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="google-forms-export">Google Forms Export&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Quiz Composer can export quizzes to many different formats,
each with a varying level of interactivity and feature support.
For example, quizzes can be exported to PDFs, which are printed so that students can write down their answers to be checked later.
Quizzes can also be exported to interactive platforms like Canvas where students can enter answers that may be automatically checked with feedback immediately provided to the student.
One potential platform with functionality somewhere between the above two examples is &lt;a href="https://workspace.google.com/products/forms/" target="_blank" rel="noopener">Google Forms&lt;/a>.
&amp;ldquo;Forms&amp;rdquo; (an entity on Google Forms) can be something like a survey or (more recently) a quiz.&lt;/p>
&lt;p>The task for this project is to add support for exporting quizzes from the Quiz Composer to Google Forms.
There is a large overlap in the quiz features supported in Canvas (which the Quiz Composer already supports) and Google Forms,
so most settings should be fairly straightforward.
There may be some design work around deciding what features are specific to one quiz platform
and what features can be abstracted to work across several platforms.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer/issues/19" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="template-questions">Template Questions&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, data munging, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate-Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Questions in the Quiz Composer are described using &lt;a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON&lt;/a> and &lt;a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown&lt;/a>
files which contain the question prompt, possible answers, and the correct answer.
(Of course, there are many different &lt;a href="https://github.com/edulinq/quiz-composer/blob/main/docs/question-types.md" target="_blank" rel="noopener">question types&lt;/a>,
each with different semantics and requirements.)
However, a limitation of this is that each question is always the same.
You can have multiple copies of a question with slightly different prompts, numbers, and answers;
but you are still limited to each question being static and unchanging.
It would be useful to have &amp;ldquo;template questions&amp;rdquo; that can dynamically create static questions from a template
and collection of replacement data.&lt;/p>
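&lt;p>As a rough illustration of the idea, the sketch below expands one template into several static questions using Python&amp;rsquo;s standard &lt;code>string.Template&lt;/code>. The function name, the dictionary keys, and the use of a callable to compute the answer are assumptions for this example only; the design discussed in the project issue may differ.&lt;/p>

```python
from string import Template


def expand_template(prompt_template, answer_fn, rows):
    """Create one static question per row of replacement data.

    prompt_template: text with $name placeholders (string.Template syntax).
    answer_fn: hypothetical callable that computes the correct answer from a row.
    rows: list of dicts mapping placeholder names to replacement values.
    """
    template = Template(prompt_template)
    return [
        {"prompt": template.substitute(row), "answer": answer_fn(**row)}
        for row in rows
    ]


# Two rows of replacement data produce two distinct static questions.
rows = [{"a": 2, "b": 3}, {"a": 10, "b": 5}]
questions = expand_template("What is $a + $b?", lambda a, b: a + b, rows)
```

&lt;p>Each resulting dictionary is a fully static question, so it could be handed to the existing export formats unchanged.&lt;/p>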
&lt;p>The task for this project is to add support for the &amp;ldquo;template questions&amp;rdquo; discussed above.
Much of the high-level design work for this issue has &lt;a href="https://github.com/edulinq/quiz-composer/issues/26" target="_blank" rel="noopener">already been completed&lt;/a>.
However, the implementation and the low-level design decisions still need to be done.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer/issues/26" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Scenic-RoboSuite Integration: Building the First Working Prototype</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/scenic/20250929-sahil-tgs/</link><pubDate>Mon, 29 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/scenic/20250929-sahil-tgs/</guid><description>&lt;p>I&amp;rsquo;m &lt;a href="https://sahiltgs.super.site/" target="_blank" rel="noopener">Sahil&lt;/a>, presenting the first working prototype of the Scenic-RoboSuite integration. This &lt;a href="https://sahiltgs.super.site/gsoc/uc-ospo-proposal" target="_blank" rel="noopener">project&lt;/a> is being mentored by &lt;a href="https://ucsc-ospo.github.io/author/daniel-fremont/" target="_blank" rel="noopener">Daniel Fremont&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/eric-vin/" target="_blank" rel="noopener">Eric Vin&lt;/a>.&lt;/p>
&lt;p>After months of development, we have achieved a functional prototype of the &lt;a href="https://scenic-lang.org/" target="_blank" rel="noopener">Scenic&lt;/a>-&lt;a href="https://robosuite.ai/" target="_blank" rel="noopener">RoboSuite&lt;/a> interface. Researchers can now write basic declarative robotic manipulation scenarios in Scenic that execute with physics simulation in RoboSuite. While still in development, the prototype demonstrates the feasibility and potential of bridging probabilistic scenario generation with detailed robot control.&lt;/p>
&lt;h2 id="major-achievements">Major Achievements&lt;/h2>
&lt;h3 id="mjcf-xml-injection">MJCF XML Injection&lt;/h3>
&lt;p>The interface introduces direct MJCF XML support, allowing Scenic to build RoboSuite-native manipulable objects from raw XML definitions. Users can define custom objects with complex mesh geometries, textures, and physics properties directly in their Scenic scenarios:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">dragon_xml = &amp;#39;&amp;#39;&amp;#39;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;lt;mujoco&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;asset&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;mesh name=&amp;#34;dragon_mesh&amp;#34; file=&amp;#34;dragon.stl&amp;#34; scale=&amp;#34;0.01 0.01 0.01&amp;#34;/&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;texture file=&amp;#34;dragon_texture.png&amp;#34;/&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;/asset&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;worldbody&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;body name=&amp;#34;object&amp;#34;&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;geom mesh=&amp;#34;dragon_mesh&amp;#34; type=&amp;#34;mesh&amp;#34;/&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;/body&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &amp;lt;/worldbody&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;lt;/mujoco&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;&amp;#39;&amp;#39;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">dragon = new CustomObject with mjcfXml dragon_xml
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The system automatically handles collision geometry generation, joint creation for physics, and asset file resolution.&lt;/p>
&lt;h3 id="complex-mesh-object-support">Complex Mesh Object Support&lt;/h3>
&lt;p>Import and manipulate arbitrary 3D models (STL, OBJ) with automatic mesh repair and texture mapping. The interface resolves file paths relative to Scenic files, copies assets to temporary directories for MuJoCo, and converts textures (JPG to PNG) when needed. This enables using custom robotic tools, industrial parts, or any 3D model in manipulation scenarios.&lt;/p>
&lt;h3 id="custom-arena-definition">Custom Arena Definition&lt;/h3>
&lt;p>Define complete custom environments using MJCF XML, extending beyond RoboSuite&amp;rsquo;s built-in arenas:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">custom_arena = new CustomArena with arenaXml localPath(&amp;#34;warehouse.xml&amp;#34;)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This allows creating specialized workspaces, factory floors, or research-specific environments while maintaining full physics simulation.&lt;/p>
&lt;h3 id="multi-robot-support">Multi-Robot Support&lt;/h3>
&lt;p>The interface handles multiple robots operating in the same workspace:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">robot1 = new Panda at (-0.5, 0, 0)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">robot2 = new UR5e at (0.5, 0, 0)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">table = new Table at (0, 0, 0.425)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Each robot maintains independent control and can execute coordinated or individual behaviors.&lt;/p>
&lt;h3 id="built-in-manipulation-behaviors">Built-in Manipulation Behaviors&lt;/h3>
&lt;p>Ready-to-use behaviors for immediate testing and development:&lt;/p>
&lt;ul>
&lt;li>&lt;code>MoveToPosition&lt;/code> - Precise end-effector positioning&lt;/li>
&lt;li>&lt;code>PickObject&lt;/code> - Automated grasping with approach and closure&lt;/li>
&lt;li>&lt;code>LiftToHeight&lt;/code> - Controlled lifting to target heights&lt;/li>
&lt;li>&lt;code>PickAndLift&lt;/code> - Complete pick-and-place sequence&lt;/li>
&lt;/ul>
&lt;p>These behaviors use Operational Space Control (OSC) for intuitive 3D movement commands.&lt;/p>
&lt;h3 id="extended-environment-configuration">Extended Environment Configuration&lt;/h3>
&lt;p>The interface extends RoboSuite&amp;rsquo;s configurability through Scenic&amp;rsquo;s parameter system:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">param controller_config = {&amp;#39;type&amp;#39;: &amp;#39;OSC_POSITION&amp;#39;, &amp;#39;impedance&amp;#39;: &amp;#39;low&amp;#39;}
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">param camera_view = &amp;#39;robot0_eye_in_hand&amp;#39;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">param lite_physics = True # Faster simulation for testing
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="example-probabilistic-pick-and-place">Example: Probabilistic Pick-and-Place&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">model scenic.simulators.robosuite.model
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Randomly position cube on table
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">table = new Table at (0.6, 0, 0.425)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">cube = new Box on table,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> with color (1, 0, 0, 1),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> with position (Uniform(-0.2, 0.2), Uniform(-0.2, 0.2), _)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Robot adapts to random cube position
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">behavior AdaptivePickup():
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> do PickAndLift(cube, height=1.1)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">ego = new Panda at (0, 0, 0),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> with behavior AdaptivePickup()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Each scenario run generates a different cube position, testing the robot&amp;rsquo;s adaptive capabilities.&lt;/p>
&lt;h2 id="challenges-overcome">Challenges Overcome&lt;/h2>
&lt;h3 id="understanding-dual-architecture-paradigms">Understanding Dual Architecture Paradigms&lt;/h3>
&lt;p>RoboSuite and Scenic operate on fundamentally different principles. RoboSuite builds environments imperatively through MuJoCo XML composition, expecting complete scene specification upfront. Scenic generates scenes probabilistically through constraint solving, requiring geometric knowledge before simulation. Bridging these required developing a two-pass system where we first extract geometry from a temporary RoboSuite environment, update Scenic&amp;rsquo;s understanding, then create the final simulation. This architectural mismatch touched every aspect of the integration, from object creation to property updates.&lt;/p>
&lt;h3 id="discovering-and-extending-manipulationenv">Discovering and Extending ManipulationEnv&lt;/h3>
&lt;p>RoboSuite&amp;rsquo;s documentation focuses on using pre-built tasks, not creating custom environments. Through extensive source code analysis, we discovered that &lt;code>ManipulationEnv&lt;/code> was the key - it accepts robots as configuration while allowing customizable arenas and objects as components. This class became our foundation, but required significant extension. We implemented &lt;code>ScenicManipulationEnv&lt;/code> to intercept Scenic&amp;rsquo;s object configurations, handle dynamic arena selection (EmptyArena vs MultiTableArena based on scene content), and manage the complex initialization sequence where robots, arenas, and objects must be assembled in specific order for MuJoCo compilation.&lt;/p>
&lt;h3 id="xml-to-3d-mesh-pipeline">XML to 3D Mesh Pipeline&lt;/h3>
&lt;p>Converting MJCF XML to usable 3D meshes proved complex. MuJoCo uses XML to describe geometry, but Scenic needs actual mesh data for collision checking. We built a multi-stage pipeline: First, &lt;code>ElementTree&lt;/code> parses the XML to extract mesh references and primitive definitions. Then, we handle two paths - for mesh files, we load STL/OBJ files with trimesh and apply XML-specified transformations; for primitives (boxes, cylinders), we generate meshes programmatically. The challenge intensified with composite objects - a table might have a box tabletop and four cylinder legs. We developed &lt;code>ComponentExtractor&lt;/code> to analyze the MuJoCo scene graph, identify related geometries through naming patterns and hierarchy, and export each component as a separate GLB file with proper world transforms preserved.&lt;/p>
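&lt;p>The first stage of that pipeline can be sketched with the standard library alone. This is a simplified illustration, not the project&amp;rsquo;s actual code: the function names are hypothetical, the example tree is built in code as a stand-in for parsing a real MJCF file, and the trimesh loading and transform steps are left out.&lt;/p>

```python
import xml.etree.ElementTree as ET


def build_example_model() -> ET.Element:
    """Build a tiny MJCF-like tree in code (stand-in for ET.parse on a real file)."""
    root = ET.Element("mujoco")
    asset = ET.SubElement(root, "asset")
    ET.SubElement(asset, "mesh", name="dragon_mesh", file="dragon.stl")
    worldbody = ET.SubElement(root, "worldbody")
    body = ET.SubElement(worldbody, "body", name="object")
    ET.SubElement(body, "geom", type="mesh", mesh="dragon_mesh")
    ET.SubElement(body, "geom", type="box", size="0.1 0.2 0.05")
    return root


def split_geoms(root: ET.Element):
    """Separate geoms into the two paths: mesh files and primitive shapes."""
    # Map each declared mesh asset name to the file it points at.
    mesh_files = {mesh.get("name"): mesh.get("file") for mesh in root.iter("mesh")}
    mesh_geoms, primitives = [], []
    for geom in root.iter("geom"):
        if geom.get("type") == "mesh":
            # Mesh path: look up the file that would later be loaded from disk.
            mesh_geoms.append(mesh_files[geom.get("mesh")])
        else:
            # Primitive path: keep the type and numeric size for mesh generation.
            size = [float(value) for value in geom.get("size", "").split()]
            primitives.append((geom.get("type"), size))
    return mesh_geoms, primitives


mesh_geoms, primitives = split_geoms(build_example_model())
```

&lt;p>Here &lt;code>mesh_geoms&lt;/code> lists files to load, while &lt;code>primitives&lt;/code> holds the shapes that would be generated programmatically.&lt;/p>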
&lt;h3 id="file-path-resolution-discrepancies">File Path Resolution Discrepancies&lt;/h3>
&lt;p>Scenic and RoboSuite handle file paths completely differently. Scenic uses &lt;code>localPath()&lt;/code> for paths relative to the scenario file, while RoboSuite expects paths relative to its package structure or absolute paths. MJCF XML compounds this - mesh references can be relative to the XML file location, not the calling code. We implemented a sophisticated path resolution system: detect whether paths come from embedded XML (relative to Scenic file) or external XML files (relative to XML location), copy all referenced assets (meshes, textures) to temporary directories accessible to MuJoCo, and handle texture format conversion (JPG to PNG) when needed. This system transparently manages assets whether they&amp;rsquo;re in the Scenic project, RoboSuite package, or absolute paths, making the interface truly portable.&lt;/p>
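&lt;p>A stripped-down version of the resolve-then-copy step might look like the following. The helper names and the flat staging layout are assumptions for illustration; the real system also distinguishes embedded from external XML and converts textures, which this sketch omits.&lt;/p>

```python
import shutil
import tempfile
from pathlib import Path


def resolve_asset(reference: str, base_dir: Path) -> Path:
    """Absolute references pass through; relative ones resolve against base_dir."""
    path = Path(reference)
    return path if path.is_absolute() else (base_dir / path).resolve()


def make_staging_dir() -> Path:
    """Create a temporary directory that the simulator can read assets from."""
    return Path(tempfile.mkdtemp(prefix="mjcf_assets_"))


def stage_assets(references, base_dir: Path, staging_dir: Path) -> dict:
    """Copy every referenced asset into staging_dir.

    Returns a mapping from the original reference to the staged path, which
    could then be used to rewrite the file attributes in the XML.
    """
    staged = {}
    for reference in references:
        source = resolve_asset(reference, base_dir)
        target = staging_dir / source.name
        shutil.copy2(source, target)
        staged[reference] = target
    return staged
```

&lt;p>The caller decides what &lt;code>base_dir&lt;/code> is: the Scenic file&amp;rsquo;s directory for embedded XML, or the XML file&amp;rsquo;s directory for external files.&lt;/p>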
&lt;h2 id="impact-and-applications">Impact and Applications&lt;/h2>
&lt;p>This bridge enables:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Research&lt;/strong>: Generate diverse manipulation scenarios for robot learning algorithms&lt;/li>
&lt;li>&lt;strong>Testing&lt;/strong>: Validate robotic systems against probabilistic task variations&lt;/li>
&lt;li>&lt;strong>Development&lt;/strong>: Rapid prototyping of manipulation tasks without manual scene setup&lt;/li>
&lt;li>&lt;strong>Education&lt;/strong>: Teach robotics concepts through declarative scenario specification&lt;/li>
&lt;/ul>
&lt;p>The integration makes complex robotic simulations accessible through Scenic&amp;rsquo;s intuitive language while preserving RoboSuite&amp;rsquo;s detailed physics and control capabilities.&lt;/p>
&lt;h2 id="documentation-and-resources">Documentation and Resources&lt;/h2>
&lt;p>The project includes:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Example scenarios&lt;/strong> demonstrating all features&lt;/li>
&lt;li>&lt;strong>Comprehensive STATUS.md&lt;/strong> tracking working features and known issues&lt;/li>
&lt;li>&lt;strong>Technical documentation&lt;/strong> in &lt;code>docs/&lt;/code> covering architecture and troubleshooting&lt;/li>
&lt;li>&lt;strong>Mesh extraction utilities&lt;/strong> for pre-processing and caching&lt;/li>
&lt;/ul>
&lt;h2 id="current-status-and-future-work">Current Status and Future Work&lt;/h2>
&lt;p>This prototype demonstrates that the Scenic-RoboSuite bridge is viable and functional. Basic features are working reliably:&lt;/p>
&lt;ul>
&lt;li>Single-robot manipulation scenarios execute successfully&lt;/li>
&lt;li>MJCF XML injection creates custom objects&lt;/li>
&lt;li>Pick-and-place behaviors operate consistently&lt;/li>
&lt;li>Multi-robot support functions in controlled scenarios&lt;/li>
&lt;/ul>
&lt;p>However, significant work remains:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Stability improvements&lt;/strong>: Some features work intermittently and need refinement&lt;/li>
&lt;li>&lt;strong>Velocity tracking&lt;/strong>: Full implementation awaits framework updates&lt;/li>
&lt;li>&lt;strong>Multi-robot coordination&lt;/strong>: Advanced synchronization primitives needed&lt;/li>
&lt;li>&lt;strong>Performance optimization&lt;/strong>: Mesh extraction and caching can be streamlined&lt;/li>
&lt;li>&lt;strong>Extended testing&lt;/strong>: More diverse scenarios and edge cases need validation&lt;/li>
&lt;/ul>
&lt;p>The prototype serves as a proof of concept, showing that probabilistic scenario specification can successfully drive physics-based robot simulation. The architecture is sound, the core features function, and the path forward is clear.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>This working prototype of the Scenic-RoboSuite integration represents significant progress toward bridging probabilistic programming with robotic simulation. We&amp;rsquo;ve successfully demonstrated that declarative scenario specification can control detailed physics simulation, opening new possibilities for robotic system development and testing.&lt;/p>
&lt;p>While not yet production-ready, the prototype provides a solid foundation for future development. Researchers can begin experimenting with basic manipulation scenarios, developers can test the interface with their use cases, and the community can contribute to making this bridge more robust and feature-complete.&lt;/p>
&lt;p>The challenges overcome - from understanding dual architectures to implementing XML-to-mesh pipelines - have resulted in a functional system that validates our approach. This prototype proves that Scenic&amp;rsquo;s elegant scenario language and RoboSuite&amp;rsquo;s detailed physics can work together, setting the stage for a powerful new tool in robotics research and development.&lt;/p></description></item><item><title>Final Report: CarbonCast — An end-to-end consumption-based Carbon Intensity Forecasting service</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250915-tanushsavadi/</link><pubDate>Mon, 15 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250915-tanushsavadi/</guid><description>&lt;p>Hi everyone—this is my final report for &lt;strong>CarbonCast&lt;/strong>, mentored by &lt;strong>Professor Abel Souza&lt;/strong>. Back in June, my goal was simple to say and harder to pull off: help people &lt;strong>see&lt;/strong> when the grid is cleaner and make it easy to act on that information. Over the summer I turned CarbonCast from a research prototype into something you can open, click, and rely on: a containerized backend, a clean API, and a fast, friendly map UI.&lt;/p>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>CarbonCast forecasts the &lt;strong>carbon intensity&lt;/strong> of electricity (gCO₂e/kWh) using grid data and weather. Earlier versions were accurate but difficult to run and even harder to use outside a research context. My OSRE focus was to make CarbonCast usable for real people: provide a standard API, build a web UI that feels responsive, and package everything so it starts quickly and keeps itself healthy.&lt;/p>
&lt;h2 id="goals">Goals&lt;/h2>
&lt;p>I centered the work around four goals. First, I wanted to &lt;strong>ship an end-to-end containerized stack&lt;/strong>—data collection, validation, storage, API, and UI—that someone else could run without digging through my notes. Second, I aimed to &lt;strong>expand coverage&lt;/strong> beyond a handful of regions so the map would be genuinely useful. Third, I needed to &lt;strong>make it reliable&lt;/strong>, with retries, monitoring, and graceful fallbacks so the system could run for weeks without babysitting. Finally, I wanted to &lt;strong>lay the groundwork for a consumption-based signal&lt;/strong>, because imports from neighboring regions also shape a region’s true emissions picture.&lt;/p>
&lt;h2 id="what-i-built">What I built&lt;/h2>
&lt;p>By the end of the program, CarbonCast runs as a &lt;strong>containerized backend + API + web app&lt;/strong> that you can bring up with Docker. The pipelines now reach &lt;strong>85+ regions&lt;/strong>, and the UI currently exposes &lt;strong>58+&lt;/strong> while we finish integrating the rest. The API offers straightforward endpoints for current conditions and multi-day views, plus region metadata so clients can discover what’s available. The UI presents an &lt;strong>interactive choropleth map&lt;/strong> with a side panel for the &lt;strong>energy mix&lt;/strong> and a simple &lt;strong>timeline&lt;/strong> to move between past, now, and the next few days. To keep things feeling snappy, I tuned caching so “now” data updates quickly while historical and forecast views load instantly from cache. I also added a small &lt;strong>“mission control” dashboard&lt;/strong> that shows what updated, what failed, and how the system recovered, which makes maintenance far less mysterious.&lt;/p>
&lt;h2 id="how-it-works">How it works&lt;/h2>
&lt;p>Fresh weather and grid data arrive on a regular schedule. The system checks each file for sanity, stores it, and serves it through a clean API. The React app calls that API and paints the map. Hovering reveals regional details; clicking opens a richer panel with the energy mix and trends; the timeline lets you scrub through hours naturally. In short, the path is &lt;strong>fresh data → API → map&lt;/strong>, and each step is designed to be obvious and quick.&lt;/p>
&lt;p>Behind the scenes, I extended the existing Django backend with a &lt;strong>SQLite path&lt;/strong> so the UI works out of the box on a laptop. For production, you can point the same code at Postgres or MySQL without changing the UI. This choice made local testing easy while leaving room for scale later.&lt;/p>
&lt;h2 id="highlights">Highlights&lt;/h2>
&lt;p>A few moments stand out. The first time the dashboard flipped from red to green on its own—after the system retried through a wave of timeouts—was a turning point. Clicking across the map and getting instant responses because the right data was cached felt great too. And packaging everything so another person can run it without asking me for help might be the biggest quality-of-life win for future contributors.&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>The first big hurdle was &lt;strong>refactoring the old vanilla-JS interface&lt;/strong>. The original UI worked, but it was dated and hard to extend. I rebuilt it as a modern React + TypeScript app with a cleaner component structure and a fresh look—think &lt;strong>glassmorphic panels&lt;/strong>, readable color scales, and a layout that feels consistent on both laptops and smaller screens. Moving to this design system made the codebase far easier to maintain, theme, and iterate on.&lt;/p>
&lt;p>The next challenge was &lt;strong>performance under real-time load&lt;/strong>. With dozens of regions updating, it was easy to hit API limits and make the UI feel jittery. I solved this by adding a smart &lt;strong>caching layer&lt;/strong> with short, volatility-aware timeouts, request de-duplication, and background prefetching. That combination dramatically reduced round-trips, essentially &lt;strong>eliminated rate-limit hits&lt;/strong>, and made the map feel responsive even as you scrub through time. The result is a UI that can handle many simultaneous updates &lt;strong>without hiccups&lt;/strong>.&lt;/p>
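&lt;p>To illustrate the idea of volatility-aware timeouts, here is a minimal Python sketch. It is not the CarbonCast UI code (which is TypeScript), and the names &lt;code>TTLCache&lt;/code> and &lt;code>TTL_BY_KIND&lt;/code> and the timeout values are assumptions chosen for the example. It covers only the timeout part, not request de-duplication or prefetching.&lt;/p>

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical lifetimes in seconds: volatile "now" data expires quickly,
# while historical data rarely changes and can be kept much longer.
TTL_BY_KIND = {"now": 60.0, "forecast": 900.0, "history": 86400.0}


class TTLCache:
    """Tiny in-memory cache whose entry lifetime depends on the data kind."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Maps (kind, region) to (expiry time, cached value).
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get_or_fetch(self, kind: str, region: str, fetch: Callable[[], Any]) -> Any:
        key = (kind, region)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the round-trip
        value = fetch()
        self._store[key] = (now + TTL_BY_KIND[kind], value)
        return value
```

&lt;p>A caller would wrap each API request in &lt;code>get_or_fetch&lt;/code>, so repeated views of the same region within the timeout are served from memory.&lt;/p>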
&lt;p>Finally, there were plenty of &lt;strong>stubborn UI bugs&lt;/strong>. Some regions wouldn’t color even when data was available, certain charts refused to render, and a few elements flickered or never showed up. Most of this came down to learning &lt;strong>React state management&lt;/strong> in a real project: taming race conditions, canceling in-flight requests when users navigate, and making sure state only updates when fresh data actually arrives. Fixing those issues taught me a lot about how maps re-paint, how charts expect their data, and how to keep components simple enough that they behave the way users expect.&lt;/p>
&lt;h2 id="what-didnt-make-the-cut-yet">What didn’t make the cut (yet)&lt;/h2>
&lt;p>I designed—but did not finish—&lt;strong>per-region plug-in models&lt;/strong> so each grid can use the approach that fits it best. We decided to ship a stable, deployable service first and reserve that flexibility work for the next phase. The design is written down and ready to build.&lt;/p>
&lt;h2 id="links-and-resources">Links and resources&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Project page:&lt;/strong> &lt;a href="project/osre25/ucsc/carboncast/">CarbonCast&lt;/a>&lt;/li>
&lt;li>&lt;strong>Proposal:&lt;/strong> &lt;a href="https://ucsc-ospo.github.io/report/osre25/ucsc/carboncast/20250710-tanushsavadi/" target="_blank" rel="noopener">https://ucsc-ospo.github.io/report/osre25/ucsc/carboncast/20250710-tanushsavadi/&lt;/a>&lt;/li>
&lt;li>&lt;strong>Midterm blog:&lt;/strong> &lt;a href="https://ucsc-ospo.github.io/report/osre25/ucsc/carboncast/20250803-tanushsavadi/" target="_blank" rel="noopener">https://ucsc-ospo.github.io/report/osre25/ucsc/carboncast/20250803-tanushsavadi/&lt;/a>&lt;/li>
&lt;li>&lt;strong>Backend/API (branch):&lt;/strong> &lt;a href="https://github.com/carbonfirst/CarbonCast/tree/django_apis_sqlite" target="_blank" rel="noopener">https://github.com/carbonfirst/CarbonCast/tree/django_apis_sqlite&lt;/a>&lt;/li>
&lt;li>&lt;strong>Frontend/UI:&lt;/strong> &lt;a href="https://github.com/carbonfirst/CarbonCastUI/tree/main" target="_blank" rel="noopener">https://github.com/carbonfirst/CarbonCastUI/tree/main&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="whats-next">What’s next&lt;/h2>
&lt;p>My next steps are clear. I want to finish the &lt;strong>per-region model plug-ins&lt;/strong> so grids can bring their own best forecasting logic. I also plan to carry the &lt;strong>consumption-based&lt;/strong> signal end-to-end, including imports and interconnects surfaced directly in the UI. Finally, I’ll harden the system for production by enabling auth and throttling and by moving to a production-grade database where appropriate.&lt;/p>
&lt;h2 id="thank-you">Thank you&lt;/h2>
&lt;p>Huge thanks to &lt;strong>Professor Abel Souza&lt;/strong> for steady mentorship and to the &lt;strong>OSRE&lt;/strong> community for thoughtful feedback. The most rewarding part of this summer was watching a research idea become something people can &lt;strong>click on—and use&lt;/strong> to make cleaner choices.&lt;/p></description></item><item><title>Midterm Report: Learning and Building ORB</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/orb/08072025-param/</link><pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/orb/08072025-param/</guid><description>&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>UC ORB is an open-source platform developed to increase visibility and engagement with open source projects across the University of California system.&lt;/p>
&lt;p>By providing a structured and searchable repository browser, ORB makes it easier for researchers, students, and collaborators to discover relevant open source initiatives, track their impact, and connect with contributors. It also helps campuses demonstrate the value of their open source output to potential funders and institutional partners.&lt;/p>
&lt;h2 id="progress-so-far">Progress So Far&lt;/h2>
&lt;p>Significant progress has been made in building out core features of the ORB Showcase platform:&lt;/p>
&lt;h3 id="searching-and-filtering-options">Searching and Filtering Options&lt;/h3>
&lt;p>Users can now search and filter repositories using multiple criteria:&lt;/p>
&lt;ul>
&lt;li>Development Team / UC Campus&lt;/li>
&lt;li>Programming Language&lt;/li>
&lt;li>License Type&lt;/li>
&lt;li>Topic / Domain Area&lt;/li>
&lt;/ul>
&lt;p>These filtering tools make it easy to explore the growing set of repositories in a meaningful and personalized way.&lt;/p>
&lt;p>Pagination has been added to ensure scalability and smooth performance, even as the number of projects continues to grow.&lt;/p>
&lt;h3 id="repository-details-view">Repository Details View&lt;/h3>
&lt;p>Each repository page now displays rich metadata and contextual information, including:&lt;/p>
&lt;ul>
&lt;li>README preview – offering a quick look at the project’s purpose and usage&lt;/li>
&lt;li>License – clearly indicating how the project can be used or adapted&lt;/li>
&lt;li>Contributors and Funders – acknowledging the people and institutions behind the work&lt;/li>
&lt;/ul>
&lt;h2 id="whats-next">What&amp;rsquo;s Next&lt;/h2>
&lt;p>As we prepare UC ORB for public launch, we’re focused on improving the backend workflow and addressing some key challenges:&lt;/p>
&lt;p>&lt;strong>⚙️ GitHub Workflow Challenges&lt;/strong>&lt;br>
Creating a GitHub-first workflow for adding repositories is powerful, but also tricky:&lt;/p>
&lt;ul>
&lt;li>GitHub Actions cannot be triggered by API calls from a backend directly, which limits automation via server-side tools.&lt;/li>
&lt;li>The GitHub bot has permission limitations, especially when it comes to interacting with PRs and validating submissions outside of standard GitHub UI flows.&lt;/li>
&lt;/ul>
&lt;p>I’m currently working on designing a more robust and maintainable workflow to handle these edge cases, including:&lt;/p>
&lt;ul>
&lt;li>A standalone script that can add repositories directly to the database, bypassing the need for a pull request and enabling more flexible internal submissions.&lt;/li>
&lt;li>Better logging and validation to ensure consistency between the file-based data model and the live PostgreSQL database.&lt;/li>
&lt;/ul>
&lt;h2 id="reflection">Reflection&lt;/h2>
&lt;p>This project has been a great learning experience. Despite challenges with the frontend, backend, GitHub Actions, bots, and APIs, it’s been exciting to build a platform that highlights open source work across the UC system.&lt;/p>
&lt;p>I&amp;rsquo;m looking forward to what&amp;rsquo;s coming next as we get closer to launching ORB.&lt;/p></description></item><item><title>Midterm Report: Learning, Building, and Documenting Brahma</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/brahma/08052025-kajaljotwani/</link><pubDate>Tue, 05 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/brahma/08052025-kajaljotwani/</guid><description>&lt;h2 id="project-overview">Project Overview&lt;/h2>
&lt;p>&lt;strong>Brahma-XR&lt;/strong> is an open-source WebXR framework designed for building collaborative virtual environments, especially those involving spatial data and scientific visualization.&lt;/p>
&lt;p>What makes Brahma powerful is that the same codebase runs seamlessly across both the browser and XR devices like the Apple Vision Pro, Meta Quest 3, and VARJO. This makes it ideal for rapid prototyping and creating cross-platform immersive experiences.&lt;/p>
&lt;p>Some of Brahma’s built-in features include:&lt;/p>
&lt;ul>
&lt;li>Grab-and-pull locomotion&lt;/li>
&lt;li>Raycasting and interaction&lt;/li>
&lt;li>Avatar embodiment&lt;/li>
&lt;li>Spatial rendering&lt;/li>
&lt;li>Support for geospatial and data-driven visualizations&lt;/li>
&lt;/ul>
&lt;p>Brahma is intentionally lightweight, optimized to run even on low-compute devices—making immersive collaboration more accessible to everyone.&lt;/p>
&lt;h2 id="what-worked-and-what-didnt">What Worked (and What Didn’t)&lt;/h2>
&lt;p>As Brahma transitioned from a private research repo to a public open-source project, a lot of important foundational work had to be done around documentation, packaging, and example previews.&lt;/p>
&lt;p>There are two aspects that make Brahma especially unique:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Bipartite npm package structure&lt;/strong> – which requires detailed and thoughtful documentation.&lt;/li>
&lt;li>&lt;strong>Immersive, real-time examples&lt;/strong> – unlike typical libraries, Brahma’s examples aren’t just static demos. They are live, multi-user XR apps designed to be interacted with.&lt;/li>
&lt;/ol>
&lt;p>The first half of the project focused on setting the stage—structuring and preparing the framework for broader use.&lt;/p>
&lt;h3 id="-key-accomplishments">🔧 Key Accomplishments&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Learning Three.js&lt;/strong>&lt;br>
I spent time learning the fundamentals of Three.js—how it handles 3D rendering, scene setup, materials, cameras, and animations. I also explored how large-scale Three.js projects are organized, which helped me understand how Brahma’s example apps are built.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Setting up the project structure&lt;/strong>&lt;br>
I looked at the architecture of various open-source projects and used that knowledge to shape Brahma’s structure. The goal was to align with community best practices while keeping things clean and modular for future contributors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Understanding npm packaging (especially bipartite)&lt;/strong>&lt;br>
Since Brahma includes both client- and server-side logic, I spent time understanding how multi-part npm packages are published and maintained. I explored best practices around versioning, distribution, and separating internal vs public modules.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Creating a documentation system&lt;/strong>&lt;br>
After exploring different approaches (and with my mentor’s help), I set up a static documentation site using &lt;a href="https://jsdoc.app/" target="_blank" rel="noopener">JSDoc&lt;/a> with the Docdash theme. The current version includes guides, API references, and contribution instructions. This is just the beginning—the docs will evolve as the community grows.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="whats-next">What’s Next&lt;/h2>
&lt;p>In the second half of the project, I’ll be focusing on:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Building a routing system&lt;/strong>&lt;br>
For both documentation and example apps, so that users can easily browse through different components and use cases.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Setting up UI and 3D infrastructure&lt;/strong>&lt;br>
To make it easier for others to start building apps with Brahma by providing clean base layers for interface and spatial development.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Prepping for the first public release&lt;/strong>&lt;br>
Publishing the Brahma npm package along with a curated set of featured examples and contributor-friendly documentation, making it easier for developers to get started and contribute.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="reflection">Reflection&lt;/h2>
&lt;p>This project has truly been the highlight of my summer. Learning about WebXR, Three.js, and open-source workflows has been both exciting and rewarding. Every challenge taught me something new.&lt;/p>
&lt;p>I am especially grateful to my mentor &lt;strong>Samir Ghosh&lt;/strong> for his constant support, patience, and guidance. It’s been a privilege learning from you!&lt;/p>
&lt;p>I&amp;rsquo;m looking forward to what’s coming next as we get closer to the first public release of Brahma!&lt;/p></description></item><item><title>Midterm blog: CarbonCast Midpoint Update: From Vision to Reality</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250803-tanushsavadi/</link><pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250803-tanushsavadi/</guid><description>&lt;p>A few months ago, I shared my vision for making carbon intensity forecasts more accessible through the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/carboncast">CarbonCast project&lt;/a>. My &lt;a href="https://summerofcode.withgoogle.com/programs/2025/projects/7yvAix3k" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of Professor Abel Souza aims to build an API that makes carbon intensity forecasts more accessible and actionable. I had two main goals: expand CarbonCast to work with more regional electricity grids, and transform it from a research project into something that could actually run and be interacted with in the real world.&lt;/p>
&lt;p>Today, I&amp;rsquo;m excited to share that we&amp;rsquo;ve not only hit those goals – we&amp;rsquo;ve exceeded them in ways I didn&amp;rsquo;t expect.&lt;/p>
&lt;h2 id="what-weve-built-so-far">What We&amp;rsquo;ve Built So Far&lt;/h2>
&lt;p>Remember how I mentioned that CarbonCast needed to support more regional grids? Well, we&amp;rsquo;ve gone big. The system now covers 85+ regions across two continents. We&amp;rsquo;re talking about major US grid operators like ERCOT (Texas), CISO (California), PJM (Mid-Atlantic), MISO (Midwest), and NYISO (New York), plus we&amp;rsquo;ve expanded into European countries like Germany, France, Spain, and the UK.&lt;/p>
&lt;p>But here&amp;rsquo;s the thing – collecting weather data for carbon intensity forecasting isn&amp;rsquo;t as simple as just downloading a few files. Each region needs four different types of weather data: solar radiation (for solar power predictions), wind patterns (for wind power), temperature and humidity (for energy demand), and precipitation (which affects both supply and demand). That means we&amp;rsquo;re managing data collection for over 340 different combinations of regions and weather variables.&lt;/p>
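&lt;p>The arithmetic behind that figure can be sketched in a few lines of Python. The variable names and region identifiers below are placeholders for illustration, not the identifiers CarbonCast uses.&lt;/p>

```python
from itertools import product

# Placeholder names for the four weather data types described above.
WEATHER_VARIABLES = [
    "solar_radiation",
    "wind",
    "temperature_humidity",
    "precipitation",
]


def collection_tasks(regions):
    """Pair every region with every weather variable it needs."""
    return list(product(regions, WEATHER_VARIABLES))


# Hypothetical region identifiers standing in for the real grid regions.
regions = [f"region_{i}" for i in range(85)]
tasks = collection_tasks(regions)
print(len(tasks))  # 85 regions x 4 variables = 340
```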
&lt;h2 id="the-automation-challenge">The Automation Challenge&lt;/h2>
&lt;p>When I started this project, I quickly realized that manually managing data collection for this many regions would be impossible. We&amp;rsquo;re talking about thousands of data requests, each taking time to process, with various things that can go wrong along the way.&lt;/p>
&lt;p>So we built something I&amp;rsquo;m really proud of: an intelligent automation system that handles 95% of the work without human intervention. That means 19 out of every 20 data collection tasks happen automatically, even when things go wrong.&lt;/p>
&lt;p>The system is smart about it too. It knows when to speed up data collection, when to slow down to avoid overwhelming the servers, and how to recover when errors happen. We&amp;rsquo;ve achieved 99% data completeness, which means almost every piece of weather data we need actually makes it into our system successfully.&lt;/p>
&lt;h2 id="making-it-production-ready">Making It Production-Ready&lt;/h2>
&lt;p>The biggest challenge was taking CarbonCast from a research project that worked on my laptop to something that could run reliably for weeks without me babysitting it. This meant building in all the boring but crucial stuff that makes software actually work in the real world.&lt;/p>
&lt;p>We created a comprehensive error handling system that can automatically recover from 95% of the problems it encounters. Network hiccups, server timeouts, data format changes – the system handles these gracefully and keeps running.&lt;/p>
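&lt;p>One common way to recover from transient failures such as timeouts is to retry with exponential backoff. The sketch below shows that general pattern under assumed parameters (four attempts, delays doubling from one second). It is an illustration only, not the recovery logic CarbonCast actually ships, and the &lt;code>retry&lt;/code> helper is a hypothetical name.&lt;/p>

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    task: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying transient errors with exponentially growing waits."""
    for attempt in range(attempts):
        try:
            return task()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

&lt;p>Errors that are not listed in &lt;code>retry_on&lt;/code> propagate immediately, which keeps genuine bugs from being hidden behind retries.&lt;/p>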
&lt;p>There&amp;rsquo;s also a real-time monitoring dashboard that shows exactly what&amp;rsquo;s happening across all regions. I can see which areas are collecting data successfully, which ones might be having issues, and get alerts if anything needs attention. It&amp;rsquo;s like having a mission control center for carbon data.&lt;/p>
&lt;h2 id="the-dashboard-mission-control-for-carbon-data">The Dashboard: Mission Control for Carbon Data&lt;/h2>
&lt;p>Let me show you what this monitoring system actually looks like. We built a comprehensive web dashboard that gives us real-time visibility into everything that&amp;rsquo;s happening:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Dashboard Overview" srcset="
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-overview_hucf78c7d7b58d9515d431a2744915c5c5_523170_def2a560c75da61de5422b7a6a6dbc38.webp 400w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-overview_hucf78c7d7b58d9515d431a2744915c5c5_523170_5fcfb689e6283d1720e50da81cfb540f.webp 760w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-overview_hucf78c7d7b58d9515d431a2744915c5c5_523170_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-overview_hucf78c7d7b58d9515d431a2744915c5c5_523170_def2a560c75da61de5422b7a6a6dbc38.webp"
width="760"
height="456"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>The main dashboard showing real-time system metrics and status across all regions&lt;/em>&lt;/p>
&lt;p>The dashboard shows key metrics at a glance – total requests, completion rates, and active regions. But it goes much deeper than that. You can drill down into individual requests to see their complete lifecycle:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Request Details" srcset="
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-requests_hu4620d0cbd193aecbbe0c5858e2ba9128_195009_876f419901e0b51127b81f1f37bf33f6.webp 400w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-requests_hu4620d0cbd193aecbbe0c5858e2ba9128_195009_3ae7b2ac3a29478b49913635f43aac19.webp 760w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-requests_hu4620d0cbd193aecbbe0c5858e2ba9128_195009_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-requests_hu4620d0cbd193aecbbe0c5858e2ba9128_195009_876f419901e0b51127b81f1f37bf33f6.webp"
width="760"
height="458"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Detailed view of individual data requests showing processing timelines and status&lt;/em>&lt;/p>
&lt;p>Each request card shows everything from the initial request time to when the data becomes available for download. This level of visibility is crucial when you&amp;rsquo;re managing hundreds of data requests across different regions and weather variables.&lt;/p>
&lt;p>The regional analytics view shows how well we&amp;rsquo;re doing across different grid operators:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Regional Analytics" srcset="
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-regions_hubaa80dcd4d7309dd18fca00b148c0f0f_628115_913b55e9f6633983aaaaf25607ac13bf.webp 400w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-regions_hubaa80dcd4d7309dd18fca00b148c0f0f_628115_950373fdefaf9bd595da010d29c37849.webp 760w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-regions_hubaa80dcd4d7309dd18fca00b148c0f0f_628115_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-regions_hubaa80dcd4d7309dd18fca00b148c0f0f_628115_913b55e9f6633983aaaaf25607ac13bf.webp"
width="760"
height="445"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Regional breakdown showing completion status across different electricity grid operators&lt;/em>&lt;/p>
&lt;p>What I&amp;rsquo;m particularly proud of is the error handling dashboard. When things do go wrong (which they inevitably do with any large-scale data system), we can see exactly what happened and how the system recovered:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Error Management" srcset="
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-errors_hua5a5c30a5cd8b72a26622f5af77b2406_480389_2864a9c4d56dcc6220d2fe406daddc17.webp 400w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-errors_hua5a5c30a5cd8b72a26622f5af77b2406_480389_ca1e5cbfdb24da4f1e6531c7be2eed54.webp 760w,
/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-errors_hua5a5c30a5cd8b72a26622f5af77b2406_480389_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250803-tanushsavadi/dashboard-errors_hua5a5c30a5cd8b72a26622f5af77b2406_480389_2864a9c4d56dcc6220d2fe406daddc17.webp"
width="760"
height="254"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Error tracking and resolution system showing 100% success rate in region mapping&lt;/em>&lt;/p>
&lt;p>The fact that we&amp;rsquo;re showing &amp;ldquo;No unknown regions found&amp;rdquo; means our coordinate-based region detection system is working perfectly – every weather data request gets properly mapped to the right electricity grid.&lt;/p>
&lt;h2 id="the-technical-foundation">The Technical Foundation&lt;/h2>
&lt;p>Under the hood, we&amp;rsquo;ve built what I&amp;rsquo;d call enterprise-grade infrastructure. The system can run autonomously for weeks, automatically organizing data by region and weather type, managing storage efficiently, and even optimizing its own performance based on what it learns.&lt;/p>
&lt;p>We&amp;rsquo;ve also created comprehensive testing systems to make sure everything works reliably. When you&amp;rsquo;re dealing with data that people might use to make real decisions about when to charge their electric vehicles or run their data centers, reliability isn&amp;rsquo;t optional.&lt;/p>
&lt;p>The architecture follows a modular, service-oriented design with clear separation between data collection, processing, monitoring, and user interfaces. This makes it much easier to maintain and extend as we add new features.&lt;/p>
&lt;h2 id="why-this-matters">Why This Matters&lt;/h2>
&lt;p>All of this infrastructure work might sound technical, but it&amp;rsquo;s directly connected to the original vision: making carbon intensity forecasts accessible to everyone.&lt;/p>
&lt;p>With this foundation in place, we can now provide reliable, up-to-date weather data for carbon intensity forecasting across major electricity grids in North America and Europe. That means developers building carbon-aware applications, companies trying to reduce their emissions, and individuals wanting to time their energy use for lower environmental impact all have access to the data they need.&lt;/p>
&lt;h2 id="whats-next-breaking-down-carboncast">What&amp;rsquo;s Next: Breaking Down CarbonCast&lt;/h2>
&lt;p>The next phase is where things get really exciting. Now that we have this solid data collection foundation, we&amp;rsquo;re going to break down CarbonCast itself into modular components. This will make it easier for developers to integrate carbon intensity forecasting into their own applications, whether that&amp;rsquo;s a smart home system, a cloud computing platform, or a mobile app that helps people make greener energy choices.&lt;/p>
&lt;h2 id="looking-back">Looking Back&lt;/h2>
&lt;p>When I started this project, I knew we needed better infrastructure for carbon data. What I didn&amp;rsquo;t expect was how much we&amp;rsquo;d end up building – or how well it would work. We&amp;rsquo;ve created something that can reliably collect and organize weather data across two continents, handle errors gracefully, and run without constant supervision.&lt;/p>
&lt;p>More importantly, we&amp;rsquo;ve built the foundation that will make it possible for anyone to access accurate carbon intensity forecasts. Whether you&amp;rsquo;re a developer building the next generation of carbon-aware applications or someone who just wants to know the best time to do laundry to minimize your environmental impact, the infrastructure is now there to support those decisions.&lt;/p>
&lt;p>The vision of making carbon data accessible and actionable is becoming reality, one automated data collection at a time.&lt;/p>
&lt;h2 id="impact-beyond-research">Impact Beyond Research&lt;/h2>
&lt;p>This work builds directly on the foundation of Multi-day Forecasting of Electric Grid Carbon Intensity using Machine Learning, transforming research into practical, real-world infrastructure. We&amp;rsquo;re not just making carbon intensity forecasts more accurate – we&amp;rsquo;re making them accessible to everyone who wants to reduce their environmental impact.&lt;/p>
&lt;p>The open-source nature of CarbonCast means that anyone can run, contribute to, and benefit from this work. Whether you&amp;rsquo;re a developer building carbon-aware applications, a policymaker working on grid decarbonization strategies, or a sustainability-conscious individual looking to reduce your carbon footprint, the tools are now there to make informed, impactful choices.&lt;/p>
&lt;p>Looking ahead, I&amp;rsquo;m excited to see how this infrastructure will enable the next generation of carbon-aware computing and smart energy decisions.&lt;/p></description></item><item><title>Robot Manipulation with Scenic-RoboSuite</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/scenic/20250730-sahil-tgs/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/scenic/20250730-sahil-tgs/</guid><description>&lt;p>I&amp;rsquo;m &lt;a href="https://sahiltgs.super.site/" target="_blank" rel="noopener">Sahil&lt;/a>, continuing work on the Scenic-RoboSuite integration for GSoC 2025. This &lt;a href="https://sahiltgs.super.site/gsoc/uc-ospo-proposal" target="_blank" rel="noopener">project&lt;/a> is mentored by &lt;a href="https://ucsc-ospo.github.io/author/daniel-fremont/" target="_blank" rel="noopener">Daniel Fremont&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/eric-vin/" target="_blank" rel="noopener">Eric Vin&lt;/a>.&lt;/p>
&lt;p>Since the last update, the &lt;a href="https://scenic-lang.org/" target="_blank" rel="noopener">Scenic&lt;/a>-&lt;a href="https://robosuite.ai/" target="_blank" rel="noopener">RoboSuite&lt;/a> interface has made significant progress. The bidirectional bridge is now functional - robots can read sensor data and execute behaviors based on observations. However, these features are still in early stages and we&amp;rsquo;re working on making them more stable and consistent.&lt;/p>
&lt;p>We&amp;rsquo;ve integrated RoboSuite&amp;rsquo;s Operational Space Control into Scenic. This control method lets you command the robot&amp;rsquo;s hand directly in 3D space (like &amp;ldquo;move 10cm left&amp;rdquo;) instead of calculating complex joint rotations. While the integration works, it&amp;rsquo;s rough around the edges and we&amp;rsquo;re currently focused on stabilizing it across different scenarios.&lt;/p>
&lt;p>The main challenge was architectural - RoboSuite expects all robot commands bundled together each timestep, while Scenic processes them one by one. We solved this with a pending actions system that collects everything first, then executes in one go. Time synchronization was another challenge, matching Scenic&amp;rsquo;s steps with MuJoCo&amp;rsquo;s physics.&lt;/p>
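&lt;p>The collect-then-execute idea can be sketched in plain Python. This is a simplified illustration that does not use the Scenic or RoboSuite APIs: the &lt;code>PendingActions&lt;/code> class and the &lt;code>send_batch&lt;/code> callback are hypothetical stand-ins for the real interface code.&lt;/p>

```python
from typing import Callable, Dict, List


class PendingActions:
    """Collects per-robot commands during a step, then applies them together."""

    def __init__(self, send_batch: Callable[[Dict[str, List[float]]], None]) -> None:
        # send_batch stands in for whatever call advances the simulator
        # with one bundled set of commands per timestep.
        self._send_batch = send_batch
        self._pending: Dict[str, List[float]] = {}

    def queue(self, robot: str, command: List[float]) -> None:
        """Record a command issued one at a time; the latest per robot wins."""
        self._pending[robot] = command

    def flush(self) -> None:
        """Send everything collected this step in one go, then reset."""
        if self._pending:
            self._send_batch(dict(self._pending))
        self._pending.clear()
```

&lt;p>Each simulation step would call &lt;code>queue&lt;/code> as behaviors run and &lt;code>flush&lt;/code> once at the end, so the simulator only ever sees one bundled command set per timestep.&lt;/p>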
&lt;p>We&amp;rsquo;ve implemented a basic pick-and-place behavior for initial testing. The robot reads sensor data, calculates where to move, and adjusts continuously. It can successfully grasp and lift objects, though consistency varies between runs. The system supports three robot models and works with RoboSuite&amp;rsquo;s pre-built environments.&lt;/p>
&lt;p>Custom world building is currently on hold. We&amp;rsquo;ve decided to focus on integrating existing RoboSuite features into Scenic first, then build Scenic&amp;rsquo;s capabilities like dynamic scenario randomization on top. For our first prototype, we&amp;rsquo;re aiming to extend the pick-and-place behavior into a full randomization demo - Scenic will randomly position the cube each run, and the robot will adapt to find and grasp it regardless of location.&lt;/p>
&lt;p>The next two weeks focus on stabilizing current features and preparing this randomized scenario prototype. Expanding the behavior library and supporting additional environments will come in future phases after we have a solid foundation.&lt;/p>
&lt;p>The core bridge between Scenic and RoboSuite is operational, but there&amp;rsquo;s significant work ahead to make it reliable and user-friendly.&lt;/p></description></item><item><title>AIDRIN Privacy-Centric Enhancements: Backend &amp; UX Upgrades</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/lbl/aidrin/20250725-harish_balaji/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/lbl/aidrin/20250725-harish_balaji/</guid><description>&lt;p>⏱️ Reading time: 5–6 minutes&lt;/p>
&lt;p>Hey everyone,&lt;/p>
&lt;p>If you’ve ever wondered what it takes to make AI data pipelines not just smarter, but safer and more transparent, you’re in the right place. The last few weeks working on AIDRIN for GSoC have been a deep dive into the privacy and backend systems behind the project. My focus has been on building out the core privacy infrastructure and backend features that power AIDRIN’s ability to give users real, actionable insights about their data. It’s been challenging, sometimes messy, but incredibly rewarding to see these changes make a tangible difference.&lt;/p>
&lt;p>Having Dr. Jean Luca Bez and Prof. Suren Byna as mentors, along with the support of the entire team, has truly made all the difference. Their guidance, encouragement, and collaborative spirit have been a huge part of this journey, whether I’m brainstorming new ideas or just trying to untangle a tricky bug.&lt;/p>
&lt;h2 id="privacy-metrics-making-data-safer">Privacy Metrics: Making Data Safer&lt;/h2>
&lt;p>A major part of my work has been putting data privacy at the front and center in AIDRIN. I focused on integrating essential privacy metrics like k-anonymity, l-diversity, t-closeness, and more, making sure they’re not just theoretical checkboxes, but real tools that users can interact with and understand. Now, these metrics are fully wired up in the backend and visualized in AIDRIN, so privacy risks are no longer just a vague concern. They are something AI data preparers can actually see and act on. Getting these metrics to work seamlessly with different datasets and ensuring their accuracy took some serious backend engineering, but the payoff has been worth it.&lt;/p>
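&lt;p>To make one of these metrics concrete, here is a minimal sketch of k-anonymity in plain Python. It is an illustration only and not AIDRIN’s actual implementation; the function name &lt;code>k_anonymity&lt;/code>, the column names, and the toy records are all assumptions made for this example. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records, so the k of a dataset is the size of its smallest such group.&lt;/p>

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same quasi-identifier values (the dataset's k)."""
    if not records:
        raise ValueError("records must not be empty")
    # Count how many records fall into each quasi-identifier combination.
    groups = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in records
    )
    return min(groups.values())


# Hypothetical toy data, for illustration only.
toy = [
    {"age_band": "20-29", "zip3": "941", "diagnosis": "flu"},
    {"age_band": "20-29", "zip3": "941", "diagnosis": "cold"},
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
]
print(k_anonymity(toy, ["age_band", "zip3"]))
```

&lt;p>In this toy table the third record is alone in its group, so k is 1 and that person could be singled out. Generalizing or suppressing quasi-identifiers until the smallest group grows is the usual remedy.&lt;/p>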
&lt;h2 id="speeding-things-up-so-you-dont-have-to-wait-around">Speeding Things Up (So You Don’t Have To Wait Around)&lt;/h2>
&lt;p>As AIDRIN started handling bigger datasets, some calculations became time-consuming because the data had to be read again every time a metric was computed. To address this, I added caching for previously computed metrics, like class imbalance and privacy checks, and set up asynchronous execution with Celery and Redis. This should keep the app responsive: rather than waiting for heavy computations to finish, you can take notes on other metrics or explore different parts of the app while results load in the background. It’s a small change, but it helps keep the workflow moving smoothly.&lt;/p>
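&lt;p>The caching idea can be sketched in a few lines. This is a simplified, in-memory stand-in and not the code that ships in AIDRIN: the class &lt;code>MetricCache&lt;/code>, its methods, and the toy &lt;code>class_imbalance&lt;/code> function are hypothetical names chosen for the example, and the Celery and Redis parts are left out entirely. The point is only that a result is keyed by a fingerprint of the data plus the metric name, so the same metric on the same data is computed once.&lt;/p>

```python
import hashlib


class MetricCache:
    """In-memory cache keyed by (dataset fingerprint, metric name)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # Identical bytes always map to the same key.
        return hashlib.sha256(data).hexdigest()

    def get_or_compute(self, data, metric_name, compute):
        key = (self.fingerprint(data), metric_name)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        value = compute(data)
        self._store[key] = value
        return value

    def clear(self):
        self._store.clear()


def class_imbalance(data):
    """Hypothetical stand-in metric: largest class count over smallest."""
    labels = data.decode().split(",")
    counts = {label: labels.count(label) for label in set(labels)}
    return max(counts.values()) / min(counts.values())


cache = MetricCache()
raw = b"a,a,a,b"
first = cache.get_or_compute(raw, "class_imbalance", class_imbalance)
second = cache.get_or_compute(raw, "class_imbalance", class_imbalance)
print(first, second, cache.hits)
```

&lt;p>The second call returns the stored value without touching the data again, which is the behavior that matters once the metric takes minutes rather than microseconds.&lt;/p>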
&lt;h2 id="small-touch-ups-that-hopefully-make-a-big-difference">Small Touch Ups That (Hopefully) Make a Big Difference&lt;/h2>
&lt;p>I also spent time on the details that make the app easier to use. Tooltips now explain what the privacy metrics actually mean, error messages are clearer, and there’s a new cache info page where you can see and clear your cached data. The sensitive attribute dropdown is less confusing now, especially if you’re working with quasi-identifiers. These tweaks might seem minor, but they add up and make the app friendlier for everyone.&lt;/p>
&lt;h2 id="docs-docs-docs">Docs, Docs, Docs&lt;/h2>
&lt;p>I’m a big believer that good documentation is just as important as good code. I updated the docs to cover all the new features, added citations for the privacy metrics, and made the install process a bit more straightforward. Hopefully, this means new users and contributors can get up to speed without too much hassle.&lt;/p>
&lt;h2 id="huge-thanks-to-my-mentors-and-the-team">Huge Thanks to My Mentors and the Team&lt;/h2>
&lt;p>I really want to shine a light on Dr. Bez, Prof. Byna, and the entire AIDRIN team here. Their encouragement, practical advice, and collaborative spirit have been a huge part of my progress. Whether I’m stuck on a bug, brainstorming a new feature, or just need a second opinion, there’s always someone ready to help me think things through. Their experience and support have shaped not just the technical side of my work, but also how I approach problem-solving and teamwork.&lt;/p>
&lt;h2 id="whats-next">What’s Next?&lt;/h2>
&lt;p>Looking ahead, I’m planning to expand AIDRIN’s support for multimodal datasets and keep refining the privacy and fairness modules. There’s always something new to learn or improve, and I’m excited to keep building. If you’re interested in data quality, privacy, or open-source AI tools, I’d love to connect and swap ideas.&lt;/p>
&lt;p>Thanks for reading and for following along with my GSoC journey. I’ll be back soon with more updates!&lt;/p>
&lt;p>&lt;em>This is the second post in my 3-part GSoC series with AIDRIN. Stay tuned for the final update.&lt;/em>&lt;/p></description></item><item><title>LLMSeqRec: LLM Enhanced Contextual Sequential Recommender</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/</link><pubDate>Tue, 22 Jul 2025 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/</guid><description>&lt;h1 id="midway-through-osre">Midway Through OSRE&lt;/h1>
&lt;h2 id="my-journey-with-llmseqrec">My Journey with LLMSeqRec&lt;/h2>
&lt;h3 id="hello-from-the-midpoint">Hello from the Midpoint!&lt;/h3>
&lt;p>Hi everyone! I’m Connor Lee, a student at NYU studying Computer Science and Mathematics, and I’m excited to share the progress I’ve made halfway through the Open Source Research Experience (OSRE) with my project: &lt;strong>LLMSeqRec&lt;/strong> – a large language model-enhanced sequential recommender system.&lt;/p>
&lt;p>Over the past several weeks, I’ve had the opportunity to explore the intersection of recommender systems and large language models (LLMs), and it’s been a deep, challenging, and rewarding dive into building smarter, more contextual recommendation engines.&lt;/p>
&lt;hr>
&lt;h3 id="what-is-llmseqrec">What is LLMSeqRec?&lt;/h3>
&lt;p>&lt;strong>LLMSeqRec&lt;/strong> stands for &lt;strong>LLM-Enhanced Contextual Sequential Recommender&lt;/strong>. Traditional sequential recommendation systems like SASRec are great at capturing patterns from user-item interactions, but they often fall short in two areas: understanding &lt;strong>semantic context&lt;/strong> (e.g., item descriptions, reviews) and dealing with &lt;strong>cold-start&lt;/strong> problems.&lt;/p>
&lt;p>LLMSeqRec aims to address this by incorporating &lt;strong>pretrained LLM embeddings&lt;/strong> into the recommendation pipeline. The goal is to enhance models like SASRec with semantic signals from text (like product reviews or titles), allowing them to better model user intent, long-range dependencies, and generalize to new items or users.&lt;/p>
&lt;hr>
&lt;h3 id="progress-so-far">Progress So Far&lt;/h3>
&lt;h4 id="-baseline-sasrec-runs">✅ Baseline SASRec Runs&lt;/h4>
&lt;p>To establish a benchmark, I successfully ran the original SASRec implementation (in PyTorch) using both the &lt;strong>MovieLens 1M&lt;/strong> and &lt;strong>Amazon Beauty&lt;/strong> datasets. After debugging initial data formatting issues and adjusting batch sizes for local CPU/GPU compatibility, I automated training with scripts that let me scale to &lt;strong>200+ epochs&lt;/strong> to achieve the best performance, both in Colab and on my MacBook’s CPU.&lt;/p>
&lt;p>&lt;strong>Note:&lt;/strong> At this stage, we have not yet integrated LLMs into the model. These baseline runs (SASRec) serve as the control group for evaluating the future impact of LLM-based enhancements.&lt;/p>
&lt;hr>
&lt;h3 id="whats-next">What’s Next&lt;/h3>
&lt;p>As I enter the second half of the OSRE, I’ll be shifting gears toward &lt;strong>LLM integration, model evaluation, and running LLM-powered sequential recommendations using product metadata and contextual information&lt;/strong>. Here&amp;rsquo;s what’s ahead:&lt;/p>
&lt;ul>
&lt;li>Designing pipelines to extract and align textual metadata with item sequences&lt;/li>
&lt;li>Integrating LLM-generated embeddings into the recommender model&lt;/li>
&lt;li>Evaluating performance changes across different dataset characteristics&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h3 id="-experimental-results">📊 Experimental Results&lt;/h3>
&lt;p>We have &lt;strong>not yet utilized LLMs&lt;/strong> in our current experiments. The results below reflect our &lt;strong>reproduced baseline performance of SASRec&lt;/strong> across datasets.&lt;/p>
&lt;p>Below are the &lt;strong>performance curves on different test sets&lt;/strong>, where we evaluate model performance every 20 epochs during training:&lt;/p>
&lt;h4 id="beauty-dataset-performance">Beauty Dataset Performance&lt;/h4>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Beauty Hit@10 Performance" srcset="
/report/osre25/sf/llmseqrec/20250722-connor/beauty-hr_hu655ec71a9ef1f87543ab22378365f6fe_152488_6d3cf991cc5172e392edbb398afef774.webp 400w,
/report/osre25/sf/llmseqrec/20250722-connor/beauty-hr_hu655ec71a9ef1f87543ab22378365f6fe_152488_91a98a3d515a172aed7283ab8b04a8b6.webp 760w,
/report/osre25/sf/llmseqrec/20250722-connor/beauty-hr_hu655ec71a9ef1f87543ab22378365f6fe_152488_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/beauty-hr_hu655ec71a9ef1f87543ab22378365f6fe_152488_6d3cf991cc5172e392edbb398afef774.webp"
width="760"
height="497"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Hit@10 performance on the test set for the Beauty dataset (every 20 epochs)&lt;/em>&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Beauty Loss Training" srcset="
/report/osre25/sf/llmseqrec/20250722-connor/beauty-loss-epoch_huc2cddabd12f6ed04444e319cba850bc9_141963_f4e0cc23660b4c974056c8b5d603c0ca.webp 400w,
/report/osre25/sf/llmseqrec/20250722-connor/beauty-loss-epoch_huc2cddabd12f6ed04444e319cba850bc9_141963_7c62f735e3e920d3561bd9113c662533.webp 760w,
/report/osre25/sf/llmseqrec/20250722-connor/beauty-loss-epoch_huc2cddabd12f6ed04444e319cba850bc9_141963_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/beauty-loss-epoch_huc2cddabd12f6ed04444e319cba850bc9_141963_f4e0cc23660b4c974056c8b5d603c0ca.webp"
width="760"
height="489"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Training loss for the Beauty dataset&lt;/em>&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Beauty NDCG@10 Performance" srcset="
/report/osre25/sf/llmseqrec/20250722-connor/beauty-ndcg_hu4bef43ef38566a5009aa70da37ebbc50_151414_a1a39dc055b888f5de47c25c87ccf913.webp 400w,
/report/osre25/sf/llmseqrec/20250722-connor/beauty-ndcg_hu4bef43ef38566a5009aa70da37ebbc50_151414_3e4c7d0050bef8ec9f8f7928c2c6c7af.webp 760w,
/report/osre25/sf/llmseqrec/20250722-connor/beauty-ndcg_hu4bef43ef38566a5009aa70da37ebbc50_151414_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/beauty-ndcg_hu4bef43ef38566a5009aa70da37ebbc50_151414_a1a39dc055b888f5de47c25c87ccf913.webp"
width="760"
height="483"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>NDCG@10 performance on the test set for the Beauty dataset (every 20 epochs)&lt;/em>&lt;/p>
&lt;h4 id="ml-1m-dataset-performance">ML-1M Dataset Performance&lt;/h4>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ML-1M Loss Training" srcset="
/report/osre25/sf/llmseqrec/20250722-connor/m1-m1-loss-epoch_hua4b125e87ed4debb93bde68ff9b86489_146604_828aa4c04e00024c863cb89e245d358a.webp 400w,
/report/osre25/sf/llmseqrec/20250722-connor/m1-m1-loss-epoch_hua4b125e87ed4debb93bde68ff9b86489_146604_d913a345a32ce7ac5bcff72438283a01.webp 760w,
/report/osre25/sf/llmseqrec/20250722-connor/m1-m1-loss-epoch_hua4b125e87ed4debb93bde68ff9b86489_146604_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/m1-m1-loss-epoch_hua4b125e87ed4debb93bde68ff9b86489_146604_828aa4c04e00024c863cb89e245d358a.webp"
width="760"
height="490"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Training loss for the ML-1M dataset&lt;/em>&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ML-1M Hit@10 Performance" srcset="
/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-hr_huaac170547624f58b168df1545691a3d4_153677_8e8f20a29b2657093b23e780efd1d072.webp 400w,
/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-hr_huaac170547624f58b168df1545691a3d4_153677_257879da50059e5cc3e64fd8ed1d9d72.webp 760w,
/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-hr_huaac170547624f58b168df1545691a3d4_153677_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-hr_huaac170547624f58b168df1545691a3d4_153677_8e8f20a29b2657093b23e780efd1d072.webp"
width="760"
height="484"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>Hit@10 performance on the test set for the ML-1M dataset (every 20 epochs)&lt;/em>&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ML-1M NDCG@10 Performance" srcset="
/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-ndcg_huad2935749c06fc72562e3df395457d92_144728_dfd4334fbae2a7067cf9f91b1595e36b.webp 400w,
/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-ndcg_huad2935749c06fc72562e3df395457d92_144728_271754cbcc6eac53f84162a93b670d17.webp 760w,
/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-ndcg_huad2935749c06fc72562e3df395457d92_144728_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250722-connor/ml-m1-ndcg_huad2935749c06fc72562e3df395457d92_144728_dfd4334fbae2a7067cf9f91b1595e36b.webp"
width="760"
height="488"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>NDCG@10 performance on the test set for the ML-1M dataset (every 20 epochs)&lt;/em>&lt;/p>
&lt;p>These results demonstrate that our &lt;strong>baseline SASRec reproductions&lt;/strong> are converging as expected and will serve as a solid foundation for comparison once LLM integration is complete.&lt;/p>
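&lt;p>For readers unfamiliar with the two metrics plotted above, here is a minimal sketch of how Hit@10 and NDCG@10 are commonly computed when each test user has exactly one held-out item. It is not taken from the SASRec codebase: the function names and the example ranks are assumptions made for illustration, and &lt;code>rank&lt;/code> is the 0-based position of the held-out item in the model’s ranked candidate list.&lt;/p>

```python
import math


def hit_at_k(rank, k=10):
    """1.0 if the held-out item's 0-based rank is inside the top k."""
    return 1.0 if k > rank else 0.0


def ndcg_at_k(rank, k=10):
    """With a single relevant item the ideal DCG is 1, so NDCG reduces
    to 1 / log2(rank + 2) inside the top k, and 0 outside it."""
    return 1.0 / math.log2(rank + 2) if k > rank else 0.0


def evaluate(ranks, k=10):
    """Average both metrics over all test users."""
    n = len(ranks)
    hit = sum(hit_at_k(r, k) for r in ranks) / n
    ndcg = sum(ndcg_at_k(r, k) for r in ranks) / n
    return hit, ndcg


# Hypothetical ranks for three test users, for illustration only.
ranks = [0, 3, 12]
hit, ndcg = evaluate(ranks)
print(round(hit, 4), round(ndcg, 4))
```

&lt;p>Hit@10 only asks whether the item made the top ten, while NDCG@10 also rewards placing it closer to the top, which is why the two curves can move differently during training.&lt;/p>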
&lt;hr>
&lt;h3 id="closing-thoughts">Closing Thoughts&lt;/h3>
&lt;p>This project has been an exciting journey into both research and engineering, and I’m eager to explore &lt;strong>LLM-powered embedding integration&lt;/strong> in the upcoming phase.&lt;/p>
&lt;p>I’m incredibly grateful to my mentors &lt;strong>Dr. Linsey Pang and Dr. Bin Dong&lt;/strong> for their support and guidance throughout the project so far. I’m looking forward to sharing more technical results as we work toward building smarter, more adaptable recommender systems.&lt;/p></description></item><item><title>CarbonCast</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250710-tanushsavadi/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/carboncast/20250710-tanushsavadi/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/carboncast">CarbonCast project&lt;/a>, my &lt;a href="https://summerofcode.withgoogle.com/programs/2025/projects/7yvAix3k" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of Professor Abel Souza aims to build an API that makes carbon intensity forecasts more accessible and actionable.&lt;/p>
&lt;p>Building on CarbonCast, the API will let users access energy data and use it to optimize their electricity consumption. Before diving into the details of the project, I’d like to share a bit about my background.&lt;/p>
&lt;h2 id="about-me">About Me&lt;/h2>
&lt;p>Hi, I’m Tanush—a rising senior at the University of Massachusetts Amherst, majoring in Computer Science and Mathematics and graduating in Spring 2026. Currently, I’m an AI Intern for the Commonwealth of Massachusetts Department of Unemployment Assistance, where I’m developing an end-to-end retrieval-augmented generation (RAG) chatbot on AWS.&lt;/p>
&lt;p>In the past, I’ve contributed to CarbonCast in a different capacity, designing a user interface to help visualize carbon intensity forecasts. I also worked at MathWorks as a Machine Learning Intern, where I collaborated in an AGILE environment to design and deploy predictive models that improved precision torque control and dynamic responsiveness in motor-driven robotic and industrial systems.&lt;/p>
&lt;p>I’m excited to bring these experiences to this year’s GSoC project, where I’ll be building tools to make carbon data more accessible and actionable for everyone.&lt;/p>
&lt;h2 id="what-is-carboncast">What is CarbonCast?&lt;/h2>
&lt;p>CarbonCast is a Python-based machine-learning library designed to forecast the carbon intensity of electrical grids. Carbon intensity refers to the amount of carbon emitted per kilowatt-hour (kWh) of electricity consumed. The current version of CarbonCast delivers accurate forecasts in numerous regions by using each region’s historical energy production data, the time of day and year, and weather forecasts as features.&lt;/p>
&lt;p>However, there is no easy way to access, visualize, and use the data through a standard interface. Important information is also missing and not available to users. For instance, electricity grids often import electricity from neighboring regions, so electricity consumption depends on both local generation and imports. Each energy source also needs its own tailored forecasting model. Consequently, any solution that tries to reduce the carbon emissions of its electricity consumption will benefit more from following a consumption-based carbon intensity signal.&lt;/p>
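&lt;p>The difference between a production-based and a consumption-based signal can be shown with a small calculation. This is a sketch under stated assumptions and not CarbonCast’s code: the function &lt;code>carbon_intensity&lt;/code> is a hypothetical helper, every number is a placeholder picked for easy arithmetic rather than real grid data, and treating all imports as one source with a single emission factor is a simplification.&lt;/p>

```python
def carbon_intensity(energy_mwh, factors_g_per_kwh):
    """Energy-weighted average carbon intensity in gCO2/kWh."""
    total = sum(energy_mwh.values())
    if total == 0:
        raise ValueError("total energy must be positive")
    weighted = sum(
        mwh * factors_g_per_kwh[source] for source, mwh in energy_mwh.items()
    )
    return weighted / total


# Placeholder numbers chosen for easy arithmetic, not real grid data.
factors = {"coal": 1000.0, "solar": 0.0, "imports": 250.0}
generation = {"coal": 100.0, "solar": 100.0}
consumption = {**generation, "imports": 200.0}

production_based = carbon_intensity(generation, factors)
consumption_based = carbon_intensity(consumption, factors)
print(production_based, consumption_based)
```

&lt;p>With these made-up values the two signals differ noticeably once imports are counted, which is the gap a consumption-based forecast is meant to close.&lt;/p>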
&lt;p>Unlike other third-party carbon services, CarbonCast’s model is open-sourced, allowing users to study, understand, and improve its behavior. This transparency invites public collaboration and innovation. It also contrasts sharply with proprietary services that often withhold both the logic behind their models and the data they are trained on.&lt;/p>
&lt;h2 id="why-this-matters">Why This Matters&lt;/h2>
&lt;p>Electricity usage is one of the largest contributors to carbon emissions globally. Carbon intensity—the amount of carbon emitted per kilowatt-hour of electricity consumed—varies based on how electricity is generated and demanded (for example, coal versus solar). With better visibility into when the grid is cleaner, individuals and organizations can shift their energy consumption to lower-carbon periods and lower prices. This enables everyday energy optimizations without compromising comfort or productivity.&lt;/p>
&lt;p>By improving CarbonCast’s accessibility and functionality, we are helping people and institutions answer questions like:&lt;/p>
&lt;ul>
&lt;li>When is the best time to charge my EV to reduce environmental impact?&lt;/li>
&lt;li>Can I run my energy-hungry server jobs when the electricity is cheaper?&lt;/li>
&lt;li>How do I actually reduce my emissions without guessing?&lt;/li>
&lt;/ul>
&lt;p>By providing clear, accurate forecasts of carbon intensity, CarbonCast can help users make informed decisions to optimize their energy footprint and reduce emissions without sacrificing convenience or productivity.&lt;/p>
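&lt;p>As a rough illustration of how a forecast turns into a decision such as when to charge an EV, the sketch below picks the contiguous block of hours with the lowest average forecast intensity. It does not use the CarbonCast API: &lt;code>cleanest_window&lt;/code> is a hypothetical helper and the hourly values are invented for the example.&lt;/p>

```python
def cleanest_window(forecast, hours):
    """Return (start_index, average_intensity) of the contiguous window
    of the given length with the lowest average forecast intensity."""
    if hours not in range(1, len(forecast) + 1):
        raise ValueError("hours must be between 1 and len(forecast)")
    starts = range(len(forecast) - hours + 1)
    # min() keeps the earliest start when several windows tie.
    best = min(starts, key=lambda s: sum(forecast[s:s + hours]))
    return best, sum(forecast[best:best + hours]) / hours


# Invented hourly forecast in gCO2/kWh, for illustration only.
hourly_forecast = [400, 350, 200, 180, 220, 390]
start, average = cleanest_window(hourly_forecast, hours=2)
print(start, average)
```

&lt;p>A real scheduler would also weigh price, deadlines, and forecast uncertainty, but the core step is this kind of search over the forecast horizon.&lt;/p>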
&lt;h2 id="what-im-building">What I’m Building&lt;/h2>
&lt;p>This summer, I’m developing the backend API services for CarbonCast, with a focus on two major goals:&lt;/p>
&lt;h3 id="geographical-expansion">Geographical Expansion&lt;/h3>
&lt;p>I am extending CarbonCast’s compatibility to support more regional electricity grids. Each model will be customized for local grid behavior and renewable energy characteristics. This involves tuning the model pipeline to adapt to each region’s energy mix, weather patterns, and reporting granularity.&lt;/p>
&lt;h3 id="system-refactoring-and-modularity">System Refactoring and Modularity&lt;/h3>
&lt;p>The original CarbonCast system was built as a research artifact. To refine it into production-grade infrastructure, I am refactoring the codebase to improve modularity. This makes it easier to plug in new regions, update forecasting algorithms, and integrate new data sources.&lt;/p>
&lt;h2 id="impact-beyond-research">Impact Beyond Research&lt;/h2>
&lt;p>The paper that inspired this project, &lt;em>Multi-day Forecasting of Electric Grid Carbon Intensity using Machine Learning&lt;/em>, pioneered the idea of forecasting carbon intensity over multiple days using a hierarchical machine learning model. This goes beyond the typical 24-hour day-ahead models that are common in the industry and allows for better planning and longer-term decision-making.&lt;/p>
&lt;p>CarbonCast builds directly on that foundation by transforming research into practical, real-world infrastructure. It is an open-source library that anyone can run, contribute to, and benefit from. Whether you&amp;rsquo;re a developer building carbon-aware applications, a policymaker working on grid decarbonization strategies, or a sustainability-conscious individual looking to reduce your carbon footprint, CarbonCast provides the tools to make informed, impactful choices.&lt;/p>
&lt;h2 id="looking-ahead">Looking Ahead&lt;/h2>
&lt;p>I am excited to contribute to a project that blends machine learning, systems engineering, sustainability, and public impact. My goal is to help make it easier for everyone to see, understand, and act on their carbon footprint while also providing the &amp;ldquo;visibility&amp;rdquo; people need to take meaningful, informed actions.&lt;/p></description></item><item><title>Develop a clean and intuitive web-based interface for WildberryEye</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/wildberryeye/20250615-sophietao127/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/wildberryeye/20250615-sophietao127/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/wildberryeye">WildberryEye&lt;/a>, my &lt;a href="./GSoC-proposal.pdf">proposal&lt;/a> under the mentorship of Isaac Espinosa aims to develop a clean, intuitive, and responsive web-based interface to support real-time pollinator detection, data visualization, and system configuration.&lt;/p>
&lt;p>WildberryEye leverages edge computing (Raspberry Pi 5) and object detection (YOLO) to monitor pollinators like bees and hummingbirds. The project focuses on developing a full-stack web interface to support real-time pollinator detection, data visualization, and system configuration. Development also includes real-time data extraction from the Raspberry Pi 5. The final result empowers researchers and contributors to engage with environmental data in an accessible and meaningful way.&lt;/p></description></item><item><title>Into the VR-Verse: My GSoC Adventure Begins!</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/brahma/06152025-kajaljotwani/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/brahma/06152025-kajaljotwani/</guid><description>&lt;p>Hello! I’m Kajal Jotwani, an undergraduate Computer Science student from India who is passionate about building creative, interactive technologies and contributing to open source. This summer, as part of Google Summer of Code 2025, I will be working on the Brahma / Allocentric WebXR Interfaces project under the mentorship of &lt;strong>Samir Ghosh&lt;/strong>. You can read my complete &lt;a href="https://docs.google.com/document/d/1Ne7ADVM72jRuxU7wzRYK8Hvp1zqCUviU0Fh1sTtRWe4/edit?usp=sharing" target="_blank" rel="noopener">proposal here.&lt;/a>&lt;/p>
&lt;p>This project focuses on creating a formalized framework for building collaborative and cross-platform WebXR-based experiences. As part of the first public release of Brahma, a lightweight open-source toolkit, our goal is to formalize the framework, create documentation, and implement example applications like multi-user games and scientific visualizations. This will help make Brahma extensible and accessible for a wider developer community.&lt;/p>
&lt;p>I&amp;rsquo;m excited to be working on this project and will be documenting my journey, learnings, and progress here throughout the summer.&lt;/p></description></item><item><title>Introducing Scenic-RoboSuite Interface</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/scenic/20250616-sahil-tgs/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsc/scenic/20250616-sahil-tgs/</guid><description>&lt;p>Hey! I&amp;rsquo;m &lt;a href="https://sahiltgs.super.site/" target="_blank" rel="noopener">Sahil&lt;/a>, working on integrating Scenic with RoboSuite for GSoC 2025. My &lt;a href="https://sahiltgs.super.site/gsoc/uc-ospo-proposal" target="_blank" rel="noopener">project&lt;/a> is mentored by &lt;a href="https://ucsc-ospo.github.io/author/daniel-fremont/" target="_blank" rel="noopener">Daniel Fremont&lt;/a> and &lt;a href="https://ucsc-ospo.github.io/author/eric-vin/" target="_blank" rel="noopener">Eric Vin&lt;/a> .&lt;/p>
&lt;p>I&amp;rsquo;m connecting &lt;a href="https://scenic-lang.org/" target="_blank" rel="noopener">Scenic&lt;/a> (a probabilistic programming language for scenarios) with &lt;a href="https://robosuite.ai/" target="_blank" rel="noopener">RoboSuite&lt;/a> (a robotics simulation framework). Basically, you write simple scenario descriptions and get complex 3D robot simulations automatically.&lt;/p>
&lt;p>Currently, as I&amp;rsquo;m building things and learning how Scenic works, I have been able to get the basic skeleton for the simulator interface working. I&amp;rsquo;ve implemented the simulator class and built a world model that can translate Scenic objects into RoboSuite&amp;rsquo;s simulator (which is MuJoCo-based). The interface now handles precise object placement in the world pretty well.&lt;/p>
&lt;p>One of the trickier parts was figuring out the translation logic between Scenic and RoboSuite. I managed to overcome this by building a system that automatically detects the shape of objects when moving between the two frameworks, which lays a foundation for more complex object mapping later on.&lt;/p>
&lt;p>I&amp;rsquo;ve also built some basic example scenarios to run and test with. I&amp;rsquo;m currently working on more complex examples and testing Scenic&amp;rsquo;s features such as probabilistic object placement, constraint satisfaction, and spatial relationships between objects.&lt;/p>
&lt;p>In summary, the &amp;ldquo;Scenic to RoboSuite&amp;rdquo; part of the interface is pretty much done. For next week, I need to work on the &amp;ldquo;RoboSuite to Scenic&amp;rdquo; part - basically getting feedback and state information flowing back from the simulation. Achieving this will make a complete bridge and give us a working simulator interface, which is the first major milestone for the project.&lt;/p></description></item><item><title>Improving AI Data Pipelines in AIDRIN: A Privacy-Centric and Multimodal Expansion</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/lbl/aidrin/20250612-harish_balaji/</link><pubDate>Thu, 12 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/lbl/aidrin/20250612-harish_balaji/</guid><description>&lt;p>⏱️ Reading time: 4–5 minutes&lt;/p>
&lt;p>Hi 👋&lt;/p>
&lt;p>I’m Harish Balaji, a Master’s student at NYU with a focus on Artificial Intelligence, Machine Learning, and Cybersecurity. I’m especially interested in building scalable systems that reflect responsible AI principles. For me, data quality isn’t just a technical detail. It’s a foundational aspect of building models that are reliable, fair, and reproducible in the real world.&lt;/p>
&lt;p>This summer, I’m contributing to AIDRIN (AI Data Readiness Inspector) as part of Google Summer of Code 2025. I’m grateful to be working under the mentorship of Dr. Jean Luca Bez and Prof. Suren Byna from the &lt;a href="https://crd.lbl.gov/divisions/scidata/sdm/" target="_blank" rel="noopener">Scientific Data Management Group&lt;/a> at Lawrence Berkeley National Laboratory (LBNL).&lt;/p>
&lt;p>AIDRIN is an open-source framework that helps researchers and practitioners evaluate whether a dataset is truly ready to be used in production-level AI workflows. From fairness to privacy, it provides a structured lens through which we can understand the strengths and gaps in our data.&lt;/p>
&lt;h2 id="why-this-work-matters">Why this work matters&lt;/h2>
&lt;p>In machine learning, one principle always holds true:&lt;/p>
&lt;blockquote>
&lt;p>&amp;ldquo;Garbage in, garbage out.&amp;rdquo;&lt;/p>
&lt;/blockquote>
&lt;p>Even the most advanced models can underperform or amplify harmful biases if trained on incomplete, imbalanced, or poorly understood data. This is where AIDRIN steps in. It provides practical tools to assess datasets across key dimensions like privacy, fairness, class balance, interpretability, and support for multiple modalities.&lt;/p>
&lt;p>By making these characteristics measurable and transparent, AIDRIN empowers teams to make informed decisions early in the pipeline. It helps ensure that datasets are not only large or complex, but also trustworthy, representative, and purpose-fit.&lt;/p>
&lt;h2 id="my-focus-this-summer">My focus this summer&lt;/h2>
&lt;p>As part of my GSoC 2025 project, I’ll be focusing on extending AIDRIN’s evaluation capabilities. A big part of this involves strengthening its support for privacy metrics and designing tools that can handle non-tabular datasets, such as image-based data.&lt;/p>
&lt;p>The goal is to expand AIDRIN’s reach without compromising on interpretability or ease of use. More technical insights and updates will follow in the next posts as the summer progresses.&lt;/p>
&lt;h2 id="what-comes-next">What comes next&lt;/h2>
&lt;p>As the AI community continues to evolve, there’s a growing shift toward data-centric practices. I believe frameworks like AIDRIN are essential for helping us move beyond the question of &lt;em>&amp;ldquo;Does the model work?&amp;rdquo;&lt;/em> toward a deeper and more meaningful one: &lt;em>&amp;ldquo;Was the data ready in the first place?&amp;rdquo;&lt;/em>&lt;/p>
&lt;p>Over the next few weeks, I’ll be working on development, testing, and integration. I’m excited to contribute to a tool that emphasizes transparency and reproducibility across the AI lifecycle, and to share lessons and ideas with others who care about responsible AI.&lt;/p>
&lt;p>If you’re exploring similar challenges or working in the space of dataset evaluation and readiness, I’d love to connect and exchange thoughts. You can also read my full GSoC 2025 proposal below for more context around the project scope and vision:&lt;/p>
&lt;p>👉 &lt;a href="https://drive.google.com/file/d/1RUyU2fHkc8GZ9vTj5SUr6jj84ZaRUvNt/view" target="_blank" rel="noopener">Read my GSoC 2025 proposal here&lt;/a>&lt;/p>
&lt;p>&lt;em>This is the first in a 3-part blog series documenting my GSoC journey with AIDRIN. Stay tuned for technical updates and behind-the-scenes insights as the summer unfolds!&lt;/em>&lt;/p></description></item><item><title>LLMSeqRec: LLM Enhanced Contextual Sequential Recommender</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250614-connor/</link><pubDate>Fri, 06 Jun 2025 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/sf/llmseqrec/20250614-connor/</guid><description>&lt;h3 id="project-description">Project Description&lt;/h3>
&lt;p>Sequential Recommender Systems are widely used in scientific and business applications to analyze and predict patterns over time. In biology and ecology, they help track species behavior by suggesting related research on migration patterns and environmental changes. Medical applications include personalized treatment recommendations based on patient history and predicting disease progression. In physics and engineering, these systems optimize experimental setups by suggesting relevant past experiments or simulations. Environmental and climate science applications include forecasting climate trends and recommending datasets for monitoring deforestation or pollution. In business and e-commerce, sequential recommenders enhance user experiences by predicting consumer behavior, suggesting personalized products, and optimizing marketing strategies based on browsing and purchase history. By leveraging sequential dependencies, these recommender systems enhance research efficiency, knowledge discovery, and business decision-making across various domains. Traditional sequential recommendation systems rely on historical user interactions to predict future preferences, but they often struggle with capturing complex contextual dependencies and adapting to dynamic user behaviors. Existing models primarily use predefined embeddings and handcrafted features, limiting their ability to generalize across diverse recommendation scenarios. To address these challenges, we propose LLM Enhanced Contextual Sequential Recommender (LLMSeqRec), which leverages Large Language Models (LLMs) to enrich sequential recommendations with deep contextual understanding and adaptive reasoning.
By integrating LLM-generated embeddings and contextual representations, LLMSeqRec enhances user intent modeling, cold-start recommendations, and long-range dependencies in sequential data. Unlike traditional models that rely solely on structured interaction logs, LLMSeqRec dynamically interprets and augments sequences with semantic context, leading to more accurate and personalized recommendations. This fusion of LLM intelligence with sequential modeling enables a more scalable, adaptable, and explainable recommender system, bridging the gap between traditional sequence-based approaches and advanced AI-driven recommendations.&lt;/p>
&lt;h3 id="project-objectives">Project Objectives&lt;/h3>
&lt;p>Aligned with the vision of the 2025 Open Source Research Experience (OSRE), this project aims to develop an LLM-Enhanced Contextual Sequential Recommender (LLMSeqRec) to improve sequential recommendation accuracy across various scientific and business applications. Sequential recommender systems are widely used to analyze and predict patterns over time, assisting in fields such as biology, ecology, medicine, physics, engineering, environmental science, and e-commerce. However, traditional models often struggle with capturing complex contextual dependencies and adapting to dynamic user behaviors, as they primarily rely on plain sequences of item IDs.
To address these limitations, this project will leverage Large Language Models (LLMs) to enhance context-aware sequential recommendations by dynamically integrating LLM-generated embeddings and contextual representations. The core challenge lies in designing LLMSeqRec, a unified and scalable model capable of enriching user intent modeling, mitigating cold-start issues, and capturing long-range dependencies within sequential data. Unlike conventional systems that rely solely on structured interaction logs, LLMSeqRec will interpret and augment sequences with semantic context, resulting in more accurate, adaptable, and explainable recommendations. Below is an outline of the methodologies and models that will be developed in this project:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Step 1: Data Preprocessing &amp;amp; Feature Creation&lt;/strong>:
Develop a data processing pipeline that parses users’ sequential interaction behaviors into sequential data points for LLM-based embeddings and contextual sequential transformer modeling. Extract user behavior sequences, item metadata, and temporal patterns to create context-aware sequential representations for training, validation, and testing. The data can come from the Amazon open public data or the MovieLens dataset, and the creation of data points can follow SASRec (the first entry in the references).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 2: Model Development&lt;/strong>:
Design and implement LLM-enhanced sequential recommendation models, integrating pretrained language models to augment user-item interactions with semantic context. Develop an adaptive mechanism to incorporate external contextual signals, such as product descriptions and reviews, into the sequential recommendation process. The baseline model can be the PyTorch implementation of SASRec.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 3: Evaluation&lt;/strong>:
Benchmark LLMSeqRec against state-of-the-art sequential recommenders, evaluating on accuracy, NDCG and cold-start performance; Conduct ablation studies to analyze the impact of LLM-generated embeddings on recommendation quality; Optimize model inference speed and efficiency for real-time recommendation scenarios.&lt;/p>
&lt;/li>
&lt;/ul>
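&lt;p>As an illustration of the data point creation in Step 1, the sketch below shows a leave-one-out split of the kind commonly used in SASRec-style evaluation: the most recent item of each user is held out for testing, the one before it for validation, and the rest is used for training. The function name and the dictionary layout are assumptions made for this example, not the project's actual pipeline.&lt;/p>

```python
def leave_one_out_split(user_sequences):
    """Split each user's chronological item list into train/valid/test parts.

    user_sequences maps a user id to a list of item ids ordered by time
    (a hypothetical layout chosen for this sketch).
    """
    train, valid, test = {}, {}, {}
    for user, items in user_sequences.items():
        if len(items) >= 3:
            train[user] = items[:-2]    # everything before the last two items
            valid[user] = [items[-2]]   # second-to-last interaction
            test[user] = [items[-1]]    # most recent interaction
        else:
            # Too short to hold anything out: keep the whole sequence for training.
            train[user] = list(items)
            valid[user] = []
            test[user] = []
    return train, valid, test


sequences = {"u1": [10, 11, 12, 13], "u2": [7, 8]}
train, valid, test = leave_one_out_split(sequences)
```

&lt;p>Holding out only the latest interactions keeps the split consistent with the next-item prediction task, since the model never trains on items that come after the ones it is evaluated on.&lt;/p>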
&lt;h3 id="project-deliverables">Project Deliverables&lt;/h3>
&lt;p>This project will deliver three components: software; model training, validation, and performance evaluation; and a demo. The software implementing the LLMSeqRec model described above will be hosted on GitHub as an open-access repository. The evaluation results and demo will be published alongside the GitHub repository.&lt;/p>
&lt;h3 id="llmseqrec">LLMSeqRec&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: LLM Enhanced Contextual Sequential Recommender&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Proficiency in Python, PyTorch, GitHub, Self-attention, Transformer&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/linsey-pang/">Linsey Pang&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="references">References:&lt;/h3>
&lt;ul>
&lt;li>Self-Attentive Sequential Recommendation (SASRec)&lt;/li>
&lt;li>BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer&lt;/li>
&lt;li>Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks&lt;/li>
&lt;li>Amazon Dataset: &lt;a href="https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews" target="_blank" rel="noopener">https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews&lt;/a>&lt;/li>
&lt;li>MovieLens Data: &lt;a href="https://grouplens.org/datasets/movielens/" target="_blank" rel="noopener">https://grouplens.org/datasets/movielens/&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>I&amp;rsquo;m Connor, a student at NYU studying CS and Math. This summer I&amp;rsquo;ve gotten the opportunity to work on LLMSeqRec under Dr. Bin Dong and Dr. Linsey Pang.&lt;/p>
&lt;p>In today’s digital age, sequential recommender systems power everything from e-commerce suggestions to personalized content everywhere. However, traditional models fall short in capturing user intent, adapting to dynamic behavior, or tackling cold-start problems. That’s where LLMSeqRec comes in.&lt;/p>
&lt;h2 id="problem-statement">Problem Statement&lt;/h2>
&lt;p>Most sequential recommender systems rely heavily on historical user-item interactions and predefined embeddings. This approach limits their ability to understand nuanced user preferences, struggles to scale across domains, and performs poorly in scenarios like new users or sparse data. The absence of semantic and contextual modeling is a major gap in current solutions.&lt;/p>
&lt;h2 id="overview-of-project">Overview of project&lt;/h2>
&lt;p>LLMSeqRec is a novel, LLM-enhanced sequential recommender framework that bridges this gap. By leveraging large language models (LLMs), it incorporates semantic embeddings and prompt-based contextual modeling to understand both user behavior and item metadata at a deeper level. The system explores two core approaches:&lt;/p>
&lt;ul>
&lt;li>Embedding-based: LLMs generate embeddings from item attributes.&lt;/li>
&lt;li>Prompt-based: LLMs receive full transaction history in natural language format and infer recommendations.&lt;/li>
&lt;/ul>
&lt;p>These techniques are tested using well-known datasets (e.g., Amazon, MovieLens), and evaluated with ranking metrics like NDCG@10 and Hit@10. The goal: deliver more accurate, context-rich, and explainable recommendations.&lt;/p>
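&lt;p>For readers unfamiliar with these ranking metrics, here is a minimal sketch of Hit@10 and NDCG@10 for the common next-item setting, where each test case has exactly one held-out target item. Under that assumption the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 1). The function names are illustrative and are not taken from the project code.&lt;/p>

```python
import math


def hit_at_k(ranked_items, target, k=10):
    """Return 1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k with a single relevant item: 1 / log2(rank + 1), or 0.0 if missed."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)


ranking = ["item_a", "item_b", "item_c"]
hit = hit_at_k(ranking, "item_b")
ndcg = ndcg_at_k(ranking, "item_b")
```

&lt;p>Both values are averaged over all test users to obtain the reported score. Hit@10 only checks whether the target is present, while NDCG@10 also rewards placing it closer to the top.&lt;/p>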
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>The project is currently progressing through stages including model training, embedding integration, and evaluation. Upcoming tasks include:&lt;/p>
&lt;ul>
&lt;li>Fine-tuning enhanced models&lt;/li>
&lt;li>Designing zero-/few-shot prompts&lt;/li>
&lt;li>Running comparative experiments&lt;/li>
&lt;li>Publishing findings and writing technical blogs&lt;/li>
&lt;/ul>
&lt;p>This work is part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/sf/LLMSeqRec">LLMSeqRec&lt;/a> project and follows my &lt;a href="https://drive.google.com/file/d/1cs9lsjacSJUbXWzTfcHIukfKFwKJjUZF/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of Dr. Bin Dong and Dr. Linsey Pang.&lt;/p></description></item><item><title>Understanding Skin-Tone based Bias in Text-to-Image Models Using Stable Diffusion</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/fairface/</link><pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/fairface/</guid><description>&lt;p>This project investigates &lt;strong>skin tone bias in text-to-image generation&lt;/strong> by analyzing the output of &lt;strong>Stable Diffusion&lt;/strong> models when prompted with socially and occupationally descriptive text. Despite the growing popularity of generative models like Stable Diffusion, little has been done to evaluate how these models reproduce or amplify visual bias, especially bias related to &lt;strong>skin tone, perceived race, and social class&lt;/strong>, based solely on textual prompts.&lt;/p>
&lt;p>This work builds on prior studies of bias in large language models (LLMs) and vision-language models (VLMs), and aims to explore how biases manifest visually, without explicitly specifying race or ethnicity in the input prompt. Our approach combines &lt;strong>systematic prompt generation&lt;/strong>, &lt;strong>model-based image creation&lt;/strong>, and &lt;strong>skin tone quantification&lt;/strong> to assess disparities across generated samples.&lt;/p>
&lt;p>The ultimate goal is to develop a &lt;strong>reproducible evaluation pipeline&lt;/strong>, visualize disparities across demographic and occupational prompts, and explore strategies to mitigate representational harms in generative models.&lt;/p>
&lt;p>The pipeline will cover:&lt;/p>
&lt;ul>
&lt;li>Generating images from prompts&lt;/li>
&lt;li>Annotating or analyzing them using computer vision tools&lt;/li>
&lt;li>Measuring bias across categories like skin tone, gender presentation, or status markers&lt;/li>
&lt;/ul>
&lt;p>Project webpage: &lt;a href="https://github.com/marzianizam/ucsc-ospo.github.io/tree/main/content/project/osre25/UCSC/FairFace" target="_blank" rel="noopener">https://github.com/marzianizam/ucsc-ospo.github.io/tree/main/content/project/osre25/UCSC/FairFace&lt;/a>&lt;/p>
&lt;h3 id="project-idea-measuring-bias-in-ai-generated-portraits">Project Idea: Measuring Bias in AI-Generated Portraits&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Responsible AI, Generative Models, Ethics in AI&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, PyTorch, Stable Diffusion, Prompt Engineering, Data Analysis&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>:
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/marzia-binta-nizam/">Marzia Binta Nizam&lt;/a> (&lt;a href="mailto:manizam@ucsc.edu">manizam@ucsc.edu&lt;/a>)&lt;/li>
&lt;li>Professor James Davis (&lt;a href="mailto:davisje@ucsc.edu">davisje@ucsc.edu&lt;/a>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="background">Background&lt;/h3>
&lt;p>Recent research has shown that text-to-image models can perpetuate racial and gender stereotypes through visual output. For instance, prompts like “CEO” or “nurse” often produce racially skewed results even when no explicit race or demographic cues are provided. This project examines whether similar disparities exist &lt;strong>along skin tone dimensions&lt;/strong>, focusing on &lt;strong>subtle biases&lt;/strong> rather than overt stereotypes.&lt;/p>
&lt;p>The key challenge is that visual bias is not always easy to measure. This project addresses this issue by utilizing &lt;strong>melanin-level quantification&lt;/strong>, a continuous and interpretable proxy for skin tone, in conjunction with consistent prompt templating and multi-sample averaging to ensure statistical rigor.&lt;/p>
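&lt;p>The multi-sample averaging step can be sketched as follows. The example assumes that some upstream tool has already produced one continuous skin tone score per generated image; the scores, the prompt labels, and the function name are hypothetical. It reports the mean and the standard error of the mean for each prompt, so that differences between prompts can be judged against sampling noise.&lt;/p>

```python
import statistics


def summarize_scores(scores_by_prompt):
    """Return {prompt: (mean, standard_error)} for per-image skin tone scores."""
    summary = {}
    for prompt, scores in scores_by_prompt.items():
        mean = statistics.mean(scores)
        if len(scores) > 1:
            # Standard error of the mean: sample standard deviation / sqrt(n).
            sem = statistics.stdev(scores) / len(scores) ** 0.5
        else:
            sem = float("nan")  # undefined for a single sample
        summary[prompt] = (mean, sem)
    return summary


# Hypothetical scores for two prompts, several images each.
example = {
    "A portrait of a doctor": [0.2, 0.4],
    "A portrait of a nurse": [0.5, 0.5, 0.5],
}
summary = summarize_scores(example)
```

&lt;p>Generating many images per prompt and reporting the standard error alongside the mean makes it easier to tell a consistent shift apart from the natural variation between individual samples.&lt;/p>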
&lt;hr>
&lt;h3 id="objectives">Objectives&lt;/h3>
&lt;ul>
&lt;li>Generate datasets using consistent prompts (e.g., &amp;ldquo;A portrait of a doctor&amp;rdquo;, &amp;ldquo;A homeless person&amp;rdquo;, etc.)&lt;/li>
&lt;li>Use Stable Diffusion (and optionally, other models like DALL·E or Midjourney) to generate diverse image sets&lt;/li>
&lt;li>Measure bias across demographic and occupational categories using image processing tools&lt;/li>
&lt;li>Visualize the distribution of melanin values and facial features across samples&lt;/li>
&lt;li>Explore prompt-level mitigation strategies to improve fairness in output&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h3 id="deliverables">Deliverables&lt;/h3>
&lt;ul>
&lt;li>Open-source codebase for prompt generation and image evaluation&lt;/li>
&lt;li>Statistical analysis of visual bias trends&lt;/li>
&lt;li>Blog post or visual explainer on findings&lt;/li>
&lt;li>Final report and recommendations on prompt engineering or model constraints&lt;/li>
&lt;/ul>
&lt;hr></description></item><item><title>UC Open Source Repository Browser</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/orb/</link><pubDate>Mon, 03 Mar 2025 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/orb/</guid><description>&lt;p>The University of California Open Source Repository Browser (UC ORB) is a discovery tool designed to map and classify open source projects across the UC system. This project is a collaboration with the &lt;a href="https://ucospo.net" target="_blank" rel="noopener">UC Network of Open Source Program Offices (OSPOs)&lt;/a>, which brings together six UC campuses (Santa Cruz, Berkeley, Davis, Los Angeles, Santa Barbara, and San Diego) to support open source research, promote sustainability, and establish best practices within academic environments.&lt;/p>
&lt;p>By providing a centralized platform, UC ORB enhances the visibility of UC’s open source contributions, fosters collaboration among researchers and developers, and serves as a model for other institutions aiming to improve open source discovery and sustainability.&lt;/p>
&lt;p>This project focuses on building the web application for UC ORB, which will serve as the primary interface for users to explore and interact with UC’s open source projects. The student will work on developing a clean, user-friendly, and scalable web application.&lt;/p>
&lt;h3 id="develop-the-uc-orb-application">Develop the UC ORB Application&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web development&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Experience in Python and at least one Python-based web framework (e.g., Flask, Django, FastAPI), experience with front-end technologies (React, HTML, CSS, JavaScript), familiarity with Git and collaborative development workflows, familiarity with database interaction (SQL).&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:jgomez91@ucsc.edu">Juanita Gomez&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a web application that serves as the front-end interface for the UC ORB. The application will allow users to browse, search, and explore open source projects across the UC system. The project will involve integrating with the repository database to fetch and display repository data, designing an intuitive user interface, and ensuring the application is scalable and maintainable.&lt;/p>
&lt;p>Specific Tasks:&lt;/p>
&lt;ul>
&lt;li>Choose an appropriate Python-based web framework (e.g., Flask, Django, or FastAPI) for the backend and set up the basic structure of the application.&lt;/li>
&lt;li>Develop a responsive and user-friendly front-end interface ensuring that it is accessible and works well on both desktop and mobile devices.&lt;/li>
&lt;li>Add search functionality to allow users to find projects by keywords, tags, or other metadata.&lt;/li>
&lt;li>Implement filtering options to narrow down search results (e.g., by campus, topic, or programming language).&lt;/li>
&lt;li>Deploy the application to a cloud platform (e.g., AWS or Google Cloud) or to GitHub Pages (github.io) for public access.&lt;/li>
&lt;li>Create developer documentation that explains the application’s architecture, setup instructions, and contribution guidelines.&lt;/li>
&lt;li>Write a short user manual to help end-users browse and use the web application effectively.&lt;/li>
&lt;/ul></description></item><item><title>FairFace</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/fair-face/</link><pubDate>Fri, 28 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/fair-face/</guid><description>&lt;h3 id="fairface-reproducible-bias-evaluation-in-facial-ai-models-via-controlled-skin-tone-manipulation">FairFace: Reproducible Bias Evaluation in Facial AI Models via Controlled Skin Tone Manipulation&lt;/h3>
&lt;p>Bias in facial AI models remains a persistent issue, particularly concerning skin tone disparities. Many studies report that AI models perform differently on lighter vs. darker skin tones, but these findings are often difficult to reproduce due to variations in datasets, model architectures, and evaluation settings.
The goal of this project is to investigate bias in facial AI models by manipulating skin tone and related properties in a controlled, reproducible manner. By leveraging BioSkin, we will adjust melanin levels and other skin properties on existing human datasets to assess whether face-based AI models (e.g., classification and vision-language models) exhibit biased behavior toward specific skin tones.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Fairness &amp;amp; Bias in AI&lt;/code>, &lt;code>Face Recognition &amp;amp; Vision-Language Models&lt;/code>, &lt;code>Dataset Augmentation for Reproducibility&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Machine Learning &amp;amp; Computer Vision, Deep Learning (PyTorch/TensorFlow), Data Augmentation &amp;amp; Image Processing, Reproducibility &amp;amp; Documentation (GitHub, Jupyter Notebooks).&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (can be completed in either 175 or 350 hours, depending on the depth of analysis and the number of models tested)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:davisje@ucsc.edu">James Davis&lt;/a>, &lt;a href="mailto:pang@soe.ucsc.edu">Alex Pang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="key-research-questions">Key Research Questions&lt;/h3>
&lt;ol>
&lt;li>Do AI models perform differently based on skin tone?
&lt;ul>
&lt;li>How do classification accuracy, confidence scores, and error rates change when skin tone is altered systematically?&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>What are the underlying causes of bias?
&lt;ul>
&lt;li>Is bias solely dependent on skin tone, or do other skin-related properties (e.g., texture, reflectance) contribute to model predictions?&lt;/li>
&lt;li>Is bias driven by dataset imbalances (e.g., underrepresentation of certain skin tones)?&lt;/li>
&lt;li>Do facial features beyond skin tone (e.g., structure, expression, pose) contribute to biased predictions?&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Are bias trends reproducible?
&lt;ul>
&lt;li>Can we replicate bias patterns across different datasets, model architectures, and experimental setups?&lt;/li>
&lt;li>How consistent are the findings when varying image sources and preprocessing methods?&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="specific-tasks">Specific Tasks:&lt;/h3>
&lt;ol>
&lt;li>Dataset Selection &amp;amp; Preprocessing
&lt;ul>
&lt;li>Choose appropriate face/human datasets (e.g., FairFace, CelebA, COCO-Human).&lt;/li>
&lt;li>Preprocess images to ensure consistent lighting, pose, and resolution before applying transformations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Skin Tone Manipulation with BioSkin
&lt;ul>
&lt;li>Systematically modify melanin levels while keeping facial features unchanged.&lt;/li>
&lt;li>Generate multiple variations per image (lighter to darker skin tones).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Model Evaluation &amp;amp; Bias Analysis
&lt;ul>
&lt;li>Test face classification models (e.g., ResNet, FaceNet) and vision-language models (e.g., BLIP, LLaVA) on the modified images.&lt;/li>
&lt;li>Compute fairness metrics (e.g., demographic parity, equalized odds).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Investigate Underlying Causes of Bias
&lt;ul>
&lt;li>Compare model behavior across different feature sets.&lt;/li>
&lt;li>Test whether bias persists across multiple datasets and model architectures.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Ensure Reproducibility
&lt;ul>
&lt;li>Develop an open-source pipeline for others to replicate bias evaluations.&lt;/li>
&lt;li>Provide codebase and detailed documentation for reproducibility.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol></description></item><item><title>ReasonWorld</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/reason-world/</link><pubDate>Fri, 28 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/reason-world/</guid><description>&lt;h3 id="reasonworld-real-world-reasoning-with-a-long-term-world-model">ReasonWorld: Real-World Reasoning with a Long-Term World Model&lt;/h3>
&lt;p>A world model is essentially an internal representation of an environment that an AI system constructs from external information in order to plan, reason, and interpret its surroundings. It stores the system’s understanding of relevant objects, spatial relationships, and/or states in the environment. Recent augmented reality (AR) and wearable technologies like Meta Aria glasses provide an opportunity to gather rich information from the real world in the form of vision, audio, and spatial data. Along with this, large language models (LLMs), vision-language models (VLMs), and general machine learning algorithms have enabled nuanced understanding and processing of multimodal inputs that can label, summarize, and analyze experiences.&lt;/p>
&lt;p>With &lt;strong>ReasonWorld&lt;/strong>, we aim to utilize these technologies to enable advanced reasoning about important objects/events/spaces in real-world environments in a structured manner. With the help of wearable AR technology, the system would be able to capture real-world multimodal data. We aim to utilize this information to create a long-memory modeling toolkit that would support features like:&lt;/p>
&lt;ul>
&lt;li>Longitudinal and structured data logging: Capturing and storing multimodal data (image, video, audio, location coordinates, etc.)&lt;/li>
&lt;li>Semantic summarization: Automatic scene labeling via LLMs/VLMs to identify key elements in the surroundings&lt;/li>
&lt;li>Efficient retrieval: For querying and revisiting past experiences and answering questions like “Where have I seen this painting before?”&lt;/li>
&lt;li>Adaptability: Continuously refining and understanding the environment and/or relationships between objects/locations.&lt;/li>
&lt;li>Adaptive memory prioritization: Where the pipeline can assess the contextual significance of the captured data and retrieve those that are the most significant. The model retains meaningful, structured representations rather than raw, unfiltered data.&lt;/li>
&lt;/ul>
&lt;p>This real-world reasoning framework with a long-term world model can function as a structured search engine for important objects and spaces, enabling:&lt;/p>
&lt;ul>
&lt;li>Recognizing and tracking significant objects, locations, and events&lt;/li>
&lt;li>Supporting spatial understanding and contextual analysis&lt;/li>
&lt;li>Facilitating structured documentation of environments and changes over time&lt;/li>
&lt;/ul>
&lt;h3 id="alignment-with-summer-of-reproducibility">Alignment with Summer of Reproducibility:&lt;/h3>
&lt;ul>
&lt;li>Core pipeline for AR data ingestion, event segmentation, summarization, and indexing (knowledge graph or vector database) would be made open-source.&lt;/li>
&lt;li>Clear documentation of each module and how they collaborate with one another&lt;/li>
&lt;li>The project could be tested with standardized datasets, simulated environments as well as controlled real-world scenarios, promoting reproducibility&lt;/li>
&lt;li>Opportunities for Innovation - A transparent, modular approach invites a broad community to propose novel expansions&lt;/li>
&lt;/ul>
&lt;h3 id="specific-tasks">Specific Tasks:&lt;/h3>
&lt;ul>
&lt;li>Build a pipeline for real-time or batch ingestion and cleaning of data from the wearable AR device.&lt;/li>
&lt;li>Implement an event segmentation module that classifies whether the current object/event is contextually significant, filtering out the less relevant observations.&lt;/li>
&lt;li>Have VLMs/LLMs summarize events from the vision/audio/location data, and store the summaries in structured stores such as knowledge graphs or vector databases for later retrieval.&lt;/li>
&lt;li>Optimize storage by prioritizing important objects and spaces according to contextual significance and frequency of access.&lt;/li>
&lt;li>Implement key information retrieval mechanisms&lt;/li>
&lt;li>Ensure reproducibility by providing datasets and scripts&lt;/li>
&lt;/ul>
&lt;h3 id="reasonworld">ReasonWorld&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Augmented reality&lt;/code> &lt;code>Multimodal learning&lt;/code> &lt;code>Computer vision for AR&lt;/code> &lt;code>LLM/VLM&lt;/code> &lt;code>Efficient data indexing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Machine Learning and AI, Augmented Reality and Hardware integration, Data Engineering &amp;amp; Storage Optimization&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:davisje@ucsc.edu">James Davis&lt;/a>, &lt;a href="mailto:pang@soe.ucsc.edu">Alex Pang&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>AI for Science: Automating Domain Specific Tasks with Large Language Models</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucr/domain-automation/</link><pubDate>Sun, 23 Feb 2025 21:30:56 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucr/domain-automation/</guid><description>&lt;p>Recent advancements in Large Language Models (LLMs) have transformed various fields by demonstrating remarkable capabilities in processing and generating human-like text. This project aims to explore the development of an open-source framework that leverages LLMs to enhance discovery across specialized domains.&lt;/p>
&lt;p>The proposed framework will enable LLMs to analyze and interpret complex datasets, automate routine tasks, and uncover novel insights. A key focus will be on equipping LLMs with domain-specific expertise, particularly in areas where specialized tools &amp;ndash; such as ANDES &amp;ndash; are not widely integrated with LLM-based solutions. By bridging this gap, the framework will empower researchers and professionals to harness LLMs as intelligent assistants capable of navigating and utilizing niche computational tools effectively.&lt;/p>
&lt;h3 id="ai-for-science-automating-domain-specific-tasks-with-large-language-models">AI for Science: Automating Domain Specific Tasks with Large Language Models&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Large Language Models&lt;/code> &lt;code>AI for Science&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, Experience with LLMs, Prompt Engineering, Fine-Tuning, LLM Frameworks&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium-Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-wong/">Daniel Wong&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;#34;Lenny&amp;#34; Guo&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="project-tasks-and-milestones">Project Tasks and Milestones&lt;/h3>
&lt;ul>
&lt;li>Designing an extensible framework that facilitates the integration of LLMs with specialized software and datasets.&lt;/li>
&lt;li>Developing methodologies for fine-tuning LLMs to act as domain experts.&lt;/li>
&lt;li>Implementing strategies for improving tool interoperability, allowing LLMs to interact seamlessly with less commonly used but critical analytical platforms.&lt;/li>
&lt;/ul></description></item><item><title>Exploration of I/O Reproducibility with HDF5</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/h5_reproducibility/</link><pubDate>Wed, 19 Feb 2025 09:00:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/pnnl/h5_reproducibility/</guid><description>&lt;p>Parallel I/O is a critical component in high-performance computing (HPC), allowing multiple processes to read and write data concurrently from a shared storage system. &lt;a href="https://github.com/HDFGroup/hdf5" target="_blank" rel="noopener">HDF5&lt;/a>, a widely adopted data model and library for managing complex scientific data, supports parallel I/O but introduces challenges in I/O reproducibility, where repeated executions do not always produce identical results. This lack of reproducibility can stem from non-deterministic execution orders, variations in collective buffering strategies, and race conditions in metadata and dataset chunking operations within HDF5’s parallel I/O hierarchy. Moreover, many HDF5 operations that leverage &lt;a href="https://www.hdfgroup.org/wp-content/uploads/2020/02/20200206_ECPTutorial-final.pdf" target="_blank" rel="noopener">MPI I/O&lt;/a> require collective communication; that is, all processes within a communicator must participate in operations such as metadata creation, chunk allocation, and data aggregation. These collective calls ensure that the file structure and data layout remain consistent across processes, but they also introduce additional synchronization complexity that can impact reproducibility if not properly managed. In HPC scientific workflows, consistent I/O reproducibility is essential for accurate debugging, validation, and benchmarking, ensuring that scientific results are both verifiable and trustworthy.
Tools such as &lt;a href="https://github.com/hpc-io/h5bench" target="_blank" rel="noopener">h5bench&lt;/a>—a suite of I/O kernels designed to exercise HDF5 I/O on parallel file systems—play an important role in identifying these reproducibility challenges, tuning performance, and ultimately supporting the overall robustness of large-scale scientific applications.&lt;/p>
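&lt;p>One simple way to detect that repeated executions did not produce identical results is to compare checksums of the output files from each run. The sketch below does this with SHA-256 from the Python standard library; the function names and the idea of passing one output path per run are assumptions for illustration, not part of HDF5 or h5bench.&lt;/p>

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large outputs are not loaded at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(paths):
    """Return True when every file in paths has the same SHA-256 digest."""
    digests = {file_sha256(p) for p in paths}
    return len(digests) == 1
```

&lt;p>A byte-level comparison is deliberately strict: two files can differ byte for byte while holding the same datasets, so a mismatch here is a prompt for a dataset-level comparison (for example with the h5diff tool) and not proof that the scientific results differ.&lt;/p>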
&lt;h3 id="workplan">Workplan&lt;/h3>
&lt;p>The proposed work will include (1) analyzing and characterizing parallel I/O operations in &lt;a href="https://www.hdfgroup.org/wp-content/uploads/2020/02/20200206_ECPTutorial-final.pdf" target="_blank" rel="noopener">HDF5&lt;/a> with &lt;a href="https://github.com/hpc-io/h5bench" target="_blank" rel="noopener">h5bench&lt;/a> miniapps, (2) exploring and validating potential reproducibility challenges within the parallel I/O hierarchy (e.g., MPI I/O), and (3) implementing solutions to address parallel I/O reproducibility.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Parallel I/O&lt;/code> &lt;code>MPI-I/O&lt;/code> &lt;code>Reproducibility&lt;/code> &lt;code>HPC&lt;/code> &lt;code>HDF5&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/wei-zhang/">Wei Zhang&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>AR4VIP</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/ar4vip/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/ar4vip/</guid><description>&lt;p>We are interested in developing navigation aids for visually impaired people (VIP) using AR/VR technologies.
Our intended use is primarily indoors, or outdoors within private confines, e.g. a person&amp;rsquo;s backyard.
Using AR/VR headsets or smart glasses allows navigation without using a cane and frees
the users&amp;rsquo; hands for other tasks.&lt;/p>
&lt;h3 id="continue-development-on-meta-quest-3-headset">Continue Development on Meta Quest 3 Headset&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Dynamic scenes&lt;/code> &lt;code>Spatial audio&lt;/code> &lt;code>Proximity detection&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> AR/VR familiarity, WebXR, Unity, SLAM, good communicator, good documentation skills&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:pang@soe.ucsc.edu">Alex Pang&lt;/a>, &lt;a href="mailto:davis@cs.ucsc.edu">James Davis&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Continue development and field testing with the Meta Quest 3 headset.
See this &lt;a href="https://github.com/sail360/UCSC-VIP-Research" target="_blank" rel="noopener">repository page&lt;/a> for current status.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Improve spatial audio mapping&lt;/li>
&lt;li>Improve obstacle detection, at different heights, with pre-scanned geometry as well as dynamic objects
e.g. other people, pets, doors&lt;/li>
&lt;li>Special handling of hazards e.g. stairs, uneven floors, etc.&lt;/li>
&lt;li>Explore/incorporate AI to help identify objects in the scene when requested by user&lt;/li>
&lt;/ul>
&lt;h3 id="new-development-on-smart-glasses">New Development on Smart Glasses&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Dynamic scenes&lt;/code> &lt;code>Spatial audio&lt;/code> &lt;code>Proximity detection&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> AR/VR familiarity, WebXR, Unity, SLAM, good communicator, good documentation skills&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:pang@soe.ucsc.edu">Alex Pang&lt;/a>, &lt;a href="mailto:davis@cs.ucsc.edu">James Davis&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>VR headsets are bulky and awkward, but they are currently more advanced than AR glasses in terms of programmability.
Ultimately, the form factor of smart glasses is more practical for extended use by our target users.
Many vendors are releasing their own smart glasses targeting various applications,
e.g. as an alternative for watching TV. We are interested in those that provide capabilities to support
spatial computing. Most of these will likely have their own brand-specific APIs. This project has 2 goals:
(a) develop a generic, brand-independent API, perhaps as extensions to WebXR, to support the overarching goal of a navigation
aid for VIP, and
(b) port the functionality of the VR version to smart glasses while taking advantage of smart-glass functionalities and sensors.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Explore current and soon-to-be-available smart glass options, e.g. Snap Spectacles, Xreal Air 2 Ultra, etc., and select a platform to work on (subject to cost and availability of SDK). At a minimum, the glasses should have microphones, speakers, and cameras. Infrared cameras or other low-light capability is a plus. They should also offer sufficient battery life or an option for quick battery exchange.&lt;/li>
&lt;li>Identify the support provided by the SDK, e.g. does it do real-time scene reconstruction? Does it support spatial audio? If it supports features outside of WebXR, provide generic hooks to improve portability of code to other smart glasses.&lt;/li>
&lt;li>Port and extend functionalities from the Meta Quest 3 VR headset to the smart glass platform.&lt;/li>
&lt;li>Add AI support if the glasses support it.&lt;/li>
&lt;li>Provide documentation of work.&lt;/li>
&lt;/ul></description></item><item><title>CarbonCast: Building an end-to-end consumption-based Carbon Intensity Forecasting service</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/carboncast/</link><pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/carboncast/</guid><description>&lt;p>&lt;a href="https://github.com/carbonfirst/carboncast" target="_blank" rel="noopener">CarbonCast&lt;/a> is a machine-learning-based approach to provide multi-day forecasts of the electrical grid&amp;rsquo;s carbon intensity. Developed in Python, the current version of CarbonCast delivers accurate forecasts in numerous regions by using historical source production data of a particular geographical region, time of day/year, and weather forecasts as features. However, there is no easy way to access and visualize the data through a standard interface. In addition, much important information is left out and is not available to users. For instance, electricity grids often import electricity from neighboring regions and so electricity consumption depends on both electricity generation and imports. Moreover, it is imperative for each energy source to utilize a tailored predictive mechanism. Consequently, any carbon optimization solution trying to reduce carbon emissions due to its electricity consumption will benefit more from following a consumption-based CI signal.&lt;/p>
&lt;p>The plan for this project is to develop both the frontend and the backend API services for CarbonCast. We also intend to enhance CarbonCast by implementing an architecture wherein each region can employ a distinct interface for their predictive modeling. In scenarios where these new models do not yield superior outcomes within a region, the current architecture will serve as a fallback solution.&lt;/p>
&lt;h3 id="building-an-end-to-end-consumption-based-carbon-intensity-forecasting-service">Building an end-to-end consumption-based Carbon Intensity Forecasting service&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Databases&lt;/code> &lt;code>Machine Learning&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, command line (bash), MySQL, Django, machine learning, cronjob&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/abel-souza/">Abel Souza&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a containerized end-to-end backend, API, and frontend for collecting, estimating, and visualizing real-time and forecast electrical grid&amp;rsquo;s carbon intensity data in a scalable manner.&lt;/p>
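&lt;p>One piece of such a backend is loading CarbonCast&amp;rsquo;s CSV output into a database. The following is a minimal sketch of that ingestion step using only the Python standard library and SQLite. The table name, the column names (region, timestamp, carbon_intensity), the region codes, and the sample values are all hypothetical placeholders; the actual CarbonCast CSV layout may differ.&lt;/p>

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout and made-up values; the real CarbonCast
# output columns may differ.
SAMPLE_CSV = """region,timestamp,carbon_intensity
CISO,2025-01-01T00:00:00,210.5
CISO,2025-01-01T01:00:00,198.2
PJM,2025-01-01T00:00:00,402.0
"""


def ingest(conn, csv_file):
    """Load forecast rows from a CSV file object into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forecast ("
        "region TEXT NOT NULL, "
        "timestamp TEXT NOT NULL, "
        "carbon_intensity REAL NOT NULL, "
        "PRIMARY KEY (region, timestamp))"
    )
    rows = [
        (r["region"], r["timestamp"], float(r["carbon_intensity"]))
        for r in csv.DictReader(csv_file)
    ]
    # INSERT OR REPLACE keeps repeated runs (e.g. from a cron job) idempotent.
    conn.executemany("INSERT OR REPLACE INTO forecast VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def latest(conn, region):
    """Return the most recent (timestamp, carbon_intensity) pair for a region."""
    return conn.execute(
        "SELECT timestamp, carbon_intensity FROM forecast "
        "WHERE region = ? ORDER BY timestamp DESC LIMIT 1",
        (region,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
count = ingest(conn, io.StringIO(SAMPLE_CSV))
print(count, latest(conn, "CISO"))
```

&lt;p>Keying the table on (region, timestamp) means a re-run over the same CSV overwrites rows instead of duplicating them, which is one simple way to make periodic collection safe to repeat. An API endpoint could then be a thin wrapper around a query like the one in latest.&lt;/p>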
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Research web technologies and frameworks relevant to CarbonCast development.&lt;/li>
&lt;li>Run and collect CarbonCast&amp;rsquo;s data (CSV)&lt;/li>
&lt;li>Ingest CSV into a MySQL or SQLite database&lt;/li>
&lt;li>Develop an Application Programming Interface (API) and a Web User Interface (UI) to provide real-time data access and visualization.&lt;/li>
&lt;li>Deploy the CarbonCast API as a service and dockerize it so that other users and applications can locally deploy and use it easily.&lt;/li>
&lt;li>Implement a choropleth web map to visualize the carbon intensity data across the different geographical regions supported by CarbonCast.&lt;/li>
&lt;li>Enhance CarbonCast by implementing an extensible architecture wherein every region can employ distinct models for their predictive modeling.&lt;/li>
&lt;/ul></description></item><item><title>Vector Embeddings Dataset</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/embeddings/</link><pubDate>Tue, 11 Feb 2025 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/embeddings/</guid><description>&lt;h3 id="vector-embeddings-dataset">Vector Embeddings Dataset&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Vector Embeddings&lt;/code> &lt;code>LLMs&lt;/code> &lt;code>Transformers&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, APIs, scripting, Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraborty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Several datasets are available for benchmarking vector search algorithms (aka ANN algorithms), but none of
them represents actual real-world workloads, because they usually contain small vectors of only a few hundred
dimensions. For vector search experiments to represent real-world workloads, we want datasets with
several thousand dimensions, like those generated by OpenAI&amp;rsquo;s text-embedding models. This project aims to create a
dataset with 1B embeddings from a wikipedia dataset using open source models. Ideally, we will have 3 versions of this dataset, with 1024, 4096, and 8192 sized embeddings to start with.&lt;/p></description></item><item><title>Brahma</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/brahma/</link><pubDate>Tue, 11 Feb 2025 12:34:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/brahma/</guid><description>&lt;p>Brahma is a lightweight framework for building collaborative and cross platform WebXR based experiences using Three.js for the front-end and a simple Node.js/WebSocket script on the backend. It was created at the Social Emotional Technology Lab to facilitate the development of novel collaborative interfaces and virtual environments capable of loading scientific datasets. For example, in the featured image, multiple avatars are exploring a &lt;a href="https://www.science.org/doi/10.1126/science.adf0566" target="_blank" rel="noopener">marine science dataset related to seal migration paths&lt;/a> overlaid on NOAA bathymetry and telemetry data.&lt;/p>
&lt;p>It addresses a gap: prior open-source collaborative VR frameworks are either no longer available, such as the defunct &lt;a href="https://support.mozilla.org/en-US/kb/end-support-mozilla-hubs" target="_blank" rel="noopener">Mozilla Hubs&lt;/a>, or depend on proprietary engines, such as &lt;a href="https://ubiq.online/" target="_blank" rel="noopener">Ubiq&lt;/a>. Furthermore, it needs very few computational resources to run and develop with, so creators without a computer powerful enough to run a game engine can still develop a networked VR application.&lt;/p>
&lt;p>This project involves the first public release of Brahma&amp;ndash; creating a lightweight open source framework that facilitates multi-user games, scientific visualizations and other applications. In order to do so, we need to formalize the framework, provide documentation, and implement key examples so that the open source tool can be extensible and serve a wider community.&lt;/p>
&lt;p>Mentees can expect to learn best practices for VR development and testing and gain familiarity with full stack development practices. Mentees should have access and experience using a VR headset.&lt;/p>
&lt;h1 id="brahma--protoocol-release-and-validation">Brahma / Protocol Release and Validation&lt;/h1>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>Software Architecture&lt;/code> &lt;code>VR Development&lt;/code> &lt;code>Computer Graphics&lt;/code> &lt;code>Cloud Platforms&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Node.js, Three.js&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate-Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:sghosh17@ucsc.edu">Samir Ghosh&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The proposed work includes three phases, primarily working on backend code, and API design. In the first phase, to gain familiarity, the mentee will be running and testing the Brahma backend on a variety of cloud platforms such as AWS, Google Cloud, and Azure&amp;ndash; and learning best methods for documentation in the process. Then, in the second phase, the mentee will work on formalizing the protocol for avatar embodiment and other multi-user interfaces, testing the application with a simple pong game. In the third phase, the mentee will address telemetry, logging, and analysis considerations.&lt;/p>
&lt;p>This project is well suited for someone who has an interest in virtual reality, especially social VR, multi-user, or collaborative applications.&lt;/p>
&lt;h1 id="brahma--allocentric-webxr-interfaces">Brahma / Allocentric WebXR Interfaces&lt;/h1>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>VR Development&lt;/code> &lt;code>Computer Graphics&lt;/code> &lt;code>UX/UI&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Three.js, GLSL, WebSocket&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate-Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:sghosh17@ucsc.edu">Samir Ghosh&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The proposed work primarily involves front-end code and VR interface design. In the first phase, the mentee will gain familiarity with best practices for WebXR development through the implementation and documentation of simple interaction patterns. Then, the mentee will implement a simple multi-user pong game to learn about allocentric interfaces. In the final phase of the project, the mentee will design and implement one or more allocentric interfaces of their choosing.&lt;/p>
&lt;p>This project is well suited for someone who has an interest in virtual reality, especially aspects of graphics and interaction design.&lt;/p></description></item><item><title>WildBerryEye</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/wildberryeye/</link><pubDate>Tue, 11 Feb 2025 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/wildberryeye/</guid><description>&lt;p>WildBerryEye leverages Raspberry Pi and YOLO object detection models to monitor pollinators like bees and hummingbirds visiting flowers. This initiative aims to enhance environmental research by automating data collection and analysis of pollinator activities, which are crucial for ecological assessments and conservation efforts. The project utilizes video data provided by &lt;a href="https://www.researchgate.net/profile/Rossana-Maguina-Conde" target="_blank" rel="noopener">Dr. Rossana Maguiña&lt;/a>, processed through advanced machine learning techniques to accurately identify and track pollinator interactions in natural habitats.&lt;/p>
&lt;h3 id="develop-web-based-user-interface">Develop web-based user interface&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Full Stack Development&lt;/code> &lt;code>React&lt;/code> &lt;code>Flask&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Experience with full stack development and real time processing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate to Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hrs)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:caiespin@ucsc.edu">Carlos Isaac Espinosa Ramirez&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a clean and intuitive web-based interface for WildBerryEye, ensuring ease of use for researchers and contributors. The platform should present real-time pollinator detection results, facilitate data visualization, and allow users to interact with system settings efficiently. The website must be accessible, visually appealing, and optimized for both desktop and mobile users, avoiding unnecessary complexity or intrusive elements.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Frontend Development: Continue development to enhance the user interface using React and CSS, ensuring a responsive and user-friendly design.&lt;/li>
&lt;li>Backend Development: Expand functionality using Flask, focusing on efficient API endpoints and seamless interaction with the frontend (excluding database implementation).&lt;/li>
&lt;li>Real-Time Communication: Implement and refine real-time updates between the frontend and backend to enhance system responsiveness.&lt;/li>
&lt;li>Usability &amp;amp; Design Optimization: Research and propose improvements to the system’s usability, design, and overall user experience.&lt;/li>
&lt;/ul></description></item><item><title>AI Data Readiness Inspector (AIDRIN)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/lbl/aidrin/</link><pubDate>Tue, 11 Feb 2025 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/lbl/aidrin/</guid><description>&lt;p>Garbage In, Garbage Out (GIGO) is a maxim widely accepted by computer scientists across domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest considerable time and effort in preparing the data for AI.&lt;/p>
&lt;p>&lt;a href="https://arxiv.org/pdf/2406.19256" target="_blank" rel="noopener">AIDRIN&lt;/a> (AI Data Readiness INspector) is a framework that provides a quantifiable assessment of the readiness of data for AI processes, covering a broad range of readiness dimensions available in the literature. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, for data evaluation. Furthermore, AIDRIN uses metrics specific to assessing data for AI, such as feature importance, feature correlations, class imbalance, fairness, privacy, and FAIR (Findability, Accessibility, Interoperability, and Reusability) principle compliance. AIDRIN provides visualizations and reports to assist data scientists in further investigating the readiness of data.&lt;/p>
&lt;h3 id="aidrin-visualizations-and-science-gateway">AIDRIN Visualizations and Science Gateway&lt;/h3>
&lt;p>The proposed work will include improvements in the AIDRIN framework to (1) enhance, extend, and optimize the visualizations of metrics related to all six pillars of AI data readiness and (2) set up a science gateway on NERSC or AWS cloud service.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>data readiness&lt;/code> &lt;code>AI&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, C/C++, good communicator&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/suren-byna/">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>h5bench with AI workloads</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/lbl/h5bench-ai/</link><pubDate>Tue, 11 Feb 2025 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/lbl/h5bench-ai/</guid><description>&lt;p>&lt;a href="https://github.com/hpc-io/h5bench" target="_blank" rel="noopener">h5bench&lt;/a> is a suite of parallel I/O benchmarks or kernels representing I/O patterns that are commonly used in HDF5 applications on high performance computing systems. h5bench measures I/O performance from various aspects, including the I/O overhead, and observed I/O rate.&lt;/p>
&lt;p>Parallel I/O is a critical technique for moving data between compute and storage subsystems of supercomputers. With massive amounts of data produced or consumed by compute nodes, high-performance parallel I/O is essential. I/O benchmarks play an important role in this process; however, there is a scarcity of I/O benchmarks representative of current workloads on HPC systems. Toward creating representative I/O kernels from real-world applications, we have created h5bench, a set of I/O kernels that exercise HDF5 I/O on parallel file systems in numerous dimensions. Our focus on HDF5 is due to the parallel I/O library&amp;rsquo;s heavy usage in various scientific applications running on supercomputing systems. The various tests benchmarked in the h5bench suite include I/O operations (read and write), data locality (arrays of basic data types and arrays of structures), array dimensionality (1D arrays, 2D meshes, 3D cubes), and I/O modes (synchronous and asynchronous). h5bench measurements can be used to identify performance bottlenecks and their root causes and evaluate I/O optimizations. As the I/O patterns of h5bench are diverse and capture the I/O behaviors of various HPC applications, this study will be helpful to the broader supercomputing and I/O community.&lt;/p>
&lt;h3 id="h5bench-with-ai-workloads">h5bench with AI workloads&lt;/h3>
&lt;p>The proposed work will include (1) analyzing and characterizing AI workloads that rely on HDF5 datasets, (2) extracting a kernel of their I/O operations, and (3) implementing and validating the kernel in h5bench.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O&lt;/code> &lt;code>HPC&lt;/code> &lt;code>benchmarking&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, C/C++, good communicator&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/suren-byna/">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>HAgent</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/hagent/</link><pubDate>Tue, 11 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/hagent/</guid><description>&lt;p>&lt;a href="https://github.com/masc-ucsc/hagent" target="_blank" rel="noopener">HAgent&lt;/a> is a platform for building an AI hardware agent engine that supports multiple components in chip design, such as code generation, verification, debugging, and tapeout.&lt;/p>
&lt;p>HAgent is built as a compiler for Hardware Agents; it interfaces with
typical EDA tools like compilers, synthesis, and verification. There are
several projects around enhancing HAgent.&lt;/p>
&lt;h3 id="bugfarm-hagent-step">BugFarm hagent step&lt;/h3>
&lt;p>&lt;strong>Objective&lt;/strong>: Develop a HAgent step (pass) to create bugs in a given design.&lt;/p>
&lt;p>&lt;strong>Description&lt;/strong>: Using LLMs (HAgent APIs), the goal is to add &amp;ldquo;bugs&amp;rdquo; to an input Verilog design.
Other tool passes that need to fix bugs can then use this
infrastructure as a bug generator. MCY
(&lt;a href="https://github.com/YosysHQ/mcy" target="_blank" rel="noopener">https://github.com/YosysHQ/mcy&lt;/a>) does something similar, but it does not
work on the Verilog directly and creates a very different Verilog output. BugFarm is supposed
to have somewhat similar functionality but edit the Verilog directly, which
results in code with just a few edits. As in MCY, there has to be a step to confirm that
the change affects results. The project should benchmark BugFarm and compare it with MCY.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> Python, Verilog, and an understanding of agents&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/farzaneh-rabiei-kashanaki/">Farzaneh Rabiei Kashanaki&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="hdeval-competition-repository">HDEval Competition Repository&lt;/h3>
&lt;p>&lt;strong>Objective&lt;/strong>: Create a platform for HDL programming challenges and community engagement.&lt;/p>
&lt;p>&lt;strong>Description&lt;/strong>: Develop a repository where users can solve HDL problems in Verilog, Chisel, PyRTL, etc. Implement a points system for successful solutions. Allow users to submit new problems (code, specifications, verification, and tests) that are not easily solvable by LLMs. Automate solution testing and provide feedback on submissions.&lt;/p>
&lt;p>Submissions consist of 4 components: code, specification, verification, and tests. It should also be possible to submit examples of bugs in the code, specification, verification, or tests during the design.&lt;/p>
&lt;p>If the code is written in an HDL other than Verilog, the submission should include both that HDL (Chisel, PyRTL,&amp;hellip;) and the Verilog.&lt;/p>
&lt;p>The specification is free form. For any given specification, an expert in the area should be able to generate the code, verification, and tests. Similarly, from any pair of components, an expert should be able to generate the rest. For example, from the verification and tests, it should be possible to generate the code and specification.&lt;/p>
&lt;p>Typical specifications consist of a plan, API, and a sample usage.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> Web design, some hardware understanding&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/farzaneh-rabiei-kashanaki/">Farzaneh Rabiei Kashanaki&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="integrate-silicon-compiler">Integrate Silicon Compiler&lt;/h3>
&lt;p>&lt;strong>Objective&lt;/strong>: &lt;a href="https://github.com/siliconcompiler/siliconcompiler" target="_blank" rel="noopener">Silicon Compiler&lt;/a> is an open-source Python library for interfacing with many EDA tools. The idea is to integrate it with HAgent so that prompts/queries can
interface with it.&lt;/p>
&lt;p>&lt;strong>Description&lt;/strong>: The agentic component needs to check with Silicon Compiler
that the generated Python compiles and also that it has reasonable parameters.
This will require a ReAct loop for compiler errors, and likely a judge loop for
testing for reasonable options/flow with feedback from execution. Since there
are not many training examples, it will require few-shot prompting with a database to
populate the context accordingly.&lt;/p>
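&lt;p>The compiler-error part of such a loop can be sketched as follows. This is only an illustration under stated assumptions: the generate callable is a hypothetical stand-in for an LLM call, the check uses Python&amp;rsquo;s built-in compile (a syntax check only), and no Silicon Compiler or HAgent API is invoked. The judge loop for reasonable parameters is not covered.&lt;/p>

```python
def repair_loop(generate, max_attempts=3):
    """Ask `generate` for Python source until it compiles or attempts run out.

    `generate(feedback)` is a hypothetical stand-in for an LLM call: it
    receives the previous compiler error (or None on the first attempt)
    and returns candidate source code as a string.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        source = generate(feedback)
        try:
            # Syntax check only; the candidate code is never executed here.
            compile(source, "candidate.py", "exec")
        except SyntaxError as err:
            # Feed the error back so the next attempt can try to fix it.
            feedback = f"line {err.lineno}: {err.msg}"
            continue
        return source, attempt
    return None, max_attempts


def fake_llm(feedback):
    # Stand-in generator: the first answer is broken, the retry is fixed.
    # The dictionary content is an arbitrary placeholder.
    if feedback is None:
        return "flow = {'tool': 'yosys'"
    return "flow = {'tool': 'yosys'}"


source, attempts = repair_loop(fake_llm)
print(attempts, source)
```

&lt;p>Bounding the number of attempts keeps a generator that never converges from looping forever; a real integration would likely also record each failed attempt so the few-shot database can be extended.&lt;/p>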
&lt;p>The end result should allow selecting different tools and options through Silicon Compiler.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> Backend chip design&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> High&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="comodore-64-or-msx-or-gameboy">Commodore 64 or MSX or Gameboy&lt;/h3>
&lt;p>&lt;strong>Objective&lt;/strong>: Create a prompt-only specification to build a hardware
accelerator for the target platform (Commodore 64, MSX or Gameboy). The
generated code should focus on Verilog, but it is fine to also target some
other HDL. In all cases, the project should include generated Verilog
integrated with an emulator for verification.&lt;/p>
&lt;p>&lt;strong>Description&lt;/strong>: Using &lt;a href="https://github.com/masc-ucsc/hagent" target="_blank" rel="noopener">Hagent&lt;/a>, create an
&lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDLEval&lt;/a> benchmark (set of prompts) that
provides the necessary information to create the Verilog implementation. HDLEval
prompts usually consist of a high-level PLAN or specification, an API to
implement, and a few examples of usage for the given API.&lt;/p>
&lt;p>Running the benchmark should produce generated Verilog; a program is then run both in the
emulator and on the Verilog to compare correctness. The platform should have an
already existing emulator, such as &lt;a href="https://vice-emu.sourceforge.io/" target="_blank" rel="noopener">vice-emu&lt;/a> or
&lt;a href="https://mgba.io/" target="_blank" rel="noopener">mGBA&lt;/a>, to perform cosimulation against the generated
specification.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> Verilog for front-end design&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> High&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Scenic: A Language for Design and Verification of Autonomous Cyber-Physical Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/scenic/</link><pubDate>Tue, 11 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/scenic/</guid><description>&lt;p>&lt;a href="https://scenic-lang.org/" target="_blank" rel="noopener">Scenic&lt;/a> is a probabilistic programming language for the design and verification of autonomous cyber-physical systems like self-driving cars.
Scenic allows users to define &lt;em>scenarios&lt;/em> for testing or training their system by putting a probability distribution on the system&amp;rsquo;s environment: the positions, orientations, and other properties of objects and agents, as well as their behaviors over time.
Sampling these scenarios and running them in a simulator yields synthetic data which can be used to train or test a system.
Since Scenic was released open-source in 2019, our group and many others in academia have used Scenic to find, diagnose, and fix bugs in autonomous cars, aircraft, robots, and other kinds of systems.
In industry, it is being used by companies including Boeing, Meta, Deutsche Bahn, and Toyota in domains spanning autonomous driving, aviation, household robotics, railways, maritime, and virtual reality.&lt;/p>
&lt;p>Our long-term goal is for Scenic to become a widely-used common representation and toolkit supporting the entire design lifecycle of AI-based cyber-physical systems.
Towards this end, we have many summer projects available, ranging from adding new application domains to working on the Scenic compiler and sampler:&lt;/p>
&lt;ol>
&lt;li>3D Driving Scenarios&lt;/li>
&lt;li>A Library for Aviation Scenarios&lt;/li>
&lt;li>Interfacing Scenic to new simulators&lt;/li>
&lt;li>Optimizing and parallelizing Scenic&lt;/li>
&lt;li>Improvements and infrastructure for the VerifAI toolkit&lt;/li>
&lt;/ol>
&lt;p>See the sections below for details.&lt;/p>
&lt;h3 id="3d-driving-scenarios">3D Driving Scenarios&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Autonomous Driving&lt;/code> &lt;code>3D modeling&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python; basic vector geometry&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scenic scenarios written to test autonomous vehicles use the &lt;a href="https://docs.scenic-lang.org/en/latest/modules/scenic.domains.driving.html" target="_blank" rel="noopener">driving domain&lt;/a>, a Scenic library defining driving-specific concepts including cars, pedestrians, roads, lanes, and intersections.
The library extracts information about road networks, such as the shapes of lanes, from files in the standard &lt;a href="https://www.asam.net/standards/detail/opendrive/" target="_blank" rel="noopener">OpenDRIVE&lt;/a> format.
Currently, we only generate 2D polygons for lanes, throwing away 3D information.
While this suffices for many driving scenarios, it means we cannot properly model overpasses (the roads appear to overlap) or test driving scenarios where 3D geometry is important, such as hilly terrain.&lt;/p>
&lt;p>The goals of this project are to extend our road network library to generate 3D meshes (instead of 2D polygons) for roads, write new Scenic scenarios which use this new capability, and (if time allows) test autonomous driving software using them.&lt;/p>
&lt;h3 id="a-library-for-aviation-scenarios">A Library for Aviation Scenarios&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Autonomous Aircraft&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python; ideally some aviation experience&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>We have used Scenic to find, diagnose, and fix bugs in software for autonomous aircraft: in particular, &lt;a href="https://arxiv.org/abs/2005.07173" target="_blank" rel="noopener">this paper&lt;/a> studied a neural network-based automated taxiing system using the &lt;a href="https://www.x-plane.com/" target="_blank" rel="noopener">X-Plane&lt;/a> flight simulator.
We also have prototype interfaces to &lt;a href="https://microsoft.github.io/AirSim/" target="_blank" rel="noopener">AirSim&lt;/a> and &lt;a href="https://www.flightsimulator.com/" target="_blank" rel="noopener">Microsoft Flight Simulator&lt;/a>.
However, our experiments so far have mainly focused on simple scenarios involving a single aircraft.&lt;/p>
&lt;p>The goal of this project is to develop an &lt;em>aviation library&lt;/em> for Scenic (like the driving domain mentioned in the previous project) which will allow users to create complex aviation scenarios in a simulator-agnostic way.
The library would define concepts for aircraft, flight paths, weather, etc. and allow importing real-world data about these.
The student would demonstrate the library&amp;rsquo;s functionality by writing some example scenarios and testing either simple aircraft controllers or (if time allows) ML-based flight software.&lt;/p>
&lt;h3 id="interfacing-scenic-to-new-simulators">Interfacing Scenic to New Simulators&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Simulation&lt;/code> &lt;code>Autonomous Driving&lt;/code> &lt;code>Robotics&lt;/code> &lt;code>LLMs&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scenic is designed to be &lt;a href="https://docs.scenic-lang.org/en/latest/new_simulator.html" target="_blank" rel="noopener">easy to interface to new simulators&lt;/a>.
Depending on student interest, we could pick a simulator which would open up new kinds of applications for Scenic and write an interface for it.
Some possibilities include:&lt;/p>
&lt;ul>
&lt;li>The &lt;a href="https://github.com/tier4/AWSIM" target="_blank" rel="noopener">AWSIM&lt;/a> driving simulator (to allow testing the &lt;a href="https://autoware.org/" target="_blank" rel="noopener">Autoware&lt;/a> open-source autonomous driving software stack)&lt;/li>
&lt;li>The &lt;a href="https://www.coppeliarobotics.com/" target="_blank" rel="noopener">CoppeliaSim&lt;/a> robotics simulator&lt;/li>
&lt;li>NVIDIA&amp;rsquo;s &lt;a href="https://github.com/NVIDIA/Cosmos" target="_blank" rel="noopener">Cosmos&lt;/a>, a world foundation model which generates videos from text prompts&lt;/li>
&lt;li>NVIDIA&amp;rsquo;s &lt;a href="https://www.nvidia.com/en-us/omniverse/" target="_blank" rel="noopener">Omniverse&lt;/a> (various applications, e.g. simulating virtual factories)&lt;/li>
&lt;li>Various simulators for which we have prototype interfaces that could be generalized and made more usable, including &lt;a href="https://mujoco.org/" target="_blank" rel="noopener">MuJoCo&lt;/a> and &lt;a href="https://developer.nvidia.com/isaac/sim" target="_blank" rel="noopener">Isaac Sim&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of the project would be to create an interface between Scenic and the new simulator and write scenarios demonstrating it.
If time allows, we could do a case study on a realistic system for publication at an academic conference.&lt;/p>
&lt;h3 id="optimizing-and-parallelizing-scenic">Optimizing and Parallelizing Scenic&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Optimization&lt;/code> &lt;code>Parallelization&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Large-scale testing with Scenic, where one wants to generate thousands of simulations, can be very computationally expensive.
In some cases, the bottleneck is the simulator, and being able to easily run multiple simulations in parallel would greatly increase scalability.
In others, Scenic itself spends substantial time trying to sample scenarios satisfying all the given constraints.&lt;/p>
&lt;p>This project would explore a variety of approaches to speeding up scene and simulation generation in Scenic.
Some possibilities include:&lt;/p>
&lt;ul>
&lt;li>Parallelizing scene generation and simulation (e.g. using &lt;a href="https://github.com/ray-project/ray" target="_blank" rel="noopener">Ray&lt;/a>)&lt;/li>
&lt;li>Systematically profiling real-world Scenic programs to characterize the main bottlenecks and propose optimizations&lt;/li>
&lt;li>JIT compiling Scenic&amp;rsquo;s internal sampling code (e.g. using &lt;a href="https://numba.pydata.org/" target="_blank" rel="noopener">Numba&lt;/a>)&lt;/li>
&lt;/ul>
&lt;h3 id="improvements-and-infrastructure-for-the-verifai-toolkit">Improvements and Infrastructure for the VerifAI Toolkit&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>DevOps&lt;/code> &lt;code>Documentation&lt;/code> &lt;code>APIs&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://github.com/BerkeleyLearnVerify/VerifAI" target="_blank" rel="noopener">VerifAI&lt;/a> is a toolkit for design and analysis of AI-based systems that builds on top of Scenic.
Among other features, it adds the ability to perform &lt;em>falsification&lt;/em>: intelligently searching for scenarios that will cause a system to behave in an undesirable way.&lt;/p>
&lt;p>The goal of this project is to improve VerifAI&amp;rsquo;s development infrastructure, documentation, and ease of use, which are currently relatively poor compared to Scenic.
Specific tasks could include:&lt;/p>
&lt;ul>
&lt;li>Setting up continuous integration (CI) on GitHub&lt;/li>
&lt;li>Creating processes to help users/developers submit issues and PRs and deal with them in a timely manner&lt;/li>
&lt;li>Writing more documentation, including tutorials and examples (not only for end users of VerifAI but also for those wanting to develop custom falsification components, for example)&lt;/li>
&lt;li>Refactoring VerifAI&amp;rsquo;s API to make it easier to use and extend&lt;/li>
&lt;/ul></description></item><item><title>Smart Batching for Large Language Models</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucr/smartbatch/</link><pubDate>Sun, 09 Feb 2025 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucr/smartbatch/</guid><description>&lt;p>Sequence tokenization is a crucial step during Large Language Model training, fine-tuning, and inference. User prompts and training data are tokenized and zero-padded before being fed to the model in batches. This process allows models to interpret human language by breaking down complex sentences into simple token units that are numerically represented in a token set. However, the process of sequence padding for maintaining batch dimensions can introduce unnecessary overhead if batching is not properly done.&lt;/p>
&lt;p>In this project, we introduce Smart Batching, where we dynamically batch sequences in a fine-tuning dataset by their respective lengths. With this method, we aim to minimize the amount of zero padding required during sequence batching, which can make fine-tuning and inference faster and more efficient. We also compare this method with other commonly used batching practices (Longest Sequence, Random Shuffling) on metrics such as runtime and model accuracy.&lt;/p>
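&lt;p>The effect of length-based batching on padding can be illustrated with a minimal sketch. It is not the project framework: the token counts are randomly generated, and the helper names &lt;code>padding_tokens&lt;/code> and &lt;code>smart_batch_order&lt;/code> are hypothetical. It assumes each batch is padded to its own longest sequence, and compares that against padding every sequence to the global maximum.&lt;/p>

```python
import random


def padding_tokens(lengths, batch_size):
    """Count pad tokens when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


def smart_batch_order(lengths):
    """Sort by length so that sequences sharing a batch have similar sizes."""
    return sorted(lengths)


# Hypothetical token counts standing in for a tokenized fine-tuning dataset.
rng = random.Random(0)
lengths = [rng.randint(5, 512) for _ in range(1024)]

# Random Shuffling baseline: batches are cut from a shuffled order.
shuffled = lengths[:]
rng.shuffle(shuffled)
random_padding = padding_tokens(shuffled, batch_size=32)

# Smart Batching: batches are cut from a length-sorted order.
smart_padding = padding_tokens(smart_batch_order(lengths), batch_size=32)

# Longest Sequence baseline: every sequence is padded to the global maximum.
longest_padding = sum(max(lengths) - n for n in lengths)

print("random:", random_padding)
print("smart:", smart_padding)
print("longest:", longest_padding)
```

&lt;p>In practice, sorted batches are usually shuffled at the batch level so that training does not always see sequences in order of length; that step is omitted here for brevity.&lt;/p>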
&lt;h3 id="project-title">Project Title&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Large Language Models&lt;/code> &lt;code>Fine-Tuning&lt;/code> &lt;code>AI&lt;/code> &lt;code>Transformers&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, PyTorch, Large Language Models&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Moderate&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-wong/">Daniel Wong&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;#34;Lenny&amp;#34; Guo&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="project-tasks-and-milestones">Project Tasks and Milestones&lt;/h3>
&lt;ul>
&lt;li>Implement an open source smart batching framework based on HuggingFace to allow for dynamically grouping sequences of similar token lengths into batches&lt;/li>
&lt;li>Analyze runtime, padding, and model accuracy with smart batching and other commonly used batching practices&lt;/li>
&lt;li>Apply smart batching with distributed fine-tuning and observe large language model outputs&lt;/li>
&lt;/ul></description></item><item><title>Disentangled Generation and Editing of Pathology Images</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uci/pathology_image_disentanglement/</link><pubDate>Fri, 07 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uci/pathology_image_disentanglement/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> computational pathology, image generation, disentangled representations, latent space manipulation, deep learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong>
&lt;ul>
&lt;li>Proficient in Python, with experience in machine learning libraries such as PyTorch or TensorFlow.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Generative Models:&lt;/strong>
&lt;ul>
&lt;li>Familiarity with Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and contrastive learning methods.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong>
&lt;ul>
&lt;li>Image processing techniques, statistical analysis, and working with histopathology datasets.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Biomedical Knowledge (preferred):&lt;/strong>
&lt;ul>
&lt;li>Basic understanding of histology, cancer pathology, and biological image annotation.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours). The project involves substantial computational work, model development, and evaluation of generated pathology images.&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/xi-li/">Xi Li&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>The project aims to advance the &lt;strong>generation and disentanglement of pathology images&lt;/strong>, focusing on precise control over key histological features. By leveraging generative models, we seek to create synthetic histological images where specific pathological characteristics can be independently controlled.&lt;/p>
&lt;h3 id="challenges-in-current-approaches">&lt;strong>Challenges in Current Approaches&lt;/strong>&lt;/h3>
&lt;p>Current methods in histopathology image generation often struggle with:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Feature Entanglement:&lt;/strong> Difficulty in isolating individual factors such as cancer presence, severity, or staining variations.&lt;/li>
&lt;li>&lt;strong>Lack of Control:&lt;/strong> Limited capability to manipulate specific pathological attributes without affecting unrelated features.&lt;/li>
&lt;li>&lt;strong>Consistency Issues:&lt;/strong> Generated images often fail to maintain realistic cellular distributions, affecting biological validity.&lt;/li>
&lt;/ol>
&lt;h3 id="project-motivation">&lt;strong>Project Motivation&lt;/strong>&lt;/h3>
&lt;p>This project proposes a &lt;strong>disentangled representation framework&lt;/strong> to address these limitations. By separating key features within the latent space, we aim to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Control Histological Features:&lt;/strong> Adjust factors such as cancer presence, tumor grade, number of malignant cells, and staining methods.&lt;/li>
&lt;li>&lt;strong>Ensure Spatial Consistency:&lt;/strong> Maintain the natural distribution of cells during image reconstruction and editing.&lt;/li>
&lt;li>&lt;strong>Enable Latent Space Manipulation:&lt;/strong> Provide interpretable controls for editing and generating realistic histopathology images.&lt;/li>
&lt;/ul>
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Disentangled Representation Learning:&lt;/strong>
&lt;ul>
&lt;li>Develop generative models (e.g., VAEs, GANs) to separate and control histological features.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Latent Space Manipulation:&lt;/strong>
&lt;ul>
&lt;li>Design mechanisms for intuitive editing of pathology images through latent space adjustments.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Spatial Consistency Validation:&lt;/strong>
&lt;ul>
&lt;li>Implement evaluation metrics to ensure that cell distribution remains biologically consistent during image generation.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Generative Model Framework:&lt;/strong>
&lt;ul>
&lt;li>An open-source Python implementation for pathology image generation and editing.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Disentangled Latent Space Tools:&lt;/strong>
&lt;ul>
&lt;li>Tools for visualizing and manipulating latent spaces to control specific pathological features.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Evaluation Metrics:&lt;/strong>
&lt;ul>
&lt;li>Comprehensive benchmarks assessing image quality, feature disentanglement, and biological realism.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials:&lt;/strong>
&lt;ul>
&lt;li>Clear guidelines and code examples for the research community to adopt and build upon this work.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>By enabling precise control over generated histology images, this project will contribute to &lt;strong>data augmentation&lt;/strong>, &lt;strong>model interpretability&lt;/strong>, and &lt;strong>biological insight&lt;/strong> in computational pathology. The disentangled approach offers new opportunities for researchers to explore disease mechanisms, develop robust diagnostic models, and improve our understanding of cancer progression and tissue morphology.&lt;/p>
&lt;hr></description></item><item><title>Autograder</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/autograder/</link><pubDate>Thu, 06 Feb 2025 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/autograder/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/autograder-server" target="_blank" rel="noopener">EduLinq Autograder&lt;/a> is an open source tool used by several courses at UCSC
to safely and quickly grade programming assignments.
Grading student code is something that may seem simple at first (you just need to run their code!),
but quickly becomes exceedingly complex as you get more into the details.
Specifically, grading a student&amp;rsquo;s code securely while providing the &amp;ldquo;last mile&amp;rdquo; service of getting code from students
and sending results to instructors/TAs and the course&amp;rsquo;s LMS (e.g., Canvas) can be very difficult.
The Autograder provides all of this in a free and open source project.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the Autograder.&lt;/p>
&lt;p>As an open source project, there are endless opportunities for development, improvements, and collaboration.
Here, we highlight some specific projects that will work well in the summer mentorship setting.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2025 should fill out &lt;a href="https://forms.gle/RxGqnQiCDeHSX6tq6" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., do not comment on an issue to say that you want to tackle it as part of OSRE/GSoC 2025).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="llm-detection">LLM Detection&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>AI/ML&lt;/code> &lt;code>LLM&lt;/code> &lt;code>Research&lt;/code> &lt;code>Backend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, systems, data munging, go, docker&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>As &lt;a href="https://en.wikipedia.org/wiki/Large_language_model" target="_blank" rel="noopener">Large Language Model (LLM)&lt;/a> tools like ChatGPT become more common and powerful,
instructors need tools to help determine if students are the actual authors of the code they submit.
More classical instances of plagiarism are often discovered by code similarity tools like &lt;a href="https://theory.stanford.edu/~aiken/moss/" target="_blank" rel="noopener">MOSS&lt;/a>.
However, these tools are not sufficient for detecting code written not by a student,
but by an AI model like &lt;a href="https://en.wikipedia.org/wiki/ChatGPT" target="_blank" rel="noopener">ChatGPT&lt;/a> or &lt;a href="https://en.wikipedia.org/wiki/GitHub_Copilot" target="_blank" rel="noopener">GitHub Copilot&lt;/a>.&lt;/p>
&lt;p>The task for this project is to create a system that provides a score indicating the system&amp;rsquo;s confidence that a given piece of code was written by an AI tool and not a student.
This will supplement the existing code analysis tools in the Autograder.
We will consider many approaches to completing this task.
A more software-development-oriented approach could consist of leveraging existing systems to create a production-ready system,
whereas a more research-oriented approach could consist of creating a novel method complete with a paper and experiments.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server" target="_blank" rel="noopener">Repository for Autograder Server&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/140" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="code-analysis-gui">Code Analysis GUI&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Frontend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, frontend, data munging, js, css, go&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Autograder has existing functionality to analyze the code in a student&amp;rsquo;s submission for malicious content.
Relevant to this project is that the Autograder can run a pairwise similarity analysis against all submitted code.
This is how most existing software plagiarism systems detect offending code.
The existing infrastructure provides detailed statistics on code similarity,
but does not currently have a visual way to display this data.&lt;/p>
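&lt;p>As a rough illustration of what a pairwise similarity analysis computes, the sketch below scores every unordered pair of submissions. It is not the Autograder implementation: the function name &lt;code>pairwise_similarity&lt;/code> and the sample submissions are hypothetical, and the similarity measure is the ratio from the &lt;code>difflib&lt;/code> module in the Python standard library, used here only as a stand-in.&lt;/p>

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarity(submissions):
    """Map each unordered pair of submission names to a ratio in [0, 1]."""
    scores = {}
    ordered = sorted(submissions.items())
    for (name_a, code_a), (name_b, code_b) in combinations(ordered, 2):
        matcher = SequenceMatcher(None, code_a, code_b)
        scores[(name_a, name_b)] = matcher.ratio()
    return scores


# Hypothetical submissions keyed by student identifier.
submissions = {
    "alice": "def add(a, b):\n    return a + b\n",
    "bob": "def add(a, b):\n    return a + b\n",
    "carol": "print('hello world')\n",
}

scores = pairwise_similarity(submissions)
most_similar = max(scores, key=scores.get)

# List the pairs from most to least similar, as a GUI table might.
for pair, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(pair, round(score, 2))
```

&lt;p>A GUI for such data would typically present the pairs as a sortable table or heat map, with the highest-scoring pairs surfaced first.&lt;/p>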
&lt;p>The task for this project is to create a web GUI using the Autograder REST API
to display the results of a code analysis.
The size of this project depends on how many of the existing features are going to be supported by the web GUI.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">Repository for Autograder Web GUI&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/142" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/blob/main/internal/model/analysis.go#L78" target="_blank" rel="noopener">Pairwise Code Analysis Type&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-py/blob/main/tests/api/testdata/courses/assignments/submit/analysis/course_assignments_submissions_analysis_pairwise_wait.json" target="_blank" rel="noopener">Sample API Data&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="web-gui">Web GUI&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Frontend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, frontend, js, css&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Autograder contains dozens of &lt;a href="https://github.com/edulinq/autograder-server/blob/main/resources/api.json" target="_blank" rel="noopener">API endpoints&lt;/a>,
most directly representing a piece of functionality exposed to the user.
All of these features are exposed in the &lt;a href="https://github.com/edulinq/autograder-py" target="_blank" rel="noopener">Autograder&amp;rsquo;s Python Interface&lt;/a>.
However, the Python interface is a purely command-line interface.
And although command-line interfaces are objectively (read: subjectively) the best,
a web GUI would be more accessible to a wider audience.
The Autograder already has a web GUI,
but it does not cover all the features available in the Autograder.&lt;/p>
&lt;p>The task for this project is to augment the Autograder&amp;rsquo;s web GUI with more features.
Specifically, add support for more tools used to create and administer courses.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">Repository for Autograder Web GUI&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/61" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/blob/main/resources/api.json" target="_blank" rel="noopener">Autograder API Endpoints&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-py" target="_blank" rel="noopener">Autograder&amp;rsquo;s Python Interface&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>LMS Toolkit</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/lms-toolkit/</link><pubDate>Thu, 06 Feb 2025 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/lms-toolkit/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/py-canvas" target="_blank" rel="noopener">EduLinq LMS Toolkit&lt;/a> (also called the &amp;ldquo;Canvas Tool&amp;rdquo; or &amp;ldquo;py-canvas&amp;rdquo;) is a suite of tools used by several courses at UCSC
to interact with Canvas from the command line or Python.
A &lt;a href="https://en.wikipedia.org/wiki/Learning_management_system" target="_blank" rel="noopener">Learning Management System&lt;/a> (LMS) is a system that institutions use to manage courses, assignments, students, and grades.
The most popular LMSs are
&lt;a href="https://en.wikipedia.org/wiki/Instructure#Canvas" target="_blank" rel="noopener">Canvas&lt;/a>,
&lt;a href="https://en.wikipedia.org/wiki/Blackboard_Learn" target="_blank" rel="noopener">Blackboard&lt;/a>,
&lt;a href="https://en.wikipedia.org/wiki/Moodle" target="_blank" rel="noopener">Moodle&lt;/a>,
and &lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">Brightspace&lt;/a>.
These tools can be very helpful, especially from an administrative standpoint, but can be hard to interact with.
They can be especially difficult when instructors and TAs want to do something that is not explicitly supported by their built-in GUIs
(e.g., when an instructor wants to use a special grading policy).
The LMS Toolkit project is an effort to create a single suite of command-line tools (along with a Python interface)
to connect to all the above-mentioned LMSs in a simple and uniform way.
So, not only can instructors and TAs easily access and modify the data held in an LMS (like a student&amp;rsquo;s grades),
but they can also do it the same way on any LMS.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the LMS Toolkit.&lt;/p>
&lt;p>Currently, the LMS Toolkit only supports Canvas, but this suite of projects hopes to not only expand existing support,
but add support for more LMSs.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2025 should fill out &lt;a href="https://forms.gle/RxGqnQiCDeHSX6tq6" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., do not comment on an issue to say that you want to tackle it as part of OSRE/GSoC 2025).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="advanced-canvas-support">Advanced Canvas Support&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The LMS Toolkit already has basic read-write support for core Canvas functionality (working with grades and assignments).
However, there are still many more features that can be supported such as
&lt;a href="https://github.com/edulinq/py-canvas/issues/17" target="_blank" rel="noopener">group management&lt;/a>,
&lt;a href="https://github.com/edulinq/py-canvas/issues/7" target="_blank" rel="noopener">quiz management&lt;/a>,
&lt;a href="https://github.com/edulinq/py-canvas/issues/10" target="_blank" rel="noopener">quiz statistics&lt;/a>,
and &lt;a href="https://github.com/edulinq/py-canvas/issues/19" target="_blank" rel="noopener">assignment statuses&lt;/a>.&lt;/p>
&lt;p>The task for this project is to choose a set of advanced Canvas features to support
(not limited to those features mentioned above),
design an LMS-agnostic way to support those features,
and implement those features.
The flexibility in which features are chosen accounts for the variable size of this project.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>GitHub Issues
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/17" target="_blank" rel="noopener">Group Management&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/7" target="_blank" rel="noopener">Quiz Management&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/10" target="_blank" rel="noopener">Quiz Statistics&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/19" target="_blank" rel="noopener">Assignment Statuses&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="new-lms-support-moodle">New LMS Support: Moodle&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of the LMS toolkit is to provide a single interface for all LMSs.
This is a lofty goal; however, there is currently only support for &lt;a href="https://en.wikipedia.org/wiki/Instructure#Canvas" target="_blank" rel="noopener">Canvas&lt;/a>.
&lt;a href="https://en.wikipedia.org/wiki/Moodle" target="_blank" rel="noopener">Moodle&lt;/a> is one of the more popular LMSs.
Naturally, the LMS Toolkit wants to support Moodle as well.
Moodle is open source, so adding support in the LMS Toolkit should not be too challenging.&lt;/p>
&lt;p>The task for this project is to add basic support for the Moodle LMS.
It is not necessary to support all the same features that are supported for Canvas,
but at least the core features of score and assignment management should be implemented.&lt;/p>
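&lt;p>As a rough illustration of what a single interface across LMSs could look like, the sketch below defines an abstract backend with the two core operations named above (assignment and score management). The names &lt;code>LMSBackend&lt;/code>, &lt;code>InMemoryBackend&lt;/code>, and &lt;code>grade_everyone&lt;/code> are hypothetical and are not taken from the LMS Toolkit; a Moodle backend would implement the same methods by calling Moodle.&lt;/p>

```python
from abc import ABC, abstractmethod


class LMSBackend(ABC):
    """Hypothetical LMS-agnostic interface (not the LMS Toolkit's real API)."""

    @abstractmethod
    def list_assignments(self, course_id: str) -> list[dict]:
        """Return the assignments for a course as plain dictionaries."""

    @abstractmethod
    def upload_scores(self, course_id: str, assignment_id: str,
                      scores: dict[str, float]) -> int:
        """Store one score per student and return how many were stored."""


class InMemoryBackend(LMSBackend):
    """Stand-in backend for illustration; a real backend would call an LMS."""

    def __init__(self):
        self._assignments = {"course-1": [{"id": "hw1", "name": "Homework 1"}]}
        self._scores = {}

    def list_assignments(self, course_id):
        return list(self._assignments.get(course_id, []))

    def upload_scores(self, course_id, assignment_id, scores):
        self._scores.setdefault((course_id, assignment_id), {}).update(scores)
        return len(scores)


def grade_everyone(backend: LMSBackend, course_id: str,
                   students: list[str], score: float) -> int:
    """Caller code depends only on the interface, never on a specific LMS."""
    total = 0
    for assignment in backend.list_assignments(course_id):
        per_student = {student: score for student in students}
        total += backend.upload_scores(course_id, assignment["id"], per_student)
    return total


backend = InMemoryBackend()
uploaded = grade_everyone(backend, "course-1", ["alice", "bob"], 10.0)
print(uploaded)
```

&lt;p>Because the calling code only sees the abstract interface, adding a new LMS would mean adding one more subclass rather than changing every tool.&lt;/p>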
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://en.wikipedia.org/wiki/Moodle" target="_blank" rel="noopener">Moodle Wiki Page&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/22" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="new-lms-support-blackboard">New LMS Support: Blackboard&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of the LMS toolkit is to provide a single interface for all LMSs.
This is a lofty goal; however, there is currently only support for &lt;a href="https://en.wikipedia.org/wiki/Instructure#Canvas" target="_blank" rel="noopener">Canvas&lt;/a>.
&lt;a href="https://en.wikipedia.org/wiki/Blackboard_Learn" target="_blank" rel="noopener">Blackboard&lt;/a> (also called &amp;ldquo;Blackboard Learn&amp;rdquo;) is one of the more popular LMSs.
Naturally, the LMS Toolkit wants to support Blackboard as well.
However, a challenge in supporting Blackboard is that it is not open source (unlike Canvas).
Therefore, support and testing on Blackboard may be very challenging.&lt;/p>
&lt;p>The task for this project is to add basic support for the Blackboard LMS.
It is not necessary to support all the same features that are supported for Canvas,
but at least the core features of score and assignment management should be implemented.
The closed nature of Blackboard makes this a challenging and uncertain project.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://en.wikipedia.org/wiki/Blackboard_Learn" target="_blank" rel="noopener">Blackboard Wiki Page&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/21" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="new-lms-support-brightspace">New LMS Support: Brightspace&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of the LMS toolkit is to provide a single interface for all LMSs.
This is a lofty goal; however, there is currently only support for &lt;a href="https://en.wikipedia.org/wiki/Instructure#Canvas" target="_blank" rel="noopener">Canvas&lt;/a>.
&lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">D2L Brightspace&lt;/a> is one of the more popular LMSs.
Naturally, the LMS Toolkit wants to support Brightspace as well.
However, a challenge in supporting Brightspace is that it is not open source (unlike Canvas).
Therefore, support and testing on Brightspace may be very challenging.&lt;/p>
&lt;p>The task for this project is to add basic support for the Brightspace LMS.
It is not necessary to support all the same features that are supported for Canvas,
but at least the core features of score and assignment management should be implemented.
The closed nature of Brightspace makes this a challenging and uncertain project.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">Brightspace Wiki Page&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/23" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="testing--ci-infrastructure">Testing / CI Infrastructure&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>Testing&lt;/code> &lt;code>CI&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, testing, ci, docker&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of the LMS toolkit is to provide a single interface for all LMSs.
This means that our system must communicate with several different systems (the LMSs),
each with their own systems, data patterns, versions, and quirks.
Testing will be essential to ensure that our tools keep working as the different LMSs evolve and update.
The LMS Toolkit currently tests with Canvas by
&lt;a href="https://github.com/edulinq/py-canvas/tree/main/tests/api/test_cases" target="_blank" rel="noopener">mocking API responses&lt;/a>.
However, this tactic does not scale well with multiple LMSs (and multiple versions of each system).
A more scalable approach would be to have test instances of the different LMSs that our testing infrastructure can interact with
both interactively and in &lt;a href="https://en.wikipedia.org/wiki/Continuous_integration" target="_blank" rel="noopener">continuous integration&lt;/a> (CI).&lt;/p>
&lt;p>The task for this project is to create testing infrastructure that
connects to test instances of different LMS systems (e.g., Canvas).
This task does not require that all the LMSs in this document are used,
but the testing infrastructure should be robust enough to support them all.
The open source LMSs (Canvas and Moodle) will likely be much easier to set up than the others,
and should be targeted first.
We should be able to run tests locally as well as in CI,
and will likely heavily use &lt;a href="https://en.wikipedia.org/wiki/Docker_%28software%29" target="_blank" rel="noopener">Docker&lt;/a> containers.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/issues/24" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/py-canvas/tree/main/tests/api/test_cases" target="_blank" rel="noopener">Mocked API Responses&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Quiz Composer</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/quiz-composer/</link><pubDate>Thu, 06 Feb 2025 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/quiz-composer/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/quizgen" target="_blank" rel="noopener">EduLinq Quiz Composer&lt;/a> (also called the &amp;ldquo;Quiz Generator&amp;rdquo;) is a tool used by several courses at UCSC
to create and maintain platform-agnostic quizzes (including exams and worksheets).
Knowledge assessments like quizzes, exams, and tests are a core part of the learning process for many courses.
However, maintaining banks of questions, collaborating on new questions, and converting quizzes to new formats can use up a lot of time,
taking time away from actually working on improving course materials.
The Quiz Composer helps by providing a single text-based format that can be stored in a repository and &amp;ldquo;compiled&amp;rdquo; into many different formats including:
HTML, LaTeX, PDF, Canvas, GradeScope, and QTI.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to the maintain and improve the Quiz Composer.&lt;/p>
&lt;p>As an open source project, there are endless opportunities for development, improvements, and collaboration.
Here, we highlight some specific projects that will work well in the summer mentorship setting.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2025 should fill out &lt;a href="https://forms.gle/RxGqnQiCDeHSX6tq6" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., comment on an issue that you want to tackle it as a part of OSRE/GSoC 2025).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="canvas-import">Canvas Import&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Quiz Composer houses quizzes and quiz questions in a simple and unambiguous format based
on &lt;a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON&lt;/a> and &lt;a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown&lt;/a> (specifically, the &lt;a href="https://commonmark.org" target="_blank" rel="noopener">CommonMark specification&lt;/a>).
This allows the Quiz Composer to unambiguously create versions of the same quiz in many different formats.
However, creating a quiz in the Quiz Composer format can be a daunting task for those not familiar with JSON or Markdown.
Instead, it would be easier for people to import quizzes from another format into the Quiz Composer format,
and then edit it as they see fit.
Unfortunately, not all other quiz formats, namely Canvas in this case, are unambiguous.&lt;/p>
&lt;p>The task for this project is to implement the functionality of importing quizzes from Canvas to the standard Quiz Composer format.
The ambiguous nature of Canvas quizzes makes this task non-trivial,
and adds an additional element of design decisions to this task.
It will be impossible to import quizzes 100% correctly,
but we want to be able to get close enough that most people can import their quizzes without issue.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quizgen" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quizgen/issues/27" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="google-forms-export">Google Forms Export&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Quiz Composer can export quizzes to many different formats,
each with a varying level of interactivity and feature support.
For example, quizzes can be exported to PDFs, which are printed so that students can write down their answers to be checked later.
Quizzes can also be exported to interactive platforms like Canvas where students can enter answers that may be automatically checked with feedback immediately provided to the student.
One potential platform with functionality somewhere between the above two examples is &lt;a href="https://workspace.google.com/products/forms/" target="_blank" rel="noopener">Google Forms&lt;/a>.
&amp;ldquo;Forms&amp;rdquo; (an entity on Google Forms) can be something like a survey or (more recently) a quiz.&lt;/p>
&lt;p>The task for this project is to add support for exporting quizzes from the Quiz Composer to Google Forms.
There is a large overlap in the quiz features supported in Canvas (which the Quiz Composer already supports) and Google Forms,
so most settings should be fairly straightforward.
There may be some design work around deciding what features are specific to one quiz platform
and what features can be abstracted to work across several platforms.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quizgen" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quizgen/issues/19" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="template-questions">Template Questions&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, data munging, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate-Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre25@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre25@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Questions in the Quiz Composer are described using &lt;a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON&lt;/a> and &lt;a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown&lt;/a>
files which contain the question prompt, possible answers, and the correct answer.
(Of course, there are many different &lt;a href="https://github.com/edulinq/quizgen/blob/main/docs/question-types.md" target="_blank" rel="noopener">question types&lt;/a>,
each with different semantics and requirements.)
However, a limitation of this is that each question is always the same.
You can have multiple copies of a question with slightly different prompts, numbers, and answers;
but you are still limited to each question being static and unchanging.
It would be useful to have &amp;ldquo;template questions&amp;rdquo; that can dynamically create static questions from a template
and collection of replacement data.&lt;/p>
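&lt;p>The idea of producing static questions from a template plus replacement data can be sketched as follows. This is a minimal illustration using only the Python standard library; the field names (&lt;code>prompt&lt;/code>, &lt;code>answer&lt;/code>), the &lt;code>$name&lt;/code> placeholder syntax, and the &lt;code>expand_template&lt;/code> helper are assumptions made for the example and do not reflect the Quiz Composer format or the design in the linked issue.&lt;/p>

```python
import json
from string import Template


def expand_template(template: dict, replacements: list[dict]) -> list[dict]:
    """Create one static question per row of replacement data."""
    questions = []
    for values in replacements:
        questions.append({
            # substitute() raises KeyError if a placeholder has no value,
            # which surfaces incomplete replacement data early.
            "prompt": Template(template["prompt"]).substitute(values),
            "answer": Template(template["answer"]).substitute(values),
        })
    return questions


# Hypothetical template question and replacement data.
template = {"prompt": "What is $a + $b?", "answer": "$total"}
replacements = [
    {"a": 2, "b": 3, "total": 5},
    {"a": 10, "b": 7, "total": 17},
]

questions = expand_template(template, replacements)
print(json.dumps(questions, indent=2))
```

&lt;p>Each generated entry is an ordinary static question, so the existing export paths would not need to know that a template was involved.&lt;/p>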
&lt;p>The task for this project is to add support for the &amp;ldquo;template questions&amp;rdquo; discussed above.
Much of the high-level design work for this issue has &lt;a href="https://github.com/edulinq/quizgen/issues/26" target="_blank" rel="noopener">already been completed&lt;/a>.
However, the implementation and low-level design decisions are still left to do.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quizgen" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quizgen/issues/26" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>LLMSeqRec: LLM Enhanced Contextual Sequential Recommender</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/sf/llmseqrec/</link><pubDate>Thu, 06 Feb 2025 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/sf/llmseqrec/</guid><description>&lt;h3 id="project-description">Project Description&lt;/h3>
&lt;p>Sequential Recommender Systems are widely used in scientific and business applications to analyze and predict patterns over time. In biology and ecology, they help track species behavior by suggesting related research on migration patterns and environmental changes. Medical applications include personalized treatment recommendations based on patient history and predicting disease progression. In physics and engineering, these systems optimize experimental setups by suggesting relevant past experiments or simulations. Environmental and climate science applications include forecasting climate trends and recommending datasets for monitoring deforestation or pollution. In business and e-commerce, sequential recommenders enhance user experiences by predicting consumer behavior, suggesting personalized products, and optimizing marketing strategies based on browsing and purchase history. By leveraging sequential dependencies, these recommender systems enhance research efficiency, knowledge discovery, and business decision-making across various domains. Traditional sequential recommendation systems rely on historical user interactions to predict future preferences, but they often struggle with capturing complex contextual dependencies and adapting to dynamic user behaviors. Existing models primarily use predefined embeddings and handcrafted features, limiting their ability to generalize across diverse recommendation scenarios. To address these challenges, we propose LLM Enhanced Contextual Sequential Recommender (LLMSeqRec), which leverages Large Language Models (LLMs) to enrich sequential recommendations with deep contextual understanding and adaptive reasoning.
By integrating LLM-generated embeddings and contextual representations, LLMSeqRec enhances user intent modeling, cold-start recommendations, and long-range dependencies in sequential data. Unlike traditional models that rely solely on structured interaction logs, LLMSeqRec dynamically interprets and augments sequences with semantic context, leading to more accurate and personalized recommendations. This fusion of LLM intelligence with sequential modeling enables a more scalable, adaptable, and explainable recommender system, bridging the gap between traditional sequence-based approaches and advanced AI-driven recommendations.&lt;/p>
&lt;h3 id="project-objectives">Project Objectives&lt;/h3>
&lt;p>Aligned with the vision of the 2025 Open Source Research Experience (OSRE), this project aims to develop an LLM-Enhanced Contextual Sequential Recommender (LLMSeqRec) to improve sequential recommendation accuracy across various scientific and business applications. Sequential recommender systems are widely used to analyze and predict patterns over time, assisting in fields such as biology, ecology, medicine, physics, engineering, environmental science, and e-commerce. However, traditional models often struggle with capturing complex contextual dependencies and adapting to dynamic user behaviors, as they primarily rely on plain sequences of item IDs.
To address these limitations, this project will leverage Large Language Models (LLMs) to enhance context-aware sequential recommendations by dynamically integrating LLM-generated embeddings and contextual representations. The core challenge lies in designing LLMSeqRec, a unified and scalable model capable of enriching user intent modeling, mitigating cold-start issues, and capturing long-range dependencies within sequential data. Unlike conventional systems that rely solely on structured interaction logs, LLMSeqRec will interpret and augment sequences with semantic context, resulting in more accurate, adaptable, and explainable recommendations. Below is an outline of the methodologies and models that will be developed in this project:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Step 1: Data Preprocessing &amp;amp; Feature Creation&lt;/strong>:
Develop a data processing pipeline to parse users’ sequential interaction behaviors into sequential data points for LLM-based embeddings and contextual sequential transformer modeling. Extract user behavior sequences, item metadata, and temporal patterns to create context-aware sequential representations for training, validation, and testing. The data source can be the Amazon open public data or the MovieLens dataset. Data point creation can follow SASRec (the first entry in the references).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 2: Model Development&lt;/strong>:
Design and implement LLM-enhanced sequential recommendation models, integrating pretrained language models to augment user-item interactions with semantic context. Develop an adaptive mechanism to incorporate external contextual signals, such as product descriptions and reviews, into the sequential recommendation process. The baseline model can be a PyTorch implementation of SASRec.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 3: Evaluation&lt;/strong>:
Benchmark LLMSeqRec against state-of-the-art sequential recommenders, evaluating on accuracy, NDCG, and cold-start performance. Conduct ablation studies to analyze the impact of LLM-generated embeddings on recommendation quality. Optimize model inference speed and efficiency for real-time recommendation scenarios.&lt;/p>
&lt;/li>
&lt;/ul>
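&lt;p>The data preparation and evaluation steps above can be sketched in a few lines. The example below groups interactions into per-user sequences ordered by time, applies a leave-one-out split in the style commonly used with SASRec (last item for testing, second-to-last for validation), and computes NDCG@k for a single held-out item. The function names and the toy interaction tuples are hypothetical; this is a sketch under those assumptions, not the project pipeline.&lt;/p>

```python
import math
from collections import defaultdict


def build_sequences(interactions):
    """Group (user, item, timestamp) tuples into per-user item lists ordered by time."""
    by_user = defaultdict(list)
    for user, item, timestamp in interactions:
        by_user[user].append((timestamp, item))
    return {user: [item for _, item in sorted(events)]
            for user, events in by_user.items()}


def leave_one_out(sequence):
    """Return (train, validation_item, test_item) for one user sequence."""
    if len(sequence) >= 3:
        return sequence[:-2], sequence[-2], sequence[-1]
    # Too short to hold anything out: keep everything for training.
    return sequence, None, None


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k when exactly one item (the held-out target) is relevant."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)  # 0-based position in the ranking
    return 1.0 / math.log2(rank + 2)


# Toy interactions: (user, item, timestamp), deliberately out of order.
interactions = [("u1", "a", 1), ("u1", "c", 3), ("u1", "b", 2), ("u1", "d", 4)]
sequences = build_sequences(interactions)
train, valid_item, test_item = leave_one_out(sequences["u1"])
score = ndcg_at_k(["x", "d", "y"], test_item, k=10)
print(train, valid_item, test_item, round(score, 4))
```

&lt;p>With a single relevant item the ideal DCG is 1, so the score depends only on the position at which the held-out item is ranked.&lt;/p>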
&lt;h3 id="project-deliverables">Project Deliverables&lt;/h3>
&lt;p>This project will deliver three components: software; model training, validation, and performance evaluation; and a demo. The software, which implements the above LLMSeqRec model, will be hosted on GitHub as an open-access repository. The evaluation results and demo will be published alongside the GitHub repository.&lt;/p>
&lt;h3 id="llmseqrec">LLMSeqRec&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: LLM Enhanced Contextual Sequential Recommender&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Proficiency in Python, PyTorch, GitHub, Self-attention, Transformer&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/linsey-pang/">Linsey Pang&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="references">References:&lt;/h3>
&lt;ul>
&lt;li>Self-Attentive Sequential Recommendation (SASRec)&lt;/li>
&lt;li>BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer&lt;/li>
&lt;li>Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks&lt;/li>
&lt;li>Amazon Dataset: &lt;a href="https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews" target="_blank" rel="noopener">https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews&lt;/a>&lt;/li>
&lt;li>MovieLens Data: &lt;a href="https://grouplens.org/datasets/movielens/" target="_blank" rel="noopener">https://grouplens.org/datasets/movielens/&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>ReIDMM: Re-identifying Multiple Objects across Multiple Streams</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/lbl/reidmm/</link><pubDate>Thu, 06 Feb 2025 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/lbl/reidmm/</guid><description>&lt;h3 id="project-description">Project Description&lt;/h3>
&lt;p>Re-identifying multiple objects across multiple streams (ReIDMM) is essential in scientific research and various industries. It involves tracking and analyzing entities across different viewpoints or time frames. In astronomy, ReIDMM helps track celestial objects like asteroids and space debris using multiple observatories. In biology and ecology, it enables the identification of animals across different camera traps and aids in tracking microscopic organisms in laboratory studies. In physics and engineering, it is used for tracking particles in high-energy physics experiments, monitoring structural changes in materials, and identifying robots or drones in lab automation. Beyond scientific applications, ReIDMM plays a critical role in industries such as retail, where it tracks customer behavior across multiple stores to improve sales and prevent theft. In smart cities, it supports traffic monitoring by identifying vehicles across intersections for improved traffic flow management. In manufacturing, it enables supply chain tracking by locating packages across conveyor belts and warehouse cameras. In autonomous systems, ReIDMM enhances multi-camera sensor fusion and warehouse robotics by identifying pedestrians, obstacles, and objects across different camera views.&lt;/p>
&lt;h3 id="project-objectives">Project Objectives&lt;/h3>
&lt;p>Aligned with the vision of the 2025 Open Source Research Experience (OSRE), this project aims to develop an open-source algorithm for multiple-object re-identification across diverse open-source data streams. As highlighted earlier, this method is expected to have wide-ranging applications in both scientific research and industry. Utilizing an open-source dataset, our focus will be on re-identifying common objects such as vehicles and pedestrians. The primary challenge lies in designing a unified algorithm, ReIDMM, capable of performing robust multi-object re-identification across multiple streams. Users will be able to tag any object as a target in a video or image for tracking across streams. Below is an outline of the algorithms to be developed in this project:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Step 1: Target Object Identification&lt;/strong>: Randomly select a target object from an image or video using object detection models such as YOLOv7. These models detect objects by generating bounding boxes around them. Target objects could include vehicles, pedestrians, animals, or other recognizable entities. This step ensures an initial object of interest is chosen for re-identification.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 2: Feature Extraction and Embedding&lt;/strong>: Once the target object is identified, extract relevant features such as bounding box coordinates, timestamp, location metadata (if available), and visual characteristics. A multimodal embedding approach is used, where these features are transformed into a numerical representation (embedding vector) that captures the object&amp;rsquo;s unique identity. This allows for efficient comparison across different images or videos.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 3: Searching and Matching&lt;/strong>: To find the target object in other images or videos: (1) extract embeddings of all objects detected in the other images/videos; (2) compute the similarity between the target object’s embedding and those of all detected objects using metrics like cosine similarity or Euclidean distance; (3) rank objects by similarity, returning the most probable matches. The highest-ranked results are likely to be the same object observed from different angles, lighting conditions, or time frames.&lt;/p>
&lt;/li>
&lt;/ul>
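&lt;p>The searching and matching step can be illustrated with a small ranking routine. The sketch below assumes that embeddings have already been extracted as plain lists of floats; it computes cosine similarity between a target embedding and each candidate, then returns the highest-scoring candidates. The object identifiers, the toy vectors, and the &lt;code>rank_matches&lt;/code> helper are hypothetical, and a real system would take its embeddings from a detection and embedding model.&lt;/p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_matches(target, candidates, top_k=3):
    """Score every candidate against the target and return the best top_k."""
    scored = [(object_id, cosine_similarity(target, embedding))
              for object_id, embedding in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for model output (hypothetical values).
target = [0.9, 0.1, 0.0]
candidates = {
    "cam2_obj1": [0.8, 0.2, 0.1],
    "cam2_obj2": [0.0, 1.0, 0.0],
    "cam3_obj1": [0.0, 0.0, 1.0],
}

matches = rank_matches(target, candidates, top_k=2)
for object_id, similarity in matches:
    print(object_id, round(similarity, 3))
```

&lt;p>Cosine similarity ignores vector length, which makes it a common choice when embeddings are compared by direction; Euclidean distance could be swapped in by replacing the scoring function and sorting in ascending order.&lt;/p>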
&lt;h3 id="project-deliverables">Project Deliverables&lt;/h3>
&lt;p>This project will deliver three things: software, evaluation results, and a demo. The software, which implements the above ReIDMM algorithm, will be hosted on GitHub as an open-access repository. The evaluation results and demo will be published alongside the GitHub repository.&lt;/p>
&lt;h3 id="reidmm">ReIDMM&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: ReIDMM: Re-identifying Multiple Objects across Multiple Streams&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Proficiency in Python, experience with image processing, machine learning&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/linsey-pang/">Linsey Pang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="reference">Reference:&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="https://medium.datadriveninvestor.com/multiple-object-tracking-using-person-re-identification-f9b7360cda1a" target="_blank" rel="noopener">multiple-object-tracking-using-person&lt;/a>&lt;/li>
&lt;li>Dataset: &lt;a href="https://paperswithcode.com/task/vehicle-re-identification" target="_blank" rel="noopener">Vehicle re-identification dataset and paper&lt;/a> and &lt;a href="https://paperswithcode.com/task/person-re-identification" target="_blank" rel="noopener">Person re-identification data and paper&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Seam: Kubernetes-Aware Programmable Networking &amp; Cloud Provisioning</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsd/seam/</link><pubDate>Wed, 05 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsd/seam/</guid><description>&lt;p>Seam is a project focused on building a Kubernetes-aware programmable networking and cloud provisioning system. It combines Python, Kubernetes, P4 programming, and SmartNICs to create a robust framework for managing cloud resources, optimizing networking, and provisioning virtual machines. Students will learn about cutting-edge technologies such as Kubernetes, Docker, P4 programming, SmartNICs, KubeVirt, Prometheus, Grafana, and Flask, while working on real-world applications in high-performance computing environments. This project will help students understand the intricacies of cloud resource management and programmable networking, providing them with valuable skills for future careers in software engineering, networking, and DevOps.&lt;/p>
&lt;p>The project involves creating a &lt;strong>Python library&lt;/strong> for provisioning Kubernetes resources, including virtual machines and networking, using tools such as &lt;strong>KubeVirt&lt;/strong> for VM provisioning and &lt;strong>ESnet SENSE&lt;/strong> for network configuration. The library will also integrate monitoring solutions with &lt;strong>Prometheus&lt;/strong> and &lt;strong>Grafana&lt;/strong> for real-time metrics collection and visualization. Students will develop &lt;strong>Flask-based dashboards&lt;/strong> for managing these resources, implement automated pipelines using &lt;strong>GitLab CI/CD&lt;/strong>, and explore full-stack web development, database management with &lt;strong>PostgreSQL&lt;/strong>, and API design.&lt;/p>
&lt;p>In addition, students will gain hands-on experience with &lt;strong>programmable networking&lt;/strong> using &lt;strong>P4&lt;/strong> and &lt;strong>SmartNICs&lt;/strong>, learning how to write P4 programs for dynamic routing, security, and network policy enforcement at the hardware level. The integration of &lt;strong>Kubernetes&lt;/strong>, &lt;strong>SmartNICs&lt;/strong>, and &lt;strong>P4 programming&lt;/strong> will allow for advanced optimizations and efficient management of high-performance cloud environments.&lt;/p>
&lt;p>Thus far, the framework has been developed to allow provisioning of resources within Kubernetes, integrating Prometheus and Grafana for monitoring, and providing an interface for users to manage cloud resources. We aim to extend this by incorporating advanced network policies and improving the web interface.&lt;/p>
&lt;h3 id="seam--kubernetes-resource-provisioning-and-management">Seam / Kubernetes Resource Provisioning and Management&lt;/h3>
&lt;p>The proposed work includes expanding the Python library to support comprehensive &lt;strong>Kubernetes resource provisioning&lt;/strong>, &lt;strong>network management&lt;/strong>, and &lt;strong>virtual machine provisioning&lt;/strong> using &lt;strong>KubeVirt&lt;/strong>. Students will enhance the current implementation to allow users to define &lt;strong>resource limits, CPU/GPU quotas, and network policies&lt;/strong>. They will also integrate with &lt;strong>ESnet SENSE&lt;/strong> to facilitate &lt;strong>L2 networking&lt;/strong>, and explore the use of &lt;strong>Prometheus&lt;/strong> and &lt;strong>Grafana&lt;/strong> for real-time performance monitoring and metrics collection.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Kubernetes, Python, Cloud Computing, Networking, Programmable Networking, Monitoring, CI/CD&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Kubernetes, P4 programming, KubeVirt, ESnet SENSE, Docker, GitLab CI/CD, Prometheus, Grafana, PostgreSQL, Flask&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/thomas-a.-defanti/">Thomas A. DeFanti&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jeffrey-weekley/">Jeffrey Weekley&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>&lt;/li>
&lt;/ul>
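&lt;p>As a rough illustration of the resource limits and CPU/GPU quotas mentioned above, the sketch below builds a plain Kubernetes Pod manifest as a Python dictionary. The helper name &lt;code>build_pod_manifest&lt;/code>, the pod name, and the container image are hypothetical and are not part of the Seam library; &lt;code>nvidia.com/gpu&lt;/code> is the extended resource name exposed by the NVIDIA device plugin and is assumed here only as an example.&lt;/p>

```python
import json


def build_pod_manifest(name, image, cpu="2", memory="4Gi", gpus=0, namespace="default"):
    """Return a Pod manifest dict with matching resource requests and limits.

    Hypothetical helper for illustration only; not Seam's actual API.
    """
    resources = {"cpu": cpu, "memory": memory}
    if gpus:
        # GPUs are requested through an extended resource; the NVIDIA device
        # plugin exposes them as "nvidia.com/gpu" (assumed for this example).
        resources["nvidia.com/gpu"] = str(gpus)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Equal requests and limits give the pod a fixed allocation.
                    "resources": {"requests": dict(resources), "limits": dict(resources)},
                }
            ]
        },
    }


manifest = build_pod_manifest("seam-demo", "python:3.12-slim", gpus=1)
print(json.dumps(manifest, indent=2))
```

&lt;p>The resulting dictionary can be written out as JSON and applied with &lt;code>kubectl apply -f&lt;/code>, or handed to a Kubernetes client library; how Seam itself submits manifests is not specified here.&lt;/p>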
&lt;h3 id="seam--full-stack-web-development-and-dashboard">Seam / Full-Stack Web Development and Dashboard&lt;/h3>
&lt;p>The proposed work includes building a &lt;strong>Flask-based web dashboard&lt;/strong> using &lt;strong>Bootstrap&lt;/strong> for the UI and integrating it with the &lt;strong>Python library&lt;/strong> so that users can easily provision resources, monitor network performance, and track resource usage in real time. The dashboard will support &lt;strong>role-based access control (RBAC)&lt;/strong>, allowing for secure multi-user management. Students will also integrate &lt;strong>PostgreSQL&lt;/strong> for managing and storing configurations, logs, and performance metrics.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Full-Stack Web Development, Flask, Bootstrap, PostgreSQL, Kubernetes, Monitoring, DevOps&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Web Development, Flask, Bootstrap, PostgreSQL, API Development, Kubernetes&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/thomas-a.-defanti/">Thomas A. DeFanti&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jeffrey-weekley/">Jeffrey Weekley&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>&lt;/li>
&lt;/ul>
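&lt;p>To make the role-based access control idea above concrete, here is a minimal, framework-independent sketch. The role names, permission names, and the &lt;code>requires&lt;/code> decorator are all hypothetical; in a real Flask dashboard the user would typically come from the session or an authentication extension rather than being passed in directly.&lt;/p>

```python
from functools import wraps

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"provision", "monitor", "manage_users"},
    "operator": {"provision", "monitor"},
    "viewer": {"monitor"},
}


class PermissionDenied(Exception):
    """Raised when a user's role lacks the required permission."""


def requires(permission):
    """Decorator that checks the caller's role before running the handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(user, *args, **kwargs):
            # Unknown roles get an empty permission set and are always denied.
            allowed = ROLE_PERMISSIONS.get(user.get("role"), set())
            if permission not in allowed:
                raise PermissionDenied(f"{user.get('name')} cannot {permission}")
            return handler(user, *args, **kwargs)
        return wrapper
    return decorator


@requires("provision")
def provision_vm(user, vm_name):
    return f"{user['name']} provisioned {vm_name}"


print(provision_vm({"name": "alice", "role": "operator"}, "vm-1"))
```

&lt;p>Keeping the role-to-permission mapping in one table means a new role can be added without touching the handlers; the same table could later be stored in PostgreSQL.&lt;/p>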
&lt;h3 id="seam--cicd-and-gitlab-integration">Seam / CI/CD and GitLab Integration&lt;/h3>
&lt;p>The proposed work includes setting up &lt;strong>GitLab CI/CD pipelines&lt;/strong> for automated &lt;strong>testing, deployment&lt;/strong>, and &lt;strong>maintenance&lt;/strong> of the Python library, Kubernetes resources, and web dashboard. Students will automate the deployment of &lt;strong>P4 programs&lt;/strong>, &lt;strong>Kubernetes deployments&lt;/strong>, and &lt;strong>networking configurations&lt;/strong>. They will also focus on &lt;strong>unit testing, integration testing&lt;/strong>, and the &lt;strong>automation of benchmarking experiments&lt;/strong> to ensure reproducibility of results.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> CI/CD, GitLab, Python, Kubernetes, DevOps, Testing, Automation&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> GitLab CI/CD, Python, Kubernetes, Docker, Automation, Testing, Benchmarking&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/thomas-a.-defanti/">Thomas A. DeFanti&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jeffrey-weekley/">Jeffrey Weekley&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="seam--networking--smartnic-programming">Seam / Networking &amp;amp; SmartNIC Programming&lt;/h3>
&lt;p>The proposed work includes writing &lt;strong>P4 programs&lt;/strong> to control network traffic flow, enforce network security policies, and optimize data transfer across the Kubernetes cluster. Students will gain experience with &lt;strong>SmartNICs&lt;/strong> (Xilinx Alveo U55C, SN1000, NVIDIA Bluefield 2) and &lt;strong>Tofino switches&lt;/strong>, using P4 to write &lt;strong>network policies&lt;/strong> and integrate with the &lt;strong>Kubernetes network layer&lt;/strong> (Multus, Calico). Students will also explore &lt;strong>gRPC APIs&lt;/strong> for dynamically adjusting network policies and provisioning virtual network interfaces in real time.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Networking, P4 Programming, SmartNICs, Kubernetes Networking, Cloud Computing&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> P4, Networking, SmartNICs, Kubernetes Networking, Multus, Calico, gRPC&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/thomas-a.-defanti/">Thomas A. DeFanti&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jeffrey-weekley/">Jeffrey Weekley&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>WaDAR</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/wadar/</link><pubDate>Wed, 05 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/wadar/</guid><description>&lt;p>&lt;a href="https://github.com/jlab-sensing/wadar" target="_blank" rel="noopener">WaDAR&lt;/a> (Water Radar) is an innovative, low-cost, hybrid approach to soil moisture sensing that combines the benefits of in-ground (in situ) and remote sensing technologies. Traditional soil moisture measurement methods suffer from drawbacks: in situ sensors are expensive and difficult to maintain, while remote sensing offers lower accuracy and resolution. WaDAR bridges this gap by using inexpensive underground backscatter tags paired with above-ground radars, enabling completely wireless, high-resolution soil moisture monitoring.&lt;/p>
&lt;h2 id="key-features-of-wadar">Key Features of WaDAR&lt;/h2>
&lt;ul>
&lt;li>Uses &lt;strong>RF backscatter tags&lt;/strong> buried underground to provide high-accuracy soil moisture readings.&lt;/li>
&lt;li>Uses &lt;strong>ultra-wideband radar&lt;/strong> for above-ground sensing.&lt;/li>
&lt;li>Offers an average error of just 1.4%, comparable to state-of-the-art commercial sensors.&lt;/li>
&lt;li>Reduces deployment costs significantly, making it accessible for widespread agricultural use.&lt;/li>
&lt;li>Supports real-time, scalable, and maintenance-free soil moisture monitoring for farmers.&lt;/li>
&lt;/ul>
&lt;h3 id="improving-and-optimizing-data-processing-pipeline-for-more-accurate-soil-moisture-measurements">Improving and Optimizing Data Processing Pipeline for More Accurate Soil Moisture Measurements&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Digital Signal Processing&lt;/code> &lt;code>Machine Learning&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/embedded, signal processing, machine learning, MATLAB (optional)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vetha/">Eric Vetha&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Enhance the accuracy of soil moisture measurements by refining the data processing pipeline.&lt;/p>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Develop and test algorithms for noise reduction and signal improvement.&lt;/li>
&lt;li>Implement advanced filtering and statistical techniques to improve measurement precision.&lt;/li>
&lt;li>Validate improvements using real-world field data.&lt;/li>
&lt;li>Port the algorithms to embedded code so they can run in real time on embedded hardware.&lt;/li>
&lt;/ul>
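&lt;p>The noise-reduction and filtering tasks above could start from something as simple as the following sketch, which averages repeated radar frames and then applies a sliding median filter. These are generic signal-processing techniques chosen for illustration, not WaDAR&amp;rsquo;s actual pipeline; the function names and the toy frame values are assumptions.&lt;/p>

```python
from statistics import median


def average_frames(frames):
    """Average repeated frames sample by sample to suppress uncorrelated noise."""
    n = len(frames)
    return [sum(samples) / n for samples in zip(*frames)]


def median_filter(signal, window=3):
    """Sliding-window median filter; the window shrinks at the edges."""
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


# Toy data: three frames of five samples, with one outlier in the first frame.
frames = [
    [0.0, 1.0, 9.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
]
averaged = average_frames(frames)
smoothed = median_filter(averaged)
print(averaged)
print(smoothed)
```

&lt;p>Both steps use only fixed-size loops and comparisons, so they translate fairly directly to C for an embedded target.&lt;/p>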
&lt;h3 id="improving-backscatter-tag-pcb">Improving Backscatter Tag PCB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Hardware Design&lt;/code> &lt;code>Signal Processing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> PCB design, RF knowledge&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vetha/">Eric Vetha&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Enhance the performance of WaDAR&amp;rsquo;s backscatter tags by optimizing PCB design for improved signal-to-noise ratio (SNR) and implementing a communication protocol for tag identification.&lt;/p>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Redesign PCB for improved readings.&lt;/li>
&lt;li>Implement and test a communication protocol to distinguish between multiple tags.&lt;/li>
&lt;li>Evaluate hardware changes in real-world field conditions.&lt;/li>
&lt;li>Optimize power consumption and scalability for practical deployment.&lt;/li>
&lt;/ul></description></item><item><title>Mediglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/polyphy/</link><pubDate>Tue, 04 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/polyphy/</guid><description>&lt;p>&lt;a href="https://github.com/PolyPhyHub/PolyPhy" target="_blank" rel="noopener">PolyPhy&lt;/a> is a GPU-oriented agent-based system for reconstructing and visualizing &lt;em>optimal transport networks&lt;/em> defined over sparse data. Rooted in astronomy and inspired by nature, we have used an early prototype called &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">Polyphorm&lt;/a> to reconstruct the &lt;a href="https://youtu.be/5ILwq5OFuwY" target="_blank" rel="noopener">Cosmic web&lt;/a> structure, but also to discover network-like patterns in natural language data. You can see an instructive overview of PolyPhy in our &lt;a href="https://elek.pub/workshop_cross2022.html" target="_blank" rel="noopener">workshop&lt;/a> and more details about our research &lt;a href="https://elek.pub/projects/Rhizome-Cosmology" target="_blank" rel="noopener">here&lt;/a>. Recent projects, such as &lt;a href="https://github.com/PolyPhyHub/PolyGlot" target="_blank" rel="noopener">Polyglot&lt;/a> and &lt;a href="https://github.com/Ayush-Sharma410/MediGlot" target="_blank" rel="noopener">Mediglot&lt;/a> have focused on using PolyPhy to better visualize language embeddings.&lt;/p>
&lt;h3 id="medicinal-language-embeddings">Medicinal Language Embeddings&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Large Language Models&lt;/code> &lt;code>NLP&lt;/code> &lt;code>Embeddings&lt;/code> &lt;code>Medicine&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, JavaScript, Data Science, Technical Communication&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:kdeol@ualberta.ca">Kiran Deol&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project aims to refine and enhance Mediglot, a web application for visualizing 3D medicinal embeddings, which extends the Polyglot app and leverages the PolyPhy toolkit for network-inspired data science. Mediglot currently enables users to explore high-dimensional vector representations of medicines (derived from their salt compositions) in a 3D space using UMAP, as well as analyze similarity through the innovative Monte-Carlo Physarum Machine (MCPM) metric. Unlike traditional language data, medicinal embeddings do not have an inherent sequential structure. Instead, we must work with the salt compositions of each medicine to create embeddings that are faithful to the intended purpose of each medicine.&lt;/p>
&lt;p>This year, we would like to focus on exploring and integrating state-of-the-art AI techniques and algorithms to improve Mediglot&amp;rsquo;s clustering capabilities and its representation of medicinal data in 3D. The contributor will experiment with advanced large language models (LLMs) and cutting-edge AI methods to develop innovative approaches for refining clustering and extracting deeper insights from medicinal embeddings. Beyond LLMs, we would like to experiment with more traditional language processing methods to design novel embedding procedures. Additionally, we would like to experiment with other similarity metrics. While the similarity of two medicines depends on the initial embedding, we would like to examine the effects of different metrics on the kinds of insights a user can extract. Finally, the contributor is expected to evaluate and compare different algorithms for dimensionality reduction to enhance the faithfulness of the visualization and its interpretability.&lt;/p>
&lt;p>The ideal contributor for this project has experience with Python (and common scientific toolkits such as NumPy, Pandas, SciPy). They will also need some experience with JavaScript and web development (Mediglot is distributed as a vanilla JS web app). Knowledge of embedding techniques for language processing is highly recommended.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Closely work with the mentors to understand the context of the project and its detailed requirements in preparation for the proposal.&lt;/li>
&lt;li>Become acquainted with the tooling (PolyPhy, PolyGlot, Mediglot) prior to the start of the project period.&lt;/li>
&lt;li>Explore different embedding techniques for medicinal data (including implementing novel embedding procedures).&lt;/li>
&lt;li>Explore different dimensionality reduction techniques, with a focus on faithful visualizations.&lt;/li>
&lt;li>Document the process and resulting findings in a publicly available report.&lt;/li>
&lt;/ul>
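&lt;p>To illustrate why the choice of similarity metric matters for the insights a user can extract, the sketch below compares cosine similarity with Euclidean distance on three toy embedding vectors. These two metrics are standard stand-ins, not the MCPM metric used by Mediglot, and the medicine names and vectors are made up for the example.&lt;/p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (lower means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_name, embeddings, metric, higher_is_closer):
    """Return the name of the embedding closest to the query under a metric."""
    query = embeddings[query_name]
    others = [name for name in embeddings if name != query_name]
    pick = max if higher_is_closer else min
    return pick(others, key=lambda name: metric(query, embeddings[name]))


# Hypothetical embeddings: b points the same way as a but is much longer.
embeddings = {
    "medicine_a": [1.0, 0.0, 1.0],
    "medicine_b": [10.0, 0.0, 10.0],
    "medicine_c": [0.0, 1.0, 0.0],
}

print(nearest("medicine_a", embeddings, cosine_similarity, higher_is_closer=True))
print(nearest("medicine_a", embeddings, euclidean_distance, higher_is_closer=False))
```

&lt;p>With these vectors the two metrics disagree about the nearest neighbor of &lt;code>medicine_a&lt;/code>: cosine similarity ignores vector length while Euclidean distance does not, which is the kind of effect the project proposes to examine.&lt;/p>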
&lt;h3 id="enhancing-polyphy-web-application">Enhancing PolyPhy Web Application&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>UI/UX Design&lt;/code> &lt;code>Full Stack Development&lt;/code> &lt;code>JavaScript&lt;/code> &lt;code>Next.js&lt;/code> &lt;code>Node.js&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Full Stack Web Development, UI/UX Design, JavaScript, Next.js, Node.js, Technical Communication&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:kdeol@ualberta.ca">Kiran Deol&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project aims to revamp and enhance the PolyPhy web platform to better support contributors, users, and researchers. The goal is to optimize the website’s UI/UX, improve its performance, and integrate Mediglot to provide users with a seamless experience in visualizing both general network structures and 3D medicinal embeddings.&lt;/p>
&lt;p>The contributor will be responsible for improving the website’s overall look, feel, and functionality, ensuring a smooth and engaging experience for both contributors and end-users. This includes addressing front-end and back-end challenges, optimizing the platform for better accessibility, and ensuring seamless integration with Mediglot.&lt;/p>
&lt;p>The ideal candidate should have experience in full-stack web development, particularly with &lt;strong>Next.js&lt;/strong>, &lt;strong>JavaScript&lt;/strong>, and &lt;strong>Node.js&lt;/strong>, and should be familiar with UI/UX design principles. A strong ability to communicate effectively, both in writing and through code, is essential for this role.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Collaborate with mentors&lt;/strong> to understand the project&amp;rsquo;s goals and the specific requirements for the website improvements.&lt;/li>
&lt;li>&lt;strong>UI/UX Redesign&lt;/strong>:
&lt;ul>
&lt;li>Redesign and enhance the website’s navigation, layout, and visual elements to create an intuitive and visually engaging experience.&lt;/li>
&lt;li>Improve mobile responsiveness for broader accessibility across devices.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Website Performance &amp;amp; Stability&lt;/strong>:
&lt;ul>
&lt;li>Identify and resolve performance bottlenecks, bugs, or issues affecting speed, stability, and usability.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Mediglot Integration&lt;/strong>:
&lt;ul>
&lt;li>Integrate the Mediglot web application with PolyPhy, ensuring seamless functionality and a unified user experience for visualizing medicinal data alongside general network reconstructions.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation&lt;/strong>:
&lt;ul>
&lt;li>Document the development process, challenges, and solutions in a clear and organized manner, ensuring transparent collaboration with mentors and the community.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol></description></item><item><title>Environmental NeTworked Sensor (ENTS)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/ents/</link><pubDate>Fri, 31 Jan 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/ents/</guid><description>&lt;h3 id="ents-i-web-portal-for-large-scale-sensor-networks">ENTS I: Web portal for large-scale sensor networks&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Data Visualization Dashboard" srcset="
/project/osre25/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp 400w,
/project/osre25/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_baae6484e015277af7b09e866b6869f5.webp 760w,
/project/osre25/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp"
width="760"
height="759"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data Visualization, Backend, Frontend, UI/UX, Analytics&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> React, JavaScript, Python, SQL, Git&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> Flask, Docker, CI/CD, AWS, Authentication&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>, &lt;a href="mailto:alevy1@ucsc.edu">Alec Levy&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Environmental NeTworked Sensor (ENTS) platform, formerly the Open Sensing Platform (OSP), implements a data visualization website for monitoring microbial fuel cell sensors (see &lt;a href="https://github.com/jlab-sensing/DirtViz" target="_blank" rel="noopener">GitHub&lt;/a>). The mission is to scale up the current platform to support other researchers or citizen scientists in integrating their novel sensing hardware or microbial fuel cell sensors for monitoring and data analysis. Currently deployed sensors measure soil moisture, temperature, current, and voltage in outdoor settings. The software half of the project focuses on building upon our existing visualization web platform and adding features to support the mission. A live version of the website is available &lt;a href="https://dirtviz.jlab.ucsc.edu/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Below is a list of project ideas that would be beneficial to the ENTS project. You are not limited to the following projects, and we encourage new ideas that enhance the platform:&lt;/p>
&lt;ul>
&lt;li>Improve streaming functionality&lt;/li>
&lt;li>Generic interface for sensor measurements&lt;/li>
&lt;li>Logger registration&lt;/li>
&lt;li>Over the air (OTA) configuration updates&lt;/li>
&lt;li>Implement unit tests and API documentation&lt;/li>
&lt;/ul>
&lt;h3 id="ents-ii-hardware-to-for-large-scale-field-sensor-networks">ENTS II: Hardware for large-scale field sensor networks&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Hardware" srcset="
/project/osre25/ucsc/ents/featured_huecd1356655ddd10d106d2d602a359510_6281233_b1317e5e84a756a1081cbeec0e17af86.webp 400w,
/project/osre25/ucsc/ents/featured_huecd1356655ddd10d106d2d602a359510_6281233_2fc59e21c5096f7f08aea36f5769242e.webp 760w,
/project/osre25/ucsc/ents/featured_huecd1356655ddd10d106d2d602a359510_6281233_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/ents/featured_huecd1356655ddd10d106d2d602a359510_6281233_b1317e5e84a756a1081cbeec0e17af86.webp"
width="760"
height="460"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Embedded systems, wireless communication, low-power remote sensing&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> C/C++, Git, GitHub, PlatformIO&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> STM32 HAL, ESP32 Arduino, protobuf, Python, knowledge of standard communication protocols (I2C, SPI, and UART)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>, &lt;a href="mailto:jlin143@ucsc.edu">Jack Lin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Environmental NeTworked Sensor (ENTS) node aims to be a general-purpose hardware platform for outdoor sensing (e.g. agriculture and ecological monitoring). The typical use case involves a sensor deployment in an agricultural field, remotely uploading measurements without interfering with farming operations. The current hardware revision (&lt;a href="https://github.com/jlab-sensing/soil_power_sensor" target="_blank" rel="noopener">Soil Power Sensor&lt;/a>) was originally designed for monitoring the power output of microbial fuel cells using high-fidelity voltage and current measurement channels, as well as auxiliary sensors such as the SDI-12 &lt;a href="https://metergroup.com/products/teros-21/" target="_blank" rel="noopener">TEROS-21 soil moisture sensor&lt;/a>. The primary activities of this project will involve low-level firmware design and implementation, but may also incorporate hardware design revisions if necessary. We are looking to expand functionality to other external sensors, as well as optimize for power consumption, via significant firmware design activities.&lt;/p>
&lt;p>Long-range, low-power wireless communication is achieved through a LoRa-capable STM32 microcontroller, while in-lab experiments use an ESP32 microcontroller for its simpler WiFi interface. Both wireless interfaces upload measurements to our data visualization dashboard, &lt;strong>ENTS I&lt;/strong>. The combined goal across both of these projects is to create a system that enables researchers to test and evaluate novel sensing solutions. We want the device to be usable by a wide range of researchers who may not have a background in electronics, so we are interested in design activities that enhance user-friendliness.&lt;/p>
&lt;p>In total there will be 2-4 people working on the hardware with progress being tracked on GitHub. Broader project planning is tracked through a Jira board. We intend to have weekly meetings to provide updates on current issue progress along with assigning tasks. Please reach out to &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a> if there are any questions or specific ideas for the project.&lt;/p>
&lt;p>Below is a list of project ideas that would be beneficial to the ENTS project. You are not limited to the following projects, and we encourage new ideas that enhance the platform:&lt;/p>
&lt;ul>
&lt;li>Backup logging via SD card&lt;/li>
&lt;li>I2C multiplexing for multiple identical sensors&lt;/li>
&lt;li>Batch sensor measurement uploading&lt;/li>
&lt;/ul></description></item><item><title>Causeway: Scaling Experiential Learning Through Micro-Roles</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/causeway/</link><pubDate>Thu, 30 Jan 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/causeway/</guid><description>&lt;p>&lt;a href="https://causeway.web.app" target="_blank" rel="noopener">Causeway&lt;/a> is a platform for learning to develop web applications using an Angular, RxJS, NgRx, and Firebase stack. Most online coding tutorials focus on covering the technical syntax or features of a language or framework, which means that new developers don’t have great resources for building a holistic picture of how everything they learn connects to actually developing a complex web application. Causeway breaks down the process of developing a web application into a hierarchy of micro-roles, which provides learners with a clear pathway for learning that also translates to a clear process for developing an application. In the longer term, this would also enable learners to easily contribute to projects as they learn by taking on micro-roles for yet-to-be-developed projects. The platform uses the &lt;a href="https://developer.stackblitz.com/platform/api/webcontainer-api" target="_blank" rel="noopener">Stackblitz WebContainer API&lt;/a> to run full applications in the browser for interactive learning.&lt;/p>
&lt;p>Thus far, we have developed a version of the platform that walks learners through the process of developing UI components of a web application as well as containers that contain multiple UI components and are responsible for fetching data from the backend and handling events and updates to the database. We&amp;rsquo;d like to extend the content to cover defining the database schema and entire applications, and to other topics beyond web development like AI/ML. We&amp;rsquo;d like to add quizzes to the experience and explore ways to use Generative AI to augment the learning experience, e.g. to support planning, reflection, and assessment. Finally, we&amp;rsquo;d like to instrument the application with logs and analytics so we can better measure impact and learning outcomes, and develop a stronger CI/CD pipeline.&lt;/p>
&lt;h3 id="causeway--improving-the-core-infrastructure">Causeway / Improving the Core Infrastructure&lt;/h3>
&lt;p>The proposed work includes adding logging, analytics, and a production-level CI/CD pipeline, adding a robust testing framework, and refactoring some of our code into separate modules. Both roles will also contribute to running usability studies and documenting the platform.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Web Development, Educational Technologies, Angular&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Web development experience, HTML, CSS, JavaScript, Angular, RxJS, NgRx, Firebase&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-lee/">David Lee&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="causeway--quizzes-and-generative-ai">Causeway / Quizzes and Generative AI&lt;/h3>
&lt;p>The proposed work includes extending the application to support quizzes, adding quizzes for the existing tasks, and exploring the use of generative AI to support the quizzes feature. Both roles will also contribute to running usability studies and documenting the platform.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Web Development, Educational Technologies, Angular&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Web development experience, HTML, CSS, JavaScript, Angular, RxJS, NgRx, Firebase, Generative AI&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-lee/">David Lee&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>OpenROAD - An Open-Source, Autonomous RTL-GDSII Flow for Chip Design</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/openroad/openroad/</link><pubDate>Sun, 19 Jan 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/openroad/openroad/</guid><description>&lt;p>The &lt;a href="https://theopenroadproject.org" target="_blank" rel="noopener">OpenROAD&lt;/a> project is a non-profit project, originally funded by DARPA, that aims to create open-source EDA tools: an autonomous RTL-to-GDSII flow that completes in under 24 hours, lowering cost and boosting innovation in IC design. This project is now supported by &lt;a href="precisioninno.com">Precision Innovations&lt;/a>.&lt;/p>
&lt;p>OpenROAD scales massively, supports EWD (Education and Workforce Development) and a broad ecosystem, and is a vital tool for the rapidly growing Semiconductor Industry.&lt;/p>
&lt;p>OpenROAD is the fastest onramp to gain knowledge, skills and create pathways for great career opportunities in chip design. You will develop important software and hardware design skills by contributing to these interesting projects. You will also have the opportunity to work with mentors from the OpenROAD project and other industry experts.&lt;/p>
&lt;p>We welcome a diverse community of designers, researchers, enthusiasts, software engineers and entrepreneurs to use and contribute to OpenROAD and make a far-reaching impact in the rapidly growing, global Semiconductor Industry.&lt;/p>
&lt;h3 id="improving-code-quality-in-openroad">Improving Code Quality in OpenROAD&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Coding Best Practices in C++&lt;/code>, &lt;code>Code Quality Tooling&lt;/code>, &lt;code>Continuous Integration&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a> &amp;amp; &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/arthur-koucher/">Arthur Koucher&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>OpenROAD is a large and complex program. This project is to improve the code quality through resolving issues flagged by tools like Coverity and clang-tidy. New tools like the clang sanitizers ASAN/TSAN/UBSAN should also be set up and integrated with the Jenkins CI.&lt;/p>
&lt;h3 id="gui-testing-in-openroad">GUI Testing in OpenROAD&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Testing&lt;/code>, &lt;code>Continuous Integration&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Qt&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a> &amp;amp; &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/peter-gadfort/">Peter Gadfort&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The OpenROAD GUI is a crucial set of functionality for users to see and investigate their design. GUI testing is specialized and rather different from standard unit testing. The GUI therefore needs improvements to its testing to cover both interaction and rendering. The GUI uses the Qt framework. An open-source testing tool like &lt;a href="https://github.com/faaxm/spix" target="_blank" rel="noopener">https://github.com/faaxm/spix&lt;/a> will be set up and key tests developed. This will provide the framework for all future testing.&lt;/p>
&lt;h3 id="rectilinear-floorplans-in-openroad">Rectilinear Floorplans in OpenROAD&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Electronic Design Automation&lt;/code>, &lt;code>Algorithms&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, data structures and algorithms&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eder-monteiro/">Eder Monteiro&lt;/a> &amp;amp; &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/augusto-berndt/">Augusto Berndt&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>OpenROAD supports block floorplans that are rectangular in shape. Some designs may require more complex shapes to fit. This project extends the tool to support rectilinear polygon shapes as floorplans. This will require upgrading data structures and algorithms in various parts of OpenROAD including floor plan generation, pin placement, and global placement.&lt;/p>
&lt;h3 id="lef-reader-and-database-enhancements-in-openroad">LEF Reader and Database Enhancements in OpenROAD&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Electronic Design Automation&lt;/code>, &lt;code>Database&lt;/code>, &lt;code>Parsing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Boost Spirit parsers, Database, C++&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/osama-hammad/">Osama Hammad&lt;/a> &amp;amp; &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ethan-mahintorabi/">Ethan Mahintorabi&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>LEF (Library Exchange Format) is a standard format for describing physical design rules for integrated circuits. OpenROAD has support for many constructs but some newer ones for advanced process nodes are not supported. This project is to support parsing such information and storing in the OpenDB for use by the rest of the tool.&lt;/p>
&lt;h3 id="orassistant---llm-data-engineering-and-testing">ORAssistant - LLM Data Engineering and Testing&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Large Language Model&lt;/code>, &lt;code>Machine Learning&lt;/code>, &lt;code>Data Engineering&lt;/code>, &lt;code>Model Deployment&lt;/code>, &lt;code>Testing&lt;/code>, &lt;code>Full-Stack Development&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: large language model engineering, database, evaluation, CI/CD, open-source or related software development, full-stack&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a> &amp;amp; &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project is aimed at enhancing robustness and accuracy for &lt;a href="https://woset-workshop.github.io/PDFs/2024/11_ORAssistant_A_Custom_RAG_ba.pdf" target="_blank" rel="noopener">OR Assistant&lt;/a>, the &lt;a href="https://github.com/The-OpenROAD-Project/ORAssistant" target="_blank" rel="noopener">conversational assistant for OpenROAD&lt;/a>, through comprehensive testing and evaluation. You will work with members of the OpenROAD team and other researchers to extend the existing dataset so that it covers a wide range of use cases and delivers accurate responses more efficiently. This project will focus on data engineering and benchmarking, and you will collaborate with the tandem project on LLM model engineering. Tasks include: creating evaluation pipelines, building databases to gather feedback, improving CI/CD, writing documentation, and improving the backend and frontend services as needed (non-exhaustive). You will gain valuable experience and skills in understanding chip design flows and applications. Open to proposals from all levels of ML practitioners.&lt;/p>
&lt;h3 id="orassistant---llm-model-engineering">ORAssistant - LLM Model Engineering&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Large Language Model&lt;/code>, &lt;code>Machine Learning&lt;/code>, &lt;code>Model Architecture&lt;/code>, &lt;code>Model Deployment&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: large language model engineering, prompt engineering, fine-tuning&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a> &amp;amp; &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project is aimed at enhancing robustness and accuracy for &lt;a href="https://woset-workshop.github.io/PDFs/2024/11_ORAssistant_A_Custom_RAG_ba.pdf" target="_blank" rel="noopener">OR Assistant&lt;/a>, the &lt;a href="https://github.com/The-OpenROAD-Project/ORAssistant" target="_blank" rel="noopener">conversational assistant for OpenROAD&lt;/a> through enhanced model architectures. You will work with members of the OpenROAD team and other researchers to explore alternate architectures beyond the existing RAG-based implementation. This project will focus on improving reliability and accuracy of the existing model architecture. You will collaborate on a tandem project on data engineering for OR assistant. Tasks include: reviewing and understanding the state-of-the-art in retrieval augmented generation, implementing best practices, caching prompts, improving relevance and accuracy metrics, writing documentation and improving the backend and frontend services as needed (non-exhaustive). You will gain valuable experience and skills in understanding chip design flows and applications. Open to proposals from all levels of ML practitioners.&lt;/p></description></item><item><title>RAG-ST: Retrieval-Augmented Generation for Spatial Transcriptomics</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uci/rag-st/</link><pubDate>Wed, 15 Jan 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/uci/rag-st/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> bioinformatics, spatial transcriptomics, gene expression generation, retrieval-augmented generation, large models&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong>
&lt;ul>
&lt;li>Proficient in Python, and familiarity with machine learning libraries such as PyTorch.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong>
&lt;ul>
&lt;li>Experience with spatial transcriptomics datasets and statistical modeling.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Machine Learning:&lt;/strong>
&lt;ul>
&lt;li>Understanding of vision models, retrieval-based systems, and MLP architectures.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (preferred):&lt;/strong>
&lt;ul>
&lt;li>Familiarity with scRNA-seq data integration and computational biology tools.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours). Given the scope of integrating RAG models, building a robust database, and ensuring interpretable predictions, this project involves substantial computational and data preparation work.&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Spatial transcriptomics (ST) is a revolutionary technology that provides spatially resolved gene expression measurements, enabling researchers to study cellular behaviour within tissues with unprecedented detail. This technology has transformed our understanding of complex biological systems, such as disease progression, tissue development, and cellular heterogeneity. However, the widespread adoption of ST is limited by its high cost and technical requirements.&lt;/p>
&lt;p>Histology imaging, on the other hand, is far more accessible and cost-effective. If gene expression could be accurately predicted from histology images, it would enable researchers to leverage these abundant images for high-resolution biological insights without the need for expensive spatial transcriptomics experiments. This task has immense potential to democratize spatial transcriptomics research and significantly reduce costs.&lt;/p>
&lt;h3 id="challenges-in-current-approaches">&lt;strong>Challenges in Current Approaches&lt;/strong>&lt;/h3>
&lt;p>Current methods for predicting gene expression from histology images typically involve:&lt;/p>
&lt;ol>
&lt;li>Using large vision models to encode histology image patches into embeddings.&lt;/li>
&lt;li>Employing Multi-Layer Perceptrons (MLPs) to map these embeddings to gene expression profiles.&lt;/li>
&lt;/ol>
&lt;p>While these approaches have shown promise, they suffer from two critical limitations:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Accuracy&lt;/strong>: The MLP-based mappings often fail to fully capture the biological complexity encoded in the histology images, leading to suboptimal predictions.&lt;/li>
&lt;li>&lt;strong>Interpretability&lt;/strong>: These models act as black boxes, providing no insight into the underlying biological rationale for the predictions. Researchers cannot determine why a specific gene expression profile was generated, limiting trust and utility in biological contexts.&lt;/li>
&lt;/ul>
&lt;h3 id="project-motivation">&lt;strong>Project Motivation&lt;/strong>&lt;/h3>
&lt;p>To overcome these limitations, this project proposes a novel &lt;strong>Retrieval-Augmented Generation (RAG)&lt;/strong> framework for spatial transcriptomics. Instead of relying solely on black-box MLPs, RAG-ST will:&lt;/p>
&lt;ul>
&lt;li>Retrieve relevant examples from a curated database of paired histology images, scRNA-seq data, and gene expression profiles.&lt;/li>
&lt;li>Use these retrieved examples to inform and enhance the generation process, resulting in predictions that are both more accurate and biologically interpretable.&lt;/li>
&lt;/ul>
&lt;p>This approach not only grounds predictions in biologically meaningful data but also provides transparency by revealing which database entries influenced the results.&lt;/p>
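&lt;p>The retrieve-then-predict idea can be illustrated with a small Python sketch. This is not the proposed RAG-ST model: it swaps the generation step for a plain nearest-neighbor average, and the database entries, field names, and embedding values are all hypothetical placeholders. What it does show is how returning the retrieved entry ids alongside the prediction makes the result traceable.&lt;/p>

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_expression(query_embedding, database, top_k=2):
    """Average the expression profiles of the most similar database entries."""
    ranked = sorted(
        database,
        key=lambda entry: cosine(query_embedding, entry["embedding"]),
        reverse=True,
    )[:top_k]
    n_genes = len(ranked[0]["expression"])
    prediction = [
        sum(entry["expression"][g] for entry in ranked) / len(ranked)
        for g in range(n_genes)
    ]
    # Returning the retrieved ids is what makes the prediction traceable.
    return prediction, [entry["id"] for entry in ranked]


# Hypothetical database: image-patch embeddings paired with expression profiles.
database = [
    {"id": "patch_a", "embedding": [1.0, 0.0], "expression": [2.0, 0.0]},
    {"id": "patch_b", "embedding": [0.9, 0.1], "expression": [4.0, 2.0]},
    {"id": "patch_c", "embedding": [0.0, 1.0], "expression": [0.0, 8.0]},
]
prediction, sources = predict_expression([1.0, 0.0], database)
print(prediction, sources)  # [3.0, 1.0] ['patch_a', 'patch_b']
```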
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Database Construction&lt;/strong>:
&lt;ul>
&lt;li>Curate a large and diverse database of histology images paired with scRNA-seq and gene expression data.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Model Development&lt;/strong>:
&lt;ul>
&lt;li>Develop a RAG framework combining vision-based encoders and retrieval-enhanced generation techniques.&lt;/li>
&lt;li>Incorporate interpretability mechanisms to link predicted gene expressions to retrieved examples.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Evaluation and Benchmarking&lt;/strong>:
&lt;ul>
&lt;li>Assess RAG-ST against state-of-the-art methods, focusing on accuracy, interpretability, and biological validity.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Curated Database&lt;/strong>:
&lt;ul>
&lt;li>A publicly available, well-documented database of histology images and gene expression profiles.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>RAG-ST Framework&lt;/strong>:
&lt;ul>
&lt;li>An open-source Python implementation of the RAG-ST model, with retrieval, generation, and visualization tools.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Benchmark Results&lt;/strong>:
&lt;ul>
&lt;li>Comprehensive evaluations demonstrating the benefits of RAG-ST over conventional pipelines.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials&lt;/strong>:
&lt;ul>
&lt;li>User-friendly guides to facilitate adoption by the spatial transcriptomics research community.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>By integrating retrieval-augmented generation with large models, RAG-ST represents a paradigm shift in spatial transcriptomics. It offers a cost-effective, accurate, and interpretable solution for gene expression prediction, democratizing access to high-quality spatial transcriptomic insights and fostering advancements in biological research.&lt;/p>
&lt;hr></description></item><item><title>Final Report: Stream processing support for FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/fastensor/20240830-aditya_narayan/</link><pubDate>Fri, 30 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/fastensor/20240830-aditya_narayan/</guid><description>&lt;h1 id="final-report-stream-processing-support-for-fastensor">Final Report: Stream processing support for FasTensor&lt;/h1>
&lt;h2 id="project-description">Project Description&lt;/h2>
&lt;p>FasTensor is a scientific computing library specialized in performing computations over dense matrices that exhibit spatial locality, a characteristic often found in physical phenomena data. Our GSoC'24 project aimed to enhance FasTensor by enabling it to ingest and process live data streams from sensors and scientific equipment.&lt;/p>
&lt;h2 id="what-is-fastensor">What is FasTensor?&lt;/h2>
&lt;p>Imagine you&amp;rsquo;re working on a physical simulation or solving partial differential equations (PDEs). You&amp;rsquo;ve discretized your PDE, but now you face a new challenge: you need to run your computations fast and parallelize them across massive compute clusters.&lt;/p>
&lt;p>At this point, you find yourself describing a stencil &lt;a href="https://dl.acm.org/doi/abs/10.1145/2686745.2686756" target="_blank" rel="noopener">[1]&lt;/a> operation. But should you really spend your time tinkering with loop orders, data layouts, and countless other side-quests unrelated to your core problem?&lt;/p>
&lt;p>This is where FasTensor comes in: Describe your computation as a stencil, and it takes care of ensuring optimal execution. FasTensor lets you focus on the science, not the implementation details.&lt;/p>
&lt;h2 id="repository-links">Repository Links&lt;/h2>
&lt;ul>
&lt;li>FasTensor: &lt;a href="https://github.com/BinDong314/FasTensor" target="_blank" rel="noopener">https://github.com/BinDong314/FasTensor&lt;/a>&lt;/li>
&lt;li>My fork: &lt;a href="https://github.com/my-name/FasTensor/tree/ftstream" target="_blank" rel="noopener">https://github.com/my-name/FasTensor/tree/ftstream&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="prs">PR(s)&lt;/h3>
&lt;ol>
&lt;li>&lt;a href="https://github.com/BinDong314/FasTensor/pull/1" target="_blank" rel="noopener">Stream processing support for FasTensor completed.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/BinDong314/FasTensor/pull/2" target="_blank" rel="noopener">Merge ftstream into the FasTensor repo&lt;/a>&lt;/li>
&lt;/ol>
&lt;h2 id="work-done-this-summer">Work done this summer&lt;/h2>
&lt;h3 id="develop-streaming-simulator-ftstream">Develop Streaming simulator: FTStream&lt;/h3>
&lt;p>I was first tasked by Dr. Bin to develop a stream simulator for testing the streaming capability of FasTensor. For testing purposes, a stream is characterized by file size, count, and arrival interval. FTStream can generate streams of various sizes and intervals, up to the theoretical limits of disk and filesystem. We&amp;rsquo;re talking speeds up to 2.5 GiB/s on a non-parallel NVMe!&lt;/p>
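&lt;p>A minimal sketch of that stream model, written in Python for brevity rather than taken from FTStream itself: the function name, file naming scheme, and random payload are assumptions made for illustration. It writes a fixed number of files of a given size, one per arrival interval.&lt;/p>

```python
import os
import tempfile
import time


def simulate_stream(directory, file_size, count, interval_s):
    """Write `count` files of `file_size` bytes, one every `interval_s` seconds."""
    paths = []
    payload = os.urandom(file_size)  # placeholder payload, not real sensor data
    for i in range(count):
        path = os.path.join(directory, f"chunk_{i:05d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the next arrival
        paths.append(path)
        if i != count - 1:
            time.sleep(interval_s)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    written = simulate_stream(tmp, file_size=4096, count=3, interval_s=0.01)
    sizes = [os.path.getsize(p) for p in written]
    print(len(written), sizes)
```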
&lt;p>Writing this tool was an adventure in throughput testing and exploring APIs. I wrote multiple drivers, each for a different whim and hijinks of systems in the HPC world. Here&amp;rsquo;s a brief journey through the APIs we explored:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>HDF5 APIs:&lt;/strong> Pretty fast in flush-to-disk operation, but the API design strongly binds to file handles, which inhibits high throughput duplication.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>HDF5 VFL and VOL:&lt;/strong> We dabbled in these dark arts, but there be dragons! Keeping a long-term view of maintenance, we dropped the idea.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>POSIX O_DIRECT:&lt;/strong> This involved getting your buffers aligned right and handling remainders correctly. A step up, but not quite at the theoretical limits.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Linux AIO:&lt;/strong> Streaming is a latency-sensitive domain, and to reach the theoretical limits, every syscall saved matters. Linux AIO let us batch syscalls with &lt;code>io_submit()&lt;/code>. It took a few testing sessions to get the combo of queue depth, buffer size, and alignment right.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>We settled on O_DIRECT + Linux AIO. Feel free to modify &lt;code>ftstream/fastflush.h&lt;/code> to suit your needs.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/aditya-narayan5/GSoC24-Final_Report/f486087ae3e6ef1f1077c885e9352c9440848724/images/ftstream.png" width=75% height=75%>
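&lt;p>The &amp;ldquo;aligned buffers and remainders&amp;rdquo; part of O_DIRECT comes down to some simple arithmetic, sketched here in Python. The 4096-byte alignment is an assumption (the real requirement depends on the device and filesystem), and the helper names are hypothetical, not taken from &lt;code>ftstream/fastflush.h&lt;/code>.&lt;/p>

```python
ALIGNMENT = 4096  # assumed block size; the real value depends on device and filesystem


def split_for_direct_io(length, alignment=ALIGNMENT):
    """Split a payload length into an aligned body and a leftover remainder."""
    remainder = length % alignment
    return length - remainder, remainder


def padded_length(length, alignment=ALIGNMENT):
    """Round a length up to the next multiple of the alignment."""
    return -(-length // alignment) * alignment


body, tail = split_for_direct_io(10_000)
print(body, tail, padded_length(10_000))  # 8192 1808 12288
```

&lt;p>One way to handle the tail is to write a padded final block and then truncate the file back to its true length. Writing the tail through a regular buffered descriptor is another option.&lt;/p>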
&lt;h3 id="stream-support">Stream Support&lt;/h3>
&lt;p>FasTensor has just one simple paradigm: you give it a data source, an output data store, and your transform, and it handles all the behind-the-scenes grunt work of computing over big datasets so you can focus on your research.&lt;/p>
&lt;p>We aimed to achieve the same for streaming: Drop in the STREAM keyword, append a pattern identifying your stream, and use your usual transform.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/aditya-narayan5/GSoC24-Final_Report/f486087ae3e6ef1f1077c885e9352c9440848724/images/example_code.png" width=75% height=100%>
Voila! Now your previous FasTensor code supports live data streams.
&lt;img src="https://raw.githubusercontent.com/aditya-narayan5/GSoC24-Final_Report/da34fab7a857b0223332d84a0aa1c8cdf0811761/images/fastensor_streaming_demo.gif" width=75% height=75%>
&lt;h4 id="technical-tidbits">Technical tidbits:&lt;/h4>
&lt;ul>
&lt;li>Implements a manager-worker pattern, which leaves room to add different stream semantics later, such as windowing and CPU-memory-based load balancing&lt;/li>
&lt;li>Supports streams of indefinite size&lt;/li>
&lt;/ul>
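&lt;p>For readers unfamiliar with the manager-worker pattern, here is a generic Python sketch using threads and a queue. It is an illustration of the pattern only, not FasTensor&amp;rsquo;s implementation: the function name and the use of threads are assumptions.&lt;/p>

```python
import queue
import threading


def run_manager_worker(chunks, transform, num_workers=2):
    """Manager enqueues chunks; workers apply `transform` and record results."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            index, chunk = item
            value = transform(chunk)
            with lock:
                results[index] = value
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    # The manager hands out chunks as they "arrive" from the stream.
    for index, chunk in enumerate(chunks):
        tasks.put((index, chunk))
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(chunks))]


print(run_manager_worker([[1, 2], [3, 4], [5]], sum))  # [3, 7, 5]
```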
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>HPC has its fair share of challenges. Things you take for granted might not be available there, and it takes a while to adjust to paradigms of scale and parallelization.&lt;/p>
&lt;p>For example, when developing FTStream, we found O_DIRECT is available on some parallel file systems like GPFS but not supported on Lustre/CFS. We developed a separate MPIO driver for FTStream that will be upstreamed once thoroughly tested on Lustre.&lt;/p>
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;ul>
&lt;li>Implement windowing and explore more advanced stream semantics.&lt;/li>
&lt;li>Implement support for defining workload policies.&lt;/li>
&lt;li>Optimize interleaving IO and Compute.&lt;/li>
&lt;/ul>
&lt;h2 id="references">References&lt;/h2>
&lt;p>[1] Anshu Dubey. 2014. Stencils in Scientific Computations. In Proceedings of the Second Workshop on Optimizing Stencil Computations (WOSC &amp;lsquo;14). Association for Computing Machinery, New York, NY, USA, 57.
&lt;a href="https://doi.org/10.1145/2686745.2686756" target="_blank" rel="noopener">https://doi.org/10.1145/2686745.2686756&lt;/a>&lt;/p>
&lt;h2 id="acknowledgement">Acknowledgement&lt;/h2>
&lt;p>I struck gold when it comes to mentors.&lt;/p>
&lt;p>Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> was really kind and supportive throughout the journey, from the very first steps of giving a tour around the codebase to giving me a lot of freedom to experiment, refactor, and refine.&lt;/p>
&lt;p>Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a> was encouraging and nurturing of budding talent. We had great research presentations every Monday apart from usual mentor interactions, where different research groups presented their talks and students were invited to present their progress.&lt;/p>
&lt;p>I&amp;rsquo;ve come across Quantum computing many times in the news, but I never thought I&amp;rsquo;d get a frontline preview from the researchers working at the bleeding edge at the Lawrence Berkeley National Laboratory (LBL).&lt;/p>
&lt;p>This GSoC experience, made possible by Google and UC OSPO, has been invaluable for my growth as a developer and researcher.&lt;/p>
&lt;p>For people interested in HPC, ML, Systems, or Reproducibility, I encourage you all to apply to UC OSPO. It&amp;rsquo;s been an incredible journey, and I&amp;rsquo;m grateful for every moment of it!&lt;/p></description></item><item><title>ORAssistant - LLM Assistant for OpenROAD</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240827-palaniappan-r/</link><pubDate>Tue, 27 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240827-palaniappan-r/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, an undergraduate student at BITS Pilani, India. Over the past few months, I&amp;rsquo;ve been working as a GSoC contributor on the &lt;a href="https://summerofcode.withgoogle.com/programs/2024/projects/DSo6kvA5" target="_blank" rel="noopener">LLM Assistant for OpenROAD - Model Architecture and Prototype&lt;/a> project, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>. &lt;/p>
&lt;p>The primary objective of my project is to improve the user experience within OpenROAD and OpenROAD-flow-scripts by utilizing Large Language Models (LLMs) to offer fast, relevant answers to FAQs and common issues. The ORAssistant chatbot aims to act as a first line of support, addressing basic queries in domains such as installation and command usage. Its goal is to resolve simple issues before they escalate to public forums, thereby reducing the number of support tickets on platforms like GitHub Issues.&lt;/p>
&lt;h2 id="architecture-overview">Architecture Overview&lt;/h2>
&lt;p>Retrieval-augmented generation (RAG) is a technique that improves the Q&amp;amp;A capabilities and reliability of LLMs by incorporating factual information from external sources. When a user submits a query, the RAG process begins by fetching relevant information from a knowledge base. The retrieved content, combined with the original query, is then provided to the LLM to generate a relevant, informed response.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="RAG Architecture" srcset="
/report/osre24/ucsd/openroad/20240827-palaniappan-r/rag_arch_hu8e03f7a9c64923f7711e5a6dbcc7ac36_44482_df391271ecbbb458269da059ad7cf993.webp 400w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/rag_arch_hu8e03f7a9c64923f7711e5a6dbcc7ac36_44482_3c455fc32c6d18b57b31be5f86590e99.webp 760w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/rag_arch_hu8e03f7a9c64923f7711e5a6dbcc7ac36_44482_1200x1200_fit_q75_h2_lanczos_2.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240827-palaniappan-r/rag_arch_hu8e03f7a9c64923f7711e5a6dbcc7ac36_44482_df391271ecbbb458269da059ad7cf993.webp"
width="760"
height="410"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
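&lt;p>The retrieve-then-prompt flow can be sketched in a few lines of Python. This is a toy illustration, not ORAssistant&amp;rsquo;s code: word overlap stands in for vector search, the sample documents are made-up placeholders, and the LLM call itself is left out, so the sketch stops at the assembled prompt.&lt;/p>

```python
def tokenize(text):
    """Lowercase and split on whitespace; a crude stand-in for real tokenization."""
    return set(text.lower().split())


def retrieve(query, knowledge_base, top_k=2):
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(tokenize(query).intersection(tokenize(doc))),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(query, documents):
    """Combine retrieved context with the original query for the LLM."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Placeholder knowledge base entries, invented for this example.
docs = [
    "OpenROAD can be installed from source",
    "Yosys performs logic synthesis",
    "OpenSTA performs static timing analysis",
]
top = retrieve("how is openroad installed", docs, top_k=1)
prompt = build_prompt("how is openroad installed", top)
print(prompt)
```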
&lt;h2 id="the-knowledge-base">The Knowledge Base&lt;/h2>
&lt;p>ORAssistant is designed to answer queries about all the major tools in the OR flow. The knowledge base primarily consists of official documentation from OpenROAD, OpenROAD-flow-scripts, and their respective manpages. Instead of scraping these primary sources from their websites, the docs are built to the desired markdown format directly from the respective GitHub repositories, using specific commit hashes for reproducibility. The knowledge base also includes documentation from other essential applications in the EDA flow, such as Yosys and OpenSTA. Additionally, it includes scraped and annotated conversational data from discussions on the OpenROAD and OpenROAD-flow-scripts GitHub pages.&lt;/p>
&lt;p>The entire dataset building process has been automated, allowing for dynamic updates to accommodate any live changes.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Knowledge Base Building" srcset="
/report/osre24/ucsd/openroad/20240827-palaniappan-r/knowledge_base_hu389de4d06f6f5009d6f8a5e32337289b_95686_4f5d36607bb3f3a68d364c4b052d7564.webp 400w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/knowledge_base_hu389de4d06f6f5009d6f8a5e32337289b_95686_774ab20167d5029994bd3450cf9f9627.webp 760w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/knowledge_base_hu389de4d06f6f5009d6f8a5e32337289b_95686_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240827-palaniappan-r/knowledge_base_hu389de4d06f6f5009d6f8a5e32337289b_95686_4f5d36607bb3f3a68d364c4b052d7564.webp"
width="760"
height="424"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="the-tool-based-architecture">The Tool-Based Architecture&lt;/h2>
&lt;p>After experimenting with multiple RAG approaches, a tool-based setup proved to be the most effective solution. Data from various domains are embedded into vector databases, and hybrid search retriever functions are applied to these vector stores. These functions are organized as individual tools that can be called by the chatbot. To maintain context, each query is rephrased while considering the chat history. This ensures a more precise and context-rich query. Please refer to my previous &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240719-palaniappan-r/">blog post&lt;/a> for more information on the retrieval tools.&lt;/p>
&lt;p>As depicted in the flowchart, a preliminary LLM call analyzes the input query, rephrases it based on the chat history and picks the appropriate tools for the rephrased query. Subsequently, documents are retrieved using the tool and sent to the LLM, which produces a relevant, context-aware response.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Tool Based Architecture" srcset="
/report/osre24/ucsd/openroad/20240827-palaniappan-r/tool_arch_hua38e30b25f21f78f6a933005dd192c89_51518_75dcf9730e30df6c2af5b2e12a33089e.webp 400w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/tool_arch_hua38e30b25f21f78f6a933005dd192c89_51518_7e257ae5876d4a2639c310e21b80ae97.webp 760w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/tool_arch_hua38e30b25f21f78f6a933005dd192c89_51518_1200x1200_fit_q75_h2_lanczos_2.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240827-palaniappan-r/tool_arch_hua38e30b25f21f78f6a933005dd192c89_51518_75dcf9730e30df6c2af5b2e12a33089e.webp"
width="760"
height="546"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
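&lt;p>The flow above (rephrase, pick tools, retrieve) can be sketched as follows. Everything here is a hypothetical stand-in: in ORAssistant the rephrasing and tool selection are done by an LLM call and the tools wrap hybrid search retrievers, whereas this sketch uses string concatenation, keyword matching, and canned snippets.&lt;/p>

```python
def rephrase(query, history):
    """Stand-in for the preliminary LLM call: fold the last turn into the query."""
    if history:
        return f"{history[-1]} {query}"
    return query


# Hypothetical tools: each name maps to a retriever over one domain.
TOOLS = {
    "install": lambda q: ["installation docs snippet"],
    "command": lambda q: ["command reference snippet"],
}


def pick_tools(query):
    """Stand-in for LLM tool selection: match tool names against the query."""
    chosen = [name for name in TOOLS if name in query.lower()]
    return chosen or list(TOOLS)  # fall back to every tool when nothing matches


def answer_context(query, history):
    """Return the rephrased query and the documents gathered from the chosen tools."""
    rephrased = rephrase(query, history)
    documents = []
    for name in pick_tools(rephrased):
        documents.extend(TOOLS[name](rephrased))
    return rephrased, documents


print(answer_context("which flag do I pass?", ["How do I install OpenROAD?"]))
```

&lt;p>In this toy run the chat history pulls the word &amp;ldquo;install&amp;rdquo; into the rephrased query, so only the installation tool is called.&lt;/p>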
&lt;h2 id="using-orassistant">Using ORAssistant&lt;/h2>
&lt;p>ORAssistant is currently hosted at this &lt;a href="https://orassistant.netlify.app/" target="_blank" rel="noopener">link&lt;/a>.&lt;/p>
&lt;p>To set up ORAssistant locally, find detailed instructions in the &lt;a href="">GitHub Repo&lt;/a>. Both cloud-based LLM providers (Gemini, VertexAI) and local options (Ollama) are supported.&lt;/p>
&lt;p>Here&amp;rsquo;s an example of ORAssistant in action:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example 1" srcset="
/report/osre24/ucsd/openroad/20240827-palaniappan-r/example1_huc9b9a5dd27909efbfc0d6a5a5532244f_175139_07d9479d1764c9189c0bdd3947bc3a05.webp 400w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/example1_huc9b9a5dd27909efbfc0d6a5a5532244f_175139_ef65593aa1ba677fc24f91d973e5bfc7.webp 760w,
/report/osre24/ucsd/openroad/20240827-palaniappan-r/example1_huc9b9a5dd27909efbfc0d6a5a5532244f_175139_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240827-palaniappan-r/example1_huc9b9a5dd27909efbfc0d6a5a5532244f_175139_07d9479d1764c9189c0bdd3947bc3a05.webp"
width="760"
height="384"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="future-plans">Future Plans&lt;/h2>
&lt;p>To further enhance the usability of ORAssistant, there are plans to add support for flow script generation. This will become possible once a dedicated script generation tool is added to the current tool-based workflow. Support for more tools in the EDA flow, such as KLayout, will also be added in the near future.&lt;/p>
&lt;p>Additionally, ORAssistant is planned to be integrated directly into OpenROAD&amp;rsquo;s CLI and GUI interfaces.&lt;/p>
&lt;p>As I near the end of my GSoC, I&amp;rsquo;d like to thank the GSoC Organizing Committee, UC OSPO and The OpenROAD Project for this incredible opportunity. I&amp;rsquo;m immensely grateful to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a> for their support and guidance throughout my GSoC journey. Thank You.&lt;/p></description></item><item><title>Hardware Hierarchical Dynamical Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/</link><pubDate>Sat, 24 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/</guid><description>&lt;p>Hi everyone! I am Ujjwal Shekhar, a Computer Science student at the International Institute of Information Technology - Hyderabad. I am excited to share my work on the project titled &lt;strong>&amp;ldquo;Hardware Hierarchical Dynamical Systems&amp;rdquo;&lt;/strong> as part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/osre/">Open Source Research Experience (OSRE) program&lt;/a> and &lt;a href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener">Google Summer of Code&lt;/a>. This project has been an incredible journey, and I&amp;rsquo;ve had the privilege of working with my mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>.&lt;/p>
&lt;h1 id="project-overview-and-goals">Project Overview and Goals&lt;/h1>
&lt;blockquote>
&lt;p>Abstract Syntax Trees (ASTs) are fundamental to modern compilers, serving as the backbone for parsing and transforming code. When compiling hardware code, the sheer volume of data can make compilation times a significant bottleneck. My project focuses on building a memory-optimized tree data structure specifically tailored for AST-typical queries.&lt;/p>
&lt;/blockquote>
&lt;p>The &lt;a href="https://github.com/masc-ucsc/livehd" target="_blank" rel="noopener">LiveHD&lt;/a> repository, developed by the &lt;a href="https://masc.soe.ucsc.edu" target="_blank" rel="noopener">Micro Architecture Lab&lt;/a> at UCSC, offers a compiler infrastructure optimized for hardware synthesis and simulation. The existing &lt;a href="https://github.com/masc-ucsc/livehd/blob/master/core/lhtree.hpp" target="_blank" rel="noopener">LHTree&lt;/a> data structure provides a foundation, but there was significant potential for further optimization, which I explored throughout this project.&lt;/p>
&lt;h3 id="key-ast-queries">Key AST Queries&lt;/h3>
&lt;p>The core queries that the tree is optimized for include:&lt;/p>
&lt;ul>
&lt;li>Finding the parent of a node.&lt;/li>
&lt;li>Finding the first and last child of a node.&lt;/li>
&lt;li>Locating the previous and next sibling of a node.&lt;/li>
&lt;li>Adding a child to a node.&lt;/li>
&lt;li>Inserting a sibling to a node.&lt;/li>
&lt;li>Performing preorder, postorder, and sibling order traversal.&lt;/li>
&lt;li>Removing a leaf or an entire subtree from the tree.&lt;/li>
&lt;/ul>
&lt;p>The primary goal was to create a tree class that excels at handling these queries efficiently, while still being robust enough to support less frequent operations. The new HHDS tree structure has demonstrated superior performance for specific tree configurations and continues to show potential across other types, particularly in memory consumption and cache efficiency, compared to the current LHTree.&lt;/p>
&lt;p>The benchmarks were run with Google Benchmark to test the tree for scalability and performance, and the tree was profiled with Callgrind and KCachegrind to find bottlenecks. The new version of the tree is currently being integrated into the LiveHD core repository.&lt;/p>
&lt;h2 id="background-and-motivation">Background and Motivation&lt;/h2>
&lt;h3 id="naive-approach">Naive approach&lt;/h3>
&lt;p>A straightforward method for storing an n-ary tree is to maintain pointers from each node to its parent, children, and immediate siblings. While simple, this approach is memory-intensive and has poor cache efficiency due to the non-contiguous nature of nodes in memory. The variable memory usage per node, depending on the number of children, can also introduce significant overhead.&lt;/p>
&lt;h3 id="enhancements-to-the-naive-approach">Enhancements to the Naive Approach&lt;/h3>
&lt;p>To reduce memory overhead, one optimization is to store only pointers to the first and last child within each node. This reduces memory usage to a constant per node. Additionally, since many AST-related queries focus on the tree&amp;rsquo;s structure rather than the data itself, we can separate the data from the structure. The tree would store only pointers to the data, allowing the tree structure to be optimized independently of the data storage.&lt;/p>
&lt;blockquote>
&lt;p>While separating the data and the structure may seem like an obvious improvement, we will see that it can be extended to provide greater benefits.&lt;/p>
&lt;/blockquote>
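&lt;p>The constant-size layout described above can be sketched as follows. This is an illustration only, not LiveHD code: &lt;code>SimpleTree&lt;/code>, &lt;code>NodeLinks&lt;/code>, and &lt;code>NodeId&lt;/code> are hypothetical names, integer indices stand in for pointers, and the payload sits in a parallel array so that the shared index plays the role of the data pointer.&lt;/p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names throughout; this is neither the LHTree nor the HHDS layout.
using NodeId = std::int32_t;
constexpr NodeId kNone = -1;

// Bookkeeping has the same size for every node, whatever its number of children.
struct NodeLinks {
  NodeId parent       = kNone;
  NodeId first_child  = kNone;
  NodeId last_child   = kNone;
  NodeId prev_sibling = kNone;
  NodeId next_sibling = kNone;
};

// Structure and payload live in separate arrays that share one index.
struct SimpleTree {
  std::vector<NodeLinks>   links;
  std::vector<std::string> data;

  NodeId add_root(const std::string& value) {
    assert(links.empty());
    links.push_back(NodeLinks{});
    data.push_back(value);
    return 0;
  }

  // Appends a new last child under `parent` and returns its index.
  NodeId add_child(NodeId parent, const std::string& value) {
    assert(parent >= 0 && static_cast<std::size_t>(parent) < links.size());
    const NodeId id = static_cast<NodeId>(links.size());
    NodeLinks node;
    node.parent       = parent;
    node.prev_sibling = links[parent].last_child;
    links.push_back(node);
    data.push_back(value);
    if (links[parent].last_child == kNone) {
      links[parent].first_child = id;
    } else {
      links[links[parent].last_child].next_sibling = id;
    }
    links[parent].last_child = id;
    return id;
  }
};
```

&lt;p>Structural queries such as parent, first child, or next sibling only touch &lt;code>links&lt;/code>, so the payload array never has to be loaded while walking the tree.&lt;/p>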
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Naive and improved methods of storing the tree" srcset="
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_7d57a7eca121eafa6de264160253597d.webp 400w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_ad09a2aa9614ada2d18b11fd703737e7.webp 760w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_7d57a7eca121eafa6de264160253597d.webp"
width="760"
height="686"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="improving-the-cache-efficiency">Improving the cache efficiency&lt;/h3>
&lt;p>While reducing memory consumption is beneficial, the tree&amp;rsquo;s cache efficiency can still be suboptimal if the children of a node are scattered in memory. To enhance cache efficiency, storing children in contiguous memory locations is crucial. This improves spatial locality, which in turn boosts cache performance. Additionally, this approach eliminates the need to explicitly store data pointers in the tree, as the data resides at a contiguous memory index aligned with the bookkeeping.&lt;/p>
&lt;p>By storing children contiguously, we can also eliminate the need for previous and next sibling pointers, as siblings are inherently adjacent in memory. Similarly, we can avoid storing the parent pointer for every child, since all children share the same parent.&lt;/p>
&lt;h2 id="optimizations-in-lhtree-old-method">Optimizations in LHTree (Old method)&lt;/h2>
&lt;p>The &lt;a href="https://github.com/masc-ucsc/livehd/blob/master/core/lhtree.hpp" target="_blank" rel="noopener">LHTree&lt;/a> class in LiveHD was designed with these optimizations in mind. It groups siblings into &lt;em>chunks&lt;/em> of four, storing the parent pointer only in the first sibling of each chunk. The last sibling in each chunk points to the next chunk, minimizing the number of pointers required and thus reducing memory overhead.&lt;/p>
&lt;p>LHTree organizes the entire tree as a 2-dimensional array, where the first dimension represents the tree level and the second dimension represents the node index at that level. This structure improves cache efficiency by storing nodes contiguously in memory. Each tree position is a 48-bit ID, with the last 32 bits representing the node&amp;rsquo;s index and the first 16 bits indicating the tree level.&lt;/p>
&lt;p>Because the level is stored explicitly in a fixed number of bits, the tree cannot scale to arbitrarily deep trees: 16 level bits can address at most 65,536 levels.&lt;/p>
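&lt;p>To make the 48-bit position concrete, here is a small sketch of packing a level and an index into one integer. The function names are hypothetical and the exact encoding inside LHTree may differ. Only the 16/32 bit split is taken from the description above.&lt;/p>

```cpp
#include <cstdint>

// Bit widths from the description above; all names are illustrative.
constexpr unsigned kIndexBits = 32;
constexpr unsigned kLevelBits = 16;
constexpr std::uint64_t kIndexMask = (std::uint64_t{1} << kIndexBits) - 1;
constexpr std::uint64_t kMaxLevel  = (std::uint64_t{1} << kLevelBits) - 1;

// Upper 16 bits hold the level, lower 32 bits hold the index within that level.
constexpr std::uint64_t pack_position(std::uint64_t level, std::uint64_t index) {
  return ((level & kMaxLevel) << kIndexBits) | (index & kIndexMask);
}

constexpr std::uint64_t level_of(std::uint64_t pos) { return pos >> kIndexBits; }
constexpr std::uint64_t index_of(std::uint64_t pos) { return pos & kIndexMask; }

// The deepest level that fits in 16 bits.
static_assert(kMaxLevel == 65535, "16 level bits cover levels 0..65535");
```

&lt;p>Any tree deeper than &lt;code>kMaxLevel&lt;/code> simply has no representable position in this scheme, which is the scalability limit mentioned above.&lt;/p>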
&lt;blockquote>
&lt;p>Despite these optimizations, LHTree has some limitations, particularly in cache alignment and flexibility, which the HHDS tree aims to address.&lt;/p>
&lt;/blockquote>
&lt;p>Unfortunately, the number of bits required by each &amp;ldquo;chunk&amp;rdquo; happens to be slightly bigger than a single cache line (512 bits). This means that the cache efficiency of the tree is not optimal.&lt;/p>
&lt;h2 id="hhds-tree--a-new-approach">HHDS Tree: A New Approach&lt;/h2>
&lt;h3 id="eliminating-levels">Eliminating Levels&lt;/h3>
&lt;p>The HHDS tree stores everything in a single vector, removing the need for explicit level information. This simplification not only improves cache efficiency but also eliminates restrictions on the number of nodes per level and the total number of levels.&lt;/p>
&lt;h3 id="enhanced-cache-alignment">Enhanced Cache Alignment&lt;/h3>
&lt;p>In the HHDS tree, each node has a 46-bit ID. Chunks in the HHDS tree contain up to eight children, with the first 43 bits of the absolute ID serving as the chunk ID and the last three bits indicating the node&amp;rsquo;s offset within the chunk.&lt;/p>
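&lt;p>The split of an absolute ID into a chunk ID and an offset can be sketched with two shifts and a mask. The constant names &lt;code>CHUNK_BITS&lt;/code> and &lt;code>CHUNK_SHIFT&lt;/code> appear in the class excerpt later in this post. Their values (43 and 3) are inferred from the description here, and the helper functions and the 64-bit underlying type are assumptions made for illustration.&lt;/p>

```cpp
#include <cstdint>

using Tree_pos = std::uint64_t;          // assumed underlying type

constexpr unsigned CHUNK_SHIFT = 3;      // low bits: offset within the chunk
constexpr unsigned CHUNK_BITS  = 43;     // high bits: chunk ID
constexpr Tree_pos CHUNK_SIZE  = Tree_pos{1} << CHUNK_SHIFT;  // 8 slots per chunk
constexpr Tree_pos CHUNK_MASK  = CHUNK_SIZE - 1;

// Hypothetical helpers, not part of the HHDS API.
constexpr Tree_pos chunk_id_of(Tree_pos abs_id) { return abs_id >> CHUNK_SHIFT; }
constexpr Tree_pos offset_of(Tree_pos abs_id) { return abs_id & CHUNK_MASK; }
constexpr Tree_pos make_abs_id(Tree_pos chunk_id, Tree_pos offset) {
  return (chunk_id << CHUNK_SHIFT) | (offset & CHUNK_MASK);
}

static_assert(CHUNK_BITS + CHUNK_SHIFT == 46, "absolute IDs are 46 bits wide");
```

&lt;p>With this split, all nodes whose absolute IDs share the same upper 43 bits live in the same chunk, so finding a sibling in the same chunk only changes the 3-bit offset.&lt;/p>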
&lt;p>For each chunk, which is exactly 64 bytes (or 512 bits) long—matching the size of a cache line—the following information is stored:&lt;/p>
&lt;ul>
&lt;li>A 46-bit parent pointer (absolute ID).&lt;/li>
&lt;li>A 43-bit first child long pointer (chunk ID).&lt;/li>
&lt;li>A 43-bit last child long pointer (chunk ID).&lt;/li>
&lt;li>43-bit previous and next sibling chunk pointers.&lt;/li>
&lt;li>Seven 21-bit short delta pointers for the first child.&lt;/li>
&lt;li>Seven 21-bit short delta pointers for the last child.&lt;/li>
&lt;/ul>
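&lt;p>The field widths listed above add up to exactly one cache line. A compile-time check of that arithmetic, using illustrative constant names rather than the ones in the HHDS source, could look like this:&lt;/p>

```cpp
// Field widths taken from the list above; the constant names are illustrative.
constexpr unsigned PARENT_BITS      = 46;  // absolute ID of the parent
constexpr unsigned CHUNK_ID_BITS    = 43;  // any pointer to a whole chunk
constexpr unsigned SHORT_DELTA_BITS = 21;  // one short delta pointer
constexpr unsigned SHORT_SLOTS      = 7;   // short deltas per child direction

constexpr unsigned chunk_bits() {
  return PARENT_BITS
       + 2 * CHUNK_ID_BITS                    // first and last child long pointers
       + 2 * CHUNK_ID_BITS                    // previous and next sibling chunks
       + 2 * SHORT_SLOTS * SHORT_DELTA_BITS;  // first-child and last-child deltas
}

static_assert(chunk_bits() == 512, "46 + 86 + 86 + 294 = 512 bits");
static_assert(chunk_bits() / 8 == 64, "512 bits is one 64-byte cache line");
```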
&lt;blockquote>
&lt;p>&lt;strong>NOTE&lt;/strong>: The 0th chunk is an INVALID node, the real nodes start from the 1st chunk, with the node at an absolute ID of 8 (chunk ID of 1) being the root node.&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Overview of the HHDS tree book-keeping" srcset="
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_30c5181b8def0cc33b1b86e98f51c9db.webp 400w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_dbc8dcac70e873bb719beedc7adf4645.webp 760w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_30c5181b8def0cc33b1b86e98f51c9db.webp"
width="760"
height="359"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;blockquote>
&lt;p>Refer to the next section for more information on the short delta pointers.&lt;/p>
&lt;/blockquote>
&lt;p>Each chunk is 512 bits (64 bytes) long, exactly the size of a cache line. The bookkeeping cost per node therefore ranges from 512 bits in the worst case, when a chunk holds a single node, down to 64 bits in the best case, when all 8 slots of the chunk are occupied.&lt;/p>
&lt;blockquote>
&lt;p>We used the GCC/Clang &lt;code>__attribute__((packed, aligned(64)))&lt;/code> extension to ensure that each chunk aligns exactly with a cache line. Bitfields were used to pack the data efficiently within the chunk.&lt;/p>
&lt;/blockquote>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nf">__attribute__&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="n">packed&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">aligned&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">)))&lt;/span> &lt;span class="n">Tree_pointers&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">private&lt;/span>&lt;span class="o">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// We only store the exact ID of parent
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">parent&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">CHUNK_SHIFT&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">next_sibling&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">prev_sibling&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Long child pointers
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">first_child_l&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">last_child_l&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Short (delta) child pointers
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// You cannot make an array of bitfields inside a packed
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// struct, since the compiler will align each bitfield to the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// size of the nearest power of two.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_0&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_1&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_2&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_3&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_4&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_5&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_6&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_0&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_1&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_2&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_3&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_4&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_5&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_6&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="build-append---short-delta-heuristic">Build Append - Short Delta Heuristic&lt;/h3>
&lt;p>Empirical observations show that children are often added to a node shortly after the parent, meaning they are stored close to the parent in memory. This allows children to be stored as a delta from the parent, reducing the need for full chunk IDs.&lt;/p>
&lt;p>When adding a child:&lt;/p>
&lt;ul>
&lt;li>Attempt to store the child as a delta from the parent.&lt;/li>
&lt;li>If not feasible, allocate a new chunk for the parent and store the pointer to the child chunk in the newly created parent chunk.&lt;/li>
&lt;/ul>
&lt;p>Implementing chunk breaking required careful handling to ensure that when a parent moves to a new chunk, its new chunk can still be referenced efficiently by its parent, potentially requiring recursive adjustments.&lt;/p>
&lt;blockquote>
&lt;p>This is because the grandparent might not be able to store the parent as a delta from itself after the parent moves to a new chunk.&lt;/p>
&lt;/blockquote>
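&lt;p>The decision made when adding a child can be sketched as a range check on the distance between the two chunk IDs. The 21-bit width comes from the chunk layout. Everything else is an assumption made for illustration: the helper names are hypothetical, and the delta is modelled as an unsigned forward distance (the child chunk is allocated after the parent chunk), which may not match how HHDS actually encodes it.&lt;/p>

```cpp
#include <cstdint>

// Width of a short delta pointer, from the chunk layout.
constexpr unsigned SHORT_DELTA = 21;
// Largest distance that fits in an unsigned 21-bit field (assumed encoding).
constexpr std::uint64_t MAX_SHORT_DELTA = (std::uint64_t{1} << SHORT_DELTA) - 1;

// Hypothetical helper: true when the child chunk lies after the parent chunk
// and is close enough to be stored as a short delta.
constexpr bool fits_short_delta(std::uint64_t parent_chunk, std::uint64_t child_chunk) {
  return child_chunk > parent_chunk && (child_chunk - parent_chunk) <= MAX_SHORT_DELTA;
}

// Hypothetical helper: when the delta does not fit, the parent has to move to a
// freshly allocated chunk that can hold a full 43-bit chunk pointer to the child.
constexpr bool needs_new_parent_chunk(std::uint64_t parent_chunk, std::uint64_t child_chunk) {
  return !fits_short_delta(parent_chunk, child_chunk);
}
```

&lt;p>Under these assumptions a short delta reaches a little over two million chunks ahead of the parent, which covers the common case of children being appended soon after their parent.&lt;/p>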
&lt;h2 id="compliance-with-the-livehd-core-repository">Compliance with the LiveHD core repository&lt;/h2>
&lt;p>Since the HHDS tree is an evolution of the LHTree, it was crucial to maintain compatibility with the LiveHD core repository. All necessary methods were implemented in the HHDS tree to ensure seamless integration. Naming conventions and syntax were kept consistent with the LHTree to facilitate a smooth transition.&lt;/p>
&lt;p>Exposed methods in the HHDS tree are:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="cm">/**
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> * Query based API (no updates)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_parent&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">curr_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_last_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">parent_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_first_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">parent_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">bool&lt;/span> &lt;span class="nf">is_last_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">self_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">bool&lt;/span> &lt;span class="nf">is_first_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">self_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_sibling_next&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_sibling_prev&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">bool&lt;/span> &lt;span class="nf">is_leaf&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">leaf_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm">/**
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> * Update based API (Adds and Deletes from the tree)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// FREQUENT UPDATES
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">append_sibling&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">add_child&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">parent_index&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">add_root&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">delete_leaf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">leaf_index&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">delete_subtree&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">subtree_root&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// INFREQUENT UPDATES
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">insert_next_sibling&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h1 id="benchmarking-results">Benchmarking Results&lt;/h1>
&lt;p>Preliminary benchmarks indicate that the HHDS tree outperforms the LHTree in both runtime efficiency (for certain cases, more on this in a later section) and memory consumption. The HHDS tree demonstrates enhanced performance across various tests, offering a more optimized solution for handling Abstract Syntax Tree (AST) operations.&lt;/p>
&lt;p>I constructed identical trees using both the LHTree and HHDS tree structures and executed a series of queries on each. The benchmarks were performed using Google Benchmark to ensure accurate and consistent results. Below, I detail the specific tests conducted.&lt;/p>
&lt;h3 id="benchmark-tests-overview">Benchmark Tests Overview&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Deep Tree Test&lt;/strong>&lt;br>
This test simulates a line graph by repeatedly adding a child to the last node in the tree. It is designed to assess the tree&amp;rsquo;s performance when handling deep structures, where each node has a single child.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Wide Tree Test&lt;/strong>&lt;br>
In this scenario, a single root node is created, followed by the addition of numerous child nodes directly under the root. This test evaluates the tree&amp;rsquo;s efficiency in managing wide structures with many immediate children.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chip-Typical Tree Test&lt;/strong>&lt;br>
This test models a tree commonly seen in hardware design. For each node, a random number of children (ranging from 1 to 7) is added, and the process is applied recursively to the leaf nodes up to a certain depth. This test measures the tree&amp;rsquo;s performance in realistic, varied conditions.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chip-Typical (Long) Tree Test&lt;/strong>&lt;br>
Similar to the Chip-Typical Tree Test, but with a broader range of children per node (1 to 20). This test is particularly useful for examining performance when the tree is more complex and chunk splitting is more likely.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>These tests provide a comprehensive analysis of the HHDS tree&amp;rsquo;s capabilities, highlighting its superiority over the LHTree for deeper trees.&lt;/p>
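&lt;p>To make the tree shapes concrete, here is a sketch that generates them as plain parent lists. It is independent of both tree implementations and is not the benchmark code itself: the function names, the parent-list representation, and the use of &lt;code>std::mt19937&lt;/code> with a fixed seed are all assumptions made for illustration.&lt;/p>

```cpp
#include <cassert>
#include <random>
#include <vector>

// parent[i] is the index of node i's parent; the root has parent -1.
using ParentList = std::vector<int>;

// Deep tree: a single chain in which every node has exactly one child.
ParentList make_deep_tree(int n) {
  assert(n >= 1);
  ParentList parent(n);
  for (int i = 0; i < n; ++i) parent[i] = i - 1;
  return parent;
}

// Wide tree: one root with n - 1 children directly below it.
ParentList make_wide_tree(int n) {
  assert(n >= 1);
  ParentList parent(n, 0);
  parent[0] = -1;
  return parent;
}

// Chip-typical tree: every node on the current frontier receives a random
// number of children in [1, max_children], repeated for `depth` levels.
// max_children = 7 gives the regular variant and 20 the long variant.
ParentList make_chip_typical_tree(int depth, int max_children, unsigned seed) {
  assert(depth >= 0 && max_children >= 1);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> fanout(1, max_children);
  ParentList parent{-1};
  std::vector<int> frontier{0};
  for (int level = 0; level < depth; ++level) {
    std::vector<int> next;
    for (int node : frontier) {
      const int children = fanout(rng);
      for (int c = 0; c < children; ++c) {
        next.push_back(static_cast<int>(parent.size()));
        parent.push_back(node);
      }
    }
    frontier.swap(next);
  }
  return parent;
}
```

&lt;p>Replaying such a list through &lt;code>add_root&lt;/code> and &lt;code>add_child&lt;/code> on each implementation would build identical trees for a side-by-side comparison.&lt;/p>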
&lt;h2 id="addappend-benchmarks">Add/Append Benchmarks&lt;/h2>
&lt;h3 id="deep-tree-test">Deep Tree Test&lt;/h3>
&lt;blockquote>
&lt;p>&lt;code>test_deep_tree_100_hhds&lt;/code> indicates the time taken to run a benchmark on a deep tree of 100 nodes using the HHDS tree structure. This nomenclature is consistent across all tests.&lt;/p>
&lt;/blockquote>
&lt;h4 id="disabled-compiler-optimizations">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 11704 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 19541 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 85317 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 163058 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 760260 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 1442391 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 9889199 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 16215232 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 84650074 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 163255882 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds 877646208 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh 1659725904 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds 9256118059 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh 1.4431e+10 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 1443 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 1462 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 7398 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 17455 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 79544 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 165656 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 1337406 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 1494153 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 12288324 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 14897463 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds 116810846 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh 188815892 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds 2338596582 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh 2238844395 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The HHDS tree outperforms the LHTree in almost every Deep Tree Test run, which shows its efficiency on deep tree structures. The one exception is the 10,000,000-node case with compiler optimizations enabled, where the LHTree is about 4% faster.&lt;/p>
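&lt;p>To make the comparison concrete, the ratio of LHTree time to HHDS time can be computed for each row of the two Deep Tree tables above. The sketch below copies the measured times from those tables; the &lt;code>Row&lt;/code> struct and the helper functions are illustrative names, not part of the benchmark suite.&lt;/p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row pair from a benchmark table: node count and both times in ns.
struct Row {
  double nodes;
  double hhds_ns;
  double lh_ns;
};

// Add/append Deep Tree Test, compiler optimizations disabled (from the table).
std::vector<Row> deep_tree_unoptimized() {
  return {
      {10.0, 11704.0, 19541.0},
      {100.0, 85317.0, 163058.0},
      {1000.0, 760260.0, 1442391.0},
      {10000.0, 9889199.0, 16215232.0},
      {100000.0, 84650074.0, 163255882.0},
      {1000000.0, 877646208.0, 1659725904.0},
      {10000000.0, 9256118059.0, 1.4431e+10},
  };
}

// Add/append Deep Tree Test, compiler optimizations enabled (from the table).
std::vector<Row> deep_tree_optimized() {
  return {
      {10.0, 1443.0, 1462.0},
      {100.0, 7398.0, 17455.0},
      {1000.0, 79544.0, 165656.0},
      {10000.0, 1337406.0, 1494153.0},
      {100000.0, 12288324.0, 14897463.0},
      {1000000.0, 116810846.0, 188815892.0},
      {10000000.0, 2338596582.0, 2238844395.0},
  };
}

// Speedup of HHDS relative to LH; values above 1.0 mean HHDS was faster.
double hhds_speedup(const Row& row) {
  assert(row.hhds_ns > 0.0);
  return row.lh_ns / row.hhds_ns;
}

// Number of rows in which HHDS was faster than LH.
std::size_t count_hhds_faster(const std::vector<Row>& rows) {
  std::size_t faster = 0;
  for (const Row& row : rows) {
    if (hhds_speedup(row) > 1.0) ++faster;
  }
  return faster;
}
```

&lt;p>Calling &lt;code>count_hhds_faster&lt;/code> on each table gives the number of sizes at which HHDS was faster, and &lt;code>hhds_speedup&lt;/code> gives the per-row ratio. These are single measurements, so small differences between the two trees may be within run-to-run noise.&lt;/p>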
&lt;h3 id="wide-tree-test">Wide Tree Test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-1">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 6581 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 6235 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 34911 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 35734 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 323228 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 312755 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 3547963 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 2975894 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 33800125 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 32538424 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds 332509041 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh 336261868 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds 3527352810 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh 8774024963 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-1">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 837 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 512 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 3394 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 2675 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 26019 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 20141 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 319068 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 245964 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 3369183 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 2910862 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds 39243340 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh 26777306 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds 454508781 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh 331688046 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Without compiler optimizations, the two trees are close at most sizes in the Wide Tree Test, with the LHTree slightly ahead at most sizes up to 100,000 nodes. The HHDS tree is ahead at the two largest sizes: marginally at 1,000,000 nodes and by roughly 2.5 times at 10,000,000 nodes. With compiler optimizations, however, the LHTree is faster than HHDS at every size.&lt;/p>
&lt;blockquote>
&lt;p>The HHDS tree&amp;rsquo;s advantage at large sizes can be attributed to its large chunk size, which allows for better cache utilization and reduced memory overhead. The LH Tree, however, has been tuned more heavily and has been in use for longer, which could explain its better performance with compiler optimizations. In the future, the HHDS tree could be optimized further to match or exceed the LH Tree&amp;rsquo;s performance.&lt;/p>
&lt;/blockquote>
&lt;h3 id="chip-typical-tree-test">Chip Typical Tree Test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-2">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 7109 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 6803 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 22728 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 22064 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 75398 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 70910 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 270062 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 254423 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 1110254 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 1074439 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 5024264 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 3900709 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds/iterations:5 13290739 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh/iterations:5 22145462 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds/iterations:5 83438683 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh/iterations:5 105475664 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-2">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 938 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 387 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 1877 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 1351 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 7095 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 5052 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 35019 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 21569 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 130915 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 78010 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 522385 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 278223 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds/iterations:5 4015636 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh/iterations:5 1648426 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds/iterations:5 9873724 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh/iterations:5 4607773 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For the Chip Typical test without compiler optimizations, the HHDS tree is faster at the two largest sizes (7 and 8), while the LHTree is faster at the smaller ones. With compiler optimizations, the LH Tree is faster than the HHDS tree at every size.&lt;/p>
&lt;h3 id="chip-typical-long-tree-test">Chip Typical (long) Tree test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-3">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 8875 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 8479 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 62490 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 64620 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 625064 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 654787 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 6128047 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 6528778 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 71345448 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 77170587 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds/iterations:5 656595039 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh/iterations:5 860193491 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-3">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 1139 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 692 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 8666 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 5238 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 90856 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 48758 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 1034346 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 472964 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 13040238 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 5025192 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds/iterations:3 131143411 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh/iterations:3 68739573 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>As in the previous case, the HHDS tree performs better in debug mode (without compiler optimizations) at every size except the smallest, while the LH Tree performs better with compiler optimizations.&lt;/p>
&lt;blockquote>
&lt;p>Overall, the HHDS tree performs better without compiler optimizations, whereas the LH Tree performs better with them. The Deep Tree test is the exception: there the HHDS tree is faster in both configurations at almost every size. This indicates an inherent trade-off between the two trees. To investigate this behaviour further, I did some profiling, which is described in a later section.&lt;/p>
&lt;/blockquote>
&lt;h2 id="iterators-benchmarks">Iterators Benchmarks&lt;/h2>
&lt;h3 id="deep-tree-test-1">Deep Tree test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-4">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 884 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 1356 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 7987 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 11191 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 86991 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 105809 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 894127 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 1076983 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 7927102 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 11177187 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds/iterations:4 80470145 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh/iterations:4 145763040 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds/iterations:3 1055529435 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh/iterations:3 995416880 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-4">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 202 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 93.1 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 1595 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 1039 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 15663 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 11000 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 164778 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 107293 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 1615928 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 1260507 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds 19582402 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh 15954697 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds 214887559 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh 179118729 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="wide-tree-test-1">Wide Tree test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-5">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 7171 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 7098 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 6204 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 10372 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 62762 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 106132 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 622999 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 1124283 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 6118490 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 9550170 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds/iterations:10 59438777 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh/iterations:10 97842431 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds/iterations:7 778347697 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh/iterations:7 1163215808 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-5">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 2103 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 1284 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 1563 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 632 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 15627 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 6410 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 149588 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 56030 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 1511278 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 563926 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds 17056051 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh 7754815 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds 143994848 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh 55040231 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="chip-typical-test">Chip typical test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-6">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 344 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 892 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 2192 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 1691 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 13628 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 14235 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 34049 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 84096 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 206482 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 203680 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 848996 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 708212 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds/iterations:5 3645372 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh/iterations:5 6657982 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds/iterations:5 7375050 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh/iterations:5 4577351 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-6">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 93.1 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 50.1 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 149 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 212 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 1166 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 554 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 7385 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 3138 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 54477 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 10643 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 215050 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 53043 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds 492555 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh 577120 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds 2630675 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh 1278702 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="chip-typical-long-test">Chip typical (long) test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-7">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 911 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 1435 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 8161 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 8619 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 76618 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 132467 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 1644808 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 1962406 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 7199648 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 9195894 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds 169002499 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh 207296570 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-7">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 223 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 101 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 2270 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 719 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 38291 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 12547 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 294222 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 187010 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 4721230 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 835256 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds 30302468 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh 10057136 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Overall, the add/append and iterator benchmarks show the same pattern. Without compiler optimizations, the HHDS tree generally performs better than the LH Tree. With compiler optimizations, the LH Tree is faster in most of the traversal benchmarks as well. We will now look at some profiling that was done to identify the bottlenecks in the HHDS tree.&lt;/p>
&lt;h2 id="exceptions-and-a-reminder-of-why-they-are-slow">Exceptions, and a reminder of why they are slow.&lt;/h2>
&lt;p>When looking at the performance difference between the HHDS tree and the LH tree (after enabling compiler optimizations), I was shocked to see that the HHDS tree performed worse than the LH tree by multiple orders of magnitude when exceptions were used. This surprised me, as I had not expected exceptions to have such a large impact on performance.&lt;/p>
&lt;p>This happens because throwing exceptions is slow. When an exception is thrown, the stack is unwound and the program has to jump to the matching catch block. This is a slow process and should be avoided in performance-critical code. Moreover, the compiler cannot optimize code that uses exceptions as well as code that does not. This is why the HHDS tree performs so much worse than the LH tree when exceptions are enabled. Even so, the HHDS tree still wasn&amp;rsquo;t performing as well as it should have been.&lt;/p>
&lt;h1 id="profiling">Profiling&lt;/h1>
&lt;p>I used &lt;code>callgrind&lt;/code> to profile the HHDS tree and identify potential bottlenecks. The profiling results provided valuable insights into the tree&amp;rsquo;s performance and areas for optimization. I generated a call graph using &lt;code>KCachegrind&lt;/code> and analyzed the function calls to determine the most time-consuming operations.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Profiling results" srcset="
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_06163f8afdc871f89387a8c1724d9e28.webp 400w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_731e96b9cf72b9d02381dec918d2530f.webp 760w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_06163f8afdc871f89387a8c1724d9e28.webp"
width="760"
height="683"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The call graph clearly shows that the bottleneck is the &lt;code>_create_space&lt;/code> call that is tasked with creating space for a new node. This function is called when a new node is added to the tree, and its performance directly impacts the tree&amp;rsquo;s efficiency.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">inline Tree_pos _create_space(const X&amp;amp; data) {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Make space for CHUNK_SIZE number of entries at the end
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> data_stack.emplace_back(data);
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> for (int i = 0; i &amp;lt; CHUNK_MASK; i++) {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> data_stack.emplace_back();
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Add the single pointer node for all CHUNK_SIZE entries
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> pointers_stack.emplace_back();
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> return pointers_stack.size() - 1;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, the &lt;code>_create_space&lt;/code> function is relatively simple and should not be causing such a significant performance hit. This indicates that the issue may lie in the memory allocation process or the data structure itself. One possible way of dealing with this would be to increase chunk sizes, or enable dynamic chunk sizing, which would allow for more efficient memory allocation.&lt;/p>
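&lt;p>To make the bookkeeping in &lt;code>_create_space&lt;/code> easier to follow, here is a minimal Python model of it. This is not the HHDS code. It assumes &lt;code>CHUNK_SIZE = 8&lt;/code> (the fixed chunk size mentioned later in this post) and &lt;code>CHUNK_MASK = CHUNK_SIZE - 1&lt;/code>, which the snippet above does not state. The class name &lt;code>ChunkModel&lt;/code> is hypothetical.&lt;/p>

```python
CHUNK_SIZE = 8               # fixed chunk size mentioned later in this post
CHUNK_MASK = CHUNK_SIZE - 1  # assumption: not stated in the snippet above


class ChunkModel:
    """Toy model of the two stacks touched by _create_space."""

    def __init__(self):
        self.data_stack = []
        self.pointers_stack = []

    def create_space(self, data):
        # One real entry followed by CHUNK_MASK empty placeholders.
        self.data_stack.append(data)
        self.data_stack.extend([None] * CHUNK_MASK)
        # One pointer node covers the whole chunk.
        self.pointers_stack.append({})
        return len(self.pointers_stack) - 1


model = ChunkModel()
positions = [model.create_space(f"node{i}") for i in range(3)]
print(positions, len(model.data_stack), len(model.pointers_stack))
```

&lt;p>Under these assumptions, every call grows the data stack by a full chunk and the pointer stack by one entry, so the cost of the call depends mostly on how cheaply those two containers can grow.&lt;/p>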
&lt;p>Another possible bottleneck seems to be the computation needed to find the next vacant space in the chunk (as in &lt;code>get_last_child()&lt;/code>). Because the chunk has a fixed size, the program has to search for the next chunk that has space whenever the current chunk is full. This is a linear operation and can be slow for wide trees. To fix this, I tried adding extra bookkeeping to the &lt;code>Tree_pointers&lt;/code> node structure:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">class __attribute__((packed, aligned(64))) Tree_pointers {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">private:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // We only store the exact ID of parent
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos parent : CHUNK_BITS + CHUNK_SHIFT;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos next_sibling : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos prev_sibling : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Long child pointers
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos first_child_l : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos last_child_l : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Storing the last occupied index in the short delta
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // This is to avoid iterating over all short deltas
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // to find the last occupied index
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> unsigned short last_occupied : CHUNK_SHIFT;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Short (delta) child pointers
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Short_delta first_child_s_0 : SHORT_DELTA;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Short_delta first_child_s_1 : SHORT_DELTA;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, the improvement in performance was marginal after making this change. This indicates that the issue may be more complex and require further investigation. This tree has also been added to the repository, in case a future contributor might be able to make use of it.&lt;/p>
&lt;p>There are other possible bottlenecks that might be coming from storing separate short deltas instead of reducing the size of the delta and packing it into a single large integer type. I will be implementing this idea in the future.&lt;/p>
&lt;h1 id="code-contributions">Code contributions&lt;/h1>
&lt;p>All of my pull requests and code changes were made on the &lt;a href="https://github.com/masc-ucsc/hhds/graphs/contributors" target="_blank" rel="noopener">HHDS repository&lt;/a>. Each contribution has undergone thorough review and been successfully merged into the main repository:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/32" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/32&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/37" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/37&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/38" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/38&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/41" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/41&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/47" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/47&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/48" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/48&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/54" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/54&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Additionally, we are planning to integrate these changes into the LiveHD repository in the near future.&lt;/p>
&lt;h1 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h1>
&lt;p>Working on this project has been a valuable learning experience, particularly in applying core C++ features. I discovered that simple, fundamentally sound optimizations often outperform more complex ones. The greatest challenge for me was steering through the changes to our original plan of action. Thanks to the support and guidance of my mentors, I was able to make it.&lt;/p>
&lt;p>There are still areas where the HHDS tree can be improved to make it more robust. One area of future exploration is dynamic chunk sizing:&lt;/p>
&lt;blockquote>
&lt;p>Dynamic Chunk Sizing: Instead of using fixed 8-sized chunks as we did, we could implement multiple chunk sizes. This would allow users to &amp;ldquo;hint&amp;rdquo; the HHDS tree to use specific chunk types, potentially reducing memory consumption further.&lt;/p>
&lt;/blockquote>
&lt;p>Overall, the HHDS tree has shown promise in handling deep tree structures efficiently. With further optimization and enhancements, it can become a powerful tool for handling complex tree operations.&lt;/p>
&lt;h1 id="acknowledgements">Acknowledgements&lt;/h1>
&lt;p>I would like to thank my mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a> for their guidance and support throughout the project. It would not have been possible without their help. Their insights and mentorship have significantly contributed to my learning and the success of this work.&lt;/p></description></item><item><title>Final Blogpost: HDEval's LLM Benchmarking for HDL Design</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/</link><pubDate>Wed, 21 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/</guid><description>&lt;h1 id="introduction">Introduction&lt;/h1>
&lt;p>Hello everyone! I&amp;rsquo;m Ashwin Bardhwaj, an undergraduate student studying at UC Berkeley. As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd">Micro Architecture Santa Cruz (MASC)&lt;/a> my &lt;a href="https://drive.google.com/file/d/1Fnr85lqrTs7OBohfHfSZI2K3wZU3zJm0/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a> looks to create a suite of benchmark programs for &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a>.&lt;/p>
&lt;p>The goal of this project is to create large-scale Verilog programs in order to benchmark the capability of LLMs to develop HDL code. Throughout this project, I have created 3 large Verilog testbenches: 3-Stage RISC-V Processor, Gameboy Emulator, and Sorts. The benchmark programs will lose their effectiveness if LLMs such as ChatGPT scrape GitHub repositories and learn from them. As a result, the code itself cannot be made public, so this post covers the test report for all 3 of these projects.&lt;/p>
&lt;h1 id="3-stage-risc-v-processor">3 Stage RISC V Processor&lt;/h1>
&lt;p>This is a pipelined RISC processor developed to handle RV32I instructions. A 3-stage processor typically contains Fetch, Decode, and Execute stages. As a result, every instruction takes exactly 3 clock cycles. For this processor, instructions can be formatted as R, I (Load), S (Store), B (Cond), and J (Jump and Link) type instructions. Once a 32-bit instruction is fetched from the location in memory specified by the pc (Program Counter) register, it is sent to be decoded by the &amp;ldquo;decode unit&amp;rdquo;. By decoding an instruction, we can determine the exact operation code, the register locations of the 2 operands (rs1 and rs2), and the destination register (rd) to which the calculated result is written. After decoding, an activation flag is sent to the execution stage, which then accesses the register file at addresses rs1 and rs2 to get the correct operand data. The data and operation are then sent to the ALU to compute the result based on the opcode. The result is then written back into the register file at the rd address, the program counter is incremented, and the next instruction is fetched.&lt;/p>
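&lt;p>Since the Verilog itself is not public, here is a small Python sketch of what the decode step extracts from an R-type instruction. The bit positions follow the standard RV32I R-type layout rather than the project&amp;rsquo;s decode unit, and the helper names &lt;code>bits&lt;/code> and &lt;code>decode_r_type&lt;/code> are hypothetical.&lt;/p>

```python
def bits(value, hi, lo):
    """Return bits hi..lo (inclusive) of value as an unsigned integer."""
    return (value >> lo) % (2 ** (hi - lo + 1))


def decode_r_type(instr):
    """Split a 32-bit R-type instruction into its standard RV32I fields."""
    return {
        "opcode": bits(instr, 6, 0),
        "rd": bits(instr, 11, 7),
        "funct3": bits(instr, 14, 12),
        "rs1": bits(instr, 19, 15),
        "rs2": bits(instr, 24, 20),
        "funct7": bits(instr, 31, 25),
    }


# Encoding of "add x3, x1, x2".
fields = decode_r_type(0x002081B3)
print(fields)
```

&lt;p>For this instruction the sketch yields opcode 0x33, rd 3, rs1 1, and rs2 2, which are the values the execute stage would use to read the register file and write back the result.&lt;/p>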
&lt;p>The prompts for each module in this processor have been generated and tested against the GPT 3 turbo and GPT 4o models as examples. In the RISC V tab of my test report, I have provided the exact prompts and the results after running them on MASC&amp;rsquo;s &lt;a href="https://github.com/masc-ucsc/hdlagent" target="_blank" rel="noopener">HDLAgent&lt;/a> tool, which can access the APIs of many LLMs.&lt;/p>
&lt;h1 id="gameboy-emulator">Gameboy Emulator&lt;/h1>
&lt;p>The Gameboy Emulator is a Verilog implementation of the classic GameBoy console that was widely popular in the 1990s. The main aspects of the GameBoy that this project focused on were the Z80-like CPU, memory objects like RAM, VRAM, and ROM, the PPU (Picture Processing Unit), and other peripherals. The instructions are given to the CISC (variable-length instructions) CPU, where they are decoded and executed based on the details and expectations of that specific instruction. In some cases, timing becomes a concern, and significant effort was made to ensure that instructions can be parsed and run predictably and effectively. Instructions from the ROM may take between 1 and 4 clock cycles to run, depending on the requirements. For example, the instruction &amp;ldquo;LD B, HL&amp;rdquo;, which loads the data found at the 16-bit address given by registers H and L into register B, is a 2-cycle instruction. The first cycle decodes the HL address and fetches the data at that location, while the second cycle takes the new input data and writes it into register B. This requires accurate timing control between different aspects of the GameBoy.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Gameboy Emulator Top Level Wave File" srcset="
/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_1c31333f2eab882478c68b3e4fe07ef4.webp 400w,
/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_afc571aac140f2cd4e9e117826b4bf3a.webp 760w,
/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240821-ashwinbardhwaj/Gameboy_Wave_File_hueac052dcb2b1c9a531ecc9cf3de73e1f_112493_1c31333f2eab882478c68b3e4fe07ef4.webp"
width="760"
height="402"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The Picture Processing Unit is also an integral feature of the GameBoy. Three frames called Background, Window, and Sprite are combined into the classic GameBoy screens we know today. While the Background and Window data are consistently read from the VRAM after certain clock cycle times, the Sprite and sprite attributes are accessed using DMA (Direct Memory Access) from OAM (Object Attribute Memory). This reduces the CPU load and speeds up access to sprite data.&lt;/p>
&lt;h1 id="deliverables">Deliverables&lt;/h1>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>HDEval Test Report&lt;/strong>: The &lt;a href="https://docs.google.com/spreadsheets/d/1vDh_k75h0sG8JGRDDZcdBM4AprVcw9l1/edit?usp=sharing&amp;amp;ouid=102173779464961795129&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">HDEval Test Report&lt;/a> contains the module prompts for each testbench, the results after testing on GPT 3 turbo and 4o, and test cases to ensure code correctness and reliability.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>HDEval Repo&lt;/strong>: &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a> contains the encrypted version of the yaml files that encapsulate the code, prompts, and additional data.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>Given these benchmarks, it is important to track the abilities of LLMs to generate HDL code. Therefore, in addition to GPT 3 turbo and 4o, I would like these benchmarks to be applied to more models so that we can track their growth and stay informed about their effectiveness in HDL and hardware.&lt;/p>
&lt;h1 id="previous-blogs">Previous Blogs&lt;/h1>
&lt;p>Please feel free to check out my previous blogs!&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240611-ashwinbardhwaj/">First Blog&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240718-ashwinbardhwaj/">Midterm Blog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Thank you for reading!&lt;/p></description></item><item><title>Midterm Report : Halfway through medicinal data visualization using PolyPhy/Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ayush-sharma/">Ayush Sharma&lt;/a>, a machine learning engineer and researcher based out of Chandigarh, a beautiful city in Northern India known for its modern architecture and green spaces.
For the last month and a half, I have been working closely with my mentors &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/">Kiran Deol&lt;/a> on the project titled &lt;a href="%5cproject%5cosre24%5cucsc%5cpolyphy">Unveiling Medicine Patterns: 3D Clustering with Polyphy/Polyglot&lt;/a> as part of GSoC 2024.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;p>The project focuses on developing effective clustering algorithms to visualize medicine data in three dimensions using PolyPhy and Polyglot. My journey began with data preprocessing and cleaning, where unnecessary data points were removed, and missing values were addressed.&lt;/p>
&lt;p>One of the primary techniques we&amp;rsquo;ve employed is UMAP (Uniform Manifold Approximation and Projection). UMAP&amp;rsquo;s ability to preserve the global structure of the data while providing meaningful clusters proved advantageous. Initial experiments with UMAP on datasets of various sizes (ranging from 1,500 to 15,000 medicines) provided valuable insights into the clustering patterns. By iteratively halving the dimensions and refining the parameters, we achieved more accurate clustering results.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="UMAP on a dataset of 15000 medicines" srcset="
/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_48eb545fa0673e23a0ff289b6fdac6cd.webp 400w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_12b5cf998e90e476fdd4e6c9800cc63e.webp 760w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_48eb545fa0673e23a0ff289b6fdac6cd.webp"
width="679"
height="603"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To complement UMAP, we explored t-SNE (t-distributed Stochastic Neighbor Embedding). t-SNE&amp;rsquo;s focus on local relationships helped in understanding finer details within the clusters. By adjusting t-SNE parameters and conducting perturbations, we could better comprehend the data&amp;rsquo;s behavior. Combining UMAP with t-SNE in a loop, halving dimensions iteratively, showed promise, allowing us to leverage the strengths of both techniques to enhance clustering accuracy.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="t-SNE on a dataset of 15000 medicines" srcset="
/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_505feb5f73fb8656ef98cfa71acfb53b.webp 400w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_fc473d7fb06ab1b2e2bafbb3b86db867.webp 760w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_505feb5f73fb8656ef98cfa71acfb53b.webp"
width="760"
height="527"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>We also experimented with pre-trained models like BERT and GloVe to create embeddings for the medicines. BERT’s splitting of salts into subparts and GloVe’s limitations in recognizing specific salts led to inaccurate clustering, which we are currently working to improve.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>Moving forward, I will focus on refining our clustering and embedding techniques to enhance overall accuracy. This involves integrating Jaccard distance alongside other distance measures to improve similarity assessments between medicines and clusters. Additionally, I&amp;rsquo;ll continue experimenting with advanced models like GPT, CLIP, and Gemini for better embeddings while addressing the limitations of BERT and GloVe by leveraging custom embeddings created with transformers and one-hot encoding. Optimization of the UMAP and t-SNE algorithms will also be crucial, ensuring their effectiveness in clustering and visualization. These steps aim to overcome current challenges and further advance the project&amp;rsquo;s goals.&lt;/p></description></item><item><title>Midway Through GSoC</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/drishti/20240714-jaytau/</link><pubDate>Wed, 31 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/drishti/20240714-jaytau/</guid><description>&lt;p>Hello everyone! I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joel-tony/">Joel Tony&lt;/a>, and I&amp;rsquo;m excited to share my progress update on the &lt;a href="https://github.com/hpc-io/drishti" target="_blank" rel="noopener">Drishti&lt;/a> project as part of my Google Summer of Code (GSoC) experience. Over the past few weeks, I&amp;rsquo;ve been diving deep into the world of I/O visualization for scientific applications, and I&amp;rsquo;m thrilled to tell you about the strides we&amp;rsquo;ve made.&lt;/p>
&lt;h2 id="what-is-drishti">What is Drishti?&lt;/h2>
&lt;p>For those unfamiliar with Drishti, it&amp;rsquo;s an application used to visualize I/O traces of scientific applications. When running complex scientific applications, understanding their I/O behavior can be challenging. Drishti steps in to parse logs from various sources, with a primary focus on those collected using &lt;a href="https://wordpress.cels.anl.gov/darshan/" target="_blank" rel="noopener">Darshan&lt;/a>, a lightweight I/O characterization tool for HPC applications. Drishti provides human-interpretable insights on how to improve I/O performance based on these logs. While Drishti supports multiple log sources, our current work emphasizes Darshan logs due to their comprehensive I/O information. Additionally, Drishti offers visually appealing and easy-to-understand graphs to help users better grasp their application&amp;rsquo;s I/O patterns, making it easier to identify bottlenecks and optimize performance.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;h3 id="export-directory-feature">Export Directory Feature&lt;/h3>
&lt;p>One of the first features I implemented was the export directory functionality. In earlier versions of Drishti, users couldn&amp;rsquo;t select where they wanted their output files to be saved. This became problematic when working with read-only log locations. I familiarized myself with the codebase, created a pull request, and successfully added this feature, allowing users to choose their preferred output location.&lt;/p>
&lt;h3 id="ci-improvements-and-cross-project-dependencies">CI Improvements and Cross-Project Dependencies&lt;/h3>
&lt;p>While working on Drishti, I discovered the tight coupling between various tools in the HPC I/O organization, such as Drishti and DXT Explorer. This highlighted the need for improved Continuous Integration (CI) practices. We currently run about eight GitHub Actions for each pull request, but they don&amp;rsquo;t adequately test the interactions between different branches of these interconnected tools. This is an area we&amp;rsquo;ve identified for future improvement to ensure smoother integration and fewer conflicts between projects.&lt;/p>
&lt;h3 id="refactoring-for-multi-file-support">Refactoring for Multi-File Support&lt;/h3>
&lt;p>The bulk of my time was spent refactoring Drishti to extend its framework from parsing single Darshan files to handling multiple files. This task was more complex than it initially appeared, as Drishti&amp;rsquo;s insights are based on the contents of each Darshan file. When dealing with multiple files, we needed to find a way to aggregate the data meaningfully without sacrificing performance.&lt;/p>
&lt;p>The original codebase had a single, thousand-line function for parsing Darshan files. To improve this, I implemented a data class structure in Python. This refactoring allows for:&lt;/p>
&lt;ol>
&lt;li>Better separation of computation and condition checking&lt;/li>
&lt;li>Easier parallelization of processing multiple traces&lt;/li>
&lt;li>Finer-grained profiling of performance bottlenecks&lt;/li>
&lt;li>More flexibility in data manipulation and memory management&lt;/li>
&lt;/ol>
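&lt;p>To illustrate the idea, here is a minimal sketch of how a data class can separate computation from condition checking and make per-file work easy to parallelize. It is not Drishti&amp;rsquo;s actual code: &lt;code>TraceStats&lt;/code>, &lt;code>compute_stats&lt;/code>, &lt;code>is_write_heavy&lt;/code>, the field names, and the sample records are all hypothetical stand-ins for data parsed from Darshan logs.&lt;/p>

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TraceStats:
    """Hypothetical per-file summary; the field names are illustrative only."""
    path: str
    read_ops: int
    write_ops: int

    @property
    def total_ops(self) -> int:
        return self.read_ops + self.write_ops


def compute_stats(record: dict) -> TraceStats:
    """Computation step: turn one parsed record into a TraceStats object."""
    return TraceStats(record["path"], record["reads"], record["writes"])


def is_write_heavy(stats: TraceStats, threshold: float = 0.5) -> bool:
    """Condition check, kept separate from the computation above."""
    return stats.total_ops > 0 and stats.write_ops / stats.total_ops > threshold


# Stand-in records; real input would come from parsed Darshan logs.
records = [
    {"path": "a.darshan", "reads": 90, "writes": 10},
    {"path": "b.darshan", "reads": 20, "writes": 80},
]

# Each record is independent, so the computation step can be mapped in parallel.
with ThreadPoolExecutor() as pool:
    all_stats = list(pool.map(compute_stats, records))

flagged = [s.path for s in all_stats if is_write_heavy(s)]
print(flagged)
```

&lt;p>Because each file gets its own small object, individual steps can be timed or tested in isolation, which is where the finer-grained profiling and flexibility come from.&lt;/p>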
&lt;h2 id="learnings-and-skills-gained">Learnings and Skills Gained&lt;/h2>
&lt;p>Through this process, I&amp;rsquo;ve gained valuable insights into:&lt;/p>
&lt;ol>
&lt;li>Refactoring large codebases&lt;/li>
&lt;li>Understanding and improving cross-project dependencies&lt;/li>
&lt;li>Implementing data classes in Python for better code organization&lt;/li>
&lt;li>Balancing performance with code readability and maintainability&lt;/li>
&lt;/ol>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>As I move forward with the project, my focus will be on:&lt;/p>
&lt;ol>
&lt;li>Adding unit tests for individual methods to ensure functionality&lt;/li>
&lt;li>Exploring alternative data frame implementations like Polars for better performance&lt;/li>
&lt;li>Developing aggregation methods for different types of data across multiple Darshan files&lt;/li>
&lt;li>Optimizing memory usage and computational efficiency for large datasets&lt;/li>
&lt;/ol>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Working on Drishti has been an incredible learning experience. I&amp;rsquo;ve had the opportunity to tackle real-world challenges in scientific computing and I/O visualization. As we progress, I&amp;rsquo;m excited about the potential impact of these improvements on the scientific community&amp;rsquo;s ability to optimize their applications&amp;rsquo; I/O performance.&lt;/p>
&lt;p>I&amp;rsquo;m grateful for this opportunity and looking forward to the challenges and discoveries that lie ahead in the second half of my GSoC journey. Stay tuned for more updates as we continue to enhance Drishti!&lt;/p>
&lt;p>If you have any questions or would like to learn more about the project, feel free to &lt;a href="https://www.jaytau.com/#contact?ref=uc-ospo" target="_blank" rel="noopener">reach out to me&lt;/a>. Let&amp;rsquo;s keep pushing the boundaries of scientific computing together!&lt;/p></description></item><item><title>Streaming into the Future: Adding Real-Time Processing to FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/fastensor/20240730-aditya_narayan/</link><pubDate>Tue, 30 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/fastensor/20240730-aditya_narayan/</guid><description>&lt;p>Hey there, HPC enthusiasts and fellow coders! I&amp;rsquo;m excited to share my progress on this summer&amp;rsquo;s Google Summer of Code project under UC OSPO&amp;rsquo;s FasTensor.
Here&amp;rsquo;s a glimpse into how we&amp;rsquo;re pushing the boundaries of real-time data processing.&lt;/p>
&lt;h2 id="the-big-picture-fastensor-and-hpc-challenges">The Big Picture: FasTensor and HPC Challenges&lt;/h2>
&lt;p>First, a quick refresher: FasTensor is our go-to tool for handling dense arrays in scientific computing. It tackles three major HPC challenges:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Optimizing computations&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Distributing data efficiently&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Balancing workloads across computing cores&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>FasTensor excels at these tasks, especially when dealing with data that has structural locality - a common feature in scientific computing. Here, the Stencil computations come in handy, capturing data locality for operations like solving partial differential equations in physical simulations.&lt;/p>
&lt;h3 id="the-mission-bringing-fastensor-into-real-time">The Mission: Bringing FasTensor into Real-Time&lt;/h3>
&lt;p>While FasTensor is great at processing existing data, the next frontier is handling live data streams from scientific instruments and sensors. That&amp;rsquo;s where my GSoC project comes in: adding stream processing capabilities to FasTensor.&lt;/p>
&lt;h2 id="progress-highlights">Progress Highlights:&lt;/h2>
&lt;h3 id="building-a-stream-simulator">Building a Stream Simulator&lt;/h3>
&lt;p>We&amp;rsquo;ve created FTstream, a nifty tool that simulates data streams. It can generate streams of various sizes and intervals, pushing the limits of what your disk can handle. We&amp;rsquo;re talking speeds up to 2.5 GiB/s on a non-parallel NVMe! This tool is crucial because many scientific instruments, from particle accelerators to radio telescopes, generate massive amounts of data at incredible speeds, and we need to be able to simulate that. For context, that&amp;rsquo;s faster than a 10MP RGB camera shooting at 35 frames per second, which generates data at ~1 GiB/s.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="FTStream: a stream simulator" srcset="
/report/osre24/lbl/fastensor/20240730-aditya_narayan/ftstream_hu742d341e5ed79c96d79ca4fdb4fe00ee_107361_e1ff5502d16324d112780cafc587c0bb.webp 400w,
/report/osre24/lbl/fastensor/20240730-aditya_narayan/ftstream_hu742d341e5ed79c96d79ca4fdb4fe00ee_107361_9ecceb72d631078c6b5109deaefeb0f5.webp 760w,
/report/osre24/lbl/fastensor/20240730-aditya_narayan/ftstream_hu742d341e5ed79c96d79ca4fdb4fe00ee_107361_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/fastensor/20240730-aditya_narayan/ftstream_hu742d341e5ed79c96d79ca4fdb4fe00ee_107361_e1ff5502d16324d112780cafc587c0bb.webp"
width="760"
height="410"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
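&lt;p>As a rough check of the camera comparison above, the figure can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes an uncompressed stream with 3 bytes per RGB pixel (8 bits per channel), which the post does not state, and the function name is made up for illustration.&lt;/p>

```python
# Back-of-the-envelope data rate for an uncompressed RGB camera stream.
# Assumption (not from the post): 8 bits per channel, i.e. 3 bytes per pixel.
PIXELS_PER_FRAME = 10_000_000   # 10 MP
BYTES_PER_PIXEL = 3             # R, G, B at 1 byte each (assumed)
FRAMES_PER_SECOND = 35
GIB = 2 ** 30


def camera_rate_gib_per_s(pixels, bytes_per_pixel, fps):
    """Return the raw stream rate in GiB/s."""
    return pixels * bytes_per_pixel * fps / GIB


rate = camera_rate_gib_per_s(PIXELS_PER_FRAME, BYTES_PER_PIXEL, FRAMES_PER_SECOND)
print(f"{rate:.2f} GiB/s")
```

&lt;p>Under these assumptions the rate comes out just under 1 GiB/s, consistent with the approximate figure quoted above.&lt;/p>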
&lt;h3 id="optimizing-io-strategies">Optimizing I/O Strategies&lt;/h3>
&lt;p>We&amp;rsquo;ve been experimenting with various I/O approaches to optimize high-speed data stream handling.&lt;/p>
&lt;h3 id="exploring-streaming-semantics">Exploring Streaming Semantics&lt;/h3>
&lt;p>We&amp;rsquo;re investigating various ways to express and execute stream transformations, to ensure that FasTensor can handle a wide range of streaming computations.&lt;/p>
&lt;h3 id="developing-io-drivers">Developing I/O Drivers&lt;/h3>
&lt;p>We&amp;rsquo;ve developed two new I/O drivers based on LinuxAIO and MPI IO to ingest incoming data smoothly and maintain stream consistency.&lt;/p>
&lt;h2 id="whats-next">What&amp;rsquo;s Next?&lt;/h2>
&lt;h3 id="putting-it-all-together">Putting It All Together&lt;/h3>
&lt;p>We&amp;rsquo;re in the final stretch of integrating all these components into a seamless stream processing system.&lt;/p>
&lt;h3 id="rigorous-testing">Rigorous Testing&lt;/h3>
&lt;p>We&amp;rsquo;ll push our stream processing to its limits, simulating diverse data flows to ensure rock-solid performance in any scientific setting.&lt;/p>
&lt;h3 id="hpc-environment-validation">HPC Environment Validation&lt;/h3>
&lt;p>The ultimate test will be running our new streaming capabilities in real HPC environments, checking how they perform with different I/O setups and computing paradigms.&lt;/p>
&lt;h2 id="wrapping-up">Wrapping Up&lt;/h2>
&lt;p>This summer has been a whirlwind of coding, testing, and learning. We&amp;rsquo;re making significant strides in bringing real-time processing capabilities to FasTensor, which could open up exciting new possibilities in scientific computing and data analysis.
Stay tuned for more updates as we finalize this feature. If you&amp;rsquo;re interested in the nitty-gritty technical details or want to check out the code, feel free to reach out or check our project repository.
Happy coding, and may your computations be ever faster!&lt;/p></description></item><item><title>Enhancing h5bench with HDF5 Compression Capability</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/h5bench/20240731-henryz/</link><pubDate>Sat, 27 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/h5bench/20240731-henryz/</guid><description>&lt;h1 id="introduction">Introduction&lt;/h1>
&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/h5bench">h5bench&lt;/a> project, my project &lt;a href="https://summerofcode.withgoogle.com/myprojects/details/n0H28Z40" target="_blank" rel="noopener">Enhancing h5bench with HDF5 Compression Capability&lt;/a>, under the mentorship of Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and Dr. Suren Byna, aims to let h5bench users incorporate compression features in their simulations by creating custom benchmarks with common scientific lossless and lossy compression algorithms such as SZ, SZ3, ZFP, and GZIP.&lt;/p>
&lt;p>The goal is to implement multiple data compression algorithms in h5bench&amp;rsquo;s core access patterns through HDF5 filters. This capability should give users the flexibility to configure the parameters and methods of compression applied to their datasets according to their specific needs and preferences. My solution primarily uses the user-defined HDF5 filter mechanism to implement lossless and lossy compression algorithms such as ZFP, SZ, and cuSZ. Over the course of the project, I will deliver one C source file implementing the compression configuration settings, one C source file implementing the lossless and lossy algorithms, a set of performance reports before and after data compression (as CSV and standard output files), and technical documentation on the h5bench user manual website.&lt;/p>
&lt;h1 id="midterm-blog">Midterm Blog&lt;/h1>
&lt;p>This summer, after completing my junior year, I was honored to have the opportunity to work with Dr. Jean Luca Bez and Dr. Suren Byna on h5bench, an open-source benchmarking project designed to simulate running sync/async HDF5 I/O on HPC machines. This post mostly covers what I have learned, produced, and planned, along with my thoughts, over the first six weeks.&lt;/p>
&lt;p>First of all, let&amp;rsquo;s define some terms. HDF5 stands for Hierarchical Data Format version 5. Unlike other data storage formats (JSON, CSV, XML&amp;hellip;), HDF5 is not only a container that manages data much like a file system, but also a powerful library that lets you perform I/O (input/output) operations between memory and file. One of the reasons it is commonly used by HPC applications is that it also supports MPI I/O, the parallel I/O interface of the MPI standard (you can think of it as the parallel version of POSIX I/O). With exabytes of data and frequent access for analysis in scientific studies, HDF5 is well suited to the job. Essentially, h5bench is software that tests the hardware&amp;rsquo;s performance through HDF5 (it also provides other benchmark kernels such as AMReX, E3SM-IO, MACSio, and openPMD-api, but my work focuses on vanilla HDF5 I/O).&lt;/p>
&lt;p>So, what have I done so far? First, my job is to let users tune input parameters for data compression and to make sure h5bench prints accurate benchmark results with the intended compression algorithm applied to their datasets. h5bench&amp;rsquo;s frontend is written in Python: it takes a JSON file from the user and parses it into a CFG configuration file that is later read by the backend, which is written in C. I created a new enum so that users can specify one of a range of compression algorithms (SZ3, ZFP, LZ4, GZIP, and other predefined algorithms). I also made it possible to apply these algorithms to the datasets, so the .h5 file (an HDF5 file) contains chunks of compressed data after multiple H5Dwrite calls.&lt;/p>
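&lt;p>To make the idea of chunk-wise compression concrete, here is a minimal sketch that is independent of h5bench and HDF5. It uses Python&amp;rsquo;s standard &lt;code>zlib&lt;/code> module (the DEFLATE algorithm that GZIP is built on) to compress fixed-size chunks of a buffer and report a compression ratio. The helper name, chunk size, and sample data are hypothetical and chosen only for illustration.&lt;/p>

```python
import zlib


def compress_chunks(data: bytes, chunk_size: int, level: int = 6):
    """Compress data chunk by chunk and return the list of compressed chunks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [zlib.compress(chunk, level) for chunk in chunks]


# Highly repetitive sample data (1 MiB), so it compresses well.
raw = bytes(range(256)) * 4096
compressed = compress_chunks(raw, chunk_size=64 * 1024)

# Compression ratio = uncompressed size / compressed size.
compressed_size = sum(len(c) for c in compressed)
ratio = len(raw) / compressed_size
print(f"chunks={len(compressed)} ratio={ratio:.1f}")

# Lossless round trip: decompressing every chunk restores the original bytes.
restored = b"".join(zlib.decompress(c) for c in compressed)
```

&lt;p>Real scientific data rarely compresses as well as this synthetic buffer, and lossy compressors such as SZ3 and ZFP trade accuracy for a higher ratio, so the number printed here says nothing about h5bench results.&lt;/p>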
&lt;p>Next, the challenges and gains. Over the first six weeks, about 30% of my time went to understanding the newest version of h5bench and HDF5 by reading through C source code and documentation, and asking my mentors many dumb questions (thanks to them for their patience and great answers :D). Writing code is fairly easy once I really understand what the program is doing, which means understanding every line in almost all functions and how every variable changes. Another 40% of the time went to debugging and testing the compression algorithms, mainly SZ3. Making code behave correctly is another level of difficulty. Most of the issues came from failing to configure the application and its dependent libraries correctly: without the necessary macros enabled during the build process, features like the compression filter plugin will not run. As I was also new to CMake and HPC environments, I learned that environment variables I set are reset for every new session, even if you have requested a compute node. Besides getting used to the standard build sequence (&amp;ldquo;cmake ..&amp;rdquo;, &amp;ldquo;make&amp;rdquo;, &amp;ldquo;make install&amp;rdquo;), I also learned to use &amp;ldquo;ccmake ..&amp;rdquo; to examine the flags of the compiled program. The rest of the time, I learned more about parallel computing, HDF5, and compression algorithms by reading papers and documentation. I took a lot of notes (I must say a good note-taking system is a game changer). Last but not least, I also spent time syncing with my mentors, online and offline, to discuss problems. Without their help, I could never have made it this far.&lt;/p>
&lt;p>My next phase will tackle the following problems:&lt;/p>
&lt;ul>
&lt;li>Test applying the filter with other compression algorithms and with different dimension layouts of the dataset&lt;/li>
&lt;li>Add decompression capability&lt;/li>
&lt;li>Allow users to tune the auxiliary parameters (&lt;code>cd_nelmts&lt;/code> and &lt;code>cd_values[]&lt;/code>) that control the behavior of a given compression filter; they are currently passed as &lt;code>0&lt;/code> and &lt;code>NULL&lt;/code> in &lt;code>H5Pset_filter(COMPRESS_INFO.dcpl_id, H5Z_FILTER_SZ3, H5Z_FLAG_MANDATORY, 0, NULL);&lt;/code>&lt;/li>
&lt;li>Print additional benchmark results to indicate what and how the compression filter is applied, and the compression ratio&lt;/li>
&lt;/ul></description></item><item><title>Data Engineering and Automated Evaluation for OpenROAD's Chat Assistant: Midterm Update</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/</link><pubDate>Sun, 21 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/</guid><description>&lt;p>Hello everyone! We&amp;rsquo;ve reached the halfway point of our Google Summer of Code 2024 journey, and it&amp;rsquo;s time for an update on our project to build a conversational chat assistant for OpenROAD. Under the guidance of our mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>, we&amp;rsquo;re making significant strides in enhancing OpenROAD&amp;rsquo;s user support capabilities.&lt;/p>
&lt;h2 id="project-focus">Project Focus&lt;/h2>
&lt;p>My project focuses on two crucial aspects of our chat assistant:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Data Engineering&lt;/strong>: Ensuring our assistant has access to comprehensive and relevant information.&lt;/li>
&lt;li>&lt;strong>Evaluation&lt;/strong>: Developing robust methods to assess and improve the assistant&amp;rsquo;s performance.&lt;/li>
&lt;/ol>
&lt;p>The ultimate goal is to create a more responsive and accurate chat assistant capable of aiding users with troubleshooting, installation, and general queries about OpenROAD. I&amp;rsquo;m working in tandem with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, who is developing the RAG architecture for our assistant.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>Since our initial deployment, I&amp;rsquo;ve been concentrating on implementing automated evaluation systems for our RAG architecture. We&amp;rsquo;ve developed two primary evaluation methods:&lt;/p>
&lt;h3 id="basic-abbreviation-evaluation">Basic Abbreviation Evaluation&lt;/h3>
&lt;p>This method assesses the model&amp;rsquo;s ability to accurately identify and explain common abbreviations used within the OpenROAD community. It ensures that our assistant can effectively communicate using domain-specific terminology.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 1: Flow Chart of Basic Abbreviation Evaluation" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_7793f2944668d59749f48f3848acfba7.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_c0340ef0448a8f440bce5566986a10ef.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_7793f2944668d59749f48f3848acfba7.webp"
width="469"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Examples" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_f04196ec40b94ffced2a574cbd37ad44.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_1a776103bd42be9525343172ad16d2a2.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_f04196ec40b94ffced2a574cbd37ad44.webp"
width="760"
height="431"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="llm-judge-based-evaluation">LLM Judge-Based Evaluation&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 2: Flow Chart of LLM Judge-Based Evaluation" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_8dfc4bba33d8ad8d797f27f1c7a1eaaf.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_6ef7c0153c7e61298bbf98aa15f5d69d.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_8dfc4bba33d8ad8d797f27f1c7a1eaaf.webp"
width="689"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>For this more comprehensive evaluation, we:&lt;/p>
&lt;ol>
&lt;li>Prepared a dataset of question-answer pairs relevant to OpenROAD.&lt;/li>
&lt;li>Queried our model with these questions to generate answers.&lt;/li>
&lt;li>Employed LLMs (including GPT-4o and Gemini 1.5 Flash) to act as judges.&lt;/li>
&lt;li>Evaluated our model&amp;rsquo;s responses against ground truth answers.&lt;/li>
&lt;/ol>
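&lt;p>The four steps above can be sketched as a small evaluation loop. This is a toy illustration, not the project&amp;rsquo;s code: the dataset rows, &lt;code>toy_model&lt;/code>, and &lt;code>toy_judge&lt;/code> are hypothetical stand-ins, and the judge here is a plain string comparison where a real setup would prompt an LLM with the question, the candidate answer, and the ground truth.&lt;/p>

```python
from typing import Callable

# Hypothetical ground-truth pairs; not taken from the project's dataset.
DATASET = [
    {"question": "What does CTS stand for?", "answer": "Clock Tree Synthesis"},
    {"question": "What does DRC stand for?", "answer": "Design Rule Check"},
]


def toy_model(question: str) -> str:
    """Stand-in for the chat assistant under test."""
    canned = {"What does CTS stand for?": "Clock Tree Synthesis"}
    return canned.get(question, "I am not sure.")


def toy_judge(question: str, candidate: str, reference: str) -> bool:
    """Stand-in for an LLM judge; a real judge would be prompted with all three strings."""
    return candidate.strip().lower() == reference.strip().lower()


def evaluate(dataset, model: Callable, judge: Callable) -> float:
    """Query the model, let the judge compare each answer to the ground truth,
    and return the fraction of answers the judge accepts."""
    verdicts = [
        judge(row["question"], model(row["question"]), row["answer"])
        for row in dataset
    ]
    return sum(verdicts) / len(verdicts)


accuracy = evaluate(DATASET, toy_model, toy_judge)
print(f"accuracy={accuracy:.2f}")
```

&lt;p>Keeping the model and the judge as plain callables makes it easy to swap in different judge models and compare their verdicts on the same dataset.&lt;/p>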
&lt;p>Here&amp;rsquo;s a glimpse of our early benchmark results:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Benchmark" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_06ea37525851a60dad5bd072a03cd329.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_d9a11b8b08e2634c01f9063cc78ab134.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_06ea37525851a60dad5bd072a03cd329.webp"
width="760"
height="701"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_f63055fd0281e09d0ef800e1e444c7f9.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_91c683a3ebadbf3ce5a21099a81b1836.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_f63055fd0281e09d0ef800e1e444c7f9.webp"
width="577"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="exploratory-data-analysis-eda-on-github-openroad-issues">Exploratory Data Analysis (EDA) on GitHub OpenROAD issues&lt;/h2>
&lt;p>To gather more data, I performed Exploratory Data Analysis (EDA) on GitHub OpenROAD issues using GitHub&amp;rsquo;s GraphQL API. This allowed us to:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Filter data based on parameters such as:&lt;/p>
&lt;ul>
&lt;li>Minimum number of comments&lt;/li>
&lt;li>Date range&lt;/li>
&lt;li>Mentioned PRs&lt;/li>
&lt;li>Open or closed status&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Structure the data, focusing on issues tagged with Build, Query, Installation, and Runtime.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Process the data into JSONL format with key fields including:&lt;/p>
&lt;ul>
&lt;li>&lt;code>url&lt;/code>: URL of the GitHub issue&lt;/li>
&lt;li>&lt;code>id&lt;/code>: Unique issue number&lt;/li>
&lt;li>&lt;code>title&lt;/code>: Issue title&lt;/li>
&lt;li>&lt;code>author&lt;/code>: Username of the issue creator&lt;/li>
&lt;li>&lt;code>description&lt;/code>: Initial issue description&lt;/li>
&lt;li>&lt;code>content&lt;/code>: Array of messages related to the issue&lt;/li>
&lt;li>&lt;code>category&lt;/code>: General category of the issue&lt;/li>
&lt;li>&lt;code>subcategory&lt;/code>: More specific category of the issue&lt;/li>
&lt;li>&lt;code>tool&lt;/code>: Relevant tools or components&lt;/li>
&lt;li>&lt;code>date&lt;/code>: Issue creation timestamp&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
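&lt;p>As a rough illustration of the JSONL layout described above, the sketch below writes and reads one record with the listed fields. All values are placeholders rather than a real issue, and the shape of each entry in &lt;code>content&lt;/code> (an author plus a message) is an assumption.&lt;/p>

```python
import io
import json

# Field names as listed above.
FIELDS = ["url", "id", "title", "author", "description",
          "content", "category", "subcategory", "tool", "date"]

# Placeholder values for illustration only; not a real issue.
record = {
    "url": "https://example.com/issues/1",
    "id": 1,
    "title": "Example build failure",
    "author": "example-user",
    "description": "Build fails during configuration.",
    # Assumed message shape: one dict per comment.
    "content": [{"author": "example-maintainer", "message": "Which OS are you on?"}],
    "category": "Build",
    "subcategory": "Configuration",
    "tool": "example-tool",
    "date": "2024-01-01T00:00:00Z",
}


def write_jsonl(records, stream):
    """Write one JSON object per line."""
    for rec in records:
        stream.write(json.dumps(rec) + "\n")


def read_jsonl(stream):
    """Parse a JSONL stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


buffer = io.StringIO()
write_jsonl([record], buffer)
buffer.seek(0)
loaded = read_jsonl(buffer)
print(sorted(loaded[0]) == sorted(FIELDS))
```

&lt;p>Because each line is an independent JSON object, records can be appended or streamed one at a time without loading the whole file.&lt;/p>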
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 5: Sample structure of our processed JSONL data" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_fd103ea5ef1fa131b8bc806db99a24d1.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_c30d5d185fec144cfca686499f464f19.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_fd103ea5ef1fa131b8bc806db99a24d1.webp"
width="692"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>After curating this dataset, I analyzed the OpenROAD GitHub issues and visualized the distribution of issue categories as a pie chart.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 6: Distribution of OpenROAD issue types" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_d788906f5395b26ab2030fb056e45941.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_ebae2b4145d035c9521679314911236b.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_d788906f5395b26ab2030fb056e45941.webp"
width="760"
height="504"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 7: Breakdown of issues by specific OpenROAD tools" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_3af195a89fadc1379452709cdea50d22.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_e171fcc132e7c13ef62f2a192ed18b62.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_3af195a89fadc1379452709cdea50d22.webp"
width="760"
height="511"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="looking-ahead">Looking Ahead&lt;/h2>
&lt;p>As we move into the second half of the GSoC period, our plans include:&lt;/p>
&lt;ul>
&lt;li>Incorporating GitHub Discussions data into our knowledge base.&lt;/li>
&lt;li>Utilizing this expanded dataset to enhance our RAG architecture.&lt;/li>
&lt;li>Continually refining and improving our model&amp;rsquo;s performance based on evaluation results.&lt;/li>
&lt;/ul>
&lt;p>We&amp;rsquo;re excited about the progress we&amp;rsquo;ve made and look forward to delivering an even more capable and helpful chat assistant for the OpenROAD community. Stay tuned for more updates as we continue this exciting journey!&lt;/p></description></item><item><title>Hardware Hierarchical Dynamical Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/</link><pubDate>Sat, 20 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/</guid><description>&lt;p>Hi everyone! I am Ujjwal Shekhar, a Computer Engineering student at the International Institute of Information Technology - Hyderabad. I am excited to share my current progress on the project titled &amp;ldquo;Hardware Hierarchical Dynamical Systems&amp;rdquo; as part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/osre/">Open Source Research Experience (OSRE) program&lt;/a> and &lt;a href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener">Google Summer of Code&lt;/a>. I am working with my mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>, on this project.&lt;/p>
&lt;h1 id="project-overview">Project Overview&lt;/h1>
&lt;p>It is not uncommon for the size of the code that hardware compilers need to handle to run into the millions. We aim to improve the efficiency of the tree data structure used to represent the Abstract Syntax Tree (AST) of the input program. The tree data structure is optimized for typical AST traversals and queries, some of which are much more frequent than others.&lt;/p>
&lt;p>Thus, the goal of this project is to be able to optimize the tree for frequent queries while still providing support for other infrequent queries. We use Google Bench to benchmark the tree for scalability and performance and expect it to outperform the current version of the tree. Finally, the new version of the tree will be integrated into the LiveHD core repository.&lt;/p>
&lt;h1 id="progress-and-challenges">Progress and Challenges&lt;/h1>
&lt;p>Over the past month and a half, I have successfully finished working on the add/append methods of the tree. Moreover, I have finished writing the iterators on the tree too. There are preliminary tests already in place and the HHDS repository now has a working Bazel build system.&lt;/p>
&lt;p>As shown in the figure, we can see that the tree went from storing pointers to everything that it could to only storing pointers to the nodes that are absolutely necessary. Moreover, by not maintaining multiple levels in the tree, we have been able to reduce the memory footprint of the tree. This is a significant improvement from the LHtree that was being used earlier.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Gradual improvements from a classical way of storing the tree" srcset="
/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/intro_pic_osre_mid_term_blog_hub4e071864eb1075945911ddf73245a76_149696_ee6f0bd20e8720764ff6513360229a8a.webp 400w,
/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/intro_pic_osre_mid_term_blog_hub4e071864eb1075945911ddf73245a76_149696_3c453b165782d0f857e35219165cfb9c.webp 760w,
/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/intro_pic_osre_mid_term_blog_hub4e071864eb1075945911ddf73245a76_149696_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/intro_pic_osre_mid_term_blog_hub4e071864eb1075945911ddf73245a76_149696_ee6f0bd20e8720764ff6513360229a8a.webp"
width="760"
height="495"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Furthermore, we have also been able to improve the cache friendliness of each node of the tree. Since new children are usually added soon after their parent, we store the children in contiguous memory whenever possible, or access them using a shorter delta from the parent node. This significantly improves the cache friendliness of the tree by packing the bookkeeping for up to 8 children into a single 512-bit word. A 512-bit word is 64 bytes, which matches a common CPU cache line size, so each chunk can be aligned to a cache line.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Bookkeeping in a 512-bit Tree_pointer word" srcset="
/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/tree_pointers_pic_osre_mid_term_blog_hu51661e0f756b9d3138ea612a8422b121_152644_0cacd0b06553a3da1528ebbf73c1d4c5.webp 400w,
/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/tree_pointers_pic_osre_mid_term_blog_hu51661e0f756b9d3138ea612a8422b121_152644_fb9fc4bd787dfbae81d4afd9a5d85401.webp 760w,
/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/tree_pointers_pic_osre_mid_term_blog_hu51661e0f756b9d3138ea612a8422b121_152644_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240720-ujjwalshekhar/tree_pointers_pic_osre_mid_term_blog_hu51661e0f756b9d3138ea612a8422b121_152644_0cacd0b06553a3da1528ebbf73c1d4c5.webp"
width="760"
height="186"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="highlights">Highlights&lt;/h2>
&lt;ul>
&lt;li>Finished working on the add/append methods of the tree.&lt;/li>
&lt;li>Finished writing the iterators on the tree.&lt;/li>
&lt;li>Preliminary tests are in place.&lt;/li>
&lt;li>HHDS repository now has a working Bazel build system.&lt;/li>
&lt;/ul>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;ul>
&lt;li>Working out a new plan: The initial plan was to use a flattening policy to optimize the tree for frequent queries. However, this plan has been revised: instead of a tour-based flattening policy, we flatten the tree while still storing pointers to various nodes. This ensures that the tree can still support infrequent queries.&lt;/li>
&lt;li>Benchmarking: The benchmarking of the tree is still in progress. I am working on creating a benchmarking suite that will be able to test the tree for scalability and performance. This will allow future developers to test the tree for performance and scalability after they make changes.&lt;/li>
&lt;/ul>
&lt;h1 id="next-steps">Next Steps&lt;/h1>
&lt;p>From here, a lot of testing and benchmarking is still left to be done. Moreover, we need to add the delete methods and make sure that the integration with the LiveHD core repository is smooth. The next steps involve:&lt;/p>
&lt;ul>
&lt;li>Adding the delete methods to the tree.&lt;/li>
&lt;li>Benchmarking the tree for scalability and performance.&lt;/li>
&lt;li>Ensuring that the syntax of the tree is in line with the LiveHD core repository.&lt;/li>
&lt;li>Integrating the tree into the LiveHD core repository.&lt;/li>
&lt;li>Adding documentation to the tree.&lt;/li>
&lt;li>Integrating the testing of the tree into the LiveHD testing suite.&lt;/li>
&lt;/ul>
&lt;h1 id="conclusions">Conclusions&lt;/h1>
&lt;p>My experience so far has been amazing. I have been able to work on a project that is at the intersection of hardware and software. Moreover, I have been able to work with a team that is very supportive and has been able to guide me through the project. I am looking forward to the next steps and am excited to see the final version of the tree in the LiveHD core repository.&lt;/p>
&lt;h1 id="acknowledgements">Acknowledgements&lt;/h1>
&lt;p>I would like to thank my mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a> for their guidance and support throughout the project. It would not have been possible without their help.&lt;/p></description></item><item><title>Architecture Updates - LLM Assistant for OpenROAD</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240719-palaniappan-r/</link><pubDate>Fri, 19 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240719-palaniappan-r/</guid><description>&lt;p>Hi again! I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, a GSoC contributor working on the OpenROAD chat assistant project under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>. My project aims to build an LLM-powered chat assistant designed to provide seamless access to existing online resources, thereby reducing support overhead. Over the past month, I&amp;rsquo;ve been collaborating with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aviral-kaintura/">Aviral Kaintura&lt;/a>, on data engineering to deliver on our common project goal of an OpenROAD assistant and an open-EDA dataset that promotes further research and collaboration.&lt;/p>
&lt;h3 id="progress">Progress&lt;/h3>
&lt;p>The retrieval architecture is at the heart of any retrieval-augmented generation (RAG) setup. Our current setup employs a hybrid-search technique, combining a traditional keyword search method with more advanced vector search methods. As illustrated in the diagram, we combine a simple semantic search, a Maximal Marginal Relevance (MMR) search and a text-based BM25 ranking technique to build our hybrid retriever.&lt;/p>
&lt;div class="mermaid">flowchart LR
id0([Query]) --> id1
id1([Vectorstore]) --- id2([Semantic Retriever])
id1([Vectorstore]) --- id3([MMR Retriever])
id1([Vectorstore]) --- id4([BM25 Retriever])
id2([Semantic Retriever]) -- Retrieved Docs ---> id5([Reranking])
id3([MMR Retriever]) -- Retrieved Docs ---> id5([Reranking])
id4([BM25 Retriever]) -- Retrieved Docs ---> id5([Reranking])
id5([Reranking]) ---> id6(top-n docs)
&lt;/div>
&lt;p>Upon receiving a query, relevant documents are sourced from each retriever, resulting in a broad set of results. We feed these results into a cross-encoder re-ranker model to get the &lt;code>top-n&lt;/code> documents with maximum relevance.&lt;/p>
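&lt;p>The retrieve-then-rerank flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the ORAssistant implementation: the toy corpus, the bag-of-words cosine score standing in for vector search, the term-overlap score standing in for BM25, and the &lt;code>rerank&lt;/code> function standing in for a cross-encoder are all assumptions made for the example, and the MMR retriever is left out for brevity.&lt;/p>

```python
import math
from collections import Counter

# Toy corpus (hypothetical snippets, not the real dataset).
DOCS = [
    "global_placement places standard cells across the die",
    "detailed_route performs detailed routing of the design",
    "install OpenROAD by building from source with cmake",
    "report_checks in OpenSTA reports timing paths",
]


def tokenize(text):
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_retriever(query, docs, k=2):
    # Stand-in for vector search: rank by bag-of-words cosine similarity.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def keyword_retriever(query, docs, k=2):
    # Stand-in for BM25: rank by the number of shared query terms.
    terms = set(tokenize(query))
    ranked = sorted(docs, key=lambda d: len(terms.intersection(tokenize(d))), reverse=True)
    return ranked[:k]


def rerank(query, candidates, top_n=2):
    # Stand-in for a cross-encoder: score each (query, document) pair and keep the best.
    q = Counter(tokenize(query))
    ranked = sorted(candidates, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:top_n]


def hybrid_search(query, docs, top_n=2):
    # Pool the results of every retriever, dropping duplicates, then rerank the pool.
    pool = []
    for retriever in (semantic_retriever, keyword_retriever):
        for doc in retriever(query, docs):
            if doc not in pool:
                pool.append(doc)
    return rerank(query, pool, top_n)


if __name__ == "__main__":
    for doc in hybrid_search("how do I install OpenROAD", DOCS):
        print(doc)
```

&lt;p>The point of the design is that each retriever only needs to be good at recall; the reranker, which sees the query and each candidate together, is what decides the final order.&lt;/p>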
&lt;p>After building the retriever, we utilized the LangGraph framework to develop a stateful, multi-agent workflow tailored to our use case. This allows flexibility in servicing a diverse set of user questions in an efficient and accurate manner, given the sparse nature of our dataset.&lt;/p>
&lt;p>Our current dataset can be broadly classified into the following categories:&lt;/p>
&lt;ul>
&lt;li>OpenROAD Documentation&lt;/li>
&lt;li>OpenROAD-flow-scripts Documentation&lt;/li>
&lt;li>OpenSTA Documentation&lt;/li>
&lt;li>OpenROAD Manpages&lt;/li>
&lt;/ul>
&lt;p>These data sources are embedded into separate FAISS vector databases using open-source embedding models (we&amp;rsquo;ve been working on fine-tuning an embedding model for better retrieval accuracy). The hybrid search retrievers are then applied to these vector databases, creating internal tools that can be queried by our LLM as needed. Each tool has access to different data sources in various domains. For instance, the &lt;code>retrieve_cmds&lt;/code> tool has access to information detailing the commands in the OpenROAD framework, while the &lt;code>retrieve_install&lt;/code> tool deals with installation-related documentation. As depicted in the flowchart, a routing LLM call classifies the input query and forwards it to the appropriate retriever tool. Relevant documents are then sent back to the LLM for response generation.&lt;/p>
&lt;div class="mermaid">graph TD
__start__ --> router_agent
router_agent -.-> retrieve_cmds
router_agent -.-> retrieve_general
router_agent -.-> retrieve_install
router_agent -.-> retrieve_opensta
retrieve_cmds --> generate
retrieve_general --> generate
retrieve_install --> generate
retrieve_opensta --> generate
generate --> __end__
&lt;/div>
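&lt;p>The routing graph above can be approximated in plain Python as shown below. This is only a sketch under stated assumptions: the tool names come from the flowchart, but the keyword rules in &lt;code>router_agent&lt;/code> are a hypothetical stand-in for the routing LLM call, the tools return placeholder documents, and none of this uses the actual LangGraph API.&lt;/p>

```python
TOOL_NAMES = ("retrieve_cmds", "retrieve_general", "retrieve_install", "retrieve_opensta")


def make_tool(name):
    # Hypothetical retriever tool: returns placeholder documents tagged with the tool name.
    def tool(query):
        return [f"[{name}] document relevant to: {query}"]
    return tool


TOOLS = {name: make_tool(name) for name in TOOL_NAMES}


def router_agent(query):
    # Keyword rules standing in for the routing LLM call that classifies the query.
    q = query.lower()
    if "install" in q or "build" in q:
        return "retrieve_install"
    if "opensta" in q or "timing" in q:
        return "retrieve_opensta"
    if "command" in q or "argument" in q:
        return "retrieve_cmds"
    return "retrieve_general"


def generate(query, docs):
    # Placeholder for the LLM response-generation step.
    return f"Answer to {query!r} based on {len(docs)} retrieved document(s)."


def run_graph(query):
    # start, then router_agent, then one retriever tool, then generate, then end.
    tool_name = router_agent(query)
    docs = TOOLS[tool_name](query)
    return tool_name, generate(query, docs)


if __name__ == "__main__":
    print(run_graph("How do I install OpenROAD?"))
```

&lt;p>Keeping each retriever behind its own tool means a question only searches the data source that matches its domain, which is one way to cope with a sparse dataset.&lt;/p>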
&lt;p>Feel free to try out our chat assistant &lt;a href="https://orassistant.netlify.app/" target="_blank" rel="noopener">here&lt;/a>. Instructions to set up and run our chatbot can be found &lt;a href="https://github.com/The-OpenROAD-Project/ORAssistant" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Here&amp;rsquo;s an example of our chatbot in action.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example" srcset="
/report/osre24/ucsd/openroad/20240719-palaniappan-r/img1_hu0b0a2035cec8d14387553facf446abed_79114_fdd1b5352a0557597dd03559dd46260b.webp 400w,
/report/osre24/ucsd/openroad/20240719-palaniappan-r/img1_hu0b0a2035cec8d14387553facf446abed_79114_2dc96697a7dd37f2c6e4dc350d2f33c6.webp 760w,
/report/osre24/ucsd/openroad/20240719-palaniappan-r/img1_hu0b0a2035cec8d14387553facf446abed_79114_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240719-palaniappan-r/img1_hu0b0a2035cec8d14387553facf446abed_79114_fdd1b5352a0557597dd03559dd46260b.webp"
width="735"
height="655"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="future-plans">Future Plans&lt;/h3>
&lt;p>In the upcoming weeks, we aim to enhance our dataset by incorporating actionable information filtered from GitHub issues and discussions. We’ll be adding support to keep track of the conversation history as well.&lt;/p>
&lt;p>Stay tuned for more updates!&lt;/p></description></item><item><title>Midterm Blogpost: HDEval's LLM Benchmarking for HDL Design</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240718-ashwinbardhwaj/</link><pubDate>Thu, 18 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240718-ashwinbardhwaj/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ashwin-bardhwaj/">Ashwin Bardhwaj&lt;/a>, and I am an electrical engineering and computer science student based in San Diego, CA. For the past 6 weeks, I have been working closely with Professor &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> on the &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a> project. The aim of this project is to create multiple project-sized HDL benchmarks to evaluate how well existing LLMs can generate Verilog/Chisel code. These benchmarks will include my own &amp;ldquo;golden&amp;rdquo; HDL implementation of each project as well as the corresponding English prompts to guide the LLM. I am excited to work with these tools, which have the potential to become a valuable resource for HDL design. So far, I have created the first benchmark, a pipelined 3-stage RISC-V core, and I am working through my second project, a Gameboy emulator.&lt;/p>
&lt;h2 id="risc-v-implementation">RISC-V Implementation&lt;/h2>
&lt;p>Over this past month and a half, I have completed my first benchmark, which focuses on creating, modeling, and testing a pipelined 3-stage RISC-V core. The core uses a fetch, decode, and execute structure and is functional for most RV32I instructions. I synthesized and simulated my Verilog using Icarus Verilog and displayed the waveforms in GTKWave. After development, a good portion of time was spent creating and tuning the English explanation of each Verilog module. After running these benchmark files through several LLM APIs, we compared the existing &amp;ldquo;golden&amp;rdquo; modules with the generated ones and noticed that more recent LLMs such as GPT-4o and Claude 3 perform much better at creating syntactically correct and efficient code.&lt;/p>
&lt;p>I have also created a tool that parses the Verilog and instruction files into the JSON structure needed to test various models.&lt;/p>
&lt;h2 id="gameboy-emulator">Gameboy Emulator&lt;/h2>
&lt;p>I am also in the process of developing the second benchmark, which targets a Gameboy emulator. This will challenge the LLMs much more than the RISC-V project because apart from the custom CISC CPU, the model should also understand how to handle various other blocks of the hardware system including memory, picture processing unit (PPU), sound processing unit (SPU), various input/output systems like the buttons and cartridge, and interrupt handlers. As a result, it will challenge the model to understand the system as a whole when creating each individual module.&lt;/p>
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>As we continue on to the second half of the project, I will keep working on my Gameboy emulator. I have already fully developed and tested the Z80-esque CPU, DMA, and interrupt handler, but I still need to work on the display and sound interfaces. I will also continue to run these tests over a wider range of LLMs to get a better picture of which models and versions are best suited for HDL design, as well as the direction these models are heading in.&lt;/p></description></item><item><title>Unveiling Medicine Patterns: 3D Clustering with Polyphy/Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240619-ayushsharma/</link><pubDate>Wed, 19 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240619-ayushsharma/</guid><description>&lt;p>Hello! My name is Ayush and this summer I&amp;rsquo;ll be contributing to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/polyphy/">Polyphy&lt;/a> and &lt;a href="https://normand-1024.github.io/Bio-inspired-Exploration-of-Language-Embedding/" target="_blank" rel="noopener">Polyglot&lt;/a>, a GPU-oriented agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/">Kiran Deol&lt;/a>.&lt;/p>
&lt;p>For reference, here&amp;rsquo;s my &lt;a href="https://summerofcode.withgoogle.com/media/user/7a1cc1c971c5/proposal/gAAAAABmV3hljjurQ8HAS8PRRRZB2_c5vQ3clWisqad85y-gO7rNvpssnzqGlFeiYQkAb5qY5WDUoRKkxUoTHLLDXLwBvrAjSsRs1qNTYmMrFfsbs1aQrjo=.pdf" target="_blank" rel="noopener">proposal&lt;/a> for this project.&lt;/p>
&lt;p>Polyglot offers an immersive 3D visualization experience, enabling users to zoom, rotate, and delve into complex datasets.
My project aims to harness these capabilities to unlock hidden connections in the realm of medicine, specifically focusing on the relationships between drugs based on their shared salt compositions, rather than just their active ingredients. This approach promises to reveal intricate patterns and relationships that have the potential to revolutionize drug discovery, pharmacology, and personalized medicine.&lt;/p>
&lt;p>In this project, I will create custom embeddings for a vast dataset of over 600,000 medicines, capturing the relationships between their salt compositions. By visualizing these embeddings in Polyglot&amp;rsquo;s 3D space, researchers can identify previously unknown connections between medicines, leading to new insights and breakthroughs. The dynamic and interactive nature of Polyglot will empower researchers to explore these complex relationships in a very efficient and cool way, potentially accelerating the discovery of new drug interactions and therapeutic applications.&lt;/p>
&lt;p>I am really excited to work on this project. Keep following the blogs for further updates!&lt;/p></description></item><item><title>Artificial Intelligence Explainability Accountability</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/aiealab/20240614-shaburu/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/aiealab/20240614-shaburu/</guid><description>&lt;p>Hey! I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sarthak-chowdharyshaburu/">Sarthak Chowdhary(Shaburu)&lt;/a>, and I am thrilled to share my incredible journey with the Open Source Program Office of UC Santa Cruz as part of Google Summer of Code (GSoC) 2024. This experience marks a pivotal milestone in my career, offering me the chance to delve into an intriguing project while learning from the brightest minds in the open-source community. Allow me to guide you through my adventure thus far, from the nerve-wracking wait for results to the exhilarating commencement of the coding period.&lt;/p>
&lt;p>Before we start, here&amp;rsquo;s my &lt;a href="https://drive.google.com/file/d/1BzKi0fXdqCgdK0UEG9zM56W6U5CeuyAP/view?usp=drive_link" target="_blank" rel="noopener">proposal&lt;/a>.&lt;/p>
&lt;h2 id="pre-gsoc-application">Pre-GSoC Application&lt;/h2>
&lt;p>I had shortlisted three organizations that I was working with:&lt;/p>
&lt;ul>
&lt;li>OSPO UC Santa Cruz - Amplifying Research Impact Through Open Source&lt;/li>
&lt;li>CVAT.AI - Computer Vision Data Annotation for AI&lt;/li>
&lt;li>Emory University - Biomedical Research to Advance Medical Care&lt;/li>
&lt;/ul>
&lt;p>On the 1st of May, like many students eagerly anticipating the results of the Google Summer of Code (GSoC) 2024, I found myself glued to my screen, anxiously awaiting the clock to strike 11:30 PM IST. After what felt like an eternity of waiting, I finally received the email that changed everything: I had been selected for GSoC 2024 with the &lt;a href="https://ospo.ucsc.edu" target="_blank" rel="noopener">Open Source Program Office of UC Santa Cruz&lt;/a>!&lt;/p>
&lt;p>The first month of GSoC, known as the community bonding period, is for establishing rapport with the people working on the project. I read up on my mentor, Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/leilani-h.-gilpin/">Leilani H. Gilpin&lt;/a>, and built a good rapport with her. She is an Assistant Professor in Computer Science and Engineering and an affiliate of the Science &amp;amp; Justice Research Center at UC Santa Cruz. She is also a part of the AI group @ UCSC and leads the &lt;a href="https://aiea-lab.github.io/" target="_blank" rel="noopener">AI Explainability and Accountability (AIEA) Lab&lt;/a>. Her research focuses on the design and analysis of methods for autonomous systems to explain themselves. Her work has applications to robust decision-making, system debugging, and accountability. Her current work examines how generative models can be used in iterative XAI stress testing. She guided me through the necessary documentation and explained the project demands and requirements in detail, which was invaluable for my project.&lt;/p>
&lt;h2 id="project">Project&lt;/h2>
&lt;p>The project aims to build a system that takes a student’s code as input and explains their mistakes, from low-level syntax and compilation errors to high-level issues such as overloaded variables.&lt;/p>
&lt;p>My &lt;a href="https://drive.google.com/file/d/1BzKi0fXdqCgdK0UEG9zM56W6U5CeuyAP/view?usp=drive_link" target="_blank" rel="noopener">Proposal&lt;/a> aims to create novel, custom basic questions and take it up a notch by creating custom drivers for each problem, along with common drivers that detect low-level errors and give baseline explanations for various error cases. These drivers will be combined into a robust system, using third-party open-source software (like the Monaco code editor, &amp;ldquo;the editor of the web&amp;rdquo;) where necessary. I will write uniform and consistent feedback and explanations for each coding problem while covering the possible edge cases, and build a pipeline that iterates over the test cases and feedback. This benchmark suite will be used for testing the system.&lt;/p>
&lt;p>Additionally, I plan on building an interface that has a roadmap from basics such as arrays and hashmaps to advanced topics such as trees, heaps, and backtracking, along with progress bars, and that throws confetti on successful unit tests (important). It will use the same benchmark suite under the hood. I will be utilizing Judge0 (an open-source online code execution system) for code execution and Monaco (the open-source &amp;ldquo;Editor of the Web&amp;rdquo;) as the code editor.&lt;/p>
&lt;p>&lt;strong>Project goals:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Project Objective: By the end of summer the software should be a
novel and robust tool for helping the community of beginner and
advanced programmers alike in learning programming by
hyper-focusing on the mistakes they make and using AI to explain to
them the how, what and why of their code. Provide clear and concise
explanations accompanied by actionable suggestions for debugging
and improvement.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Expected deliverables: A robust eXplainable AI benchmark suite
that will be used extensively for the undergraduate AI courses and
possibly the graduate courses as well, and by anyone interested
in learning programming with the help of personalized AI.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Future work based on project: A beautiful Gamified interface that gets
people excited to learn programming which utilizes the above
benchmark suite would be awesome to build!&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>When I started my programming journey (before ChatGPT😨), I often ran into problems that were way above my skill level without realizing it, which resulted in countless hours spent without proper feedback on where I was going wrong. This project can have a real impact on people in an innovative way, and it is something I wish I had access to at the start of my programming journey, so working on it comes from a place of passion. This project will also test my own understanding of programming, and spending the summer solidifying it under the
guidance of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/leilani-h.-gilpin/">Leilani H. Gilpin&lt;/a> is a dream come true for me.&lt;/p></description></item><item><title>Developing Trustworthy Large Language Models</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/aiealab/20240514-nikhilwani/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/aiealab/20240514-nikhilwani/</guid><description>&lt;p>Hi! Thanks for stopping by.&lt;/p>
&lt;p>In this first blog post of a series of three, I’d like to introduce myself, my mentor, and my project.&lt;/p>
&lt;p>My name is Nikhil. I am an ML researcher who works at the intersection of NLP, ML, and HCI. I previously worked as a Machine Learning Engineer II at &lt;a href="https://vmware.com/" target="_blank" rel="noopener">VMware&lt;/a> and spent some wonderful summers interning with ML teams at &lt;a href="https://www.nvidia.com/" target="_blank" rel="noopener">NVIDIA&lt;/a> and &lt;a href="https://www.iitb.ac.in/" target="_blank" rel="noopener">IIT Bombay&lt;/a>. I also recently graduated from the &lt;a href="https://usc.edu/" target="_blank" rel="noopener">University of Southern California (USC)&lt;/a> with &lt;a href="https://www.cs.usc.edu/academic-programs/masters/cs_ms_honors/" target="_blank" rel="noopener">honors&lt;/a> in Computer Science and a master&amp;rsquo;s thesis.&lt;/p>
&lt;p>This year at Google Summer of Code (GSoC 24), I will be working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/aiealab/">developing trustworthy large language models&lt;/a>. I’m very grateful to be mentored by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/leilani-h.-gilpin/">Leilani H. Gilpin&lt;/a> at the &lt;a href="https://aiea-lab.github.io/" target="_blank" rel="noopener">AIEA lab, UC Santa Cruz&lt;/a>. I truly admire the flexibility and ownership she allows me in pursuing my ideas independently within this project. Please feel free to peruse my accepted GSoC proposal &lt;a href="https://drive.google.com/drive/folders/16DHlcHGS7psoFXYc5q2L2-GOsLwIBXl1?usp=drive_link" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>&lt;strong>Project:&lt;/strong>
My project has a tangible outcome: An open-source, end-to-end, full-stack web app with a hybrid trustworthy LLM in the backend.&lt;/p>
&lt;p>This open-source web app will be a lightweight tool that not only has the ability to take diverse textual prompts and connect with several LLMs and a database but also the capability to gather qualitative and quantitative user feedback. Users will be able to see how this feedback affects the LLMs&amp;rsquo; responses and impacts their reasoning and explanations (xAI). The tool will be thoroughly tested to ensure that the unit tests are passing and there is complete code coverage.&lt;/p>
&lt;p>At the moment, we are investigating LLMs and making them more trustworthy in constraint satisfaction tasks like logical reasoning and misinformation detection tasks. However, our work has applicability in other areas of Responsible AI, such as Social Norms (toxicity detection and cultural insensitivity), Reliability (misinformation, hallucination, and inconsistency), Explainability &amp;amp; Reasoning (lack of interpretability, limited logical and causal reasoning), Safety (privacy violation and violence), and Robustness (prompt attacks and distribution shifts).&lt;/p>
&lt;p>&lt;strong>Impact:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Responsible AI research teams across industry and academia can use this as a boilerplate for their user study projects.&lt;/li>
&lt;li>Diverse PhD students and academic researchers looking to study LLM and user interaction research will find this useful.&lt;/li>
&lt;li>LLM alignment researchers and practitioners can find this useful, as user feedback affects the inherent reward model of the internal LLMs.&lt;/li>
&lt;li>Explainable AI (xAI) researchers can find value in the explanations that this tool generates, which reveal interpretable insights into how modern LLMs think and use their memory.
These are just a few use cases; however, there are several others that we look forward to describing in the upcoming posts.&lt;/li>
&lt;/ul>
&lt;p>This was my first blog in the series of three for the UC OSPO. Stay tuned for the upcoming blogs, which will detail my progress at the halfway mark and the final one concluding my work.&lt;/p>
&lt;p>If you find this work interesting and would love to share your thoughts, I am happy to chat! :) Feel free to connect on &lt;a href="https://www.linkedin.com/in/nikhilwani/" target="_blank" rel="noopener">LinkedIn&lt;/a> and mention that you are reaching out from this blog post.&lt;/p>
&lt;p>It is great to meet the UC OSPO community, and thanks for reading. Bye for now.&lt;/p></description></item><item><title>Heterogeneous Graph Neural Networks for I/O Performance Bottleneck Diagnosis</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240614-mahdi/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/aiio/20240614-mahdi/</guid><description>&lt;p>Hello, I am &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mahdi-banisharifdehkordi/">Mahdi Banisharifdehkordi&lt;/a>, a Ph.D. student in Computer Science at Iowa State University, specializing in Artificial Intelligence. This summer, I will be working on the project &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/aiio/">AIIO / Graph Neural Network&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> and Suren Byna.&lt;/p>
&lt;p>High-Performance Computing (HPC) applications often face performance issues due to I/O bottlenecks. Manually identifying these bottlenecks is time-consuming and error-prone. My project aims to enhance the AIIO framework by integrating a Graph Neural Network (GNN) model to automatically diagnose I/O performance bottlenecks at the job level. This involves developing a comprehensive data pre-processing pipeline, constructing and validating a tailored GNN model, and rigorously testing the model&amp;rsquo;s accuracy using test cases from the AIIO dataset.&lt;/p>
&lt;p>Through this project, I seek to provide a sophisticated, AI-driven approach to understanding and improving I/O performance in HPC systems, ultimately contributing to more efficient and reliable HPC applications.&lt;/p></description></item><item><title>LLM Assistant for OpenROAD - Data Engineering and Testing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/</guid><description>&lt;p>Hello! My name is Aviral Kaintura, and I will be contributing to &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a>, a groundbreaking open-source toolchain for digital integrated circuit automation (RTL to GDSII) during &lt;a href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener">GSoC 2024&lt;/a>.&lt;/p>
&lt;p>My project, &lt;a href="https://summerofcode.withgoogle.com/programs/2024/projects/J8uAFNCu" target="_blank" rel="noopener">LLM Assistant for OpenROAD - Data Engineering and Testing&lt;/a>, is jointly mentored by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>.&lt;/p>
&lt;p>The aim of this project is to develop a chat assistant to improve the user experience with OpenROAD. My focus will be on developing a well-curated dataset from OpenROAD&amp;rsquo;s knowledge base. This dataset will be fundamental for another project led by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, which involves building the chatbot&amp;rsquo;s architecture. It will be used for training and validating the model and ensuring efficient context retrieval to generate accurate user responses, aiding in troubleshooting, installation, and other common issues to reduce the maintainers&amp;rsquo; workload.&lt;/p>
&lt;p>In addition to dataset creation, I will be working on testing and evaluation. This includes developing metrics for model evaluation, incorporating both human and automated techniques.&lt;/p>
&lt;p>Our human evaluation framework will utilize chatbot feedback for valuable insights, enhancing the model and dataset. An automated batch testing application is also used to further enhance the evaluation process.&lt;/p>
&lt;p>Here is an early build of the evaluation framework.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Screenshots" srcset="
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_ccb0a69833aa5c774f30b616a038edd6.webp 400w,
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_25ece2ab19d666f60342ed2d6dcb217f.webp 760w,
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_ccb0a69833aa5c774f30b616a038edd6.webp"
width="760"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
By leveraging advanced data engineering and testing methodologies, we aim to build an assistant that combines high accuracy with optimal response times. Additionally, we will collaborate with research teams at NYU and ASU to contribute to the research on AI-based chat assistants for electronic design automation.&lt;/p>
&lt;p>I am thrilled to be part of this journey and look forward to making a meaningful impact on the OpenROAD project.&lt;/p>
&lt;p>Stay tuned for more updates on the project!&lt;/p></description></item><item><title>LLM Assistant for OpenROAD - Model Architecture and Prototype</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-palaniappan-r/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-palaniappan-r/</guid><description>&lt;p>Hi there! &lt;/p>
&lt;p>I’m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, currently an undergraduate student at the Birla Institute of Technology &amp;amp; Science, Pilani, India.&lt;/p>
&lt;p>I&amp;rsquo;ll be working on the &lt;a href="https://summerofcode.withgoogle.com/programs/2024/projects/DSo6kvA5" target="_blank" rel="noopener">LLM Assistant for OpenROAD - Model Architecture and Prototype&lt;/a> project, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>. &lt;/p>
&lt;p>My project aims to develop the architecture for a chat assistant built for OpenROAD and its native flow, designed to assist beginners and experienced users by giving easy access to existing resources, offering troubleshooting assistance, and providing fast and accurate responses to common questions. I plan to do this by leveraging state-of-the-art retrieval and fine-tuning techniques.&lt;/p>
&lt;p>As part of this project, I will be working alongside another &lt;a href="https://summerofcode.withgoogle.com/programs/2024/projects/J8uAFNCu" target="_blank" rel="noopener">project&lt;/a> to build and test on a valid dataset for training and deployment. We will also be collaborating with other research teams at NYU and ASU, working on similar projects related to OpenROAD chat assistants and flow generation using Generative AI. Our primary objective is to minimize support overhead, improve user experience by reducing response times, and provide access to updated information about OpenROAD.&lt;/p>
&lt;p>Upon completion, my project will offer a viable chat assistant architecture as part of OpenROAD that benefits both the users and tool developers of OpenROAD.&lt;/p>
&lt;p>An &lt;a href="https://github.com/The-OpenROAD-Project/ORAssistant" target="_blank" rel="noopener">early prototype&lt;/a> developed along with a human evaluation framework shows promising results.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Architecture" srcset="
/report/osre24/ucsd/openroad/20240613-palaniappan-r/img2_hud6fda06af55d21d584ae88c38f077b08_207913_30376cbd7d90ae65683883a4dd83751d.webp 400w,
/report/osre24/ucsd/openroad/20240613-palaniappan-r/img2_hud6fda06af55d21d584ae88c38f077b08_207913_f98f5137b9b17acfa41102f49130d427.webp 760w,
/report/osre24/ucsd/openroad/20240613-palaniappan-r/img2_hud6fda06af55d21d584ae88c38f077b08_207913_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-palaniappan-r/img2_hud6fda06af55d21d584ae88c38f077b08_207913_30376cbd7d90ae65683883a4dd83751d.webp"
width="760"
height="157"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Here are some responses generated by the prototype:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Examples" srcset="
/report/osre24/ucsd/openroad/20240613-palaniappan-r/img1_hu1368e4ffa06e7513186f849074288e92_2440307_418c15850b5c6c2573a9082ca1a5a9dc.webp 400w,
/report/osre24/ucsd/openroad/20240613-palaniappan-r/img1_hu1368e4ffa06e7513186f849074288e92_2440307_18f9f9a9254bedf140c7ec005c7cc5b9.webp 760w,
/report/osre24/ucsd/openroad/20240613-palaniappan-r/img1_hu1368e4ffa06e7513186f849074288e92_2440307_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-palaniappan-r/img1_hu1368e4ffa06e7513186f849074288e92_2440307_418c15850b5c6c2573a9082ca1a5a9dc.webp"
width="760"
height="671"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>I&amp;rsquo;m excited about the potential of ORAssistant as part of the OpenROAD tool suite to accelerate innovation in EDA and chip design by utilizing open-source tools along with Generative AI.&lt;/p>
&lt;p>Stay tuned for more updates!&lt;/p></description></item><item><title>Stream Processing support for FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/fastensor/20240613-aditya_narayan/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/fastensor/20240613-aditya_narayan/</guid><description>&lt;p>Hi, I&amp;rsquo;m Aditya Narayan,👋&lt;/p>
&lt;p>I&amp;rsquo;m a frequent visitor to the town square of theoretical CS, operations (Ops), and robust high-performance systems. Sometimes I indulge myself with insights on &lt;a href="https://www.science.org/doi/10.1126/science.aam9868" target="_blank" rel="noopener">Computing and Biology&lt;/a>, and other times I enjoy the accounts of minefield experiences in the &lt;a href="https://www.youtube.com/watch?v=tDacjrSCeq4" target="_blank" rel="noopener">systems world&lt;/a>. Luckily, this summer, OSRE offered an opportunity that happened to be at the perfect intersection of my interests.&lt;/p>
&lt;p>This summer, I will be working on a scientific computing library called FasTensor that offers a parallel computing structure called Stencil, widely popular in the scientific computing world to solve PDEs for Physical Simulations and Convolutions on Signals, among its many uses.
I am excited to introduce my mentors, Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> and Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a> of the &lt;a href="https://crd.lbl.gov/divisions/scidata/sdm/" target="_blank" rel="noopener">Scientific Data Management Group&lt;/a> at Lawrence Berkeley National Laboratory (LBNL). They bring invaluable expertise to the project.&lt;/p>
&lt;p>They recognized the need for a tensor processing library that provided dedicated support for big datasets with inherent structural locality, often found in the scientific computing world, which was lacking in popular open-source MapReduce or Key-Value based frameworks.&lt;/p>
&lt;p>More often than not, the operations performed on these datasets are composed of computations involving neighboring elements. This motivated the development of the FasTensor library.&lt;/p>
&lt;p>I will be working on providing a Stream Processing interface that enables online data processing of large-scale datasets as they arrive from Data Producers. The project focuses on offering rich interfaces for managing and composing streams, supporting common scientific data formats like HDF5, and integrating fault tolerance and reliability mechanisms.&lt;/p>
&lt;p>I am thrilled to work on the FasTensor project because I believe it has the potential to make a significant impact by enabling researchers to implement a rich set of computations on their big datasets in an easy and intuitive manner.&lt;/p>
&lt;p>After all, FasTensor has just one simple paradigm: A -&amp;gt; Transform(F(x), B),&lt;/p>
&lt;p>and it handles all the behind-the-scenes grunt work of handling big datasets so you can focus on your research.&lt;/p>
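&lt;p>To make the paradigm concrete, here is a minimal single-process sketch in Python. It is not the FasTensor C++ API: the names transform and three_point_mean are hypothetical, and the sketch leaves out the chunking and parallelism that the library takes care of. It only shows the shape of the idea, where a user function F sees an input array A together with a position and produces one element of the output B.&lt;/p>

```python
from typing import Callable, Sequence


def transform(a: Sequence[float], f: Callable[[Sequence[float], int], float]) -> list[float]:
    """Apply the user function f at every index of a and collect the results (B)."""
    return [f(a, i) for i in range(len(a))]


def three_point_mean(a: Sequence[float], i: int) -> float:
    """A 1-D stencil: mean of a cell and its immediate neighbors, clamped at the edges."""
    lo = max(i - 1, 0)
    hi = min(i + 1, len(a) - 1)
    window = a[lo:hi + 1]
    return sum(window) / len(window)


a = [1.0, 2.0, 3.0, 4.0, 5.0]      # input array A
b = transform(a, three_point_mean)  # output array B
print(b)  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

&lt;p>Each output element depends only on a small neighborhood of the input, which is the structural locality that makes this kind of computation easy to split across workers.&lt;/p>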
&lt;p>Stay tuned for updates and feel free to &lt;a href="https://github.com/BinDong314/FasTensor" target="_blank" rel="noopener">collaborate&lt;/a>!&lt;/p></description></item><item><title>Drishti</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/drishti/20240614-jaytau/</link><pubDate>Thu, 06 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/drishti/20240614-jaytau/</guid><description>&lt;p>Namaste everyone! 🙏🏻&lt;/p>
&lt;p>I&amp;rsquo;m &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/joel-tony/">Joel Tony&lt;/a>, a third-year Computer Science undergraduate at BITS Pilani, Goa, India. I&amp;rsquo;m truly honored to be part of this year&amp;rsquo;s Google Summer of Code program, working with the UC OSPO organization on a project that genuinely excites me. I&amp;rsquo;m particularly grateful to be working under the mentorship of Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a>, a Research Scientist at Lawrence Berkeley National Laboratory, and Dr. &lt;a href="https://sbyna.github.io" target="_blank" rel="noopener">Suren Byna&lt;/a>, a Full Professor at the Ohio State University. Their expertise in high-performance computing and data systems is invaluable as I tackle this project.&lt;/p>
&lt;p>My project, &amp;ldquo;&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/drishti">Drishti: Visualization and Analysis of AI-based Applications&lt;/a>&amp;rdquo;, aims to extend the &lt;a href="https://github.com/hpc-io/drishti" target="_blank" rel="noopener">Drishti&lt;/a> framework to better support AI/ML workloads, focusing specifically on optimizing their Input/Output (I/O) performance. I/O refers to the data transfer between a computer&amp;rsquo;s memory and external storage devices like hard drives (HDDs) or solid-state drives (SSDs). As AI models and datasets continue to grow exponentially in size, efficient I/O management has become a critical bottleneck that can significantly impact the overall performance of these data-intensive workloads.&lt;/p>
&lt;p>Drishti is an innovative, interactive web-based framework that helps users understand the I/O behavior of scientific applications by visualizing I/O traces and highlighting bottlenecks. It transforms raw I/O data into interpretable visualizations, making performance issues more apparent. Now, I&amp;rsquo;m working to adapt these capabilities for the unique I/O patterns of AI/ML workloads.&lt;/p>
&lt;p>Through my studies in high-performance computing and working with tools like BeeGFS and Darshan, I&amp;rsquo;ve gained insights into the intricacies of I/O performance. However, adapting Drishti for AI/ML workloads presents new challenges. In traditional HPC, computing often dominates, but in the realm of AI, the tables have turned. As models grow by billions of parameters and datasets expand to petabytes, I/O has become the critical path. Training larger models or using richer datasets doesn&amp;rsquo;t just mean more computation; it means handling vastly more data. This shift makes I/O optimisation not just a performance tweak but a fundamental enabler of AI progress. By fine-tuning Drishti for AI/ML workloads, we aim to pinpoint I/O bottlenecks precisely, helping researchers streamline their data pipelines and unlock the full potential of their hardware.&lt;/p>
&lt;p>As outlined in my &lt;a href="https://docs.google.com/document/d/1zfQclXYWFswUbHuuwEU7bjjTvzS3gRCyNci08lTR3Rg/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, my tasks are threefold:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Modularize Drishti&amp;rsquo;s codebase&lt;/strong>: Currently, it&amp;rsquo;s a single 1700-line file that handles multiple functionalities. I&amp;rsquo;ll be refactoring it into focused, maintainable modules, improving readability and facilitating future enhancements.&lt;/li>
&lt;li>&lt;strong>Enable multi-trace handling&lt;/strong>: Unlike traditional HPC apps that typically generate one trace file, most AI jobs produce multiple. I&amp;rsquo;ll build a layer to aggregate these, providing a comprehensive view of the application&amp;rsquo;s I/O behavior.&lt;/li>
&lt;li>&lt;strong>Craft AI/ML-specific recommendations&lt;/strong>: Current suggestions often involve MPI-IO or HDF5, which aren&amp;rsquo;t typical in ML frameworks like PyTorch or TensorFlow. I&amp;rsquo;ll create targeted recommendations that align with these frameworks&amp;rsquo; data pipelines.&lt;/li>
&lt;/ol>
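&lt;p>As a rough illustration of the second task, the sketch below merges per-trace counters into a single application-wide summary. It rests on assumptions: the counter names are hypothetical rather than actual Darshan fields, each trace is taken to be already parsed into a dictionary, and summing is only the simplest possible aggregation policy.&lt;/p>

```python
from collections import Counter


def aggregate_traces(traces):
    """Sum the numeric counters of several per-process traces into one summary."""
    total = Counter()
    for trace in traces:
        total.update(trace)  # Counter.update adds counts instead of replacing them
    return dict(total)


# Hypothetical counters from two trace files of the same AI job
trace_a = {"posix_reads": 120, "posix_writes": 8, "bytes_read": 4096}
trace_b = {"posix_reads": 95, "posix_writes": 12, "bytes_read": 2048}

summary = aggregate_traces([trace_a, trace_b])
print(summary)  # {'posix_reads': 215, 'posix_writes': 20, 'bytes_read': 6144}
```

&lt;p>A real aggregation layer would also need to keep per-trace detail, so that a bottleneck confined to one process is not averaged away.&lt;/p>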
&lt;p>This summer, my mission is to make Drishti as fluent in AI/ML I/O patterns as it is in traditional HPC workloads. My goal is not just to adapt Drishti but to optimize it for the unique I/O challenges that AI/ML applications face. Whether it&amp;rsquo;s dealing with massive datasets, handling numerous small files, or navigating framework-specific data formats, we want Drishti to provide clear, actionable insights.&lt;/p>
&lt;p>From classroom theories to hands-on projects, from understanding file systems to optimizing AI workflows, each step has deepened my appreciation for the complexities and potential of high-performance computing. This GSoC project is an opportunity to apply this knowledge in a meaningful way, contributing to a tool that can significantly impact the open-source community.&lt;/p>
&lt;p>In today&amp;rsquo;s AI-driven world, the pace of innovation is often gated by I/O performance. A model that takes weeks to train due to I/O bottlenecks might, with optimized I/O, train in days—translating directly into faster iterations, more experiments, and ultimately, breakthroughs. By making I/O behavior in AI/ML applications more interpretable through Drishti, we&amp;rsquo;re not just tweaking code. We&amp;rsquo;re providing developers with the insights they need to optimize their data pipelines, turning I/O from a bottleneck into a catalyst for AI advancement.&lt;/p>
&lt;p>I look forward to sharing updates as we adapt Drishti for the AI era, focusing squarely on optimizing I/O for AI/ML workloads. In doing so, we aim to accelerate not just data transfer but the very progress of AI itself. I&amp;rsquo;m deeply thankful to Dr. Jean Luca Bez and Prof. Suren Byna for their guidance in this endeavor and to the UC OSPO and GSoC communities for this incredible opportunity.&lt;/p></description></item><item><title>Enhancing h5bench with HDF5 Compression Capability</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/h5bench/20240614-henryz/</link><pubDate>Mon, 27 May 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/lbl/h5bench/20240614-henryz/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/h5bench">h5bench&lt;/a> project, my project, &lt;a href="https://summerofcode.withgoogle.com/myprojects/details/n0H28Z40" target="_blank" rel="noopener">Enhancing h5bench with HDF5 Compression Capability&lt;/a>, under the mentorship of Dr. &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and Dr. Suren Byna, aims to allow users of h5bench to incorporate compression features in their simulations by creating custom benchmarks with common scientific lossless and lossy compression algorithms such as SZ, SZ3, ZFP, and GZIP.&lt;/p>
&lt;p>The problem I am trying to solve is to implement multiple data compression algorithms in h5bench core access patterns through HDF5 filters. This capability should grant users the flexibility to configure the parameters and methods of compression applied to their datasets according to their specific needs and preferences. My solution primarily involves using a user-defined HDF5 filter mechanism to implement lossless and lossy compression algorithms, such as ZFP, SZ, and cuSZ. Throughout the process, I will deliver one C source file implementing compression configuration settings, one C source file implementing lossless and lossy algorithms, a set of performance reports before and after data compression in CSV and standard output files, and technical documentation on the h5bench user manual website.&lt;/p></description></item><item><title>Hardware Hierarchical Dynamical Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240513-ujjwalshekhar/</link><pubDate>Tue, 14 May 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240513-ujjwalshekhar/</guid><description>&lt;p>As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd">Micro Architecture Santa Cruz (MASC)&lt;/a>, my &lt;a href="https://docs.google.com/document/d/1FyQfRVJ2LnPJ9bCBqiylmnc1dOaumed1LQ_N6cK5krw/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>, aims to develop a tree data structure under HHDS to replace the current one offered by &lt;a href="https://github.com/masc-ucsc/livehd/blob/34eed40f32669bdab2fbf8fbcc65492660ba40df/core/lhtree.hpp#L526" target="_blank" rel="noopener">LHTree&lt;/a>.&lt;/p>
&lt;p>The tree data structure is to be optimized for typical AST traversal and queries. Some queries that are made to this tree are much more frequent than others. Thus a flattening policy will be used to optimize the tree for these queries, at the potential cost of becoming slow for the infrequent queries. The tree will be benchmarked for scalability and performance and is expected to outperform the current version of the tree. Once the implementation is complete, the tree will be integrated into the LiveHD core repository.&lt;/p></description></item><item><title>HDEval: Benchmarking LLMs that Generate Verilog/Chisel Modules From Natural Language</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240611-ashwinbardhwaj/</link><pubDate>Tue, 14 May 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240611-ashwinbardhwaj/</guid><description>&lt;p>Hi everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Ashwin Bardhwaj, currently pursuing a bachelor&amp;rsquo;s degree in Electrical Engineering and Computer Science at UC Berkeley. I was recently involved in a project to implement a secure hardware encryption enclave in Verilog. That&amp;rsquo;s why I was excited to work with the MASC group to evaluate how well existing general-purpose LLMs (such as ChatGPT 4 or StarCoder) can generate accurate Verilog/Chisel code from English and assist in the hardware development process.&lt;/p>
&lt;p>As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd">Micro Architecture Santa Cruz (MASC)&lt;/a> my &lt;a href="https://drive.google.com/file/d/1Fnr85lqrTs7OBohfHfSZI2K3wZU3zJm0/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a> looks to create a suite of benchmark programs for &lt;a href="https://github.com/masc-ucsc/hdeval" target="_blank" rel="noopener">HDEval&lt;/a>.&lt;/p>
&lt;p>The deliverable of this project is to create multiple large HDL benchmarks along with a respective set of prompts. Using yosys to implement Logic Equivalence Check, we are able to prove through formal verification that the generated code will exhibit the same behavior as the benchmark. In addition, we can also consider the performance and resource utilization of the generated code as a metric.&lt;/p></description></item><item><title>BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uci/benchmarkst/</link><pubDate>Sat, 17 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/uci/benchmarkst/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> bioinformatics, spatial transcriptomics, gene imputation, benchmarking, cross-platform/species analysis&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong>
&lt;ul>
&lt;li>Proficient in Python and/or R, commonly used in bioinformatics.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong>
&lt;ul>
&lt;li>Experience with statistical data analysis and machine learning models.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (not required but preferred):&lt;/strong>
&lt;ul>
&lt;li>Proficiency in bioinformatics and computational biology.&lt;/li>
&lt;li>Familiarity with spatial transcriptomics datasets and platforms.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours). Given the scope of integrating multi-platform, multi-species datasets and the complexity of benchmarking gene imputation methods, this project is substantial. It requires extensive data preparation, analysis, and validation phases, making it suitable for a larger time investment.&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;p>The orchestration of cellular life is profoundly influenced by the precise control of gene activation and silencing across different spatial and temporal contexts. Understanding these complex spatiotemporal gene expression patterns is vital for advancing our knowledge of biological processes, from development and disease progression to adaptation. While single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to profile gene expression across thousands of cells simultaneously, its requirement for cell dissociation strips away the critical spatial context, limiting our comprehension of cellular interactions within their native environments. Recent strides in spatial transcriptomics have started to bridge this gap by enabling spatially resolved gene expression measurements at single-cell or even sub-cellular resolutions. These advancements offer unparalleled opportunities to delineate the intricate tapestry of gene expression within tissues, shedding light on the dynamic interactions between cells and their surroundings.&lt;/p>
&lt;p>Despite these technological advances, a significant challenge remains: the datasets generated by spatial transcriptomic technologies are often incomplete, marred by missing gene expression values due to various technical and biological constraints. This limitation severely impedes our ability to fully interpret these rich datasets and extract meaningful insights from them. Gene imputation emerges as a pivotal solution to this problem, aiming to fill in these missing data points, thereby enhancing the resolution, quality, and interpretability of spatial transcriptomic datasets.&lt;/p>
&lt;p>Recognizing the critical importance of this task, there is a pressing need for a unified benchmarking platform that can facilitate the evaluation and comparison of gene imputation methods across a diverse array of samples, spanning multiple sampling platforms, species, and organs. Currently, the bioinformatics and spatial transcriptomics fields lack such a standardized framework, hindering progress and innovation. To address this gap, our project aims to establish a comprehensive gene imputation dataset that encompasses a wide range of conditions and parameters. We intend to reproduce known methods and assess their efficacy, providing a solid and reproducible foundation for future advancements in this domain.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>A comprehensive, preprocessed benchmark dataset that spans multiple sampling platforms, species, and organs, aimed at standardizing gene imputation tasks in spatial transcriptomics.&lt;/li>
&lt;li>An objective comparison of state-of-the-art gene imputation methodologies, enhancing the understanding of their performance and applicability across diverse biological contexts.&lt;/li>
&lt;li>A user-friendly Python package offering a suite of gene imputation tools, designed to fulfill the research needs of the spatial transcriptomics community by improving data completeness and reproducibility.&lt;/li>
&lt;/ul></description></item><item><title>GPEC: An Open Emulation Platform to Evaluate GPU/ML Workloads on Erasure Coding Storage</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lanl/gpec/</link><pubDate>Thu, 08 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lanl/gpec/</guid><description>&lt;p>&lt;strong>Project Idea Description&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Storage Systems, Machine Learning, Erasure Coding&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python, PyTorch, Bash scripting, Linux, Erasure Coding, Machine Learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/meng-wang/">Meng Wang&lt;/a> (primary contact), &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-bent/">John Bent&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Large-scale data centers store immense amounts of user data across a multitude of disks, necessitating redundancy strategies like erasure coding (EC) to safeguard against disk failures. Numerous research efforts have sought to assess the performance and durability of various erasure coding approaches, including single-level erasure coding, locally recoverable coding, and multi-level erasure coding.&lt;/p>
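&lt;p>For readers new to erasure coding, the sketch below shows the simplest possible scheme, a single XOR parity chunk over three data chunks, and how a lost chunk is rebuilt from the survivors. This is an illustration only: the helper xor_chunks and the sample data are hypothetical, and production systems typically use stronger codes such as Reed-Solomon that tolerate several simultaneous failures.&lt;/p>

```python
def xor_chunks(chunks):
    """XOR equal-length byte chunks together, byte by byte."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


# Three data chunks, each imagined to live on a different disk
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_chunks(data)  # stored on a fourth disk

# Simulate losing the disk holding data[1], then repair it from the rest
survivors = [data[0], data[2], parity]
recovered = xor_chunks(survivors)
print(recovered == data[1])  # True
```

&lt;p>The repair step has to read every surviving chunk, which hints at why disk failures and repair traffic can affect the throughput and latency seen by a running workload.&lt;/p>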
&lt;p>Despite the widespread adoption of erasure coding, a significant research gap exists regarding the performance of large-scale erasure-coded storage systems under machine learning (ML) workloads. While conventional practice often leans towards replication for better performance, this project explores whether cost-effective erasure coding can deliver comparable performance. In this context, several fundamental questions remain unanswered:&lt;/p>
&lt;ul>
&lt;li>Can a typical erasure-coded storage system deliver sufficient throughput for ML training tasks?&lt;/li>
&lt;li>Can an erasure-coded storage system maintain low-latency performance for ML training and inference workloads?&lt;/li>
&lt;li>How do disk failure and subsequent repair affect the throughput and latency of ML workloads?&lt;/li>
&lt;li>What influence do erasure coding design choices, such as chunk placement strategies and repair methods, have on these performance metrics?&lt;/li>
&lt;/ul>
&lt;p>To address these questions, the most straightforward approach would involve running ML workloads on large-scale erasure coded storage systems within HPC data centers. However, this presents challenges for researchers and students due to limited access to expensive GPUs and distributed storage systems, especially when dealing with large-scale evaluations. Consequently, there is a need for a cost-effective evaluation platform.&lt;/p>
&lt;p>The objective of this project is to develop an open-source platform that facilitates cheap and reproducible evaluations of erasure-coded storage systems concerning ML workloads. This platform consists of two key components:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>GPU Emulator:&lt;/strong> This emulator is designed to simulate GPU performance for ML workloads. Development of the GPU emulator is near completion.&lt;/li>
&lt;li>&lt;strong>EC Emulator:&lt;/strong> This emulator is designed to simulate the performance characteristics of erasure-coded storage systems. It is still in the exploratory phase and requires further development.&lt;/li>
&lt;/ul>
&lt;p>The student&amp;rsquo;s responsibilities will include documenting the GPU emulator, progressing the development of the EC emulator, and packaging the experiments to ensure easy reproducibility. It is anticipated that this platform will empower researchers and students to conduct cost-effective and reproducible evaluations of large-scale erasure-coded storage systems in the context of ML workloads.&lt;/p>
&lt;p>&lt;strong>Project Deliverable&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Build an EC emulator to emulate the performance characteristics of large-scale erasure-coded storage systems&lt;/li>
&lt;li>Incorporate the EC emulator into ML workloads and GPU emulator&lt;/li>
&lt;li>Conduct reproducible experiments to evaluate the performance of erasure-coded storage systems in the context of ML workloads&lt;/li>
&lt;li>Publish a Trovi artifact shared on Chameleon Cloud and a GitHub repository with open-source code&lt;/li>
&lt;/ul></description></item><item><title>Turn on, Tune in, Listen up: Maximizing Side-Channel Recovery in Cross-Platform Time-to-Digital Converters</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/turnontunein/</link><pubDate>Thu, 08 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/turnontunein/</guid><description>&lt;p>&lt;a href="https://github.com/KastnerRG/PL-Sensors" target="_blank" rel="noopener">Turn on, Tune in, Listen Up&lt;/a> is an open-source framework for implementing voltage fluctuation sensors in FPGA devices for use in side-channel security research. Side-channels are an ever-present hardware security threat. The reconfigurability of FPGAs significantly broadens the side-channel attack surface in many heterogeneous cloud systems. We have developed a highly tunable side-channel sensor, which significantly improves side-channel attack time and resolution in multiple contexts. Concurrent users sharing the same device may attack one another through the power side-channel (&lt;a href="https://dl.acm.org/doi/abs/10.1145/3543622.3573193" target="_blank" rel="noopener">check out our paper&lt;/a>), while consecutive users may attack one another through measurement of the physical wear-out state of the FPGA device (&lt;a href="https://arxiv.org/abs/2303.17881" target="_blank" rel="noopener">check out our paper&lt;/a>). We have demonstrated these attack surfaces on both Intel (Altera) and AMD (Xilinx) platforms. Currently, our open-sourced sensor design and side-channel analysis flow are limited to AMD devices. We are seeking CSE/CS/CE/ECE researchers interested in FPGA design, heterogeneous computing, and/or hardware security to combine our Intel and AMD side-channel sensors into a unified attack framework and compare capabilities between vendors.&lt;/p>
&lt;h3 id="open-source-sensor-repository-updates">Open-source sensor repository updates&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Hardware security&lt;/code>, &lt;code>cloud security&lt;/code>, &lt;code>heterogeneous computing&lt;/code>, &lt;code>temporal and spatial side-channels&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Experience with GitHub, FPGA development (AMD or Intel), and Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:drichmond@ucsc.edu">Dustin Richmond&lt;/a>, &lt;a href="mailto:tsheaves@ucdavis.edu">Tyler Sheaves&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Update the existing open-source voltage fluctuation sensor to support both AMD and Intel devices. Currently, our repository exclusively supports AMD FPGAs. We have added new features to our sensor and have demonstrated an implementation on Intel. We would like to consolidate this work into a unified repository containing side-channel analysis demonstrations using open-source target benchmark designs.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Adapt existing tooling scripts to support multiple vendor tool flows.&lt;/li>
&lt;li>Adapt existing test infrastructure to target multiple SoC-type FPGA platforms (e.g., DE10-Nano, Pynq Z2).&lt;/li>
&lt;li>Evaluate cross-platform sensor architecture on a collection of benchmark designs. Demonstrate each benchmark using a cross-platform unified side-channel analysis framework.&lt;/li>
&lt;li>Draw a comparison between sensor implementations on different architectures.&lt;/li>
&lt;/ul></description></item><item><title>Artificial Intelligence Explainability Accountability</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/aiealab/</link><pubDate>Wed, 07 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/aiealab/</guid><description>&lt;h2 id="trustworthy-logical-reasoning-large-language-models--llms">Trustworthy Logical Reasoning Large Language Models (LLMs)&lt;/h2>
&lt;p>Logical LLMs is a project to translate the output from large language models (LLMs) into a logic-based programming language (Prolog) to detect inconsistencies and hallucinations automatically. One goal of this project is to build a user interface through which users can give feedback that is incorporated into the system. The overall goal is to create a trustworthy hybrid open-source LLM tool that can learn from user feedback and explain its mistakes.&lt;/p>
&lt;h3 id="collect-hallucinations-and-facts">Collect Hallucinations and Facts&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: AI/ML, data collection, logic, user interfaces&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: JavaScript, HTML, Python, Bash, Git&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Easy/Moderate&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/leilani-h.-gilpin/">Leilani H. Gilpin&lt;/a> (and a PhD student TBD).&lt;/li>
&lt;/ul>
&lt;h3 id="specific-tasks">Specific Tasks&lt;/h3>
&lt;ul>
&lt;li>Run queries in an LLM API with various prompts.&lt;/li>
&lt;li>Create a user interface system that collects user feedback in a web
browser.&lt;/li>
&lt;li>Create a pipeline for storing the user data in a common format that
can be shared in our database.&lt;/li>
&lt;li>Document the tool for future maintenance.&lt;/li>
&lt;/ul>
&lt;h2 id="explaining-failures-in-autograding">Explaining failures in autograding&lt;/h2>
&lt;p>The eXplainable autograder (XAutograder) is a tool for autograding student coding assignments while providing personalized explanations or feedback. The goal of this project is to create an introductory set of coding assignments with explanations of wrong answers. This benchmark suite will be used for testing our system. The overall goal is to create a dynamic autograding system that can learn from students&amp;rsquo; code and explain their mistakes.&lt;/p>
&lt;h3 id="design-introductory-questions-and-explanations">Design introductory questions and explanations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: AI/ML, AI for education, XAI (Explainable AI)&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, Git&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Moderate&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/leilani-h.-gilpin/">Leilani H. Gilpin&lt;/a> (and a PhD student TBD).&lt;/li>
&lt;/ul>
&lt;h3 id="specific-tasks">Specific Tasks&lt;/h3>
&lt;ul>
&lt;li>Design 5-10 basic programming questions (aggregated from online sources,
other courses, etc.).&lt;/li>
&lt;li>Create tests of correctness (unit tests), and a testing framework
that takes a set of answers as input and provides a final assessment.&lt;/li>
&lt;li>Create a set of baseline explanations for various error cases, e.g.,
out of bounds error, syntax error, etc.&lt;/li>
&lt;li>Create a pipeline for iterating on the test cases and/or explanation
feedback.&lt;/li>
&lt;li>Document the tool for future maintenance.&lt;/li>
&lt;/ul></description></item><item><title>Causeway: Learning Web Development Through Micro-Roles</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/causeway/</link><pubDate>Wed, 07 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/causeway/</guid><description>&lt;p>&lt;a href="https://tech4good-causeway.web.app/#/tutorial/quarter-goals?c=01-quarter-goals-component&amp;amp;r=01-component-elements&amp;amp;s=01-intro-to-causeway" target="_blank" rel="noopener">Causeway&lt;/a> is a platform for learning to develop web applications using an Angular, RxJS, NgRx, and Firebase stack. Most online coding tutorials focus on covering the technical syntax or features of a language or framework, which means that new developers don’t have great resources for building a holistic picture of how everything they learn connects to actually developing a complex web application. Causeway breaks down the process of developing a web application into a hierarchy of micro-roles which provides learners with a clear pathway for learning that also translates to a clear process for developing an application. In the longer future, this would also enable learners to easily contribute to projects as they learn through taking on micro-roles for yet-to-be-developed projects. The platform uses the &lt;a href="https://developer.stackblitz.com/platform/api/webcontainer-api" target="_blank" rel="noopener">Stackblitz WebContainer API&lt;/a> to run full applications in the browser for interactive learning.&lt;/p>
&lt;p>Thus far, we have developed a version of the platform that walks learners through the process of developing presentational components of a web application as well as smart components / containers that contain multiple presentational components and are responsible for fetching data from the backend and handling events and updates to the database. This content still uses Angular 13 and needs to be updated to Angular 17, and we also need to improve our use of RxJS, NgRx, and Firebase. We’d also like to extend the content in multiple ways, including: 1) extending the walkthrough to more components and containers besides the single example we have, ideally in a way that covers a complete application, and 2) extending beyond components and containers to cover defining database entities and relationships. We’d also like to develop a learning dashboard where users can see the different micro-roles and lessons that they’ve completed or that are upcoming for the project they are working on.&lt;/p>
&lt;h3 id="causeway--improving-the-core-infrastructure-and-experience">Causeway / Improving the Core Infrastructure and Experience&lt;/h3>
&lt;p>The proposed work includes updating the platform and the example infrastructure within the platform to the latest version of Angular and other associated libraries, implementing and testing logging and analytics, implementing a learning dashboard for users, and time permitting, creating new modules to cover defining database entities and relationships. Both roles will also contribute to running usability studies and documenting the platform so that it can be open-sourced.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Web Development, Educational Technologies, Angular&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Web development experience, HTML, CSS, Javascript, Angular, RxJS, NgRx, Firebase&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-lee/">David Lee&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="causeway--extend-the-learning-scope-and-experience">Causeway / Extend the Learning Scope and Experience&lt;/h3>
&lt;p>The proposed work includes extending the component and container walkthroughs to cover a complete interactive application. This means writing a separate simple application, and organizing the code required to do so into units of work organized by our micro-role structure. Both roles will also contribute to running usability studies and documenting the platform so that it can be open-sourced.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Web Development, Educational Technologies, Angular&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Web development experience, HTML, CSS, Javascript, Angular, RxJS, NgRx, Firebase&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/david-lee/">David Lee&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Open Sensing Platform (OSP)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/osp/</link><pubDate>Mon, 05 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/osp/</guid><description>&lt;h2 id="open-sensing-platform-i-software-to-enable-large-scale-outdoor-sensor-networks">Open Sensing Platform I: Software to enable large scale outdoor sensor networks&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Data Visualization Dashboard" srcset="
/project/osre24/ucsc/osp/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp 400w,
/project/osre24/ucsc/osp/osp1_huda3c1d46887767e16b865c47973b8288_360491_baae6484e015277af7b09e866b6869f5.webp 760w,
/project/osre24/ucsc/osp/osp1_huda3c1d46887767e16b865c47973b8288_360491_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/osp/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp"
width="760"
height="759"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data Visualization, Backend, Web Development, UI/UX, Analytics&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> React, Javascript, Python, SQL, Git&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> Flask, Docker, CI/CD, AWS, Authentication&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>, &lt;a href="mailto:awu70@ucsc.edu">Aaron Wu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Open Sensing Platform (OSP) is a new initiative expanding from our prior project DirtViz, a data visualization web platform for monitoring microbial fuel cell sensors (see &lt;a href="https://github.com/jlab-sensing/DirtViz" target="_blank" rel="noopener">GitHub&lt;/a>). The mission is to scale up the current platform to support other researchers or citizen scientists in integrating their novel sensing hardware or microbial fuel cell sensors for monitoring and data analysis. Examples of the types of sensors currently deployed are sensors measuring soil moisture, temperature, current, and voltage in outdoor settings. The focus of the software half of the project involves building upon our existing visualization web platform, and adding additional features to support the mission. A live version of the website is available &lt;a href="https://dirtviz.jlab.ucsc.edu/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Deliverables:&lt;/strong>
&lt;ul>
&lt;li>Create a system for remote collaborators/citizen scientists to set up their sensors and upload data securely, e.g., designing a user flow to create sensors.&lt;/li>
&lt;li>Craft an intuitive navigation system so that data from deployment sites around the world can be easily viewed, e.g., designing an experience/system to locate deployment sites.&lt;/li>
&lt;li>Refine our web-based visualization tools to add features for users to analyze collected data, e.g., lazy loading out-of-range data or caching queried data.&lt;/li>
&lt;li>Document the tool thoroughly for future maintenance.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
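&lt;p>As an illustration of the "caching queried data" deliverable, here is a minimal sketch, assuming readings are requested by sensor and time window. The names &lt;code>fetch_from_backend&lt;/code> and &lt;code>query_readings&lt;/code> and the synthetic data are hypothetical and are not part of the existing DirtViz code.&lt;/p>

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow backend is actually hit


def fetch_from_backend(sensor_id, start_hour, end_hour):
    """Hypothetical stand-in for a slow database query (synthetic data)."""
    CALLS["count"] += 1
    return tuple((hour, 20.0 + hour % 24) for hour in range(start_hour, end_hour))


@lru_cache(maxsize=128)
def query_readings(sensor_id, start_hour, end_hour):
    """Return readings for a time window, reusing earlier identical queries."""
    return fetch_from_backend(sensor_id, start_hour, end_hour)


first = query_readings("sensor-1", 0, 3)
second = query_readings("sensor-1", 0, 3)  # served from the cache
print(CALLS["count"], first == second)
```

&lt;p>Only exact repeats of a query hit the cache in this sketch; a real dashboard would probably also want to reuse overlapping time windows and expire stale entries.&lt;/p>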
&lt;h2 id="open-sensing-platform-ii-hardware-to-enable-large-scale-outdoor-sensor-networks">Open Sensing Platform II: Hardware to enable large scale outdoor sensor networks&lt;/h2>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Hardware" srcset="
/project/osre24/ucsc/osp/featured_hu6708254effb609c97dc781c926e4aea5_3805876_b844f987d1fd7b63009c6d2a89b9dcf2.webp 400w,
/project/osre24/ucsc/osp/featured_hu6708254effb609c97dc781c926e4aea5_3805876_3199ed5510eaff77a8cf1f93ae26f10d.webp 760w,
/project/osre24/ucsc/osp/featured_hu6708254effb609c97dc781c926e4aea5_3805876_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/osp/featured_hu6708254effb609c97dc781c926e4aea5_3805876_b844f987d1fd7b63009c6d2a89b9dcf2.webp"
width="760"
height="521"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Embedded system, wireless communication, low-power remote sensing&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> C/C++, Git, Github, Platformio&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> PCB design and debugging experience, STM32 HAL, ESP32 Arduino, protobuf, python, knowledge of standard communication protocols (I2C, SPI, and UART)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>, &lt;a href="mailto:sgtaylor@ucsc.edu">Stephen Taylor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Open Sensing Platform hardware aims to be a general purpose hardware platform for outdoor sensing (e.g. agriculture, ecological monitoring, etc.). The typical use case involves a sensor deployment in an agricultural field, remotely uploading measurements without interfering with farming operations. The current hardware revision (&lt;a href="https://github.com/jlab-sensing/soil_power_sensor" target="_blank" rel="noopener">Soil Power Sensor&lt;/a>) was originally designed for monitoring power output of microbial fuel cells using high fidelity voltage and current measurement channels, as well as auxiliary sensors such as the SDI-12 &lt;a href="https://metergroup.com/products/teros-12/" target="_blank" rel="noopener">TEROS-12 soil moisture sensor&lt;/a>. The primary activities of this project will involve low-level firmware design and implementation, but may also incorporate hardware design revisions if necessary. We are looking to expand functionality to other external sensors, as well as optimize for power consumption, via significant firmware design activities.&lt;/p>
&lt;p>Long-range, low-power wireless communication is achieved through a LoRa-capable STM32 microcontroller, while in-lab experiments use an ESP32 microcontroller for its simpler WiFi interface. Both wireless interfaces upload measurements to our data visualization dashboard, &lt;strong>Open Sensing Platform I&lt;/strong>. The combined goal across both of these projects is to create a system that enables researchers to test and evaluate novel sensing solutions. We want to make the device usable by a wide range of researchers who may not have a background in electronics, so we are interested in design activities that enhance user friendliness.&lt;/p>
&lt;p>In total there will be 2-4 people working on the hardware with progress being tracked on GitHub. Broader project planning is tracked through a Jira board. We intend to have weekly meetings to provide updates on current issue progress along with assigning tasks. Please reach out to &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a> if there are any questions or specific ideas for the project.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Deliverables:&lt;/strong> Contribution via commits to the GitHub repository with documentation on completed work. A changelog of contributions to the firmware.&lt;/li>
&lt;/ul></description></item><item><title>LiveHD</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/livehd/</link><pubDate>Thu, 01 Feb 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/livehd/</guid><description>&lt;p>The goal is to enable a more productive flow where the ASIC/FPGA designer can
work with multiple hardware description languages like Chisel, Pyrope, or
Verilog.&lt;/p>
&lt;p>There are several projects: some build compiler infrastructure around
&lt;a href="https://github.com/masc-ucsc/livehd" target="_blank" rel="noopener">LiveHD&lt;/a>, and others explore how to interface
LLMs to improve chip design productivity.&lt;/p>
&lt;p>The following projects are available:&lt;/p>
&lt;ul>
&lt;li>Slang with LiveHD&lt;/li>
&lt;li>Hardware Hierarchical Dynamic Structures (hdds)&lt;/li>
&lt;li>HDLEval for LLMs&lt;/li>
&lt;li>C++ Profiler Optimizer with LLMs&lt;/li>
&lt;li>Decompiler from Assembly to C++ with LLMs&lt;/li>
&lt;/ul>
&lt;h2 id="slang-with-livehd">Slang with LiveHD&lt;/h2>
&lt;h3 id="project-idea">Project Idea&lt;/h3>
&lt;p>&lt;a href="https://github.com/MikePopoloski/slang" target="_blank" rel="noopener">slang&lt;/a> is one of the best open source
Verilog front-ends available. &lt;a href="https://github.com/masc-ucsc/livehd" target="_blank" rel="noopener">LiveHD&lt;/a>
uses slang, but only a subset of Verilog is supported. The goal is to add more slang features.&lt;/p>
&lt;h3 id="project-deliverable">Project Deliverable&lt;/h3>
&lt;p>The slang/LiveHD interface creates LiveHD IR (LNAST IR). The plan is to keep
extending the translation to support more features. The project can proceed in
small, incremental steps. The goal is to support all of Verilog 2001, and
potentially some SystemVerilog features.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> SystemVerilog, Compilers&lt;/li>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> Knowledge of Verilog, C++17, some compiler background.&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="hardware-hierarchical-dynamic-structures-hdds">Hardware Hierarchical Dynamic Structures (hdds)&lt;/h2>
&lt;h3 id="project-idea-1">Project Idea&lt;/h3>
&lt;p>&lt;a href="https://github.com/masc-ucsc/hhds" target="_blank" rel="noopener">hdds&lt;/a> aims to build efficient tree and
graph data structures commonly used by hardware compilers. A key difference is
the hierarchical nature, and patterns.&lt;/p>
&lt;h3 id="project-deliverable-1">Project Deliverable&lt;/h3>
&lt;p>There are 2 main components: Graph and Tree.&lt;/p>
&lt;p>For each, there is a hierarchical implementation that allows connecting trees/graphs in a hierarchy.
For example, a graph can call another graph with inputs and outputs, just as a Verilog module calls other Verilog modules.&lt;/p>
&lt;p>Both classes should have iterators for traversing in topological order.&lt;/p>
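&lt;p>To make the traversal requirement concrete, here is a minimal Python sketch of a topological-order iterator using Kahn's algorithm. The dictionary-of-lists graph and the &lt;code>topological_order&lt;/code> name are assumptions chosen for illustration; they are not the project's C++ API, and the sketch ignores hierarchy.&lt;/p>

```python
from collections import deque


def topological_order(edges):
    """Yield nodes so that each node comes after every node that feeds it.

    `edges` maps a node to the list of nodes it drives (hypothetical
    representation used only for this sketch).
    """
    # Count incoming edges for every node mentioned anywhere in the graph.
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    # Nodes with no pending inputs are ready to be visited.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        yield node
        for target in edges.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    # Any node left unvisited sits on a cycle.
    if visited != len(indegree):
        raise ValueError("graph has a cycle; no topological order exists")


print(list(topological_order({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []})))
```

&lt;p>A generator keeps the traversal lazy, which is close in spirit to an iterator interface; a hierarchical version would additionally need to descend into sub-graphs when it reaches a node that calls another graph.&lt;/p>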
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data structures for compilers&lt;/li>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> Data structures, C++17&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="hdleval-for-llms">HDLEval for LLMs&lt;/h2>
&lt;h3 id="project-idea-2">Project Idea&lt;/h3>
&lt;p>LLMs can be used to create new hardware. The goal of this project is to create multiple prompts
so that LLM/compiler designers can have examples to improve their flows.&lt;/p>
&lt;h3 id="project-deliverable-2">Project Deliverable&lt;/h3>
&lt;p>The idea is to create many sample projects where an &amp;ldquo;input&amp;rdquo; creates a Verilog artifact. The specification should not assume Verilog as output because other HDLs like Chisel could be used.&lt;/p>
&lt;p>The goal is to create many sample circuits that are realistic and practical. The description can have&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Verilog, LLMs&lt;/li>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> Verilog or Chisel&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Low&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Small or medium&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="c-profiler-optimizer-with-llms">C++ Profiler Optimizer with LLMs&lt;/h2>
&lt;h3 id="project-idea-3">Project Idea&lt;/h3>
&lt;p>Fine-tune an LLM, and/or use retrieval-augmented generation (RAG), so that it can
leverage profiling tools to provide code optimization recommendations for C++ and
possibly Rust code.&lt;/p>
&lt;h3 id="project-deliverable-3">Project Deliverable&lt;/h3>
&lt;p>Create a Python package (poetry?) called aiprof that analyzes the execution of a C++ or Rust program and
provides code change recommendations to improve performance.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">aiprof ./binary
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>aiprof uses perf tools but also other tools like redspy, zerospy, and loadspy
to find problematic code areas and drive the GPT optimizer.&lt;/p>
&lt;p>The plan is to collect several example transformations into a database so
that a model like CodeLlama or Mixtral can be fine-tuned with code optimization
recommendations.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> C++, perf tools&lt;/li>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> C++17, Linux performance counters&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="decompiler-from-assembly-to-c-with-llms">Decompiler from Assembly to C++ with LLMs&lt;/h2>
&lt;h3 id="project-idea-4">Project Idea&lt;/h3>
&lt;p>There are several decompilers from assembly to C, such as Ghidra and RetDec. The idea is to
feed the output of both to an LLM to generate cleaner C++ code.&lt;/p>
&lt;h3 id="project-deliverable-4">Project Deliverable&lt;/h3>
&lt;p>Ghidra and RetDec generate C code from assembly. The idea is to start with
these tools as a baseline, but feed their output to an LLM to generate C++ code
instead of plain C.&lt;/p>
&lt;p>Create a Python package (poetry?) called aidecomp that integrates both
decompilers. It can target C or C++17.&lt;/p>
&lt;p>To check that the generated code is compatible with the function translated, a
fuzzer could be used. This allows aidecomp to iterate the generation if the
generated code is not equivalent.&lt;/p>
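&lt;p>The equivalence check can be pictured as differential fuzzing: run the original and the regenerated function on the same random inputs and report the first disagreement. The sketch below is only an illustration under simplifying assumptions. Both functions are plain Python callables taking two integers, and &lt;code>find_mismatch&lt;/code> is a hypothetical helper; a real harness would execute the original binary and the compiled generated code.&lt;/p>

```python
import random


def find_mismatch(reference, candidate, trials=1000, seed=0):
    """Return the first input on which the two functions disagree, else None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        # Random signed 32-bit pairs; a real harness would derive the input
        # shape from the signature of the function being translated.
        args = (rng.randint(-2**31, 2**31 - 1), rng.randint(-2**31, 2**31 - 1))
        if reference(*args) != candidate(*args):
            return args
    return None


def original(a, b):
    return max(a, b)


def regenerated_ok(a, b):
    return a if a > b else b


def regenerated_bad(a, b):
    return a  # ignores b, so it is wrong whenever b is larger


print(find_mismatch(original, regenerated_ok))
print(find_mismatch(original, regenerated_bad) is not None)
```

&lt;p>A returned counterexample could be handed back to the LLM as feedback for the next generation attempt. Passing every trial does not prove equivalence; it only means no difference was found on the sampled inputs.&lt;/p>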
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> C++, decompilers&lt;/li>
&lt;li>&lt;strong>Skills Needed:&lt;/strong> C++17&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Drishti</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/drishti/</link><pubDate>Tue, 30 Jan 2024 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/drishti/</guid><description>&lt;p>&lt;a href="https://github.com/hpc-io/drishti" target="_blank" rel="noopener">Drishti&lt;/a> is a novel interactive web-based analysis framework to visualize I/O traces, highlight bottlenecks, and help understand the I/O behavior of scientific applications. Drishti aims to fill the gap between the trace collection, analysis, and tuning phases. The framework contains an interactive I/O trace analysis component for end-users to visually inspect their applications&amp;rsquo; I/O behavior, focusing on areas of interest and getting a clear picture of common root causes of I/O performance bottlenecks. Based on the automatic detection of I/O performance bottlenecks, our framework maps numerous common and well-known bottlenecks and their solution recommendations that can be implemented by users.&lt;/p>
&lt;h3 id="drishti--server-side-visualization-service">Drishti / Server-side Visualization Service&lt;/h3>
&lt;p>The proposed work will include investigating and building server-side solutions to support the visualization of larger I/O traces and logs, while integrating with the existing analysis, reports, and recommendations.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O&lt;/code> &lt;code>HPC&lt;/code> &lt;code>visualization&lt;/code>, &lt;code>performance analysis&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, HTML/CSS, JavaScript&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="drishti--visualization-and-analysis-of-ai-based-applications">Drishti / Visualization and Analysis of AI-based Applications&lt;/h3>
&lt;p>The proposed work will extend Drishti to handle metrics from non-MPI applications, specifically AI/ML codes and applications. This work entails adapting the existing framework, heuristics, and recommendations to support metrics collected from AI/ML workloads.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O&lt;/code> &lt;code>HPC&lt;/code> &lt;code>AI&lt;/code> &lt;code>visualization&lt;/code>, &lt;code>performance analysis&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, AI, performance profiling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>h5bench</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/h5bench/</link><pubDate>Tue, 30 Jan 2024 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/h5bench/</guid><description>&lt;p>&lt;a href="https://github.com/hpc-io/h5bench" target="_blank" rel="noopener">h5bench&lt;/a> is a suite of parallel I/O benchmarks or kernels representing I/O patterns that are commonly used in HDF5 applications on high performance computing systems. h5bench measures I/O performance from various aspects, including the I/O overhead, and observed I/O rate.&lt;/p>
&lt;p>Parallel I/O is a critical technique for moving data between compute and storage subsystems of supercomputers. With massive amounts of data produced or consumed by compute nodes, high-performance parallel I/O is essential. I/O benchmarks play an important role in this process; however, there is a scarcity of I/O benchmarks representative of current workloads on HPC systems. Toward creating representative I/O kernels from real-world applications, we have created h5bench, a set of I/O kernels that exercise HDF5 I/O on parallel file systems in numerous dimensions. Our focus on HDF5 is due to the parallel I/O library&amp;rsquo;s heavy usage in various scientific applications running on supercomputing systems. The various tests benchmarked in the h5bench suite include I/O operations (read and write), data locality (arrays of basic data types and arrays of structures), array dimensionality (1D arrays, 2D meshes, 3D cubes), and I/O modes (synchronous and asynchronous). h5bench measurements can be used to identify performance bottlenecks and their root causes and to evaluate I/O optimizations. As the I/O patterns of h5bench are diverse and capture the I/O behaviors of various HPC applications, this study will be helpful to the broader supercomputing and I/O community.&lt;/p>
&lt;h3 id="h5bench--reporting-and-enhancing">h5bench / Reporting and Enhancing&lt;/h3>
&lt;p>The proposed work will include standardizing and enhancing the reports generated by the suite, and integrating additional I/O kernels (e.g., HACC-IO).&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O&lt;/code> &lt;code>HPC&lt;/code> &lt;code>benchmarking&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, C/C++, good communicator&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="h5bench--compression">h5bench / Compression&lt;/h3>
&lt;p>The proposed work will focus on including compression capabilities into the h5bench core access patterns through HDF5 filters.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O&lt;/code> &lt;code>HPC&lt;/code> &lt;code>benchmarking&lt;/code>, &lt;code>compression&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Python, HDF5&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>OpenROAD - An Open-Source, Autonomous RTL-GDSII Flow for Chip Design</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/openroad/openroad/</link><pubDate>Mon, 22 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/openroad/openroad/</guid><description>&lt;p>The &lt;a href="https://theopenroadproject.org" target="_blank" rel="noopener">OpenROAD&lt;/a> project is a non-profit project, originally funded by DARPA, that aims to create open-source EDA tools: an autonomous RTL-to-GDSII flow that completes in under 24 hours, lowering cost and boosting innovation in IC design. This project is now supported by &lt;a href="precisioninno.com">Precision Innovations&lt;/a>.&lt;/p>
&lt;p>OpenROAD scales massively, supports Education and Workforce Development (EWD), and serves a broad ecosystem, making it a vital tool for a rapidly growing semiconductor industry.&lt;/p>
&lt;p>OpenROAD is the fastest on-ramp to gaining knowledge and skills and to creating pathways to great career opportunities in chip design. You will develop important software and hardware design skills by contributing to these interesting projects. You will also have the opportunity to work with mentors from the OpenROAD project and other industry experts.&lt;/p>
&lt;p>We welcome a diverse community of designers, researchers, enthusiasts, software engineers and entrepreneurs to use and contribute to OpenROAD and make a far-reaching impact in the rapidly growing, global Semiconductor Industry.&lt;/p>
&lt;h3 id="create-openroad-tutorials-and-videos">Create OpenROAD Tutorials and Videos&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Documentation&lt;/code>, &lt;code>Tutorials&lt;/code>, &lt;code>Videos&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Video/audio recording and editing, training and education&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Create short videos for training and course curriculum highlighting key features and flows in &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>.&lt;/p>
&lt;h3 id="improve-the-openroad-autotuner-flow-and-documentation">Improve the OpenROAD AutoTuner Flow and documentation&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>OpenROAD-flow-scripts&lt;/code>, &lt;code>AutoTuner&lt;/code>, &lt;code>Design Exploration&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of ML for hyperparameter tuning, Cloud-based computation, Basic VLSI design and tools knowledge, python, C/C++&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Test, analyze and enhance the &lt;a href="https://openroad-flow-scripts.readthedocs.io/en/latest/user/InstructionsForAutoTuner.html" target="_blank" rel="noopener">AutoTuner&lt;/a> to improve usability, documentation and QoR. The AutoTuner is an important tool in the OpenROAD flow (&lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>) for chip design exploration that significantly reduces design time. You will use state-of-the-art ML tools to test the current tool exhaustively for good PPA (performance, power, area) results. You will also update existing documentation to reflect any changes to the tool and flow.&lt;/p>
&lt;h3 id="implement-a-memory-compiler-in-the-openroad-flow">Implement a memory compiler in the OpenROAD Flow&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>OpenROAD-flow-scripts&lt;/code>, &lt;code>Memory Compiler&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Basic VLSI design and tools knowledge, python, tcl, C/C++, memory design a plus&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/austin-rovinski/">Austin Rovinski&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Implement a memory compiler as part of the OpenROAD flow to improve the placement and layout efficiency of large, memory-intensive designs. You will start with an existing code base to develop this feature: &lt;a href="https://github.com/The-OpenROAD-Project-staging/OpenROAD/tree/dffram" target="_blank" rel="noopener">https://github.com/The-OpenROAD-Project-staging/OpenROAD/tree/dffram&lt;/a>.
Another option is &lt;a href="https://github.com/AUCOHL/DFFRAM" target="_blank" rel="noopener">https://github.com/AUCOHL/DFFRAM&lt;/a>.
Enhance the code to add DFFRAM support to the OpenROAD native flow, &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>.&lt;/p>
&lt;h3 id="integrate-a-tcl-and-python-linter">Integrate a tcl and python linter&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Linting&lt;/code>, &lt;code>Workflow&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: tcl, python, linting&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Easy&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Small (90 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/austin-rovinski/">Austin Rovinski&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Integrate a tcl and python linter for tools in OpenROAD and &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a> to enforce error checking, style and best practices.&lt;/p>
&lt;h3 id="llm-assistant-for-openroad---create-model-architecture-and-prototype">LLM assistant for OpenROAD - Create Model Architecture and Prototype&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Large Language Model&lt;/code>, &lt;code>Machine Learning&lt;/code>, &lt;code>Model Architecture&lt;/code>, &lt;code>Model Deployment&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: large language model engineering, prompt engineering, fine-tuning&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project involves the creation of a conversational assistant designed around &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a> to answer user queries. You will be working in tandem with members of the OpenROAD team and other researchers to deliver a final deployable prototype. You will focus on the design and implementation of modular LLM architectures. You will be experimenting through different architectures and justifying which approach works the best on our domain-specific data. Open to proposals from all levels of ML practitioners.&lt;/p>
&lt;h3 id="llm-assistant-for-openroad---data-engineering-and-testing">LLM assistant for OpenROAD - Data Engineering and testing&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Large Language Model&lt;/code>, &lt;code>Machine Learning&lt;/code>, &lt;code>Data Engineering&lt;/code>, &lt;code>Model Deployment&lt;/code>, &lt;code>Testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: large language model engineering, prompt engineering, fine-tuning&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project involves the creation of a conversational assistant designed around &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a> to answer user queries. You will be working in tandem with members of the OpenROAD team and other researchers to deliver a final deployable prototype. This project will focus on the data engineering portion of the project. This may include: training pipelines specifically tailored for fine-tuning LLM models, data annotation, preprocessing and augmentation. Open to proposals from all levels of ML practitioners.&lt;/p>
&lt;h3 id="create-unit-tests-for-openroad-tools">Create Unit tests for OpenROAD tools&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>OpenROAD-flow-scripts&lt;/code>, &lt;code>unit testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Basic VLSI design and tools knowledge, python, tcl, C/C++, Github&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>You will build unit tests for specific features of the OpenROAD tool; these tests will become part of the regression suite. Here is an example of a test for UPF support: &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD/blob/master/test/upf/mpd_aes.upf" target="_blank" rel="noopener">https://github.com/The-OpenROAD-Project/OpenROAD/blob/master/test/upf/mpd_aes.upf&lt;/a>.
This is a great way to learn VLSI flow basics and the art of testing them for practical applications.&lt;/p></description></item><item><title>AIIO / Graph Neural Network</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/aiio/</link><pubDate>Wed, 17 Jan 2024 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/aiio/</guid><description>&lt;p>&lt;a href="https://github.com/hpc-io/aiio" target="_blank" rel="noopener">AIIO&lt;/a> changes how users automatically tune the I/O performance of applications on HPC systems. It currently relies on linear regression models, but it could also be applied to heterogeneous data, such as programming info. This requires extending the linear regression model to more complex models, such as heterogeneous graph neural networks. The proposed work will include developing a graph neural network-based model to predict and interpret I/O performance.&lt;/p>
&lt;h3 id="aiio--graph-neural-network">AIIO / Graph Neural Network&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>AIIO/Graph Neural Network&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, Github, Machine Learning&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, Suren Byna&lt;/li>
&lt;/ul>
&lt;p>The specific tasks of the project include:&lt;/p>
&lt;ul>
&lt;li>Develop the data pre-processing pipeline to convert I/O logs into the formats required by the Graph Neural Network.&lt;/li>
&lt;li>Build and test the Graph Neural Network to model the I/O performance of HPC applications.&lt;/li>
&lt;li>Test and evaluate the accuracy of the Graph Neural Network with test cases from AIIO.&lt;/li>
&lt;/ul></description></item><item><title>FasTensor / Stream Processing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/fastensor/</link><pubDate>Wed, 17 Jan 2024 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/lbl/fastensor/</guid><description>&lt;p>&lt;a href="https://github.com/BinDong314/FasTensor" target="_blank" rel="noopener">FasTensor&lt;/a> is a generic tensor processing engine that scales from single nodes to thousands of nodes on HPC systems. FasTensor supports applications ranging from traditional SQL queries to complex DFT solvers in scientific applications. It has a 1000X performance advantage over MapReduce and Spark in supporting generic data processing functions on tensor structures. In this project, we propose to expand FasTensor with streaming functionality to support online data processing. Specifically, participants of this project will develop a stream endpoint for retrieving live data output from applications, such as DAS. The stream endpoint maintains a pointer to the data, which could be an n-dimensional subset of a tensor.&lt;/p>
&lt;h3 id="fastensor--stream-processing">FasTensor / Stream Processing&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>FasTensor/Streaming Processing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, github&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The specific tasks of the project include:&lt;/p>
&lt;ul>
&lt;li>Building a mock workflow based on our DAS application (&lt;a href="https://github.com/BinDong314/DASSA" target="_blank" rel="noopener">https://github.com/BinDong314/DASSA&lt;/a>) to test stream processing. The mock workflow comprises a data producer, which generates DAS data, and a data consumer, which processes the data.&lt;/li>
&lt;li>Developing a Stream Endpoint (e.g., I/O driver) to iteratively read dynamically increasing data from a directory. The stream endpoint essentially includes open, read, and write functions, and a pointer that tracks the current file position.&lt;/li>
&lt;li>Integrating the Stream Endpoint into the FasTensor library.&lt;/li>
&lt;li>Evaluating the performance of the mock workflow with the new Stream Endpoint.&lt;/li>
&lt;li>Documenting the execution mechanism.&lt;/li>
&lt;/ul></description></item><item><title>PolyPhy</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/polyphy/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/polyphy/</guid><description>&lt;p>&lt;a href="https://github.com/PolyPhyHub/PolyPhy" target="_blank" rel="noopener">PolyPhy&lt;/a> is a GPU oriented agent-based system for reconstructing and visualizing &lt;em>optimal transport networks&lt;/em> defined over sparse data. Rooted in astronomy and inspired by nature, we have used an early prototype called &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">Polyphorm&lt;/a> to reconstruct the &lt;a href="https://youtu.be/5ILwq5OFuwY" target="_blank" rel="noopener">Cosmic web&lt;/a> structure, but also to discover network-like patterns in natural language data. You can see an instructive overview of PolyPhy in our &lt;a href="https://elek.pub/workshop_cross2022.html" target="_blank" rel="noopener">workshop&lt;/a> and more details about our research &lt;a href="https://elek.pub/projects/Rhizome-Cosmology" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Under the hood, PolyPhy uses a richer 3D scalar field representation of the reconstructed network, instead of a typical discrete representation like a graph or a mesh. The ultimate purpose of PolyPhy is to become a toolkit for a range of specialists across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. PolyPhy aspires to be a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.&lt;/p>
&lt;h3 id="polyphy-web-presence">PolyPhy Web Presence&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>UX&lt;/code> &lt;code>Social Media&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> full stack web development, Javascript, good communicator&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:ez@nmsu.edu">Ezra Huscher&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The online presentation of a software project is without a doubt one of the core ingredients of its success. This project aims to develop a sustainable web presence for PolyPhy, catering to interested contributors, active collaborators, and users alike.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Closely work with the mentors on understanding the context of the project and its detailed requirements in preparation of the proposal.&lt;/li>
&lt;li>Port the existing &lt;a href="https://polyphy.io" target="_blank" rel="noopener">website&lt;/a> into a more modern Javascript framework (such as Next.js) that provides a user-friendly CMS and admin interface.&lt;/li>
&lt;li>Update the contents of the website with new information from the &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">repository page&lt;/a> as well as other sources as directed by the mentors.&lt;/li>
&lt;li>Develop a simple functional system for posting updates about the project to selected social media and other communication platforms (LinkedIn, Twitter/X or Mastodon, mailing list) which will also be reflected on the website.&lt;/li>
&lt;li>Optional: improve the UX of the website where needed.&lt;/li>
&lt;li>Optional: implement website analytics (visitor stats etc).&lt;/li>
&lt;/ul>
&lt;h3 id="data-visualization-and-analysis-with-polyphypolyglot">Data Visualization and Analysis with PolyPhy/Polyglot&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Data Science&lt;/code> &lt;code>Data Visualization&lt;/code> &lt;code>Point Clustering&lt;/code> &lt;code>3D&lt;/code> &lt;code>Neural Embeddings&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> data science, Python, Javascript, statistics, familiarity with AI and latent embedding spaces a big plus&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350+ hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/">Kiran Deol&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The aim of this project is to explore a novel data-scientific use case using PolyPhy and its associated web visualization interface &lt;a href="https://github.com/PolyPhyHub/PolyGlot" target="_blank" rel="noopener">PolyGlot&lt;/a>. The contributor is expected to identify a dataset they are already very familiar with, and that fits the application scope of the PolyPhy/PolyGlot tooling: a complex point cloud arising from a 3D or higher-dimensional process which will benefit from latent pattern identification and a subsequent visual as well as quantitative analysis. The contributor needs to have the rights to use the dataset, either by owning the copyright or via the open-source nature of the data.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Closely work with the mentors on understanding the context of the project and its detailed requirements in preparation of the proposal.&lt;/li>
&lt;li>Become acquainted with the tooling (PolyPhy, PolyGlot) prior to the start of the project period.&lt;/li>
&lt;li>Document the nature of the target dataset and define the complete data pipeline with assistance of the mentors, including the specific analytic tasks and objectives.&lt;/li>
&lt;li>Implement the data pipeline in PolyPhy and PolyGlot.&lt;/li>
&lt;li>Document the process and resulting findings in a publicly available report.&lt;/li>
&lt;/ul></description></item><item><title>These 4 new features will change the way you use OpenROAD</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/</link><pubDate>Sun, 29 Oct 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Welcome to the final blog post for my GSoC’23! Once again, my name is
Jack and I am working under the open-source electronic design automation
project OpenROAD. We are a fast-growing, leading open-source
foundational application for semiconductor digital design, as evidenced
by our consistent star growth since inception. You may check us out
at this &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD/" target="_blank" rel="noopener">link&lt;/a>.
Allow me to share the four significant contributions I made in this GSoC
project.&lt;/p>
&lt;p>&lt;a href="https://star-history.com/#The-OpenROAD-Project/OpenROAD&amp;amp;Date" target="_blank" rel="noopener">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://api.star-history.com/svg?repos=The-OpenROAD-Project/OpenROAD&amp;amp;type=Date" alt="Star History Chart" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/a>&lt;/p>
&lt;h2 id="1-improving-ease-of-installation">1) Improving Ease of Installation&lt;/h2>
&lt;p>Firstly, OpenROAD is now able to support multiple operating systems.
This is essential as one of our primary goals is to democratise chip
implementation. Installation is often one of the hardest steps
to get right, so it was one of our priorities. Today, we
provide options for different types of installation:&lt;/p>
&lt;ul>
&lt;li>&lt;em>Prebuilt binaries&lt;/em>: Local installations can often be riddled
with incompatibilities or unexpected bugs, and compilation can take a long
time. We sidestepped this by providing semi-regular
updates to the OpenROAD binary, reducing the time to installation.&lt;/li>
&lt;li>&lt;em>Docker&lt;/em>: Echoing previous concerns, we also enabled Docker installation
for 9 major operating systems. Docker is extremely flexible and runs
on any host operating system that Docker itself supports.&lt;/li>
&lt;/ul>
&lt;p>With these changes, we have observed a 10% reduction in installation-related GitHub issues posted each week.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-1-supported-os-matrix">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic1" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_5ec6a03672875da1d114ed8b24e54d81.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_256594bafdfffa842322c55b991f1ae1.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic1_hu40f387e99db4aa81085a02b3bc75ebae_22326_5ec6a03672875da1d114ed8b24e54d81.webp"
width="650"
height="608"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 1: Supported OS matrix
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h2 id="2-filling-missing-documentation">2) Filling Missing Documentation&lt;/h2>
&lt;p>Next, we have made considerable improvements to the documentation of
over 20 tools, introducing a consistent formatting style for each page.
We now document default values and datatypes to allow users to use the
tools with greater ease.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-2-helpful-documentation-defaults-and-datatype">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic2" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_f20854090d02e2c8c4eab994e275b52a.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_2d201fd5ada34b46714b076a84194e28.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic2_hu909e40a774da931354132b6c4f3b2165_22459_f20854090d02e2c8c4eab994e275b52a.webp"
width="691"
height="368"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 2: Helpful documentation defaults and datatype
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Rather than having all arguments for a function under a common table,
we separated out developer arguments and developer commands.
This makes our documentation more beginner-friendly to read,
while not alienating our technical userbase. We have also added sections
for example scripts and regression tests, so as to help onboard
newcomers to each tool of the flow.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-3-useful-developer-commands-example-scripts-and-regression-test-instructions">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic3" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_e3fcd088f5df4574a67cf6d097c9e73a.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_1ceeb7f590547f00904c173b5a084798.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic3_huf8b2e6da7ee6998c3390f4691d0458af_30285_e3fcd088f5df4574a67cf6d097c9e73a.webp"
width="690"
height="670"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3: Useful developer commands, example scripts, and regression test instructions
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h2 id="3-extensible-documentation-framework">3) Extensible Documentation Framework&lt;/h2>
&lt;p>Thirdly, we have introduced an extensible documentation framework.
Now, what do we mean by &lt;em>extensible&lt;/em>? It means we have created an
infrastructure which is easy for developers to use, and allows for
greater maintainability. Our goal is to create something that
requires minimal changes to add content to the documentation.&lt;/p>
&lt;p>So, how did we do this?&lt;/p>
&lt;p>We introduced four initiatives. The first is the warning/error messages glossary.
We noticed that people were searching for error and warning messages,
but our documentation did not have them. So we added a page where all
the error/warning messages, along with the relevant code line numbers, can
be generated automatically. On top of that, developers can add useful
debug information to help the end user.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-4-warningerror-messages-glossary">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic4" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_aa069c4f5f2d1682fc92525139f6d57c.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_881f56c79ec21ee86b422f9eb12ef3c8.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic4_hu4de9242319e92c4f80050403ede9a5eb_17089_aa069c4f5f2d1682fc92525139f6d57c.webp"
width="687"
height="348"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 4: Warning/Error messages glossary.
&lt;/figcaption>&lt;/figure>
&lt;/p>
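&lt;p>To illustrate the idea of generating such a glossary automatically, here is a minimal sketch. It is not the actual OpenROAD script: the call shape logger->warn(TOOL, id, "message"), the file path, and the messages are all hypothetical and only serve to show how message text and line numbers could be collected from source code.&lt;/p>

```python
import re

# Hypothetical source snippet; the logger call shape, file path, and
# messages are assumptions made for illustration only.
SOURCES = {
    "src/grt/example.cpp": [
        'void route() {',
        '  logger->warn(GRT, 12, "Net has no pins.");',
        '  logger->error(GRT, 45, "Routing congestion too high.");',
        '}',
    ],
}

# Matches logger->warn(TOOL, 12, "text") and logger->error(TOOL, 12, "text").
PATTERN = re.compile(r'logger->(warn|error)\(\s*(\w+)\s*,\s*(\d+)\s*,\s*"([^"]*)"')


def build_glossary(sources):
    """Collect one glossary entry per warning/error call found in the sources."""
    entries = []
    for path, lines in sorted(sources.items()):
        for line_no, line in enumerate(lines, start=1):
            match = PATTERN.search(line)
            if match:
                level, tool, msg_id, text = match.groups()
                entries.append({
                    "code": f"{tool}-{int(msg_id):04d}",
                    "level": level,
                    "message": text,
                    "location": f"{path}:{line_no}",
                })
    return entries


if __name__ == "__main__":
    for entry in build_glossary(SOURCES):
        print(entry["code"], entry["level"], entry["location"], entry["message"])
```

&lt;p>Because the entries are derived from the source itself, re-running such a script after each change would keep the glossary in step with the code.&lt;/p>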
&lt;p>Next, we also introduced automatically generated Doxygen pages, which
integrate nicely into our C++/Tcl source code framework. This automatic
generation makes it much more convenient for developers: they just
insert comments into their source code, and Doxygen generates the
documentation automatically.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-5-doxygen-pages">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic5" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_567ad1b2725278073bfe8cdf4d2dad6a.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_35b25ed8006816a0cd300dba6aedb4a3.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic5_hu5bbe3008d2202e9240368dd966dc7b39_37072_567ad1b2725278073bfe8cdf4d2dad6a.webp"
width="760"
height="578"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 5: Doxygen pages.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Next, we introduced cloud-based packaging. It is important that our
framework is able to run in the cloud, and in the ever-popular notebook
format. Our Colab-based notebook was created with this in mind, and
allows for easy transfer to other notebook providers with some
modifications. Check out the notebooks here!&lt;/p>
&lt;p>
&lt;figure id="figure-figure-6-google-colab-can-now-run-openroad-scripts">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic6" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_0f20b3a36a05036a4602868c18f0da9b.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_125685c82e5be8372c2ae4b937fdd412.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic6_hu84acc4eba83f1de30ea399aa678d63ae_48463_0f20b3a36a05036a4602868c18f0da9b.webp"
width="760"
height="321"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 6: Google Colab can now run OpenROAD scripts.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Lastly, we have the changelog workflow, which can be triggered manually.
For our open-source project, we have chosen not to do software releases.
This means it can be difficult to track the changes between commit
numbers. Adding this workflow can help newcomers track the changes
more easily, month by month.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-7-sample-output-of-github-changelog">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Pic7" srcset="
/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_a13d29d9b1d8fe53307365f5dfd84d86.webp 400w,
/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_9baeb333eb95f59c9ac1004e0e9fd54c.webp 760w,
/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20231029-luarss/pic7_hu4e7dae0ef8916646279c834f2bbbed59_40244_a13d29d9b1d8fe53307365f5dfd84d86.webp"
width="760"
height="400"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 7: Sample output of github changelog
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h2 id="4-openroad-chatbot">4) OpenROAD Chatbot&lt;/h2>
&lt;p>Finally, we are also discussing the potential of creating a chatbot whose
purpose is to answer user queries. We were thinking: there is a lot of
domain knowledge in Slack channels, GitHub repos, and so on, so why
not create an LLM-based chatbot? Stay tuned for updates!&lt;/p>
&lt;h2 id="personal-reflections">Personal Reflections&lt;/h2>
&lt;p>For me, the most valuable takeaway concerns code quality. Oftentimes,
we as coders tend to opt for the most expedient solution and “hack” something
out quickly. Hacking is fine as a proof of concept, but not for
long-term code development. Working in open-source projects like this,
I have learnt to avoid creating unnecessary files, to shorten the code,
and to optimise runtime. In doing our job, we also wish to make life
easier, not harder, for future developers.&lt;/p>
&lt;h2 id="final-words">Final Words&lt;/h2>
&lt;p>I would like to express my gratitude to my mentors Indira and Vitor for
their guidance and insight throughout the project, as well as the
OpenROAD dev team for their assistance. I would also like to thank the
Google Summer of Code organising committee and UCSC for creating such a
wonderful program. Being able to contribute to real open-source
projects with real needs is truly the best of both worlds for aspiring
programmers.&lt;/p></description></item><item><title>Final GSoC Blog - Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230925-kirandeol/</link><pubDate>Mon, 25 Sep 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230925-kirandeol/</guid><description>&lt;p>As I send in my final work submission for the final GSoC evaluation, I&amp;rsquo;m excited to share with you the progress we&amp;rsquo;ve made this summer (and future plans for Polyglot!). You can view the repository and web app here: &lt;a href="https://polyphyhub.github.io/PolyGlot/" target="_blank" rel="noopener">https://polyphyhub.github.io/PolyGlot/&lt;/a>. As a quick reminder of the project, we sought to extend the Polyglot web app, as developed by Hongwei (Henry) Zhou. For context, the web app follows this methodology:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Given a set of words, use an embedding model (such as Word2Vec, BERT, etc.) to generate a set of high dimensional points associated with each word.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use a dimensionality reduction method (such as UMAP) to reduce the dimensionality of each word-vector point to 3 dimensions.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the novel MCPM (Monte Carlo Physarum Machine) to compute the similarities between a set of anchor points and the rest of the point cloud. You could use any similarity metric here, too, such as the Euclidean distance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The web app then displays the point cloud of 3-dimensional embeddings, but uses coloring to indicate the level of MCPM similarity each word has with the anchor point (e.g., if the anchor point is the word “dog”, the rest of the point cloud is colored such that words identified as similar to “dog” by the MCPM metric are brighter, whereas dissimilar words are darker).&lt;/p>
&lt;/li>
&lt;/ol>
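&lt;p>As a rough illustration of the last two steps, the sketch below colors a point cloud by similarity to an anchor point. It assumes the points are already reduced to 3 dimensions and uses the Euclidean distance as the stand-in similarity metric mentioned in step 3, not MCPM itself. The function name and the linear brightness mapping are hypothetical choices, not the Polyglot implementation.&lt;/p>

```python
import math


def anchor_brightness(points, anchor_index):
    """Map each 3D point to a brightness in [0, 1] relative to an anchor.

    The anchor itself gets brightness 1.0 and the farthest point gets 0.0.
    Euclidean distance is used here in place of the MCPM similarity.
    """
    anchor = points[anchor_index]
    distances = [math.dist(point, anchor) for point in points]
    farthest = max(distances)
    if farthest == 0.0:
        # All points coincide with the anchor, so everything is fully bright.
        return [1.0 for _ in points]
    return [1.0 - d / farthest for d in distances]


if __name__ == "__main__":
    # Toy 3D embeddings; index 0 plays the role of the anchor word.
    cloud = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    print(anchor_brightness(cloud, 0))
```

&lt;p>Any other similarity score could be swapped in for the distance, as long as it is normalised to a common range before being mapped to color.&lt;/p>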
&lt;p>The main results since the last blog are summarized as follows:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>A novel timeline feature in which users can track the importance of certain words over time by watching the change in the size of points (it computes the TF-IDF metric for a word across all documents in a given year). It uses linear interpolation for years which do not have an explicit importance score.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>An industrial collaboration with UK startup Lautonomy, where we have pre-processed and entered their data into Polyglot. Pre-processing consisted of first computing a high-dimensional embedding of their set of words using OpenAI&amp;rsquo;s CLIP model &lt;a href="https://openai.com/research/clip" target="_blank" rel="noopener">https://openai.com/research/clip&lt;/a> and the CLIP-as-service Python package &lt;a href="https://clip-as-service.jina.ai" target="_blank" rel="noopener">https://clip-as-service.jina.ai&lt;/a>. Next, we used UMAP to reduce the dimensionality of these embeddings to 3D. We computed the Euclidean distance on this data (in place of the MCPM metric). Finally, we formatted the data to enter into Polyglot.&lt;/p>
&lt;/li>
&lt;/ol>
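&lt;p>The interpolation used by the timeline feature can be sketched as follows. This is a minimal illustration, not the code in the web app: the function name and the sample scores are hypothetical, and holding the nearest known score for years outside the known range is an assumption of this sketch.&lt;/p>

```python
def interpolate_scores(known, first_year, last_year):
    """Fill in an importance score for every year in [first_year, last_year].

    `known` maps a year to an explicit score (for example a TF-IDF value).
    Years between two known years are linearly interpolated; years outside
    the known range reuse the nearest known score (an assumption here).
    """
    if not known:
        raise ValueError("at least one known score is required")
    years = sorted(known)
    filled = {}
    for year in range(first_year, last_year + 1):
        if year in known:
            filled[year] = known[year]
            continue
        earlier = [y for y in years if year > y]
        later = [y for y in years if y > year]
        if not earlier:
            filled[year] = known[later[0]]
        elif not later:
            filled[year] = known[earlier[-1]]
        else:
            lo, hi = earlier[-1], later[0]
            weight = (year - lo) / (hi - lo)
            filled[year] = known[lo] + weight * (known[hi] - known[lo])
    return filled


if __name__ == "__main__":
    # Hypothetical scores for one word; 2020 has no explicit score.
    print(interpolate_scores({2019: 0.25, 2021: 0.75}, 2018, 2022))
```

&lt;p>The filled-in scores can then drive the point size for each year of the timeline.&lt;/p>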
&lt;p>Although the app has developed a lot over the summer, we are planning to continue working on Polyglot, particularly with respect to one of our original goals: to set up a pipeline from PolyPhy to Polyglot. Unfortunately, with PolyPhy undergoing refactoring this summer, we weren&amp;rsquo;t able to set this pipeline up. However, that is one of our goals for the next few months. We are also moving forward with the industrial collaboration with legal analytics startup Lautonomy. We hope to release an output together soon!&lt;/p>
&lt;p>If you&amp;rsquo;re curious about Polyglot or are interested in getting involved, please feel free to reach out to me, Oskar Elek, and Jasmine Otto!&lt;/p></description></item><item><title>KV store final Blog</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230825-manank/</link><pubDate>Fri, 25 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230825-manank/</guid><description>&lt;p>Hello again!
Before we get started, take a look at my previous blogs, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank">Introduction&lt;/a> and
&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank">Mid Term&lt;/a>. The goal of the project was to implement an io_uring-based backend driver for the client side, which at
that time used traditional sockets. The objective was to improve performance by using the zero-copy capabilities of io_uring. In the process, I learnt many things
about &lt;a href="https://gitlab.com/kinetic-storage/libkinetic/-/tree/develop" target="_blank" rel="noopener">libkinetic&lt;/a> and KV stores in general.&lt;/p>
&lt;p>I started by writing a separate driver using io_uring in libkinetic/src in ktli_uring.c, most of which is similar to the sockets backend in ktli_sockets.c. The only
difference was in the send and receive functions. For a more detailed description of the implementation, refer to the mid-term blog.&lt;/p>
&lt;p>After the implementation, it was time to put it to the test. We ran extensive benchmarks with a tool called &lt;a href="https://fio.readthedocs.io/en/latest/fio_doc.html" target="_blank" rel="noopener">fio&lt;/a>, which
is generally used to benchmark filesystems and other I/O workloads. Thanks to Philip, who had already written an IO engine for testing the kinetic KV store (&lt;a href="https://github.com/pkufeldt/fio" target="_blank" rel="noopener">link&lt;/a>), I didn&amp;rsquo;t have much trouble setting up the test bench. Philip also set up an Ubuntu server running the kinetic server
and gave me access through SSH. We ran the tests on that server with both the socket and uring backends and several different block sizes. The benchmarks sheet can be found &lt;a href="https://docs.google.com/spreadsheets/d/1HE7-KbxSqYZ3vmTZiJYoq21P7zfymU7N/edit?usp=sharing&amp;amp;ouid=116274960434137108384&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Reading and discussing the numbers was probably the most time-consuming part of the project, and we had several long discussions analyzing them
and their implications. For example, in the initial tests we saw a very high standard deviation in mean send times. We eventually traced it to a network
bottleneck: we were using large block sizes and quickly saturating the 2.5G network bandwidth.&lt;/p>
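&lt;p>To see why large blocks saturate the link so quickly, here is a back-of-the-envelope sketch. It assumes that &amp;ldquo;2.5G&amp;rdquo; means 2.5 Gbit/s and ignores protocol overhead; the block sizes and the function name &lt;code>max_blocks_per_second&lt;/code> are hypothetical and are not taken from the benchmark sheet.&lt;/p>

```python
LINK_GBIT_PER_S = 2.5  # from the post; assumed here to mean 2.5 Gbit/s


def max_blocks_per_second(block_size_bytes, link_gbit_per_s=LINK_GBIT_PER_S):
    """Upper bound on blocks sent per second, ignoring protocol overhead."""
    link_bytes_per_s = link_gbit_per_s * 1e9 / 8
    return link_bytes_per_s / block_size_bytes


for size_kib in (4, 64, 1024):  # hypothetical block sizes, in KiB
    rate = max_blocks_per_second(size_kib * 1024)
    print(f"{size_kib:>5} KiB blocks: at most {rate:,.0f} per second")
```

&lt;p>With only a few hundred large blocks per second available on the wire, queueing on the network can dominate the measured send times, whatever the client-side IO stack does.&lt;/p>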
&lt;p>In conclusion, we found that many other major factors affect the performance of the KV store, for example the network and the server side of the KV
store. Thus, although io_uring offers a performance benefit at the userspace-kernel boundary, in this case other factors had a more significant effect than the
kernel IO stack on the client side. To increase performance further, we need to look at the server side.&lt;/p>
&lt;p>I would like to thank Philip and Aldrin for their unwavering support and in-depth discussions on the topic in our weekly meetings. I learned a lot from them
throughout the entire duration of the project.&lt;/p></description></item><item><title>Grammar, Parsers, and Queries</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/livehd/20230819-rbaxt/</link><pubDate>Sat, 12 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/livehd/20230819-rbaxt/</guid><description>&lt;h2 id="update-on-tree-sitter-pyrope">Update on tree-sitter-pyrope&lt;/h2>
&lt;p>The pyrope hardware description language now has syntax highlighting available for neovim users.
The &lt;a href="https://github.com/masc-ucsc/tree-sitter-pyrope" target="_blank" rel="noopener">repository&lt;/a> includes a guide to installing the parser, and activating highlights.
After we have tested the syntax highlighting, a pull request will be made to the &lt;a href="https://github.com/nvim-treesitter/nvim-treesitter" target="_blank" rel="noopener">nvim-treesitter repository&lt;/a>.
In this post, I will outline the highlighting process and reflect on a useful feature of neovim.&lt;/p>
&lt;h3 id="syntax-trees">Syntax Trees&lt;/h3>
&lt;p>The pyrope language is described by a grammar. A grammar is a set of rules that describes the allowed structure of a language.
A parser uses the grammar to generate a syntax tree. For example, consider this line of pyrope code.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">var a:u32 &lt;span class="o">=&lt;/span> &lt;span class="m">0&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Using the pyrope parser, we can get a syntax tree for this statement.
The command &lt;code>tree-sitter parse file.prp&lt;/code> gives us the following output.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="o">(&lt;/span>statement &lt;span class="o">[&lt;/span>1, 0&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 13&lt;span class="o">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>assignment_or_declaration_statement &lt;span class="o">[&lt;/span>1, 0&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 13&lt;span class="o">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> decl: &lt;span class="o">(&lt;/span>var_or_let_or_reg &lt;span class="o">[&lt;/span>1, 0&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 3&lt;span class="o">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> lvalue: &lt;span class="o">(&lt;/span>complex_identifier &lt;span class="o">[&lt;/span>1, 4&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 5&lt;span class="o">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>identifier &lt;span class="o">[&lt;/span>1, 4&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 5&lt;span class="o">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> type: &lt;span class="o">(&lt;/span>type_cast &lt;span class="o">[&lt;/span>1, 5&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 9&lt;span class="o">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> type: &lt;span class="o">(&lt;/span>primitive_type &lt;span class="o">[&lt;/span>1, 6&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 9&lt;span class="o">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>sized_integer_type &lt;span class="o">[&lt;/span>1, 6&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 9&lt;span class="o">])))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> operator: &lt;span class="o">(&lt;/span>assignment_operator &lt;span class="o">[&lt;/span>1, 10&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 11&lt;span class="o">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> rvalue: &lt;span class="o">(&lt;/span>constant &lt;span class="o">[&lt;/span>1, 12&lt;span class="o">]&lt;/span> - &lt;span class="o">[&lt;/span>1, 13&lt;span class="o">])))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The nvim-treesitter syntax highlighting is based on this tree structure.&lt;/p>
&lt;h3 id="queries">Queries&lt;/h3>
&lt;p>A query is an expression that selects nodes from the tree.
For example,&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="o">(&lt;/span>complex_identifier &lt;span class="o">(&lt;/span>identifier&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>matches any identifier that is the child of a complex_identifier.
Color schemes in neovim assign colors to different highlight groups.
So, we can assign highlight groups to tree queries.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="o">(&lt;/span>constant&lt;span class="o">)&lt;/span> @number
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, when a constant shows up in the syntax tree, it will highlight according to the @number group.
Most of the work I did on this project involved studying the pyrope grammar, and writing queries based on it.&lt;/p>
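&lt;p>The idea of attaching highlight groups to node types can be mimicked with a toy model. The sketch below is not the tree-sitter API: it represents the syntax tree shown earlier as nested tuples and uses a hypothetical lookup table, where only &lt;code>(constant) @number&lt;/code> comes from this post and the &lt;code>@operator&lt;/code> entry is an assumption added for illustration.&lt;/p>

```python
# Toy model of the syntax tree above: (node_type, [children]).
# This is not the tree-sitter API, only an illustration of the idea.
tree = ("statement", [
    ("assignment_or_declaration_statement", [
        ("var_or_let_or_reg", []),
        ("complex_identifier", [("identifier", [])]),
        ("type_cast", [("primitive_type", [("sized_integer_type", [])])]),
        ("assignment_operator", []),
        ("constant", []),
    ]),
])

# Hypothetical capture table: node type -> highlight group.
# Only the "constant" entry appears in the post; "@operator" is assumed.
CAPTURES = {"constant": "@number", "assignment_operator": "@operator"}


def highlight_groups(node):
    """Walk the tree depth-first and yield (node_type, group) for captured nodes."""
    node_type, children = node
    if node_type in CAPTURES:
        yield node_type, CAPTURES[node_type]
    for child in children:
        yield from highlight_groups(child)


matches = list(highlight_groups(tree))
```

&lt;p>Real tree-sitter queries are far more expressive (they can match nested structure and field names), but the mapping from matched nodes to highlight groups follows the same principle.&lt;/p>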
&lt;h2 id="neovim">neovim&lt;/h2>
&lt;p>The text editor &lt;a href="https://neovim.io/" target="_blank" rel="noopener">neovim&lt;/a> is a popular choice among programmers. It allows advanced user control with configuration files.
It also has an active community working on plugins to extend its functionality.
Tools such as lazyvim allow for features like code completion and file management that give neovim the same functionality as IDEs.
However, because neovim configuration is unique to each user, this may make it difficult to reproduce neovim instructions.
For example, Professor Renau was going to test pyrope syntax highlighting in neovim.
However, I did not know what configuration was necessary for him to see highlights in neovim.
While I knew that syntax highlighting worked on my setup, I have lots of configuration files that may have contributed to that success.
There is no guarantee that Professor Renau, or other potential users, have the same neovim configuration that I do.&lt;/p>
&lt;h3 id="nvim_appname">NVIM_APPNAME&lt;/h3>
&lt;p>So, Professor Renau suggested I use the &lt;code>$NVIM_APPNAME&lt;/code> variable to test the process on a fresh configuration.
This feature allows the user to specify the configuration files used to launch neovim.
For example, I installed &lt;a href="https://www.lazyvim.org/" target="_blank" rel="noopener">lazyvim&lt;/a> to the folder &lt;code>~/.config/lazy&lt;/code>. Then, I launched neovim with &lt;code>NVIM_APPNAME=lazy nvim&lt;/code>.
So instead of using my default configuration from &lt;code>~/.config/nvim&lt;/code>, the lazyvim configuration was used.
This allowed me to use a neovim instance that was unaffected by my configuration files.
I was able to preview the process of setting up syntax highlighting from the perspective of a lazyvim user.
Similarly, the process can be done with an empty folder to mimic a brand new neovim installation.
The point is, configuration files can impact reproducibility in neovim.
However, this feature allows us to bypass our individual configurations, and create reproducible guidelines.&lt;/p>
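&lt;p>The lookup described above can be sketched in a few lines. This is a simplified model of how the configuration directory is chosen, assuming the usual &lt;code>XDG_CONFIG_HOME&lt;/code> fallback to &lt;code>~/.config&lt;/code>; the function name &lt;code>nvim_config_dir&lt;/code> is hypothetical and the code is not part of neovim.&lt;/p>

```python
from pathlib import Path


def nvim_config_dir(env):
    """Return the config directory that would be used for the given environment."""
    # Fall back to ~/.config when XDG_CONFIG_HOME is unset or empty.
    base = env.get("XDG_CONFIG_HOME") or str(Path(env.get("HOME", "~")) / ".config")
    # NVIM_APPNAME replaces the default directory name "nvim".
    appname = env.get("NVIM_APPNAME") or "nvim"
    return Path(base) / appname


default_dir = nvim_config_dir({"HOME": "/home/user"})
lazy_dir = nvim_config_dir({"HOME": "/home/user", "NVIM_APPNAME": "lazy"})
```

&lt;p>Pointing &lt;code>NVIM_APPNAME&lt;/code> at an empty or freshly created directory therefore gives a clean slate without touching the default configuration.&lt;/p>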
&lt;h3 id="conclusion">Conclusion&lt;/h3>
&lt;p>In conclusion, most of my work involved writing queries for the pyrope tree-sitter grammar.
This was for the purpose of syntax highlighting in neovim.
However, an important part of any open source project is communicating the results and providing documentation.
The NVIM_APPNAME feature helps view neovim from the perspective of different users, which helps for writing useful documentation.&lt;/p></description></item><item><title>Midpoint Blog Interactive Exploration of High-dimensional Datasets with PolyPhy and Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230803-kirandeol/</link><pubDate>Thu, 03 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230803-kirandeol/</guid><description>&lt;p>The last few months of my GSoC project have been very exciting and I hope to share why with you here in this blog post! To briefly summarize, my project has been focused on further developing the Polyglot app, a tool for visualizing 3D language embeddings. One important part of Polyglot is its utilization of the novel MCPM metric, where points are colored according to their MCPM similarity to a user-chosen “anchor point” (e.g., if “hat” is our anchor point, then similar words like “cap” or “fedora” will be colored more prominently).&lt;/p>
&lt;p>The first issue we wanted to tackle was actually navigating the point cloud. With hundreds of thousands of points, it can be difficult to find what you’re looking for! Thus, the first few features added were a search bar for points and anchor points and a “jump to point” feature which changes a user’s center of rotation and “jumps” to a chosen point. There were a few hiccups with implementing these features, mainly due to the large number of points and the particular quirks of the graphics library Polyglot uses. In the end though, these simple features made it feel a lot easier to use Polyglot.&lt;/p>
&lt;p>The next set of features related to our desire to actually annotate the point cloud. Much as one might annotate a Google Doc (i.e., highlight a chunk of text and leave a comment), we wanted to do the same with points! This led to the development of a cool brush tool for coloring points, named and commented annotations (up to 5), a search bar within annotations, and finally a button to export annotations and comments to a CSV.&lt;/p>
&lt;p>The next few weeks are looking bright as we strive to finish the PolyPhy-Polyglot pipeline (a notebook for quickly formatting MCPM data from PolyPhy and getting it into Polyglot). We also hope to add a unique “timeline” feature in which users can analyze sections of the point cloud based on the associated time of each point. Overall, it’s been a very stimulating summer and I’m excited to push this project even further!&lt;/p></description></item><item><title>Midterm: High Fidelity UAV Simulation Using Unreal Engine with specular reflections</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230802-damodardatta/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230802-damodardatta/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/osavc">Open Source Autonomous Vehicle Controller&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/18g-WRZj_7ufIt6YZNn4OG1s7VKi1u5hV/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;strong>Aaron Hunter and Carlos Espinosa&lt;/strong>, aims to develop an Unreal Engine-based simulator for testing. The simulator will use Unreal Engine for the physics and visualization.&lt;/p>
&lt;h2 id="what-we-have-done-so-far">What we have done so far&lt;/h2>
&lt;ul>
&lt;li>We found that we can use Unreal Engine as a physics simulator and co-simulate with Simulink using the tools provided by MathWorks.&lt;/li>
&lt;li>We simulated an example provided by MathWorks, but I wasn&amp;rsquo;t getting the expected behaviour and there were very few resources available.&lt;/li>
&lt;li>So we decided to use Gazebo and ROS for simulation instead of Unreal Engine and Simulink, using a balancing bot designed in Solidworks as the example.&lt;/li>
&lt;li>To use Gazebo, I converted the Solidworks model into a URDF and imported it into Gazebo.&lt;/li>
&lt;/ul>
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;p>Currently, I am working on using Gazebo and ROS to control a balancing bot with a PID control algorithm. Afterwards, I will document the process of importing a model into Gazebo for testing a control algorithm.&lt;/p></description></item><item><title>ScaleBugs: Reproducible Scalability Bugs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230802-boluwarinayinmode/</link><pubDate>Wed, 02 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230802-boluwarinayinmode/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>As part of the Scalebugs Project, we have worked on building a dataset of reproducible scalability bugs. To achieve this, we go through existing bug reports for popular distributed systems, which include Cassandra, HDFS, Ignite, and Kafka. Workloads are designed to reproduce these scalability bugs by triggering some functionalities of the system under different configurations (e.g., different numbers of nodes), for which we will observe the impact on performance.&lt;/p>
&lt;p>So far we have worked on packaging the buggy and fixed versions of scalability systems, a runtime environment that ensures reproducibility, and the workloads used to trigger the symptoms of the bug inside docker containers. By packaging these versions together, we are simplifying the process of deployment and testing. This enables us to switch between different versions efficiently, aiding in the identification and comparison of the bug&amp;rsquo;s behavior. For each scalability system, we have carefully built a runtime environment that is consistent and reproducible. This approach ensures that each time we run tests or investigations, the conditions remain identical.&lt;/p>
&lt;h2 id="new-terms">New Terms&lt;/h2>
&lt;p>In order to make sense of the various bug reports, we had to learn some terminologies associated with scalability systems:&lt;/p>
&lt;p>&lt;strong>Clusters&lt;/strong>: Clusters are groups of related or connected items, often found in various fields such as computer science, data analysis, or even social sciences. For example, in data analysis, clusters might represent groups of data points with similar characteristics, making it easier to understand patterns or trends in the data.&lt;/p>
&lt;p>&lt;strong>Cluster Membership&lt;/strong>: Cluster membership refers to the process of determining which items or entities belong to a particular cluster. This task can be done based on various criteria, such as similarity in attributes, spatial proximity, or shared characteristics.&lt;/p>
&lt;p>&lt;strong>Locks&lt;/strong>: In computer programming, locks are mechanisms used to manage access to shared resources, such as files, data structures, or hardware devices. When multiple processes or threads need to access a shared resource simultaneously, locks ensure that only one process or thread can access it at a time, preventing data corruption or conflicts.&lt;/p>
&lt;p>&lt;strong>Lock Contentions&lt;/strong>: Lock contention occurs when multiple processes or threads attempt to acquire the same lock simultaneously. When this happens, one process or thread must wait until the lock becomes available, leading to potential delays and reduced performance.&lt;/p>
&lt;p>&lt;strong>Critical Paths&lt;/strong>: In project management or process analysis, a critical path is the longest chain of dependent tasks that determines the overall duration of the project or process. Any delay in tasks along the critical path will directly impact the project&amp;rsquo;s completion time.&lt;/p>
&lt;p>&lt;strong>Tokens&lt;/strong>: Tokens can have various meanings depending on the context. In computer programming, tokens are the smallest units of source code recognized by a compiler or interpreter. In cryptography, tokens can represent digital certificates or authentication data used for secure communication.&lt;/p>
&lt;p>&lt;strong>Nodes&lt;/strong>: In the context of network theory or graph theory, nodes are individual points or entities that form a network or graph. In a computer network, nodes can be devices like computers or routers, and in a social network, nodes can represent individuals or entities.&lt;/p>
&lt;p>&lt;strong>Peers&lt;/strong>: Peers are entities within a network that have the same status or capabilities. In peer-to-peer networks, each node can act as both a client and a server, enabling direct communication between nodes without relying on a central server.&lt;/p>
&lt;p>&lt;strong>Gossipers, Gossip Protocol&lt;/strong>: In distributed systems, gossipers are nodes that share information with each other using the gossip protocol. The gossip protocol involves randomly selecting peers and exchanging information in a decentralized manner, allowing information to spread quickly across the network.&lt;/p>
&lt;p>&lt;strong>Threads&lt;/strong>: Threads are the smallest units of execution within a process in computer programming. Multiple threads can run concurrently within a single process, enabling multitasking and parallel processing. Threads can share the same resources within the process, making them more lightweight than separate processes. However, proper synchronization is essential to prevent data corruption or conflicts when multiple threads access shared resources.&lt;/p>
&lt;p>&lt;strong>Flush and Writes Contention&lt;/strong>: This refers to a situation where simultaneous operations involving data flushing (saving data to a storage medium) and data writing (updating or adding data) are causing conflicts or delays. This contention can arise when multiple processes or threads attempt to perform these operations concurrently, leading to performance bottlenecks or potential data integrity issues.&lt;/p>
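&lt;p>The terms &lt;strong>Locks&lt;/strong>, &lt;strong>Lock Contentions&lt;/strong>, and &lt;strong>Threads&lt;/strong> can be made concrete with a small sketch. The example below is a generic illustration written for this post, not code from any of the systems we studied: four threads update a shared counter, and the lock forces them to take turns.&lt;/p>

```python
import threading

state = {"count": 0}           # shared resource
state_lock = threading.Lock()  # protects state["count"]


def add_many(times):
    """Increment the shared counter; the lock serializes each update."""
    for _ in range(times):
        with state_lock:  # other threads wait here: this is lock contention
            state["count"] += 1


threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

&lt;p>The more work a thread does while holding the lock, the longer every other thread waits. The scalability bugs below follow that pattern, with the work inside the lock growing with the number of peers, tokens, or threads.&lt;/p>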
&lt;h2 id="accomplishments">Accomplishments&lt;/h2>
&lt;p>We have been able to build docker containers for the following scalability bugs:&lt;/p>
&lt;p>&lt;strong>IGNITE 12087&lt;/strong>&lt;/p>
&lt;p>This bug stems from the resolution of the IGNITE-5227 issue (another bug), which led to a significant decline in the performance of a particular operation. Before IGNITE-5227 was addressed, inserting 30,000 entries completed in roughly 1 second. After the fix, the same insertion of 30,000 entries took approximately 130 seconds, a slowdown of roughly 130 times.&lt;/p>
&lt;p>&lt;strong>CASSANDRA 14660&lt;/strong>&lt;/p>
&lt;p>This bug is related to how clusters work together and how a lock is causing conflicts with the critical path. The issue arises from a method call that uses O(Peers * Tokens) resources while contending for a lock, which is causing problems in the write path. The lock is used to protect cached tokens that are essential for determining the correct replicas. The lock is implemented as a synchronized block in the TokenMetadata class.&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>It was fixed by reducing the complexity of the operation to O(Peers) taking advantage of some properties of the token list and the data structure.&lt;/p>
&lt;p>&lt;strong>CASSANDRA 12281&lt;/strong>&lt;/p>
&lt;p>This bug is also related to how clusters work together and to a lock conflict. The issue arises when a specific method performs a large amount of work (O(Tokens^2)) while contending for a read lock. As reported, a cluster with around 300 nodes has around 300 * 256 tokens (assuming the default number of tokens), and joining a new member reportedly takes more than 30 minutes. Because of the long execution time, the lock delays every gossip message, so the node never becomes active.&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>The granularity of the lock is decreased, meaning that the expensive function calls now do not take the problematic read lock and simply use a synchronized block, synchronizing on a specific field, that does the job much better.&lt;/p>
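&lt;p>To get a feel for the scale of CASSANDRA 12281, the arithmetic from the report can be worked through directly. The sketch below only multiplies the reported numbers; it says nothing about the actual constant factors in Cassandra.&lt;/p>

```python
NODES = 300            # cluster size from the bug report
TOKENS_PER_NODE = 256  # default number of tokens per node, as assumed above

total_tokens = NODES * TOKENS_PER_NODE
quadratic_steps = total_tokens ** 2  # order of work for an O(Tokens^2) method

print(f"tokens: {total_tokens:,}")
print(f"O(Tokens^2) steps: {quadratic_steps:,}")
```

&lt;p>Because the cost is quadratic, doubling the cluster size roughly quadruples the work done while the lock is held.&lt;/p>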
&lt;p>&lt;strong>HA16850&lt;/strong>&lt;/p>
&lt;p>This bug is related to obtaining thread information in the JvmMetrics package. The original buggy version used MXBeans to obtain this information. The call uses an underlying native implementation that holds a lock on threads, preventing thread termination or creation. This means that the more threads we have to obtain information for, the longer the function call holds the lock. As a result, the execution time scales with the number of active threads, O(threads).&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>Developers utilized a ThreadGroup to keep track of obtaining metrics for threads. The result is that there is no lock held for every thread.&lt;/p>
&lt;p>&lt;strong>CA13923&lt;/strong>&lt;/p>
&lt;p>This issue revolves around conflicts between the &amp;ldquo;flush&amp;rdquo; and &amp;ldquo;writes&amp;rdquo; processes. The main problem is that during the &amp;ldquo;flush&amp;rdquo; process, a resource-intensive function called &amp;ldquo;getAddressRanges&amp;rdquo; is invoked. This function has a high computational cost and its complexity is O(Tokens^2). In other words, the time it takes to complete this function grows quickly as the number of &amp;ldquo;tokens&amp;rdquo; increases. This situation is causing challenges and delays in the overall process.&lt;/p>
&lt;p>&lt;em>How was this fixed?&lt;/em>&lt;/p>
&lt;p>This function call affected many paths, so the developers made sure that getAddressRanges is no longer called on critical paths.&lt;/p>
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>&lt;strong>Demanding Memory Requirements&lt;/strong>: Running certain builds consumes a significant amount of memory. This places a strain on system resources and can impact the overall performance and stability of the process.&lt;/p>
&lt;p>&lt;strong>Little Issues Impacting Execution&lt;/strong>: Often, seemingly minor details can obstruct the successful execution of a build. Resolving such issues requires thorough investigation and extensive research into similar problems faced by others in the past.&lt;/p>
&lt;p>&lt;strong>Complexities of Scalability Bugs&lt;/strong>: Identifying the underlying causes of scalability-related bugs is intricate. These bugs exhibit unique characteristics that can complicate the process of pinpointing and comprehending their root origins.&lt;/p>
&lt;h2 id="what-is-docker--for-those-who-dont-know-about-it-">What is Docker? ( For those who don&amp;rsquo;t know about it )&lt;/h2>
&lt;p>Docker is a platform that facilitates the containerization of applications, leading to consistent and efficient deployment across diverse environments. Its benefits include portability, resource efficiency, isolation, and rapid development cycles. DockerHub complements Docker by providing a centralized hub for sharing and accessing container images, fostering collaboration and ease of use within the Docker ecosystem.&lt;/p>
&lt;p>More about docker &lt;a href="https://docs.docker.com/get-started/overview/" target="_blank" rel="noopener">https://docs.docker.com/get-started/overview/&lt;/a>&lt;/p></description></item><item><title>Midterm: Open Source Autonomous Vehicle Controller</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230801-25chilingh/</link><pubDate>Tue, 01 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230801-25chilingh/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/osavc">Open Source Autonomous Vehicle Controller Project&lt;/a> my &lt;a href="https://docs.google.com/document/d/1hDU87aAzbn88vWwOHH0ggIID2W4KKzp8SKF1Lb8LU90/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;strong>Aaron Hunter and Carlos Espinosa&lt;/strong> aimed to create comprehensive technical documentation to help onboard new users of the OSAVC controller.&lt;/p>
&lt;p>I have accomplished the following:&lt;/p>
&lt;ul>
&lt;li>From the KiCad Schematic Editor, created pinouts of the I/O connectors on the OSAVC.&lt;/li>
&lt;li>Detailed a hardware overview of the OSAVC by labeling and describing each electrical component.&lt;/li>
&lt;li>Documented the setup for loading code on the OSAVC, including software such as Git, MPLAB X, XC32 Compiler, and serial terminal and hardware by showing how to connect the PICKit3 and OSAVC to a PC.&lt;/li>
&lt;li>Tested the OSAVC by receiving and transmitting characters in the serial port into a buffer.&lt;/li>
&lt;li>Fixed bugs/errors in the NEO_M8N GPS module library and PWM motors library.&lt;/li>
&lt;li>Created a new library for the uni and bidirectional ESC brushless motors.&lt;/li>
&lt;li>Created a user-interfaced test harness for all peripherals: serial, IMU, GPS, encoder, PWM actuators, radio telemetry, Mavlink heartbeat, radio controller, and LIDAR.&lt;/li>
&lt;li>Incorporated new user interface element and fixed video streaming errors in the Flask app running on the Raspberry Pi 4 communicating with the OSAVC.&lt;/li>
&lt;li>Documented both software and hardware steps to run the OSAVC with a companion computer such as a Raspberry Pi 4.&lt;/li>
&lt;li>Highlighted common problems encountered with the OSAVC.&lt;/li>
&lt;li>Created a contributor&amp;rsquo;s guide for others to create new libraries or contribute to the OSAVC project.&lt;/li>
&lt;li>Designed a &lt;a href="https://grabcad.com/library/ptn78020w-1" target="_blank" rel="noopener">switching voltage regulator&lt;/a> in SOLIDWORKS&lt;/li>
&lt;li>Designed a self balancing bot that employs the OSAVC in SOLIDWORKS&lt;/li>
&lt;/ul>
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;p>Currently, the laser cutter at UCSC is in maintenance, so we couldn&amp;rsquo;t assemble the self balancing bot yet. Once we assemble it, I will finish and document the control algorithms. We can also try incorporating ML models on the Raspberry Pi with the Coral USB accelerator on the self balancing bot.&lt;/p></description></item><item><title>Implemented IO uring for Key-Value Drives</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/</guid><description>&lt;p>Hi everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Manank Patel, (&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank">link&lt;/a> to my Introduction post) and am currently working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore">Efficient Communication with Key/Value Storage Devices&lt;/a>. The goal of the project was to leverage the capabilities of io_uring and implement a new backend driver.&lt;/p>
&lt;p>In the existing sockets backend, we use non-blocking sockets with looping to ensure all the data is written. Here is a simplified flow diagram of
that loop. The reasoning behind using non-blocking sockets and TCP_NODELAY is to get proper network utilization. This snippet from the code explains it further.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">NODELAY means that segments are always sent as soon as possible,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">even if there is only a small amount of data. When not set,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">data is buffered until there is a sufficient amount to send out,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">thereby avoiding the frequent sending of small packets, which
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">results in poor utilization of the network. This option is
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">overridden by TCP_CORK; however, setting this option forces
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">an explicit flush of pending output, even if TCP_CORK is
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">currently set.
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sockets flow" srcset="
/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_fe3f3d8030752b92e5fb87ea1d67e0c2.webp 400w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_44c789c0dc2dbae770c40595d35ae941.webp 760w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_fe3f3d8030752b92e5fb87ea1d67e0c2.webp"
width="469"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>In the above figure, we have a &lt;a href="https://gitlab.com/kinetic-storage/libkinetic/-/blob/manank/src/ktli_socket.c?ref_type=heads#L436" target="_blank" rel="noopener">loop&lt;/a> with a writev call. We check its return value: if not all the data has been written, we adjust the
offsets and loop again; otherwise, we exit the loop and return from the function. This works well with traditional sockets, because the return value is available as soon as the writev call returns. With io_uring, following the same design gives the
following flow diagram.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="uring flow" srcset="
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_eaf262f65651ce613bf0a033f897afde.webp 400w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_bc898fc227145dff9464f87e8f66363f.webp 760w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_eaf262f65651ce613bf0a033f897afde.webp"
width="417"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Here, as you can see, there are many additional steps, and therefore overhead, if we want to check the return value before sending the
next writev, as we need to know how many bytes have been written so far to adjust the offsets and issue
the next request accordingly. Thus, in every iteration of the loop we need to get an SQE, prep it for writev,
submit it, and then wait for the corresponding CQE to get the return value of the writev call.&lt;/p>
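&lt;p>For reference, the write-until-done loop of the sockets backend can be sketched in Python as follows. This is only an illustration, not the libkinetic code (which is written in C): the names &lt;code>advance_buffers&lt;/code> and &lt;code>write_all&lt;/code> are made up for this sketch, and retrying immediately on &lt;code>BlockingIOError&lt;/code> is a simplification, since a real backend would likely wait for the socket to become writable.&lt;/p>

```python
import os


def advance_buffers(buffers, written):
    """Drop the first `written` bytes from a list of bytes-like buffers."""
    remaining = []
    for buf in buffers:
        if written >= len(buf):
            written -= len(buf)  # this buffer was sent completely
        else:
            remaining.append(buf[written:])  # keep only the unsent tail
            written = 0
    return remaining


def write_all(fd, buffers):
    """Call os.writev in a loop until every buffer has been written."""
    pending = [memoryview(b) for b in buffers if len(b) > 0]
    total = 0
    while pending:
        try:
            n = os.writev(fd, pending)
        except BlockingIOError:
            continue  # simplification: a real backend would poll before retrying
        total += n
        pending = advance_buffers(pending, n)  # adjust offsets for the next pass
    return total
```

&lt;p>The offset bookkeeping in &lt;code>advance_buffers&lt;/code> needs the byte count returned by each writev call, which is why an io_uring version of this loop has to wait for a CQE on every iteration.&lt;/p>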
&lt;p>An alternative approach would be to write the full message/iovec atomically in one call, as shown in the following diagram.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="possible uring flow" srcset="
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_df20a0788e55e56bf7af70d91c7275c6.webp 400w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_056949985d6ef71540ba0c4992f11376.webp 760w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_df20a0788e55e56bf7af70d91c7275c6.webp"
width="535"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>However, on trying this method and running fio tests, we noticed that it worked well with smaller block sizes, like
16k, 32k and 64k, but failed consistently with larger block sizes like 512k or 1m. This was because it was not able to
write all the data to the socket in one go. For small block sizes, this method showed good results compared to the sockets backend.
We tried increasing the send/recv buffers to 1MiB-10MiB, but it still struggled with larger block sizes.&lt;/p>
&lt;p>Going forward, we discussed a few ideas to understand the performance trade-offs. One is to use a static variable and increment it on
every loop iteration, so that we can find out whether the number of iterations is really the contributing factor to our problem. Another idea
is to break the message down into small chunks, say 256k, set up io_uring with submission queue polling, and then link and submit
those requests in a loop, without calling io_uring_submit and waiting for a CQE each time. The plan is to try these ideas, discuss and
come up with new ideas on how we can leverage io_uring for ktli backend.&lt;/p></description></item><item><title>PDC Midterm Evaluation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/pdc/20230802-nijwang/</link><pubDate>Sun, 30 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/pdc/20230802-nijwang/</guid><description>&lt;h2 id="mid-term-evaluation-update">Mid-Term Evaluation Update&lt;/h2>
&lt;p>Hello! I&amp;rsquo;m Nick, a GSoC contributor for the Proactive Data Containers (PDC) Project.
Over the past few weeks I&amp;rsquo;ve worked on verifying the functionality of the Python API for the PDC project and ensuring the smooth onboarding for new users of the data containers.&lt;/p>
&lt;p>I began by documenting the installation of the Ubuntu virtual machine needed to run the PDC repository, since the project wasn&amp;rsquo;t initially supported on Apple silicon hardware. The installation notes that I recorded helped refine the installation process, and the updated instructions can be seen on the GitHub page.&lt;/p>
&lt;p>After installing the project&amp;rsquo;s dependencies on the VM, I began maintaining the existing Python API and making changes that allow the tests to compile and run successfully. The manual setup had a few problems with directory paths that prevented some files from being installed on new devices, which I initially fixed by manually linking the path and removing a few header files. However, this proved to be only a temporary fix, as the underlying cause was a hardcoded path, which I resolved after some digging in the source code.&lt;/p>
&lt;p>Now the PDC and PDCpy installations should go smoothly regardless of what OS is being used, and the instruction documentation can be found from the github page which should allow any user to access the data containers.&lt;/p></description></item><item><title>Building extensions between Python libraries for Biotechnology laboratories</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/</link><pubDate>Fri, 28 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/</guid><description>&lt;p>Hello again! This is Luiza, a GSoC contributor for the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop">LabOp&lt;/a> Project.
My task is to build bridges between programming languages for Biotechnology Laboratory automation.&lt;/p>
&lt;p>In the life sciences, reproducibility is an issue at most research centers. Biotechnology-focused laboratories usually have their own protocols, developed in house for their own applications. Researchers rely on such protocols to perform their experiments and collect data, but when it comes to sharing those protocols and performing them in different laboratories, many difficulties arise. Whether because of missing equipment or reagents, or even a different order of execution, replicating a protocol in another laboratory is a challenge. To address this issue, LabOp was developed to represent a protocol and convert it into multiple formats, so it can be executed by humans and by machines.&lt;/p>
&lt;p>PylabRobot and PyHamilton also come into the picture: these libraries make it possible to write protocols for Hamilton robots (and, in the case of PylabRobot, Tecan machines as well). However, both share the limitation of only being able to represent laboratory protocols at a low level, with the user having to write every single command in Python for the protocol to be executed. Thus I’m currently developing an extension that converts LabOp protocols into PylabRobot/PyHamilton scripts. This way the researcher can write protocols for robot execution in human-friendly terms.&lt;/p>
&lt;figure id="figure-behaviourspecialization-for-liquid-handling-class">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="BehaviourSpecialization for Liquid Handling class" srcset="
/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_7dea1005b9355831aab4fd48906afaec.webp 400w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_67bd573e81d4a87cd9d10cf5cb216d81.webp 760w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/featured_hu4f9fc1ff392d0f6236dd97921cc62ee1_67178_7dea1005b9355831aab4fd48906afaec.webp"
width="760"
height="436"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
BehaviourSpecialization for Liquid Handling class
&lt;/figcaption>&lt;/figure>
&lt;p>The first step is building a correspondence spreadsheet with a hello world protocol written in both languages (LabOp and PylabRobot). This way we can establish an equivalence between the functions, parameters and default commands of both libraries, as well as their structure. This spreadsheet will serve as guidance for converting the liquid handling steps from their representation in LabOp to their representation in PylabRobot.&lt;/p>
&lt;p>The second step is to create a file that&amp;rsquo;ll execute the conversion. In this file I will define a Labware map, which is basically a dictionary translating LabOp resource names into Labware IDs recognizable by PylabRobot&amp;rsquo;s &amp;ldquo;resource&amp;rdquo; classes, and a BehaviourSpecialization class that converts LabOp actions into operations of PylabRobot&amp;rsquo;s Liquid Handler class, which coordinates the commands sent from the script to the machines (see featured images).&lt;/p>
&lt;figure id="figure-dictionary-for-labop-to-pylabrobot-container-correspondence">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Dictionary for LabOp to Pylabrobot container correspondence" srcset="
/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_76e3dd3c112ca74ef8e3b7459123e154.webp 400w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_8337c1f75572828ec38252d4fdee0f96.webp 760w,
/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230804-luhesketh/featured_2_hu5b9f0fb3cb1cb61c40db218a2048e04a_278481_76e3dd3c112ca74ef8e3b7459123e154.webp"
width="760"
height="465"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Dictionary for LabOp to Pylabrobot container correspondence
&lt;/figcaption>&lt;/figure>
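&lt;p>A minimal sketch of the Labware map idea is shown below. Every name in it is a placeholder chosen for illustration: the keys are not real LabOp container names, the values are not real PylabRobot resource identifiers, and &lt;code>to_labware_id&lt;/code> is a hypothetical helper rather than part of either library.&lt;/p>

```python
# Hypothetical LabOp container names mapped to hypothetical labware IDs.
LABWARE_MAP = {
    "example_96_well_plate": "EXAMPLE_PLATE_96",
    "example_tip_rack": "EXAMPLE_TIPS_300UL",
}


def to_labware_id(labop_name, labware_map=LABWARE_MAP):
    """Translate a LabOp resource name, failing loudly on unknown names."""
    try:
        return labware_map[labop_name]
    except KeyError:
        raise ValueError(f"no labware mapping for {labop_name!r}") from None
```

&lt;p>Raising an error for unmapped names, instead of passing them through, is one possible design choice: it would surface a missing correspondence before any command reaches the robot.&lt;/p>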
&lt;p>Then we move to the protocol that will be tested on the Hamilton machines. This is a plasmid purification protocol that is usually performed by a human at a very low level, one sample at a time. This limitation is not present on Hamilton robots, as they can handle many samples at the same time with only one protocol execution. The robot that will be running this protocol has two modules that are not yet present in PylabRobot’s extensions: a pressure pump module and an on-deck heater shaker. I’ll be implementing these modules in PylabRobot based on their default commands present in PyHamilton and running the protocol on a Hamilton Starlet unit.&lt;/p>
&lt;p>The steps of the protocol have been decoupled to facilitate pilot testing. They are as follows:&lt;/p>
&lt;ul>
&lt;li>Liquid handling - GOOD TO GO&lt;/li>
&lt;li>Pressure pump module - requires adjustments&lt;/li>
&lt;li>Plate grippers (necessary to move the plasmid plate from one module to another) - requires adjustments&lt;/li>
&lt;li>On-deck heater shaker - GOOD TO GO&lt;/li>
&lt;/ul>
&lt;p>The first pilot tests of the protocol will be run with water instead of plasmid to verify that all the steps run smoothly. Once that’s out of the way, we will perform the protocol with dirty plasmids that require purification (which is what the protocol is for). Success will be measured by sequencing the plasmid (if possible), performing a gel electrophoresis and measuring the absorbance of the DNA.&lt;/p>
&lt;p>The goal of these tests is to gather data on the effectiveness of the protocol and its execution on the machine, thus confirming that it is in fact a useful mechanism for DNA purification.&lt;/p></description></item><item><title>PolyPhy Infrastructure Enhancement</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230727-prashantjha/</link><pubDate>Thu, 27 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230727-prashantjha/</guid><description>&lt;p>As part of the PolyPhy project, my proposal was aimed at improving various aspects of the project, including CI/CD workflows, encapsulation, and security. Under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, I have made significant progress in the following areas:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Fixed GitHub CI Workflows and Release to PyPI:&lt;/strong>
During the first phase, I focused on refining the GitHub CI workflows by implementing new flows that facilitate seamless releases to PyPI. This ensures that the project can be easily distributed and installed by users, making it more accessible and user-friendly.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Encapsulation from Jupyter into Module:&lt;/strong>
I successfully encapsulated the code from Jupyter notebooks into a module. This step is crucial as it prepares the codebase to be released as a standalone module, making it easier for developers to use and integrate into their own projects.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>SonarCloud Integration for Better Code Analysis:&lt;/strong>
To ensure the codebase&amp;rsquo;s quality, I set up SonarCloud to perform comprehensive code analysis. This helps in identifying potential issues, bugs, and areas of improvement, leading to a more robust and reliable project.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Migration to Docker from Tox:&lt;/strong>
In order to improve the containerization process, I replaced the existing solution, Tox, with Docker. Docker provides better container management and ensures a consistent development and deployment environment across different platforms.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Research on Community Platforms for Self-Hosting:&lt;/strong>
I conducted extensive research on various community platforms suitable for self-hosting. This will enable the project to establish a thriving community and foster active collaboration among users and contributors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Enhanced Security Measures:&lt;/strong>
I implemented several security improvements to safeguard the project and its users. These include setting up a comprehensive security policy, implementing secret scanning to prevent unintentional exposure of sensitive information, code scanning to identify potential vulnerabilities, private vulnerability reporting to handle security issues responsibly, and Dependabot integration for monitoring and managing dependencies.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Upgraded Taichi to Utilize Class-Based Features:&lt;/strong>
As part of the project&amp;rsquo;s development, I upgraded Taichi so that the code can use its class-based features, improving the codebase&amp;rsquo;s organization and maintainability.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>Moving forward, I plan to continue working diligently to achieve the goals outlined in my proposal. The improvements made during the first half of the GSoC program have laid a strong foundation for the project&amp;rsquo;s growth and success.&lt;/p>
&lt;p>Stay tuned for further updates and exciting developments as the project progresses!&lt;/p></description></item><item><title>Uncovering Actionable Insights using ReadTheDocs Analytics</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/</link><pubDate>Thu, 27 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello again! This is Jack, a GSoC contributor for the OpenROAD Project.
My task is to update and optimise the documentation to encourage user
adoption and engagement.&lt;/p>
&lt;p>For open-source repo maintainers, &lt;a href="https://readthedocs.org/" target="_blank" rel="noopener">readthedocs&lt;/a>
is a godsend. One of its more underrated features is that it provides
search and traffic analytics covering up to &lt;strong>90 days&lt;/strong> for &lt;code>Community&lt;/code> tier
users. This is awesome, because ReadTheDocs is &amp;ldquo;always free for open source
and community projects&amp;rdquo;.&lt;/p>
&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Why are analytics important?&lt;/p>
&lt;p>Analytics are great as a &lt;em>proxy&lt;/em> indicator for documentation engagement.
For instance, high traffic to a page could indicate how popular the tool is,
or it could mean that the tool is unclear and people need
more visits to the page to understand its usage. Either way,
it indicates that the page deserves attention because of the
increased visits.&lt;/p>
&lt;p>In what follows we aim to provide a quick tutorial as well as
list out some of the actionable insights we uncovered in the
OpenROAD/OpenROAD-flow-scripts documentation project.&lt;/p>
&lt;h2 id="preamble">Preamble&lt;/h2>
&lt;p>To download the analytics raw &lt;code>csv&lt;/code> files, refer to this
&lt;a href="https://docs.readthedocs.io/en/stable/analytics.html" target="_blank" rel="noopener">website&lt;/a>.&lt;/p>
&lt;p>You should also have the following packages installed: &lt;code>pandas&lt;/code>, &lt;code>numpy&lt;/code>, &lt;code>matplotlib&lt;/code>, &lt;code>scipy&lt;/code>.&lt;/p>
&lt;h2 id="traffic-analytics">Traffic Analytics&lt;/h2>
&lt;p>Traffic analytics are easy to understand.
They come in the format &lt;code>Date&lt;/code>, &lt;code>Version&lt;/code>, &lt;code>Path&lt;/code>, &lt;code>Views&lt;/code> as follows:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read_csv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;ta_or.csv&amp;#39;&lt;/span>&lt;span class="p">)[::&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reset_index&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">drop&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="p">()[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">head&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-1-loading-traffic-analytics-dataframe">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Load traffic analytics DF" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_0fe44b789026339d8a488b67e455af49.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_c34649440686784f502a8fa245519fe8.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic1_hu3ae39bf6bc653845cdf52f284f9914c8_18120_0fe44b789026339d8a488b67e455af49.webp"
width="420"
height="345"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 1: Loading traffic analytics dataframe
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>The raw data is not all that informative.
Let us aggregate the data to obtain the weekly views.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">copy&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to_datetime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">to_timedelta&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">7&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">unit&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;d&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="s1">&amp;#39;Path&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Grouper&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">freq&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;W&amp;#39;&lt;/span>&lt;span class="p">)])[&lt;/span>&lt;span class="s1">&amp;#39;Views&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">()&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">reset_index&lt;/span>&lt;span class="p">()&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">sort_values&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Path&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s1">&amp;#39;/index.html&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-2-aggregated-weekly-traffic">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Aggregated weekly traffic" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_2356d16291dbea694b0bc9c05693ffe8.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_cf13de62f49742cd0e76c661feea93ed.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic2_hu4e5090e8319a278be3c23daaec31a810_14831_2356d16291dbea694b0bc9c05693ffe8.webp"
width="243"
height="393"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 2: Aggregated weekly traffic
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Note that we can replace the page path with any other page path
of interest. A useful command to list all page paths in this
dataset is:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">weeklydf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Path&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unique&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-3-unique-paths-in-dataset">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Unique paths" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_94dd6b47fa834b3c36ea619deffd3a6a.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_f50b03560ab266073e2dee2fa7a04e51.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic3_hub46945e77dd8a933670e33e4fea7dea8_54129_94dd6b47fa834b3c36ea619deffd3a6a.webp"
width="591"
height="538"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3: Unique paths in dataset
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>With these neat data in our arsenal, let us do some plotting!
For the visualisation, we have chosen to use the traffic aggregated
on a daily scale. On top of this, we also plot a linear
best-fit line of all the points to track the trendline over time.&lt;/p>
&lt;p>The code below shows how to plot the top 20 pages.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">plot_views&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">numPages&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">20&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Groupby Path, sum views&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pathResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Path&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sort_values&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ascending&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">fig&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">ax&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">subplots&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numPages&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">figsize&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">fig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tight_layout&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numPages&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">key&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pathResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">temp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Path&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">key&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_xticks&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">arange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">90&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="c1"># this line is to not clutter the x-axis too much.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Views&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set_title&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># linear regression&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">temp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bestfit&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">stats&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linregress&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">equation&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s2">&amp;#34;x + &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">plot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">poly1d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">polyfit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">))),&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">equation&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ax&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;upper right&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-4-top-20-pages-by-daily-view-counts-in-descending-order">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Top 20 plots" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_208fbbf3fe9f3d6b7b48a8f44d65e70b.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_523ed86a22800eb3addad7738facd6cc.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic4_hu0544053b26ff363bea669ad03cb25a33_298754_208fbbf3fe9f3d6b7b48a8f44d65e70b.webp"
width="379"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 4: Top 20 pages by daily view counts (in descending order)
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>Also, we can aggregate the total views by day to plot daily traffic:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">plot_daily_traffic&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Groupby Date, sum views&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">fig&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">figure&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">figsize&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dateResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Views&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sum&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">values&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xticks&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">arange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">90&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Views&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">title&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Traffic by Day&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># linear regression&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bestfit&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">stats&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linregress&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">equation&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s2">&amp;#34;x + &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">plot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">poly1d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">polyfit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">))),&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">equation&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;upper right&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-5-daily-aggregated-traffic">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Daily aggregated traffic" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_f1cfe4f85a6f52b10851153e3759601f.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_be83d71fe2635b895829f733ef678a4f.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic5_hu284f436d507b391ad27b39b31846aa7d_24195_f1cfe4f85a6f52b10851153e3759601f.webp"
width="760"
height="503"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 5: Daily aggregated traffic
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h3 id="key-trends">Key Trends:&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>There appears to be a weekly cycle: average view counts rise
from Monday to Friday, then fall off on weekends.
This is most evident for the pages &lt;code>/index.html&lt;/code> and &lt;code>/main/README.html&lt;/code>,
and could be attributed to the standard Monday-to-Friday work or study week.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>According to the gradient of the best-fit line for Figure 2,
there seems to be a slow decline in traffic for the OpenROAD docs.
A gradient of -0.77 translates to a decline of roughly 22 daily views
over a month. The small decline could be attributed to the higher traffic
from 19-29 March 2023, the dates of the
&lt;a href="https://openroaddesigncontest.org/" target="_blank" rel="noopener">OpenROAD 7nm design contest&lt;/a>.
Contests are always good for driving traffic.&lt;/p>
&lt;/li>
&lt;/ul>
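&lt;p>To make the gradient figure concrete, the sketch below fits a least-squares line to a short, made-up series of daily view counts and converts a slope into a change over a month. The helper names &lt;code>fit_line&lt;/code> and &lt;code>change_over&lt;/code> and the sample numbers are illustrative assumptions, not part of the original analysis; only the gradient of -0.77 comes from the text above.&lt;/p>

```python
from statistics import mean


def fit_line(ys):
    """Least-squares slope and intercept of ys against day index 0, 1, 2, ...

    Hypothetical helper; it mirrors what a simple linear regression returns
    when the x values are range(len(ys)).
    """
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def change_over(slope_per_day, days):
    """Change in daily views predicted by the fitted line after `days` days."""
    return slope_per_day * days


# Made-up daily totals that fall by exactly 2 views per day.
toy_views = [100, 98, 96, 94, 92, 90, 88]
toy_slope, toy_intercept = fit_line(toy_views)

# Gradient quoted in the text, applied to a 28-day and a 30-day month.
decline_28 = change_over(-0.77, 28)
decline_30 = change_over(-0.77, 30)
print(round(toy_slope, 2), round(decline_28, 1), round(decline_30, 1))
```

&lt;p>With a gradient of -0.77, the fitted line drops by about 21.6 daily views over 28 days and about 23.1 over 30 days, which is consistent with the rough figure of 22 quoted above.&lt;/p>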
&lt;h3 id="actionable-insights">Actionable insights:&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Top pages are usually landing pages: &lt;code>index.html&lt;/code>, &lt;code>main/README.html&lt;/code>, &lt;code>main/src/README.html&lt;/code>. We thus prioritised making these pages more readable and concise.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>These are followed by the tutorials page &lt;code>/tutorials/index.html&lt;/code> and &lt;code>/search.html&lt;/code>. The prominence of the tutorials page led us to move the tutorials link higher up the left navigation sidebar. We also added search tips to help readers obtain better search results. More about search in the next section.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Next, as OpenROAD consists of 20 tools, traffic analytics helps us decide the order in which to update them: &lt;code>ifp&lt;/code>, &lt;code>gui&lt;/code>, &lt;code>odb&lt;/code>, &lt;code>ppl&lt;/code>, &lt;code>sta&lt;/code>, &lt;code>grt&lt;/code>, &lt;code>mpl&lt;/code>, &lt;code>gpl&lt;/code>, &lt;code>rsz&lt;/code>, &lt;code>rcx&lt;/code>, &lt;code>pdn&lt;/code>, &lt;code>cts&lt;/code>, &lt;code>psm&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
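&lt;p>One way to derive such an update order is to sum the views per tool and sort in descending order. The sketch below is a minimal illustration, assuming tool pages live under paths of the form &lt;code>/main/src/TOOL/...&lt;/code>; that path layout, the helper &lt;code>rank_tools&lt;/code>, and the sample rows are all hypothetical.&lt;/p>

```python
from collections import Counter


def rank_tools(rows, prefix="/main/src/"):
    """Return tool names ordered by total views, highest first.

    rows: iterable of (path, views) pairs.
    prefix: assumed location of per-tool pages (hypothetical layout).
    """
    totals = Counter()
    for path, views in rows:
        if not path.startswith(prefix):
            continue  # not a per-tool page, e.g. /index.html
        parts = path[len(prefix):].split("/")
        if len(parts) == 1:
            continue  # a page directly under the prefix, e.g. /main/src/README.html
        totals[parts[0]] += views
    return [tool for tool, _ in totals.most_common()]


# Made-up (path, views) rows for illustration only.
sample_rows = [
    ("/index.html", 120),
    ("/main/src/README.html", 40),
    ("/main/src/ifp/README.html", 30),
    ("/main/src/gui/README.html", 25),
    ("/main/src/ifp/README.html", 12),
    ("/main/src/odb/README.html", 20),
]
update_order = rank_tools(sample_rows)
print(update_order)
```

&lt;p>Landing pages are skipped on purpose so that only per-tool traffic influences the ranking.&lt;/p>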
&lt;h2 id="search-analytics">Search Analytics&lt;/h2>
&lt;p>Search analytics come in the form of: &lt;code>Date&lt;/code>, &lt;code>Query&lt;/code>, &lt;code>TotalResults&lt;/code>.
Unlike the view counts in traffic analytics, &lt;code>TotalResults&lt;/code> does not refer to the number of searches
for the query that day; rather, it is the total number of results
returned by that query on that day. A separate aggregation is still needed to
obtain the final search count.&lt;/p>
&lt;p>Firstly, let us load the dataset and perform a groupby on the column &lt;code>Date&lt;/code>
to obtain the daily count aggregates.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pd&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read_csv&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;sa_or.csv&amp;#39;&lt;/span>&lt;span class="p">)[::&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reset_index&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">drop&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">rename&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">columns&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s1">&amp;#39;Created Date&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;Total Results&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;TotalResults&amp;#39;&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="p">()[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dateResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TotalResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">count&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dateResults&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-6-code-output-for-daily-aggregated-search-counts">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Daily count code" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_303764681c719b59422e8ac4adff87d5.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_ae0b89dd9a05f1d083e0a5caf434a1c6.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic6_hufe284b2003be927a09036a17b0f147ed_7438_303764681c719b59422e8ac4adff87d5.webp"
width="390"
height="231"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 6: Code output for daily aggregated search counts.
&lt;/figcaption>&lt;/figure>
&lt;/p>
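&lt;p>To see why the rows are counted rather than &lt;code>TotalResults&lt;/code> being summed, consider the toy log below. It is a minimal sketch using only the standard library; the rows are made up, and the tuple layout simply mirrors the &lt;code>Date&lt;/code>, &lt;code>Query&lt;/code>, &lt;code>TotalResults&lt;/code> columns.&lt;/p>

```python
from collections import Counter, defaultdict

# Made-up search log: (Date, Query, TotalResults). Each row is one search.
search_log = [
    ("2023-06-01", "global_place", 14),
    ("2023-06-01", "autotuner", 0),
    ("2023-06-01", "global_place", 14),
    ("2023-06-02", "report_power", 0),
]

# Number of searches per day: count the rows for each date.
searches_per_day = Counter(date for date, _query, _total in search_log)

# Summing TotalResults answers a different question (results returned),
# so it should not be read as a search count.
results_per_day = defaultdict(int)
for date, _query, total in search_log:
    results_per_day[date] += total

print(dict(searches_per_day))
print(dict(results_per_day))
```

&lt;p>In this toy log, the first day has three searches but 28 returned results, and the second day has one search that returned nothing, so the two aggregates are clearly not interchangeable.&lt;/p>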
&lt;p>Now we are ready to plot the daily aggregated searches. This represents
the number of times a search was performed on the documentation website.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">plot_daily_searches&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dateResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Date&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TotalResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">count&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">dateResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">values&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xticks&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">arange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">90&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">7&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;# Times Searched&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">title&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Search count by day&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># linear regression&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bestfit&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">stats&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linregress&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">equation&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s2">&amp;#34;x + &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">round&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bestfit&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">plot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">poly1d&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">polyfit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">)),&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">))(&lt;/span>&lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">))),&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">equation&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">loc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;upper right&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure id="figure-figure-7-daily-aggregated-search-counts">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Final search analytics graph" srcset="
/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_dfcee10fa9be516c148eb11ac3598591.webp 400w,
/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_2bfda1034e5a343c34c529e62f8279ba.webp 760w,
/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230727-luarss/pic7_huaf6e8114aa38a4f77afbcf6239df4596_24960_dfcee10fa9be516c148eb11ac3598591.webp"
width="760"
height="507"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 7: Daily aggregated search counts
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>We can also run an additional analysis on queries that return zero results.
In other words, we are interested in the terms people are curious about
but that are not currently covered by our documentation.
Think of it as on-site search engine optimisation.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">zeroResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">df&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">df&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TotalResults&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">zeroResults&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">zeroResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">groupby&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Query&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Date&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">count&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sort_values&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ascending&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">False&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s1">All 0 results queries (desc)&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">zeroResults&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tolist&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Example output as follows:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">[&amp;#39;autotuner&amp;#39;, &amp;#39;tdms&amp;#39;, &amp;#39;*macro*&amp;#39;, &amp;#39;rtlmp_max_inst&amp;#39;, &amp;#39;get_property&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;check_setup&amp;#39;, &amp;#39;centos&amp;#39;, &amp;#39;initialize_padring&amp;#39;, &amp;#39;core_utilization&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;pin_access&amp;#39;, &amp;#39;read_libraries&amp;#39;, &amp;#39;config&amp;#39;, &amp;#39;eco&amp;#39;, &amp;#39;rpt&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;improve_placement&amp;#39;, &amp;#39;define_process_corner&amp;#39;, &amp;#39;global_place&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;report_worst_slack&amp;#39;, &amp;#39;max_phi_cof&amp;#39;, &amp;#39;report_power&amp;#39;, &amp;#39;get_pins&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;registerfile&amp;#39;, &amp;#39;set_global_routing&amp;#39;, &amp;#39;prebuilt&amp;#39;, &amp;#39;env&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;repair_clock_inverters&amp;#39;, &amp;#39;set_thread_count&amp;#39;, &amp;#39;report_&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;partition_design&amp;#39;, &amp;#39;place_cell&amp;#39;, &amp;#39;blockage&amp;#39;, &amp;#39;partitionmgr&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;nmos&amp;#39;, &amp;#39;tuner&amp;#39;, &amp;#39;write_sdf&amp;#39;, &amp;#39;place_density&amp;#39;, &amp;#39;place_pins_args&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;size_cell&amp;#39;, &amp;#39;*macor*&amp;#39;, &amp;#39;repair_clock_inverter&amp;#39;, &amp;#39;misk&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;readhaty&amp;#39;, &amp;#39;readhat&amp;#39;, &amp;#39;obstruct&amp;#39;, &amp;#39;odbpy&amp;#39;, &amp;#39;openpdn&amp;#39;, &amp;#39;openram&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;placement_cfg&amp;#39;, &amp;#39;read_macro_placement&amp;#39;, &amp;#39;output_drc&amp;#39;, &amp;#39;positon&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;pct&amp;#39;, &amp;#39;qrctechtable&amp;#39;, &amp;#39;qrctechfile&amp;#39;, &amp;#39;qrctech&amp;#39;, &amp;#39;qrc&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;properly covered&amp;#39;, &amp;#39;precision innovations&amp;#39;, &amp;#39;repeater&amp;#39;, &amp;#39;&amp;#34;rcx-0487&amp;#34;&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;report_worst&amp;#39;, &amp;#39;report_area&amp;#39;, &amp;#39;report_clock_properties&amp;#39;, &amp;#39;skywater&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;study&amp;#39;, &amp;#39;sv&amp;#39;, &amp;#39;synth&amp;#39;, &amp;#39;synth_hierarchical&amp;#39;, &amp;#39;systemverilog&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;tdm&amp;#39;, &amp;#39;tdms_place&amp;#39;, &amp;#39;triton&amp;#39;, &amp;#39;ungroup&amp;#39;, &amp;#39;verilog_files&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;wrc&amp;#39;, &amp;#39;write_lef&amp;#39;, &amp;#39;write_partition_verilog&amp;#39;, &amp;#39;שואם&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;si2&amp;#39;, &amp;#39;sever&amp;#39;, &amp;#39;setrc&amp;#39;, &amp;#39;rtl_macro&amp;#39;, &amp;#39;report_dcalc&amp;#39;, &amp;#39;report_design&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;report_design_info&amp;#39;, &amp;#39;report_instance&amp;#39;, &amp;#39;report_slews&amp;#39;, &amp;#39;resize&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;rtlmp&amp;#39;, &amp;#39;set_power_activity&amp;#39;, &amp;#39;rtree&amp;#39;, &amp;#39;run_all&amp;#39;, &amp;#39;run_all.tcl&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;sc&amp;#39;, &amp;#39;set_all_input_output_delays&amp;#39;, &amp;#39;set_io_pin_constraints&amp;#39;, &amp;#39;metis&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;lefdef&amp;#39;, &amp;#39;make_result_file&amp;#39;, &amp;#39;macro_placement_cfg&amp;#39;, &amp;#39;clock__details&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;clocks__details&amp;#39;, &amp;#39;combinational&amp;#39;, &amp;#39;config.mk&amp;#39;, &amp;#39;coord&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;core_margin&amp;#39;, &amp;#39;db_process_node&amp;#39;, &amp;#39;dbblocjs&amp;#39;, &amp;#39;dbdatabase&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;dbr&amp;#39;, &amp;#39;dbrt&amp;#39;, &amp;#39;dbrttree&amp;#39;, &amp;#39;debian&amp;#39;, &amp;#39;define_pin_shape&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;densiy&amp;#39;, &amp;#39;desgin&amp;#39;, &amp;#39;diff_file&amp;#39;, &amp;#39;clk_period&amp;#39;, &amp;#39;clk_io_ptc&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;cdl&amp;#39;, &amp;#39;analog&amp;#39;, &amp;#39;./env.sh&amp;#39;, &amp;#39;178&amp;#39;, &amp;#39;6_final&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;6_final.odb&amp;#39;, &amp;#39;_placement&amp;#39;, &amp;#39;abat&amp;#39;, &amp;#39;add_stripe&amp;#39;, &amp;#39;arch&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;ccs&amp;#39;, &amp;#39;binaries&amp;#39;, &amp;#39;bookshelf&amp;#39;, &amp;#39;buff_cell&amp;#39;, &amp;#39;buildwithdocker&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;busbitchars&amp;#39;, &amp;#39;buschar&amp;#39;, &amp;#39;captable&amp;#39;, &amp;#39;directoryobject&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;disallow_one_site_gaps&amp;#39;, &amp;#39;distribute&amp;#39;, &amp;#39;is_port&amp;#39;, &amp;#39;hierarch&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;hop&amp;#39;, &amp;#39;hyper&amp;#39;, &amp;#39;initialie_flooorplan&amp;#39;, &amp;#39;initialize_flooorplan&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;instance_count&amp;#39;, &amp;#39;is_chip&amp;#39;, &amp;#39;lean&amp;#39;, &amp;#39;gui_final&amp;#39;, &amp;#39;lec&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;*def*&amp;#39;, &amp;#39;limitation&amp;#39;, &amp;#39;lyp&amp;#39;, &amp;#39;maco&amp;#39;, &amp;#39;macro_pin&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;macro_place&amp;#39;, &amp;#39;harness&amp;#39;, &amp;#39;gui.py&amp;#39;, &amp;#39;dont&amp;#39;, &amp;#39;fill_cell&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;dreamplace&amp;#39;, &amp;#39;em&amp;#39;, &amp;#39;enable_dpo&amp;#39;, &amp;#39;energy&amp;#39;, &amp;#39;env.sh&amp;#39;, &amp;#39;erc&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;export&amp;#39;, &amp;#39;findmaste&amp;#39;, &amp;#39;grt_layer_adjustments&amp;#39;, &amp;#39;findmaster&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;freepdk45&amp;#39;, &amp;#39;gdt&amp;#39;, &amp;#39;global_&amp;#39;, &amp;#39;global_place_db&amp;#39;,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;#39;global_placementy&amp;#39;, &amp;#39;graph&amp;#39;, &amp;#39;갲&amp;#39;]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In our case, the problems behind these zero-result queries fall roughly
into one of these categories:&lt;/p>
&lt;ul>
&lt;li>Missing documentation: Either the parameter or the functionality is not documented.&lt;/li>
&lt;li>Typo: The user has the right keyword but did not type it correctly. We will therefore provide them with search &lt;a href="https://openroad-flow-scripts.readthedocs.io/en/latest/user/FAQS.html#how-do-i-get-better-search-results" target="_blank" rel="noopener">tips&lt;/a>, such as using the fuzziness operator &lt;code>~N&lt;/code> for better matches.&lt;/li>
&lt;/ul>
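&lt;p>The split between typos and missing documentation can be approximated programmatically. The sketch below is only an illustration and not the method used in this post: it uses Python&amp;rsquo;s standard &lt;code>difflib&lt;/code> module as a stand-in for fuzzy search, and the vocabulary list, function name, and cutoff value are all hypothetical choices.&lt;/p>

```python
import difflib

# Hypothetical vocabulary of documented keywords (an assumption for illustration,
# not an actual export of the documentation index).
DOCUMENTED_TERMS = ["initialize_floorplan", "global_placement", "density", "position"]


def classify_zero_result_query(query, vocabulary=DOCUMENTED_TERMS, cutoff=0.8):
    """Label a zero-result query as a likely typo or as missing documentation.

    Returns ("typo", suggestion) when the query is close to a documented term,
    otherwise ("missing documentation", None).
    """
    # Compare case-insensitively, but return the term as it is documented.
    lowered = {term.lower(): term for term in vocabulary}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    if matches:
        return "typo", lowered[matches[0]]
    return "missing documentation", None
```

&lt;p>With this vocabulary, a query such as &lt;code>densiy&lt;/code> would be labelled a typo with the suggestion &lt;code>density&lt;/code>, while a query with no close documented term, such as &lt;code>qrctechfile&lt;/code>, would be labelled as missing documentation.&lt;/p>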
&lt;h2 id="future-work">Future Work&lt;/h2>
&lt;p>ReadTheDocs could also be linked with
&lt;a href="https://analytics.google.com/analytics/web/provision/#/provision" target="_blank" rel="noopener">Google Analytics&lt;/a>,
but this remains for more advanced users.&lt;/p>
&lt;p>Another rich source of information helpful to open-source maintainers
is GitHub issues. These are the platform where users discuss
their problems directly. Another great way to track documentation engagement
is to use metrics such as installation issues per week,
or the user-issue retention rate, which tracks the number of users
that continue to file issues after their first.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>This post showcases the amount of insight one can gather from parsing
traffic and search analytics. It also provides useful Python functions
that can be applied to the analytics dataset for fast prototyping
and experimentation. If you are a contributor to open-source projects,
try uncovering some insights for your doc pages today!&lt;/p></description></item><item><title>Highlighting and Formatting Pyrope HDL</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/livehd/20230526-rbaxt/</link><pubDate>Thu, 22 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/livehd/20230526-rbaxt/</guid><description>&lt;p>As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd">Micro Architecture Santa Cruz (MASC)&lt;/a> my &lt;a href="https://drive.google.com/file/d/1aJIF-geNoN49zjkFS1W7yur2-rYCxhrt/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of Jose Renau aims to develop syntax highlighting and a vertical alignment tool for Pyrope. Pyrope is a modern hardware description language under development by MASC. Code is parsed with the &lt;a href="https://github.com/masc-ucsc/tree-sitter-pyrope/tree/main" target="_blank" rel="noopener">tree-sitter grammar for Pyrope&lt;/a>. I am working on developing a query file for the nvim-treesitter plugin. This gives neovim users Pyrope syntax highlighting based on the parse tree. In addition to syntax highlighting, I am working on a vertical alignment tool to improve code readability. 
These features will improve the usability and convenience of Pyrope.&lt;/p></description></item><item><title>Proactive Data Containers</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/pdc/20230620-nijwang/</link><pubDate>Tue, 20 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/pdc/20230620-nijwang/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/pdc">Proactive Data Containers (PDC)&lt;/a> project, my &lt;a href="https://docs.google.com/document/d/1Pnt-iq9pWD70d_jmSsoJjnbXtIjJGY3IbXFrwyFT4Q4/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/houjun-tang/">Houjun Tang&lt;/a> focuses on PDC, a novel data abstraction for managing science data in an object-oriented manner. PDCs will provide efficient strategies for moving data in deep storage hierarchies and techniques for transforming and reorganizing data based on application requirements. The functionality of the container objects themselves is already well developed, so my goal will be to verify the functionality tests for the Python API to ensure that it can be used with ease, and to create command line tools so that PDC is a complete data object that can be used across platforms and is simple and helpful for users.&lt;/p></description></item><item><title>Interactive Exploration of High-dimensional Datasets with PolyPhy and Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230616-kirandeol/</link><pubDate>Fri, 16 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230616-kirandeol/</guid><description>&lt;p>Hello! 
My name is Kiran and this summer I&amp;rsquo;ll be working with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/polyphy">Polyphy&lt;/a> and &lt;a href="https://normand-1024.github.io/Bio-inspired-Exploration-of-Language-Embedding/" target="_blank" rel="noopener">Polyglot&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>.
The full &lt;a href="https://drive.google.com/file/d/1iwKU938uzUHn0oY2tM0jPADOYoF0kqbh/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> is available online.&lt;/p>
&lt;p>For a brief overview, the Polyglot app allows users to interact with a 3D network of high-dimensional language embeddings, specifically the
&lt;a href="http://vectors.nlpl.eu/repository/" target="_blank" rel="noopener">Gensim Continuous Skipgram result of Wikipedia Dump of February 2017 (296630 words)&lt;/a> dataset. The high-dimensional
embeddings are reduced to 3 dimensions using UMAP. The novel &lt;a href="https://iopscience.iop.org/article/10.3847/2041-8213/ab700c/pdf" target="_blank" rel="noopener">MCPM slime mold metric&lt;/a> is then used
to compute the similarity levels between points (much like how you might compute the Euclidean distance between two points). These similarity levels are used
to filter the network and enable users to find interesting patterns in their data they might not find using quantitative methods alone. For example, the network has
a distinct branch in which only years are nearby! Users might find other clusters, such as ones with sports words or even software engineering words.
Although such exploration may not lead to quantitatively significant conclusions, the ability to explore and test mini hypotheses about the data can lead to
important insights that go on to incite quantitatively significant conclusions.&lt;/p>
&lt;p>In our project, we aim to expand Polyglot such that any user can upload their own data, once they have computed the MCPM metric using PolyPhy. This will have
important applications in building trust in our data and embeddings. This could also help with research on the MCPM metric, which presents a new, more naturalistic
way of computing similarity by relying on the principle of least effort. Overall, there is an exciting summer ahead and if you&amp;rsquo;re interested in keeping up please
feel free to check out the Polyglot app on GitHub!&lt;/p></description></item><item><title>Optimizing FasTensor: Enabling Efficient Tensor Execution on GPUs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/fastensor/20230605-ris0801/</link><pubDate>Mon, 05 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/fastensor/20230605-ris0801/</guid><description>&lt;p>Greetings,&lt;/p>
&lt;p>I am Rishabh Singh, and I am excited to be part of the 2023 Google Summer of Code program. My &lt;a href="https://docs.google.com/document/d/14DRkbF1S0VnPcopd37Io0pgKVQ1bDSN3QMf3Os6JyBA/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a> focuses on optimizing the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/fastensor">FasTensor&lt;/a> tensor computing library for efficient usage on GPUs, specifically targeting tensor contraction while preserving structure-locality. This optimization is crucial for scientific applications and advanced AI model training. Throughout the project, I will develop custom computational operations for GPUs, implement FasTensor on GPUs, assess its performance, and provide comprehensive documentation. By the end, I aim to deliver a working implementation, a performance report, and a detailed execution mechanism guide. Leveraging my background in software engineering and machine learning, I will utilize tools like C++ and OpenMP to ensure efficient memory management and data movement. Stay tuned for regular updates and informative blogs as I progress through the summer.&lt;/p></description></item><item><title>ScaleBugs: Reproducible Scalability Bugs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230601-boluwarinayinmode/</link><pubDate>Thu, 01 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucdavis/scalebugs/20230601-boluwarinayinmode/</guid><description>&lt;p>Hello! 
As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucdavis/scalebugs/">ScaleBugs&lt;/a> project, our proposals (&lt;a href="https://drive.google.com/file/d/17iANa5ei_gguZsGGwR1sfPHOoJysnNsf/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/goodness-ayinmode/">Goodness Ayinmode&lt;/a> and &lt;a href="https://drive.google.com/file/d/199ZsiWHXsLYbSJ896vaf8tjrYs23P5xN/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> from &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zahra-nabila-maharani/">Zahra Nabila Maharani&lt;/a>) under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/cindy-rubio-gonzalez/">Cindy Rubio González&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/hao-nan-zhu/">Hao-Nan Zhu&lt;/a> aim to build a dataset of reproducible scalability bugs by analyzing bug reports from popular distributed systems like Cassandra, HDFS, Ignite, and Kafka. For each bug report, we will analyze whether the reported bug is influenced by the scale of the operation, such as the number of nodes being used or the number of requests. The resulting dataset will consist of bug artifacts containing the buggy and fixed versions of the scalability system, a reproducible runtime environment, and workload shell scripts designed to demonstrate bug symptoms under different scales. 
These resources will help support research and development efforts in addressing scalability issues and optimizing system performance.&lt;/p></description></item><item><title>Intro: Open Source Autonomous Vehicle Controller</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230530-25chilingh/</link><pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230530-25chilingh/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/osavc">Open Source Autonomous Vehicle Controller Project&lt;/a> my &lt;a href="https://docs.google.com/document/d/1hDU87aAzbn88vWwOHH0ggIID2W4KKzp8SKF1Lb8LU90/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;strong>Aaron Hunter and Carlos Espinosa&lt;/strong> aims to create comprehensive technical documentation to help onboard new users of the OSAVC controller. I will be writing tutorials and examples to demonstrate how to start with an OSAVC, programming it with the robotic equivalent of HelloWorld and later moving onto more sophisticated explanations. 
Hence, this will encourage more applications and wider adoption in the field of autonomous vehicles and expand the community of OSAVC users.&lt;/p></description></item><item><title>Enhancing and Validating LiveHD's Power Modeling Flow</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/livehd/20230529-shahzaibk23/</link><pubDate>Mon, 29 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/livehd/20230529-shahzaibk23/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd">Enhancing and Validating LiveHD&amp;rsquo;s Power Modeling Flow&lt;/a> my &lt;a href="https://docs.google.com/document/d/1_GtzWf_gCKkreN1-6VSAI4h2BqwKEUDGkNNB1OM554I/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of Jose Renau and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a> aims to enhance and validate LiveHD&amp;rsquo;s power modeling flow, a critical feature for estimating power consumption in modern hardware designs. The existing flow requires further refinement to ensure its stability, accuracy, compatibility with a wider range of netlists and VCD files, and overall performance. To address these challenges, the project will focus on methodically debugging the current implementation, establishing a comprehensive validation methodology for verifying the accuracy of power estimates, and optimizing the flow to handle larger netlists and VCD files efficiently. Additionally, the project aims to improve existing documentation by providing detailed explanations, examples, and tutorials to facilitate user adoption and understanding. Upon successful completion, the project will deliver a more reliable, accurate, and efficient power modeling flow within LiveHD, contributing to the development of energy-efficient hardware designs. 
This refined flow will not only enhance the capabilities of LiveHD but also encourage wider adoption and utilization by the hardware design community, fostering innovation in the field of energy-efficient devices and systems.&lt;/p></description></item><item><title>High Fidelity UAV Simulation Using Unreal Engine with specular reflections</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230601-damodardatta/</link><pubDate>Mon, 29 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230601-damodardatta/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/osavc">Open Source Autonomous Vehicle Controller&lt;/a> my &lt;a href="https://drive.google.com/file/d/18g-WRZj_7ufIt6YZNn4OG1s7VKi1u5hV/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;strong>Aaron Hunter and Carlos Espinosa&lt;/strong> aims to develop an Unreal Engine based simulator for testing. The simulator will use Unreal Engine for the physics and visualization.&lt;/p>
&lt;p>The existing framework uses the Gazebo simulator with ROS, which limits development to the Python and C++ programming languages. I intend to develop this simulator so that it connects with Python and C++, and additionally to expand support to MATLAB so that the control algorithm design and validation process becomes easier in the future. To smooth future development, I intend to add detailed documentation consisting of weekly reports from the development period, examples, and tutorials. Upon successful completion, the project will deliver a powerful simulator with realistic simulation using Unreal Engine and additional support for other programming languages like MATLAB.&lt;/p>
&lt;p>For more information about the Open Source Autonomous Vehicle Controller and the UC OSPO organization, you can visit the &lt;a href="https://github.com/uccross/open-source-autonomous-vehicle-controller" target="_blank" rel="noopener">OSAVC project repository&lt;/a> and the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/">UC OSPO website.&lt;/a>&lt;/p></description></item><item><title>OpenRAM Layout verses Schematic (LVS) visualization</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/openram/20230529-mahnoor-ismail01/</link><pubDate>Mon, 29 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/openram/20230529-mahnoor-ismail01/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/openram">OpenRAM Layout verses Schematic (LVS) visualization&lt;/a> my &lt;a href="https://docs.google.com/document/d/1QEBOglVgy20s0v1_vfpFHw8CdIYUbex12TOjSlAe1-E/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jesse-cirimelli-low/">Jesse Cirimelli-Low&lt;/a> and &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a> aims to develop a comprehensive Python-based graphical user interface (GUI) with a robust backend system to effectively analyze, visualize, and debug layout versus schematic (LVS) mismatches in the OpenRAM framework. The proposed solution focuses on efficiently processing LVS report files in JSON format, identifying mismatched nets in the layout, and visually representing extra nets in the schematic graph using advanced backend algorithms. By implementing a powerful backend system, the GUI will streamline the debugging process and improve overall productivity, while maintaining high performance and reliability. 
The deliverables for this project include a fully-functional GUI with a performant backend, features for visualizing and navigating through LVS mismatches, comprehensive documentation, and user guides.&lt;/p></description></item><item><title>Efficient Communication with Key/Value Storage Devices</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank/</link><pubDate>Fri, 26 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank/</guid><description>&lt;p>Hi everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Manank Patel, and am currently an undergraduate student at Birla Institute of Technology and Sciences - Pilani, KK Birla Goa Campus. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore">Efficient Communication with Key/Value Storage Devices&lt;/a> my &lt;a href="https://drive.google.com/file/d/1iJIlHuCpnvDeOyr5DphDDimqdl9s4hKH/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aldrin-montana/">Aldrin Montana&lt;/a> and &lt;strong>Philip Kufeldt&lt;/strong> aims to implement an io_uring-based communication backend for a network-based key-value store.&lt;/p>
&lt;p>io_uring offers a new kernel interface that can improve performance by avoiding the overhead of system calls and by providing zero-copy network transmission capabilities. The KV store clients utilize traditional network sockets and POSIX APIs for their communication with the KV store. A notable advancement that has emerged in the past two years is the introduction of a new kernel interface known as io_uring, which can be utilized instead of the POSIX API. This fresh interface employs shared memory queues to facilitate communication between the kernel and user space, enabling data transfer without the need for system calls and promoting zero-copy transfer of data. By circumventing the overhead associated with system calls, this approach has the potential to enhance performance significantly.&lt;/p></description></item><item><title>Update OpenROAD Documentation and Tutorials</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230526-luarss/</link><pubDate>Fri, 26 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/openroad/20230526-luarss/</guid><description>&lt;p>Hi! I am Jack, a Masters student at the National University of Singapore. In GSoC 2023, I will be undertaking the project entitled &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/openroad">Update OpenROAD Documentation and Tutorials&lt;/a> to improve the user experience and documentation of this exciting open-source RTL-to-GDSII framework, jointly mentored by &lt;strong>Indira Iyer Almeida&lt;/strong> and &lt;strong>Vitor Bandeira&lt;/strong>. Check out my proposal &lt;a href="https://drive.google.com/file/d/1_R4zDe2N05LtAsvDKa3w6C98GvIZ8HAI/view?usp=sharing" target="_blank" rel="noopener">here!&lt;/a>&lt;/p>
&lt;p>This project aims to review and update missing documentation and tutorials in OpenROAD-flow-scripts. A key focus will be on increasing ease-of-setup by updating documentation, setup scripts and docker-based commands. Next, we will also update documentation for the following OpenROAD components: Makefile flow variable, distributed detailed routing, Hier-RTLMP, Autotuner. If time permits, cloud enablement will be implemented, alongside notebook-based packaging to further increase ease of adoption.&lt;/p></description></item><item><title>Advancing Reproducible Science through Open Source Laboratory Protocols as Software</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230621-luhesketh/</link><pubDate>Thu, 25 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsd/labop/20230621-luhesketh/</guid><description>&lt;p>Hello everyone!&lt;/p>
&lt;p>My name is Luiza, and I am an eighth-semester BSc Biological Sciences student from São Paulo, Brazil. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop">LabOP&lt;/a> working group, my &lt;a href="https://docs.google.com/document/d/1pJ7UIATZYASXjbLdUosvq08QkhPNTFxZFId9dapNp-o/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dan-bryce/">Dan Bryce&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tim-fallon/">Tim Fallon&lt;/a> aims to build a converter that takes normal laboratory protocols and translates them into machine-executable protocols. This is possible thanks to LabOP&amp;rsquo;s versatility in representing what a laboratory protocol should look like. I&amp;rsquo;ll be testing this specialization on Hamilton machines, which are well suited to scaling up experiments.&lt;/p>
&lt;p>Biotechnology laboratories today face a very common issue: protocols are difficult to share and to adapt for machine execution. Laboratory protocols are critical to biological research and development, yet complicated to communicate and reproduce across projects, investigators, and organizations. While many attempts have been made to address this challenge, there is currently no available protocol representation that is unambiguous enough for precise interpretation and automation, yet simultaneously abstract enough to enable reuse and adaptation.&lt;/p>
&lt;p>With LabOP we can take a protocol and convert it in multiple ways depending on the researcher&amp;rsquo;s needs for automation or human experimentation, allowing flexibility for execution and experimentation. I&amp;rsquo;ll be building a specialization that translates protocols so that they can be executed by Hamilton machines.&lt;/p></description></item><item><title>PolyPhy Infrastructure Enhancement</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230525-prashantjha/</link><pubDate>Thu, 25 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230525-prashantjha/</guid><description>&lt;p>Hey!&lt;/p>
&lt;p>I&amp;rsquo;m Prashant Jha, from Pune, a recent undergraduate student from BITS Pilani. As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/polyphy">Polyphy&lt;/a>, my &lt;a href="https://drive.google.com/file/d/1y2X1_6_HliYowZn-qHd7x_Hz6QC3-KSe/view" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a> aims to develop and improve the current infrastructure.&lt;/p>
&lt;p>Polyphorm / PolyPhy is led by
Oskar Elek. PolyPhy is an organization that focuses on developing a GPU-oriented
agent-based system for reconstructing and visualizing optimal transport networks
defined over sparse data. With its roots in astronomy and inspiration drawn from nature,
PolyPhy has been instrumental in discovering network-like patterns in natural language
data and reconstructing the Cosmic web structure using its early prototype called
Polyphorm. The organization aims to provide a richer 2D / 3D scalar field representation
of the reconstructed network, making it a toolkit for a range of specialists across
different disciplines, including astronomers, neuroscientists, data scientists, and artists.
PolyPhy&amp;rsquo;s ultimate purpose is to create quantitatively comparable structural analytics
and discover connections between different disciplines. To achieve its goals, PolyPhy
requires a robust infrastructure that is engineered using DevOps, Code Refactoring, and
Continuous Integration/Continuous Deployment (CI/CD) practices.
You can see an instructive overview of PolyPhy in our workshop and more details about our research &lt;a href="https://polyphy.io/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p></description></item><item><title>Strengthening Underserved Segments of the Open Source Pipeline</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/sus/20230524-nandinisaagar/</link><pubDate>Thu, 25 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/sus/20230524-nandinisaagar/</guid><description>&lt;p>Namaste everyone🙏🏻!&lt;/p>
&lt;p>I&amp;rsquo;m Nandini Saagar, from Mumbai, an undergraduate student at the Indian Institute of Technology, Banaras Hindu University, IIT (BHU), Varanasi. As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/sus">Strengthening Underserved Segments of the Open Source Pipeline&lt;/a> my &lt;a href="https://docs.google.com/document/d/1snzaUfBvptLcWP7I8IyKYFuBNfVGxNe9mnYkFXhb5ZM/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/emily-lovell/">Emily Lovell&lt;/a> aims to strengthen the underserved segment of the open source pipeline.&lt;/p>
&lt;p>My interest in Open Source was first piqued as a freshman when I was introduced to Open Source as a place where people from all communities and backgrounds come together to create software that can have real-world impact, that too in a completely autonomous and self-governed manner! I am so glad that I could transition from just a person who imagined Open Source to be a fair-eyed dream to being a part of multiple such communities. This journey has been life-defining for me, and that’s why I want to help deliver the message of Open Source to all teenagers!&lt;/p>
&lt;p>This project seeks to invite and support broader, more diverse participation in open source by supporting early contributors, especially those who have been historically minoritized within tech. It aims to create content that anyone with some Open Source experience can use to guide new students through Open Source, GitHub, and the relevant technologies; provide a platform for contributors to share their Open Source experiences and testimonials; conduct an Open Source-themed Hackathon/Scavenger Hunt; and use social media engagement to acquaint young and brilliant minds with the technical and Open Source world at an early age.&lt;/p>
&lt;p>Stay tuned to explore the enormous world of Open Source with me!&lt;/p></description></item><item><title>Open Source Autonomous Vehicle Controller</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230525-aniruddha1261/</link><pubDate>Wed, 24 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/osavc/20230525-aniruddha1261/</guid><description>&lt;p>As part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/osavc">Open Source Autonomous Vehicle Controller Project&lt;/a>, my &lt;a href="https://drive.google.com/file/d/1_w9RfOM6XWruYUDR1d1yo45tQenpTQq5/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;strong>Aaron Hunter and Carlos Espinosa&lt;/strong> aims to develop a tutorial that serves as a comprehensive guide for new users of the OSAVC controller. The tutorial will start from scratch, demonstrating how to initialize and program the controller using the equivalent of a &amp;ldquo;Hello, World!&amp;rdquo; program. Subsequently, it will progress to more advanced applications.&lt;/p>
&lt;p>Throughout the project, I will work closely with my mentors to ensure the accuracy, clarity, and usability of the documentation. Their guidance and expertise will be instrumental in achieving the project&amp;rsquo;s objectives effectively.&lt;/p>
&lt;p>By creating comprehensive technical documentation, this project aims to empower new users to harness the capabilities of the OSAVC controller. It will facilitate their understanding of the controller&amp;rsquo;s functionalities and enable them to leverage its potential in the field of autonomous vehicle applications.&lt;/p>
&lt;p>I am excited to embark on this journey, contribute to the open-source community, and make a valuable impact in the field of autonomous vehicles. Stay tuned for regular updates and progress reports as I work towards achieving the goals set forth in this project.&lt;/p>
&lt;p>For more information about the Open Source Autonomous Vehicle Controller and the UC OSPO organization, you can visit the &lt;a href="https://github.com/uccross/open-source-autonomous-vehicle-controller" target="_blank" rel="noopener">OSAVC project repository&lt;/a> and the &lt;a href="https://ospo.ucsc.edu/" target="_blank" rel="noopener">UC OSPO website.&lt;/a>&lt;/p>
&lt;p>Stay connected and join me in this exciting endeavor!&lt;/p></description></item><item><title>OSRE Catalyst</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/catalyst/</link><pubDate>Thu, 23 Mar 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/catalyst/</guid><description>&lt;p>Contributing to an open source project is a great way to build a technical portfolio, learn industry tools/practices, and have real-world impact – all while embedded in a collaborative community. The UC Santa Cruz Open Source Program Office (OSPO) wants to support more students on this path, especially those who have been minoritized in tech. We are partnering with an HBCU for a pilot summer program offering, with hopes to expand our reach in 2024.&lt;/p>
&lt;p>Through a hybrid (in-person/remote) model, participating students will spend four weeks on the UCSC campus learning about open source, followed by four weeks remotely contributing to an open source project. Participants will be well-supported by our instructional team, as well as their small peer cohort, through community-building and mentorship spanning the full eight weeks.&lt;/p>
&lt;h3 id="pilot-program-mentor--developer">Pilot Program Mentor &amp;amp; Developer&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Education&lt;/code>, &lt;code>Broadening Participation&lt;/code>, &lt;code>Mentorship and Support&lt;/code>, &lt;code>Community&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> communication, organization, GitHub/Markdown, basic web programming (HTML, CSS, JavaScript), open source contribution, version control/git workflow, mentorship, teaching&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Novice to Intermediate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/emily-lovell/">Emily Lovell&lt;/a>, &lt;a href="mailto:davis@soe.ucsc.edu">James Davis&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Given that this is a program pilot, your involvement and feedback will directly help shape its future!&lt;/p>
&lt;p>Possible tasks:&lt;/p>
&lt;ul>
&lt;li>Help cultivate a welcoming and supportive learning community&lt;/li>
&lt;li>Support students in completing hands-on activities related to open source contribution (e.g. evaluating potential projects/communities, using git, setting up a development environment)&lt;/li>
&lt;li>Develop technology-specific tutorials to introduce students to languages/libraries/etc. employed by their project&lt;/li>
&lt;li>Offer mentorship around how to navigate documentation, large codebases, and contributor communities&lt;/li>
&lt;li>Share your own input and perspective on what it&amp;rsquo;s like to be a newcomer to open source!&lt;/li>
&lt;/ul></description></item><item><title>eBPF Monitoring Tools</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lanl/ebpftools/</link><pubDate>Tue, 21 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lanl/ebpftools/</guid><description>&lt;p>&lt;a href="https://ebpf.io" target="_blank" rel="noopener">eBPF&lt;/a> is a technology that allows sandboxed programs to run in a privileged context such as the Linux kernel. eBPF is to operating systems what JavaScript is to web browsers: new functionality can be loaded safely and executed efficiently without restarting or continually upgrading the operating system or browser. eBPF is used to introduce new functionality into a running Linux kernel, including next-generation networking, observability, and security functionality. The following is just one of many possible project ideas.&lt;/p>
&lt;h3 id="implement-darshan-functionality-as-ebpf-tool">Implement Darshan functionality as eBPF tool&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> performance, I/O, workload characterization&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:treddy@lanl.gov">Tyler Reddy&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://www.mcs.anl.gov/research/projects/darshan/" target="_blank" rel="noopener">Darshan&lt;/a> is an HPC I/O characterization tool that collects statistics using a lightweight design that makes it suitable for full-time deployment. Darshan is an interposer library that catches and counts I/O requests (open, write, read, etc.) to a file or file system and keeps the counters in buckets in a data structure that can be queried. For example, it counts how many reads were of small, medium, or large size.&lt;/p>
&lt;p>Because Darshan is an interposer library, users must link their applications with it. Implementing the same functionality in eBPF would make it transparent to users. Darshan already defines the relevant functions, so it could provide the list of functions to implement, and the programmer could build and test them in eBPF on a Linux machine. The result could be a broadly available, generally useful open tool, and just one of perhaps hundreds of examples of eBPF-based tools that could be available in the open community for all to leverage.&lt;/p></description></item><item><title>Proactive Data Containers (PDC)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/pdc/</link><pubDate>Sun, 12 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/pdc/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers&lt;/a> (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning.&lt;/p>
&lt;h3 id="command-line-and-python-interface-to-an-object-centric-data-management-system">Command line and Python interface to an object-centric data management system&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Python&lt;/code>, &lt;code>object-centric data management&lt;/code>, &lt;code>PDC&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux, C, Python&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/houjun-tang/">Houjun Tang&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://github.com/hpc-io/pdc" target="_blank" rel="noopener">Proactive Data Containers (PDC)&lt;/a> is an object-centric data management system for scientific data on high-performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. This project includes developing and updating efficient and user-friendly command line and Python interfaces for PDC.&lt;/p></description></item><item><title>OpenRAM</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/openram/</link><pubDate>Wed, 08 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/openram/</guid><description>&lt;p>&lt;a href="https://github.com/VLSIDA/OpenRAM" target="_blank" rel="noopener">OpenRAM&lt;/a> is an award-winning open-source Python framework to create the layout, netlists, timing and power models, placement and routing models, and other views necessary to use SRAMs in ASIC design. OpenRAM supports integration in both commercial and open-source flows with both predictive and fabricable technologies. Most recently, it has created memories that are included on all of the &lt;a href="https://efabless.com/open_shuttle_program/" target="_blank" rel="noopener">eFabless/Google/Skywater MPW tape-outs&lt;/a>.&lt;/p>
&lt;h3 id="layout-verses-schematic-lvs-visualization">Layout versus Schematic (LVS) visualization&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Python&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, VLSI, JSON&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy/Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jesse-cirimelli-low/">Jesse Cirimelli-Low&lt;/a>, &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mahnoor-ismail/">Mahnoor Ismail&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Create a visualization interface to debug layout versus schematic mismatches in the &lt;a href="https://github.com/RTimothyEdwards/magic" target="_blank" rel="noopener">Magic&lt;/a> layout editor. Results will be parsed from a JSON output of &lt;a href="https://github.com/RTimothyEdwards/netgen" target="_blank" rel="noopener">Netgen&lt;/a>.&lt;/p></description></item><item><title>ScaleBugs: Reproducible Scalability Bugs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucdavis/scalebugs/</link><pubDate>Tue, 07 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucdavis/scalebugs/</guid><description>&lt;p>Scalable systems form an essential foundation of the modern information industry. HPC data centers tend to have hundreds to thousands of nodes in their clusters. The use of “extreme-scale” distributed systems has given rise to a new type of bug: the scalability bug. As the name suggests, scalability bugs manifest depending on the scale of a run, so their symptoms may be observable only in large-scale deployments and not in small or medium ones. For example, &lt;a href="https://issues.apache.org/jira/browse/CASSANDRA-6127" target="_blank" rel="noopener">Cassandra-6127&lt;/a> is a scalability bug detected in the popular distributed database Cassandra. The bug causes unnecessary CPU usage, but the symptom is not observed unless ~1000 nodes are deployed. This demonstrates the main challenge of studying scalability bugs: they are extremely difficult to reproduce without deploying the system at a large scale.&lt;/p>
&lt;p>In this project, our goal is to build a dataset of &lt;strong>reproducible&lt;/strong> scalability bugs. To achieve this, we will go through the existing bug reports for popular distributed systems, which include Cassandra, HDFS, Ignite, and Kafka. For each bug report, we determine if the reported bug depends on the scale of the run, such as the number of nodes utilized. With the collected scale-dependent bugs, we then will craft the workload to reproduce those scalability bugs. Our workloads will be designed to trigger some functionalities of the system under different configurations (e.g., different numbers of nodes), for which we will observe the impact on performance. For example, a successful reproduction should be able to show the performance drop along with an increasing number of nodes.&lt;/p>
&lt;h3 id="building-a-dataset-of-reproducible-scalability-bugs">Building a Dataset of Reproducible Scalability Bugs&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Scalability systems, bug patterns, reproducibility, bug dataset&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux Shell, Docker, Java, Python&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/cindy-rubio-gonzalez/">Cindy Rubio González&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/haryadi-s.-gunawi/">Haryadi S. Gunawi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/hao-nan-zhu/">Hao-Nan Zhu&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/goodness-ayinmode/">Goodness Ayinmode&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zahra-nabila-maharani/">Zahra Nabila Maharani&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The student will build a dataset of reproducible scalability bugs. Each bug artifact in the dataset will contain (1) the buggy and fixed versions of the scalability system, (2) a runtime environment that ensures reproducibility, and (3) a workload shell script that could demonstrate the symptoms of the bug under different scales.&lt;/p>
&lt;h4 id="specific-tasks">Specific Tasks&lt;/h4>
&lt;ul>
&lt;li>Work with the mentors to understand the context of the project.&lt;/li>
&lt;li>Learn the background of scalability systems.&lt;/li>
&lt;li>Inspect the bug reports from Apache JIRA and identify scale-dependent bugs.&lt;/li>
&lt;li>Craft shell scripts to trigger the exact scalability bug described by the bug report.&lt;/li>
&lt;li>Organize the reproducible scalability bugs and write documentation to build the code
and trigger the bug.&lt;/li>
&lt;/ul></description></item><item><title>Strengthening Underserved Segments of the Open Source Pipeline</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/sus/</link><pubDate>Tue, 07 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/sus/</guid><description>&lt;p>Contributing to an open source project offers novices the opportunity to join a community of practitioners, build a technical portfolio, gain experience with industry tools and technologies, and have real-world impact. This project seeks to invite and support broader, more diverse participation in open source by supporting &lt;em>early contributors&lt;/em> – especially those who have been historically minoritized within tech.&lt;/p>
&lt;p>This work builds upon a number of existing projects with similar or overlapping goals. Some examples:&lt;/p>
&lt;ul>
&lt;li>The &lt;a href="http://teachingopensource.org" target="_blank" rel="noopener">Teaching Open Source (TOS) community&lt;/a>, which brings together instructors teaching open source&lt;/li>
&lt;li>The &lt;a href="http://foss2serve.org/index.php/POSSE" target="_blank" rel="noopener">Professors&amp;rsquo; Open Source Software Experience (POSSE) workshops and wiki&lt;/a>, for faculty teaching - or wanting to teach - open source&lt;/li>
&lt;li>Internships such as &lt;a href="https://summerofcode.withgoogle.com" target="_blank" rel="noopener">Google Summer of Code (GSoC)&lt;/a>, &lt;a href="https://www.outreachy.org" target="_blank" rel="noopener">Outreachy&lt;/a>, and the &lt;a href="https://fellowship.mlh.io" target="_blank" rel="noopener">MLH Fellowship&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://campus.openhatch.org" target="_blank" rel="noopener">Open Source Comes to Campus&lt;/a>, offering student workshops on tools and culture &lt;em>[no longer active]&lt;/em>&lt;/li>
&lt;li>&lt;a href="https://codein.withgoogle.com/archive/" target="_blank" rel="noopener">Google Code-in&lt;/a>, inviting pre-university students to make open source contributions &lt;em>[no longer active]&lt;/em>&lt;/li>
&lt;/ul>
&lt;p>This project will investigate gaps in currently available resources/programs and seek to address them, beginning with the exploration of engaging high school students with open source. Depending on early findings, this project could also entail the development of resources for independent learners and/or mentors.&lt;/p>
&lt;h3 id="learning-resource-development--repository-building">Learning Resource Development + Repository-Building&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Education&lt;/code>, &lt;code>Broadening Participation&lt;/code>, &lt;code>Mentorship and Support&lt;/code>, &lt;code>Community Development&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> independent research, communication, organization, GitHub/Markdown, basic web programming (HTML, CSS, JavaScript)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Novice to Intermediate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/emily-lovell/">Emily Lovell&lt;/a>, &lt;a href="mailto:davis@soe.ucsc.edu">James Davis&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/nandini-saagar/">Nandini Saagar&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>As an early contributor to this project, you will help gather information to inform the project direction – and then help bring it to life!&lt;/p>
&lt;p>Possible tasks:&lt;/p>
&lt;ul>
&lt;li>Meet with teachers and/or community members to identify new opportunities to engage with students (e.g. outside-of-school workshops, classroom visits, materials for teachers to use independently)&lt;/li>
&lt;li>Evaluate and test existing learning activities with a high school audience in mind (e.g. consider necessary pre-requisites, time required, ideal activity format)&lt;/li>
&lt;li>Evaluate and organize existing resources for newcomers (e.g. &lt;a href="https://up-for-grabs.net/#/" target="_blank" rel="noopener">Up For Grabs&lt;/a>, &lt;a href="https://hacktoberfest.com" target="_blank" rel="noopener">Hacktoberfest&lt;/a>, internship/fellowship opportunities)&lt;/li>
&lt;li>Help design and pilot new learning activities and/or workshops&lt;/li>
&lt;li>Assist in curating an open source repository of the aforementioned resources&lt;/li>
&lt;li>Conduct outreach to our target communities (e.g. brainstorm a catchy repository name, compose inviting and inclusive emails, design visual project elements)&lt;/li>
&lt;li>Share your own input and perspective on what it&amp;rsquo;s like to be a newcomer to open source!&lt;/li>
&lt;/ul></description></item><item><title>LabOP - an open specification for laboratory protocols, that solves common interchange problems stemming from variations in scale, labware, instruments, and automation.</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop/</link><pubDate>Mon, 06 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/labop/</guid><description>&lt;!---
Instructions for project submission here: https://ospo.ucsc.edu/osredocs/formentors/
All the projects so far:
https://ospo.ucsc.edu/osre/#projects
-->
&lt;h3 id="project-idea-1-software-hardware-and-wetware-building-labop-with-simultaneous-language--protocol-development--test-executions">Project idea 1: Software, hardware, and wetware building LabOP with simultaneous language &amp;amp; protocol development &amp;amp; test executions&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Software standard development, Laboratory automation, Biology&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Semantic Web Technologies (RDF, OWL), interest to think about describing biological &amp;amp; chemical laboratory processes&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong>
&lt;ol>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tim-fallon/">Tim Fallon&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dan-bryce/">Dan Bryce&lt;/a>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h4 id="about-the-laboratory-open-protocol-language-labop">About: The Laboratory Open Protocol Language (LabOP)&lt;/h4>
&lt;p>&lt;strong>See link: &lt;a href="https://bioprotocols.github.io/labop/" target="_blank" rel="noopener">https://bioprotocols.github.io/labop/&lt;/a>&lt;/strong>&lt;/p>
&lt;p>LabOP is an &lt;em>open&lt;/em> specification for laboratory protocols that solves common interchange problems stemming from variations in scale,
labware, instruments, and automation. LabOP was built from the ground up to support protocol interchange. It provides an extensible
library of protocol primitives that capture the control and data flow needed for everything from simple calibration and culturing protocols to
industrial control.&lt;/p>
&lt;h5 id="software-ecosystem">Software Ecosystem&lt;/h5>
&lt;p>LabOP&amp;rsquo;s rich representation underpins an ecosystem of several powerful software tools, including:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.github.com/bioprotocols/labop" target="_blank" rel="noopener">labop&lt;/a>: the Python LabOP library, which supports:
&lt;ul>
&lt;li>&lt;em>Programming&lt;/em> LabOP protocols in Python,&lt;/li>
&lt;li>&lt;em>Serialization&lt;/em> of LabOP protocols conforming to the LabOP RDF specification,&lt;/li>
&lt;li>&lt;em>Execution&lt;/em> in the native LabOP semantics (rooted in the UML activity model),&lt;/li>
&lt;li>&lt;em>Specialization&lt;/em> of protocols to 3rd-party protocol formats (including Autoprotocol, OpenTrons, and human-readable formats), and&lt;/li>
&lt;li>&lt;em>Integration&lt;/em> with instruments (including OpenTrons OT2, Echo, and SiLA-based automation).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="https://www.github.com/bioprotocols/laboped" target="_blank" rel="noopener">laboped&lt;/a>: the web-based LabOP Editor, which supports:
&lt;ul>
&lt;li>&lt;em>Programming&lt;/em> LabOP protocols quickly with low-code visual scripts,&lt;/li>
&lt;li>&lt;em>Storing&lt;/em> protocols on the cloud, and&lt;/li>
&lt;li>&lt;em>Exporting&lt;/em> protocol specializations for use in other execution frameworks.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="about-the-bioprotocols-working-group">About the Bioprotocols Working Group&lt;/h4>
&lt;p>The Bioprotocols Working Group is an open community organization developing a free and open standard for representation of biological
protocols.&lt;/p>
&lt;p>To join the Bioprotocols Working Group:&lt;/p>
&lt;ul>
&lt;li>Join the community mailing list at: &lt;a href="https://groups.google.com/g/bioprotocols" target="_blank" rel="noopener">https://groups.google.com/g/bioprotocols&lt;/a>&lt;/li>
&lt;li>Join the &lt;code>#collab-bioprotocols&lt;/code> channel on the &lt;a href="https://bitsinbio.org/" target="_blank" rel="noopener">Bits in Bio&lt;/a> Slack.&lt;/li>
&lt;/ul>
&lt;h5 id="leadership">Leadership&lt;/h5>
&lt;p>&lt;em>Elected Term: August 24th, 2022 - August 23rd, 2023&lt;/em>&lt;/p>
&lt;p>&lt;strong>Chair:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dan-bryce/">Dan Bryce&lt;/a> (SIFT)&lt;/p>
&lt;p>&lt;strong>Finance Committee:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="mailto:jeremy.cahill@metamerlabs.io">Jeremy Cahill (Metamer Labs)&lt;/a>&lt;/li>
&lt;li>&lt;a href="mailto:mark.doerr@uni-greifswald.de">Mark Doerr (University of Greifswald)&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/tim-fallon/">Tim Fallon&lt;/a> (UCSD)&lt;/li>
&lt;/ul>
&lt;h5 id="governance">Governance&lt;/h5>
&lt;p>&lt;em>Approved by community vote on August 16th, 2022&lt;/em>&lt;/p>
&lt;p>&lt;strong>&lt;a href="https://bioprotocols.github.io/labop/about#Governance" target="_blank" rel="noopener">https://bioprotocols.github.io/labop/about#Governance&lt;/a>&lt;/strong>&lt;/p>
&lt;h5 id="mission">Mission:&lt;/h5>
&lt;p>The Bioprotocols Working Group is an open community organization developing free and open standards for representation of biological
protocols. In support of that goal, the organization also develops tools and practices and works with other organizations to
facilitate dissemination and adoption of these standards.&lt;/p>
&lt;p>As an organization, the Bioprotocols Working Group holds the following values:&lt;/p>
&lt;ul>
&lt;li>The standards developed by the community should be available under permissive free and open licenses.&lt;/li>
&lt;li>Technical decisions of the community should be made following open and inclusive processes.&lt;/li>
&lt;li>The community is strengthened by fostering a culture of diversity and inclusion, in which all constructive participants feel
comfortable making their voices heard.&lt;/li>
&lt;/ul></description></item><item><title>OpenROAD - An Open-Source, Autonomous RTL-GDSII Flow for VLSI Designs (2023)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/openroad/</link><pubDate>Wed, 01 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsd/openroad/</guid><description>&lt;p>The &lt;a href="https://theopenroadproject.org" target="_blank" rel="noopener">OpenROAD&lt;/a> project is a non-profit, DARPA-funded and Google sponsored project committed to creating low-cost and innovative Electronic Design Automation (EDA) tools and flows for IC design. Our mission is to democratize IC design, break down barriers of cost and access and mitigate schedule risk through native and open source innovation and collaboration with ecosystem partners. &lt;a href="https://github.com/The-OpenROAD-Project" target="_blank" rel="noopener">OpenROAD&lt;/a> provides an autonomous, no-human-in-the-loop, 24-hour, RTL-GDSII flow for fast ASIC design exploration, QoR estimation and physical implementation for a range of technologies above 12 nm. We welcome a diverse community of designers, researchers, enthusiasts, software engineers and entrepreneurs to use and contribute to OpenROAD and make a far-reaching impact. OpenROAD has been used in &amp;gt; 600 tapeouts across a range of ASIC applications with a rapidly growing and diverse user community.&lt;/p>
&lt;h3 id="enhance-openroad-gui-flow-manager">Enhance OpenROAD GUI Flow Manager&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>GUI&lt;/code>, &lt;code>Visualization&lt;/code>, &lt;code>User Interfaces&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Qt&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="mailto:ethanmoon@google.com">Ethan Mahintorabi&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop custom features for analysis and visualizations in the &lt;a href="https://openroad.readthedocs.io/en/latest/main/src/gui/README.html" target="_blank" rel="noopener">OpenROAD GUI&lt;/a> to support native and third-party flows. These include &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>, &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a>, and other third-party flows. Create documentation (commands, developer guide notes, and tutorials) to show GUI usage for supported flows.&lt;/p>
&lt;h3 id="profile-and-tune-openroad-flow-for-runtime-improvements">Profile and tune OpenROAD flow for Runtime improvements&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>OpenROAD-flow-scripts&lt;/code>, &lt;code>Flow Manager&lt;/code>, &lt;code>Runtime Optimization&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge about Computational resource optimization, Cloud-based computation, Basic VLSI design and tools knowledge&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="mailto:ethanmoon@google.com">Ethan Mahintorabi&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Test, analyze and develop verifiable and reproducible strategies to improve run times in &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>. These include optimizing computational resources in the cloud and tuning algorithmic and design flow parameters. Create test plans using existing or new designs to show runtime improvements.&lt;/p>
&lt;h3 id="update-openroad-documentation-and-tutorials">Update OpenROAD Documentation and Tutorials&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Documentation&lt;/code>, &lt;code>Tutorials&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design flow, tcl, shell scripts, Documentation, Markdown&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Review, update, and fill in missing documentation and tutorials in &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a> for existing and new features. For reference, see this example tutorial: &lt;a href="https://openroad-flow-scripts.readthedocs.io/en/latest/tutorials/FlowTutorial.html" target="_blank" rel="noopener">https://openroad-flow-scripts.readthedocs.io/en/latest/tutorials/FlowTutorial.html&lt;/a>.&lt;/p>
&lt;h3 id="lef-and-liberty-model-testing">LEF and Liberty Model Testing&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Testing&lt;/code>, &lt;code>LEF&lt;/code>, &lt;code>LIB&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design, lef and lib model abstracts, tcl, shell scripts, Verilog, Layout&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Test the accuracy of generated LIB and LEF models for signoff in &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a> for flat and hierarchical design flows. Build test cases to validate and add to the regression suite.&lt;/p></description></item><item><title>Polyphorm / PolyPhy</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/polyphy/</link><pubDate>Thu, 15 Dec 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/polyphy/</guid><description>&lt;p>&lt;a href="https://github.com/PolyPhyHub/PolyPhy" target="_blank" rel="noopener">PolyPhy&lt;/a> is a GPU oriented agent-based system for reconstructing and visualizing &lt;em>optimal transport networks&lt;/em> defined over sparse data. Rooted in astronomy and inspired by nature, we have used an early prototype called &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">Polyphorm&lt;/a> to reconstruct the &lt;a href="https://youtu.be/5ILwq5OFuwY" target="_blank" rel="noopener">Cosmic web&lt;/a> structure, but also to discover network-like patterns in natural language data. You can see an instructive overview of PolyPhy in our &lt;a href="https://elek.pub/workshop_cross2022.html" target="_blank" rel="noopener">workshop&lt;/a> and more details about our research &lt;a href="https://elek.pub/projects/Rhizome-Cosmology" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Under the hood, PolyPhy uses a richer 3D scalar field representation of the reconstructed network, instead of a typical discrete representation like a graph or a mesh. The ultimate purpose of PolyPhy is to become a toolkit for a range of specialists across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. PolyPhy aspires to be a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.&lt;/p>
&lt;h3 id="polyphy-infrastructure-engineering-and-practices">PolyPhy infrastructure engineering and practices&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>DevOps&lt;/code> &lt;code>Code Refactoring&lt;/code> &lt;code>CI/CD&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> fluency in Python, experience with OOP, experience with building and packaging libraries, understanding of GitHub and its tools ecosystem&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350+ hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:anishagoel14@gmail.com">Anisha Goel&lt;/a>&lt;/li>
&lt;li>&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/prashant-jha/">Prashant Jha&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Your responsibility in this project will be developing new infrastructure for the PolyPhy project as well as maintaining the existing &lt;a href="https://github.com/PolyPhyHub/" target="_blank" rel="noopener">codebases&lt;/a>. This is a multifaceted role that will require coordination with the team and an active approach to understanding the technical needs of the community.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Work with the technical lead to develop effective interfaces for PolyPhy, providing access to its functionality on the level of both Python/Jupyter code and the command line.&lt;/li>
&lt;li>Maintain the existing &lt;a href="https://github.com/PolyPhyHub/PolyPhy" target="_blank" rel="noopener">codebase&lt;/a> and configure it according to the team&amp;rsquo;s needs.&lt;/li>
&lt;li>Develop and extend the current CI/CD functionality and related code metrics.&lt;/li>
&lt;li>Document the best practices related to the above.&lt;/li>
&lt;/ul>
&lt;h3 id="write-polyphys-technical-story-and-content">Write PolyPhy&amp;rsquo;s technical story and content&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Writing&lt;/code> &lt;code>Documentation&lt;/code> &lt;code>Storytelling&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> experience writing structured text, well-read, technical or scientific education, webdev basics (preferably NodeJS)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:ez@nmsu.edu">Ezra Huscher&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Integral to PolyPhy&amp;rsquo;s presentation is a &amp;ldquo;story&amp;rdquo; - a narrative understanding - that the users and the project contributors can relate to. Your responsibility will be to develop the written part of that understanding, as well as major portions of technical documentation that match it.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Write and edit diverse pages of the project &lt;a href="https://www.polyphy.io" target="_blank" rel="noopener">website&lt;/a>.&lt;/li>
&lt;li>Work with mentors to improve the project&amp;rsquo;s written community practices (diversity, communication).&lt;/li>
&lt;li>Write and edit narrative and explanatory parts of PolyPhy&amp;rsquo;s documentation.&lt;/li>
&lt;li>Create tutorials that present core functionality of the toolkit.&lt;/li>
&lt;/ul>
&lt;h3 id="community-engagement-and-management">Community engagement and management&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Community Management&lt;/code> &lt;code>Social Media&lt;/code> &lt;code>Networking&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> documented experience with current social media landscape, social and well spoken, ability to communicate technical concepts&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:ez@nmsu.edu">Ezra Huscher&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Your responsibility will be to build and engage the community around PolyPhy. This includes its standing team and stakeholders, current expert users, potential adopters as well as the general public. The scope (size) of the project depends on the level of commitment during and beyond the Summer and is negotiable upfront.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Manage the team&amp;rsquo;s communication channels (Slack, Zoom, email) and maintain an active presence therein.&lt;/li>
&lt;li>Develop social media presence for PolyPhy on Twitter, LinkedIn and other selected social media platforms.&lt;/li>
&lt;li>Manage and extend the online presence for the project, including its &lt;a href="https://polyphy.io" target="_blank" rel="noopener">website&lt;/a>, mailing list, and other applicable outreach activities.&lt;/li>
&lt;li>Research and engage with new communities that would benefit from PolyPhy, both as its expert users and contributors.&lt;/li>
&lt;/ul></description></item><item><title>Adaptive Load Balancers for Low-latency Multi-hop Networks</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/adaptiveload/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/adaptiveload/</guid><description>&lt;p>This project aims to design efficient, adaptive link-level load balancers for networks that handle different kinds of traffic, in particular networks where flows are heterogeneous in terms of their round trip times. Geo-distributed data centers are one such example. With the large-scale deployments of 5G in the near future, there will be even more applications, including more bulk transfers of videos and photos, and augmented and virtual reality applications that take advantage of 5G’s low latency service. With the development of and discussion around Web3.0 and the Metaverse, the network workloads across data centers are only going to get more varied and challenging. All of these add to the heavy bulk data being sent to the data centers and over the backbone network. This traffic has varying quality of service requirements, such as low latency, high throughput and high definition video streaming. Wide area network (WAN) flows are typically data-heavy tasks that consist of backup data taken for a particular data center. The interaction of data center and WAN traffic creates a very interesting scenario with its own challenges to be addressed. WAN and data center traffic are characterized by differences in link utilization and round trip times. Based on our literature review, there appears to be very little work on load balancers that address the interaction of data center and WAN traffic. This in turn motivates the need for designing load balancers that take into account both WAN and data center traffic in order to achieve high performance in more realistic scenarios.
This work proposes a load balancer that is adaptive to the kind of traffic it encounters by learning from the network conditions and then predicting the optimal route for a given flow.&lt;/p>
&lt;p>Through this research we seek to contribute the following:&lt;/p>
&lt;ul>
&lt;li>Designing a load balancer that is adaptive to data center and WAN traffic and, more generally, can be adapted to varied traffic conditions&lt;/li>
&lt;li>Real time learning of the network setup and predicting optimal paths&lt;/li>
&lt;li>Low latency, high throughput and increased network utilization deliverables&lt;/li>
&lt;/ul>
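&lt;p>To make the notion of adaptivity concrete, here is a minimal sketch of one very simple strategy: keep an exponentially weighted moving average (EWMA) of the round trip time observed on each path and send new flows over the path with the lowest estimate. This is not the learning-based design the project proposes, only a baseline illustration; the class name &lt;code>AdaptiveBalancer&lt;/code>, the &lt;code>alpha&lt;/code> parameter, and the path names are assumptions made for this example.&lt;/p>

```python
class AdaptiveBalancer:
    """Toy path selector: EWMA of observed RTT per path (illustrative only)."""

    def __init__(self, paths, alpha=0.5):
        self.alpha = alpha                      # weight given to the newest sample
        self.rtt_ms = {p: None for p in paths}  # None until a path is measured

    def observe(self, path, sample_ms):
        # Blend the new RTT sample into the running estimate for this path.
        old = self.rtt_ms[path]
        if old is None:
            self.rtt_ms[path] = float(sample_ms)
        else:
            self.rtt_ms[path] = (1 - self.alpha) * old + self.alpha * sample_ms

    def pick(self):
        # Probe unmeasured paths first, then prefer the lowest RTT estimate.
        for path, rtt in self.rtt_ms.items():
            if rtt is None:
                return path
        return min(self.rtt_ms, key=self.rtt_ms.get)


# Hypothetical path names and RTT samples in milliseconds.
lb = AdaptiveBalancer(["dc-link", "wan-link"])
lb.observe("dc-link", 1.0)
lb.observe("wan-link", 40.0)
first = lb.pick()
for _ in range(8):
    lb.observe("dc-link", 100.0)   # the data center link becomes congested
second = lb.pick()
print(first, second)
```

&lt;p>In this toy run the balancer initially prefers the short data center path and moves to the WAN path once repeated high RTT samples push the first estimate above the second. A learning-based balancer would replace the EWMA with a model trained on past traffic and network conditions.&lt;/p>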
&lt;h3 id="adaptive-dynamic-load-balancing-for-data-center-and-wan-traffic">Adaptive, Dynamic Load Balancing for data center and WAN traffic&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>data center networking&lt;/code>, &lt;code>TCP/IP stack&lt;/code>, &lt;code>congestion control&lt;/code>, &lt;code>load balancing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C++, Python, Linux; experience with network simulators would be helpful&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate/Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:katia@soe.ucsc.edu"> Katia Obraczka&lt;/a>,&lt;a href="mailto:akabbani@gmail.com">Abdul Kabbani&lt;/a>, &lt;a href="mailto:lakrishn@ucsc.edu">Lakshmi Krishnaswamy&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Understanding the OMNeT++ network simulator and creating simple networks and data center topologies to understand the simulation environment.&lt;/li>
&lt;li>Implementing existing load balancers on OMNeT++ and exploring the effect of different features of the load balancers with data center traffic and WAN traffic.&lt;/li>
&lt;li>Finding and testing WAN-specific traffic, such as video streaming and large database queries.&lt;/li>
&lt;li>Working with the mentors on developing a learning-based load balancer framework that learns from past sample traffic and network conditions to adapt dynamically to current network conditions.&lt;/li>
&lt;/ul></description></item><item><title>Apache AsterixDB</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucr/asterixdb/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucr/asterixdb/</guid><description>&lt;p>&lt;a href="http://asterixdb.apache.org/" target="_blank" rel="noopener">AsterixDB&lt;/a> is an open source parallel big-data management system. AsterixDB is a well-established Apache project that has been active in research for more than 10 years. It provides a flexible data model that supports modern NoSQL applications with a powerful query processor that can scale to billions of records and terabytes of data. Users can interact with AsterixDB through a powerful and easy-to-use declarative query language, SQL++, which provides a rich set of data types including timestamps, time intervals, text, and geospatial, in addition to traditional numerical and Boolean data types.&lt;/p>
&lt;h3 id="geospatial-data-science-on-asterixdb">Geospatial Data Science on AsterixDB&lt;/h3>
&lt;ul>
&lt;li>&lt;em>Topics&lt;/em>: Data science, SQL++, documentation&lt;/li>
&lt;li>&lt;em>Skills&lt;/em>: SQL, Writing, Spreadsheets&lt;/li>
&lt;li>&lt;em>Difficulty&lt;/em>: Medium&lt;/li>
&lt;li>&lt;em>Size&lt;/em>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;em>Mentors&lt;/em>: &lt;a href="mailto:eldawy@ucr.edu">Ahmed Eldawy&lt;/a>, &lt;a href="mailto:asevi006@ucr.edu">Akil Sevim&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build a data science project using AsterixDB that analyzes geospatial data among other dimensions. Use &lt;a href="https://star.cs.ucr.edu/?Chicago%20Crimes#center=41.8313,-87.6830&amp;amp;zoom=11" target="_blank" rel="noopener">Chicago Crimes&lt;/a> as the main dataset and combine it with other datasets including &lt;a href="https://star.cs.ucr.edu/?osm21/pois#center=41.8313,-87.6830&amp;amp;zoom=11" target="_blank" rel="noopener">points of interest&lt;/a> and &lt;a href="https://star.cs.ucr.edu/?TIGER2018/ZCTA5#center=41.8313,-87.6830&amp;amp;zoom=11" target="_blank" rel="noopener">ZIP Code boundaries&lt;/a>. During this project, we will answer interesting questions about the data and visualize the results, such as:&lt;/p>
&lt;ul>
&lt;li>What is the most common crime type on a specific date or over the weekends?&lt;/li>
&lt;li>Where do most of the arrests happen?&lt;/li>
&lt;li>How do crime rates change over time for different regions?&lt;/li>
&lt;/ul>
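&lt;p>Before translating a question like the first one into SQL++, it can help to prototype the logic on a handful of records. The following plain-Python sketch does that for the weekend case; the sample records and the field names &lt;code>date&lt;/code> and &lt;code>type&lt;/code> are hypothetical and may not match the schema of the actual dataset.&lt;/p>

```python
from collections import Counter
from datetime import date

# Hypothetical sample records; the real dataset's field names may differ.
crimes = [
    {"date": date(2022, 1, 1), "type": "THEFT"},      # Saturday
    {"date": date(2022, 1, 2), "type": "BATTERY"},    # Sunday
    {"date": date(2022, 1, 2), "type": "THEFT"},      # Sunday
    {"date": date(2022, 1, 3), "type": "NARCOTICS"},  # Monday
]


def most_common_weekend_type(records):
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    weekend = [r["type"] for r in records if r["date"].weekday() in (5, 6)]
    return Counter(weekend).most_common(1)[0][0] if weekend else None


print(most_common_weekend_type(crimes))
```

&lt;p>The same filter-then-group-then-rank shape carries over to a declarative query, where the database rather than the client does the counting over the full dataset.&lt;/p>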
&lt;h4 id="the-goals-of-this-project-are">The goals of this project are:&lt;/h4>
&lt;ul>
&lt;li>Understand how to build a scalable data science project using AsterixDB.&lt;/li>
&lt;li>Translate common questions to SQL queries and run them on large data.&lt;/li>
&lt;li>Learn how to visualize the results of queries and present them.&lt;/li>
&lt;li>Write detailed documentation about the process of building a data science application in AsterixDB.&lt;/li>
&lt;li>Improve the documentation of AsterixDB while working on the project to improve the experience for future users.&lt;/li>
&lt;/ul>
&lt;h4 id="machine-learning-integration">Machine Learning Integration&lt;/h4>
&lt;p>As a bonus task, and depending on the progress of the project, we can explore the integration of machine learning with AsterixDB through Python UDFs. We will utilize the AsterixDB Python integration through &lt;a href="https://asterixdb.apache.org/docs/0.9.7/udf.html" target="_blank" rel="noopener">user-defined functions&lt;/a> to connect the AsterixDB backend with &lt;a href="https://scikit-learn.org/stable/index.html" target="_blank" rel="noopener">scikit-learn&lt;/a> to build some unsupervised and supervised models for the data. For example, we can cluster the crimes based on their location and other attributes to find interesting patterns or hotspots.&lt;/p></description></item><item><title>CephFS</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/cephfs/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/cephfs/</guid><description>&lt;p>&lt;a href="https://docs.ceph.com/en/latest/cephfs/" target="_blank" rel="noopener">CephFS&lt;/a> is a distributed file system on top of &lt;a href="https://ceph.io" target="_blank" rel="noopener">Ceph&lt;/a>. It is implemented as a distributed metadata service (MDS) that uses dynamic subtree balancing to trade parallelism for locality under continually changing workloads. Clients that mount a CephFS file system connect to the MDS and acquire capabilities as they traverse the file namespace. Capabilities not only convey metadata but can also implement strong consistency semantics by granting and revoking the ability of clients to cache data locally.&lt;/p>
&lt;h3 id="cephfs-namespace-traversal-offloading">CephFS namespace traversal offloading&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Ceph&lt;/code>, &lt;code>filesystems&lt;/code>, &lt;code>metadata&lt;/code>, &lt;code>programmable storage&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Ceph / MDS&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:carlosm@ucsc.edu">Carlos Maltzahn&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The frequency of metadata service (MDS) requests relative to the amount of data accessed can severely affect the performance of distributed file systems like CephFS, especially for workloads that randomly access a large number of small files as is commonly the case for machine learning workloads: they purposefully randomize access for training and evaluation to prevent overfitting. The datasets of these workloads are read-only and therefore do not require strong coherence mechanisms that metadata services provide by default.&lt;/p>
&lt;p>The key idea of this project is to reduce the frequency of MDS requests by offloading namespace traversal, i.e. the need to open a directory, list its entries, open each subdirectory, etc. Each of these operations usually requires a separate MDS request. Offloading namespace traversal refers to a client’s ability to request the metadata (and associated read-only capabilities) of an entire subtree with one request, thereby offloading the traversal work for tree discovery to the MDS.&lt;/p>
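&lt;p>The difference in request counts can be illustrated with a toy in-memory namespace. The sketch below is not Ceph code and does not use any Ceph API: &lt;code>ToyMDS&lt;/code>, &lt;code>list_dir&lt;/code>, and &lt;code>fetch_subtree&lt;/code> are hypothetical names invented for this example, and capabilities are left out entirely.&lt;/p>

```python
from dataclasses import dataclass


@dataclass
class ToyMDS:
    """Hypothetical stand-in for a metadata server; not the Ceph MDS API."""
    tree: dict         # maps a directory path to the names of its entries
    requests: int = 0  # number of requests this server has handled

    def list_dir(self, path):
        # One request per directory, as in a client-driven traversal.
        self.requests += 1
        return list(self.tree.get(path, []))

    def fetch_subtree(self, root):
        # One request for the whole subtree; the server walks it locally.
        self.requests += 1
        found, stack = {}, [root]
        while stack:
            path = stack.pop()
            found[path] = list(self.tree.get(path, []))
            stack.extend(f"{path}/{name}" for name in found[path]
                         if f"{path}/{name}" in self.tree)
        return found


def client_traversal(mds, root):
    """Walk the namespace from the client, one request per directory."""
    found, stack = {}, [root]
    while stack:
        path = stack.pop()
        found[path] = mds.list_dir(path)
        stack.extend(f"{path}/{name}" for name in found[path]
                     if f"{path}/{name}" in mds.tree)
    return found


# A small read-only dataset layout with five directories.
TREE = {
    "/data": ["train", "val"],
    "/data/train": ["cat", "dog"],
    "/data/train/cat": ["1.jpg", "2.jpg"],
    "/data/train/dog": ["3.jpg"],
    "/data/val": ["4.jpg"],
}

per_dir = ToyMDS(TREE)
walked = client_traversal(per_dir, "/data")

offloaded = ToyMDS(TREE)
fetched = offloaded.fetch_subtree("/data")

print(per_dir.requests, offloaded.requests)
```

&lt;p>Both approaches discover the same entries, but the client-driven walk issues one request per directory (five here) while the offloaded version issues a single request. The gap grows with the number of directories, which is why the approach targets datasets made up of many small files.&lt;/p>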
&lt;p>Once the basic functionality is implemented, this project can be expanded to address optimization opportunities, e.g. describing regular tree structures as a closed form expression in the tree’s root, shortcutting tree discovery.&lt;/p></description></item><item><title>DirtViz (2022)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/dirtviz/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/dirtviz/</guid><description>&lt;p>DirtViz is a project to visualize data collected from
sensors deployed in sensor networks. We have deployed a number of
sensors measuring qualities like soil moisture, temperature, current
and voltage in outdoor settings. This project involves extending (or
replacing) our existing plotting scripts to create a fully fledged
dataviz tool tailored to the types of data collected from embedded
systems sensor networks.&lt;/p>
&lt;h3 id="visualize-sensor-data">Visualize Sensor Data&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Visualization&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: javascript, python, bash, webservers, git, embedded systems&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Easy/Moderate&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>Develop a set of visualization tools (ideally web-based) that lets users easily zoom in on date ranges, change axes, etc.&lt;/li>
&lt;li>Document the tool thoroughly for future maintenance&lt;/li>
&lt;li>If you are interested, we would also like to investigate correlations between different data streams&lt;/li>
&lt;/ul></description></item><item><title>Eusocial Storage Devices</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/eusocial/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/eusocial/</guid><description>&lt;p>As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. The &lt;a href="https://cross.ucsc.edu/projects/eusocialpage.html" target="_blank" rel="noopener">Eusocial project&lt;/a> aims to create a new interface to storage devices that can leverage existing and new CPU and main memory resources to take over data management tasks like availability, recovery, and migrations. The project refers to these storage devices as “eusocial” because we are inspired by eusocial insects like ants, termites, and bees, which as individuals are primitive but collectively accomplish amazing things.&lt;/p>
&lt;h3 id="dynamic-function-injection-for-rocksdb">Dynamic function injection for RocksDB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Java&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="mailto:jliu120@ucsc.edu">Jianshen Liu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Recent research reveals that the compaction process in RocksDB can be altered to optimize future data access by changing the data layout in compaction levels. The benefit of this approach can be extended to different data layout optimizations based on application access patterns and requirements. In this project, we want to create an interface that would allow users to dynamically inject layout optimization functions into RocksDB, using containerization technologies such as WebAssembly.&lt;/p>
&lt;ul>
&lt;li>Reference: Saxena, Hemant, et al. &amp;ldquo;Real-Time LSM-Trees for HTAP Workloads.&amp;rdquo; arXiv preprint arXiv:2101.06801 (2021).&lt;/li>
&lt;/ul>
&lt;h3 id="demonstrating-a-composable-storage-system-accelerated-by-memory-semantic-technologies">Demonstrating a composable storage system accelerated by memory semantic technologies&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Bash, Python, System architecture, Network fabrics&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="mailto:jliu120@ucsc.edu">Jianshen Liu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Over the last decade, the slowdown in performance improvements of general-purpose processors has been driving system architectures to become increasingly heterogeneous. Domain-specific accelerator hardware (e.g., FPGA, SmartNIC, TPU, GPU) is taking over a growing number of jobs from general-purpose processors. At the same time, network and storage device performance has improved tremendously, on a trajectory far outpacing that of processors. With this trend, a natural way to keep scaling storage system performance economically is to efficiently utilize and share different resources from different nodes over the network. There already exist different resource sharing protocols like CCIX, CXL, and Gen-Z. Among these, Gen-Z is the most interesting because, unlike RDMA, it enables remote memory access without exposing details to applications (i.e., no application changes). Therefore, it would be interesting to see whether and how these technologies can help improve the performance of storage systems, and to what extent. This project would require building a demo system that uses some of these technologies (especially Gen-Z) and running selected applications/workloads to better understand the benefits.&lt;/p>
&lt;ul>
&lt;li>References: Gen-Z: An Open Memory Fabric for Future Data Processing Needs: &lt;a href="https://www.youtube.com/watch?v=JLb9nojNS8E" target="_blank" rel="noopener">https://www.youtube.com/watch?v=JLb9nojNS8E&lt;/a>, Pekon Gupta, SMART Modular; Gen-Z subsystem for Linux, &lt;a href="https://github.com/linux-genz" target="_blank" rel="noopener">https://github.com/linux-genz&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="when-will-rotational-media-users-abandon-sata-and-converge-to-nvme">When will Rotational Media Users abandon SATA and converge to NVMe?&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Skills:&lt;/strong> Entrepreneurial mind, interest in researching high technology markets&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="mailto:carlosm@ucsc.edu">Carlos Maltzahn&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Goal:&lt;/strong> Determine the benefits in particular market verticals such as genomics and health care to converge the storage stack in data center computer systems to the NVMe device interface, even when devices include rotational media (aka disk drives). The key question: “When do people abandon SATA and SAS and converge to NVMe?”&lt;/p>
&lt;p>&lt;strong>Background:&lt;/strong> NVMe is a widely used device interface for fast storage devices such as flash that behave much more like random access memory than traditional rotational media. Rotational media is accessed mostly via SATA and SAS, which have served the industry well for close to two decades. SATA in particular is much cheaper than NVMe. Now that NVMe is widely available and quickly advancing in functionality, an interesting question is whether there is a market for rotational media devices with NVMe interfaces, converging the storage stack to only one logical device interface, thereby enabling a common ecosystem and more efficient connectivity from multiple processes to storage devices.&lt;/p>
&lt;p>The NVMe 2.0 specification, released in 2021, has been restructured to support the increasingly diverse NVMe device environment (including rotational media). The extensibility of 2.0 encourages enhancements of independent command sets such as Zoned Namespaces (ZNS) and Key Value (NVMe-KV) while supporting transport protocols for NVMe over Fabrics (NVMe-oF). A lot of creative energy is now focused on advancing NVMe while SATA has not changed in 16 years. Having all storage devices connect the same way not only frees up space on motherboards but also enables new ways to manage drives, for example via NVMe-oF, which allows drives to be networked without additional abstraction layers.&lt;/p>
&lt;p>&lt;strong>Suggested Project Structure:&lt;/strong> This is really just a suggestion for a starting point. As research progresses, a better structure might emerge.&lt;/p>
&lt;ol>
&lt;li>Convergence of software stack: seamless integration between rotational media and hot storage&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>Direct tiering: one unified interface to place data among fast and slow devices on the same NVMe fabric depending on whether the data is hot or cold.&lt;/li>
&lt;/ul>
&lt;ol start="2">
&lt;li>Computational storage:&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>What are the architectures of computational NVMe devices? For example, offloading compute to an FPGA vs an onboard processor in a disk drive?&lt;/li>
&lt;li>Do market verticals such as genomics and health care favor one over the other? When do people abandon SATA and converge to NVMe?&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Review current literature&lt;/li>
&lt;li>Survey what the industry is doing&lt;/li>
&lt;li>Join weekly meetings to discuss findings with Ph.D. students, experienced industry veterans, and faculty (Thursdays 2-3pm, can be adjusted if necessary)&lt;/li>
&lt;li>The final product is a slide deck with lots of pictures&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Interesting links:&lt;/strong>&lt;br>
&lt;a href="https://www.opencompute.org/wiki/Storage/NVMeHDD" target="_blank" rel="noopener">https://www.opencompute.org/wiki/Storage/NVMeHDD&lt;/a>&lt;br>
&lt;a href="https://2021ocpglobal.fnvirtual.app/a/event/1714" target="_blank" rel="noopener">https://2021ocpglobal.fnvirtual.app/a/event/1714&lt;/a> (video and slides, requires $0 registration)&lt;br>
&lt;a href="https://www.storagereview.com/news/nvme-hdd-edges-closer-to-reality" target="_blank" rel="noopener">https://www.storagereview.com/news/nvme-hdd-edges-closer-to-reality&lt;/a>&lt;br>
&lt;a href="https://www.tomshardware.com/news/seagate-demonstrates-hdd-with-pcie-nvme-interface" target="_blank" rel="noopener">https://www.tomshardware.com/news/seagate-demonstrates-hdd-with-pcie-nvme-interface&lt;/a>&lt;br>
&lt;a href="https://nvmexpress.org/everything-you-need-to-know-about-the-nvme-2-0-specifications-and-new-technical-proposals/" target="_blank" rel="noopener">https://nvmexpress.org/everything-you-need-to-know-about-the-nvme-2-0-specifications-and-new-technical-proposals/&lt;/a>&lt;br>
&lt;a href="https://www.tomshardware.com/news/nvme-2-0-supports-hard-disk-drives" target="_blank" rel="noopener">https://www.tomshardware.com/news/nvme-2-0-supports-hard-disk-drives&lt;/a>&lt;/p></description></item><item><title>FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/fastensor/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/fastensor/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/fastensor/" target="_blank" rel="noopener">FasTensor&lt;/a> is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor used for scientific computing and is effective for expressing a wide range of computations for data analyses, including common aggregation operations from database management systems and advanced machine learning pipelines. FasTensor execution engine exploits the structural-locality in the multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, parallel execution, and so on.&lt;/p>
&lt;h3 id="continuous-integration">Continuous Integration&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Management&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, GitHub&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:kwu@lbl.gov">John Wu&lt;/a>, &lt;a href="mailto:dbin@lbl.gov">Bin Dong&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>Develop a test suite for the public API of FasTensor&lt;/li>
&lt;li>Automate execution of the test suite&lt;/li>
&lt;li>Document the continuous integration process&lt;/li>
&lt;li>Develop a performance testing suite&lt;/li>
&lt;/ul></description></item><item><title>FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/fastensor/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/fastensor/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/fastensor/" target="_blank" rel="noopener">FasTensor&lt;/a> is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor used in scientific computing and are effective for expressing a wide range of computations for data analysis, including common aggregation operations from database management systems and advanced machine learning pipelines. The FasTensor execution engine exploits the structural locality of multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, and parallel execution.&lt;/p>
&lt;h3 id="tensor-execution-engine-on-gpu">Tensor execution engine on GPU&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Management&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, GitHub&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Tensor-based computing is needed by scientific applications and, increasingly, by advanced AI model training. Most tensor libraries are hand-customized and hand-optimized for GPUs, and most of them serve only one kind of application. For example, TensorFlow is optimized only for AI model training. Optimizing generic tensor computing libraries on GPUs can benefit a wide range of applications. FasTensor, as a generic tensor computing library, currently works efficiently only on CPUs. How to run FasTensor on GPUs is still unexplored. Research and development challenges include, but are not limited to: 1) how to maintain the structural locality of tensor data on a GPU; 2) how to reduce the performance loss when the structural locality of a tensor is broken on a GPU.&lt;/p>
&lt;ul>
&lt;li>Develop a mechanism to move user-defined computing kernels onto the GPU&lt;/li>
&lt;li>Evaluate the performance of the execution engine&lt;/li>
&lt;li>Document the execution mechanism&lt;/li>
&lt;li>Develop a performance testing suite&lt;/li>
&lt;/ul>
&lt;h3 id="continuous-integration">Continuous Integration&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Management&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, GitHub&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (300 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a>, &lt;a href="mailto:dbin@lbl.gov">Bin Dong&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>Develop a test suite for the public API of FasTensor&lt;/li>
&lt;li>Automate execution of the test suite&lt;/li>
&lt;li>Document the continuous integration process&lt;/li>
&lt;/ul></description></item><item><title>HDF5</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/hdf5/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/hdf5/</guid><description>&lt;p>&lt;a href="https://portal.hdfgroup.org/display/knowledge/What&amp;#43;is&amp;#43;HDF5" target="_blank" rel="noopener">HDF5&lt;/a> is a unique technology suite that makes possible the management of extremely large and complex data collections.&lt;/p>
&lt;p>The HDF5 technology suite includes:&lt;/p>
&lt;ul>
&lt;li>A versatile data model that can represent very complex data objects and a wide variety of metadata.&lt;/li>
&lt;li>A completely portable file format with no limit on the number or size of data objects in the collection.&lt;/li>
&lt;li>A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.&lt;/li>
&lt;li>A rich set of integrated performance features that allow for access time and storage space optimizations.&lt;/li>
&lt;li>Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.&lt;/li>
&lt;/ul>
&lt;h3 id="python-interface-to-hdf5-asynchronous-io">Python Interface to HDF5 Asynchronous I/O&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Python&lt;/code>, &lt;code>Async I/O&lt;/code>, &lt;code>HDF5&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, C, HDF5&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>, &lt;a href="mailto:htang4@lbl.gov">Houjun Tang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>HDF5 is a well-known library for storing and accessing (known as &amp;ldquo;Input and Output&amp;rdquo; or I/O) data on high-performance computing systems. Recently, new technologies, such as asynchronous I/O and caching, have been developed to utilize fast memory and storage devices and to hide the I/O latency. Applications can take advantage of an asynchronous interface by scheduling I/O as early as possible and overlapping computation with I/O operations to improve overall performance. The existing HDF5 asynchronous I/O feature supports the C/C++ interface. This project involves the development and performance evaluation of a Python interface that would allow more Python-based scientific codes to use and benefit from the asynchronous I/O.&lt;/p></description></item><item><title>LiveHD (2022)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/livehd/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/livehd/</guid><description>&lt;p>Projects for &lt;a href="https://github.com/masc-ucsc/livehd" target="_blank" rel="noopener">LiveHD&lt;/a>. Lead Mentors: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="mailto:swang203@ucsc.edu">Sheng-Hong Wang&lt;/a>.&lt;/p>
&lt;h3 id="hif-tooling">HIF Tooling&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>HIF tooling&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Tools around Hardware Interchange Format (HIF) files&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/hif" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>HIF (&lt;a href="https://github.com/masc-ucsc/hif" target="_blank" rel="noopener">https://github.com/masc-ucsc/hif&lt;/a>) stands for Hardware Interchange Format.
It is designed to be an efficient binary representation with a simple API that
supports the generic graph and tree representations commonly used by hardware
tools. It is not designed to be a universal format, but rather a storage and
traversal format for hardware tools.&lt;/p>
&lt;p>LiveHD has two HIF interfaces: the tree (LNAST) and the graph (Lgraph). Both can
read and write the HIF format. The idea of this project is to expand the hif repository
with some small but useful tools around HIF. Some projects:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>hif_diff + hif_patch: Create the equivalent of the diff/patch commands that
exist for text, but for HIF files. Since HIF files have a clearer
structure, some patch changes are more constrained or better understood
(IOs and dependences are explicit).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>hif_tree: Print the HIF hierarchy, somewhat like GNU tree does for directories.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>hif_grep: Grep for some tokens and output a HIF file containing only those. Then hif_tree/hif_cat can show the contents.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="mockturtle">Mockturtle&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Mockturtle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Perform synthesis for graph in LiveHD using Mockturtle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17, synthesis&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#mockturtle" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>There are some issues with Mockturtle integration (new cells) and it is not using the latest Mockturtle library versions.
The goal is to use Mockturtle (&lt;a href="https://github.com/lsils/mockturtle" target="_blank" rel="noopener">https://github.com/lsils/mockturtle&lt;/a>) with LiveHD. The main characteristics:&lt;/p>
&lt;ul>
&lt;li>Use mockturtle to tmap to LUTs&lt;/li>
&lt;li>Use mockturtle to synthesize (optimize) logic&lt;/li>
&lt;li>Enable cut-rewrite as an option&lt;/li>
&lt;li>Enable hierarchy cross optimization (hier:true option)&lt;/li>
&lt;li>Use the graph labeling to find clusters to optimize&lt;/li>
&lt;li>Re-timing&lt;/li>
&lt;li>Map only gates and non-wide arithmetic to LUTs. E.g., a 32-bit add is not mapped to LUTs, but a 2-bit add is.&lt;/li>
&lt;li>List of resources to not map:
&lt;ul>
&lt;li>Large ALUs. Large ALUs should have an OpenWare block (hardcoded in FPGAs and advanced adder options in ASIC)&lt;/li>
&lt;li>Multipliers and dividers&lt;/li>
&lt;li>Barrel shifters with non-trivial shifts (1-2 bits) selectable at run-time&lt;/li>
&lt;li>Memories, LUTs&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="query-shell">Query Shell&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Query Shell&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Create a console app that interacts with LiveHD to query parameters about designs&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#query-shell-not-lgshell-to-query-graphs" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>Based on replxx (like lgshell)&lt;/li>
&lt;li>Query bits, ports&amp;hellip; like
&lt;ul>
&lt;li>&lt;a href="https://github.com/rubund/netlist-analyzer" target="_blank" rel="noopener">https://github.com/rubund/netlist-analyzer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html" target="_blank" rel="noopener">https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>It would be cool if selected subsections could be visualized with something like &lt;a href="https://github.com/nturley/netlistsvg" target="_blank" rel="noopener">https://github.com/nturley/netlistsvg&lt;/a>&lt;/li>
&lt;li>The shell may be expanded to support simulation in the future&lt;/li>
&lt;li>Wavedrom/Duh dumps&lt;/li>
&lt;/ul>
&lt;p>WaveDrom and duh allow dumping bitfield information for structures. It would be interesting to explore dumping tables and bit
fields for Lgraph IOs, and structs/fields inside the module. It may be a way to integrate with the documentation generation.&lt;/p>
&lt;p>Examples of queries: show path, show driver/sink of, do topo traversal, &amp;hellip;&lt;/p>
&lt;p>An interesting extension would be to have some simple embedded language (Tcl or ChaiScript or ???) to control queries more
easily and allow building functions/libraries.&lt;/p>
&lt;h3 id="lgraph-and-lnast-check-pass">Lgraph and LNAST check pass&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Lgraph and LNAST check pass&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Create a pass that checks the integrity/correctness of Lgraph and LNAST&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Large 350 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#lgraph-and-lnast-check-pass" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Create a pass that checks that the Lgraph (and/or LNAST) is semantically
correct. The LNAST already has quite a few tests (pass.semantic), but it can be
further expanded. Some checks:&lt;/p>
&lt;ul>
&lt;li>No combinational loops&lt;/li>
&lt;li>No mismatch in bit widths&lt;/li>
&lt;li>No disconnected nodes&lt;/li>
&lt;li>Check for inefficient splits (do not split buses that can be combined)&lt;/li>
&lt;li>Transformation stages should not drop names if the same net is preserved&lt;/li>
&lt;li>No writes in LNAST that are never read&lt;/li>
&lt;li>All edges are valid, e.g., no pin &amp;lsquo;C&amp;rsquo; in Sum_op&lt;/li>
&lt;/ul>
&lt;h3 id="unbitwidth">unbitwidth&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>unbitwidth&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Not all the variables need bitwidth information. Find the small subset&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#unbitwidth-local-and-global-bitwidth" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>This pass is needed to create less verbose CHISEL and Pyrope code generation.&lt;/p>
&lt;p>The Lgraph can have bitwidth information for each dpin. This is needed for
Verilog code generation, but not for Pyrope or CHISEL. CHISEL can
perform local bitwidth inference and Pyrope can perform global bitwidth
inference.&lt;/p>
&lt;p>A new pass should remove redundant bitwidth information. The information is
redundant because pass/bitwidth can regenerate it if enough detail
remains. The goal is to create a pass/unbitwidth that removes either local or
global bitwidth. The information left should be enough for the bitwidth pass to
regenerate it.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Local bitwidth: It is possible to leave the bitwidth information in many
places with the same result, but for CHISEL the inputs should be
sized. The storage (memories/flops) should have a bitwidth when it cannot be
inferred from the inputs.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Global bitwidth: Pyrope bitwidth inference goes across the call hierarchy.
This means that a module could have no bitwidth information at all. We start
from the leaf nodes. If all the bits can be inferred given the inputs, the
module should have no bitwidth. In that case the bitwidth can be inferred from
outside.&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>LiveHD (2023)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/livehd/</guid><description>&lt;p>Projects for &lt;a href="https://github.com/masc-ucsc/livehd" target="_blank" rel="noopener">LiveHD&lt;/a>.&lt;br>
Lead Mentors: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>.&lt;br>
Contributor(s): &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/shahzaib-kashif/">Shahzaib Kashif&lt;/a>&lt;/p>
&lt;p>LiveHD is a &amp;ldquo;compiler&amp;rdquo; infrastructure for hardware design optimized for synthesis and simulation. The goal is to enable a more productive flow where the ASIC/FPGA designer can work with multiple hardware description languages like CHISEL, Pyrope, or Verilog.&lt;/p>
&lt;p>There are several projects available around LiveHD. A longer explanation and more project options are available at
&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/projects.md" target="_blank" rel="noopener">projects&lt;/a>. Contact the
mentors to find a project that fits your interests.&lt;/p>
&lt;p>A sample of helpful projects:&lt;/p>
&lt;h3 id="mockturtle">Mockturtle&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Mockturtle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Perform synthesis for graph in LiveHD using Mockturtle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17, synthesis&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/projects_large.md#medium-parallel-and-hierarchical-synthesis-with-mockturtle" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Mockturtle (&lt;a href="https://github.com/lsils/mockturtle" target="_blank" rel="noopener">https://github.com/lsils/mockturtle&lt;/a>) is a synthesis tool partially
integrated with LiveHD. The goal of this task is to iron out bugs and issues
and to use the LiveHD Tasks API to parallelize the synthesis.&lt;/p>
&lt;p>Main features:&lt;/p>
&lt;ul>
&lt;li>The current synthesis divides the circuit into partitions. Each partition can be synthesized in parallel.&lt;/li>
&lt;li>Support hierarchical synthesis to optimize across Lgraphs (cross Verilog module optimization)&lt;/li>
&lt;/ul>
&lt;p>The goal is to use Mockturtle (&lt;a href="https://github.com/lsils/mockturtle" target="_blank" rel="noopener">https://github.com/lsils/mockturtle&lt;/a>) with LiveHD. The main characteristics:&lt;/p>
&lt;ul>
&lt;li>Use mockturtle to tmap to LUTs&lt;/li>
&lt;li>Use mockturtle to synthesize (optimize) logic&lt;/li>
&lt;li>Enable cut-rewrite as an option&lt;/li>
&lt;li>Enable hierarchy cross optimization (hier:true option)&lt;/li>
&lt;li>Use the graph labeling to find clusters to optimize&lt;/li>
&lt;li>Re-timing&lt;/li>
&lt;li>Map only gates and non-wide arithmetic to LUTs. E.g., a 32-bit add is not mapped to LUTs, but a 2-bit add is.&lt;/li>
&lt;li>List of resources to not map:
&lt;ul>
&lt;li>Large ALUs. Large ALUs should have an OpenWare block (hardcoded in FPGAs and advanced adder options in ASIC)&lt;/li>
&lt;li>Multipliers and dividers&lt;/li>
&lt;li>Barrel shifters with non-trivial shifts (1-2 bits) selectable at run-time&lt;/li>
&lt;li>Memories, LUTs&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="livehd-console">LiveHD Console&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>LiveHD Console&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Create a console app that interacts with LiveHD to query parameters about designs&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/projects_small.md#medium-query-shell-not-lgshell-to-query-graphs" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>LiveHD currently uses replxx for its shell/console, but replxx is no longer maintained. As a result, it fails on newer versions of OSX.&lt;/p>
&lt;p>There is an alternative, Crossline (&lt;a href="https://github.com/jcwangxp/Crossline" target="_blank" rel="noopener">https://github.com/jcwangxp/Crossline&lt;/a>). This affects main/main.cpp and nothing else.&lt;/p>
&lt;p>In addition to replacing the current console with one that supports auto-completion, the plan is to add &amp;ldquo;query&amp;rdquo; capacity to visualize some
of the LiveHD internals.&lt;/p>
&lt;ul>
&lt;li>Query bits, ports&amp;hellip; like
&lt;ul>
&lt;li>&lt;a href="https://github.com/rubund/netlist-analyzer" target="_blank" rel="noopener">https://github.com/rubund/netlist-analyzer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html" target="_blank" rel="noopener">https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>It would be cool if selected subsections could be visualized with something like &lt;a href="https://github.com/nturley/netlistsvg" target="_blank" rel="noopener">https://github.com/nturley/netlistsvg&lt;/a>&lt;/li>
&lt;li>The shell may be expanded to support simulation in the future&lt;/li>
&lt;li>Wavedrom/Duh dumps&lt;/li>
&lt;/ul>
&lt;p>WaveDrom and duh allow dumping bitfield information for structures. It would be interesting to explore dumping tables and bit
fields for Lgraph IOs, and structs/fields inside the module. It may be a way to integrate with the documentation generation.&lt;/p>
&lt;p>Examples of queries: show path, show driver/sink of, do topo traversal, &amp;hellip;&lt;/p>
&lt;h3 id="compiler-error-generation-pass">Compiler error generation pass&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Lgraph and LNAST check pass&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Create a pass that checks the integrity/correctness of Lgraph and LNAST&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Large 350 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/projects_small.md#medium-diagnostics" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Create a pass that checks that the Lgraph (and/or LNAST) is semantically
correct. The LNAST already has quite a few tests (pass.semantic), but it can be
further expanded. Some checks:&lt;/p>
&lt;ul>
&lt;li>No combinational loops&lt;/li>
&lt;li>No mismatch in bit widths&lt;/li>
&lt;li>No disconnected nodes&lt;/li>
&lt;li>Check for inefficient splits (do not split buses that can be combined)&lt;/li>
&lt;li>Transformation stages should not drop names if the same net is preserved&lt;/li>
&lt;li>No writes in LNAST that are never read&lt;/li>
&lt;li>All edges are valid, e.g., no pin &amp;lsquo;C&amp;rsquo; in Sum_op&lt;/li>
&lt;/ul></description></item><item><title>Open Source Autonomous Vehicle Controller</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/osavc/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/osavc/</guid><description>&lt;p>The OSAVC is a vehicle-agnostic open source hardware and software project. This project is designed to provide a real-time hardware controller adaptable to any vehicle type, suitable for aerial, terrestrial, marine, or extraterrestrial vehicles. It allows control researchers to develop state estimation algorithms, sensor calibration algorithms, and vehicle control models in a modular fashion such that once the hardware set has been developed switching algorithms requires only modifying one C function and recompiling.&lt;/p>
&lt;p>Lead mentor: &lt;a href="mailto:aamuhunt@ucsc.edu">Aaron Hunter&lt;/a>&lt;/p>
&lt;p>Projects for the OSAVC:&lt;/p>
&lt;h3 id="vehiclecraft-sensor-driver-development">Vehicle/Craft sensor driver development&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Driver code to integrate sensor to a microcontroller&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C, I2C, SPI, UART interfaces&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: Aaron Hunter&lt;/li>
&lt;/ul>
&lt;p>Help develop a sensor library for use in autonomous vehicles. Possible sensors include range finders, ping sensors, IMUs, GPS receivers, RC receivers, barometers, air speed sensors, etc. Code will be written in C using state machine methodology and non-blocking algorithms. Test the drivers on a Microchip microcontroller.&lt;/p>
&lt;h3 id="path-finding-algorithm-using-opencv-and-machine-learning">Path finding algorithm using OpenCV and machine learning&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Computer vision, blob detection&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C/Python, OpenCV&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: Aaron Hunter&lt;/li>
&lt;/ul>
&lt;p>Use OpenCV to identify a track for an autonomous vehicle to follow. Build on previous work by developing a new model using EfficientDet and an existing training set of images. Port the model to TFLite and run it on the Coral USB Accelerator. Evaluate its performance against our previous efforts.&lt;/p>
&lt;h3 id="state-estimationsensor-fusion-algorithm-development">State estimation/sensor fusion algorithm development&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Kalman filtering, Mahony filtering&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C/Python, Matlab/Simulink, numerical optimization algorithms&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Challenging&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: Aaron Hunter&lt;/li>
&lt;/ul>
&lt;p>Implement an optimal state estimation algorithm from a model. This model can be derived from a Kalman filter or some other state estimation filter (e.g., a Mahony filter). The model takes sensor readings as input and provides an estimate of the state of a vehicle. Finally, convert the model to standard C using Simulink code generation, or implement it in Python (for use on a single-board computer, e.g., Raspberry Pi).&lt;/p></description></item><item><title>Open Source Autonomous Vehicle Controller</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/osavc/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/osavc/</guid><description>&lt;p>The OSAVC is a vehicle-agnostic open source hardware and software project. This project is designed to provide a real-time hardware controller adaptable to any vehicle type, suitable for aerial, terrestrial, marine, or extraterrestrial vehicles. It allows control researchers to develop state estimation algorithms, sensor calibration algorithms, and vehicle control models in a modular fashion such that once the hardware set has been developed switching algorithms requires only modifying one C function and recompiling.&lt;/p>
&lt;p>Lead mentor: &lt;a href="mailto:aamuhunt@ucsc.edu">Aaron Hunter&lt;/a>&lt;/p>
&lt;p>Projects for the OSAVC:&lt;/p>
&lt;h3 id="vehiclecraft-sensor-driver-development">Vehicle/Craft sensor driver development&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Driver code to integrate sensor to a microcontroller&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C, I2C, SPI, UART interfaces&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:aamuhunt@ucsc.edu">Aaron Hunter&lt;/a>, &lt;a href="mailto:caiespin@ucsc.edu">Carlos Espinosa&lt;/a>, Pavlo Vlastos&lt;/li>
&lt;/ul>
&lt;p>Help develop sensor libraries for use in autonomous vehicles. We are particularly interested in sensors for UAVs: airspeed sensors (pitot tube) or barometers, but also proximity detectors (ultrasonic) and range sensors. Code will be written in C using state machine methodology and non-blocking algorithms. Test the drivers on a Microchip microcontroller.&lt;/p>
&lt;h3 id="technical-documentation">Technical Documentation&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Documentation&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Technical writing, Markdown, website&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: Aaron Hunter, Carlos Espinosa, Pavlo Vlastos&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aniruddha-thakre/">Aniruddha Thakre&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Write a tutorial that demonstrates how to get started with an OSAVC and program it with the robotic equivalent of HelloWorld, then moves on to more sophisticated applications. Create a web page interface to the OSAVC repo highlighting this tutorial. In this project you will start from scratch with an OSAVC PCB and bring it to life, documenting the process in a way that helps new users.&lt;/p>
&lt;h3 id="rosgazebo-robot-simulation">ROS/Gazebo Robot Simulation&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Robot simulation with ROS/Gazebo&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: ROS/Gazebo, Python&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium to Hard&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:aamuhunt@ucsc.edu">Aaron Hunter&lt;/a>, &lt;a href="mailto:caiespin@ucsc.edu">Carlos Espinosa&lt;/a>, Pavlo Vlastos&lt;/li>
&lt;li>&lt;strong>Contributor(s)&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/damodar-datta-kancharla/">Damodar Datta Kancharla&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Generate a simulated world and a quadcopter model in ROS/Gazebo. Provide a link from MAVLink to ROS using the mavros package and simulate a real vehicle data stream to command the simulated quadcopter in Gazebo. At the same time, return the image stream from Gazebo so that ML models can process the images offline.&lt;/p></description></item><item><title>OpenRAM</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openram/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openram/</guid><description>&lt;p>&lt;a href="https://github.com/VLSIDA/OpenRAM" target="_blank" rel="noopener">OpenRAM&lt;/a> is an award-winning open-source Python framework to create the layout, netlists, timing and power models, placement and routing models, and other views necessary to use SRAMs in ASIC design. OpenRAM supports integration in both commercial and open-source flows with both predictive and fabricable technologies. Most recently, it has created memories that are included on all of the &lt;a href="https://efabless.com/open_shuttle_program/" target="_blank" rel="noopener">eFabless/Google/Skywater MPW tape-outs&lt;/a>.&lt;/p>
&lt;h3 id="replace-logging-framework-with-library">Replace logging framework with library&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>User Interfaces&lt;/code>, &lt;code>Python APIs&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>,&lt;a href="mailto:jcirimel@ucsc.edu">Jesse Cirimelli-Low&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Replace the custom logging framework in OpenRAM with the &lt;a href="https://docs.python.org/3/library/logging.html" target="_blank" rel="noopener">Python logging&lt;/a> module. The new logging should support levels of detail as well as tags that enable or disable logging of particular features to aid debugging.&lt;/p>
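&lt;p>As a rough illustration of the idea, the sketch below uses one named logger per feature as the &amp;ldquo;tag&amp;rdquo;, with a global level for everything else. The &lt;code>openram&lt;/code> logger namespace, the tag names, and the &lt;code>configure_logging&lt;/code> helper are assumptions made for this example and are not part of OpenRAM.&lt;/p>

```python
import io
import logging

# Hypothetical logger namespace; OpenRAM's real module layout may differ.
ROOT = "openram"


def configure_logging(level=logging.INFO, enabled_tags=(), stream=None):
    """Set a global level and turn on DEBUG output for selected feature tags."""
    root = logging.getLogger(ROOT)
    root.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s] %(message)s"))
    root.addHandler(handler)
    root.setLevel(level)
    root.propagate = False
    # Each tag is a child logger; giving it its own level overrides the global one.
    for tag in enabled_tags:
        logging.getLogger(f"{ROOT}.{tag}").setLevel(logging.DEBUG)
    return root


# Example: warnings everywhere, but full debug detail for the "router" tag.
buffer = io.StringIO()
configure_logging(level=logging.WARNING, enabled_tags=["router"], stream=buffer)
logging.getLogger("openram.router").debug("routing net vdd")
logging.getLogger("openram.bitcell").debug("placing bitcell array")
logging.getLogger("openram.bitcell").warning("bitcell DRC waived")
output = buffer.getvalue()
print(output, end="")
```

&lt;p>Here the debug message from the &lt;code>router&lt;/code> tag is emitted while the debug message from the &lt;code>bitcell&lt;/code> tag is filtered out, because child loggers without their own level inherit the global one.&lt;/p>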
&lt;h3 id="rom-generator">ROM generator&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Memories&lt;/code>, &lt;code>Python&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, VLSI&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium/Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Use the OpenRAM API to generate a Read-Only Memory (ROM) from an input hex file. The project
will automatically generate a SPICE netlist, layout, Verilog model, and timing characterization.&lt;/p>
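&lt;p>A first step in such a generator is turning the hex file into the bit pattern the ROM array must store. The following is a minimal sketch of that step only. It assumes whitespace-separated hex words with optional &lt;code>//&lt;/code> comments; the actual input format and the helper names &lt;code>parse_hex_words&lt;/code> and &lt;code>to_bit_rows&lt;/code> are hypothetical and not taken from OpenRAM.&lt;/p>

```python
def parse_hex_words(text, word_bits=8):
    """Parse whitespace-separated hex words into integers, checking width."""
    words = []
    for line in text.splitlines():
        line = line.split("//")[0]  # drop trailing comments (assumed syntax)
        for token in line.split():
            value = int(token, 16)
            if value >= 2 ** word_bits:
                raise ValueError(f"{token} does not fit in {word_bits} bits")
            words.append(value)
    return words


def to_bit_rows(words, word_bits=8):
    """Expand each word into a list of bits, most significant bit first."""
    return [[(w >> b) % 2 for b in reversed(range(word_bits))] for w in words]


# Example: four 8-bit words, one ROM row per word.
words = parse_hex_words("DE AD\nBE EF // last two words\n")
rows = to_bit_rows(words)
print(words)
print(rows[0])
```

&lt;p>Each row of bits then tells the generator where the array needs a programmed cell, which is the information a netlist or layout step would consume.&lt;/p>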
&lt;h3 id="register-file-generator">Register File generator&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Memories&lt;/code>, &lt;code>Python&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, VLSI&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium/Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Use the OpenRAM API to generate a Register File from standard library cells. The project
will automatically generate a SPICE netlist, layout, Verilog model, and timing characterization.&lt;/p>
&lt;h3 id="built-in-self-test-and-repair">Built-In Self Test and Repair&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Python&lt;/code>, &lt;code>Verilog&lt;/code>, &lt;code>Testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium/Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:bonal@ucsc.edu">Bugra Onal&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Finish integration of a parameterized Verilog module to support Built-In Self-Test and Repair
of OpenRAM memories using spare rows and columns.&lt;/p>
&lt;h3 id="layout-verses-schematic-lvs-visualization">Layout verses Schematic (LVS) visualization&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Python&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, VLSI, JSON&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy/Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>,&lt;a href="mailto:jcirimel@ucsc.edu">Jesse Cirimelli-Low&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Create a visualization interface to debug layout versus schematic mismatches in the &lt;a href="https://github.com/RTimothyEdwards/magic" target="_blank" rel="noopener">Magic&lt;/a> layout editor. Results will be parsed from the JSON output of &lt;a href="https://github.com/RTimothyEdwards/netgen" target="_blank" rel="noopener">Netgen&lt;/a>.&lt;/p></description></item><item><title>OpenROAD - A Complete, Autonomous RTL-GDSII Flow for VLSI Designs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openroad/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openroad/</guid><description>&lt;p>&lt;a href="https://theopenroadproject.org" target="_blank" rel="noopener">OpenROAD&lt;/a> is a front-runner in open-source semiconductor design automation tools and know-how. OpenROAD reduces barriers of access and tool costs to democratize system and product innovation in silicon. The OpenROAD tool and flow provide an autonomous, no-human-in-the-loop, 24-hour RTL-GDSII capability to support low-overhead design exploration and implementation through tapeout. We welcome a diverse community of designers, researchers, enthusiasts and entrepreneurs who use and contribute to OpenROAD to make a far-reaching impact.
Our mission is to democratize and advance design automation of semiconductor devices through leadership, innovation, and collaboration.&lt;/p>
&lt;p>OpenROAD is the key enabler of successful Chip initiatives like the Google-sponsored &lt;a href="efabless.com">Efabless&lt;/a> that has made possible more than 150 successful tapeouts by a diverse and global user community. The OpenROAD project repository is &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">https://github.com/The-OpenROAD-Project/OpenROAD&lt;/a>.&lt;/p>
&lt;p>Design of static RAMs in VLSI designs for good performance and area is generally time-consuming. Memory compilers significantly reduce design time for complex analog and mixed-signal designs by allowing designers to explore, verify and configure multiple variants and hence select a design that is optimal for area and performance. This project requires adding memory compiler support to &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>, based on popular PDKs such as those provided by &lt;a href="https://github.com/vlsida/openram" target="_blank" rel="noopener">OpenRAM&lt;/a>.&lt;/p>
&lt;h3 id="openlane-memory-design-macro-floorplanning">OpenLane Memory Design Macro Floorplanning&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Memory Compilers&lt;/code>, &lt;code>OpenRAM&lt;/code>, &lt;code>Programmable RAM&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, basic knowledge of memory design, VLSI technology, PDK, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:mehdi@umich.edu">Mehdi Saligane&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Improve and verify &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a> design planning with OpenRAM memories. Specifically, this project will utilize the macro placer/floorplanner and resolve any issues for memory placement. Issues that will need to be addressed may include power supply connectivity, ability to rotate memory macros, and solving pin-access issues.&lt;/p>
&lt;h3 id="openlane-memory-design-timing-analysis">OpenLane Memory Design Timing Analysis&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Memory Compilers&lt;/code>, &lt;code>OpenRAM&lt;/code>, &lt;code>Programmable RAM&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, basic knowledge of memory design, VLSI technology, PDK, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:mehdi@umich.edu">Mehdi Saligane&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Improve and verify &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a> Static Timing Analysis using OpenRAM-generated library files. Specifically, this will include verifying setup/hold conditions as well as creating additional checks such as minimum period and minimum pulse width. The project will also add timing information to the Verilog behavioral model.&lt;/p>
&lt;h3 id="openlane-memory-macro-pdk-support">OpenLane Memory Macro PDK Support&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Memory Compilers&lt;/code>, &lt;code>OpenRAM&lt;/code>, &lt;code>Programmable RAM&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, basic knowledge of memory design, VLSI technology, PDK, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:mehdi@umich.edu">Mehdi Saligane&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Integrate and verify FreePDK45 OpenRAM memories with an &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a> FreePDK45 design flow. OpenLane currently supports only the Skywater 130nm PDK, but OpenROAD supports FreePDK45 (which is the same as Nangate45). This project will create a design using OpenRAM memories with the OpenLane flow using FreePDK45.&lt;/p>
&lt;h3 id="vlsi-power-planning-and-analysis">VLSI Power Planning and Analysis&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Power Planning for VLSI&lt;/code>, &lt;code>IR Drop Analysis&lt;/code>, &lt;code>Power grid Creation and Analysis&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, tcl, VLSI Layout&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:mehdi@umich.edu">Mehdi Saligane&lt;/a>, &lt;a href="mailto:minghung@umich.edu">Ming-Hung&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Take the existing power planning module (pdngen.tcl) of OpenROAD and recode its functionality in C++, ensuring that all of the unit tests on the existing code pass. Work with a senior member of the team at ARM. Ensure that the designs created are of good quality for power routing and overall power consumption.&lt;/p>
&lt;h3 id="demos-and-tutorials">Demos and Tutorials&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Demo Development&lt;/code>, &lt;code>Documentation&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design flow, tcl, shell scripts, Documentation, Markdown&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>For &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a>, develop demos showing:
The OpenLane flow and highight key features
GUI visualizations
Design Explorations and Experiments
Different design styles and particular challenges&lt;/p>
&lt;h3 id="comprehensive-flow-testing">Comprehensive Flow Testing&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Testing&lt;/code>, &lt;code>Documentation&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design, tcl, shell scripts, Verilog, Layout&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop detailed test plans for the OpenLane flow that expand coverage, including coverage of advanced features. Add open source designs to the regression test suite to improve tool quality and robustness. This includes design specification, configuration and creation of all necessary files for regression testing. Suggested sources: ICCAS benchmarks, opencores, LSOracle for synthesis flow option.&lt;/p>
&lt;h3 id="enhance-gui-features">Enhance GUI features&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>GUI&lt;/code>, &lt;code>Visualization&lt;/code>, &lt;code>User Interfaces&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Qt&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>For &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a>, develop and enhance visualizations for EDA data and algorithms in the OpenROAD GUI. Allow deeper understanding of the tool results for users and tool internals for developers.&lt;/p>
&lt;h3 id="automate-opendb-code-generation">Automate OpenDB code Generation&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Database&lt;/code>, &lt;code>EDA&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Python, JSON, Jinja templating&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="mailto:aspyrou@eng.ucsd.edu">Tom Spyrou&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>For &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a>- Automatic code generation for the OpenDB database which allows improvements to the data model with much less hand coding. Allow the generation of storage, serialization, and callback code from a custom schema description format.
r&lt;/p>
&lt;h3 id="implement-an-nlp-based-ai-bot-aimed-at-increasing-users-enhancing-usability-and-building-a-knowledge-base">Implement an NLP based AI bot aimed at increasing users, enhancing usability and building a knowledge base&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>AI&lt;/code>, &lt;code>ML&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, ML libraries (e.g., TensorFlow, PyTorch)&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a> project contains a storehouse of knowledge in it&amp;rsquo;s Github repositories within Issues and Pull requests. Additionally, project related slack channels also hold useful information in the form of questions and answers, problems and solutions in conversation threads. Implement an AI analytics bot that filters, selects relevant discussions and classifies/records them into useful documentation and actionable issues. This should also directly track, increase project usage and report outcome metrics.&lt;/p></description></item><item><title>Package Management &amp; Reproducibility</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/packaging/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/packaging/</guid><description>&lt;p>Project ideas related to reproducibility and package management, especially as it relates to &lt;em>store type package managers&lt;/em> (&lt;a href="http://nixos.org/" target="_blank" rel="noopener">NixOS&lt;/a>, &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> or &lt;a href="https://spack.io/" target="_blank" rel="noopener">Spack&lt;/a>).&lt;/p>
&lt;p>Lead Mentor: &lt;a href="https://users.soe.ucsc.edu/~fmzakari" target="_blank" rel="noopener">Farid Zakaria&lt;/a> (&lt;a href="mailto:fmzakari@ucsc.edu">fmzakari@ucsc.edu&lt;/a>)&lt;/p>
&lt;h3 id="investigate-the-dynamic-linking-landscape">Investigate the dynamic linking landscape&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Operating Systems&lt;/code> &lt;code>Compilers&lt;/code> &lt;code>Linux&lt;/code> &lt;code>Package Management&lt;/code> &lt;code>NixOS&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Experience with systems programming and Linux familiarity&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate to Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:fmzakari@ucsc.edu">Farid Zakaria&lt;/a> &amp;amp; &lt;a href="https://people.llnl.gov/scogland1" target="_blank" rel="noopener">Tom Scogland&lt;/a> &lt;a href="mailto:scogland1@llnl.gov">mailto:scogland1@llnl.gov&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Dynamic linking as specified in the ELF file format has gone unchallenged since its invention. With many new package management models that eschew the Filesystem Hierarchy Standard (i.e. Nix, Guix and Spack), many of the idiosyncrasies that define the way in which libraries are discovered are no longer useful and are potentially harmful.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Continue development on &lt;a href="https://github.com/fzakaria/shrinkwrap" target="_blank" rel="noopener">Shrinkwrap&lt;/a>, a tool to make dynamic library loading simpler and more robust.&lt;/li>
&lt;li>Evaluate its effectiveness across a wide range of binaries.&lt;/li>
&lt;li>Upstream contributions to &lt;a href="http://nixos.org/" target="_blank" rel="noopener">NixOS&lt;/a> or &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> to leverage the improvement when suitable.&lt;/li>
&lt;li>Investigate alternative improvements to dynamic linking by writing a dynamic linker &amp;ldquo;loader wrapper&amp;rdquo; to explore new ideas.&lt;/li>
&lt;/ul></description></item><item><title>Polyphorm / PolyPhy</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/polyphorm/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/polyphorm/</guid><description>&lt;p>&lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">Polyphorm&lt;/a> is an agent-based system for reconstructing and visualizing &lt;em>optimal transport networks&lt;/em> defined over sparse data. Rooted in astronomy and inspired by nature, we have used Polyphorm to reconstruct the &lt;a href="https://youtu.be/5ILwq5OFuwY" target="_blank" rel="noopener">Cosmic web&lt;/a> structure, but also to discover network-like patterns in natural language data. You can find more details about our research &lt;a href="https://elek.pub/projects/Rhizome-Cosmology" target="_blank" rel="noopener">here&lt;/a>. Under the hood, Polyphorm uses a richer 3D scalar field representation of the reconstructed network, instead of a discrete representation like a graph or a mesh.&lt;/p>
&lt;p>&lt;strong>PolyPhy&lt;/strong> will be a Python-based redesigned version of Polyphorm, currently at the beginning of its development cycle. PolyPhy will be a multi-platform toolkit meant for a wide audience across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. All of the offered projects focus on PolyPhy, with a variety of topics including design, coding, and even research. Ultimately, PolyPhy will become a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.&lt;/p>
&lt;h3 id="develop-website-for-polyphy">Develop website for PolyPhy&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>Dynamic Updates&lt;/code> &lt;code>UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> web development experience, good communicator, (HTML/CSS), (Javascript)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a clean and welcoming website for the project. The organization needs to reflect the needs of PolyPhy users, but also provide a convenient entry point for interested project contributors. No excessive pop-ups or webjunk.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Port the contents of the &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">repository page&lt;/a> to a dedicated website.&lt;/li>
&lt;li>Design the structure of the website according to open-source best practices.&lt;/li>
&lt;li>Work with the visual designer (see below) in creating a coherent and organic presentation.&lt;/li>
&lt;li>Interactively link important metrics from the project dev environment as well as documentation.&lt;/li>
&lt;/ul>
&lt;h3 id="design-visual-experience-for-polyphys-website-and-presentations">Design visual experience for PolyPhy&amp;rsquo;s website and presentations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Design&lt;/code> &lt;code>Art&lt;/code> &lt;code>UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> vector and bitmap drawing, sense for spatial symmetry and framing, (interactive content creation), (animation)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop visual content for the project using its main themes: nature-inspired computation, biomimetics, interconnected structures. Aid in designing visual structure of the website as well as other public-facing artifacts.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Design imagery and other graphical elements to visually (re-)present PolyPhy.&lt;/li>
&lt;li>Work with the technical writer (see below) in designing a coherent story.&lt;/li>
&lt;li>Work with the web developer (see above) in creating a coherent and organic presentation.&lt;/li>
&lt;/ul>
&lt;h3 id="write-polyphys-technical-story-and-content">Write PolyPhy&amp;rsquo;s technical story and content&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Writing&lt;/code> &lt;code>Documentation&lt;/code> &lt;code>Storytelling&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> experience writing structured text over 10 pages, well read, (technical or scientific education)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Integral to PolyPhy&amp;rsquo;s presentation is a story that the users and the project contributors can relate to. The objective is to develop the verbal part of that story, as well as major portions of the technical documentation to match it. The difficulty of the project is scalable.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Write different pages of the project website.&lt;/li>
&lt;li>Work with mentors to improve the project&amp;rsquo;s written community practices (diversity, communication).&lt;/li>
&lt;li>Write and edit narrative and explanatory parts of PolyPhy&amp;rsquo;s documentation.&lt;/li>
&lt;li>Work with the visual designer (see above) in designing a coherent story.&lt;/li>
&lt;/ul>
&lt;h3 id="video-tutorials-and-presentation-for-polyphy">Video tutorials and presentation for PolyPhy&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Video Presentation&lt;/code> &lt;code>Tutorials&lt;/code> &lt;code>Didactics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> video editing, creating educational content, communication, (native or fluent in another language)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy-Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:deehrlic@ucsc.edu">Drew Ehrlich&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Create a public face for PolyPhy that reflects its history and context, and teaches its functionality to users with different degrees of familiarity.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context and history of the project.&lt;/li>
&lt;li>Interview diverse project contributors.&lt;/li>
&lt;li>Create a video documenting PolyPhy&amp;rsquo;s history, with roots in astronomy, complex systems, fractals.&lt;/li>
&lt;li>Create a set of tutorial videos for starting and intermediate PolyPhy users.&lt;/li>
&lt;li>Create an accessible template for future tutorials.&lt;/li>
&lt;/ul>
&lt;h3 id="implement-heterogeneous-data-io-ops">Implement heterogeneous data I/O ops&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O Operations&lt;/code> &lt;code>File Conversion&lt;/code> &lt;code>Numerics&lt;/code> &lt;code>Testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, experience working with scientific or statistical data, good debugging skills&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate-Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:anishagoel14@gmail.com">Anisha Goel&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>By default, PolyPhy operates with an unordered set of points as an input and scalar fields (float ndarrays) as an output, but others are applicable as well. Design and implement interfaces to load and export different data formats (CSV, OBJ, HDF5, FITS&amp;hellip;) and modalities (points, meshes, density fields). The difficulty of the project can be scaled based on contributor&amp;rsquo;s interest.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Research which modalities are used by members of the target communities.&lt;/li>
&lt;li>Implement modular loaders for the inputs and an interface to PolyPhy core.&lt;/li>
&lt;li>Implement exporters for simulation datasets and visualization captures.&lt;/li>
&lt;li>Write testing code for the above.&lt;/li>
&lt;li>Integrate external packages as necessary.&lt;/li>
&lt;/ul>
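&lt;p>One way to keep loaders modular is a small registry that maps file extensions to loader functions, so new formats can be added without touching the dispatch code. The sketch below shows this for CSV point data only. The names &lt;code>LOADERS&lt;/code>, &lt;code>register_loader&lt;/code> and &lt;code>load_points&lt;/code> are hypothetical, and plain tuples stand in for the ndarrays PolyPhy works with so that the example needs only the standard library.&lt;/p>

```python
import csv
import io
import os

LOADERS = {}  # hypothetical registry: file extension -> loader function


def register_loader(extension):
    """Decorator that registers a loader for one file extension."""
    def decorator(func):
        LOADERS[extension.lower()] = func
        return func
    return decorator


@register_loader(".csv")
def load_csv_points(stream):
    """Read rows of x,y,z floats; tolerate a header in the first row."""
    points = []
    for index, row in enumerate(csv.reader(stream)):
        if not row:
            continue
        try:
            points.append(tuple(float(v) for v in row[:3]))
        except ValueError:
            if index != 0:  # only the first row may be a header
                raise
    return points


def load_points(name, stream):
    """Dispatch to the loader registered for the extension of `name`."""
    extension = os.path.splitext(name)[1].lower()
    if extension not in LOADERS:
        raise ValueError(f"no loader registered for {extension!r}")
    return LOADERS[extension](stream)


# Example: an in-memory CSV file with a header and two points.
sample = io.StringIO("x,y,z\n0.0,1.0,2.0\n3.5,4.5,5.5\n")
points = load_points("galaxies.csv", sample)
print(points)
```

&lt;p>Loaders for OBJ, HDF5 or FITS could be registered the same way, typically by wrapping an external package, and exporters could follow a mirrored registry.&lt;/p>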
&lt;h3 id="setup-cicd-for-polyphy">Setup CI/CD for PolyPhy&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Continuous Integration&lt;/code> &lt;code>Continuous Deployment&lt;/code> &lt;code>DevOps&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> experience with CI/CD, GitHub, Python package deployment&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:anishagoel14@gmail.com">Anisha Goel&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The objective is to set up a CI/CD pipeline that automates the build, testing, and deployment of the software. The resulting process needs to be robust to contributor errors and work in the distributed conditions of a diverse contributor base.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Automate continuous building, testing, merging and deployment for PolyPhy in GitHub.&lt;/li>
&lt;li>Publish the CI/CD metrics and build assets to the project webpage.&lt;/li>
&lt;li>Educate other contributors about the best practices of using the developed CI/CD pipeline.&lt;/li>
&lt;li>Add support for automated packaging using common management systems (pip, Anaconda).&lt;/li>
&lt;/ul>
&lt;h3 id="refine-polyphys-ui-and-develop-new-functional-elements">Refine PolyPhy&amp;rsquo;s UI and develop new functional elements&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>UI/UX&lt;/code> &lt;code>Visual Experience&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python programming, UI/UX development experience, (knowledge of graphics)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:dabramov@ucsc.edu">David Abramov&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The key feature of PolyPhy is its interactivity. By interacting with the underlying simulation model, the user can adjust its parameters in real time and respond to its behavior. For instance, an astrophysics expert can load a dataset of 100k galaxies and reconstruct the large-scale structure of the intergalactic medium. A responsive UI combined with real-time visualization allows them to judge the fidelity of the reconstruction and make necessary changes.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Implement a platform-agnostic UI to house PolyPhy&amp;rsquo;s main rendering context as well as secondary analytics.&lt;/li>
&lt;li>Work with the visualization developer (see below) to integrate the rendering functionality.&lt;/li>
&lt;li>Optimize the UI&amp;rsquo;s performance.&lt;/li>
&lt;li>Test the implementation on different OS platforms.&lt;/li>
&lt;/ul>
&lt;h3 id="create-new-data-visualization-regimes">Create new data visualization regimes&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Interactive Visualization&lt;/code> &lt;code>Data Analytics&lt;/code> &lt;code>3D Rendering&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> basic graphics theory and math, Python, GPU programming, (previous experience visualizing novel datasets)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:dabramov@ucsc.edu">David Abramov&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Data visualization is one of the core components of PolyPhy, as it provides a real-time overview of the underlying MCPM simulation. Through the feedback provided by the visualization, PolyPhy users can adjust the simulation model and make new findings about the dataset. Various operations over the reconstructed data (e.g. spatial searching) as well as important statistical summaries also benefit from clear visual presentation.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Develop novel ways of visualizing scientific data in PolyPhy.&lt;/li>
&lt;li>Work with diverse data modalities: point clouds, graphs, scalar and vector fields.&lt;/li>
&lt;li>Add support for visualizing metadata, such as annotations and labels.&lt;/li>
&lt;li>Create UI elements for plotting statistical summaries computed in real-time.&lt;/li>
&lt;/ul>
&lt;h3 id="discrete-graph-extraction-from-simulated-scalar-fields">Discrete graph extraction from simulated scalar fields&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Graph Theory&lt;/code> &lt;code>Data Science&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> good understanding of discrete math and graph theory, Python, (GPU programming)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:farhasan@nmsu.edu">Farhanul Hasan&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a custom method for graph extraction from scalar field data produced by PolyPhy. Because PolyPhy typically produces network-like structures, representing these structures as weighted discrete graphs is very useful for efficiently navigating the data. The most important property of this abstracted representation is that it preserves the topology of the base scalar field by navigating the 1D ridges of the scalar field.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Become familiar with different algorithms for graph growing and skeleton extraction.&lt;/li>
&lt;li>Implement the most suitable method in PolyPhy, interpreting the source scalar field as a throughput (transport) network. The weights of the resulting graph need to reflect the source throughputs between the respective node locations.&lt;/li>
&lt;li>Implement common graph operations, e.g. hierarchical clustering and reduction, shortest path between two nodes, range queries.&lt;/li>
&lt;li>Optimize the runtime of the implemented methods.&lt;/li>
&lt;li>Work with the visualization developer (see above) to visualize the resulting graphs.&lt;/li>
&lt;/ul></description></item><item><title>Proactive Data Containers (PDC)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/pdc/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/pdc/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers&lt;/a> (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. Managing data as objects enables powerful optimization opportunities for data movement and
transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning.&lt;/p>
&lt;h3 id="python-interface-to-an-object-centric-data-management-system">Python interface to an object-centric data management system&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Python&lt;/code>, &lt;code>object-centric data management&lt;/code>, &lt;code>PDC&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, C, PDC&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>, &lt;a href="mailto:htang4@lbl.gov">Houjun Tang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers (PDC)&lt;/a> is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. Currently PDC has a C interface. Providing a python interface would make it easier for more Python applications to utilize it.&lt;/p></description></item><item><title>Skyhook Data Management</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/skyhookdm/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/skyhookdm/</guid><description>&lt;p>&lt;a href="https://iris-hep.org/projects/skyhookdm.html" target="_blank" rel="noopener">SkyhookDM&lt;/a>&lt;/p>
&lt;p>The Skyhook Data Management project extends object storage with data
management functionality for tabular data. SkyhookDM enables storing and querying
tabular data in the &lt;a href="https://ceph.io" target="_blank" rel="noopener">Ceph&lt;/a> distributed object storage system. It thereby
turns Ceph into an &lt;a href="https://arrow.apache.org" target="_blank" rel="noopener">Apache Arrow&lt;/a>-native
storage system, utilizing the Arrow Dataset API to store and query data with server-side data processing, including selection and projection that can significantly reduce the data returned to the client.&lt;/p>
&lt;p>SkyhookDM is now part of Apache Arrow (see &lt;a href="https://arrow.apache.org/blog/2022/01/31/skyhook-bringing-computation-to-storage-with-apache-arrow/" target="_blank" rel="noopener">blog post&lt;/a>).&lt;/p>
&lt;hr>
&lt;h3 id="support-reading-from-skyhook-in-daskray-using-the-arrow-dataset-api">Support reading from Skyhook in Dask/Ray using the Arrow Dataset API&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Arrow&lt;/code>, &lt;code>Dask/Ray&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Problem:&lt;/strong> Dask and Ray are parallel-computing frameworks similar to Apache Spark, but in the Python ecosystem. Each of these frameworks supports reading tabular data from different data sources, such as a local filesystem or cloud object stores. These systems have recently added support for the Arrow Dataset API to read data from different sources. Since the Arrow Dataset API supports Skyhook, we can leverage this capability to offload compute-heavy Parquet file decoding and decompression to the Ceph storage layer. This can speed up queries significantly, because CPU is freed up in the Dask/Ray workers for other processing tasks.&lt;/p>
&lt;h3 id="implement-gandiva-based-query-executor-in-skyhookdm">Implement Gandiva based query executor in SkyhookDM&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Arrow&lt;/code>, &lt;code>Gandiva&lt;/code>, &lt;code>SIMD&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Problem:&lt;/strong> &lt;a href="https://arrow.apache.org/blog/2018/12/05/gandiva-donation/" target="_blank" rel="noopener">Gandiva&lt;/a> allows efficient evaluation of query expressions using runtime code generation using LLVM. The generated code leverages SIMD instructions and is highly optimized for parallel processing in modern CPUs. It is natively supported by Arrow for compiling and executing expressions. SkyhookDM currently uses the Arrow Dataset API (which internally uses Arrow Compute APIs) to execute query expressions inside the Ceph OSDs. Since, the Arrow Dataset API particularly does not support Gandiva currently, the goal of this project is to add support for Gandiva in the Arrow Dataset API in order to accelerate query processing when offloaded to the storage layer. This will help Skyhook combat some of the peformance issues due to the inefficient serialization interface of Arrow.&lt;/p>
&lt;p>&lt;strong>References:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://arrow.apache.org/blog/2018/12/05/gandiva-donation/" target="_blank" rel="noopener">https://arrow.apache.org/blog/2018/12/05/gandiva-donation/&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.dremio.com/subsurface/increasing-performance-with-arrow-and-gandiva/" target="_blank" rel="noopener">https://www.dremio.com/subsurface/increasing-performance-with-arrow-and-gandiva/&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/apache/arrow/tree/master/cpp/src/gandiva" target="_blank" rel="noopener">https://github.com/apache/arrow/tree/master/cpp/src/gandiva&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="add-ability-to-create-and-save-views-from-datasets">Add Ability to create and save views from Datasets&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Arrow&lt;/code>, &lt;code>Database views&lt;/code>, &lt;code>virtual datasets&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Problem:&lt;/strong> Workloads may repeat the same or similar queries over time. This repeats I/O and compute operations and wastes resources.
Saving previous computations as materialized views can benefit future
workload processing.
&lt;strong>Solution:&lt;/strong> Add a method to the Dataset API that creates a view from a query and saves it as an object in a separate pool, under an object key that can be generated from the query that created it.&lt;/p>
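&lt;p>As a rough illustration of the object-key idea, the following Python sketch (Python is used for brevity; the project itself targets C++) hashes a normalized form of the query. The function name &lt;code>view_object_key&lt;/code>, its parameters, the example dataset path, and the key layout are all hypothetical, and treating column order as irrelevant is an assumption rather than SkyhookDM behavior.&lt;/p>

```python
import hashlib
import json


def view_object_key(dataset_path, columns, filter_expr, prefix="view"):
    """Derive a deterministic object key from the query that defines a view.

    All parameter names and the key layout are illustrative assumptions,
    not part of SkyhookDM or the Arrow Dataset API.
    """
    # Normalize the query so that equivalent queries map to the same key:
    # sort the projected columns and collapse whitespace in the filter.
    canonical = {
        "dataset": dataset_path,
        "columns": sorted(columns),
        "filter": " ".join(filter_expr.split()),
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Truncate the digest to keep the object name short.
    return f"{prefix}.{digest[:32]}"


key_a = view_object_key("/mnt/cephfs/taxi", ["fare", "tip"], "fare  > 10")
key_b = view_object_key("/mnt/cephfs/taxi", ["tip", "fare"], "fare > 10")
key_c = view_object_key("/mnt/cephfs/taxi", ["tip", "fare"], "fare > 20")
print(key_a == key_b)  # True: same query after normalization
print(key_a == key_c)  # False: different filter
```

&lt;p>Because the key depends only on the normalized query, a repeated query could look up a saved view by recomputing the key instead of rerunning the scan.&lt;/p>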
&lt;p>Reference:
&lt;a href="https://docs.dremio.com/working-with-datasets/virtual-datasets.html" target="_blank" rel="noopener">https://docs.dremio.com/working-with-datasets/virtual-datasets.html&lt;/a>&lt;/p>
&lt;hr>
&lt;h3 id="integrating-delta-lake-on-top-of-skyhookdm">Integrating Delta Lake on top of SkyhookDM&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>data lakes&lt;/code>, &lt;code>lake house&lt;/code>, &lt;code>distributed query processing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://delta.io/" target="_blank" rel="noopener">Delta Lake&lt;/a> is a new architecture for querying big data lakes through Spark, providing transactions.
An important benefit of this integration will be to provide an SQL interface for SkyhookDM functionality, through Spark SQL.
This project will further build upon our current work connecting Spark to SkyhookDM through the Arrow Dataset API.
This would allow us to easily run some of the TPC-DS queries (a popular set of SQL queries for benchmarking databases) on SkyhookDM.&lt;/p>
&lt;p>Reference: &lt;a href="https://databricks.com/jp/wp-content/uploads/2020/08/p975-armbrust.pdf" target="_blank" rel="noopener">Delta Lake paper&lt;/a>&lt;/p></description></item><item><title>Efficient Communication with Key/Value Storage Devices</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore/</link><pubDate>Sun, 27 Feb 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore/</guid><description>&lt;p>Network key-value stores are used throughout the cloud as storage backends (e.g., AWS ShardStore) and are showing up in devices (e.g., NVMe KV SSD). The KV clients use traditional network sockets and POSIX APIs to communicate with the KV store. An advancement that has occurred in the last two years is a new kernel interface that can be used in lieu of the POSIX API, namely &lt;code>io_uring&lt;/code>. This new interface uses a set of shared-memory queues for kernel-to-user communication and permits zero-copy transfer of data. This scheme avoids the overhead of system calls and can improve performance.&lt;/p>
&lt;h3 id="implement-io_uring-communication-backend">Implement &lt;code>io_uring&lt;/code> communication backend&lt;/h3>
&lt;p>&lt;strong>Topics:&lt;/strong> performance, I/O, network, key-value, storage&lt;br>
&lt;strong>Difficulty:&lt;/strong> Medium&lt;br>
&lt;strong>Size:&lt;/strong> Medium or large (120 or 150 hours)&lt;br>
&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:philip.kufeldt@seagate.com">Philip Kufeldt (Seagate)&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aldrin-montana/">Aldrin Montana&lt;/a> (UC Santa Cruz)
&lt;strong>Contributor(s):&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/manank-patel/">Manank Patel&lt;/a>&lt;/p>
&lt;p>Seagate has been using a network-based KV HDD as a research vehicle for computational storage. This research vehicle uses an open-source user library that implements a KV API by sending network protobuf-based RPCs to a network KV store. Currently, the library uses the standard socket and POSIX APIs to communicate with the KV backend. This project would implement an &lt;code>io_uring&lt;/code> communication backend and compare the results of both implementations.&lt;/p></description></item><item><title>DirtViz 2.0 (2023)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/dirtviz/</link><pubDate>Mon, 07 Feb 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/dirtviz/</guid><description>&lt;p>DirtViz is a project to visualize data collected from sensors deployed in sensor networks. We have deployed a number of sensors measuring quantities like soil moisture, temperature, current, and voltage in outdoor settings. This project involves extending our existing visualization stack, DirtViz 1.0 (see GitHub), into version 2.0. The project goal is to create a fully-fledged data visualization tool tailored to the types of data collected from embedded systems sensor networks.&lt;/p>
&lt;h3 id="visualize-sensor-data">Visualize Sensor Data&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data Visualization, Analytics&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> JavaScript, Python, Bash, web servers, Git, embedded systems&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy/Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large, 350 hours&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:sonaderi@ucsc.edu">Sonia Naderi&lt;/a>, &lt;a href="mailto:sgtaylor@ucsc.edu">Stephen Taylor&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Refine our web-based visualization tools to easily allow users to zoom in on date ranges, change axes, etc.&lt;/li>
&lt;li>Create a system for remote collaborators/citizen scientists to upload their own data in a secure manner.&lt;/li>
&lt;li>Craft an intuitive navigation system so that data from deployment sites around the world can be easily viewed.&lt;/li>
&lt;li>Document the tool thoroughly for future maintenance.&lt;/li>
&lt;li>If interested, we are also open to you investigating correlations between different data streams and doing self-directed data analysis.&lt;/li>
&lt;/ul></description></item></channel></rss>