uc | UCSC OSPO

NETAI: AI-Powered Network Anomaly Detection and Diagnostics Platform

Thu, 05 Feb 2026 00:00:00 +0000

NETAI (Network AI) is an AI-powered network anomaly detection and diagnostics platform for the National Research Platform (NRP). This project combines Kubernetes-native LLM integration, network performance monitoring, and predictive analytics to create an intelligent assistant for network operators. Students will work with cutting-edge technologies including Large Language Models (LLMs), Kubernetes, perfSONAR network measurements, time-series analysis, and containerized AI/ML workloads, while contributing to real-world applications in network operations and diagnostics.

The project involves developing a Kubernetes chatbot that leverages NRP’s managed LLM service (providing access to models like Qwen3-VL, GLM-4.7, and GPT-OSS) to help network operators understand complex network behaviors, diagnose anomalies, and receive natural language explanations of network issues. Students will integrate perfSONAR measurement data with traceroute path analysis to create an interactive network topology visualization, and develop AI/ML models for predictive network performance analysis using NRP’s GPU resources.

In addition, students will gain hands-on experience with fine-tuning LLMs on historical network diagnostics data, developing time-series forecasting models for network metrics, and implementing anomaly detection using deep learning techniques. The entire AI/ML pipeline will be containerized and deployed as Kubernetes workloads, utilizing GPU-enabled pods for model training and inference, ensuring scalability and seamless integration with existing NRP infrastructure.

The platform builds upon existing network diagnostics capabilities, combining end-to-end throughput measurements with detailed traceroute data to enable operators to visualize network paths, identify performance bottlenecks, and understand relationships between metrics and underlying infrastructure. The AI enhancement will provide predictive capabilities, automated incident reporting, and intelligent recommendations for network remediation strategies.

NETAI / LLM Integration & Kubernetes Chatbot

The proposed work includes developing a Kubernetes-native chatbot that integrates with NRP’s managed LLM service to provide intelligent network diagnostics assistance. Students will create a conversational interface that can answer questions about network performance, explain anomalies in natural language, and suggest remediation strategies. They will fine-tune LLMs on historical network diagnostics data, test results, and traceroute information to create domain-specific assistants. Students will implement RESTful APIs for chatbot interactions, develop prompt engineering strategies for network diagnostics, and create context-aware responses that incorporate real-time network telemetry. The chatbot will be deployed as Kubernetes services, utilizing GPU pods for inference and integrating with the existing diagnostics platform.

Topics: Large Language Models, Kubernetes, Chatbots, Natural Language Processing, Network Diagnostics, API Development
Skills: Python, Kubernetes, LLM APIs (Qwen3-VL, GLM-4.7, GPT-OSS), Prompt Engineering, REST APIs, Docker, GPU Computing
Difficulty: Hard
Size: Large (350 hours)
Mentors: Dmitry Mishin, Derek Weitzel

NETAI / Network Anomaly Detection Models

The proposed work includes developing deep learning models for network anomaly detection using historical perfSONAR and traceroute data. Students will create models that can identify slow links, high packet loss, excessive retransmits, and failed network tests automatically. They will implement anomaly detection algorithms using techniques such as autoencoders, LSTM networks, and transformer architectures. Students will train models on NRP’s GPU clusters using historical network telemetry stored in SQLite databases, develop feature engineering pipelines for network metrics, and create real-time inference services deployed as Kubernetes workloads. The models will be integrated into the diagnostics platform to provide automated anomaly detection alongside the interactive visualization.

Topics: Deep Learning, Anomaly Detection, Time-Series Analysis, Network Monitoring, Model Training, GPU Computing
Skills: Python, PyTorch/TensorFlow, scikit-learn, Pandas, NumPy, SQLite, Kubernetes, GPU Pods, MLOps
Difficulty: Hard
Size: Large (350 hours)
Mentors: Dmitry Mishin, Derek Weitzel

NETAI / Predictive Analytics & Forecasting

The proposed work includes developing predictive models that can forecast network performance degradation and identify patterns in network anomalies before they impact users. Students will create time-series forecasting models for network metrics such as throughput, latency, and packet loss, using techniques like ARIMA, Prophet, and deep learning-based forecasting. They will implement few-shot learning approaches to adapt models to new network topologies and measurement patterns, develop early warning systems for potential network issues, and create automated incident report generation using LLMs. Students will leverage NRP’s GPU resources for training forecasting models and deploy them as Kubernetes services for real-time predictions integrated with the diagnostics dashboard.

Topics: Time-Series Forecasting, Predictive Analytics, Machine Learning, Network Performance, Early Warning Systems, LLM Integration
Skills: Python, PyTorch/TensorFlow, Prophet, ARIMA, Pandas, NumPy, Time-Series Analysis, Kubernetes, GPU Computing
Difficulty: Hard
Size: Large (350 hours)
Mentors: Dmitry Mishin, Derek Weitzel

NETAI / Kubernetes Deployment & Infrastructure

The proposed work includes setting up Kubernetes-based infrastructure for deploying the entire NETAI platform, including LLM services, ML models, and the diagnostics dashboard. Students will create Helm charts for deploying containerized AI/ML workloads, configure GPU-enabled pods for model training and inference, and implement persistent storage solutions for maintaining historical network telemetry. They will develop GitLab CI/CD pipelines for automated testing and deployment, set up monitoring and observability using Prometheus and Grafana for tracking model performance and resource usage, and create scalable deployment strategies that leverage NRP’s distributed computing resources. Students will also integrate the platform with existing perfSONAR infrastructure and ensure seamless operation within the NRP cluster.

Topics: Kubernetes, DevOps, CI/CD, GPU Computing, Container Orchestration, Infrastructure as Code, Monitoring
Skills: Kubernetes, Helm, GitLab CI/CD, Prometheus, Grafana, Docker, GPU Pods, Persistent Storage, Infrastructure Automation
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: Dmitry Mishin, Derek Weitzel

Project Resources

National Research Platform: https://nrp.ai/
NRP LLM Service: https://nrp.ai/documentation/userdocs/ai/llm-managed/
perfSONAR: https://www.perfsonar.net/
MaDDash: https://github.com/esnet/maddash
Network Monitoring Documentation: https://nrp.ai/documentation/

Background

This project addresses critical gaps in network performance monitoring for the National Research Platform by integrating AI/ML capabilities with existing perfSONAR-based diagnostics. The platform combines end-to-end network measurements with detailed path-level analysis, enhanced by intelligent AI assistants that can help operators understand complex network behaviors and predict potential issues. By leveraging NRP’s managed LLM service and GPU resources, students will create a Kubernetes-native system that scales across the distributed research network infrastructure, providing both real-time diagnostics and predictive analytics to improve network reliability and performance for researchers nationwide.

VINE: Precision Agriculture Data Platform & Digital Twin

Thu, 05 Feb 2026 00:00:00 +0000

VINE (Vineyard Intelligence Network & Environment) is an AI/ML research project focused on precision agriculture using the National Research Platform (NRP). This project leverages the innovative demonstration at Iron Horse Vineyards to study how AI and machine learning can optimize agricultural practices through data-driven insights. Students will work with cutting-edge AI/ML technologies, distributed computing on NRP, and large-scale data analysis, while contributing to real-world applications in sustainable agriculture and climate adaptation.

The project involves AI/ML research using agricultural data from Iron Horse Vineyards, leveraging the computational resources of the National Research Platform for training and deploying machine learning models. Students will work with agricultural datasets including sensor data, multi-spectral drone imagery, and historical records, developing models for predictive analytics, computer vision, and time-series forecasting. The integration of NRP’s distributed infrastructure enables scalable AI research that can process large volumes of sensor data, multi-spectral imagery, and historical agricultural records.

Students will gain hands-on experience with AI/ML model development for agricultural applications, learning how to analyze multi-spectral drone imagery, process time-series sensor data, and build predictive models for irrigation scheduling, pest detection, and harvest timing. They will deploy and train models on NRP’s Kubernetes clusters, utilize GPU resources for deep learning workloads, and work with agricultural datasets for comprehensive research. The project emphasizes using distributed computing on NRP to scale AI/ML experiments and create open, shareable datasets for collaborative research.

The platform builds upon the success demonstrated at Iron Horse Vineyards, where AI-driven analytics have shown potential for 10% water use reduction and improved yield optimization. This project aims to advance AI/ML research in precision agriculture by utilizing NRP’s computational capabilities, creating reproducible research that can benefit the broader agricultural and research communities.

VINE / Data Pipeline & Integration

The proposed work includes building data pipelines to ingest, process, and prepare agricultural data from Iron Horse Vineyards and other sources for AI/ML research. Students will develop pipelines to collect sensor data (soil moisture, temperature, CO2, weather), multi-spectral drone imagery, and historical agricultural records. They will create data validation and quality assurance processes, implement data preprocessing for ML model training, and develop data integration workflows that connect agricultural datasets with NRP computational resources. Students will also work on data sharing mechanisms to make processed datasets available for the research community.

Topics: Data Engineering, Time-Series Data, Data Preprocessing, Data Sharing, ML Data Pipelines
Skills: Python, Pandas, NumPy, Data Validation, REST APIs, Docker, Kubernetes, Data Processing
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada

VINE / AI/ML Models for Agricultural Analytics on NRP

The proposed work includes developing and training machine learning models for agricultural applications using the National Research Platform (NRP). Students will create models for predictive irrigation scheduling based on soil moisture, weather forecasts, and historical data. They will develop computer vision models for analyzing multi-spectral drone imagery to detect plant health, identify pests, and estimate yield. Students will also work on time-series forecasting models for predicting harvest timing and optimizing resource allocation. The project will involve training models on NRP’s GPU clusters, utilizing distributed training capabilities, and deploying models for real-time inference. Students will leverage agricultural datasets for training and validation, and contribute model outputs and insights for the research community.

Topics: Machine Learning, Computer Vision, Time-Series Analysis, Predictive Analytics, Agricultural AI, Distributed Training
Skills: Python, PyTorch/TensorFlow, scikit-learn, OpenCV, Pandas, NumPy, MLOps, NRP Kubernetes, GPU Computing
Difficulty: Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada

VINE / Digital Twin & AI-Driven Visualization

The proposed work includes creating AI-enhanced digital twin systems for agricultural sites using computational resources on NRP. Students will develop 3D visualization systems (potentially using Omniverse or similar platforms) to represent vineyards and farms, integrate AI model predictions into the digital twin for real-time insights, and create interactive dashboards for monitoring and analysis. They will implement spatial data processing using ML models to map sensor locations and readings to geographic coordinates, and develop AI-driven simulation capabilities for testing different agricultural strategies (irrigation patterns, planting layouts, etc.) before implementation. Students will deploy visualization services on NRP infrastructure and integrate with agricultural data sources for real-time updates.

Topics: Digital Twin, AI-Enhanced Visualization, GIS, Spatial Data, ML-Driven Simulation, Real-Time Systems
Skills: Python, 3D Graphics (Omniverse/Unity/Blender), GIS tools, WebGL, React/Three.js, ML Integration, NRP Deployment
Difficulty: Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada

VINE / Web Dashboard & NRP Integration Platform

The proposed work includes building a comprehensive web dashboard for visualizing agricultural data, AI model predictions, and research insights. Students will develop a full-stack web application using modern frameworks (React, Flask/FastAPI) deployed on the National Research Platform (NRP). The dashboard will display real-time sensor readings, historical trends from agricultural datasets, AI model predictions, and digital twin visualizations. Students will create API endpoints that integrate with NRP computational resources and agricultural data sources, implement role-based access control for researchers, and enable data export/sharing with the broader research community. The platform will support interactive data exploration tools and provide programmatic access to AI/ML models running on NRP.

Topics: Full-Stack Web Development, Data Visualization, API Development, NRP Deployment, ML Model Serving
Skills: React, Flask/FastAPI, PostgreSQL, D3.js/Plotly, Bootstrap/Tailwind CSS, REST APIs, Kubernetes, NRP APIs
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada

Project Resources

National Research Platform: https://nrp.ai/
Iron Horse Vineyards Project: https://gitlab.nrp-nautilus.io/ihv
Omniverse Integration: https://gitlab.nrp-nautilus.io/omniverse
CENIC Network: https://cenic.org/
CENIC Precision Agriculture Blog: https://nrp.ai/cenic-precision-agriculture-2025

Background

This project builds upon the successful demonstration at Iron Horse Vineyards, where CENIC, UC San Diego, and partners have created a living laboratory for precision agriculture. The VINE project focuses on AI/ML research using the National Research Platform (NRP) for computational resources. By leveraging NRP’s distributed infrastructure and GPU clusters, students can train and deploy sophisticated ML models for agricultural applications. The project works with agricultural datasets from Iron Horse Vineyards and aims to create open, shareable datasets for the research community. This approach creates a scalable, reproducible framework for AI/ML research in precision agriculture that can benefit researchers, educators, and practitioners nationwide.

AI Data Readiness Inspector (AIDRIN)

Fri, 30 Jan 2026 10:15:00 -0700

Garbage In, Garbage Out (GIGO) is a widely accepted quote in computer science across various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest considerable time and effort in preparing the data for AI.

AIDRIN (AI Data Readiness INspector) is a framework that provides a quantifiable assessment of data readiness for AI processes, covering a broad range of dimensions from the literature. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, to evaluate data. Furthermore, AIDRIN uses metrics specific to assessing AI data, such as feature importance, feature correlations, class imbalance, fairness, privacy, and compliance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles. AIDRIN provides visualizations and reports to assist data scientists in further investigating data readiness.

AIDRIN Multiple File Formats

The proposed work will include improvements in the AIDRIN framework to (1) add support for new file formats such as Zarr, ROOT, and HDF5; and (2) to allow providing custom data ingestion mechanisms.

Topics: data readiness, AI, data analysis
Skills: Python, C/C++, data analysis, good communicator
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

Drishti

Fri, 30 Jan 2026 10:15:00 -0700

Drishti is a novel interactive web-based analysis framework to visualize I/O traces, highlight bottlenecks, and help understand the I/O behavior of scientific applications. Drishti aims to fill the gap between the trace collection, analysis, and tuning phases. The framework contains an interactive I/O trace analysis component for end-users to visually inspect their applications’ I/O behavior, focusing on areas of interest and getting a clear picture of common root causes of I/O performance bottlenecks. Based on the automatic detection of I/O performance bottlenecks, our framework maps numerous common and well-known bottlenecks and their solution recommendations that can be implemented by users.

Drishti Comparisons and Heatmaps

The proposed work will include investigating and building a solution to allow comparing and finding differences between two I/O trace files (similar to a diff), covering the analysis and visualization components. It will also explore additional metrics and counters such as Darshan heatmaps in the analysis and visualization components of the framework.

Topics: I/O, HPC, data analysis, visualization, profiling, tracing
Skills: Python, data analysis, performance profiling
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

EnergyAPI: An End-to-End API for Energy-Aware Forecasting and Scheduling

Fri, 30 Jan 2026 00:00:00 +0000

Over the past decades, electricity demand has increased steadily, driven by structural shifts such as the electrification of transportation and, more recently, the rapid expansion of artificial intelligence (AI). Power grids have responded by expanding generation capacity, integrating renewable energy sources such as solar and wind, and deploying demand-response mechanisms. However, the current pace of demand growth is increasingly outstripping grid expansion, leading to integration delays, greater reliance on behind-the-meter consumption, and rising operational complexity.

To mitigate the environmental and socioeconomic impacts of electricity consumption, large consumers such as cloud data centers and electric vehicle (EV) charging infrastructures are increasingly participating in demand-response programs. These programs provide consumers with grid signals indicating favorable periods for electricity usage, such as when energy is cheapest or has the lowest carbon intensity. Consumers can then shift workloads across time and location to better align with grid conditions and their own operational constraints. A key challenge, however, is the online nature of this problem: operators must make real-time decisions without full knowledge of future grid conditions. While forecasting and optimization techniques exist, their effectiveness depends heavily on workload characteristics, such as whether tasks are delay-tolerant cloud jobs or EV charging sessions with route and deadline constraints.

This project proposes the design and implementation of a modular, extensible API for energy-aware workload scheduling. The API will ingest grid signals alongside workload Service Level Objectives (SLOs) and operational requirements, and produce execution plans that adapt to changing grid conditions. It will support multiple pluggable scheduling strategies and heuristics, enabling developers to compare real-time and forecast-based approaches across different workload classes. By providing a reusable, open-source interface for demand-response-aware scheduling, this project aims to lower the barrier for developers to integrate energy-aware decision-making into distributed systems and applications.

Building an End-to-End Service for Energy Forecasting and Scheduling

Topics: Databases Machine Learning
Skills: Python, command line tools (bash), SQL (MySQL or SQLite), FastAPI, time-series analysis, basic machine learning
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Abel Souza

Develop a containerized, end-to-end platform consisting of a backend, API, and web-based frontend for collecting, estimating, and visualizing real-time and forecasted electrical grid signals. These signals include electricity demand, prices, energy production, grid saturation, and carbon intensity. The system will support scalable data ingestion, region-specific forecasting models, and interactive visualizations to enable energy-aware application development and analysis.

Tasks:

Study electrical grid signals and demand-response data sources (e.g., demand, price, carbon intensity, grid saturation) and identify their requirements for real-time and forecast-based consumption planning.
Design and implement a relational data model for storing historical, real-time, and forecasted grid signals.
Ingest and validate grid signal data into a MySQL or SQLite database, ensuring data quality and time alignment across regions.
Implement baseline time-series forecasting models for grid signals (e.g., demand, price, or carbon intensity), with support for region-specific configurations.
Query European Network of Transmission System Operators for Electricity (ENTSO-E) and EIA (Energy Information Administration (EIA)) APIs to collect grid data.
Develop a RESTful API that exposes both raw and forecasted grid signals for use by energy-aware applications and schedulers.
Build a web-based user interface to visualize historical trends, forecasts, and regional differences in grid conditions.
Implement an interactive choropleth map to display spatial variations in grid signals such as carbon intensity and electricity prices.
Design an extensible architecture that allows different regions to plug in custom forecasting models or heuristics.
Containerize the backend, API, and frontend components using Docker to enable reproducible deployment and easy integration by external users.

Environmental NeTworked Sensor (ENTS)

Fri, 30 Jan 2026 00:00:00 +0000

ENTS I: Usability improvements for visualization dashboard

Topics: Data Visualization, Backend, Frontend, UI/UX, Analytics
Skills:
- Required: React, Javascript, Python, SQL, Git
- Nice to have: Flask, Docker, CI/CD, AWS, Authentication
Difficulty: Medium
Size: Large (350 hours)
Mentors: Colleen Josephson, Alec Levy, John Madden

The Environmental NeTworked Sensor (ENTS) platform, formally Open Sensing Platform (OSP), implements data visualization website for monitoring microbial fuel cell sensors (see GitHub). The mission is to scale up the current platform to support other researchers or citizen scientists in integrating their novel sensing hardware or microbial fuel cell sensors for monitoring and data analysis. Examples of the types of sensors currently deployed are sensors measuring soil moisture, temperature, current, and voltage in outdoor settings. The focus of the software half of the project involves building upon our existing visualization web platform, and adding additional features to support the mission. A live version of the website is available here.

Below is a list of project ideas that would be beneficial to the ENTS project. You are not limited to the following projects, and encourage new ideas that enhance the platform:

Drag and drop charts functionality
Creation of unique charts by users (with unique equations)
Customizable options of charts (color, line width, datapoint/line style, axis labels)
Exportable charts (with customizable options)
Saving layouts via url

ENTS II: Migration to TockOS

Topics: Embedded system, operating system
Skills:
- Required: Rust, C/C++, Git, Github
- Nice to have: STM32 HAL, python
Difficulty: Hard
Size: Large (350 hours)
Mentors: Colleen Josephson, John Madden

The current version of the hardware firmware is implemented in baremetal through the use of STM hardware abstraction layer (HAL) drivers. We are interested in porting the firmware implementation to an operating system (OS) to allow for additional functionality to support environmental data logging. TockOS is an embedded operating system designed for running multiple concurrent, mutually distrustful applications on low-memory and low-power microcontrollers that will be used. TockOS allows for OTA updates, dynamic app loading, hardware multiplexing, and more. We envision multiple users utilizing shared ENTS hardware that provides communication and measurement capabilities. Thus, the initial cost of deploying wireless sensor networks is reduced.

The TockOS kernel is written in Rust to enhance security. Userspace apps can be written in either C, C++, or Rust. Development will be done through a remote development server to access the hardware. See the following repos for the current status of the project:

Userspace library: libtock-c
Kernel: tock
Baremetal: ENTS-node-firmware

Scope of work:

Writing kernel peripheral drivers.
- Done entirely in Rust.
- Low-level understanding of microcontroller
- Basic kernel functionality knowledge.
Porting baremetal components to userland apps.
- Involves porting STM HAL calls to TockOS syscalls.
- Primarily done in C.
- Understanding of syscalls.

Reproducible CXL Emulation

Fri, 30 Jan 2026 00:00:00 +0000

Compute Express Link (CXL) is an emerging memory interconnect standard that enables shared, coherent memory across CPUs, accelerators, and multiple hosts, unlocking new possibilities in hyperscale, HPC, and disaggregated systems. However, because access to real multi-host CXL hardware is limited, it is difficult for researchers and students to experiment with, evaluate, and reproduce results on advanced CXL topologies. OCEAN (Open-source CXL Emulation At Hyperscale) [https://github.com/cxl-emu/OCEAN] is a full-stack CXL emulation platform built on QEMU that enables detailed emulation of CXL 3.0 memory systems, including multi-host shared memory pools, coherent fabric topologies, and latency modeling. This project will create reproducible experiment pipelines, automated deployment workflows, and user-friendly tutorials so that others can reliably run and extend CXL emulation experiments without requiring specialized hardware.

Reproducible CXL Emulation for Multi-Host Memory Systems

Streamline multi-host CXL emulation without specialized hardware.

Topics: CXL emulation Memory Systems Reproducibility
Skills: C/C++, Virtualization (QEMU), Scripting, Performance Modeling
Difficulty: Medium
Size: Large (350 hours)
Mentors: Mujahid Al Rafi, Luanzheng "Lenny" Guo.

Tasks:

Create automated deployment scripts and configuration templates for OCEAN-based CXL emulation topologies (single-host and multi-host).
Develop a standardized experiment harness for running memory performance benchmarks (e.g., OSU micro-benchmarks, STREAM-style tests) in emulated CXL environments.
Build reproducible experiment pipelines that others can run to evaluate latency, bandwidth, and scaling properties of CXL memory systems.
Produce tutorials, documentation, and reproducibility artifacts to guide new users through setup, execution, and analysis.
Package and contribute all scripts, configurations, and documentation back to the OCEAN open-source repository.

Exploring Security and Isolation in CXL-Based Memory Systems

Investigate security and isolation properties of CXL-based memory systems using software emulation.

Topics: CXL Systems Security Memory Isolation Side Channel Emulation
Skills: C/C++, Virtualization (QEMU), Scripting, Computer Architecture, Security
Difficulty: Medium
Size: Large (350 hours)
Mentors: Mujahid Al Rafi, Luanzheng "Lenny" Guo.

Tasks:

Study the CXL memory model and fabric architecture to identify potential security and isolation risks in multi-host shared memory environments (e.g., contention, timing variation, and resource interference).
Set up multi-host or multi-VM CXL emulation environments using OCEAN that mimic realistic multi-tenant deployments.
Design and implement reproducible micro-benchmarks to measure timing, bandwidth contention, or observable interference through shared CXL memory pools.
Analyze how fabric configuration choices (e.g., topology, latency injection, memory partitioning, or allocation policies) affect isolation and leakage behavior.
Explore and prototype mitigation strategies—such as memory partitioning, throttling, or policy-driven allocation—and evaluate their effectiveness using the emulation platform.

Network Simulation Bridge • Enabling Interactive Network Models

Wed, 28 Jan 2026 00:00:00 +0000

The Network Simulation Bridge – NSB – is a network co-simulation framework that bridges together applications and network simulators. It enables students, researchers, and developers to prototype their applications and systems on simulated networks. It consists of a message server and client endpoint interfaces which together form a bridge, routing application message payloads through the network simulator. NSB is designed to be extensible through modular interfaces that serve to allow users to contribute new features and modules that suit evolving and emerging use cases. NSB is developed to be application-, network simulator-, and platform-agnostic so that users and developers are empowered to integrate any application front-end with any network simulator back-end, providing versatility and flexibility when used alongside other tools in larger systems and applications.

NSB was created in-house by the Inter-Networking Research Group and is now being developed into a more full-featured open-source tool and ecosystem in partnership with the UCSC OSPO and as part of the NSF Pathways to Enable Open-Source Ecosystems program. In this transition to a more polished and feature-rich product, the next phase of NSB development will involve the engineering of new quality-of-life features, testing and iteration of the core tool itself, and user-centric refinement via implementation in interdisciplinary system models.

Develop a User-Centric Website for NSB

Topics: Web Development Dynamic Updates UX
Skills: web development experience, good communicator, (HTML/CSS), (Javascript)
Difficulty: Moderate
Size: Large
Mentors: Harikrishna Kuttivelil

Develop a clean and welcoming landing page and website for the project. The organization needs to reflect the needs of both users and potential project contributors. This website will be the first impression for people new to the project and should

Specific tasks:

Work with mentors on understanding the context of the project and the expected needs of the users.
Port relevant documentation and tutorials from the repository page, ensuring updates in the repository are reflected in the website.
Study existing open source product websites and draw insights to include in our own design.
Design the structure of the website according to best OS, visual design, and accessibility design practices.
Include visual content that showcases NSB integration and testimonials (if applicable).

Improve the User Experience of NSB

Topics: Software Engineering User-Centric Development Visualization UI/UX Documentation
Skills: package management, toolchain implementation, process automation, technical writing, (visualization), (bash), (Python), (C++)
Difficulty: Moderate
Size: Medium
Mentors: Harikrishna Kuttivelil

Our goal has always been to keep NSB streamlined and out of the way of the users and developers. In line with that, we want our tool to be easily available and installable, and we want the experience of using it to feel minimal and non-intrusive while providing sufficient observability of NSB’s internals for those who want it.

Specific tasks:

Work with mentors and potential users on identifying aspects of the user experience that can refined for better quality-of-life experiences.
Verify and iterate on existing software packaging methods for NSB to ensure that tool setup is stress-free.
Refine and update existing documentation and tutorials to reflect improvements in the setup, installation, and usage processes.
Work with mentors and other contributors to work backwards from what the user wants to see to design the user interface.
Work with other contributors (see below) to develop a Network-in-a-Box experience with NSB.

Create a Network-in-a-Box Experience with NSB

Topics: Software Engineering, Simulation, System Modeling, System Design, Visualization, UI/UX
Skills: software integration and interfacing, toolchain implementation, process automation, C++, (visualization), (LLM-enabled code generation), (technical writing)
Difficulty: Challenging
Size: Large
Mentors: Harikrishna Kuttivelil

NSB was originally designed for networking graduate students to interface with application-layer programs. But since then, there’s been more of an appetite for a simpler network-in-a-box approach that would allow users to quickly deploy baseline or generated network simulations that are ready for use with NSB.

Specific tasks:

Learn how to use one of the major open-source network simulators (ns3 or OMNeT++).
Work with mentors in designing a simpler, minimal user experience of operating NSB.
Develop tools to automatically create network simulations given input parameters (type of network, number of nodes, description of infrastructure).
Create documentation aimed at new users.
Implement or embed network visualizations to enrich the user experience.

Implement Networked System Models to Evaluate Quality of NSB

Topics: System Modeling Simulation System Design Software Development Product Testing
Skills: software integration, good communication, qualitative research, (proficiency in Python and/or C++), (processing scientific and technical literature)
Difficulty: Challenging
Size: Large
Mentors: Harikrishna Kuttivelil

NSB is a relatively new tool and has not been extensively tested outside of the core contributors, who know a bit too much about the tool. We need to better understand what external user and contributor experience will be like, and the best way to do that is to start developing with NSB to build models of connected systems, i.e., sensor networks, smart homes, smart farms, etc.

Specific tasks:

Research academic literature and relevant works to identify relevant distributed applications to model.
Work with mentors and collaborators to plan implementation of selected system models.
Track and report issues and concerns in quality-of-life experiences, critical errors, or difficulties.
Work with mentors and contributors to address issues and concerns.
Refine and update existing documentation and tutorials to reflect improvements in the setup, installation, and usage processes.
Work with other contributors (see below) in reviewing and cross-referencing model implementations.

Model Autonomous Vehicle Networks to Drive New Feature Development in NSB

Topics: System Modeling Simulation System Design Software Development
Skills: requirement-based software design, message parsing interfaces, server-client communication, (proficiency in Python and/or C++), (processing scientific and technical literature)
Difficulty: Challenging
Size: Large
Mentors: Harikrishna Kuttivelil

NSB today serves its named purpose – message relaying. However, modeling complex systems can sometimes involving synchronizing other simulation features, like mobility when dealing with vehivle networks. Implementing a generic layer of being able to synchronize user-defined features across endpoints would be a powerful, enabling feature in NSB. In the process, we may also uncover opportunities for improving the NSB developer experience.

Specific tasks:

Research academic literature and relevant works to identify and design potential autonomous vehicle network models.
Work with mentors and collaborators to iterate on system designs to ensure it serves the purpose of furthering NSB development.
Help mentors design and develop the new feature synchronization feature in NSB, driven by the autonomous vehicle system model.
Develop and iterate feature synchronization, using mobility as the synchronized feature.
Create documentation and tutorials to serve as resources for future users, contributors, and developers.
Work with other contributors (see above) in reviewing and cross-referencing model implementations.

Scenic: A Language for Design and Verification of Autonomous Cyber-Physical Systems

Sat, 24 Jan 2026 00:00:00 +0000

Scenic is a probabilistic programming language for the design and verification of autonomous cyber-physical systems like self-driving cars. Scenic allows users to define scenarios for testing or training their system by putting a probability distribution on the system’s environment: the positions, orientations, and other properties of objects and agents, as well as their behaviors over time. Sampling these scenarios and running them in a simulator yields synthetic data which can be used to train or test a system. Since Scenic was released open-source in 2019, our group and many others in academia have used Scenic to find, diagnose, and fix bugs in autonomous cars, aircraft, robots, and other kinds of systems. In industry, it is being used by companies including Boeing, Meta, Deutsche Bahn, and Toyota in domains spanning autonomous driving, aviation, household robotics, railways, maritime, and virtual reality.

Our long-term goal is for Scenic to become a widely-used common representation and toolkit supporting the entire design lifecycle of AI-based cyber-physical systems. Towards this end, we have many summer projects available, ranging from adding new application domains to working on the Scenic compiler and sampler:

Extensions to the Scenic driving domain
Interfacing Scenic to new simulators
Scenic distribution visualizer

See the sections below for details.

Extensions to the Scenic Driving Domain

Topics: Autonomous Driving 3D modeling
Skills: Python; basic vector geometry
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

There are several potential goals of this project, including:

Supporting importing complex object information from simulators like CARLA.
Extending the domain to incorporate additional metadata, such as highway entrances and exits.
Fixing various bugs and limitations that exist in the driving domain (e.g. Issue #274 and Issue #295).

Interfacing Scenic to New Simulators

Topics: Simulation Autonomous Driving
Skills: Python
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

Scenic is designed to be easily-interfaced to new simulators. Depending on student interest, we could pick a simulator which would open up new kinds of applications for Scenic and write an interface for it. Some possibilities include:

The AWSIM driving simulator (to allow testing the Autoware open-source autonomous driving software stack)
The CarMaker driving simulator

The goal of the project would be to create an interface between Scenic and the new simulator and write scenarios demonstrating it. If time allows, we could do a case study on a realistic system for publication at an academic conference.

Tool to Visualize Scenario Distributions

Topics: Visualization
Skills: Python; basic visualization and graphics
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

A Scenic scenario represents a distribution over scenes, but it can be difficult to interpret what exactly this distribution represents. Being able to visualize this distribution would be helpful for understanding and reasoning about Scenarios.

The goal of this project would be to build on an existing prototype for visualizing these distributions, and to create a tool that can be used by the wider Scenic community.

CauST: Causal Gene Intervention for Robust Spatial Domain Identification

Wed, 21 Jan 2026 00:00:00 +0000

Topics: spatial transcriptomics, spatial domain identification, causal inference, gene intervention
Skills:
- Programming Languages: Python (PyTorch preferred)
- Machine Learning: causal inference, representation learning, clustering
- Data Analysis: spatial transcriptomics preprocessing and evaluation (ARI, cross-slice generalization)
- Bioinformatics Knowledge (preferred): spatial transcriptomics, scRNA-seq, gene perturbation analysis
Difficulty: Advanced
Size: Large (350 hours)
Mentors: Lijinghua Zhang (contact person)

Project Idea Description

Spatial domain identification is a core task in spatial transcriptomics (ST), aiming to segment tissue sections into biologically meaningful regions based on spatially resolved gene expression profiles. These spatial domains often correspond to anatomical layers, functional niches, or microenvironmental states, and are widely used as the basis for downstream biological interpretation.

Despite strong empirical performance, most existing spatial domain identification methods rely on purely correlational gene signals. Genes are selected or weighted based on association with spatial patterns, without distinguishing whether they causally drive domain formation or merely reflect downstream or confounded effects. As a result, current models often suffer from limited robustness and poor generalization across tissue sections or donors.

Problem: Correlation-Driven Gene Usage and Limited Generalization

In standard pipelines, gene expression features are typically used wholesale or filtered using heuristic criteria (e.g., highly variable genes). However, many genes that are strongly correlated with spatial domains are not causally responsible for domain structure. Including such non-causal or confounded genes can:

Reduce robustness across slices and donors
Obscure true domain-driving biological signals
Limit interpretability of spatial domain assignments

Empirically, domain identification performance often degrades substantially in cross-slice or cross-donor evaluation settings, underscoring the need for causally informed feature selection.

Proposed Solution: CauST

This project proposes CauST, a Causal Gene Intervention framework for robust spatial domain identification.

CauST aims to identify domain-driving genes by estimating their causal influence on spatial domain assignments via in-silico gene interventions. Instead of relying on observational correlations, CauST approximates counterfactual gene knockouts by perturbing individual gene expressions while controlling for confounding factors.

In addition, CauST leverages cross-slice invariance as a practical criterion for causal gene discovery, prioritizing genes whose effects on spatial domain identification remain stable across tissue sections and donors.

By filtering or reweighting genes based on estimated causal influence, CauST improves the robustness, generalizability, and interpretability of spatial domain identification models.

Project Objectives

Causal Gene Effect Estimation
- Design in-silico intervention strategies to estimate gene-level causal effects on spatial domain assignments.
Invariant Effect Analysis
- Identify genes with stable effects across tissue sections or donors.
Causal Gene Filtering
- Develop filtering or reweighting schemes based on estimated causal influence.
Integration with Existing Methods
- Integrate CauST into state-of-the-art spatial domain identification pipelines.
Evaluation and Validation
- Benchmark robustness, cross-slice generalization, and interpretability on public spatial transcriptomics datasets.

Project Deliverables

CauST Framework Implementation
- Open-source Python implementation compatible with common spatial transcriptomics toolchains.
Causal Gene Benchmarks
- Quantitative evaluation of causal gene filtering and its impact on domain identification.
Visualization Tools
- Tools for visualizing gene interventions, causal scores, and spatial effects.
Documentation and Tutorials
- Clear examples enabling adoption of CauST by the broader community.

Impact

CauST introduces a causally grounded perspective to spatial domain identification by explicitly modeling gene-level interventions. By shifting from correlation-driven gene usage to causal gene selection, this project improves robustness, generalizability, and biological interpretability in spatial transcriptomics analysis. CauST has the potential to serve as a foundational framework for integrating causal reasoning into spatial omics representation learning.

Agent4Target: An Agent-based Evidence Aggregation Toolkit for Therapeutic Target Identification

Tue, 20 Jan 2026 00:00:00 +0000

Topics: therapeutic target identification, drug discovery, evidence aggregation, AI agents, biomedical knowledge integration
Skills:
- Programming Languages: Python; experience with modern ML tooling preferred
- Machine Learning / AI: agent-based systems, workflow orchestration, weak supervision (basic), representation learning
- Software Engineering: modular system design, APIs, CLI tools, documentation
- Biomedical Knowledge (preferred): familiarity with drug–target databases (e.g., PHAROS, DepMap, Open Targets)
Difficulty: Advanced
Size: Large (350 hours)
Mentors: Ziheng Duan (contact person)

Project Idea Description

Identifying and prioritizing high-quality therapeutic targets is a foundational yet challenging task in drug discovery. Modern target identification relies on aggregating heterogeneous evidence from multiple sources, including genetic perturbation screens, disease associations, chemical biology, and biomedical literature. These evidence sources are highly fragmented, noisy, and heterogeneous in both format and reliability.

While large language models and AI agents have recently shown promise in automating scientific workflows, many existing approaches focus on end-to-end prediction or conversational interfaces. Such systems are often difficult to reproduce, extend, or integrate into existing research pipelines, limiting their practical adoption by the biomedical community.

This project proposes Agent4Target, an agent-based evidence aggregation toolkit that reframes therapeutic target identification as a structured, modular workflow. Instead of using agents for free-form reasoning, Agent4Target employs agents as orchestrated components that systematically collect, normalize, score, and explain evidence supporting candidate therapeutic targets.

The goal is to deliver a reusable, open-source toolchain that can be integrated into diverse drug discovery workflows, independent of any single downstream prediction model or publication.

Key Idea and Technical Approach

Agent4Target models target identification as a multi-stage, agent-driven pipeline, coordinated by a central orchestrator:

Evidence Collector Agents
Specialized agents retrieve target-level evidence from heterogeneous sources, such as:
- Genetic perturbation and dependency data (e.g., DepMap)
- Target annotation and development status (e.g., PHAROS)
- Disease association scores (e.g., Open Targets)
- Automatically summarized literature evidence
Normalization & Scoring Agent
Collected evidence is converted into a unified, structured schema using typed data models (e.g., JSON / Pydantic).
This agent performs:
- Evidence normalization across sources
- Confidence-aware scoring and aggregation
- Optional weighting or calibration strategies
Explanation Agent
Rather than free-text generation, this agent produces structured explanations that explicitly link scores to supporting evidence, enabling transparency and interpretability for downstream users.
Workflow Orchestrator
A lightweight orchestration layer (e.g., LangGraph or a state-machine-based controller) manages agent execution, dependencies, and failure handling, ensuring reproducibility and extensibility.

This modular design allows individual agents to be replaced, extended, or reused without altering the overall system.

Project Objectives

Design a Modular Agent-based Architecture
- Define clear interfaces for evidence collection, normalization, scoring, and explanation agents.
Implement a Standardized Evidence Schema
- Develop a unified data model for heterogeneous target-level evidence.
Build a Reproducible Orchestration Framework
- Implement a deterministic, inspectable workflow for agent coordination.
Deliver a Community-Ready Toolkit
- Provide CLI tools, example notebooks, and clear documentation to support adoption.
Benchmark and Case Studies
- Demonstrate the toolkit on representative target identification scenarios using public datasets.

Project Deliverables

Open-Source Agent4Target Codebase
- A well-documented Python package with modular agent components.
Command-Line Interface (CLI)
- Tools for running end-to-end evidence aggregation pipelines.
Standardized Output Schema
- Machine-readable evidence summaries suitable for downstream modeling.
Example Notebooks and Benchmarks
- Demonstrations of usage and performance on real-world target identification tasks.
Documentation
- Installation guides, extension tutorials, and developer documentation.

Impact

Agent4Target provides a practical bridge between AI agents and real-world drug discovery workflows. By emphasizing structured evidence aggregation, reproducibility, and interpretability, this project enables researchers to systematically reason about therapeutic targets rather than relying on opaque, end-to-end models. The resulting toolkit can serve as a foundation for future work in AI-assisted drug discovery, weak supervision, and biomedical knowledge integration.

HistoMoE: A Histology-Guided Mixture-of-Experts Framework for Gene Expression Prediction

Tue, 20 Jan 2026 00:00:00 +0000

Topics: computational pathology, spatial transcriptomics, gene expression prediction, mixture-of-experts, multimodal learning
Skills:
- Programming Languages: Python; experience with PyTorch preferred
- Machine Learning: CNNs / vision encoders, mixture-of-experts, multimodal representation learning
- Data Analysis: handling large-scale histology image patches and gene expression matrices
- Bioinformatics Knowledge (preferred): familiarity with spatial transcriptomics or scRNA-seq data
Difficulty: Advanced
Size: Large (350 hours)
Mentors: Ziheng Duan (contact person)

Project Idea Description

Histology imaging is one of the most widely available data modalities in biomedical research and clinical practice, capturing rich morphological information about tissues and disease states. In parallel, spatial transcriptomics (ST) technologies provide spatially resolved gene expression measurements, enabling unprecedented insights into tissue organization and cellular heterogeneity. However, the high cost and limited accessibility of ST experiments remain a major barrier to their widespread adoption.

Predicting gene expression directly from histology images offers a promising alternative, enabling molecular-level inference from routinely collected pathology data. Existing approaches typically rely on a single global model that maps image embeddings to gene expression profiles. While effective to some extent, these models struggle to capture the strong organ-, tissue-, and cancer-specific heterogeneity that underlies gene expression patterns.

This project proposes HistoMoE, a histology-guided mixture-of-experts (MoE) framework that explicitly models biological heterogeneity by learning specialized expert models for different cancer types or organs, and dynamically routing histology image patches to the most relevant experts.

Key Idea and Technical Approach

As illustrated in the figure above, HistoMoE integrates multiple data modalities and learning components:

Vision Encoder
Histology image patches are encoded into high-dimensional visual representations using a convolutional or transformer-based vision backbone.
Text / Metadata Encoder
Sample-level metadata (e.g., tissue type, organ, disease context) is encoded using a lightweight text or embedding model.
Gating Network
A gating network jointly considers image and metadata embeddings to infer routing weights over multiple cancer- or organ-specific expert models.
Expert Models
Each expert specializes in modeling gene expression patterns for a specific biological context (e.g., CCRCC, COAD, LUAD), producing patch-level gene expression predictions.

By explicitly modeling biological structure through expert specialization, HistoMoE aims to improve both prediction accuracy and interpretability, allowing researchers to understand which biological experts drive each prediction.

Project Objectives

Design and Implement the HistoMoE Framework
- Build a modular MoE architecture with pluggable vision encoders, gating networks, and expert models.
Multimodal Routing and Expert Specialization
- Explore how image features and metadata jointly inform expert selection.
Benchmarking and Evaluation
- Compare HistoMoE against single-model baselines on multiple cancer and organ-specific spatial transcriptomics datasets.
Interpretability Analysis
- Analyze expert routing behavior to reveal biologically meaningful patterns.

Project Deliverables

Open-Source HistoMoE Codebase
- Well-documented Python implementation with training, evaluation, and visualization tools.
Benchmark Results
- Quantitative comparisons demonstrating improvements over non-expert baselines.
Visualization and Analysis Tools
- Tools for inspecting expert usage, routing weights, and gene-level predictions.
Documentation and Tutorials
- Clear instructions and examples to enable adoption by the research community.

Impact

HistoMoE introduces an expert-system perspective to histology-based gene expression prediction, bridging morphological and molecular representations through biologically informed specialization. By combining multimodal learning with mixture-of-experts modeling, this project advances the interpretability and accuracy of computational pathology methods and contributes toward scalable, cost-effective alternatives to spatial transcriptomics experiments.

StaR: A Stability-Aware Representation Learning Framework for Spatial Domain Identification

Tue, 20 Jan 2026 00:00:00 +0000

Topics: spatial transcriptomics, spatial domain identification, representation learning, model robustness
Skills:
- Programming Languages: Python; PyTorch experience preferred
- Machine Learning: representation learning, clustering, robustness and stability analysis
- Data Analysis: spatial transcriptomics preprocessing and evaluation (ARI, clustering metrics)
- Bioinformatics Knowledge (preferred): familiarity with spatial transcriptomics or scRNA-seq data
Difficulty: Advanced
Size: Large (350 hours)
Mentors: Ziheng Duan (contact person)

Project Idea Description

Spatial domain identification is a fundamental task in spatial transcriptomics (ST), aiming to partition tissue sections into biologically meaningful regions based on spatially resolved gene expression profiles. These spatial domains often correspond to distinct anatomical structures, cellular compositions, or functional microenvironments, and serve as a critical foundation for downstream biological analysis.

Despite rapid methodological progress, most existing spatial domain identification methods are highly sensitive to random initialization. In practice, simply changing the random seed can lead to substantially different clustering results and large performance fluctuations, even when using identical hyperparameters and datasets. This instability severely undermines the reliability, reproducibility, and interpretability of spatial transcriptomics analyses.

Problem: Seed Sensitivity and Unstable Representations

Empirical evidence shows that state-of-the-art spatial domain identification models can exhibit substantial performance variance across random seeds. For example, the Adjusted Rand Index (ARI) may vary from relatively strong performance (e.g., ARI ≈ 0.65) to noticeably degraded yet still reasonable outcomes (e.g., ARI ≈ 0.50) solely due to different random initializations.

By systematically evaluating models across hundreds to thousands of random seeds, we observe that:

Model performance landscapes are highly rugged, with sharp cliffs and isolated high-performing regions.
Standard training objectives implicitly favor brittle representations that are not robust to small perturbations in initialization or optimization trajectories.

These observations suggest that instability is not a peripheral issue, but rather a structural limitation of current representation learning approaches for spatial transcriptomics.

Proposed Solution: StaR

This project proposes StaR, a Stability-Aware Representation Learning framework designed to explicitly address seed sensitivity in spatial domain identification.

The core idea of StaR is to learn representations that are robust to perturbations in model parameters and training dynamics, rather than optimizing solely for peak performance under a single random seed. Concretely, StaR introduces controlled noise or perturbations into the training process and encourages consistency across multiple perturbed model instances, guiding the model toward flatter and more stable regions of the parameter space.

By prioritizing stability during representation learning, StaR aims to produce embeddings that:

Yield consistent spatial domain assignments across random seeds
Maintain competitive or improved clustering accuracy
Better reflect underlying biological structure

Project Objectives

Characterize Instability in Existing Methods
- Systematically quantify seed sensitivity across popular spatial domain identification models.
Develop Stability-Aware Training Objectives
- Design perturbation-based or consistency-driven losses that encourage robust representations.
Integrate StaR into Existing Pipelines
- Apply StaR to widely used spatial transcriptomics workflows with minimal architectural changes.
Evaluation and Benchmarking
- Evaluate StaR using clustering metrics (e.g., ARI) and stability metrics across multiple datasets and random seeds.
Biological Validation
- Assess whether stability-aware representations preserve biologically meaningful spatial patterns.

Project Deliverables

StaR Framework Implementation
- An open-source Python implementation compatible with common spatial transcriptomics toolchains.
Stability Benchmarks
- Comprehensive evaluations demonstrating reduced performance variance across seeds.
Visualization Tools
- Tools for visualizing performance landscapes, stability surfaces, and spatial domain consistency.
Documentation and Tutorials
- Clear examples enabling researchers to adopt StaR in their own analyses.

Impact

StaR addresses a critical yet underexplored challenge in spatial transcriptomics: model instability and poor reproducibility. By shifting the focus from single-run performance to stability-aware representation learning, this project improves the reliability and trustworthiness of spatial domain identification methods. StaR has the potential to become a foundational component in robust spatial transcriptomics pipelines and to inspire broader adoption of stability-aware principles in biological representation learning.

MedJEPA: Self-Supervised Medical Image Representation Learning with JEPA

Mon, 19 Jan 2026 10:15:56 -0700

Project Description

[MedJEPA] Medical image analysis is fundamental to modern healthcare, enabling disease diagnosis, treatment planning, and patient monitoring across diverse clinical applications. In radiology and pathology, deep learning models support automated detection of abnormalities, tumor segmentation, and diagnostic assistance. Medical imaging modalities including X-rays, CT scans, MRI, ultrasound, and histopathology slides generate vast amounts of unlabeled data that could benefit from self-supervised representation learning. Clinical applications include cancer detection and staging, cardiovascular disease assessment, neurological disorder diagnosis, and infectious disease screening. In drug discovery and clinical research, analyzing medical images helps evaluate treatment efficacy, predict patient outcomes, and identify biomarkers for disease progression. Telemedicine and point-of-care diagnostics benefit from AI-powered image analysis that extends expert-level interpretation to underserved regions. However, medical imaging faces unique challenges: limited labeled datasets due to expensive expert annotation, patient privacy concerns restricting data sharing, domain shift across different imaging equipment and protocols, and the need for models that generalize across hospitals and populations. Traditional medical image analysis relies heavily on supervised learning with manually annotated labels, creating bottlenecks due to the scarcity and cost of expert annotations. Existing self-supervised methods applied to medical imaging often employ complex training procedures with numerous heuristics—momentum encoders, stop-gradients, teacher-student architectures, and carefully tuned augmentation strategies—that may not translate well across different medical imaging modalities and clinical contexts. These approaches struggle with domain-specific challenges such as subtle pathological features, high-resolution images, 3D volumetric data, and the need for interpretable representations that clinicians can trust. To address these challenges, we propose MedicalJEPA: Self-Supervised Medical Image Representation Learning with Joint-Embedding Predictive Architecture, which leverages the theoretically grounded LeJEPA framework for 2D medical images and V-JEPA principles for medical video and volumetric data, creating a unified, scalable, and heuristics-free approach specifically tailored for medical imaging applications. By utilizing the principled JEPA frameworks with objectives like Sketched Isotropic Gaussian Regularization (SIGReg), MedJEPA eliminates complex training heuristics while learning clinically meaningful representations from unlabeled medical images. Unlike conventional self-supervised methods that require extensive hyperparameter tuning and may not generalize across medical imaging modalities, MedicalJEPA provides a clean, theoretically motivated framework with minimal hyperparameters that adapts to diverse medical imaging contexts—from chest X-rays to histopathology slides to cardiac MRI sequences. The learned representations can support downstream tasks including disease classification, lesion detection, organ segmentation, and survival prediction, while requiring significantly fewer labeled examples for fine-tuning. This approach democratizes access to state-of-the-art medical AI by enabling effective learning from the vast amounts of unlabeled medical imaging data available in hospital archives, addressing the annotation bottleneck that has limited progress in medical AI.

Project Objectives

Aligned with the vision of the 2026 Open Source Research Experience (OSRE), this project aims to apply Joint-Embedding Predictive Architecture (JEPA) frameworks to medical image representation learning, addressing the critical challenge of learning from limited labeled medical data. Medical imaging generates enormous amounts of unlabeled data, but supervised learning approaches are bottlenecked by the scarcity and cost of expert annotations. Existing self-supervised methods often rely on complex heuristics that don’t generalize well across diverse medical imaging modalities, equipment vendors, and clinical protocols. This project will leverage the theoretically grounded LeJEPA framework for 2D medical images (X-rays, histopathology slides, fundus images) and V-JEPA principles for temporal and volumetric medical data (cardiac MRI sequences, CT scans, surgical videos). The core challenge lies in adapting these heuristics-free, stable frameworks to medical imaging’s unique characteristics: subtle pathological features requiring fine-grained representations, high-resolution images demanding efficient processing, domain shift across hospitals and equipment, and the need for interpretable features that support clinical decision-making. The learned representations will be evaluated on diverse downstream clinical tasks including disease classification, lesion detection, organ segmentation, and prognosis prediction, with emphasis on few-shot learning scenarios that reflect real-world annotation constraints. Below is an outline of the methodologies and models that will be developed in this project.

Step 1: Medical Data Preparation: Develop data processing pipelines for diverse medical imaging modalities, implementing DICOM/NIfTI parsing, standardized preprocessing, and efficient data loading for self-supervised pre-training. Prepare 2D medical image datasets: Chest X-rays: ChestX-ray14, MIMIC-CXR, CheXpert for lung disease detection Histopathology: Camelyon16/17 (breast cancer), PCam (patch-level classification) Retinal imaging: EyePACS, APTOS (diabetic retinopathy), Messidor Dermatology: HAM10000, ISIC (skin lesion classification) Prepare 3D volumetric and temporal medical data: CT scans: LIDC-IDRI (lung nodules), Medical Segmentation Decathlon datasets MRI sequences: BraTS (brain tumors), ACDC (cardiac MRI), UK Biobank cardiac videos Medical video: Surgical procedure videos, endoscopy recordings, ultrasound sequences Implement medical imaging-specific preprocessing: intensity normalization, resolution standardization, handling of multi-channel medical images (different MRI sequences, RGB histopathology), and privacy-preserving anonymization. Design masking strategies appropriate for medical imaging: spatial masking for 2D images, volumetric masking for 3D scans, temporal masking for sequences, and anatomy-aware masking that respects organ boundaries. Create data loaders supporting high-resolution medical images, 3D volumes, and multi-modal inputs (e.g., multiple MRI sequences).
Step 2: JEPA Model Implementation for Medical Imaging: Implement LeJEPA for 2D medical images: Adapt joint-embedding predictive architecture for medical image characteristics (high resolution, subtle features, domain-specific patterns) Apply Sketched Isotropic Gaussian Regularization (SIGReg) to learn clinically meaningful embedding distributions Maintain single trade-off hyperparameter and heuristics-free training for reproducibility across medical imaging centers Support various encoder architectures: Vision Transformers for global context, ConvNets for local features, hybrid approaches Extend to V-JEPA for medical video and volumetric data: Spatiotemporal encoding for cardiac MRI sequences, surgical videos, and time-series medical imaging Temporal prediction objectives for understanding disease progression and treatment response 3D volume processing for CT and MRI scans with efficient memory management Multi-slice and multi-sequence learning for comprehensive medical imaging contexts Develop medical domain-specific enhancements: Multi-scale representation learning to capture both fine-grained pathological details and global anatomical context Interpretability mechanisms: attention visualization, feature attribution, and embedding space analysis for clinical validation Robustness to domain shift: training strategies that generalize across different scanners, protocols, and institutions Privacy-preserving training considerations compatible with medical data regulations (HIPAA, GDPR) Implement efficient training infrastructure: Support for distributed training across multiple GPUs for large medical imaging datasets Memory-efficient processing of high-resolution images and 3D volumes Checkpoint management and model versioning for clinical deployment pipelines Minimal-code implementation (≈50-100 lines) demonstrating framework simplicity
Step 3: Evaluation & Safety Validation: : Disease Classification Tasks: Multi-label chest X-ray classification: 14 pathology classes on ChestX-ray14, MIMIC-CXR Diabetic retinopathy grading: 5-class classification on EyePACS, APTOS Skin lesion classification: 7-class classification on HAM10000 Brain tumor classification: glioma grading on BraTS dataset Evaluate with linear probing, few-shot learning (5-shot, 10-shot), and full fine-tuning Lesion Detection and Segmentation: Lung nodule detection on LIDC-IDRI dataset Tumor segmentation on Medical Segmentation Decathlon tasks Polyp detection in colonoscopy videos Cardiac structure segmentation in MRI sequences Clinical Prediction Tasks: Survival prediction from histopathology slides Disease progression prediction from longitudinal imaging Treatment response assessment from pre/post imaging pairs Few-Shot and Low-Data Regime Evaluation: Systematic evaluation with 1%, 5%, 10%, 25%, 50% of labeled training data Comparison against supervised baselines and ImageNet pre-training Analysis of annotation efficiency: performance vs. number of labeled examples required

Project Deliverables

This project will deliver three components: software implementation, clinical evaluation, and practical deployment resources. The software implementing MedicalJEPA will be hosted on GitHub as an open-access repository with modular code supporting multiple medical imaging modalities (2D images, 3D volumes, videos), pre-trained model checkpoints on major medical imaging datasets (chest X-rays, histopathology, MRI), training and evaluation scripts with medical imaging-specific preprocessing pipelines, privacy-preserving training implementations compatible with clinical data regulations, and comprehensive documentation including tutorials for medical AI researchers and clinicians. The evaluation results will include benchmarks on 10+ medical imaging datasets across diverse modalities and clinical tasks, few-shot learning analysis demonstrating annotation efficiency gains, cross-institutional validation studies showing robustness to domain shift, interpretability visualizations enabling clinical validation of learned representations, and detailed comparisons against supervised baselines and existing medical self-supervised methods. .

NeuroHealth

Topics: Self-Supervised Medical Image Representation Learning with JEPA
Skills: Proficiency in Python, Pytorch, Github, JEPA
Difficulty: Difficult
Size: Large (350 hours)
Mentor: Bin Dong, Linsey Pang

References:

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics - Randall Balestriero and Yann LeCun, arXiv 2024
Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA) - Adrien Bardes et al., arXiv 2024
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture - Mahmoud Assran et al., CVPR 2023 (I-JEPA)
ChestX-ray14: Hospital-Scale Chest X-Ray Database - https://nihcc.app.box.com/v/ChestXray-NIHCC
Medical Segmentation Decathlon - http://medicaldecathlon.com/
MIMIC-CXR Database - https://physionet.org/content/mimic-cxr/
The Cancer Imaging Archive (TCIA) - https://www.cancerimagingarchive.net/
UK Biobank Imaging Study - https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/imaging-data

NeuroHealth: AI-Powered Health Assistant

Mon, 19 Jan 2026 10:15:56 -0700

Project Description

[NeuroHealth] Intelligent health assistance systems are increasingly essential for improving healthcare accessibility, patient engagement, and clinical decision support. In primary care and preventive medicine, AI assistants help users understand symptoms, schedule appropriate appointments, and receive preliminary health guidance. Telemedicine applications include triage support, appointment scheduling optimization, and patient education based on health inquiries. In chronic disease management, these systems provide medication reminders, lifestyle recommendations, and timely alerts for medical follow-ups. Healthcare navigation applications include finding appropriate specialists, understanding treatment options, and coordinating care across multiple providers. In wellness and preventive care, intelligent assistants enhance health literacy by delivering personalized health information, screening recommendations, and proactive health management strategies. By leveraging natural language understanding and medical knowledge integration, these systems enhance healthcare access, reduce unnecessary emergency visits, and empower users to make informed health decisions across diverse populations. Traditional health information systems often provide generic responses that fail to account for individual health contexts, medical history, and personal circumstances. Existing symptom checkers and health chatbots primarily rely on rule-based logic or simple decision trees, limiting their ability to understand nuanced health inquiries, reason about complex symptom patterns, or provide contextually appropriate guidance. These systems struggle with interpreting ambiguous descriptions, adapting to users’ health literacy levels, and generating personalized recommendations that account for individual medical constraints and preferences. To address these challenges, we propose NeuroHealth: AI-Powered Health Assistant, which leverages Large Language Models (LLMs) to create an intelligent conversational agent that synthesizes user health inquiries, symptom descriptions, and contextual information into actionable, personalized health guidance and appointment recommendations. By integrating LLM-based medical reasoning with structured clinical knowledge bases, NeuroHealth enhances symptom interpretation, appointment routing, and health education delivery. Unlike conventional systems that provide static responses from predetermined templates, NeuroHealth dynamically understands user intent, asks clarifying questions, assesses urgency levels, and generates appropriate recommendations—whether scheduling a doctor appointment, suggesting self-care measures, or directing users to emergency services. This fusion of LLM intelligence with validated medical knowledge enables a more accessible, adaptive, and helpful health assistance platform, bridging the gap between users seeking health information and appropriate medical care.

Project Objectives

Aligned with the vision of the 2026 Open Source Research Experience (OSRE), this project aims to develop an AI-Powered Health Assistant (NeuroHealth) to improve healthcare accessibility and patient engagement through intelligent conversational guidance. Healthcare systems face significant challenges in providing timely, personalized health information and connecting patients with appropriate care resources. Traditional symptom checkers and health information systems often deliver generic, rule-based responses that fail to account for individual contexts and struggle with natural language understanding. To address these limitations, this project will leverage Large Language Models (LLMs) to create an intelligent health assistant that understands user health inquiries, interprets symptom descriptions, assesses urgency, and provides personalized recommendations including doctor appointment suggestions, self-care guidance, and healthcare navigation support. The core challenge lies in designing NeuroHealth as a safe, accurate, and user-friendly system capable of natural conversation, medical knowledge retrieval, and appropriate response generation while maintaining clinical safety guardrails. Unlike conventional health chatbots that follow rigid conversation flows, NeuroHealth will reason over user inputs, ask clarifying questions, and dynamically adapt responses based on context, resulting in more helpful, accurate, and appropriate health assistance. Below is an outline of the methodologies and models that will be developed in this project.

Step 1: Data Collection & Knowledge Base Construction: Develop a comprehensive medical knowledge base integrating validated health information sources, symptom databases, condition descriptions, and appointment routing guidelines. Collect and curate conversational health inquiry datasets from public medical Q&A forums, symptom checker logs, and healthcare chatbot interactions to create training and evaluation data. Design structured representations for symptoms, conditions, urgency levels, and appointment recommendations to enable effective retrieval and reasoning. Extract common health inquiry patterns, symptom descriptions, and user intent categories to inform conversation flow design. Data sources can include public medical knowledge bases such as MedlinePlus, Mayo Clinic health information, clinical practice guidelines, and synthetic patient inquiry scenarios based on common healthcare use cases. Implement data validation mechanisms to ensure medical accuracy and clinical safety compliance.
Step 2: Model Development: Design and implement an LLM-based conversational health assistant that integrates medical knowledge retrieval with natural language understanding and generation. Develop a Retrieval-Augmented Generation (RAG) architecture that grounds LLM responses in validated medical information sources, reducing hallucination risks and ensuring factual accuracy. Create prompt engineering strategies and reasoning frameworks that enable the system to: interpret symptom descriptions, assess urgency levels, ask appropriate clarifying questions, and generate personalized health guidance. Implement a multi-component architecture including: intent recognition, symptom extraction, urgency assessment, appointment recommendation generation, and response formatting modules. Develop clinical safety guardrails that detect high-risk scenarios requiring immediate medical attention and provide appropriate emergency guidance. Design conversation management strategies that maintain context across multi-turn dialogues and adapt to users’ health literacy levels. The baseline architecture can leverage state-of-the-art models such as GPT-4, Claude, or open-source alternatives like Llama, Qwen, combined with medical knowledge retrieval systems.
Step 3: Evaluation & Safety Validation: : Benchmark NeuroHealth against existing symptom checkers and health chatbots, evaluating on metrics including response accuracy, appropriateness of appointment recommendations, urgency assessment precision, and user satisfaction. Conduct human evaluation studies with healthcare professionals to assess clinical safety, response quality, and appropriateness of medical guidance. Perform adversarial testing to identify potential failure modes, unsafe responses, or inappropriate recommendations under edge cases. Conduct ablation studies to analyze the impact of retrieval-augmented generation, safety guardrails, and conversation management strategies on system performance. Evaluate system performance across diverse health inquiry types including acute symptoms, chronic condition management, preventive care questions, and healthcare navigation requests. Assess response quality across different user demographics and health literacy levels to ensure equitable access. Optimize inference efficiency and response latency for real-time conversational interaction across web and mobile platforms.

Project Deliverables

This project will deliver three components: model development, evaluation and validation, and interactive demonstration. The software implementing the NeuroHealth system will be hosted on GitHub as an open-access repository with comprehensive documentation, deployment guides, and API specifications. The evaluation results, including benchmark comparisons against existing systems, clinical safety assessments, and user study findings, will be published alongside the GitHub repository. An interactive demo showcasing the conversational interface, symptom interpretation capabilities, and appointment recommendation generation will be provided to illustrate real-world application scenarios.

NeuroHealth

Topics: AI-Powered Health Assistant
Skills: Proficiency in Python, Github, LLM
Difficulty: Difficult
Size: Large (350 hours)
Mentor: Linsey Pang, Bin Dong

References:

Large Language Models in Healthcare - Singhal et al., Nature 2023
Med-PaLM: Large Language Models for Medical Question Answering - Singhal et al., arXiv 2022
Capabilities of GPT-4 on Medical Challenge Problems - Nori et al., arXiv 2023
MedlinePlus Medical Encyclopedia - https://medlineplus.gov/
Clinical Practice Guidelines Database - https://www.guidelines.gov/

LMS Toolkit

Tue, 13 Jan 2026 13:00:00 -0800

The EduLinq LMS Toolkit is a suite of tools used by several courses at UCSC to interact with LMS’s (e.g. Canvas) from the command line or Python. A Learning Management System (LMS) is a system that institutions use to manage courses, assignments, students, and grades. The most popular LMSs are Canvas, Blackboard, Moodle, and Brightspace. These tools can be very helpful, especially from an administrative standpoint, but can be hard to interact with. They can be especially difficult when instructors and TAs want to do something that is not explicitly supported by their built-in GUIs (e.g., when an instructor wants to use a special grading policy). The LMS Toolkit project is an effort to create a single suite of command-line tools (along with a Python interface) to connect to all the above mentioned LMSs in a simple and uniform way. So, not only can instructors and TAs easily access the modify the data held in an LMS (like a student’s grades), but they can also do it the same way on any LMS. The LINQS Lab has made many contributions to the maintain and improve the Quiz Composer.

Currently, the LMS Toolkit supports Canvas, Moodle, and Blackboard. But, the degree of support for each LMS varies.

All students interested in LINQS projects for OSRE/GSoC 2026 should fill out this form. Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project. The form will stop accepting responses once the application window closes. Do not post on any of the project repositories about OSRE/GSoC (e.g., comment on an issue that you want to tackle it as a part of OSRE/GSoC 2026). Remember, these are active repositories that were not created for OSRE/GSoC.

Advanced LMS Support

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Eriq Augustine, Batuhan Salih, Lise Getoor

The LMS Toolkit already has basic read-write support for many core pieces of LMS functionality (e.g., working with grades and assignments). However, there are still many more features that can be supported such as group management, quiz management, quiz statistics, and assignment statuses.

The task for this project is to choose a set of advanced features (not limited to those features mentioned above), design an LMS-agnostic way to support those features, and implement those features. The flexibility in the features chosen to implement account for the variable size of this project.

See Also:

Repository for LMS Toolkit
GitHub Issues

New LMS Support: Brightspace

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Batuhan Salih, Lise Getoor

The goal of the LMS toolkit is to provide a single interface for all LMSs. D2L Brightspace is one of the more popular LMSs. Naturally, the LMS Toolkit wants to support Brightspace as well. However, a challenge in supporting Brightspace is that it is not open source (unlike Canvas and Moodle). Therefore, support and testing on Brightspace may be very challenging.

The task for this project is to add basic support for the Brightspace LMS. It is not necessary to support all the same features that are supported for other LMSs, but at least the core features of score and assignment management should be implemented. The closed-source nature of Brightspace makes this a challenging and uncertain project.

See Also:

Lynx Grader

Tue, 13 Jan 2026 13:00:00 -0800

The EduLinq Lynx Grader (also referred to as “autograder”) is an open source tool used by several courses at UCSC to safely and quickly grade programming assignments. Grading student code is something that may seem simple at first (you just need to run their code!), but quickly becomes exceeding complex as you get more into the details. Specifically, grading a student’s code securely while providing the “last mile” service of getting code from students and sending results to instructors/TAs and the course’s LMS (e.g., Canvas) can be very difficult. The Lynx Grader provides all of this in a free and open source project. The LINQS Lab has made many contributions to the maintain and improve the Lynx Grader.

As an open source project, there are endless opportunities for development, improvements, and collaboration. Here, we highlight some specific projects that will work well in the summer mentorship setting.

LLM Detection

Topics: AI/ML LLM Research Backend
Skills: software development, backend, systems, data munging, go, docker
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Fabrice Kurmann, Lise Getoor

As Large Language Model (LLM) tools like ChatGPT become more common and powerful, instructors need tools to help determine if students are the actual authors of the code they submit. More classical instances of plagiarism are often discovered by code similarity tools like MOSS. However these tools are not sufficient for detecting code written not by a student, but by an AI model like ChatGPT or GitHub Copilot.

The task for this project is to create a system that provides a score indicating the system’s confidence that a given piece of code was written by an AI tool and not a student. This will supplement the existing code analysis tools in the Lynx Grader. There are many approaches to completing this task that will be considered. A more software development approach can consist of levering exiting systems to create a production-ready system, whereas a more research approach can consist of creating a novel approach complete with a paper and experiments.

There has been previous work on this issue, where a student did a survey of existing solutions, collection of initial datasets, and exploratory experiments on possible directions. This project would build off of this previous work.

See Also:

Code Analysis GUI

Topics: Frontend
Skills: software development, frontend, data munging, js, css, go
Difficulty: Easy
Size: Medium or Large (175 or 350 hours)
Mentors: Eriq Augustine, Fabrice Kurmann, Lise Getoor

The Lynx Grader has existing functionality to analyze the code in a student’s submission for malicious content. Relevant to this project is that the Lynx Grader can run a pairwise similarity analysis against all submitted code. This is how most existing software plagiarism systems detect offending code. The existing infrastructure provides detailed statistics on code similarity, but does not currently have a visual way to display this data.

The task for this project is to create a web GUI using the Lynx Grader REST API to display the results of a code analysis. The size of this project depends on how many of the existing features are going to be supported by the web GUI.

See Also:

Web GUI

Topics: Frontend
Skills: software development, frontend, js, css
Difficulty: Easy
Size: Medium or Large (175 or 350 hours)
Mentors: Eriq Augustine, Fabrice Kurmann, Lise Getoor

The Lynx Grader contains dozens of API endpoints, most directly representing a piece of functionality exposed to the user. All of these features are exposed in the Lynx Grader’s Python Interface. However, the Python interface is a purely command-line interface. And although command-line interface are objectively (read: subjectively) the best, a web GUI would be more accessible to a wider audience. The autograder already has a web GUI, but it does not cover all the features available in the Lynx Grader.

The task for this project is to augment the Lynx Grader’s web GUI with more features. Specifically, add support for more tools used to create and administer courses.

See Also:

Quiz Composer

Tue, 13 Jan 2026 13:00:00 -0800

The EduLinq Quiz Composer (also called the “Quiz Generator”) is a tool used by several courses at UCSC to create and maintain platform-agnostic quizzes (including exams and worksheets). Knowledge assessments like quizzes, exams, and tests are a core part of the learning process for many courses. However maintaining banks of questions, collaborating on new questions, and converting quizzes to new formats can use up a lot of time, taking time away from actually working on improving course materials. The Quiz Composer helps by providing a single text-based format that can be stored in a repository and “compiled” into many different formats including: HTML, LaTeX, PDF, Canvas, GradeScope, and QTI. The LINQS Lab has made many contributions to the maintain and improve the Quiz Composer.

Canvas Import

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Eriq Augustine, Lucas Ellenberger, Lise Getoor

The Quiz Composer houses quizzes and quiz questions in a simple and unambiguous format based on JSON and Markdown (specifically, the CommonMark specification). This allows the Quiz Composer to unambiguously create versions of the same quiz in many different formats. However, creating a quiz in the Quiz Composer format can be a daunting task for those not familiar with JSON or Markdown. Instead, it would be easier for people to import quizzes from another format into the Quiz Composer format, and then edit it as they see fit. Unfortunately not all other quiz formats, namely Canvas in this case, are unambiguous.

The task for this project is to implement the functionality of importing quizzes from Canvas to the standard Quiz Composer format. The unambiguous nature of Canvas quizzes makes this task non-trivial, and adds an additional element of design decisions to this task. It will be impossible to import quizzes 100% correctly, but we want to be able to get close enough that most people can import their quizzes without issue.

See Also:

Google Forms Export

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, python
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Eriq Augustine, Lucas Ellenberger, Lise Getoor

The Quiz Composer can export quizzes to many different formats, each with a varying level of interactivity and feature support. For example, quizzes can be exported to PDFs which will be printed and the students will just write down their answers to be checked in the future. Quizzes can also be exported to interactive platforms like Canvas where students can enter answers that may be automatically checked with feedback immediately provided to the student. On potential platform with functionality somewhere between the above two examples is Google Forms. “Forms” (an entity on Google Forms) can be something like a survey or (as of more recently) a quiz.

The task for this project is to add support for exporting quizzes from the Quiz Composer to Google Forms. There is a large overlap in the quiz features supported in Canvas (which the Quiz Composer already supports) and Google Forms, so most settings should be fairly straightforward. There may be some design work around deciding what features are specific to one quiz platform and what features can be abstracted to work across several platforms.

See Also:

Template Questions

Topics: Backend Teaching Tools API
Skills: software development, backend, data munging, python
Difficulty: Moderate-Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Lucas Ellenberger, Lise Getoor

Questions in the Quiz Composer are described using JSON and Markdown files which contain the question prompt, possible answers, and the correct answer. (Of course there are many differ question types, each with different semantics and requirements.) However, a limitation of this is that each question is always the same. You can have multiple copies of a question with slightly different prompts, numbers, and answers; but you are still limited to each question being static and unchanging. It would be useful to have “template questions” that can dynamically create static questions from a template and collection of replacement data.

The task for this project is to add support for the “template questions” discussed above. Much of the high-level design work for this issue has already been completed. But there is still the implementation and low-level design decision left to do.

See Also:

Scenic-RoboSuite Integration: Building the First Working Prototype

Mon, 29 Sep 2025 00:00:00 +0000

I’m Sahil, presenting the first working prototype of the Scenic-RoboSuite integration. This project is being mentored by Daniel Fremont and Eric Vin.

After months of development, we have achieved a functional prototype of the Scenic-RoboSuite interface. Researchers can now write basic declarative robotic manipulation scenarios in Scenic that execute with physics simulation in RoboSuite. While still in development, the prototype demonstrates the feasibility and potential of bridging probabilistic scenario generation with detailed robot control.

Major Achievements

MJCF XML Injection

The interface introduces direct MJCF XML support, allowing Scenic to build RoboSuite-native manipulable objects from raw XML definitions. Users can define custom objects with complex mesh geometries, textures, and physics properties directly in their Scenic scenarios:

dragon_xml = '''
<mujoco>
 <asset>
 <mesh file="dragon.stl" scale="0.01 0.01 0.01"/>
 <texture file="dragon_texture.png"/>
 </asset>
 <worldbody>
 <body name="object">
 <geom mesh="dragon_mesh" type="mesh"/>
 </body>
 </worldbody>
</mujoco>
'''

dragon = new CustomObject with mjcfXml dragon_xml

The system automatically handles collision geometry generation, joint creation for physics, and asset file resolution.

Complex Mesh Object Support

Import and manipulate arbitrary 3D models (STL, OBJ) with automatic mesh repair and texture mapping. The interface resolves file paths relative to Scenic files, copies assets to temporary directories for MuJoCo, and converts textures (JPG to PNG) when needed. This enables using custom robotic tools, industrial parts, or any 3D model in manipulation scenarios.

Custom Arena Definition

Define complete custom environments using MJCF XML, extending beyond RoboSuite’s built-in arenas:

custom_arena = new CustomArena with arenaXml localPath("warehouse.xml")

This allows creating specialized workspaces, factory floors, or research-specific environments while maintaining full physics simulation.

Multi-Robot Support

The interface handles multiple robots operating in the same workspace:

robot1 = new Panda at (-0.5, 0, 0)
robot2 = new UR5e at (0.5, 0, 0)
table = new Table at (0, 0, 0.425)

Each robot maintains independent control and can execute coordinated or individual behaviors.

Built-in Manipulation Behaviors

Ready-to-use behaviors for immediate testing and development:

MoveToPosition - Precise end-effector positioning
PickObject - Automated grasping with approach and closure
LiftToHeight - Controlled lifting to target heights
PickAndLift - Complete pick-and-place sequence

These behaviors use Operational Space Control (OSC) for intuitive 3D movement commands.

Extended Environment Configuration

The interface extends RoboSuite’s configurability through Scenic’s parameter system:

param controller_config = {'type': 'OSC_POSITION', 'impedance': 'low'}
param camera_view = 'robot0_eye_in_hand'
param lite_physics = True # Faster simulation for testing

Example: Probabilistic Pick-and-Place

model scenic.simulators.robosuite.model

# Randomly position cube on table
table = new Table at (0.6, 0, 0.425)
cube = new Box on table,
 with color (1, 0, 0, 1),
 with position (Uniform(-0.2, 0.2), Uniform(-0.2, 0.2), _)

# Robot adapts to random cube position
behavior AdaptivePickup():
 do PickAndLift(cube, height=1.1)

ego = new Panda at (0, 0, 0),
 with behavior AdaptivePickup()

Each scenario run generates a different cube position, testing the robot’s adaptive capabilities.

Challenges Overcome

Understanding Dual Architecture Paradigms

RoboSuite and Scenic operate on fundamentally different principles. RoboSuite builds environments imperatively through MuJoCo XML composition, expecting complete scene specification upfront. Scenic generates scenes probabilistically through constraint solving, requiring geometric knowledge before simulation. Bridging these required developing a two-pass system where we first extract geometry from a temporary RoboSuite environment, update Scenic’s understanding, then create the final simulation. This architectural mismatch touched every aspect of the integration, from object creation to property updates.

Discovering and Extending ManipulationEnv

RoboSuite’s documentation focuses on using pre-built tasks, not creating custom environments. Through extensive source code analysis, we discovered that ManipulationEnv was the key - it accepts robots as configuration while allowing customizable arenas and objects as components. This class became our foundation, but required significant extension. We implemented ScenicManipulationEnv to intercept Scenic’s object configurations, handle dynamic arena selection (EmptyArena vs MultiTableArena based on scene content), and manage the complex initialization sequence where robots, arenas, and objects must be assembled in specific order for MuJoCo compilation.

XML to 3D Mesh Pipeline

Converting MJCF XML to usable 3D meshes proved complex. MuJoCo uses XML to describe geometry, but Scenic needs actual mesh data for collision checking. We built a multi-stage pipeline: First, ElementTree parses the XML to extract mesh references and primitive definitions. Then, we handle two paths - for mesh files, we load STL/OBJ files with trimesh and apply XML-specified transformations; for primitives (boxes, cylinders), we generate meshes programmatically. The challenge intensified with composite objects - a table might have a box tabletop and four cylinder legs. We developed ComponentExtractor to analyze the MuJoCo scene graph, identify related geometries through naming patterns and hierarchy, and export each component as a separate GLB file with proper world transforms preserved.

File Path Resolution Discrepancies

Scenic and RoboSuite handle file paths completely differently. Scenic uses localPath() for paths relative to the scenario file, while RoboSuite expects paths relative to its package structure or absolute paths. MJCF XML compounds this - mesh references can be relative to the XML file location, not the calling code. We implemented a sophisticated path resolution system: detect whether paths come from embedded XML (relative to Scenic file) or external XML files (relative to XML location), copy all referenced assets (meshes, textures) to temporary directories accessible to MuJoCo, and handle texture format conversion (JPG to PNG) when needed. This system transparently manages assets whether they’re in the Scenic project, RoboSuite package, or absolute paths, making the interface truly portable.

Impact and Applications

This bridge enables:

Research: Generate diverse manipulation scenarios for robot learning algorithms
Testing: Validate robotic systems against probabilistic task variations
Development: Rapid prototyping of manipulation tasks without manual scene setup
Education: Teach robotics concepts through declarative scenario specification

The integration makes complex robotic simulations accessible through Scenic’s intuitive language while preserving RoboSuite’s detailed physics and control capabilities.

Documentation and Resources

The project includes:

example scenarios demonstrating all features
Comprehensive STATUS.md tracking working features and known issues
Technical documentation in docs/ covering architecture and troubleshooting
Mesh extraction utilities for pre-processing and caching

Current Status and Future Work

This prototype demonstrates that the Scenic-RoboSuite bridge is viable and functional. Basic features are working reliably:

Single-robot manipulation scenarios execute successfully
MJCF XML injection creates custom objects
Pick-and-place behaviors operate consistently
Multi-robot support functions in controlled scenarios

However, significant work remains:

Stability improvements: Some features work intermittently and need refinement
Velocity tracking: Full implementation awaits framework updates
Multi-robot coordination: Advanced synchronization primitives needed
Performance optimization: Mesh extraction and caching can be streamlined
Extended testing: More diverse scenarios and edge cases need validation

The prototype serves as a proof of concept, showing that probabilistic scenario specification can successfully drive physics-based robot simulation. The architecture is sound, the core features function, and the path forward is clear.

Conclusion

This working prototype of the Scenic-RoboSuite integration represents significant progress toward bridging probabilistic programming with robotic simulation. We’ve successfully demonstrated that declarative scenario specification can control detailed physics simulation, opening new possibilities for robotic system development and testing.

While not yet production-ready, the prototype provides a solid foundation for future development. Researchers can begin experimenting with basic manipulation scenarios, developers can test the interface with their use cases, and the community can contribute to making this bridge more robust and feature-complete.

The challenges overcome - from understanding dual architectures to implementing XML-to-mesh pipelines - have resulted in a functional system that validates our approach. This prototype proves that Scenic’s elegant scenario language and RoboSuite’s detailed physics can work together, setting the stage for a powerful new tool in robotics research and development.

Final Report: CarbonCast — An end-to-end consumption-based Carbon Intensity Forecasting service

Mon, 15 Sep 2025 00:00:00 +0000

Hi everyone—this is my final report for CarbonCast, mentored by Professor Abel Souza. Back in June, my goal was simple to say and harder to pull off: help people see when the grid is cleaner and make it easy to act on that information. Over the summer I turned CarbonCast from a research prototype into something you can open, click, and rely on: a containerized backend, a clean API, and a fast, friendly map UI.

Background

CarbonCast forecasts the carbon intensity of electricity (gCO₂e/kWh) using grid data and weather. Earlier versions were accurate but difficult to run and even harder to use outside a research context. My OSRE focus was to make CarbonCast usable for real people: provide a standard API, build a web UI that feels responsive, and package everything so it starts quickly and keeps itself healthy.

Goals

I centered the work around four goals. First, I wanted to ship an end-to-end containerized stack—data collection, validation, storage, API, and UI—that someone else could run without digging through my notes. Second, I aimed to expand coverage beyond a handful of regions so the map would be genuinely useful. Third, I needed to make it reliable, with retries, monitoring, and graceful fallbacks so the system could run for weeks without babysitting. Finally, I wanted to lay the groundwork for a consumption-based signal, because imports from neighboring regions also shape a region’s true emissions picture.

What I built

By the end of the program, CarbonCast runs as a containerized backend + API + web app that you can bring up with Docker. The pipelines now reach 85+ regions, and the UI currently exposes 58+ while we finish integrating the rest. The API offers straightforward endpoints for current conditions and multi-day views, plus region metadata so clients can discover what’s available. The UI presents an interactive choropleth map with a side panel for the energy mix and a simple timeline to move between past, now, and the next few days. To keep things feeling snappy, I tuned caching so “now” data updates quickly while historical and forecast views load instantly from cache. I also added a small “mission control” dashboard that shows what updated, what failed, and how the system recovered, which makes maintenance far less mysterious.

How it works

Fresh weather and grid data arrive on a regular schedule. The system checks each file for sanity, stores it, and serves it through a clean API. The React app calls that API and paints the map. Hovering reveals regional details; clicking opens a richer panel with the energy mix and trends; the timeline lets you scrub through hours naturally. In short, the path is fresh data → API → map, and each step is designed to be obvious and quick.

Behind the scenes, I extended the existing Django backend with a SQLite path so the UI works out of the box on a laptop. For production, you can point the same code at Postgres or MySQL without changing the UI. This choice made local testing easy while leaving room for scale later.

Highlights

A few moments stand out. The first time the dashboard flipped from red to green on its own—after the system retried through a wave of timeouts—was a turning point. Clicking across the map and getting instant responses because the right data was cached felt great too. And packaging everything so another person can run it without asking me for help might be the biggest quality-of-life win for future contributors.

Challenges

The first big hurdle was refactoring the old vanilla-JS interface. The original UI worked, but it was dated and hard to extend. I rebuilt it as a modern React + TypeScript app with a cleaner component structure and a fresh look—think glassmorphic panels, readable color scales, and a layout that feels consistent on both laptops and smaller screens. Moving to this design system made the codebase far easier to maintain, theme, and iterate on.

The next challenge was performance under real-time load. With dozens of regions updating, it was easy to hit API limits and make the UI feel jittery. I solved this by adding a smart caching layer with short, volatility-aware timeouts, request de-duplication, and background prefetching. That combination dramatically reduced round-trips, essentially eliminated rate-limit hits, and made the map feel responsive even as you scrub through time. The result is a UI that can handle many simultaneous updates without hiccups.

Finally, there were plenty of stubborn UI bugs. Some regions wouldn’t color even when data was available, certain charts refused to render, and a few elements flickered or never showed up. Most of this came down to learning React state management in a real project: taming race conditions, canceling in-flight requests when users navigate, and making sure state only updates when fresh data actually arrives. Fixing those issues taught me a lot about how maps re-paint, how charts expect their data, and how to keep components simple enough that they behave the way users expect.

What didn’t make the cut (yet)

I designed—but did not finish—per-region plug-in models so each grid can use the approach that fits it best. We decided to ship a stable, deployable service first and reserve that flexibility work for the next phase. The design is written down and ready to build.

Links and resources:

Project page: CarbonCast
Proposal: https://ucsc-ospo.github.io/report/osre25/ucsc/carboncast/20250710-tanushsavadi/
Midterm blog: https://ucsc-ospo.github.io/report/osre25/ucsc/carboncast/20250803-tanushsavadi/
Backend/API (branch): https://github.com/carbonfirst/CarbonCast/tree/django_apis_sqlite
Frontend/UI: https://github.com/carbonfirst/CarbonCastUI/tree/main

What’s next

My next steps are clear. I want to finish the per-region model plug-ins so grids can bring their own best forecasting logic. I also plan to carry the consumption-based signal end-to-end, including imports and interconnects surfaced directly in the UI. Finally, I’ll harden the system for production by enabling auth and throttling and by moving to a production-grade database where appropriate.

Thank you

Huge thanks to Professor Abel Souza for steady mentorship and to the OSRE community for thoughtful feedback. The most rewarding part of this summer was watching a research idea become something people can click on—and use to make cleaner choices.

Midterm Report: Learning and Building ORB

Thu, 07 Aug 2025 00:00:00 +0000

Project Overview

UC ORB is an open-source platform developed to increase visibility and engagement with open source projects across the University of California system.

By providing a structured and searchable repository browser, ORB makes it easier for researchers, students, and collaborators to discover relevant open source initiatives, track their impact, and connect with contributors. It also helps campuses demonstrate the value of their open source output to potential funders and institutional partners.

Progress So Far

Significant progress has been made in building out core features of the ORB Showcase platform:

Searching and Filtering Options

Users can now search and filter repositories using multiple criteria:

Development Team / UC Campus
Programming Language
License Type
Topic / Domain Area

These filtering tools make it easy to explore the growing set of repositories in a meaningful and personalized way.

Pagination has been added to ensure scalability and smooth performance, even as the number of projects continues to grow.

Repository Details View

Each repository page now displays rich metadata and contextual information, including:

README preview – offering a quick look at the project’s purpose and usage

License – clearly indicating how the project can be used or adapted

Contributors and Funders – acknowledging the people and institutions behind the work

What’s Next

As we prepare UC ORB for public launch, we’re focused on improving the backend workflow and addressing some key challenges:

⚙️ GitHub Workflow Challenges Creating a GitHub-first workflow for adding repositories is powerful, but also tricky:

GitHub Actions cannot be triggered by API calls from a backend directly, which limits automation via server-side tools.

The GitHub bot has permission limitations, especially when it comes to interacting with PRs and validating submissions outside of standard GitHub UI flows.

I’m currently working on designing a more robust and maintainable workflow to handle these edge cases, including:

A standalone script that can add repositories directly to the database, bypassing the need for a pull request and enabling more flexible internal submissions.

Better logging and validation to ensure consistency between the file-based data model and the live PostgreSQL database.

Reflection

This project has been a great learning experience despite challenges with Frontend, Backend, GitHub Actions / Bots and APIs, it’s been exciting to build a platform that highlights open source work across the UC system.

I’m looking forward to what’s coming next as we get closer to launching ORB.

Midterm Report: Learning, Building, and Documenting Brahma

Tue, 05 Aug 2025 00:00:00 +0000

Project Overview

Brahma-XR is an open-source WebXR framework designed for building collaborative virtual environments especially those involving spatial data and scientific visualization.

What makes Brahma powerful is that the same codebase runs seamlessly across both the browser and XR devices like the Apple Vision Pro, Meta Quest 3, and VARJO. This makes it ideal for rapid prototyping and creating cross-platform immersive experiences.

Some of Brahma’s built-in features include:

Grab-and-pull locomotion
Raycasting and interaction
Avatar embodiment
Spatial rendering
Support for geospatial and data-driven visualizations

Brahma is intentionally lightweight, optimized to run even on low-compute devices—making immersive collaboration more accessible to everyone.

What Worked (and What Didn’t)

As Brahma transitioned from a private research repo to a public open-source project, a lot of important foundational work had to be done around documentation, packaging, and example previews.

There are two aspects that make Brahma especially unique:

Bipartite npm package structure – which requires detailed and thoughtful documentation.
Immersive, real-time examples – unlike typical libraries, Brahma’s examples aren’t just static demos. They are live, multi-user XR apps designed to be interacted with.

The first half of the project focused on setting the stage—structuring and preparing the framework for broader use.

🔧 Key Accomplishments

Learning Three.js
I spent time learning the fundamentals of Three.js—how it handles 3D rendering, scene setup, materials, cameras, and animations. I also explored how large-scale Three.js projects are organized, which helped me understand how Brahma’s example apps are built.
Setting up the project structure
I looked at the architecture of various open-source projects and used that knowledge to shape Brahma’s structure. The goal was to align with community best practices while keeping things clean and modular for future contributors.
Understanding npm packaging (especially bipartite)
Since Brahma includes both client- and server-side logic, I spent time understanding how multi-part npm packages are published and maintained. I explored best practices around versioning, distribution, and separating internal vs public modules.
Creating a documentation system
After exploring different approaches (and with my mentor’s help), I set up a static documentation site using JSDoc with the Docdash theme. The current version includes guides, API references, and contribution instructions. This is just the beginning—the docs will evolve as the community grows.

What’s Next

In the second half of the project, I’ll be focusing on:

Building a routing system
For both documentation and example apps, so that users can easily browse through different components and use cases.
Setting up UI and 3D infrastructure
To make it easier for others to start building apps with Brahma by providing clean base layers for interface and spatial development.
Prepping for the first public release
Publishing the Brahma NPM package along with a curated set of featured examples and contributor-friendly documentation—making it easier for developers to get started and contribute.

Reflection

This project has truly been the highlight of my summer. Learning about WebXR, Three.js, and open-source workflows has been both exciting and rewarding. Every challenge taught me something new.

I am specially greatfull to my mentor Samir Ghosh for his constant support, patience, and guidance. It’s been a privilege learning from you!

I’m looking forward to what’s coming next as we get closer to the first public release of Brahma!

Midterm blog: CarbonCast Midpoint Update: From Vision to Reality

Sun, 03 Aug 2025 00:00:00 +0000

A few months ago, I shared my vision for making carbon intensity forecasts more accessible through the CarbonCast project. My proposal under the mentorship of Professor Abel Souza aims to build an API that makes carbon intensity forecasts more accessible and actionable. I had two main goals: expand CarbonCast to work with more regional electricity grids, and transform it from a research project into something that could actually run and be interacted with in the real world.

Today, I’m excited to share that we’ve not only hit those goals – we’ve exceeded them in ways I didn’t expect.

What We’ve Built So Far

Remember how I mentioned that CarbonCast needed to support more regional grids? Well, we’ve gone big. The system now covers 85+ regions across two continents. We’re talking about major US grid operators like ERCOT (Texas), CISO (California), PJM (Mid-Atlantic), MISO (Midwest), and NYISO (New York), plus we’ve expanded into European countries like Germany, France, Spain, and the UK.

But here’s the thing – collecting weather data for carbon intensity forecasting isn’t as simple as just downloading a few files. Each region needs four different types of weather data: solar radiation (for solar power predictions), wind patterns (for wind power), temperature and humidity (for energy demand), and precipitation (which affects both supply and demand). That means we’re managing data collection for over 340 different combinations of regions and weather variables.

The Automation Challenge

When I started this project, I quickly realized that manually managing data collection for this many regions would be impossible. We’re talking about thousands of data requests, each taking time to process, with various things that can go wrong along the way.

So we built something I’m really proud of: an intelligent automation system that handles 95% of the work without human intervention. That means 19 out of every 20 data collection tasks happen automatically, even when things go wrong.

The system is smart about it too. It knows when to speed up data collection, when to slow down to avoid overwhelming the servers, and how to recover when errors happen. We’ve achieved 99% data completeness, which means almost every piece of weather data we need actually makes it into our system successfully.

Making It Production-Ready

The biggest challenge was taking CarbonCast from a research project that worked on my laptop to something that could run reliably for weeks without me babysitting it. This meant building in all the boring but crucial stuff that makes software actually work in the real world.

We created a comprehensive error handling system that can automatically recover from 95% of the problems it encounters. Network hiccups, server timeouts, data format changes – the system handles these gracefully and keeps running.

There’s also a real-time monitoring dashboard that shows exactly what’s happening across all regions. I can see which areas are collecting data successfully, which ones might be having issues, and get alerts if anything needs attention. It’s like having a mission control center for carbon data.

The Dashboard: Mission Control for Carbon Data

Let me show you what this monitoring system actually looks like. We built a comprehensive web dashboard that gives us real-time visibility into everything that’s happening:

The main dashboard showing real-time system metrics and status across all regions

The dashboard shows key metrics at a glance – total requests, completion rates, and active regions. But it goes much deeper than that. You can drill down into individual requests to see their complete lifecycle:

Detailed view of individual data requests showing processing timelines and status

Each request card shows everything from the initial request time to when the data becomes available for download. This level of visibility is crucial when you’re managing hundreds of data requests across different regions and weather variables.

The regional analytics view shows how well we’re doing across different grid operators:

Regional breakdown showing completion status across different electricity grid operators

What I’m particularly proud of is the error handling dashboard. When things do go wrong (which they inevitably do with any large-scale data system), we can see exactly what happened and how the system recovered:

Error tracking and resolution system showing 100% success rate in region mapping

The fact that we’re showing “No unknown regions found” means our coordinate-based region detection system is working perfectly – every weather data request gets properly mapped to the right electricity grid.

The Technical Foundation

Under the hood, we’ve built what I’d call enterprise-grade infrastructure. The system can run autonomously for weeks, automatically organizing data by region and weather type, managing storage efficiently, and even optimizing its own performance based on what it learns.

We’ve also created comprehensive testing systems to make sure everything works reliably. When you’re dealing with data that people might use to make real decisions about when to charge their electric vehicles or run their data centers, reliability isn’t optional.

The architecture follows a modular, service-oriented design with clear separation between data collection, processing, monitoring, and user interfaces. This makes it much easier to maintain and extend as we add new features.

Why This Matters

All of this infrastructure work might sound technical, but it’s directly connected to the original vision: making carbon intensity forecasts accessible to everyone.

With this foundation in place, we can now provide reliable, up-to-date weather data for carbon intensity forecasting across major electricity grids in North America and Europe. That means developers building carbon-aware applications, companies trying to reduce their emissions, and individuals wanting to time their energy use for lower environmental impact all have access to the data they need.

What’s Next: Breaking Down CarbonCast

The next phase is where things get really exciting. Now that we have this solid data collection foundation, we’re going to break down CarbonCast itself into modular components. This will make it easier for developers to integrate carbon intensity forecasting into their own applications, whether that’s a smart home system, a cloud computing platform, or a mobile app that helps people make greener energy choices.

Looking Back

When I started this project, I knew we needed better infrastructure for carbon data. What I didn’t expect was how much we’d end up building – or how well it would work. We’ve created something that can reliably collect and organize weather data across two continents, handle errors gracefully, and run without constant supervision.

More importantly, we’ve built the foundation that will make it possible for anyone to access accurate carbon intensity forecasts. Whether you’re a developer building the next generation of carbon-aware applications or someone who just wants to know the best time to do laundry to minimize your environmental impact, the infrastructure is now there to support those decisions.

The vision of making carbon data accessible and actionable is becoming reality, one automated data collection at a time.

Impact Beyond Research

This work builds directly on the foundation of Multi-day Forecasting of Electric Grid Carbon Intensity using Machine Learning, transforming research into practical, real-world infrastructure. We’re not just making carbon intensity forecasts more accurate – we’re making them accessible to everyone who wants to reduce their environmental impact.

The open-source nature of CarbonCast means that anyone can run, contribute to, and benefit from this work. Whether you’re a developer building carbon-aware applications, a policymaker working on grid decarbonization strategies, or a sustainability-conscious individual looking to reduce your carbon footprint, the tools are now there to make informed, impactful choices.

Looking ahead, I’m excited to see how this infrastructure will enable the next generation of carbon-aware computing and smart energy decisions.

Robot Manipulation with Scenic-RoboSuite

Wed, 30 Jul 2025 00:00:00 +0000

We’re Sahil, continuing work on the Scenic-RoboSuite integration for GSoC 2025. This project is mentored by Daniel Fremont and Eric Vin.

Since the last update, the Scenic-RoboSuite interface has made significant progress. The bidirectional bridge is now functional - robots can read sensor data and execute behaviors based on observations. However, these features are still in early stages and we’re working on making them more stable and consistent.

We’ve integrated RoboSuite’s Operational Space Control into Scenic. This control method lets you command the robot’s hand directly in 3D space (like “move 10cm left”) instead of calculating complex joint rotations. While the integration works, it’s rough around the edges and we’re currently focused on stabilizing it across different scenarios.

The main challenge was architectural - RoboSuite expects all robot commands bundled together each timestep, while Scenic processes them one by one. We solved this with a pending actions system that collects everything first, then executes in one go. Time synchronization was another challenge, matching Scenic’s steps with MuJoCo’s physics.

We’ve implemented a basic pick-and-place behavior for basic testing. The robot reads sensor data, calculates where to move, and adjusts continuously. It can successfully grasp and lift objects, though consistency varies between runs. The system supports three robot models and works with RoboSuite’s pre-built environments.

Custom world building is currently on hold. We’ve decided to focus on integrating existing RoboSuite features into Scenic first, then build Scenic’s capabilities like dynamic scenario randomization on top. For our first prototype, we’re aiming to extend the pick-and-place behavior into a full randomization demo - Scenic will randomly position the cube each run, and the robot will adapt to find and grasp it regardless of location.

The next two weeks focus on stabilizing current features and preparing this randomized scenario prototype. Expanding the behavior library and supporting additional environments will come in future phases after we have a solid foundation.

The core bridge between Scenic and RoboSuite is operational, but there’s significant work ahead to make it reliable and user-friendly.

AIDRIN Privacy-Centric Enhancements: Backend & UX Upgrades

Fri, 25 Jul 2025 00:00:00 +0000

⏱️ Reading time: 5–6 minutes

Hey everyone,

If you’ve ever wondered what it takes to make AI data pipelines not just smarter, but safer and more transparent, you’re in the right place. The last few weeks working on AIDRIN for GSoC have been a deep dive into the engine room of privacy and backend systems that power the AIDRIN project. My focus has been on building out the core privacy infrastructure and backend features that power AIDRIN’s ability to give users real, actionable insights about their data. It’s been challenging, sometimes messy, but incredibly rewarding to see these changes make a tangible difference.

Having Dr. Jean Luca Bez and Prof. Suren Byna as mentors, along with the support of the entire team, has truly made all the difference. Their guidance, encouragement, and collaborative spirit have been a huge part of this journey, whether I’m brainstorming new ideas or just trying to untangle a tricky bug.

Privacy Metrics: Making Data Safer

A major part of my work has been putting data privacy at the front and center in AIDRIN. I focused on integrating essential privacy metrics like k-anonymity, l-diversity, t-closeness, and more, making sure they’re not just theoretical checkboxes, but real tools that users can interact with and understand. Now, these metrics are fully wired up in the backend and visualized in AIDRIN, so privacy risks are no longer just a vague concern. They are something AI data preparers can actually see and act on. Getting these metrics to work seamlessly with different datasets and ensuring their accuracy took some serious backend engineering, but the payoff has been worth it.

Speeding Things Up (So You Don’t Have To Wait Around)

As AIDRIN started handling bigger datasets, some of the calculations can be time-consuming because data has to be accessed every time a metric is computed. To address this, I added caching for previously computed metrics, like class imbalance and privacy checks, and set up asynchronous execution with Celery and Redis. This should make the app super responsive. Rather than waiting for heavy computations to finish, one can start taking notes about other metrics or explore different parts of the app while their results are loading in the background. It’s a small change, but it helps keep the workflow moving smoothly.

Small Touch Ups That (Hopefully) Make a Big Difference

I also spent time on the details that make the app easier to use. Tooltips now explain what the privacy metrics actually mean, error messages are clearer, and there’s a new cache info page where you can see and clear your cached data. The sensitive attribute dropdown is less confusing now, especially if you’re working with quasi-identifiers. These tweaks might seem minor, but they add up and make the app friendlier for everyone.

Docs, Docs, Docs

I’m a big believer that good documentation is just as important as good code. I updated the docs to cover all the new features, added citations for the privacy metrics, and made the install process a bit more straightforward. Hopefully, this means new users and contributors can get up to speed without too much hassle.

Huge Thanks to My Mentors and the Team

I really want to shine a light on Dr. Bez, Prof. Byna, and the entire AIDRIN team here. Their encouragement, practical advice, and collaborative spirit have been a huge part of my progress. Whether I’m stuck on a bug, brainstorming a new feature, or just need a second opinion, there’s always someone ready to help me think things through. Their experience and support have shaped not just the technical side of my work, but also how I approach problem-solving and teamwork.

What’s Next?

Looking ahead, I’m planning to expand AIDRIN’s support for multimodal datasets and keep refining the privacy and fairness modules. There’s always something new to learn or improve, and I’m excited to keep building. If you’re interested in data quality, privacy, or open-source AI tools, I’d love to connect and swap ideas.

Thanks for reading and for following along with my GSoC journey. I’ll be back soon with more updates!

This is the second post in my 3-part GSoC series with AIDRIN. Stay tuned for the final update.

LLMSeqRec: LLM Enhanced Contextual Sequential Recommender

Tue, 22 Jul 2025 10:15:56 -0700

Midway Through OSRE

My Journey with LLMSeqRec

Hello from the Midpoint!

Hi everyone! I’m Connor Lee, a student at NYU studying Computer Science and Mathematics, and I’m excited to share the progress I’ve made halfway through the Open Source Research Experience (OSRE) with my project: LLMSeqRec – a large language model-enhanced sequential recommender system.

Over the past several weeks, I’ve had the opportunity to explore the intersection of recommender systems and large language models (LLMs), and it’s been a deep, challenging, and rewarding dive into building smarter, more contextual recommendation engines.

What is LLMSeqRec?

LLMSeqRec stands for LLM-Enhanced Contextual Sequential Recommender. Traditional sequential recommendation systems like SASRec are great at capturing patterns from user-item interactions, but they often fall short in two areas: understanding semantic context (e.g., item descriptions, reviews) and dealing with cold-start problems.

LLMSeqRec aims to address this by incorporating pretrained LLM embeddings into the recommendation pipeline. The goal is to enhance models like SASRec with semantic signals from text (like product reviews or titles), allowing them to better model user intent, long-range dependencies, and generalize to new items or users.

Progress So Far

✅ Baseline SASRec Runs

To establish a benchmark, I successfully ran the original SASRec implementation (in PyTorch) using both the MovieLens 1M and Amazon Beauty datasets. After debugging initial data formatting issues and adjusting batch sizes for local CPU/GPU compatibility, I automated training with scripts that let me scale to 200+ epochs to acheive the best performance in both Colab and on my MacBook via CPU.

Note: At this stage, we have not yet integrated LLMs into the model. These baseline runs (SASRec) serve as the control group for evaluating the future impact of LLM-based enhancements.

What’s Next

As I enter the second half of the OSRE, I’ll be shifting gears toward LLM integration, model evaluation, and running LLM-powered sequential recommendations using product metadata and contextual information. Here’s what’s ahead:

Designing pipelines to extract and align textual metadata with item sequences
Integrating LLM-generated embeddings into the recommender model
Evaluating performance changes across different dataset characteristics

📊 Experimental Results

We have not yet utilized LLMs in our current experiments. The results below reflect our reproduced baseline performance of SASRec across datasets.

Below are the performance curves on different test sets, where we evaluate model performance every 20 epochs during training:

Beauty Dataset Performance

Hit@10 performance on the test set for the Beauty dataset (every 20 epochs)

Training loss for the Beauty dataset

NDCG@10 performance on the test set for the Beauty dataset (every 20 epochs)

ML-1M Dataset Performance

Training loss for the ML-1M dataset

Hit@10 performance on the test set for the ML-1M dataset (every 20 epochs)

NDCG@10 performance on the test set for the ML-1M dataset (every 20 epochs)

These results demonstrate that our baseline SASRec reproductions are converging as expected and will serve as a solid foundation for comparison once LLM integration is complete.

Closing Thoughts

This project has been an exciting journey into both research and engineering and I’m excited to explore LLM-powered embedding integration in the upcoming phase.

I’m incredibly grateful to my mentors Dr. Linsey Pang and Dr. Bin Dong for their support and guidance throughout the project so far. I’m looking forward to sharing more technical results as we work toward building smarter, more adaptable recommender systems.

CarbonCast

Thu, 10 Jul 2025 00:00:00 +0000

As part of the CarbonCast project, my proposal under the mentorship of Professor Abel Souza aims to build an API that makes carbon intensity forecasts more accessible and actionable.

Under the mentorship of Professor Abel Souza, my proposal is centered around building upon CarbonCast to create an API to enable user access and utilization of energy data in optimizing their electricity consumption. Before diving into the details of the project, I’d like to share a bit about my background.

About Me

Hi, I’m Tanush—a rising senior at the University of Massachusetts Amherst, majoring in Computer Science and Mathematics and graduating in Spring 2026. Currently, I’m an AI Intern for the Commonwealth of Massachusetts Department of Unemployment Assistance, where I’m developing an end-to-end retrieval-augmented generation (RAG) chatbot on AWS.

In the past, I’ve contributed to CarbonCast in a different capacity, designing a user interface to help visualize carbon intensity forecasts. I also worked at MathWorks as a Machine Learning Intern, where I collaborated in an AGILE environment to design and deploy predictive models that improved precision torque control and dynamic responsiveness in motor-driven robotic and industrial systems.

I’m excited to bring these experiences to this year’s GSoC project, where I’ll be building tools to make carbon data more accessible and actionable for everyone.

What is CarbonCast?

CarbonCast is a Python-based machine-learning library designed to forecast the carbon intensity of electrical grids. Carbon intensity refers to the amount of carbon emitted per kilowatt-hour (kWh) of electricity consumed. Developed in Python, the current version of CarbonCast delivers accurate forecasts in numerous regions by using historical energy production data of a particular geographical region, time of day/year, and weather forecasts as features.

However, there is no easy way to access, visualize, and utilize the data through a standard interface. In addition, much important information is left out and is not available to users. For instance, electricity grids often import electricity from neighboring regions, and so electricity consumption depends on both electricity generation and imports. Moreover, it is imperative for each energy source to utilize a tailored predictive mechanism. Consequently, any carbon optimization solution trying to reduce carbon emissions due to its electricity consumption will benefit more from following a consumption-based carbon intensity signal.

Unlike other third-party carbon services, CarbonCast’s model is open-sourced, allowing users to study, understand, and improve its behavior. This transparency invites public collaboration and innovation. It also contrasts sharply with proprietary services that often withhold both the logic behind their models and the data they are trained on.

Why This Matters

Electricity usage is one of the largest contributors to carbon emissions globally. Carbon intensity—the amount of carbon emitted per kilowatt-hour of electricity consumed—varies based on how electricity is generated and demanded (for example, coal versus solar). With better visibility into when the grid is cleaner, individuals and organizations can shift their energy consumption to lower-carbon periods and lower prices. This enables everyday energy optimizations without compromising comfort or productivity.

By improving CarbonCast’s accessibility and functionality, we are helping people and institutions answer questions like:

When is the best time to charge my EV to reduce environmental impact?
Can I run my energy-hungry server jobs when the electricity is cheaper?
How do I actually reduce my emissions without guessing?

By providing clear, accurate forecasts of carbon intensity, CarbonCast can help users make informed decisions to optimize their energy footprint and reduce emissions without sacrificing convenience or productivity.

What I’m Building

The plan for this summer is to develop the backend API services for CarbonCast. This summer, I’m focused on two major goals:

Geographical Expansion

I am extending CarbonCast’s compatibility to support more regional electricity grids. Each model will be customized for local grid behavior and renewable energy characteristics. This involves tuning the model pipeline to adapt to each region’s energy mix, weather patterns, and reporting granularity.

System Refactoring and Modularity

The original CarbonCast system was built as a research artifact. To refine it into production-grade infrastructure, I am refactoring the codebase to improve modularity. This makes it easier to plug in new regions, update forecasting algorithms, and integrate new data sources.

Impact Beyond Research

The paper that inspired this project, Multi-day Forecasting of Electric Grid Carbon Intensity using Machine Learning, pioneered the idea of forecasting carbon intensity over multiple days using a hierarchical machine learning model. This goes beyond the typical 24-hour day-ahead models that are common in the industry and allows for better planning and longer-term decision-making.

CarbonCast builds directly on that foundation by transforming research into practical, real-world infrastructure. It is an open-source library that anyone can run, contribute to, and benefit from. Whether you’re a developer building carbon-aware applications, a policymaker working on grid decarbonization strategies, or a sustainability-conscious individual looking to reduce your carbon footprint, CarbonCast provides the tools to make informed, impactful choices.

Looking Ahead

I am excited to contribute to a project that blends machine learning, systems engineering, sustainability, and public impact. My goal is to help make it easier for everyone to see, understand, and act on their carbon footprint while also providing the “visibility” people need to take meaningful, informed actions.

Develop a clean and intuitive web-based interface for WildberryEye

Sun, 15 Jun 2025 00:00:00 +0000

As part of the WildberryEye, my proposal under the mentorship of Isaac Espinosa aims to develop a clean, intuitive, and responsive web-based interface to support real-time pollinator detection, data visualization, and system configuration.

WildberryEye leverages edge computing (Raspberry Pi 5) and object detection (YOLO) to monitor pollinators like bees and hummingbirds. The expectations for this project focuse on developing a full-stack web interface to support real-time pollinator detection, data visualization, and system configuration. The whole development also include the real-time data extraction from the Raspberry Pi 5). The final result empowers researchers and contributors to engage with environmental data in an accessible and meaningful way.

Into the VR-Verse: My GSoC Adventure Begins!

Sun, 15 Jun 2025 00:00:00 +0000

Hello! I’m Kajal Jotwani, an undergraduate Computer Science student from India who is passionate about building creative, interactive technologies and contributing to open source. This summer, as part of Google Summer of Code 2025, I will be working on the Brahma / Allocentric WebXR Interfaces project under the mentorship of Samir Ghosh. You can read my complete proposal here.

This project focuses on creating a formalized framework for building collaborative and cross-platform WebXR-based experiences. As part of its first public release of Brahma- a lightweight open-source toolkit, our goal is to formalize the framework, create documentation, and implement example applications like multi-user games and scientific visualizations. This will help make Brahma extensible and accessible for a wider developer community.

I’m excited to be working on this project and will be documenting my journey, learnings, and progress here throughout the summer.

Introducing Scenic-RoboSuite Interface

Sun, 15 Jun 2025 00:00:00 +0000

Hey! I’m Sahil, working on integrating Scenic with RoboSuite for GSoC 2025. My project is mentored by Daniel Fremont and Eric Vin .

I’m connecting Scenic (a probabilistic programming language for scenarios) with RoboSuite (a robotics simulation framework). Basically, you write simple scenario descriptions and get complex 3D robot simulations automatically.

Currently, as I’m building things and learning how Scenic works, I have been able to get the basic skeleton for the simulator interface working. I’ve implemented the simulator class and built a world model that can translate Scenic objects into RoboSuite’s simulator (which is MuJoCo-based). The interface now handles precise object placement in the world pretty well.

One of the trickier parts was figuring out the translation logic between Scenic and RoboSuite. I managed to overcome this by building a system that automatically detects the shape of objects when moving between the two frameworks, which lays a foundation for more complex object mapping later on.

I’ve also built some basic example scenarios to run and test with. Currently working on more complex examples and testing Scenic’s features like probabilistic object placement, constraint satisfaction, and spatial relationships between objects.

In summary, the “Scenic to RoboSuite” part of the interface is pretty much done. For next week, I need to work on the “RoboSuite to Scenic” part - basically getting feedback and state information flowing back from the simulation. Achieving this will make a complete bridge and give us a working simulator interface, which is the first major milestone for the project.

Improving AI Data Pipelines in AIDRIN: A Privacy-Centric and Multimodal Expansion

Thu, 12 Jun 2025 00:00:00 +0000

⏱️ Reading time: 4–5 minutes

Hi 👋

I’m Harish Balaji, a Master’s student at NYU with a focus on Artificial Intelligence, Machine Learning, and Cybersecurity. I’m especially interested in building scalable systems that reflect responsible AI principles. For me, data quality isn’t just a technical detail. It’s a foundational aspect of building models that are reliable, fair, and reproducible in the real world.

This summer, I’m contributing to AIDRIN (AI Data Readiness Inspector) as part of Google Summer of Code 2025. I’m grateful to be working under the mentorship of Dr. Jean Luca Bez and Prof. Suren Byna from the Scientific Data Management Group at Lawrence Berkeley National Laboratory (LBNL).

AIDRIN is an open-source framework that helps researchers and practitioners evaluate whether a dataset is truly ready to be used in production-level AI workflows. From fairness to privacy, it provides a structured lens through which we can understand the strengths and gaps in our data.

Why this work matters

In machine learning, one principle always holds true:

“Garbage in, garbage out.”

Even the most advanced models can underperform or amplify harmful biases if trained on incomplete, imbalanced, or poorly understood data. This is where AIDRIN steps in. It provides practical tools to assess datasets across key dimensions like privacy, fairness, class balance, interpretability, and support for multiple modalities.

By making these characteristics measurable and transparent, AIDRIN empowers teams to make informed decisions early in the pipeline. It helps ensure that datasets are not only large or complex, but also trustworthy, representative, and purpose-fit.

My focus this summer

As part of my GSoC 2025 project, I’ll be focusing on extending AIDRIN’s evaluation capabilities. A big part of this involves strengthening its support for privacy metrics and designing tools that can handle non-tabular datasets, such as image-based data.

The goal is to expand AIDRIN’s reach without compromising on interpretability or ease of use. More technical insights and updates will follow in the next posts as the summer progresses.

What comes next

As the AI community continues to evolve, there’s a growing shift toward data-centric practices. I believe frameworks like AIDRIN are essential for helping us move beyond the question of “Does the model work?” toward a deeper and more meaningful one: “Was the data ready in the first place?”

Over the next few weeks, I’ll be working on development, testing, and integration. I’m excited to contribute to a tool that emphasizes transparency and reproducibility across the AI lifecycle, and to share lessons and ideas with others who care about responsible AI.

If you’re exploring similar challenges or working in the space of dataset evaluation and readiness, I’d love to connect and exchange thoughts. You can also read my full GSoC 2025 proposal below for more context around the project scope and vision:

👉 Read my GSoC 2025 proposal here

This is the first in a 3-part blog series documenting my GSoC journey with AIDRIN. Stay tuned for technical updates and behind-the-scenes insights as the summer unfolds!

LLMSeqRec: LLM Enhanced Contextual Sequential Recommender

Fri, 06 Jun 2025 10:15:56 -0700

Project Description

Sequential Recommender Systems are widely used in scientific and business applications to analyze and predict patterns over time. In biology and ecology, they help track species behavior by suggesting related research on migration patterns and environmental changes. Medical applications include personalized treatment recommendations based on patient history and predicting disease progression. In physics and engineering, these systems optimize experimental setups by suggesting relevant past experiments or simulations. Environmental and climate science applications include forecasting climate trends and recommending datasets for monitoring deforestation or pollution. In business and e-commerce, sequential recommenders enhance user experiences by predicting consumer behavior, suggesting personalized products, and optimizing marketing strategies based on browsing and purchase history. By leveraging sequential dependencies, these recommender systems enhance research efficiency, knowledge discovery, and business decision-making across various domains. Traditional sequential recommendation systems rely on historical user interactions to predict future preferences, but they often struggle with capturing complex contextual dependencies and adapting to dynamic user behaviors. Existing models primarily use predefined embeddings and handcrafted features, limiting their ability to generalize across diverse recommendation scenarios. To address these challenges, we propose LLM Enhanced Contextual Sequential Recommender (LLMSeqRec), which leverages Large Language Models (LLMs) to enrich sequential recommendations with deep contextual understanding and adaptive reasoning. By integrating LLM-generated embeddings and contextual representations, LLMSeqRec enhances user intent modeling, cold-start recommendations, and long-range dependencies in sequential data. Unlike traditional models that rely solely on structured interaction logs, LLMSeqRec dynamically interprets and augments sequences with semantic context, leading to more accurate and personalized recommendations. This fusion of LLM intelligence with sequential modeling enables a more scalable, adaptable, and explainable recommender system, bridging the gap between traditional sequence-based approaches and advanced AI-driven recommendations.

Project Objectives

Aligned with the vision of the 2025 Open Source Research Experience (OSRE), this project aims to develop an LLM-Enhanced Contextual Sequential Recommender (LLMSeqRec) to improve sequential recommendation accuracy across various scientific and business applications. Sequential recommender systems are widely used to analyze and predict patterns over time, assisting in fields such as biology, ecology, medicine, physics, engineering, environmental science, and e-commerce. However, traditional models often struggle with capturing complex contextual dependencies and adapting to dynamic user behaviors, as they primarily rely on vanilla sequential Id orders. To address these limitations, this project will leverage Large Language Models (LLMs) to enhance context-aware sequential recommendations by dynamically integrating LLM-generated embeddings and contextual representations. The core challenge lies in designing LLMSeqRec, a unified and scalable model capable of enriching user intent modeling, mitigating cold-start issues, and capturing long-range dependencies within sequential data. Unlike conventional systems that rely solely on structured interaction logs, LLMSeqRec will interpret and augment sequences with semantic context, resulting in more accurate, adaptable, and explainable recommendations. Below is an outline of the methodologies and models that will be developed in this project:

Step 1: Data Preprocessing & Feature Creation: Develop a data processing pipeline to parse user’s sequential interaction behaviors into sequential data points for LLM-based embeddings and contextual sequential transformer modeling; Extract user behavior sequences, items’ metadata, and temporal patterns to create context-aware sequential representations for training, validation and testing; The data source can be from Amazon open public data or Movie Lense data set. The data points creation can follow SASRec (in the reference 1).
Step 2: Model Development: Design and implement LLM-enhanced sequential recommendation models, integrating pretrained language models to augment user-item interactions with semantic context; Develop an adaptive mechanism to incorporate external contextual signals, such as product descriptions, reviews into the sequential recommendation process; The baseline model can be SASRec pytorch implementation.
Step 3: Evaluation: : Benchmark LLMSeqRec against state-of-the-art sequential recommenders, evaluating on accuracy, NDCG and cold-start performance; Conduct ablation studies to analyze the impact of LLM-generated embeddings on recommendation quality; Optimize model inference speed and efficiency for real-time recommendation scenarios.

Project Deliverables

This project will deliver three components, software, model training, validation and performance evaluation and demo. The software which implements the above LLMSeqRec model will be hosted on the github repo as open-access repositories. The evaluation results and demo will be published along the github repo .

LLMSeqRec

Topics: LLM Enhanced Contextual Sequential Recommender
Skills: Proficiency in Python, Pytorch, Github, Self-attention, Transformer
Difficulty: Difficult
Size: Large (350 hours)
Mentor: Linsey Pang, Bin Dong

References:

Self-Attentive Sequential Recommendation (SASRec)
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Amazon Dataset: https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews
Movie Lense Data: https://grouplens.org/datasets/movielens/

Introduction

I’m Connor, a student at NYU studying CS and Math. This summer I’ve gotten the opportunity to work on LLMSeqRec under Dr. Bin Dong and Dr. Linsey Pang.

In today’s digital age, sequential recommender systems power everything from e-commerce suggestions to personalized content everywhere. However, traditional models fall short in capturing user intent, adapting to dynamic behavior, or tackling cold-start problems. That’s where LLMSeqRec comes in.

Problem Statement

Most sequential recommender systems rely heavily on historical user-item interactions and predefined embeddings. This approach limits their ability to understand nuanced user preferences, struggles to scale across domains, and performs poorly in scenarios like new users or sparse data. The absence of semantic and contextual modeling is a major gap in current solutions.

Overview of project

LLMSeqRec is a novel, LLM-enhanced sequential recommender framework that bridges this gap. By leveraging large language models (LLMs), it incorporates semantic embeddings and prompt-based contextual modeling to understand both user behavior and item metadata at a deeper level. The system explores two core approaches:

Embedding-based: LLMs generate embeddings from item attributes.
Prompt-based: LLMs receive full transaction history in natural language format and infer recommendations.

These techniques are tested using well-known datasets (e.g., Amazon, MovieLens), and evaluated with ranking metrics like NDCG@10 and Hit@10. The goal: deliver more accurate, context-rich, and explainable recommendations.

Next Steps

The project is currently progressing through stages including model training, embedding integration, and evaluation. Upcoming tasks include:

Fine-tuning enhanced models
Designing zero-/few-shot prompts
Running comparative experiments
Publishing findings and writing technical blogs

As part of the LLMSeqRec my proposal under the mentorship of Dr. Bin Dong and Dr. Linsey Pang.

Understanding Skin-Tone based Bias in Text-to-Image Models Using Stable Diffusion

Tue, 27 May 2025 00:00:00 +0000

This project investigates skin tone bias in text-to-image generation by analyzing the output of Stable Diffusion models when prompted with socially and occupationally descriptive text. Despite the growing popularity of generative models like Stable Diffusion, little has been done to evaluate how these models reproduce or amplify visual bias—especially related to skin tone, perceived race, and social class—based solely on textual prompts.

This work builds on prior studies of bias in large language models (LLMs) and vision-language models (VLMs), and aims to explore how biases manifest visually, without explicitly specifying race or ethnicity in the input prompt. Our approach combines systematic prompt generation, model-based image creation, and skin tone quantification to assess disparities across generated samples.

The ultimate goal is to develop a reproducible evaluation pipeline, visualize disparities across demographic and occupational prompts, and explore strategies to mitigate representational harms in generative models.

Our goal is to create a reproducible pipeline for:

Generating images from prompts
Annotating or analyzing them using computer vision tools
Measuring bias across categories like skin tone, gender presentation, or status markers

Project webpage: https://github.com/marzianizam/ucsc-ospo.github.io/tree/main/content/project/osre25/UCSC/FairFace

Project Idea: Measuring Bias in AI-Generated Portraits

Topics: Responsible AI, Generative Models, Ethics in AI
Skills: Python, PyTorch, Stable Diffusion, Prompt Engineering, Data Analysis
Difficulty: Medium
Size: 350 hours
Mentors:
- Marzia Binta Nizam (mailto:manizam@ucsc.edu)
- Professor James Davis (mailto:davisje@ucsc.edu)

Background

Recent research has shown that text-to-image models can perpetuate racial and gender stereotypes through visual output. For instance, prompts like “CEO” or “nurse” often produce racially skewed results even when no explicit race or demographic cues are provided. This project examines whether similar disparities exist along skin tone dimensions, focusing on subtle biases rather than overt stereotypes.

The key challenge is that visual bias is not always easy to measure. This project addresses this issue by utilizing melanin-level quantification, a continuous and interpretable proxy for skin tone, in conjunction with consistent prompt templating and multi-sample averaging to ensure statistical rigor.

Objectives

Generate datasets using consistent prompts (e.g., “A portrait of a doctor”, “A homeless person”, etc.)
Use Stable Diffusion (and optionally, other models like DALL·E or Midjourney) to generate diverse image sets
Measure bias across demographic and occupational categories using image processing tools
Visualize the distribution of melanin values and facial features across samples
Explore prompt-level mitigation strategies to improve fairness in output

Deliverables

Open-source codebase for prompt generation and image evaluation
Statistical analysis of visual bias trends
Blog post or visual explainer on findings
Final report and recommendations on prompt engineering or model constraints

UC Open Source Repository Browser

Mon, 03 Mar 2025 13:00:00 -0800

The University of California Open Source Repository Browser (UC ORB) is a discovery tool designed to map and classify open source projects across the UC system. This project is a collaboration with the UC Network of Open Source Program Offices (OSPOs), which brings together six UC campuses (Santa Cruz, Berkeley, Davis, Los Angeles, Santa Barbara, and San Diego) to support open source research, promote sustainability, and establish best practices within academic environments.

By providing a centralized platform, UC ORB enhances the visibility of UC’s open source contributions, fosters collaboration among researchers and developers, and serves as a model for other institutions aiming to improve open source discovery and sustainability.

This project focuses on building the web application for UC ORB, which will serve as the primary interface for users to explore and interact with UC’s open source projects. The student will work on developing a clean, user-friendly, and scalable web application.

Develop the UC ORB Application

Topics: Web development
Skills: Experience in Python and at least one Python-based web framework (e.g., Flask, Django, FastAPI), experience with front-end technologies (React, HTML, CSS, JavaScript), familiarity with Git and collaborative development workflows, familiarity with database interaction (SQL).
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Juanita Gomez

Develop a web application that serves as the front-end interface for the UC ORB. The application will allow users to browse, search, and explore open source projects across the UC system. The project will involve integrating with the repository database to fetch and display repository data, designing an intuitive user interface, and ensuring the application is scalable and maintainable.

Specific Tasks:

Choose an appropriate Python-based web framework (e.g., Flask, Django, or FastAPI) for the backend and set up the basic structure of the application.
Develop a responsive and user-friendly front-end interface ensuring that it is accessible and works well on both desktop and mobile devices.
Add search functionality to allow users to find projects by keywords, tags, or other metadata.
Implement filtering options to narrow down search results (e.g., by campus, topic, or programming language).
Deploy the application to a cloud platform (e.g., AWS, or Google Cloud) or GitHub Pages (GitHub.io) for public access.
Create developer documentation that explains the application’s architecture, setup instructions, and contribution guidelines.
Write a short user manual to help end-users browse and use the web application effectively.

FairFace

Fri, 28 Feb 2025 00:00:00 +0000

FairFace: Reproducible Bias Evaluation in Facial AI Models via Controlled Skin Tone Manipulation

Bias in facial AI models remains a persistent issue, particularly concerning skin tone disparities. Many studies report that AI models perform differently on lighter vs. darker skin tones, but these findings are often difficult to reproduce due to variations in datasets, model architectures, and evaluation settings. The goal of this project is to investigate bias in facial AI models by manipulating skin tone and related properties in a controlled, reproducible manner. By leveraging BioSkin, we will adjust melanin levels and other skin properties on existing human datasets to assess whether face-based AI models (e.g., classification and vision-language models) exhibit biased behavior toward specific skin tones.

Topics: Fairness & Bias in AI, Face Recognition & Vision-Language Models, Dataset Augmentation for Reproducibility
Skills: Machine Learning & Computer Vision, Deep Learning (PyTorch/TensorFlow), Data Augmentation & Image Processing, Reproducibility & Documentation (GitHub, Jupyter Notebooks).
Difficulty: Moderate
Size: Medium or Large ( Can be completed in either 175 or 350 hours, depending on the depth of analysis and number of models tested.)
Mentors: James Davis, Alex Pang

Key Research Questions

Do AI models perform differently based on skin tone?
- How do classification accuracy, confidence scores, and error rates change when skin tone is altered systematically?
What are the underlying causes of bias?
- Is bias solely dependent on skin tone, or do other skin-related properties (e.g., texture, reflectance) contribute to model predictions?
- Is bias driven by dataset imbalances (e.g., underrepresentation of certain skin tones)?
- Do facial features beyond skin tone (e.g., structure, expression, pose) contribute to biased predictions?
Are bias trends reproducible?
- Can we replicate bias patterns across different datasets, model architectures, and experimental setups?
- How consistent are the findings when varying image sources and preprocessing methods?

Specific Tasks:

Dataset Selection & Preprocessing
- Choose appropriate face/human datasets (e.g., FairFace, CelebA, COCO-Human).
- Preprocess images to ensure consistent lighting, pose, and resolution before applying transformations.
Skin Tone Manipulation with BioSkin
- Systematically modify melanin levels while keeping facial features unchanged.
- Generate multiple variations per image (lighter to darker skin tones).
Model Evaluation & Bias Analysis
- Test face classification models (e.g., ResNet, FaceNet) and vision-language models (e.g., BLIP, LLaVA) on the modified images.
- Compute fairness metrics (e.g., demographic parity, equalized odds).
Investigate Underlying Causes of Bias
- Compare model behavior across different feature sets.
- Test whether bias persists across multiple datasets and model architectures.
Ensure Reproducibility
- Develop an open-source pipeline for others to replicate bias evaluations.
- Provide codebase and detailed documentation for reproducibility.

ReasonWorld

Fri, 28 Feb 2025 00:00:00 +0000

ReasonWorld: Real-World Reasoning with a Long-Term World Model

A world model is essentially an internal representation of an environment that an AI system would construct based on external information to plan, reason, and interpret its surroundings. It stores the system’s understanding of relevant objects, spatial relationships, and/or states in the environment. Recent augmented reality (AR) and wearable technologies like Meta Aria glasses provide an opportunity to gather rich information from the real world in the form of vision, audio, and spatial data. Along with this, large language (LLM), vision language models (VLMs), and general machine learning algorithms have enabled nuanced understanding and processing of multimodal inputs that can label, summarize, and analyze experiences.

With ReasonWorld, we aim to utilize these technologies to enable advanced reasoning about important objects/events/spaces in real-world environments in a structured manner. With the help of wearable AR technology, the system would be able to capture real-world multimodal data. We aim to utilize this information to create a long-memory modeling toolkit that would support features like:

Longitudinal and structured data logging: Capture and storing of multimodal data (image, video, audio, location coordinates etc.)
Semantic summarization: Automatic scene labeling via LLMs/VLMs to identify key elements in the surroundings
Efficient retrieval: For querying and revisiting past experiences and answering questions like “Where have I seen this painting before?”
Adaptability: Continuously refining and understanding the environment and/or relationships between objects/locations.
Adaptive memory prioritization: Where the pipeline can assess the contextual significance of the captured data and retrieve those that are the most significant. The model retains meaningful, structured representations rather than raw, unfiltered data.

This real-world reasoning framework with a long-term world model can function as a structured search engine for important objects and spaces, enabling:

Recognizing and tracking significant objects, locations, and events
Supporting spatial understanding and contextual analysis
Facilitating structured documentation of environments and changes over time

Alignment with Summer of Reproducibility:

Core pipeline for AR data ingestion, event segmentation, summarization, and indexing (knowledge graph or vector database) would be made open-source.
Clear documentation of each module and how they collaborate with one another
The project could be tested with standardized datasets, simulated environments as well as controlled real-world scenarios, promoting reproducibility
Opportunities for Innovation - A transparent, modular approach invites a broad community to propose novel expansions

Specific Tasks:

A pipeline for real-time/batch ingestion of data with the wearable AR device and cleaning
Have an event segmentation module to classify whether the current object/event is contextually significant, filtering out the less relevant observations.
Have VLMs/LLMs summarize the events with the vision/audio/location data to be stored and retrieved later by structured data structures like knowledge graph, vector databases etc.
Storage optimization with prioritizing important objects and spaces, optimizing storage based on contextual significance and frequency of access.
Implement key information retrieval mechanisms
Ensure reproducibility by providing datasets and scripts

ReasonWorld

Topics: Augmented reality Multimodal learning Computer vision for AR LLM/VLM Efficient data indexing
Skills: Machine Learning and AI, Augmented Reality and Hardware integration, Data Engineering & Storage Optimization
Difficulty: Hard
Size: Large (350 hours)
Mentors: James Davis, Alex Pang

AI for Science: Automating Domain Specific Tasks with Large Language Models

Sun, 23 Feb 2025 21:30:56 -0800

Recent advancements in Large Language Models (LLMs) have transformed various fields by demonstrating remarkable capabilities in processing and generating human-like text. This project aims to explore the development of an open-source framework that leverages LLMs to enhance discovery across specialized domains.

The proposed framework will enable LLMs to analyze and interpret complex datasets, automate routine tasks, and uncover novel insights. A key focus will be on equipping LLMs with domain-specific expertise, particularly in areas where specialized tools – such as ANDES – are not widely integrated with LLM-based solutions. By bridging this gap, the framework will empower researchers and professionals to harness LLMs as intelligent assistants capable of navigating and utilizing niche computational tools effectively.

AI for Science: Automating Domain Specific Tasks with Large Language Models

Topics: Large Language Models AI for Science
Skills: Python, Experience with LLMs, Prompt Engineering, Fine-Tuning, LLM Frameworks
Difficulty: Medium-Difficult
Size: Large (350 hours)
Mentor: [Daniel Wong]Daniel Wong, [Luanzheng “Lenny” Guo]Luanzheng "Lenny" Guo

Project Tasks and Milestones

Designing an extensible framework that facilitates the integration of LLMs with specialized software and datasets.
Developing methodologies for fine-tuning LLMs to act as domain experts.
Implementing strategies for improving tool interoperability, allowing LLMs to interact seamlessly with less commonly used but critical analytical platforms.

Exploration of I/O Reproducibility with HDF5

Wed, 19 Feb 2025 09:00:00 -0700

Parallel I/O is a critical component in high-performance computing (HPC), allowing multiple processes to read and write data concurrently from a shared storage system. HDF5—a widely adopted data model and library for managing complex scientific data—supports parallel I/O but introduces challenges in I/O reproducibility, where repeated executions do not always produce identical results. This lack of reproducibility can stem from non-deterministic execution orders, variations in collective buffering strategies, and race conditions in metadata and dataset chunking operations within HDF5’s parallel I/O hierarchy. Moreover, many HDF5 operations that leverage MPI I/O require collective communication; that is, all processes within a communicator must participate in operations such as metadata creation, chunk allocation, and data aggregation. These collective calls ensure that the file structure and data layout remain consistent across processes, but they also introduce additional synchronization complexity that can impact reproducibility if not properly managed. In HPC scientific workflows, consistent I/O reproducibility is essential for accurate debugging, validation, and benchmarking, ensuring that scientific results are both verifiable and trustworthy. Tools such as h5bench—a suite of I/O kernels designed to exercise HDF5 I/O on parallel file systems—play an important role in identifying these reproducibility challenges, tuning performance, and ultimately supporting the overall robustness of large-scale scientific applications.

Workplan

The proposed work will include (1) analyzing and characterizing parallel I/O operations in HDF5 with h5bench miniapps, (2) exploring and validating potential reproducibility challenges within the parallel I/O hierarchy (e.g., MPI I/O), and (3) implementing solutions to address parallel I/O reproducibility.

Topics: Parallel I/O MPI-I/O Reproducibility HPC HDF5
Skills: C/C++, Python
Difficulty: Medium
Size: Large (350 hours)
Mentors: Luanzheng "Lenny" Guo and [Wei Zhang]Wei Zhang

AR4VIP

Tue, 18 Feb 2025 00:00:00 +0000

We are interested in developing navigation aids for visually impaired people (VIP) using AR/VR technologies. Our intended use is primarily indoors or outdoors but within private confines e.g. person’s backyard. Using AR/VR headsets or smart glasses allows navigation without using a cane and frees the users’ hands for other tasks.

Continue Development on Meta Quest 3 Headset

Topics: Dynamic scenes Spatial audio Proximity detection
Skills: AR/VR familiarity, WebXR, Unity, SLAM, good communicator, good documentation skills
Difficulty: Moderate
Size: Medium or large (175 or 350 hours)
Mentors: Alex Pang, James Davis

Continue development and field testing with the Meta Quest 3 headset. See this repository page for current status.

Specific tasks:

Improve spatial audio mapping
Improve obstacle detection, at different heights, with pre-scanned geometry as well as dynamic objects e.g. other people, pets, doors
Special handling of hazards e.g. stairs, uneven floors, etc.
Explore/incorporate AI to help identify objects in the scene when requested by user

New Development on Smart Glasses

Topics: Dynamic scenes Spatial audio Proximity detection
Skills: AR/VR familiarity, WebXR, Unity, SLAM, good communicator, good documentation skills
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Alex Pang, James Davis

VR headsets are bulky and awkward, but currently is more advanced than AR glasses in terms of programmability. Ultimately, the form factor of smart glasses is more practical for extended use by our target users. There are many vendors working on pushing out their version of smart glasses targetting various applications e.g. alternative for watching TV, etc. We are interested in those that provide capabilities to support spatial computing. Most of these will likely have their own brand specific APIs. This project has 2 goals: (a) develop generic brand-independent API, perhaps extensions to WebXR, to support overarching goal of navigation aid for VIP, and (b) port functionality of VR version to smart glasses while taking advantage of smart glass functionalities and sensors.

Specific tasks:

Explore current and soon-to-be-available smart glass options e.g. Snap Spectacles, Xreal Air 2 ultra, etc. and select a platform to work on (subject to cost and availability of SDK). At a minimum, glass should be microphones and speakers, and cameras. Infrared cameras or other low light capability is a plus. Sufficient battery life or option for quick exchange.
Identify support provided by SDK e.g. does it do realtime scene reconstruction? does it support spatial audio? etc. If it supports features outside of WebXR, provide generic hooks to improve portability of code to other smart glasses.
Port and extend functionalities from the Meta Quest 3 VR headsets to smart glass platform.
Add AI support if glasses support them.
Provide documentation of work.

CarbonCast: Building an end-to-end consumption-based Carbon Intensity Forecasting service

Tue, 18 Feb 2025 00:00:00 +0000

CarbonCast is a machine-learning-based approach to provide multi-day forecasts of the electrical grid’s carbon intensity. Developed in Python, the current version of CarbonCast delivers accurate forecasts in numerous regions by using historical source production data of a particular geographical region, time of day/year, and weather forecasts as features. However, there is no easy way to access and visualize the data through a standard interface. In addition, much important information is left out and is not available to users. For instance, electricity grids often import electricity from neighboring regions and so electricity consumption depends on both electricity generation and imports. Moreover, it is imperative for each energy source to utilize a tailored predictive mechanism. Consequently, any carbon optimization solution trying to reduce carbon emissions due to its electricity consumption will benefit more from following a consumption-based CI signal.

The plan for this project is to develop both the frontend and the backend API services for CarbonCast. We also intend to enhance CarbonCast by implementing an architecture wherein each region can employ a distinct interface for their predictive modeling. In scenarios where these new models do not yield superior outcomes within a region, the current architecture will serve as a fallback solution.

Building an end-to-end consumption-based Carbon Intensity Forecasting service

Topics: Databases Machine Learning
Skills: Python, command line (bash), MySQL, Django, machine learning, cronjob
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Abel Souza

Develop a containerized end-to-end backend, API, and frontend for collecting, estimating, and visualizing real-time and forecast electrical grid’s carbon intensity data in a scalable manner.

Tasks:

Research web technologies and frameworks relevant to CarbonCast development.
Run and collect CarbonCast’s data (CSV)
Ingest CSV into a MySQL or SQLite database
Develop an Application Programming Interface (API) and a Web User Interface (UI) to provide real-time data access and visualization.
Deploy the CarbonCast API as a service and dockerize it so that other users and applications can locally deploy and use it easily.
Implement a choropleth web map to visualize the carbon intensity data across the different geographical regions supported by CarbonCast.
Enhance CarbonCast by implementing an extensible architecture wherein every region can employ distinct models for their predictive modeling.

Vector Embeddings Dataset

Tue, 11 Feb 2025 13:00:00 -0800

Vector Embeddings Dataset

Topics: Vector Embeddings LLMs Transformers
Skills: software development, apis, scripting, python
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Jayjeet Chakraborty

To benchmark vector search algorithms (aka ANN algorithms), there are several datasets available but none of them represent actual real world workloads. This is because they usually have small vectors of only a few hundred dimensions. For vector search experiments to represent real world workloads, we want to have datasets with several thousand dimensions like what is generated by OpenAIs text-embedding models. This project aims to create a dataset with 1B embeddings from a wikipedia dataset using open source models. Ideally, we will have 3 versions of this dataset, with 1024, 4096, and 8192 sized embeddings to start with.

Brahma

Tue, 11 Feb 2025 12:34:56 -0700

Brahma is a lightweight framework for building collaborative and cross platform WebXR based experiences using Three.js for the front-end and a simple Node.js/WebSocket script on the backend. It was created at the Social Emotional Technology Lab to facilitate the development of novel collaborative interfaces and virtual environments capable of loading scientific datasets. For example, in the featured image, multiple avatars are exploring a marine science dataset related to seal migration paths overlaid on NOAA bathymetry and telemetry data.

It addresses a gap where prior open-source collaborative VR is no longer available such as the defunct Mozilla Hubs or proprietary engine based frameworks such as Ubiq. Furthermore, it uses very little computational resources to run and develop, enabling creators who may not have a powerful computer to run a game engine in order to develop a networked VR application.

This project involves the first public release of Brahma– creating a lightweight open source framework that facilitates multi-user games, scientific visualizations and other applications. In order to do so, we need to formalize the framework, provide documentation, and implement key examples so that the open source tool can be extensible and serve a wider community.

Mentees can expect to learn best practices for VR development and testing and gain familiarity with full stack development practices. Mentees should have access and experience using a VR headset.

Brahma / Protoocol Release and Validation

Topics: Web Development Software Architecture VR Development Computer Graphics Cloud Platforms
Skills: Node.js, Three.js
Difficulty: Moderate-Challenging
Size: Large (350 hours)
Mentors: Samir Ghosh

The proposed work includes three phases, primarily working on backend code, and API design. In the first phase, to gain familiarity, the mentee will be running and testing the Brahma backend on a variety of cloud platforms such as AWS, Google Cloud, and Azure– and learning best methods for documentation in the process. Then, in the second phase, the mentee will work on formalizing the protocol for avatar embodiment and other multi-user interfaces, testing the application with a simple pong game. In the third phase, the mentee will address telemetry, logging, and analysis considerations.

This project is well suited for someone who has interest in virtual reality, especially social VR, multi-user, or collaborative applications

Brahma / Allocentric WebXR Interfaces

Topics: Web Development VR Development Computer Graphics UX/UI
Skills: Three.js, GLSL, WebSocket
Difficulty: Moderate-Challenging
Size: Medium or large (175 or 350 hours)
Mentors: Samir Ghosh

The proposed work primarily involves front-end code and VR interface design. In the first phase, the mentee will gain familiarity with best practices for WebXR development through the implementation and documentation of simple interaction patterns. Then, the mentee will implement a simple multi-user pong game to learn about allocentric interfaces. In the final phase of the project, the mentee will design and implement one or more allocentric interface of their choosing.

This project is well suited for someone who has interest in virtual reality, especially aspects of graphics and interaction design.

WildBerryEye

Tue, 11 Feb 2025 10:15:56 -0700

WildBerryEye leverages Raspberry Pi and YOLO object detection models to monitor pollinizers like bees and hummingbirds visiting flowers. This initiative aims to enhance environmental research by automating data collection and analysis of pollinator activities, which are crucial for ecological assessments and conservation efforts. The project utilizes video data provided by Dr. Rossana Maguiña, processed through advanced machine learning techniques to accurately identify and track pollinator interactions in natural habitats.

Develop web-based user interface

Topics: Full Stack Development React Flask
Skills: Experience with full stack development and real time processing
Difficulty: Moderate to Challenging
Size: Medium or large (175 or 350 hrs)
Mentors: Carlos Isaac Espinosa Ramirez

Develop a clean and intuitive web-based interface for WildBerryEye, ensuring ease of use for researchers and contributors. The platform should present real-time pollinator detection results, facilitate data visualization, and allow users to interact with system settings efficiently. The website must be accessible, visually appealing, and optimized for both desktop and mobile users, avoiding unnecessary complexity or intrusive elements.

Specific tasks:

Frontend Development: Continue development to enhance the user interface using React and CSS, ensuring a responsive and user-friendly design.
Backend Development: Expand functionality using Flask, focusing on efficient API endpoints and seamless interaction with the frontend (excluding database implementation).
Real-Time Communication: Implement and refine real-time updates between the frontend and backend to enhance system responsiveness.
Usability & Design Optimization: Research and propose improvements to the system’s usability, design, and overall user experience.

AI Data Readiness Inspector (AIDRIN)

Tue, 11 Feb 2025 10:15:00 -0700

Garbage In Garbage Out (GIGO) is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest considerable time and effort in preparing the data for AI.

AIDRIN (AI Data Readiness INspector) is a framework that provides a quantifiable assessment of the readiness of data for AI processes, covering a broad range of readiness dimensions available in the literature. AIDRIN uses metrics in traditional data quality assessment, such as completeness, outliers, and duplicates, for data evaluation. Furthermore, AIDRIN uses metrics specific to assess data for AI, such as feature importance, feature correlations, class imbalance, fairness, privacy, and FAIR (Findability, Accessibility, Interoperability, and Reusability) principle compliance. AIDRIN provides visualizations and reports to assist data scientists in further investigating the readiness of data.

AIDRIN Visualizations and Science Gateway

The proposed work will include improvements in the AIDRIN framework to (1) enhance, extend, and optimize the visualizations of metrics related to all six pillars of AI data readiness and (2) set up a science gateway on NERSC or AWS cloud service.

Topics: data readiness AI
Skills: Python, C/C++, good communicator
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

h5bench with AI workloads

Tue, 11 Feb 2025 10:15:00 -0700

h5bench is a suite of parallel I/O benchmarks or kernels representing I/O patterns that are commonly used in HDF5 applications on high performance computing systems. h5bench measures I/O performance from various aspects, including the I/O overhead, and observed I/O rate.

Parallel I/O is a critical technique for moving data between compute and storage subsystems of supercomputers. With massive amounts of data produced or consumed by compute nodes, high-performant parallel I/O is essential. I/O benchmarks play an important role in this process; however, there is a scarcity of I/O benchmarks representative of current workloads on HPC systems. Toward creating representative I/O kernels from real-world applications, we have created h5bench, a set of I/O kernels that exercise HDF5 I/O on parallel file systems in numerous dimensions. Our focus on HDF5 is due to the parallel I/O library’s heavy usage in various scientific applications running on supercomputing systems. The various tests benchmarked in the h5bench suite include I/O operations (read and write), data locality (arrays of basic data types and arrays of structures), array dimensionality (1D arrays, 2D meshes, 3D cubes), I/O modes (synchronous and asynchronous). h5bench measurements can be used to identify performance bottlenecks and their root causes and evaluate I/O optimizations. As the I/O patterns of h5bench are diverse and capture the I/O behaviors of various HPC applications, this study will be helpful to the broader supercomputing and I/O community.

h5bench with AI workloads

The proposed work will include (1) analyzing and characterizing AI workloads that rely on HDF5 datasets, (2) extracting a kernel of their I/O operations, and (3) implementing and validating the kernel in h5bench.

Topics: I/O HPC benchmarking
Skills: Python, C/C++, good communicator
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

HAgent

Tue, 11 Feb 2025 00:00:00 +0000

HAgent is a platform to build AI hardware agent engine to support multiple components in chip design, such as code generation, verification, debugging, and tapeout.

HAgent is build as a compiler for for Hardware Agents, it interfaces with typical EDA tools like compilers, synthesis, and verification. There are several projects around enhancing HAgent.

BugFarm hagent step

Objective: Develop a HAgent step (pass) to create bugs in a given design.

Description: Using LLMs (Hagent APIs), the goal is to add “bugs” to input Verilog design. The goal is for other tools passes that need to fix bugs, to use this infrastructure as a bug generator. There is a MCY (https://github.com/YosysHQ/mcy) that does something similar but it does not use verilog and create a very different Verilog output. The BugFarm is supposed to have somewhat similar functionality but edit the Verilog directly which results in a code with just a few edits. Like MCY, there has to be a step to confirm that the change affects results. The project should benchmarks and compare with MCY.

Skills Needed: Python, Verilog, and understand agents
Difficulty: Medium
Size: Medium
Mentors: Jose Renau, Farzaneh Rabiei Kashanaki

HDEval Competition Repository

Objective: Create a platform for HDL programming challenges and community engagement.

Description: Develop a repository where users can solve HDL problems in Verilog, Chisel, PyRTL, etc. Implement a points system for successful solutions. Allow users to submit new problems (code, specifications, verification, and tests) that are not easily solvable by LLMs. Automate solution testing and provide feedback on submissions.

The submissions consist of 4 components: code, specification, verification, and tests. It should be possible to submit also examples of bugs in code/specification/verification/tests during the design.

If the code is different from Verilog, it should include the HDL (chisel, PyRTL,…) and also the Verilog.

The specification is free form. For any given specification, an expert on the area should be able to generate code, verification, and tests. Similarly, from any pair. Any expert should be able to generate the rest. For example, from verification and tests, it should be able to generate the code and specification.

Typical specifications consist of a plan, API, and a sample usage.

Skills Needed: Web design, some hardware understanding
Difficulty: Medium
Size: Medium
Mentors: Jose Renau, Farzaneh Rabiei Kashanaki

Integrate Silicon Compiler

Objective: Silicon Compiler is an open-source Python library that allows to interface with many EDA tools. The idea is to integrate it with HAgent to allow prompts/queries to interface with it.

Description: The agentic component requires to check with silicon compiler that the generated Python compiles but also that has reasonable parameters. This will require a react loop for compiler errors, and likely a judge loop for testing for reasonable options/flow with feedback from execution. Since there is not much training examples, it will require a few shot with a database to populate context accordingly.

The end result should allow to select different tools and options trhough silicon compiler.

Skills Needed: Backend chip design
Difficulty: High
Size: Medium
Mentors: Jose Renau

Comodore 64 or MSX or Gameboy

Objective: Create a prompt-only specification to build a hardware accelerated for the target platform (Comodore 64, MSX or Gameboy). The generated code should focus on Verilog, but it is fine to also target some other HDL. In all the cases, the project should include a generated Verilog integrated with some emulator for verification.

Description: Using Hagent, create an HDLEval benchmark (set of prompts) that provide the necessary information to create the Verilog implementation. HDLEval prompts usually consists of a high-level PLAN or specification, an API to implement, and a few examples of usage for the given API.

The result of running the bencharmk, a generated Verilog runs program in the emulator and the Verilog to compare correctness. The platform should have an already existing emulator vice-emu or mGBA to perform cosimulation against the generated specification.

Skills Needed: Verilog for front-end design
Difficulty: High
Size: Large (175 or 350 hours)
Mentors: Jose Renau

Scenic: A Language for Design and Verification of Autonomous Cyber-Physical Systems

Tue, 11 Feb 2025 00:00:00 +0000

3D Driving Scenarios
A Library for Aviation Scenarios
Interfacing Scenic to new simulators
Optimizing and parallelizing Scenic
Improvements and infrastructure for the VerifAI toolkit

See the sections below for details.

3D Driving Scenarios

Topics: Autonomous Driving 3D modeling
Skills: Python; basic vector geometry
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

Scenic scenarios written to test autonomous vehicles use the driving domain, a Scenic library defining driving-specific concepts including cars, pedestrians, roads, lanes, and intersections. The library extracts information about road networks, such as the shapes of lanes, from files in the standard OpenDRIVE format. Currently, we only generate 2D polygons for lanes, throwing away 3D information. While this suffices for many driving scenarios, it means we cannot properly model overpasses (the roads appear to overlap) or test driving scenarios where 3D geometry is important, such as hilly terrain.

The goals of this project are to extend our road network library to generate 3D meshes (instead of 2D polygons) for roads, write new Scenic scenarios which use this new capability, and (if time allows) test autonomous driving software using them.

A Library for Aviation Scenarios

Topics: Autonomous Aircraft
Skills: Python; ideally some aviation experience
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

We have used Scenic to find, diagnose, and fix bugs in software for autonomous aircraft: in particular, this paper studied a neural network-based automated taxiing system using the X-Plane flight simulator. We also have prototype interfaces to AirSim and Microsoft Flight Simulator. However, our experiments so far have mainly focused on simple scenarios involving a single aircraft.

The goal of this project is to develop an aviation library for Scenic (like the driving domain mentioned in the previous project) which will allow users to create complex aviation scenarios in a simulator-agnostic way. The library would define concepts for aircraft, flight paths, weather, etc. and allow importing real-world data about these. The student would demonstrate the library’s functionality by writing some example scenarios and testing either simple aircraft controllers or (if time allows) ML-based flight software.

Interfacing Scenic to New Simulators

Topics: Simulation Autonomous Driving Robotics LLMs
Skills: Python
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

The AWSIM driving simulator (to allow testing the Autoware open-source autonomous driving software stack)
The CoppeliaSim robotics simulator
NVIDIA’s Cosmos, an LLM which generates videos from text prompts
NVIDIA’s Omniverse (various applications, e.g. simulating virtual factories)
Various simulators for which we have prototype interfaces that could be generalized and made more usable, including MuJoCo and Isaac Sim

Optimizing and Parallelizing Scenic

Topics: Optimization Parallelization
Skills: Python
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

Large-scale testing with Scenic, when one wants to generate thousands of simulations, can be very computationally-expensive. In some cases, the bottleneck is the simulator, and being able to easily run multiple simulations in parallel would greatly increase scalability. In others, Scenic itself spends substantial time trying to sample scenarios satisfying all the given constraints.

This project would explore a variety of approaches to speeding up scene and simulation generation in Scenic. Some possibilities include:

Parallelizing scene generation and simulation (e.g. using Ray)
Systematically profiling real-world Scenic programs to characterize the main bottlenecks and propose optimizations
JIT compiling Scenic’s internal sampling code (e.g. using Numba)

Improvements and Infrastructure for the VerifAI Toolkit

Topics: DevOps Documentation APIs
Skills: Python
Difficulty: Easy
Size: Medium or Large (175 or 350 hours)
Mentors: Daniel Fremont, Eric Vin

VerifAI is a toolkit for design and analysis of AI-based systems that builds on top of Scenic. It adds among other features the ability to perform falsification, intelligently searching for scenarios that will cause a system to behave in an undesirable way.

The goal of this project is to improve VerifAI’s development infrastructure, documentation, and ease of use, which are currently relatively poor compared to Scenic. Specific tasks could include:

Setting up continuous integration (CI) on GitHub
Creating processes to help users/developers submit issues and PRs and deal with them in a timely manner
Writing more documentation, including tutorials and examples (not only for end users of VerifAI but those wanting to develop custom falsification components, for example)
Refactoring VerifAI’s API to make it easier to use and extend

Smart Batching for Large Language Models

Sun, 09 Feb 2025 10:15:56 -0700

Sequence tokenization is a crucial step during Large Language Model training, fine-tuning, and inference. User prompts and training data are tokenized and zero-padded before being fed to the model in batches. This process allows models to interpret human language by breaking down complex sentences into simple token units that are numerically represented in a token set. However, the process of sequence padding for maintaining batch dimensions can introduce unnecessary overhead if batching is not properly done.

In this project, we introduce Smart Batching, where we dynamically batch sequences in a fine-tuning dataset by their respective lengths. With this method, we aim to minimize the amount of zero padding required during sequence batching, which can result in improved and efficient fine-tuning and inference speeds. We also analyze this method with other commonly used batching practices (Longest Sequence, Random Shuffling) on valuable metrics such as runtime and model accuracy.

Project Title

Topics: Large Language Models Fine-Tuning AI Transformers
Skills: Python, Pytorch, Large Language Models
Difficulty: Moderate
Size: Large (350 hours)
Mentor: [Daniel Wong]Daniel Wong, [Luanzheng “Lenny” Guo]Luanzheng "Lenny" Guo

Project Tasks and Milestones

Implement an open source smart batching framework based on HuggingFace to allow for dynamically grouping sequences of similar token lengths into batches
Analyze runtime, padding, and model accuracy with smart batching and other commonly used batching practices
Apply smart batching with distributed fine-tuning and observe large language model outputs

Disentangled Generation and Editing of Pathology Images

Fri, 07 Feb 2025 00:00:00 +0000

Topics: computational pathology, image generation, disentangled representations, latent space manipulation, deep learning
Skills:
- Programming Languages:
  - Proficient in Python, with experience in machine learning libraries such as PyTorch or TensorFlow.
- Generative Models:
  - Familiarity with Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and contrastive learning methods.
- Data Analysis:
  - Image processing techniques, statistical analysis, and working with histopathology datasets.
- Biomedical Knowledge (preferred):
  - Basic understanding of histology, cancer pathology, and biological image annotation.
Difficulty: Advanced
Size: Large (350 hours). The project involves substantial computational work, model development, and evaluation of generated pathology images.
Mentors: Xi Li (contact person), Mentor Name

Project Idea Description

The project aims to advance the generation and disentanglement of pathology images, focusing on precise control over key histological features. By leveraging generative models, we seek to create synthetic histological images where specific pathological characteristics can be independently controlled.

Challenges in Current Approaches

Current methods in histopathology image generation often struggle with:

Feature Entanglement: Difficulty in isolating individual factors such as cancer presence, severity, or staining variations.
Lack of Control: Limited capability to manipulate specific pathological attributes without affecting unrelated features.
Consistency Issues: Generated images often fail to maintain realistic cellular distributions, affecting biological validity.

Project Motivation

This project proposes a disentangled representation framework to address these limitations. By separating key features within the latent space, we aim to:

Control Histological Features: Adjust factors such as cancer presence, tumor grade, number of malignant cells, and staining methods.
Ensure Spatial Consistency: Maintain the natural distribution of cells during image reconstruction and editing.
Enable Latent Space Manipulation: Provide interpretable controls for editing and generating realistic histopathology images.

Project Objectives

Disentangled Representation Learning:
- Develop generative models (e.g., VAEs, GANs) to separate and control histological features.
Latent Space Manipulation:
- Design mechanisms for intuitive editing of pathology images through latent space adjustments.
Spatial Consistency Validation:
- Implement evaluation metrics to ensure that cell distribution remains biologically consistent during image generation.

Project Deliverables

Generative Model Framework:
- An open-source Python implementation for pathology image generation and editing.
Disentangled Latent Space Tools:
- Tools for visualizing and manipulating latent spaces to control specific pathological features.
Evaluation Metrics:
- Comprehensive benchmarks assessing image quality, feature disentanglement, and biological realism.
Documentation and Tutorials:
- Clear guidelines and code examples for the research community to adopt and build upon this work.

Impact

By enabling precise control over generated histology images, this project will contribute to data augmentation, model interpretability, and biological insight in computational pathology. The disentangled approach offers new opportunities for researchers to explore disease mechanisms, develop robust diagnostic models, and improve our understanding of cancer progression and tissue morphology.

Autograder

Thu, 06 Feb 2025 13:00:00 -0800

The EduLinq Autograder is an open source tool used by several courses at UCSC to safely and quickly grade programming assignments. Grading student code is something that may seem simple at first (you just need to run their code!), but quickly becomes exceeding complex as you get more into the details. Specifically, grading a student’s code securely while providing the “last mile” service of getting code from students and sending results to instructors/TAs and the course’s LMS (e.g., Canvas) can be very difficult. The Autograder provides all of this in a free and open source project. The LINQS Lab has made many contributions to the maintain and improve the Autograder.

All students interested in LINQS projects for OSRE/GSoC 2025 should fill out this form. Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project. The form will stop accepting responses once the application window closes. Do not post on any of the project repositories about OSRE/GSoC (e.g., comment on an issue that you want to tackle it as a part of OSRE/GSoC 2025). Remember, these are active repositories that were not created for OSRE/GSoC.

LLM Detection

Topics: AI/ML LLM Research Backend
Skills: software development, backend, systems, data munging, go, docker
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Fabrice Kurmann, Lise Getoor

The task for this project is to create a system that provides a score indicating the system’s confidence that a given piece of code was written by an AI tool and not a student. This will supplement the existing code analysis tools in the Autograder. There are many approaches to completing this task that will be considered. A more software development approach can consist of levering exiting systems to create a production-ready system, whereas a more research approach can consist of creating a novel approach complete with a paper and experiments.

See Also:

Code Analysis GUI

Topics: Frontend
Skills: software development, frontend, data munging, js, css, go
Difficulty: Easy
Size: Medium or Large (175 or 350 hours)
Mentors: Eriq Augustine, Fabrice Kurmann, Lise Getoor

The Autograder has existing functionality to analyze the code in a student’s submission for malicious content. Relevant to this project is that the Autograder can run a pairwise similarity analysis against all submitted code. This is how most existing software plagiarism systems detect offending code. The existing infrastructure provides detailed statistics on code similarity, but does not currently have a visual way to display this data.

The task for this project is to create a web GUI using the Autograder REST API to display the results of a code analysis. The size of this project depends on how many of the existing features are going to be supported by the web GUI.

See Also:

Web GUI

Topics: Frontend
Skills: software development, frontend, js, css
Difficulty: Easy
Size: Medium or Large (175 or 350 hours)
Mentors: Eriq Augustine, Fabrice Kurmann, Lise Getoor

The Autograder contains dozens of API endpoints, most directly representing a piece of functionality exposed to the user. All of these features are exposed in the Autograder’s Python Interface. However, the Python interface is a purely command-line interface. And although command-line interface are objectively (read: subjectively) the best, a web GUI would be more accessible to a wider audience. The autograder already has a web GUI, but it does not cover all the features available in the Autograder.

The task for this project is to augment the Autograder’s web GUI with more features. Specifically, add support for more tools used to create and administer courses.

See Also:

LMS Toolkit

Thu, 06 Feb 2025 13:00:00 -0800

The EduLinq LMS Toolkit (also called the “Canvas Tool” or “py-canvas”) is a suite of tools used by several courses at UCSC to interact with Canvas from the command line or Python. A Learning Management System (LMS) is a system that institutions use to manage courses, assignments, students, and grades. The most popular LMSs are Canvas, Blackboard, Moodle, and Brightspace. These tools can be very helpful, especially from an administrative standpoint, but can be hard to interact with. They can be especially difficult when instructors and TAs want to do something that is not explicitly supported by their built-in GUIs (e.g., when an instructor wants to use a special grading policy). The LMS Toolkit project is an effort to create a single suite of command-line tools (along with a Python interface) to connect to all the above mentioned LMSs in a simple and uniform way. So, not only can instructors and TAs easily access the modify the data held in an LMS (like a student’s grades), but they can also do it the same way on any LMS. The LINQS Lab has made many contributions to the maintain and improve the Quiz Composer.

Currently, the LMS Toolkit only supports Canvas, but this suite of projects hopes to not only expand existing support, but add support for more LMSs.

Advanced Canvas Support

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Eriq Augustine, Batuhan Salih, Lise Getoor

The LMS Toolkit already has basic read-write support for core Canvas functionality (working with grades and assignments). However, there are still many more features that can be supported such as group management, quiz management, quiz statistics, and assignment statuses.

The task for this project is to implement chose of set of advanced Canvas features to support (not limited to those features mentioned above), design an LMS-agnostic way to support those features, and implement those features. The flexibility in the features chosen to implement account for the variable size of this project.

See Also:

Repository for LMS Toolkit
GitHub Issues

New LMS Support: Moodle

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Eriq Augustine, Batuhan Salih, Lise Getoor

The goal of the LMS toolkit is to provide a single interface for all LMSs. It is a lofty goal, however there is currently only support for Canvas. Moodle is one of the more popular LMSs. Naturally, the LMS Toolkit wants to support Moodle as well. Moodle is open source, so adding support in the LMS Toolkit should not be too challenging.

The task for this project is to add basic support for the Moodle LMS. It is not necessary to support all the same features that are supported for Canvas, but at least the core features of score and assignment management should be implemented.

See Also:

New LMS Support: Blackboard

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Batuhan Salih, Lise Getoor

The goal of the LMS toolkit is to provide a single interface for all LMSs. It is a lofty goal, however there is currently only support for Canvas. Blackboard (also called “Blackboard Learn”) is one of the more popular LMSs. Naturally, the LMS Toolkit wants to support Blackboard as well. However, a challenge in supporting Blackboard is that it is not open source (unlike Canvas). Therefore, support and testing on Blackboard may be very challenging.

The task for this project is to add basic support for the Blackboard LMS. It is not necessary to support all the same features that are supported for Canvas, but at least the core features of score and assignment management should be implemented. The closed nature of Blackboard makes this a challenging and uncertain project.

See Also:

New LMS Support: Brightspace

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Batuhan Salih, Lise Getoor

The goal of the LMS toolkit is to provide a single interface for all LMSs. It is a lofty goal, however there is currently only support for Canvas. D2L Brightspace is one of the more popular LMSs. Naturally, the LMS Toolkit wants to support Brightspace as well. However, a challenge in supporting Brightspace is that it is not open source (unlike Canvas). Therefore, support and testing on Brightspace may be very challenging.

The task for this project is to add basic support for the Brightspace LMS. It is not necessary to support all the same features that are supported for Canvas, but at least the core features of score and assignment management should be implemented. The closed nature of Brightspace makes this a challenging and uncertain project.

See Also:

Testing / CI Infrastructure

Topics: Backend Teaching Tools Testing CI
Skills: software development, backend, testing, ci, docker
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Batuhan Salih, Lise Getoor

The goal of the LMS toolkit is to provide a single interface for all LMSs. This means that our system must communicate with several different (the LMSs), each with their own systems, data patterns, versions, and quirks. Testing will be essential to ensure that our tools keep working as the different LMSs evolve and update. The LMS Toolkit currently tests with Canvas by mocking API responses. However, this tactic does not scale well with multiple LMSs (and multiple versions of each system). A more scalable approach would be to have test instances of the different LMSs that our testing infrastructure can interact with both interactively and in continuous integration (CI).

The task for this project is to create testing infrastructure that connects to test instances of different LMS systems (e.g., Canvas). This task does not require that all the LMSs in this document are used, but the testing infrastructure should be robust enough to support them all. The open source LMSs (Canvas and Moodle) will likely be much easier to setup than the others, and should be targeted first. We should be able to run tests locally as well as in CI, and will likely heavily use Docker containers.

See Also:

Quiz Composer

Thu, 06 Feb 2025 13:00:00 -0800

Canvas Import

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, http request inspection, python
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Eriq Augustine, Lucas Ellenberger, Lise Getoor

See Also:

Google Forms Export

Topics: Backend Teaching Tools API
Skills: software development, backend, rest api, data munging, python
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Eriq Augustine, Lucas Ellenberger, Lise Getoor

See Also:

Template Questions

Topics: Backend Teaching Tools API
Skills: software development, backend, data munging, python
Difficulty: Moderate-Challenging
Size: Large (350 hours)
Mentors: Eriq Augustine, Lucas Ellenberger, Lise Getoor

See Also:

LLMSeqRec: LLM Enhanced Contextual Sequential Recommender

Thu, 06 Feb 2025 10:15:56 -0700

Project Description

Project Objectives

Step 1: Data Preprocessing & Feature Creation: Develop a data processing pipeline to parse user’s sequential interaction behaviors into sequential data points for LLM-based embeddings and contextual sequential transformer modeling; Extract user behavior sequences, items’ metadata, and temporal patterns to create context-aware sequential representations for training, validation and testing; The data source can be from Amazon open public data or Movie Lense data set. The data points creation can follow SASRec (in the reference 1).
Step 2: Model Development: Design and implement LLM-enhanced sequential recommendation models, integrating pretrained language models to augment user-item interactions with semantic context; Develop an adaptive mechanism to incorporate external contextual signals, such as product descriptions, reviews into the sequential recommendation process; The baseline model can be SASRec pytorch implementation.
Step 3: Evaluation: : Benchmark LLMSeqRec against state-of-the-art sequential recommenders, evaluating on accuracy, NDCG and cold-start performance; Conduct ablation studies to analyze the impact of LLM-generated embeddings on recommendation quality; Optimize model inference speed and efficiency for real-time recommendation scenarios.

Project Deliverables

LLMSeqRec

Topics: LLM Enhanced Contextual Sequential Recommender
Skills: Proficiency in Python, Pytorch, Github, Self-attention, Transformer
Difficulty: Difficult
Size: Large (350 hours)
Mentor: Linsey Pang, Bin Dong

References:

Self-Attentive Sequential Recommendation (SASRec)
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Amazon Dataset: https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews
Movie Lense Data: https://grouplens.org/datasets/movielens/

ReIDMM: Re-identifying Multiple Objects across Multiple Streams

Thu, 06 Feb 2025 10:15:56 -0700

Project Description

Re-identifying multiple objects across multiple streams (ReIDMM) is essential in scientific research and various industries. It involves tracking and analyzing entities across different viewpoints or time frames. In astronomy, ReIDMM helps track celestial objects like asteroids and space debris using multiple observatories. In biology and ecology, it enables the identification of animals across different camera traps and aids in tracking microscopic organisms in laboratory studies. In physics and engineering, it is used for tracking particles in high-energy physics experiments, monitoring structural changes in materials, and identifying robots or drones in lab automation. Beyond scientific applications, ReIDMM plays a critical role in industries such as retail, where it tracks customer behavior across multiple stores and improves sales and prevents theft. In smart cities, it supports traffic monitoring by identifying vehicles across intersections for improved traffic flow management. In manufacturing, it enables supply chain tracking by locating packages across conveyor belts and warehouse cameras. In autonomous systems, ReIDMM enhances multi-camera sensor fusion and warehouse robotics by identifying pedestrians, obstacles, and objects across different camera views.

Project Objectives

Aligned with the vision of the 2025 Open Source Research Experience (OSRE), this project aims to develop an open-source algorithm for multiple-object re-identification across diverse open-source data streams. As highlighted earlier, this method is expected to have wide-ranging applications in both scientific research and industry. Utilizing an open-source dataset, our focus will be on re-identifying common objects such as vehicles and pedestrians. The primary challenge lies in designing a unified algorithm, ReIDMM, capable of performing robust multi-object re-identification across multiple streams. Users will be able to tag any object as a target in a video or image for tracking across streams. Below is an outline of the algorithms to be developed in this project:

Step 1: Target Object Identification: Randomly select a target object from an image or video using object detection models such as YOLOv7. These models detect objects by generating bounding boxes around them. Target objects could include vehicles, pedestrians, animals, or other recognizable entities. This step ensures an initial object of interest is chosen for re-identification.
Step 2: Feature Extraction and Embedding: Once the target object is identified, extract relevant features such as bounding box coordinates, timestamp, location metadata (if available), and visual characteristics. A multimodal embedding approach is used, where these features are transformed into a numerical representation (embedding vector) that captures the object’s unique identity. This allows for efficient comparison across different images or videos.
Step 3: Searching and Matching: To find the target object in other images or videos: (1) Extract embeddings of all objects detected in the other images/videos; (2) Compute similarity between the target object’s embedding and those of all detected objects using metrics like cosine similarity or Euclidean distance. (3) Rank objects by similarity, returning the most probable matches. The highest-ranked results are likely to be the same object observed from different angles, lighting conditions, or time frames.

Project Deliverables

This project will deliver three things, software, evaluation results and demo. The software which implements the above ReIDMM algorithm will be hosted on the github repo as open-access repositories. The evaluation results and demo will be published along the github repo.

ReIDMM

Topics: ReIDMM: Re-identifying Multiple Objects across Multiple Streams`
Skills: Proficient in Python, Experience with images processing, machine learning
Difficulty: Difficult
Size: Large (350 hours)
Mentor: Bin Dong, Linsey Pang

Reference:

Seam: Kubernetes-Aware Programmable Networking & Cloud Provisioning

Wed, 05 Feb 2025 00:00:00 +0000

Seam is a project focused on building a Kubernetes-aware programmable networking and cloud provisioning system. It combines Python, Kubernetes, P4 programming, and SmartNICs to create a robust framework for managing cloud resources, optimizing networking, and provisioning virtual machines. Students will learn about cutting-edge technologies such as Kubernetes, Docker, P4 programming, SmartNICs, KubeVirt, Prometheus, Grafana, and Flask, while working on real-world applications in high-performance computing environments. This project will help students understand the intricacies of cloud resource management and programmable networking, providing them with valuable skills for future careers in software engineering, networking, and DevOps.

The project involves creating a Python library for provisioning Kubernetes resources, including virtual machines and networking, using tools such as KubeVirt for VM provisioning and ESnet SENSE for network configuration. The library will also integrate monitoring solutions with Prometheus and Grafana for real-time metrics collection and visualization. Students will develop Flask-based dashboards for managing these resources, implement automated pipelines using GitLab CI/CD, and explore full-stack web development, database management with PostgreSQL, and API design.

In addition, students will gain hands-on experience with programmable networking using P4 and SmartNICs, learning how to write P4 programs for dynamic routing, security, and network policy enforcement at the hardware level. The integration of Kubernetes, SmartNICs, and P4 programming will allow for advanced optimizations and efficient management of high-performance cloud environments.

Thus far, the framework has been developed to allow provisioning of resources within Kubernetes, integrating Prometheus and Grafana for monitoring, and providing an interface for users to manage cloud resources. We aim to extend this by incorporating advanced network policies and improving the web interface.

Seam / Kubernetes Resource Provisioning and Management

The proposed work includes expanding the Python library to support comprehensive Kubernetes resource provisioning, network management, and virtual machine provisioning using KubeVirt. Students will enhance the current implementation to allow users to define resource limits, CPU/GPU quotas, and network policies. They will also integrate with ESnet SENSE to facilitate L2 networking, and explore the use of Prometheus and Grafana for real-time performance monitoring and metrics collection.

Topics: Kubernetes, Python, Cloud Computing, Networking, Programmable Networking, Monitoring, CI/CD
Skills: Python, Kubernetes, P4 programming, KubeVirt, ESnet SENSE, Docker, GitLab CI/CD, Prometheus, Grafana, PostgreSQL, Flask
Difficulty: Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada, Thomas A. DeFanti, Jeffrey Weekley, Derek Weitzel, Dmitry Mishin

Seam / Full-Stack Web Development and Dashboard

The proposed work includes building a Flask-based web dashboard using Bootstrap for UI, integrating it with the Python library to enable users to easily provision resources, monitor network performance, and track resource usage in real-time. The dashboard will support role-based access control (RBAC), allowing for secure multi-user management. Students will also integrate PostgreSQL for managing and storing configurations, logs, and performance metrics.

Topics: Full-Stack Web Development, Flask, Bootstrap, PostgreSQL, Kubernetes, Monitoring, DevOps
Skills: Web Development, Flask, Bootstrap, PostgreSQL, API Development, Kubernetes
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada, Thomas A. DeFanti, Jeffrey Weekley, Derek Weitzel, Dmitry Mishin

Seam / CI/CD and GitLab Integration

The proposed work includes setting up GitLab CI/CD pipelines for automated testing, deployment, and maintenance of the Python library, Kubernetes resources, and web dashboard. Students will automate the deployment of P4 programs, Kubernetes deployments, and networking configurations. They will also focus on unit testing, integration testing, and the automation of benchmarking experiments to ensure reproducibility of results.

Topics: CI/CD, GitLab, Python, Kubernetes, DevOps, Testing, Automation
Skills: GitLab CI/CD, Python, Kubernetes, Docker, Automation, Testing, Benchmarking
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada, Thomas A. DeFanti, Jeffrey Weekley, Derek Weitzel, Dmitry Mishin

Seam / Networking & SmartNIC Programming

The proposed work includes writing P4 programs to control network traffic flow, enforce network security policies, and optimize data transfer across the Kubernetes cluster. Students will gain experience with SmartNICs (Xilinx Alveo U55C, SN1000, NVIDIA Bluefield 2) and Tofino switches, using P4 to write network policies and integrate with the Kubernetes network layer (Multus, Calico). Students will also explore gRPC APIs for dynamically adjusting network policies and provisioning virtual network interfaces in real time.

Topics: Networking, P4 Programming, SmartNICs, Kubernetes Networking, Cloud Computing
Skills: P4, Networking, SmartNICs, Kubernetes Networking, Multus, Calico, gRPC
Difficulty: Hard
Size: Large (350 hours)
Mentors: Mohammad Firas Sada, Thomas A. DeFanti, Jeffrey Weekley, Derek Weitzel, Dmitry Mishin

WaDAR

Wed, 05 Feb 2025 00:00:00 +0000

WaDAR (Water Radar) is an innovative, low-cost, hybrid approach to soil moisture sensing that combines the benefits of in-ground (in situ) and remote sensing technologies. Traditional soil moisture measurement methods suffer from drawbacks: in situ sensors are expensive and difficult to maintain, while remote sensing offers lower accuracy and resolution. WaDAR bridges this gap by using inexpensive underground backscatter tags paired with above-ground radars, enabling completely wireless, high-resolution soil moisture monitoring.

Key Features of WaDAR

Uses RF backscatter tags buried underground to provide high-accuracy soil moisture readings.
Uses ultra-wideband radar for above-ground sensing.
Offers an average error of just 1.4%, comparable to state-of-the-art commercial sensors.
Reduces deployment costs significantly, making it accessible for widespread agricultural use.
Supports real-time, scalable, and maintenance-free soil moisture monitoring for farmers.

Improving and Optimizing Data Processing Pipeline for More Accurate Soil Moisture Measurements

Topics: Digital Signal Processing Machine Learning
Skills: C/embedded, signal processing, machine learning, MATLAB (optional)
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Colleen Josephson, Eric Vetha

Enhance the accuracy of soil moisture measurements by refining the data processing pipeline.

Tasks:

Develop and test algorithms for noise reduction and signal improvement.
Implement advanced filtering and statistical techniques to improve measurement precision.
Validate improvements using real-world field data.
Translate algorithms into embedded to be implemented in real-time embedded hardware.

Improving Backscatter Tag PCB

Topics: Hardware Design Signal Processing
Skills: PCB design, RF knowledge
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Colleen Josephson, Eric Vetha

Enhance the performance of WaDAR’s backscatter tags by optimizing PCB design for improved signal-to-noise ratio (SNR) and implementing a communication protocol for tag identification.

Tasks:

Redesign PCB for improved readings.
Implement and test a communication protocol to distinguish between multiple tags.
Evaluate hardware changes in real-world field conditions.
Optimize power consumption and scalability for practical deployment.

Mediglot

Tue, 04 Feb 2025 00:00:00 +0000

PolyPhy is a GPU-oriented agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data. Rooted in astronomy and inspired by nature, we have used an early prototype called Polyphorm to reconstruct the Cosmic web structure, but also to discover network-like patterns in natural language data. You can see an instructive overview of PolyPhy in our workshop and more details about our research here. Recent projects, such as Polyglot and Mediglot have focused on using PolyPhy to better visualize language embeddings.

Medicinal Language Embeddings

Topics: Large Language Models NLP Embeddings Medicine
Skills: Python, JavaScript, Data Science, Technical Communication
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Oskar Elek, Kiran Deol

This project aims to refine and enhance Mediglot, a web application for visualizing 3D medicinal embeddings, which extends the Polyglot app and leverages the PolyPhy toolkit for network-inspired data science. Mediglot currently enables users to explore high-dimensional vector representations of medicines (derived from their salt compositions) in a 3D space using UMAP, as well as analyze similarity through the innovative Monte-Carlo Physarum Machine (MCPM) metric. Unlike traditional language data, medicinal embeddings do not have an inherent sequential structure. Instead, we must work with the salt compositions of each medicine to create embeddings that are faithful to the intended purpose of each medicine.

This year, we would like to focus on exploring and integrating state-of-the-art AI techniques and algorithms to improve Mediglot’s clustering capabilities and its representation of medicinal data in 3D. The contributor will experiment with advanced large language models (LLMs) and cutting-edge AI methods to develop innovative approaches for refining clustering and extracting deeper insights from medicinal embeddings. Beyond LLMs, we would like to experiment with more traditional language processing methods to design novel embedding procedures. Additionally, we would like to experiment with other similarity metrics. While the similarity of two medicines depends on the initial embedding, we would like to examine the effects of different metrics on the kinds of insights a user can extract. Finally, the contributor is expected to evaluate and compare different algorithms for dimensionality reduction to enhance the faithfulness of the visualization and its interpretability.

The ideal contributor for this project has experience with Python (and common scientific toolkits such as NumPy, Pandas, SciPy). They will also need some experience with JavaScript and web development (MediGlot is distributed as a vanilla JS web app). Knowledge of embedding techniques for language processing is highly recommended.

Specific tasks:

Closely work with the mentors to understand the context of the project and its detailed requirements in preparation for the proposal.
Become acquainted with the tooling (PolyPhy, PolyGlot, Mediglot) prior to the start of the project period.
Explore different embedding techniques for medicinal data (including implementing novel embedding procedures).
Explore different dimensionality reduction techniques, with a focus on faithful visualizations.
Document the process and resulting findings in a publicly available report.

Enhancing PolyPhy Web Application

Topics: Web Development UI/UX Design Full Stack Development JavaScript Next.js Node.js
Skills: Full Stack Web Development, UI/UX Design, JavaScript, Next.js, Node.js, Technical Communication
Difficulty: Challenging
Size: Medium (175 hours)
Mentors: Oskar Elek, Kiran Deol

This project aims to revamp and enhance the PolyPhy web platform to better support contributors, users, and researchers. The goal is to optimize the website’s UI/UX, improve its performance, and integrate Mediglot to provide users with a seamless experience in visualizing both general network structures and 3D medicinal embeddings.

The contributor will be responsible for improving the website’s overall look, feel, and functionality, ensuring a smooth and engaging experience for both contributors and end-users. This includes addressing front-end and back-end challenges, optimizing the platform for better accessibility, and ensuring seamless integration with Mediglot.

The ideal candidate should have experience in full-stack web development, particularly with Next.js, JavaScript, and Node.js, and should be familiar with UI/UX design principles. A strong ability to communicate effectively, both in writing and through code, is essential for this role.

Specific tasks:

Collaborate with mentors to understand the project’s goals and the specific requirements for the website improvements.
UI/UX Redesign:
- Redesign and enhance the website’s navigation, layout, and visual elements to create an intuitive and visually engaging experience.
- Improve mobile responsiveness for broader accessibility across devices.
Website Performance & Stability:
- Identify and resolve performance bottlenecks, bugs, or issues affecting speed, stability, and usability.
Mediglot Integration:
- Integrate the Mediglot web application with PolyPhy, ensuring seamless functionality and a unified user experience for visualizing medicinal data alongside general network reconstructions.
Documentation:
- Document the development process, challenges, and solutions in a clear and organized manner, ensuring transparent collaboration with mentors and the community.

Environmental NeTworked Sensor (ENTS)

Fri, 31 Jan 2025 00:00:00 +0000

ENTS I: Web portal for large-scale sensor networks

Topics: Data Visualization, Backend, Frontend, UI/UX, Analytics
Skills:
- Required: React, Javascript, Python, SQL, Git
- Nice to have: Flask, Docker, CI/CD, AWS, Authentication
Difficulty: Medium
Size: Large (350 hours)
Mentors: Colleen Josephson, John Madden, Alec Levy

Below is a list of project ideas that would be beneficial to the ENTS project. You are not limited to the following projects, and encourage new ideas that enhance the platform:

Improve streaming functionality
Generic interface for sensor measurements
Logger registration
Over the air (OTA) configuration updates
Implement unit tests and API documentation

ENTS II: Hardware to for large-scale field sensor networks

Topics: Embedded system, wireless communication, low-power remote sensing
Skills:
- Required: C/C++, Git, Github, PlatformIO
- Nice to have: STM32 HAL, ESP32 Arduino, protobuf, python, knowledge of standard communication protocols (I2C, SPI, and UART)
Difficulty: Hard
Size: Large (350 hours)
Mentors: Colleen Josephson, John Madden, Jack Lin

The Environmental NeTworked Sensor (ENTS) node aims to be a general purpose hardware platform for outdoor sensing (e.g. agriculture, ecological monitoring, etc.). The typical use case involves a sensor deployment in an agricultural field, remotely uploading measurements without interfering with farming operations. The current hardware revision (Soil Power Sensor was originally designed for monitoring power output of microbial fuel cells using high fidelity voltage and current measurement channels, as well as auxiliary sensors such as the SDI-12 TEROS-21 soil moisture sensor. The primary activities of this project will involve low-level firmware design and implementation, but may also incorporate hardware design revisions if necessary. We are looking to expand functionality to other external sensors, as well as optimize for power consumption, via significant firmware design activities.

Long-range, low-power wireless communication is achieved through a LoRa capable STM32 microcontroller with in-lab experiments using an ESP32 microcontroller to enable the simpler WiFi interface. Both wireless interfaces communicate upload measurements to our data visualization dashboard, ENTS I. The combined goal across both of these projects is to create a system that enables researchers to test and evaluate novel sensing solutions. We are looking to make the device usable to a wide range of researchers which may not have a background in electronics, so are interested in design activities that enhance user friendliness.

In total there will be 2-4 people working on the hardware with progress being tracked on GitHub. Broader project planning is tracked through a Jira board. We intend to have weekly meetings to provide updates on current issue progress along with assigning tasks. Please reach out to John Madden if there are any questions or specific ideas for the project.

Below is a list of project ideas that would be beneficial to the ENTS project. You are not limited to the following projects, and encourage new ideas that enhance the platform:

Backup logging via SD card
I2C multiplexing for multiple of the same sensors
Batch sensor measurement uploading

Causeway: Scaling Experiential Learning Through Micro-Roles

Thu, 30 Jan 2025 00:00:00 +0000

Causeway is a platform for learning to develop web applications using an Angular, RxJS, NgRx, and Firebase stack. Most online coding tutorials focus on covering the technical syntax or features of a language or framework, which means that new developers don’t have great resources for building a holistic picture of how everything they learn connects to actually developing a complex web application. Causeway breaks down the process of developing a web application into a hierarchy of micro-roles which provides learners with a clear pathway for learning that also translates to a clear process for developing an application. In the longer future, this would also enable learners to easily contribute to projects as they learn through taking on micro-roles for yet-to-be-developed projects. The platform uses the Stackblitz WebContainer API to run full applications in the browser for interactive learning.

Thus far, we have developed a version of the platform that walks learners through the process of developing UI components of a web application as well as containers that contain multiple UI components and are responsible for fetching data from the backend and handling events and updates to the database. We’d like to extend the content to cover defining the database schema and entire applications, and to other topics beyond web development like AI/ML. We’d like to add quizzes to the experience and explore ways to use Generative AI to augment the learning experience, e.g. to support planning, reflection, and assessment. Finally, we’d like to instrument the application with logs and analytics so we can better measure impact and learning outcomes, and develop a stronger CI/CD pipeline.

Causeway / Improving the Core Infrastructure

The proposed work includes adding logging, analytics, and a production-level CI/CD pipeline, adding a robust testing framework, and refactoring some of our code into seperate modules. Both roles will also contribute to running usability studies and documenting the platform.

Topics: Web Development, Educational Technologies, Angular
Skills: Web development experience, HTML, CSS, Javascript, Angular, RxJS, NgRx, Firebase
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: David Lee

Causeway / Quizzes and Generative AI

The proposed work includes extending the application to support quizzes, adding quizzes for the existing tasks, and exploring the use of generative AI to support the quizzes feature. Both roles will also contribute to running usability studies and documenting the platform.

Topics: Web Development, Educational Technologies, Angular
Skills: Web development experience, HTML, CSS, Javascript, Angular, RxJS, NgRx, Firebase, Generative AI
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: David Lee

OpenROAD - An Open-Source, Autonomous RTL-GDSII Flow for Chip Design

Sun, 19 Jan 2025 00:00:00 +0000

The OpenROAD project is a non-profit project, originally funded by DARPA with the aim of creating open-source EDA tools; an Autonomous flow from RTL-GDSII that completes < 24 hrs, to lower cost and boost innovation in IC design. This project is now supported by Precision Innovations.

OpenROAD massively scales and supports EWD (Education and Workforce Development) and supports a broad ecosystem making it a vital tool that supports a rapidly growing Semiconductor Industry.

OpenROAD is the fastest onramp to gain knowledge, skills and create pathways for great career opportunities in chip design. You will develop important software and hardware design skills by contributing to these interesting projects. You will also have the opportunity to work with mentors from the OpenROAD project and other industry experts.

We welcome a diverse community of designers, researchers, enthusiasts, software engineers and entrepreneurs to use and contribute to OpenROAD and make a far-reaching impact in the rapidly growing, global Semiconductor Industry.

Improving Code Quality in OpenROAD

Topics: Coding Best Practices in C++, Code Quality Tooling, Continuous Integration
Skills: C++
Difficulty: Medium
Size: Medium (175 hours)
Mentors: Matt Liberty & Arthur Koucher

OpenROAD is a large and complex program. This project is to improve the code quality through resolving issues flagged by tools like Coverity and clang-tidy. New tools like the clang sanitizers ASAN/TSAN/UBSAN should also be set up and integrated with the Jenkins CI.

GUI Testing in OpenROAD

Topics: Testing, Continuous Integration
Skills: C++, Qt
Difficulty: Medium
Size: Large (350 hours)
Mentors: Matt Liberty & Peter Gadfort

The OpenROAD GUI is a crucial set of functionality for users to see and investigate their design. GUI testing is specialized and rather different from standard unit testing. The GUI therefore needs improvements to its testing to cover both interaction and rendering. The GUI uses the Qt framework. An open-source testing tool like https://github.com/faaxm/spix will be set up and key tests developed. This will provide the framework for all future testing.

Rectilinear Floorplans in OpenROAD

Topics: Electronic Design Automation, Algorithms
Skills: C++, data structures and algorithms
Difficulty: Medium
Size: Large (350 hours)
Mentors: Eder Monteiro & Augusto Berndt

OpenROAD supports block floorplans that are rectangular in shape. Some designs may require more complex shapes to fit. This project extends the tool to support rectilinear polygon shapes as floorplans. This will require upgrading data structures and algorithms in various parts of OpenROAD including floor plan generation, pin placement, and global placement.

LEF Reader and Database Enhancements in OpenROAD

Topics: Electronic Design Automation, Database, Parsing
Skills: Boost Spirit parsers, Database, C++
Difficulty: Medium
Size: Medium (175 hours)
Mentors: Osama Hammad & Ethan Mahintorabi

LEF (Library Exchange Format) is a standard format for describing physical design rules for integrated circuits. OpenROAD has support for many constructs but some newer ones for advanced process nodes are not supported. This project is to support parsing such information and storing in the OpenDB for use by the rest of the tool.

ORAssistant - LLM Data Engineering and Testing

Topics: Large Language Model, Machine Learning, Data Engineering, Model Deployment, Testing, Full-Stack Development
Skills: large language model engineering, database, evaluation, CI/CD, open-source or related software development, full-stack
Difficulty: Medium
Size: Medium (175 hours)
Mentor: Jack Luar & Palaniappan R

This project is aimed at enhancing robustness and accuracy for OR Assistant, the conversational assistant for OpenROAD through comprehensive testing and evaluation. You will work with members of the OpenROAD team and other researchers to enhance the existing dataset to cover a wide range of use cases to deliver accurate responses more efficiently. This project will focus on data engineering and benchmarking and you will collaborate on a project on the LLM model engineering. Tasks include: creating evaluation pipelines, building databases to gather feedback, improving CI/CD, writing documentation, and improving the backend and frontend services as needed (non-exhaustive). You will gain valuable experience and skills in understanding chip design flows and applications. Open to proposals from all levels of ML practitioners.

ORAssistant - LLM Model Engineering

Topics: Large Language Model, Machine Learning, Model Architecture, Model Deployment
Skills: large language model engineering, prompt engineering, fine-tuning
Difficulty: Medium
Size: Medium (175 hours)
Mentor: Jack Luar & Palaniappan R

This project is aimed at enhancing robustness and accuracy for OR Assistant, the conversational assistant for OpenROAD through enhanced model architectures. You will work with members of the OpenROAD team and other researchers to explore alternate architectures beyond the existing RAG-based implementation. This project will focus on improving reliability and accuracy of the existing model architecture. You will collaborate on a tandem project on data engineering for OR assistant. Tasks include: reviewing and understanding the state-of-the-art in retrieval augmented generation, implementing best practices, caching prompts, improving relevance and accuracy metrics, writing documentation and improving the backend and frontend services as needed (non-exhaustive). You will gain valuable experience and skills in understanding chip design flows and applications. Open to proposals from all levels of ML practitioners.

RAG-ST: Retrieval-Augmented Generation for Spatial Transcriptomics

Wed, 15 Jan 2025 00:00:00 +0000

Topics: bioinformatics, spatial transcriptomics, gene expression generation, retrieval-augmented generation, large models
Skills:
- Programming Languages:
  - Proficient in Python, and familiarity with machine learning libraries such as PyTorch.
- Data Analysis:
  - Experience with spatial transcriptomics datasets and statistical modeling.
- Machine Learning:
  - Understanding of vision models, retrieval-based systems, and MLP architectures.
- Bioinformatics Knowledge (preferred):
  - Familiarity with scRNA-seq data integration and computational biology tools.
Difficulty: Advanced
Size: Large (350 hours). Given the scope of integrating RAG models, building a robust database, and ensuring interpretable predictions, this project involves substantial computational and data preparation work.
Mentors: Ziheng Duan (contact person)

Project Idea Description

Spatial transcriptomics (ST) is a revolutionary technology that provides spatially resolved gene expression measurements, enabling researchers to study cellular behaviour within tissues with unprecedented detail. This technology has transformed our understanding of complex biological systems, such as disease progression, tissue development, and cellular heterogeneity. However, the widespread adoption of ST is limited by its high cost and technical requirements.

Histology imaging, on the other hand, is far more accessible and cost-effective. If gene expression could be accurately predicted from histology images, it would enable researchers to leverage these abundant images for high-resolution biological insights without the need for expensive spatial transcriptomics experiments. This task has immense potential to democratize spatial transcriptomics research and significantly reduce costs.

Challenges in Current Approaches

Current methods for predicting gene expression from histology images typically involve:

Using large vision models to encode histology image patches into embeddings.
Employing Multi-Layer Perceptrons (MLPs) to map these embeddings to gene expression profiles.

While these approaches have shown promise, they suffer from two critical limitations:

Accuracy: The MLP-based mappings often fail to fully capture the biological complexity encoded in the histology images, leading to suboptimal predictions.
Interpretability: These models act as black boxes, providing no insight into the underlying biological rationale for the predictions. Researchers cannot determine why a specific gene expression profile was generated, limiting trust and utility in biological contexts.

Project Motivation

To overcome these limitations, this project proposes a novel Retrieval-Augmented Generation (RAG) framework for spatial transcriptomics. Instead of relying solely on black-box MLPs, RAG-ST will:

Retrieve relevant examples from a curated database of paired histology images, scRNA-seq data, and gene expression profiles.
Use these retrieved examples to inform and enhance the generation process, resulting in predictions that are both more accurate and biologically interpretable.

This approach not only grounds predictions in biologically meaningful data but also provides transparency by revealing which database entries influenced the results.

Project Objectives

Database Construction:
- Curate a large and diverse database of histology images paired with scRNA-seq and gene expression data.
Model Development:
- Develop a RAG framework combining vision-based encoders and retrieval-enhanced generation techniques.
- Incorporate interpretability mechanisms to link predicted gene expressions to retrieved examples.
Evaluation and Benchmarking:
- Assess RAG-ST against state-of-the-art methods, focusing on accuracy, interpretability, and biological validity.

Project Deliverables

Curated Database:
- A publicly available, well-documented database of histology images and gene expression profiles.
RAG-ST Framework:
- An open-source Python implementation of the RAG-ST model, with retrieval, generation, and visualization tools.
Benchmark Results:
- Comprehensive evaluations demonstrating the benefits of RAG-ST over conventional pipelines.
Documentation and Tutorials:
- User-friendly guides to facilitate adoption by the spatial transcriptomics research community.

Impact

By integrating retrieval-augmented generation with large models, RAG-ST represents a paradigm shift in spatial transcriptomics. It offers a cost-effective, accurate, and interpretable solution for gene expression prediction, democratizing access to high-quality spatial transcriptomic insights and fostering advancements in biological research.

Final Report: Stream processing support for FasTensor

Fri, 30 Aug 2024 00:00:00 +0000

Final Report: Stream processing support for FasTensor

Project Description

FasTensor is a scientific computing library specialized in performing computations over dense matrices that exhibit spatial locality, a characteristic often found in physical phenomena data. Our GSoC'24 project aimed to enhance FasTensor by enabling it to ingest and process live data streams from sensors and scientific equipment.

What is FasTensor?

Imagine you’re working on a physical simulation or solving partial differential equations (PDEs). You’ve discretized your PDE, but now you face a new challenge: you need to run your computations fast and parallelize them across massive compute clusters.

At this point, you find yourself describing a stencil [1] operation. But should you really spend your time tinkering with loop orders, data layouts, and countless other side-quests unrelated to your core problem?

This is where FasTensor comes in: Describe your computation as a stencil, and it takes care of ensuring optimal execution. FasTensor lets you focus on the science, not the implementation details.

Repository Links

FasTensor: https://github.com/BinDong314/FasTensor
My fork: https://github.com/my-name/FasTensor/tree/ftstream

PR(s)

Work done this summer

Develop Streaming simulator: FTStream

I was first entasked by Dr. Bin to develop a stream simulator for testing the streaming capability of FasTensor. For testing purposes, a stream is characterized by file size, count, and arrival interval. FTStream can generate streams of various sizes and intervals, up to the theoretical limits of disk and filesystem. We’re talking speeds up to 2.5 GiB/s on a non-parallel NVMe!

Writing this tool was an adventure in throughput testing and exploring APIs. I wrote multiple drivers, each for a different whim and hijinks of systems in the HPC world. Here’s a brief journey through the APIs we explored:

HDF5 APIs: Pretty fast in flush-to-disk operation, but the API design strongly binds to file handles, which inhibits high throughput duplication.
HDF5 VFL and VOL: We dabbled in these dark arts, but there be dragons! Keeping a long-term view of maintenance, we dropped the idea.
POSIX O_DIRECT: This involved getting your buffers aligned right and handling remainders correctly. A step up, but not quite at the theoretical limits.
Linux AIO: Streaming is latency sensitive domain, to reach the theoretical limits, every syscall saved matters. Linux AIO allowed us syscall batching with io_submit(). It took a few testing sessions to get the correct combo of queue depth, buffer size, and alignment right.

We settled on O_DIRECT + Linux AIO. Feel free to modify ftstream/fastflush.h to suit your needs.

Stream Support

FasTensor has just one simple paradigm: you give it a data source, an output data store, and your transform, and it handles all the behind-the-scenes grunt work of computing over big datasets so you can focus on your research.

We aimed to achieve the same for streaming: Drop in the STREAM keyword, append a pattern identifying your stream, and use your usual transform.

Voila! Now your previous FasTensor code supports live data streams.

Technical tidbits:

Implements a manager-worker pattern to allow us flexibility in the future to implement different stream semantics such as windowing, CPU-memory based load balancing
Supports streams of indefinite size

Challenges

HPC has its fair share of challenges. Things you take for granted might not be available there, and it takes a while to adjust to paradigms of scale and parallelization.

For example, when developing FTStream, we found O_DIRECT is available on some parallel file systems like GPFS but not supported on Lustre/CFS. We developed a separate MPIO driver for FTStream that will be upstreamed once thoroughly tested on Lustre.

Future Work

Implement windowing and explore more advanced stream semantics.
Implement support for for defining workload policies
Optimize interleaving IO and Compute.

References

[1] Anshu Dubey. 2014. Stencils in Scientific Computations. In Proceedings of the Second Workshop on Optimizing Stencil Computations (WOSC ‘14). Association for Computing Machinery, New York, NY, USA, 57. https://doi.org/10.1145/2686745.2686756

Acknowledgement

I struck gold when it comes to mentors.

Dr. Bin Dong was really kind and supportive throughout the journey. From the very first steps of giving a tour around the codebase to giving me a lot of freedom to experiment, refactor, and refine.

Dr. John Wu was encouraging and nurturing of budding talent. We had great research presentations every Monday apart from usual mentor interactions, where different research groups presented their talks and students were invited to present their progress.

I’ve come across Quantum computing many times in the news, but I never thought I’d get a frontline preview from the researchers working at the bleeding edge at the Lawrence Berkeley National Laboratory (LBL).

This GSoC experience, made possible by Google and UC OSPO, has been invaluable for my growth as a developer and researcher.

For people interested in HPC, ML, Systems, or Reproducibility, I encourage you all to apply to UC OSPO. It’s been an incredible journey, and I’m grateful for every moment of it!

ORAssistant - LLM Assistant for OpenROAD

Tue, 27 Aug 2024 00:00:00 +0000

Introduction

Hello! I’m Palaniappan R, an undergraduate student at BITS Pilani, India. Over the past few months, I’ve been working as a GSoC contributor on the LLM Assistant for OpenROAD - Model Architecture and Prototype project, under the mentorship of Indira Iyer and Jack Luar.

The primary objective of my project is to improve the user experience within OpenROAD and OpenROAD-flow-scripts by utilizing Large Language Models(LLMs) to offer fast, relevant answers to FAQs and common issues. The ORAssistant chatbot aims to act as a first line of support, addressing basic queries in domains such as installation and command usage. Its goal is to resolve simple issues before they escalate to public forums, thereby reducing the number of support tickets on platforms like GitHub Issues.

Architecture Overview

Retrieval-augmented-generation (RAG) is a technique that improves the q&a capabilities and reliability of LLMs by incorporating factual information from external sources. When a user submits a query, the RAG process begins by fetching relevant information from a knowledge base. The retrieved content, combined with the original query is the provided to the LLM to generate a relevant, informed response.

The Knowledge Base

ORAssistant is designed to answer queries about all the major tools in the OR flow. The knowledge base primarily consists of official documentation from OpenROAD, OpenROAD-flow-scripts, and their respective manpages. Instead of scraping these primary sources from their websites, the docs are built to the desired markdown format directly from the respective GitHub repositories, using specific commit hashes for reproducibility. The knowledge base also includes documentation from other essential applications in the EDA flow, such as Yosys and OpenSTA. Additionally, it includes scraped and annotated conversational data from discussions on the OpenROAD and OpenROAD-flow-scripts GitHub pages.

The entire dataset building process has been automated, allowing for dynamic updates to accommodate any live changes.

The Tool-Based Architecture

After experimenting with multiple RAG approaches, a tool-based setup proved to be the most effective solution. Data from various domains are embedded into vector databases, and hybrid search retriever functions are applied to these vector stores. These functions are organized as individual tools that can be called by the chatbot. To maintain context, each query is rephrased while considering the chat history. This ensures a more precise and context-rich query. Please refer to my previous blog post for more information on the retrieval tools.

As depicted in the flowchart, a preliminary LLM call analyzes the input query, rephrases it based on the chat history and picks the appropriate tools for the rephrased query. Subsequently, documents are retrieved using the tool and sent to the LLM, which produces a relevant, context-aware response.

Using ORAssistant

ORAssistant is currently hosted at this link.

To set up out ORAssistant locally, find detailed instructions in the GitHub Repo. Both cloud based LLM providers (Gemini, VertexAI) and local options (Ollama) are supported.

Here’s an example of ORAssistant in action,

Future Plans

To further enhance the usability of ORAssistant, there are plans to add support for flow script generation. This will become possible after adding a dedicated script generation tool into the current tool-based workflow. Support for more tools in the EDA flow, such as KLayout will also be added in the near future.

Additionally, ORAssistant is planned to be integrated directly into OpenROAD’s CLI and GUI interfaces.

As I near the end of my GSoC, I’d like to thank the GSoC Organizing Committee, UC OSPO and The OpenROAD Project for this incredible opportunity. I’m immensely grateful to Indira Iyer and Jack Luar for their support and guidance throughout my GSoC journey. Thank You.

Hardware Hierarchical Dynamical Systems

Sat, 24 Aug 2024 00:00:00 +0000

Hi everyone! I am Ujjwal Shekhar, a Computer Science student at the International Institute of Information Technology - Hyderabad. I am excited to share my work on the project titled “Hardware Hierarchical Dynamical Systems” as part of the Open Source Research Experience (OSRE) program and Google Summer of Code. This project has been an incredible journey, and I’ve had the privilege of working with my mentors, Jose Renau and Sakshi Garg.

Project Overview and Goals

Abstract Syntax Trees (ASTs) are fundamental to modern compilers, serving as the backbone for parsing and transforming code. When compiling hardware code, the sheer volume of data can make compilation times a significant bottleneck. My project focuses on building a memory-optimized tree data structure specifically tailored for AST-typical queries.

The LiveHD repository, developed by the Micro Architecture Lab at UCSC, offers a compiler infrastructure optimized for hardware synthesis and simulation. The existing LHTree data structure provides a foundation, but there was significant potential for further optimization, which I explored throughout this project.

Key AST Queries

The core queries that the tree is optimized for include:

Finding the parent of a node.
Finding the first and last child of a node.
Locating the previous and next sibling of a node.
Adding a child to a node.
Inserting a sibling to a node.
Performing preorder, postorder, and sibling order traversal.
Removing a leaf or an entire subtree from the tree.

The primary goal was to create a tree class that excels at handling these queries efficiently, while still being robust enough to support less frequent operations. The new HHDS tree structure has demonstrated superior performance for specific tree configurations and continues to show potential across other types, particularly in memory consumption and cache efficiency, compared to the current LHTree.

The benchmarks were done using Google Bench to test the tree for scalability and performance. The new version of the tree is currently being integrated into the LiveHD core repository. Profiling to find bottlenecks in the tree was also done using Callgrind and KCachegrind.

Background and Motivation

Naive approach

A straightforward method for storing an n-ary tree is to maintain pointers from each node to its parent, children, and immediate siblings. While simple, this approach is memory-intensive and has poor cache efficiency due to the non-contiguous nature of nodes in memory. The variable memory usage per node, depending on the number of children, can also introduce significant overhead.

Enhancements to the Naive Approach

To reduce memory overhead, one optimization is to store only pointers to the first and last child within each node. This reduces memory usage to a constant per node. Additionally, since many AST-related queries focus on the tree’s structure rather than the data itself, we can separate the data from the structure. The tree would store only pointers to the data, allowing the tree structure to be optimized independently of the data storage.

While separating the data and the structure may seem like an obvious improvement, we will see that it can be extended to provide greater benefits.

Improving the cache efficiency

While reducing memory consumption is beneficial, the tree’s cache efficiency can still be suboptimal if the children of a node are scattered in memory. To enhance cache efficiency, storing children in contiguous memory locations is crucial. This improves spatial locality, which in turn boosts cache performance. Additionally, this approach eliminates the need to explicitly store data pointers in the tree, as the data resides at a contiguous memory index aligned with the bookkeeping.

By storing children contiguously, we can also eliminate the need for previous and next sibling pointers, as siblings are inherently adjacent in memory. Similarly, we can avoid storing the parent pointer for every child, since all children share the same parent.

Optimizations in LHTree (Old method)

The LHTree class in LiveHD was designed with these optimizations in mind. It groups siblings into chunks of four, storing the parent pointer only in the first sibling of each chunk. The last sibling in each chunk points to the next chunk, minimizing the number of pointers required and thus reducing memory overhead.

LHTree organizes the entire tree as a 2-dimensional array, where the first dimension represents the tree level and the second dimension represents the node index at that level. This structure improves cache efficiency by storing nodes contiguously in memory. Each tree position is a 48-bit ID, with the last 32 bits representing the node’s index and the first 16 bits indicating the tree level.

This explicit maintenance of level separately limits the tree’s scalability for deeper trees, due to the fixed number of bits allocated for the level.

Despite these optimizations, LHTree has some limitations, particularly in cache alignment and flexibility, which the HHDS tree aims to address.

Unfortunately, the number of bits required by each “chunk” happens to be slightly bigger than a single cache line (512 bits). This means that the cache efficiency of the tree is not optimal.

HHDS Tree : A New Approach

Eliminating Levels

The HHDS tree stores everything in a single vector, removing the need for explicit level information. This simplification not only improves cache efficiency but also eliminates restrictions on the number of nodes per level and the total number of levels.

Enhanced Cache Alignment

In the HHDS tree, each node has a 46-bit ID. Chunks in the HHDS tree contain up to eight children, with the first 43 bits of the absolute ID serving as the chunk ID and the last three bits indicating the node’s offset within the chunk.

For each chunk, which is exactly 64 bytes (or 512 bits) long—matching the size of a cache line—the following information is stored:

A 46-bit parent pointer (absolute ID).
A 43-bit first child long pointer (chunk ID).
A 43-bit last child long pointer (chunk ID).
43-bit previous and next sibling chunk pointers.
Seven 21-bit short delta pointers for the first child.
Seven 21-bit short delta pointers for the last child.

NOTE: The 0th chunk is an INVALID node, the real nodes start from the 1st chunk, with the node at an absolute ID of 8 (chunk ID of 1) being the root node.

Refer to the next section for more information on the short delta pointers.

The chunk is 512 bits long, which is 64 bytes, exactly the size of a cache line. Thus the amount of memory required in the worst case is 512 bits for a single node in the chunk, and in the best case is 46 bits for all 8 nodes in the chunk.

We utilized the __attribute__((packed, aligned(64))) attribute in C++ to ensure that each chunk aligns perfectly with a cache line. Bitfields were employed to pack the data efficiently within the chunk.

class __attribute__((packed, aligned(64))) Tree_pointers {
private:
 // We only store the exact ID of parent
 Tree_pos parent : CHUNK_BITS + CHUNK_SHIFT;
 Tree_pos next_sibling : CHUNK_BITS;
 Tree_pos prev_sibling : CHUNK_BITS;

 // Long child pointers
 Tree_pos first_child_l : CHUNK_BITS;
 Tree_pos last_child_l : CHUNK_BITS;

 // Short (delta) child pointers
 // You cannot make an array of bitfields inside a packed
 // struct, since the compiler will align each bitfield to the
 // size of the nearest power of two.
 Short_delta first_child_s_0 : SHORT_DELTA;
 Short_delta first_child_s_1 : SHORT_DELTA;
 Short_delta first_child_s_2 : SHORT_DELTA;
 Short_delta first_child_s_3 : SHORT_DELTA;
 Short_delta first_child_s_4 : SHORT_DELTA;
 Short_delta first_child_s_5 : SHORT_DELTA;
 Short_delta first_child_s_6 : SHORT_DELTA;

 Short_delta last_child_s_0 : SHORT_DELTA;
 Short_delta last_child_s_1 : SHORT_DELTA;
 Short_delta last_child_s_2 : SHORT_DELTA;
 Short_delta last_child_s_3 : SHORT_DELTA;
 Short_delta last_child_s_4 : SHORT_DELTA;
 Short_delta last_child_s_5 : SHORT_DELTA;
 Short_delta last_child_s_6 : SHORT_DELTA;
}

Build Append - Short Delta Heuristic

Empirical observations show that children are often added to a node shortly after the parent, meaning they are stored close to the parent in memory. This allows children to be stored as a delta from the parent, reducing the need for full chunk IDs.

When adding a child:

Attempt to store the child as a delta from the parent.
If not feasible, allocate a new chunk for the parent and store the pointer to the child chunk in the newly created parent chunk.

Implementing chunk breaking required careful handling to ensure that when a parent moves to a new chunk, its new chunk can still be referenced efficiently by its parent, potentially requiring recursive adjustments.

This is because the grandparent might not be able to store the parent as a delta from itself after the parent moves to a new chunk.

Compliance with the LiveHD core repository

Since the HHDS tree is an evolution of the LHTree, it was crucial to maintain compatibility with the LiveHD core repository. All necessary methods were implemented in the HHDS tree to ensure seamless integration. Naming conventions and syntax were kept consistent with the LHTree to facilitate a smooth transition.

Exposed methods in the HHDS tree are:

/**
 * Query based API (no updates)
 */
 Tree_pos get_parent (const Tree_pos& curr_index) const;
 Tree_pos get_last_child (const Tree_pos& parent_index) const;
 Tree_pos get_first_child (const Tree_pos& parent_index) const;
 bool is_last_child (const Tree_pos& self_index) const;
 bool is_first_child (const Tree_pos& self_index) const;
 Tree_pos get_sibling_next (const Tree_pos& sibling_id) const;
 Tree_pos get_sibling_prev (const Tree_pos& sibling_id) const;
 bool is_leaf (const Tree_pos& leaf_index) const;


/**
 * Update based API (Adds and Deletes from the tree)
 */
 // FREQUENT UPDATES
 Tree_pos append_sibling(const Tree_pos& sibling_id, const X& data);
 Tree_pos add_child(const Tree_pos& parent_index, const X& data);
 Tree_pos add_root(const X& data);

 void delete_leaf(const Tree_pos& leaf_index);
 void delete_subtree(const Tree_pos& subtree_root);

 // INFREQUENT UPDATES
 Tree_pos insert_next_sibling(const Tree_pos& sibling_id,
 const X& data);

Benchmarking Results

Preliminary benchmarks indicate that the HHDS tree outperforms the LHTree in both runtime efficiency (for certain cases, more on this in a later section) and memory consumption. The HHDS tree demonstrates enhanced performance across various tests, offering a more optimized solution for handling Abstract Syntax Tree (AST) operations.

I constructed identical trees using both the LHTree and HHDS tree structures and executed a series of queries on each. The benchmarks were performed using Google Benchmark to ensure accurate and consistent results. Below, I detail the specific tests conducted.

Benchmark Tests Overview

Deep Tree Test
This test simulates a line graph by repeatedly adding a child to the last node in the tree. It is designed to assess the tree’s performance when handling deep structures, where each node has a single child.
Wide Tree Test
In this scenario, a single root node is created, followed by the addition of numerous child nodes directly under the root. This test evaluates the tree’s efficiency in managing wide structures with many immediate children.
Chip-Typical Tree Test
This test models a tree commonly seen in hardware design. For each node, a random number of children (ranging from 1 to 7) are added, and the process is recursively applied to the leaf nodes up to a certain depth. This test measures the tree’s performance in realistic, varied conditions.
Chip-Typical (Long) Tree Test
Similar to the Chip-Typical Tree Test, but with a broader range of children per node (1 to 20). This test is particularly useful for examining performance when the tree is more complex and chunk splitting is more likely.

These tests provide a comprehensive analysis of the HHDS tree’s capabilities, highlighting its superiority over the LHTree for deeper trees.

Add/Append Benchmarks

Deep Tree Test

test_deep_tree_100_hhds indicates the time taken to run a benchmark on a deep tree of 100 nodes using the HHDS tree structure. This nomenclature is consistent across all tests.

Disabled compiler optimizations

------------------------------------------
Benchmark Time
------------------------------------------
test_deep_tree_10_hhds 11704 ns
test_deep_tree_10_lh 19541 ns
test_deep_tree_100_hhds 85317 ns
test_deep_tree_100_lh 163058 ns
test_deep_tree_1000_hhds 760260 ns
test_deep_tree_1000_lh 1442391 ns
test_deep_tree_10000_hhds 9889199 ns
test_deep_tree_10000_lh 16215232 ns
test_deep_tree_100000_hhds 84650074 ns
test_deep_tree_100000_lh 163255882 ns
test_deep_tree_1000000_hhds 877646208 ns
test_deep_tree_1000000_lh 1659725904 ns
test_deep_tree_10000000_hhds 9256118059 ns
test_deep_tree_10000000_lh 1.4431e+10 ns

Enabled compiler optimizations

------------------------------------------
Benchmark Time
------------------------------------------
test_deep_tree_10_hhds 1443 ns
test_deep_tree_10_lh 1462 ns
test_deep_tree_100_hhds 7398 ns
test_deep_tree_100_lh 17455 ns
test_deep_tree_1000_hhds 79544 ns
test_deep_tree_1000_lh 165656 ns
test_deep_tree_10000_hhds 1337406 ns
test_deep_tree_10000_lh 1494153 ns
test_deep_tree_100000_hhds 12288324 ns
test_deep_tree_100000_lh 14897463 ns
test_deep_tree_1000000_hhds 116810846 ns
test_deep_tree_1000000_lh 188815892 ns
test_deep_tree_10000000_hhds 2338596582 ns
test_deep_tree_10000000_lh 2238844395 ns

Here, the HHDS tree structure consistently outperforms the LHTree in the Deep Tree Test, showcasing its efficiency in handling deep tree structures.

Wide Tree Test

Disabled compiler optimizations

------------------------------------------
Benchmark Time
------------------------------------------
test_wide_tree_10_hhds 6581 ns
test_wide_tree_10_lh 6235 ns
test_wide_tree_100_hhds 34911 ns
test_wide_tree_100_lh 35734 ns
test_wide_tree_1000_hhds 323228 ns
test_wide_tree_1000_lh 312755 ns
test_wide_tree_10000_hhds 3547963 ns
test_wide_tree_10000_lh 2975894 ns
test_wide_tree_100000_hhds 33800125 ns
test_wide_tree_100000_lh 32538424 ns
test_wide_tree_1000000_hhds 332509041 ns
test_wide_tree_1000000_lh 336261868 ns
test_wide_tree_10000000_hhds 3527352810 ns
test_wide_tree_10000000_lh 8774024963 ns

Enabled compiler optimizations

------------------------------------------
Benchmark Time
------------------------------------------
test_wide_tree_10_hhds 837 ns
test_wide_tree_10_lh 512 ns
test_wide_tree_100_hhds 3394 ns
test_wide_tree_100_lh 2675 ns
test_wide_tree_1000_hhds 26019 ns
test_wide_tree_1000_lh 20141 ns
test_wide_tree_10000_hhds 319068 ns
test_wide_tree_10000_lh 245964 ns
test_wide_tree_100000_hhds 3369183 ns
test_wide_tree_100000_lh 2910862 ns
test_wide_tree_1000000_hhds 39243340 ns
test_wide_tree_1000000_lh 26777306 ns
test_wide_tree_10000000_hhds 454508781 ns
test_wide_tree_10000000_lh 331688046 ns

Here without compiler optimizations, the HHDS tree structure typically outperforms the LHTree in the Wide Tree Test for large tree sizes. For smaller tree sizes, the LHTree showed a slightly better performance. However, using compiler optimizations, the LHTree starts to perform better than HHDS.

The reason for the HHDS tree’s superior performance can be attributed to the chunk size being large, which allows for better cache utilization and reduced memory overhead. However, the LH Tree has been put through more tuning and has been in use for a longer time, which could explain its better performance with compiler optimizations. In the future, the HHDS tree could be optimized further to match or exceed the LH Tree’s performance.

Chip Typical Tree Test

Disabled compiler optimizations

--------------------------------------------------------
Benchmark Time
--------------------------------------------------------
test_chip_typical_tree_1_hhds 7109 ns
test_chip_typical_tree_1_lh 6803 ns
test_chip_typical_tree_2_hhds 22728 ns
test_chip_typical_tree_2_lh 22064 ns
test_chip_typical_tree_3_hhds 75398 ns
test_chip_typical_tree_3_lh 70910 ns
test_chip_typical_tree_4_hhds 270062 ns
test_chip_typical_tree_4_lh 254423 ns
test_chip_typical_tree_5_hhds 1110254 ns
test_chip_typical_tree_5_lh 1074439 ns
test_chip_typical_tree_6_hhds 5024264 ns
test_chip_typical_tree_6_lh 3900709 ns
test_chip_typical_tree_7_hhds/iterations:5 13290739 ns
test_chip_typical_tree_7_lh/iterations:5 22145462 ns
test_chip_typical_tree_8_hhds/iterations:5 83438683 ns
test_chip_typical_tree_8_lh/iterations:5 105475664 ns

Enabled compiler optimizations

--------------------------------------------------------
Benchmark Time
--------------------------------------------------------
test_chip_typical_tree_1_hhds 938 ns
test_chip_typical_tree_1_lh 387 ns
test_chip_typical_tree_2_hhds 1877 ns
test_chip_typical_tree_2_lh 1351 ns
test_chip_typical_tree_3_hhds 7095 ns
test_chip_typical_tree_3_lh 5052 ns
test_chip_typical_tree_4_hhds 35019 ns
test_chip_typical_tree_4_lh 21569 ns
test_chip_typical_tree_5_hhds 130915 ns
test_chip_typical_tree_5_lh 78010 ns
test_chip_typical_tree_6_hhds 522385 ns
test_chip_typical_tree_6_lh 278223 ns
test_chip_typical_tree_7_hhds/iterations:5 4015636 ns
test_chip_typical_tree_7_lh/iterations:5 1648426 ns
test_chip_typical_tree_8_hhds/iterations:5 9873724 ns
test_chip_typical_tree_8_lh/iterations:5 4607773 ns

For the Chip Typical test, the HHDS tree’s performance is better for larger tree sizes, while the LHTree performs better for smaller tree sizes. However, with compiler optimizations, the LH Tree performs better than the HHDS tree.

Chip Typical (long) Tree test

Disabled compiler optimizations

-------------------------------------------------------------
Benchmark Time
-------------------------------------------------------------
test_chip_typical_long_tree_1_hhds 8875 ns
test_chip_typical_long_tree_1_lh 8479 ns
test_chip_typical_long_tree_2_hhds 62490 ns
test_chip_typical_long_tree_2_lh 64620 ns
test_chip_typical_long_tree_3_hhds 625064 ns
test_chip_typical_long_tree_3_lh 654787 ns
test_chip_typical_long_tree_4_hhds 6128047 ns
test_chip_typical_long_tree_4_lh 6528778 ns
test_chip_typical_long_tree_5_hhds 71345448 ns
test_chip_typical_long_tree_5_lh 77170587 ns
test_chip_typical_long_tree_6_hhds/iterations:5 656595039 ns
test_chip_typical_long_tree_6_lh/iterations:5 860193491 ns

Enabled compiler optimizations

-------------------------------------------------------------
Benchmark Time
-------------------------------------------------------------
test_chip_typical_long_tree_1_hhds 1139 ns
test_chip_typical_long_tree_1_lh 692 ns
test_chip_typical_long_tree_2_hhds 8666 ns
test_chip_typical_long_tree_2_lh 5238 ns
test_chip_typical_long_tree_3_hhds 90856 ns
test_chip_typical_long_tree_3_lh 48758 ns
test_chip_typical_long_tree_4_hhds 1034346 ns
test_chip_typical_long_tree_4_lh 472964 ns
test_chip_typical_long_tree_5_hhds 13040238 ns
test_chip_typical_long_tree_5_lh 5025192 ns
test_chip_typical_long_tree_6_hhds/iterations:3 131143411 ns
test_chip_typical_long_tree_6_lh/iterations:3 68739573 ns

Similar to the previous case, the HHDS tree performs better in debug mode (without compiler optimizations). However, the LH Tree performs better with compiler optimizations.

We see that the HHDS tree has shown overall better performance without compiler optimizations, however, with compiler optimizations, the LH Tree has shown better performance. HHDS Tree has shown better performance regardless, for the Deep Tree test. This indicates an inherent trade-off between the choice of both trees. To further investigate this behaviour I conducted some profiling, which is in a later section.

Iterators Benchmarks

Deep Tree test

Disabled compiler optimizations

--------------------------------------------------------
Benchmark Time
-------------------------------------------------------
test_deep_tree_10_hhds 884 ns
test_deep_tree_10_lh 1356 ns
test_deep_tree_100_hhds 7987 ns
test_deep_tree_100_lh 11191 ns
test_deep_tree_1000_hhds 86991 ns
test_deep_tree_1000_lh 105809 ns
test_deep_tree_10000_hhds 894127 ns
test_deep_tree_10000_lh 1076983 ns
test_deep_tree_100000_hhds 7927102 ns
test_deep_tree_100000_lh 11177187 ns
test_deep_tree_1000000_hhds/iterations:4 80470145 ns
test_deep_tree_1000000_lh/iterations:4 145763040 ns
test_deep_tree_10000000_hhds/iterations:3 1055529435 ns
test_deep_tree_10000000_lh/iterations:3 995416880 ns

Enabled compiler optimizations

------------------------------------------
Benchmark Time
------------------------------------------
test_deep_tree_10_hhds 202 ns
test_deep_tree_10_lh 93.1 ns
test_deep_tree_100_hhds 1595 ns
test_deep_tree_100_lh 1039 ns
test_deep_tree_1000_hhds 15663 ns
test_deep_tree_1000_lh 11000 ns
test_deep_tree_10000_hhds 164778 ns
test_deep_tree_10000_lh 107293 ns
test_deep_tree_100000_hhds 1615928 ns
test_deep_tree_100000_lh 1260507 ns
test_deep_tree_1000000_hhds 19582402 ns
test_deep_tree_1000000_lh 15954697 ns
test_deep_tree_10000000_hhds 214887559 ns
test_deep_tree_10000000_lh 179118729 ns

Wide Tree test

Disabled compiler optimizations

-------------------------------------------------------
Benchmark Time
-------------------------------------------------------
test_wide_tree_10_hhds 7171 ns
test_wide_tree_10_lh 7098 ns
test_wide_tree_100_hhds 6204 ns
test_wide_tree_100_lh 10372 ns
test_wide_tree_1000_hhds 62762 ns
test_wide_tree_1000_lh 106132 ns
test_wide_tree_10000_hhds 622999 ns
test_wide_tree_10000_lh 1124283 ns
test_wide_tree_100000_hhds 6118490 ns
test_wide_tree_100000_lh 9550170 ns
test_wide_tree_1000000_hhds/iterations:10 59438777 ns
test_wide_tree_1000000_lh/iterations:10 97842431 ns
test_wide_tree_10000000_hhds/iterations:7 778347697 ns
test_wide_tree_10000000_lh/iterations:7 1163215808 ns

Enabled compiler optimizations

------------------------------------------
Benchmark Time
------------------------------------------
test_wide_tree_10_hhds 2103 ns
test_wide_tree_10_lh 1284 ns
test_wide_tree_100_hhds 1563 ns
test_wide_tree_100_lh 632 ns
test_wide_tree_1000_hhds 15627 ns
test_wide_tree_1000_lh 6410 ns
test_wide_tree_10000_hhds 149588 ns
test_wide_tree_10000_lh 56030 ns
test_wide_tree_100000_hhds 1511278 ns
test_wide_tree_100000_lh 563926 ns
test_wide_tree_1000000_hhds 17056051 ns
test_wide_tree_1000000_lh 7754815 ns
test_wide_tree_10000000_hhds 143994848 ns
test_wide_tree_10000000_lh 55040231 ns

Chip typical test

Disabled compiler optimizations

--------------------------------------------------------
Benchmark Time
--------------------------------------------------------
test_chip_typical_tree_1_hhds 344 ns
test_chip_typical_tree_1_lh 892 ns
test_chip_typical_tree_2_hhds 2192 ns
test_chip_typical_tree_2_lh 1691 ns
test_chip_typical_tree_3_hhds 13628 ns
test_chip_typical_tree_3_lh 14235 ns
test_chip_typical_tree_4_hhds 34049 ns
test_chip_typical_tree_4_lh 84096 ns
test_chip_typical_tree_5_hhds 206482 ns
test_chip_typical_tree_5_lh 203680 ns
test_chip_typical_tree_6_hhds 848996 ns
test_chip_typical_tree_6_lh 708212 ns
test_chip_typical_tree_7_hhds/iterations:5 3645372 ns
test_chip_typical_tree_7_lh/iterations:5 6657982 ns
test_chip_typical_tree_8_hhds/iterations:5 7375050 ns
test_chip_typical_tree_8_lh/iterations:5 4577351 ns

Enabled compiler optimizations

-------------------------------------------
Benchmark Time
-------------------------------------------
test_chip_typical_tree_1_hhds 93.1 ns
test_chip_typical_tree_1_lh 50.1 ns
test_chip_typical_tree_2_hhds 149 ns
test_chip_typical_tree_2_lh 212 ns
test_chip_typical_tree_3_hhds 1166 ns
test_chip_typical_tree_3_lh 554 ns
test_chip_typical_tree_4_hhds 7385 ns
test_chip_typical_tree_4_lh 3138 ns
test_chip_typical_tree_5_hhds 54477 ns
test_chip_typical_tree_5_lh 10643 ns
test_chip_typical_tree_6_hhds 215050 ns
test_chip_typical_tree_6_lh 53043 ns
test_chip_typical_tree_7_hhds 492555 ns
test_chip_typical_tree_7_lh 577120 ns
test_chip_typical_tree_8_hhds 2630675 ns
test_chip_typical_tree_8_lh 1278702 ns

Chip typical (long) test

Disabled compiler optimizations

------------------------------------------------
Benchmark Time
------------------------------------------------
test_chip_typical_long_tree_1_hhds 911 ns
test_chip_typical_long_tree_1_lh 1435 ns
test_chip_typical_long_tree_2_hhds 8161 ns
test_chip_typical_long_tree_2_lh 8619 ns
test_chip_typical_long_tree_3_hhds 76618 ns
test_chip_typical_long_tree_3_lh 132467 ns
test_chip_typical_long_tree_4_hhds 1644808 ns
test_chip_typical_long_tree_4_lh 1962406 ns
test_chip_typical_long_tree_5_hhds 7199648 ns
test_chip_typical_long_tree_5_lh 9195894 ns
test_chip_typical_long_tree_6_hhds 169002499 ns
test_chip_typical_long_tree_6_lh 207296570 ns

Enabled compiler optimizations

------------------------------------------------
Benchmark Time
------------------------------------------------
test_chip_typical_long_tree_1_hhds 223 ns
test_chip_typical_long_tree_1_lh 101 ns
test_chip_typical_long_tree_2_hhds 2270 ns
test_chip_typical_long_tree_2_lh 719 ns
test_chip_typical_long_tree_3_hhds 38291 ns
test_chip_typical_long_tree_3_lh 12547 ns
test_chip_typical_long_tree_4_hhds 294222 ns
test_chip_typical_long_tree_4_lh 187010 ns
test_chip_typical_long_tree_5_hhds 4721230 ns
test_chip_typical_long_tree_5_lh 835256 ns
test_chip_typical_long_tree_6_hhds 30302468 ns
test_chip_typical_long_tree_6_lh 10057136 ns

Overall, both add/append and iterators related benchmarks show an improvement in performance. Without compiler optimizations, HHDS tree performs better than the LH Tree. With compiler optimizations, there are similar differences in the traversal benchmarks. We will now look at some profiling that was done to identify the bottlenecks in the HHDS tree.

Exceptions, and a reminder of why they are slow.

When looking at the performance difference between the HHDS tree and LH tree (after enabling compiler optimizations), I was shocked to see that the HHDS tree was performing worse than the LH tree by multiple orders of magnitude upon using exceptions. This was a surprise to me, as I had not expected exceptions to have such a large impact on performance.

The reason this happens is because exceptions are slow. When an exception is thrown, the stack is unwound, and the program has to jump to the catch block. This is a slow process, and should be avoided in performance-critical code. Moreover, the compiler cannot optimize code with exceptions as well as it can without them. This is why the HHDS tree performs so much worse than the LH tree when exceptions are enabled. But the HHDS tree still wasn’t performing as well as it should have been.

Profiling

I used callgrind to profile the HHDS tree and identify potential bottlenecks. The profiling results provided valuable insights into the tree’s performance and areas for optimization. I generated a call graph using KCachegrind and analyzed the function calls to determine the most time-consuming operations.

The call graph clearly shows that the bottleneck is the _create_space call that is tasked with creating space for a new node. This function is called when a new node is added to the tree, and its performance directly impacts the tree’s efficiency.

inline Tree_pos _create_space(const X& data) {
 // Make space for CHUNK_SIZE number of entries at the end
 data_stack.emplace_back(data);
 for (int i = 0; i < CHUNK_MASK; i++) {
 data_stack.emplace_back();
 }

 // Add the single pointer node for all CHUNK_SIZE entries
 pointers_stack.emplace_back();

 return pointers_stack.size() - 1;
}

However, the _create_space function is relatively simple and should not be causing such a significant performance hit. This indicates that the issue may lie in the memory allocation process or the data structure itself. One possible way of dealing with this would be to increase chunk sizes, or enable dynamic chunk sizing, which would allow for more efficient memory allocation.

Another possible bottleneck, seems to be any amount of computation that will be done to find the next vacant space in the chunk (like in get_last_child()). This is because the chunk is a fixed size, and if the chunk is full, the program will have to search for the next chunk that has space. This is a linear operation, and can be slow for wide trees. To fix this, I tried to add extra bookkeeping in the Tree_pointers node structure:

class __attribute__((packed, aligned(64))) Tree_pointers {
private:
 // We only store the exact ID of parent
 Tree_pos parent : CHUNK_BITS + CHUNK_SHIFT;
 Tree_pos next_sibling : CHUNK_BITS;
 Tree_pos prev_sibling : CHUNK_BITS;

 // Long child pointers
 Tree_pos first_child_l : CHUNK_BITS;
 Tree_pos last_child_l : CHUNK_BITS;

 // Storing the last occupied index in the short delta
 // This is to avoid iterating over all short deltas
 // to find the last occupied index
 unsigned short last_occupied : CHUNK_SHIFT;

 // Short (delta) child pointers
 Short_delta first_child_s_0 : SHORT_DELTA;
 Short_delta first_child_s_1 : SHORT_DELTA;
 ...

However, the improvement in performance was marginal after making this change. This indicates that the issue may be more complex and require further investigation. This tree has also been added to the repository, in case a future contributor might be able to make use of it.

There are other possible bottlenecks that might be coming from storing separate short deltas instead of reducing the size of the delta and packing it into a single large integer type. I will be implementing this idea in the future.

Code contributions

All of my Pull requests and code changes here made on the HHDS repository. Each contribution has undergone thorough review and been successfully merged into the main repository:

Additionally, we are planning to integrate these changes into the LiveHD repository in the near future.

Conclusion and Future Work

Working on this project has been a valuable learning experience, particularly in applying core C++ features. I discovered that simple, fundamentally sound optimizations often outperform more complex ones. The greatest challenge for me was to steer through the changes in our original Plan of Action, however, due to the support and guidance from my mentors I was able to make it.

There are still areas where the HHDS tree can be improved to make it more robust. One area of future exploration is dynamic chunk sizing:

Dynamic Chunk Sizing: Instead of using fixed 8-sized chunks as we did, we could implement multiple chunk sizes. This would allow users to “hint” the HHDS tree to use specific chunk types, potentially reducing memory consumption further.

Overall, the HHDS tree has shown promise in handling deep tree structures efficiently. With further optimization and enhancements, it can become a powerful tool for handling complex tree operations.

Acknowledgements

I would like to thank my mentors, Jose Renau and Sakshi Garg for their guidance and support throughout the project. It would not have been possible without their help. Their insights and mentorship have significantly contributed to my learning and the success of this work.

Final Blogpost: HDEval's LLM Benchmarking for HDL Design

Wed, 21 Aug 2024 00:00:00 +0000

Introduction

Hello everyone! I’m Ashwin Bardhwaj, an undergraduate student studying at UC Berkeley. As part of Micro Architecture Santa Cruz (MASC) my proposal under the mentorship of Jose Renau and Sakshi Garg looks to create a suite of benchmark programs for HDEval.

The goal of this project is to create large-scale Verilog programs in order to benchmark that capability of LLMs to develop HDL code. Throughout this project, I have created 3 of the large Verilog testbenches called 3-Stage-RISC_V processor, Gameboy Emulator, and Sorts. The benchmark programs will lose their effectriveness if LLMs such as ChatGPT scrape over Github reposotires and learn from them. As a result, the code itself cannot be made public due to LLM scraping over repositories, this file will cover the test report for all 3 of these projects.

3 Stage RISC V Processor

This is a pipelined RISC processor developed to to handle RV32I instructions. A 3-Stage processsor will typically contain a Fetch, Decode, and Execute cycle. As a result, every instruction will take exactly 3 clock cycles. For this processor, instructions can be formatted into R, I (Load), S (Store), B (Cond), and J (Jump and Link) type instructions. Once a 32 bit instruction is fetched at the location in memory specifed by the pc (Program Counter) register, it is sent to be decoded by the “decode unit”. Through decoding an instruction, we can determine the exact operation code, register location of the 2 operands (rs1 and rs2), and the destination register (rd) at which to write the calculated result. After decoding, an activation flag is sent to the excetution cycle to then take and access the register file at address rs1 and rs2 in order to get the correct operand data. The data and operation is then sent to the ALU to compute the result based on the opcode. The result is then written back into the register file at the rd address and the program counter is incremented and the next instruction is fetched.

The prompts for each module in this processor have been generated and tested against a GPT 3 turbo and GPT 4o models as an example. In the RISC V tab in my test report, I have provided the exact prompts and results after running on MASC’s HDLAgent tool which can access the APIs of many LLMs.

Gameboy Emulator

The Gameboy Emulator is a Verilog implementation of the classic GameBoy console that was widely popular in the 1990s. The main aspects of the GameBoy that were focused on in this project were the Z-80 like CPU, memory objects like RAM, VRAM, and ROM, the PPU (Picture Processing Unit), and other peripherals. The instructions are given to the CISC (variable-length instructions) CPU where they are decoded and executed based on the details and expectations of that specific instruction. In some cases, timing becomes a concern and there is significant effort made to ensure that instructions can be parsed and run predictably and effictively. Instructions from the ROM may take between 1 to 4 clock cycles to run depending on the requirements. For example, the instruction “LD B, HL” , loads the data found at the 16 bit address given by registers H and L into register B is a 2 cycle instruction. The first cycle decodes the HL address and fetches the data at the accurate location, while the second cycle takes the new input data at writes it into register B. This requires accurate timing control between different asects of the GameBoy.

The Picture Processing Unit is also an integral feature of the gameboy. Three frames called Background, Window, and Sprite are combined into the classic Gameboy screens we know today. White the Background and Window data are consistently called from the VRAM after certain clock cycle times, the Sprite and sprtite attributes are accessed using DMA (Direct Memory Access) from OAM (Object Attribute Memory). This reduces the CPU load and improves the speed of sprite data.

Deliverables

HDEval Test Report: The HDEval Test Report contains the module prompts for each testbench, the results after testing on GPT 3 turbo and 4o, and test cases to ensure code correctness and reliability.
HDEval Repo: HDEval contains the encrypted version of the yaml files that encapsulate the code, prompts, and additional data.

Next Steps

Given these benchmarks, it is important to track the abilities of these LLMs to generate HDL code. Therefore, including GPT 3-turbo and 4o. I would like these benchmarks to be applied to more models so that we can track their growth and keep informed on their effectiveness in HDL and hardware.

Previous Blogs

Please feel free to check out my previous blogs!

Thank you for reading!

Midterm Report : Halfway through medicinal data visulaization using PolyPhy/Polyglot

Mon, 12 Aug 2024 00:00:00 +0000

Introduction

Hello! My name is Ayush Sharma, a machine learning engineer and researcher based out of Chandigarh, a beautiful city in Northern India known for its modern architecture and green spaces. For the last month and a half I have been working closely with my mentors Oskar Elek and Kiran Deol on the project titled Unveiling Medicine Patterns: 3D Clustering with Polyphy/Polyglotas part of GSoC 2024.

Progress and Challenges

The project focuses on developing effective clustering algorithms to visualize medicine data in three dimensions using PolyPhy and Polyglot. My journey began with data preprocessing and cleaning, where unnecessary data points were removed, and missing values were addressed.

One of the primary techniques we’ve employed is UMAP (Uniform Manifold Approximation and Projection). UMAP’s ability to preserve the global structure of the data while providing meaningful clusters proved advantageous. Initial experiments with UMAP on datasets of various sizes (ranging from 1,500 to 15,000 medicines) provided valuable insights into the clustering patterns. By iteratively halving the dimensions and refining the parameters, we achieved more accurate clustering results.

To complement UMAP, we explored t-SNE (t-distributed Stochastic Neighbor Embedding). t-SNE’s focus on local relationships helped in understanding finer details within the clusters. By adjusting t-SNE parameters and conducting perturbations, we could better comprehend the data’s behavior. Combining UMAP with t-SNE in a loop, halving dimensions iteratively, showed promise, allowing us to leverage the strengths of both techniques to enhance clustering accuracy.

We also experimented with pre-trained models like BERT and Glove to create embeddings for the medicines. BERT’s splitting of salts into subparts and Glove’s limitations in recognizing specific salts led us to inaccurate clustering and we’ve been working on improving it for the time being.

Next Steps

Moving forward, I will focus on refining our clustering and embedding techniques to enhance overall accuracy. This involves integrating Jaccard distance alongside other distance measures to improve similarity assessments between medicines and clusters. Additionally, I’ll continue experimenting with advanced models like gpt,CLIP, gemini etc., for better embeddings while addressing the limitations of BERT and Glove by leveraging custom embeddings created with transformers and one-hot encoding. Optimization of UMAP and t-SNE algorithms will also be crucial, ensuring their effectiveness in clustering and visualization. These steps aim to overcome current challenges and further advance the project’s goals.

Midway Through GSoC

Wed, 31 Jul 2024 00:00:00 +0000

Hello everyone! I’m Joel Tony, and I’m excited to share my progress update on the Drishti project as part of my Google Summer of Code (GSoC) experience. Over the past few weeks, I’ve been diving deep into the world of I/O visualization for scientific applications, and I’m thrilled to tell you about the strides we’ve made.

What is Drishti?

For those unfamiliar with Drishti, it’s an application used to visualize I/O traces of scientific applications. When running complex scientific applications, understanding their I/O behavior can be challenging. Drishti steps in to parse logs from various sources, with a primary focus on those collected using Darshan, a lightweight I/O characterization tool for HPC applications. Drishti provides human-interpretable insights on how to improve I/O performance based on these logs. While Drishti supports multiple log sources, our current work emphasizes Darshan logs due to their comprehensive I/O information. Additionally, Drishti offers visually appealing and easy-to-understand graphs to help users better grasp their application’s I/O patterns, making it easier to identify bottlenecks and optimize performance.

Progress and Challenges

Export Directory Feature

One of the first features I implemented was the export directory functionality. In earlier versions of Drishti, users couldn’t select where they wanted their output files to be saved. This became problematic when working with read-only log locations. I familiarized myself with the codebase, created a pull request, and successfully added this feature, allowing users to choose their preferred output location.

CI Improvements and Cross-Project Dependencies

While working on Drishti, I discovered the tight coupling between various tools in the HPC I/O organization, such as Drishti and DXT Explorer. This highlighted the need for improved Continuous Integration (CI) practices. We currently run about eight GitHub Actions for each pull request, but they don’t adequately test the interactions between different branches of these interconnected tools. This is an area we’ve identified for future improvement to ensure smoother integration and fewer conflicts between projects.

Refactoring for Multi-File Support

The bulk of my time was spent refactoring Drishti to extend its framework from parsing single Darshan files to handling multiple files. This task was more complex than it initially appeared, as Drishti’s insights are based on the contents of each Darshan file. When dealing with multiple files, we needed to find a way to aggregate the data meaningfully without sacrificing on performance.

The original codebase had a single, thousand-line function for parsing Darshan files. To improve this, I implemented a data class structure in Python. This refactoring allows for:

Better separation of computation and condition checking
Easier parallelization of processing multiple traces
Finer-grained profiling of performance bottlenecks
More flexibility in data manipulation and memory management

Learnings and Skills Gained

Through this process, I’ve gained valuable insights into:

Refactoring large codebases
Understanding and improving cross-project dependencies
Implementing data classes in Python for better code organization
Balancing performance with code readability and maintainability

Next Steps

As I move forward with the project, my focus will be on:

Adding unit tests for individual methods to ensure functionality
Exploring alternative data frame implementations like Polars for better performance
Developing aggregation methods for different types of data across multiple Darshan files
Optimizing memory usage and computational efficiency for large datasets

Conclusion

Working on Drishti has been an incredible learning experience. I’ve had the opportunity to tackle real-world challenges in scientific computing and I/O visualization. As we progress, I’m excited about the potential impact of these improvements on the scientific community’s ability to optimize their applications’ I/O performance.

I’m grateful for this opportunity and looking forward to the challenges and discoveries that lie ahead in the second half of my GSoC journey. Stay tuned for more updates as we continue to enhance Drishti!

If you have any questions or would like to learn more about the project, feel free to reach out to me. Let’s keep pushing the boundaries of scientific computing together!

Streaming into the Future: Adding Real-Time Processing to FasTensor

Tue, 30 Jul 2024 00:00:00 +0000

Hey there, HPC enthusiasts and fellow coders! I’m excited to share my progress on this summer’s Google Summer of Code project under UC OSPO’s FasTensor. Here’s a glimpse into how we’re pushing the boundaries of real-time data processing.

The Big Picture: FasTensor and HPC Challenges

First, a quick refresher: FasTensor is our go-to tool for handling dense arrays in scientific computing. It tackles three major HPC challenges:

Optimizing computations
Distributing data efficiently
Balancing workloads across computing cores

FasTensor excels at these tasks, especially when dealing with data that has structural locality - a common feature in scientific computing. Here, the Stencil computations come in handy, capturing data locality for operations like solving partial differential equations in physical simulations.

The Mission: Bringing FasTensor into Real-Time

While FasTensor is great at processing existing data, the next frontier is handling live data streams from scientific instruments and sensors. That’s where my GSoC project comes in: adding stream processing capabilities to FasTensor.

Progress Highlights:

Building a Stream Simulator

We’ve created FTstream, a nifty tool that simulates data streams. It can generate streams of various sizes and intervals, pushing the limits of what your disk can handle. We’re talking speeds up to 2.5 GiB/s on a non-parallel NVMe! This tool is crucial because many scientific instruments, from particle accelerators to radio telescopes, generate massive amounts of data at incredible speeds and we need to able to simulate that. For context, that’s faster than a 10MP RGB camera shooting at 35 frames per second that generates data at ~1 GiB/s.

Optimizing I/O Strategies

We’ve been experimenting with various I/O approaches to optimize high-speed data stream handling.

Exploring Streaming Semantics

We’re investigating various ways to express and execute stream transformations, to ensure that FasTensor can handle a wide range of streaming computations.

Developing I/O Drivers

We’ve developed two new I/O drivers based on LinuxAIO and MPI IO to ingest incoming data smoothly and maintain stream consistency.

What’s Next?

Putting It All Together

We’re in the final stretch of integrating all these components into a seamless stream processing system.

Rigorous Testing

We’ll push our stream processing to its limits, simulating diverse data flows to ensure rock-solid performance in any scientific setting.

HPC Environment Validation

The ultimate test will be running our new streaming capabilities in real HPC environments, checking how they perform with different I/O setups and computing paradigms.

Wrapping Up

This summer has been a whirlwind of coding, testing, and learning. We’re making significant strides in bringing real-time processing capabilities to FasTensor, which could open up exciting new possibilities in scientific computing and data analysis. Stay tuned for more updates as we finalize this feature. If you’re interested in the nitty-gritty technical details or want to check out the code, feel free to reach out or check our project repository. Happy coding, and may your computations be ever faster!

Enhancing h5bench with HDF5 Compression Capability

Sat, 27 Jul 2024 00:00:00 +0000

Introduction

As part of the h5bench project my Enhencing h5bench with HDF5 Compression Capability under the mentorship of Dr. Jean Luca Bez and Dr. Suren Byna aims to allow users of h5bench to incoporate compression features in their simulations by creating custom benchmarks with common scientific lossless & lossy compression algorithms such as SZ, SZ3, ZFP, and GZIP.

The problem I am trying to solve is to implement multiple data compression algorithms in h5bench core access patterns through HDF5 filters. This capability should grant users the flexibility to configure the parameters and methods of compression applied to their datasets according to their specific needs and preferences. My solution primarily involves using a user-defined HDF5 filter mechanism to implement lossless and lossy compression algorithms, such as ZFP, SZ, and cuSZ. Throughout the process, I will deliver one C source code implementing compression configuration settings, one C source code implementing lossless and lossy algorithms, a set of performance reports before and after data compression in CSV and standard output files, and a technical documentation on h5bench user manual website.

Midterm Blog

This summer, after completing my junior year, I was honored to have the opportunity working with Dr. Jean Luca Bez and Dr. Suren Byna on the h5bench, an open-source benchmarking project designed to simulate runnning sync/async HDF5 I/O on HPC machines. This post will cover mostly what I have learned, produced, planned, and thoughts over the first six weeks.

First of all, let’s define some of the terms here. HDF5 stands for Hierarchical Data Format 5. Unlike other data storage formats (JSON, CSV, XML…), HDF5 is not only a container that manages data similar to a file system, but also a powerful library that gives you the ability to perform I/O (Inputs/Outputs) operations between memory and file. One of the reasons this tool is commonly used by HPC applications is that it also supports MPI I/O, which is a protocol for parallel computing (you can think of it as the parallel version of POSIX). With exabytes of data and high frequencies of usage for analysis in scientific studies, HDF5 is perfect for the job. Essentially, h5bench is a software that tests the hardware’s performance through HDF5 (it also provides other benchmark kernels such as AMReX, E3SM-IO, MACSio, and openPMD-api, but my job focuses on using vanilla HDF5 I/O).

So, what I have done so far? Frist, my job is to allow users to tune input parameters regarding data compression, and make sure h5bench prints accurate benchmark results with the intended compression algorithm applied to their datasets. h5bench’s frondend is written in Python, which takes an input of a JSON file from user and parses it into a CFG configuration file that can be read by the backend later, which is written in C. I created a new enum struct and made user able to specify one from a range of compression algorithms (SZ3, ZFP, LZ4, GZIP, and other pre-defined algorithms). I also made it possible to apply these algorithms to the datasets, so the .h5 (an HDF5 file) would contain chunks of compressed data after multiple H5Dwrite calls.

Next, the challenges and gains. Throughout the first six weeks, 30% of the time was spent on understanding the newest version of h5bench and HDF5 by reading through C source codes and documentations, and asking many dumb questions to my mentors (thanks to their patience and great answers :D). Writing code is fairly easy after I really understood what the program is doing. By that I mean you have to understand every line in almost all functions and how each and every variables change. 40% of the time was used on debugging and testing the compression algorithm, mainly SZ3. To make code behaves correctly is another level of difficulty. Most of the issues resulted from failing to configure the application and dependent libraries correctly. Without necessary macros enabled during the build process, features like compression filter plugin will not run. As I was also new to CMake and HPC environment, I learned that new envrionment variables will be reset for every new session, even if you requested a compute node resource. Besides getting used to the standard build sequence: “cmake ..”, “make”, “make install”, I also learned to use “ccmake ..” to examine the flags of the compiled program. The rest of time I learned more about parallel computing, HDF5, compression algorithms, by reading some papers and documentations. A lot of notes were taken (I must say a good note taking system is the game changer). Last but not the least, I also spent times synchronizing online and offline with my mentors to discuess problems. Without their help, I can never make this far.

My next phase will tackle these problems, here I am just offering a list:

Test applying filter with other compression algorithms, and with different dimension layout of the dataset
Add decompression capability
Allow users to tune the auxiliary parameters for controlling the behavior of a certain compression filter H5Pset_filter(COMPRESS_INFO.dcpl_id, H5Z_FILTER_SZ3, H5Z_FLAG_MANDATORY, 0, NULL); cd_nelmts cd_values[]
Print additional benchmark results to indicate what and how the compression filter is applied, and the compression ratio

Data Engineering and Automated Evaluation for OpenROAD's Chat Assistant: Midterm Update

Sun, 21 Jul 2024 00:00:00 +0000

Hello everyone! We’ve reached the halfway point of our Google Summer of Code 2024 journey, and it’s time for an update on our project to build a conversational chat assistant for OpenROAD. Under the guidance of our mentors, Indira Iyer and Jack Luar, we’re making significant strides in enhancing OpenROAD’s user support capabilities.

Project Focus

My project focuses on two crucial aspects of our chat assistant:

Data Engineering: Ensuring our assistant has access to comprehensive and relevant information.
Evaluation: Developing robust methods to assess and improve the assistant’s performance.

The ultimate goal is to create a more responsive and accurate chat assistant capable of aiding users with troubleshooting, installation, and general queries about OpenROAD. I’m working in tandem with Palaniappan R, who is developing the RAG architecture for our assistant.

Progress

Since our initial deployment, I’ve been concentrating on implementing automated evaluation systems for our RAG architecture. We’ve developed two primary evaluation methods:

Basic Abbreviation Evaluation

This method assesses the model’s ability to accurately identify and explain common abbreviations used within the OpenROAD community. It ensures that our assistant can effectively communicate using domain-specific terminology.

LLM Judge-Based Evaluation

For this more comprehensive evaluation, we:

Prepared a dataset of question-answer pairs relevant to OpenROAD.
Queried our model with these questions to generate answers.
Employed LLMs (including GPT-4o and Gemini 1.5 Flash) to act as judges.
Evaluated our model’s responses against ground truth answers.

Here’s a glimpse of our early benchmark results:

Exploratory Data Analysis (EDA) on GitHub OpenROAD issues

To gather more data, I performed Exploratory Data Analysis (EDA) on GitHub OpenROAD issues using GitHub’s GraphQL API. This allowed us to:

Filter data based on parameters such as:
- Minimum number of comments
- Date range
- Mentioned PRs
- Open or closed status
Structure the data, focusing on issues tagged with Build, Query, Installation, and Runtime.
Process the data into JSONL format with key fields including:
- url: URL of the GitHub issue
- id: Unique issue number
- title: Issue title
- author: Username of the issue creator
- description: Initial issue description
- content: Array of messages related to the issue
- category: General category of the issue
- subcategory: More specific category of the issue
- tool: Relevant tools or components
- date: Issue creation timestamp

After curating this dataset, I was able to run an Analysis on OpenROAD Github Issues, identifying multiple categories of issues in the form of a pie chart.

Looking Ahead

As we move into the second half of the GSOC period, our plans include:

Incorporating GitHub Discussions data into our knowledge base.
Utilizing this expanded dataset to enhance our RAG architecture.
Continually refining and improving our model’s performance based on evaluation results.

We’re excited about the progress we’ve made and look forward to delivering an even more capable and helpful chat assistant for the OpenROAD community. Stay tuned for more updates as we continue this exciting journey!

Hardware Hierarchical Dynamical Systems

Sat, 20 Jul 2024 00:00:00 +0000

Hi everyone! I am Ujjwal Shekhar, a Computer Engineering student at the International Institute of Information Technology - Hyderabad. I am excited to share my current progress on the project titled “Hardware Hierarchical Dynamical Systems” as part of the Open Source Research Experience (OSRE) program and Google Summer of Code. I am working with my mentors, Jose Renau and Sakshi Garg, on this project.

Project Overview

With hardware compilers, it is not uncommon for the size of code that the hardware compilers need to handle to go into millions. We aim to improve the efficiency of the tree data structure to be used for representing the Abstract Syntax Tree (AST) of the input program. The tree data structure is optimized for typical AST traversal and queries. Some queries that are made to this tree are much more frequent than others.

Thus, the goal of this project is to be able to optimize the tree for frequent queries while still providing support for other infrequent queries. We use Google Bench to benchmark the tree for scalability and performance and expect it to outperform the current version of the tree. Finally, the new version of the tree will be integrated into the LiveHD core repository.

Progress and Challenges

Over the past month and a half, I have successfully finished working on the add/append methods of the tree. Moreover, I have finished writing the iterators on the tree too. There are preliminary tests already in place and the HHDS repository now has a working Bazel build system.

As shown in the figure, we can see that the tree went from storing pointers to everything that it could to only storing pointers to the nodes that are absolutely necessary. Moreover, by not maintaining multiple levels in the tree, we have been able to reduce the memory footprint of the tree. This is a significant improvement from the LHtree that was being used earlier.

Furthermore, we have also been able to improve the cache friendliness of each node of the tree. By realizing that most of the time, new children are added soon after the parent is added, we have been able to store the children in a contiguous memory location whenever possible, or access them using a shorter delta from the parent node. This has significantly improved the cache friendliness of the tree by allowing the packing of the book-keeping of up to 8 children in a single 512-bit word. This 512-bit chunk has amazing cache alignment properties.

Highlights

Finished working on the add/append methods of the tree.
Finished writing the iterators on the tree.
Preliminary tests are in place.
HHDS repository now has a working Bazel build system.

Challenges

Working out a new plan: The initial plan was to use a flattening policy to optimize the tree for frequent queries. However, this plan has been revised and we have flattened the tree not using a tour-based flattening policy, but by still storing pointers to various nodes in the tree. This has been done to ensure that the tree is still able to support infrequent queries.
Benchmarking: The benchmarking of the tree is still in progress. I am working on creating a benchmarking suite that will be able to test the tree for scalability and performance. This will allow future developers to test the tree for performance and scalability after they make changes.

Next Steps

From here, a lot of testing and benchmarking is still left to be done. Moreover, we need to add the delete methods and make sure that the integration with the LiveHD core repository is smooth. The next steps involve:

Adding the delete methods to the tree.
Benchmarking the tree for scalability and performance.
Ensuring that the syntax of the tree is in line with the LiveHD core repository.
Integrating the tree into the LiveHD core repository.
Adding documentation to the tree.
Integrating the testing of the tree into the LiveHD testing suite.

Conclusions

My experience so far has been amazing. I have been able to work on a project that is at the intersection of hardware and software. Moreover, I have been able to work with a team that is very supportive and has been able to guide me through the project. I am looking forward to the next steps and am excited to see the final version of the tree in the LiveHD core repository.

Acknowledgements

I would like to thank my mentors, Jose Renau and Sakshi Garg for their guidance and support throughout the project. It would not have been possible without their help.

Architecture Updates - LLM Assistant for OpenROAD

Fri, 19 Jul 2024 00:00:00 +0000

Hi again! I’m Palaniappan R, a GSoC contributor working on the OpenROAD chat assistant project under the mentorship of Indira Iyer and Jack Luar. My project aims to build an LLM-powered chat assistant designed to provide seamless access to existing online resources, thereby reducing support overhead. Over the past month, I’ve been collaborating with Aviral Kaintura, on data engineering to deliver on our common project goal of an OpenROAD assistant and an open-EDA dataset that promotes further research and collaboration.

Progress

The retrieval architecture is at the heart of any retrieval-augmented generation (RAG) setup. Our current setup employs a hybrid-search technique, combining a traditional keyword search method with more advanced vector search methods. As illustrated in the diagram, we combine a simple semantic search, a Maximal Marginal Relevance (MMR) search and a text-based BM25 ranking technique to build our hybrid retriever.

flowchart LR id0([Query]) --> id1 id1([Vectorstore]) --- id2([Semantic Retriever]) id1([Vectorstore]) --- id3([MMR Retriever]) id1([Vectorstore]) --- id4([BM25 Retriever]) id2([Semantic Retriever]) -- Retrieved Docs ---> id5([Reranking]) id3([MMR Retriever]) -- Retrieved Docs ---> id5([Reranking]) id4([BM25 Retriever]) -- Retrieved Docs ---> id5([Reranking]) id5([Reranking]) ---> id6(top-n docs)

Upon receiving a query, relevant documents are sourced from each retriever, resulting in a broad set of results. We feed these results into a cross-encoder re-ranker model to get the top-n documents with maximum relevance.

After building the retriever, we utilized the LangGraph framework to develop a stateful, multi-agent workflow tailored to our use case. This allows flexibility in servicing a diverse set of user questions in an efficient and accurate manner, given the sparse nature of our dataset.

Our current dataset can be broadly classified into the following categories:

OpenROAD Documentation
OpenROAD-flow-scripts Documentation
OpenSTA Documentation
OpenROAD Manpages

These data sources are embedded into separate FAISS vector databases using open-source embeddings models (we’ve been working on fine-tuning an embeddings model for better retrieval accuracy). The hybrid search retrievers are then applied to these vector databases, creating internal tools that can be queried by our LLM as needed. Each tool has access to different data sources in various domains. For instance, the retrieve_cmds tool selectively has access to information detailing the multiple commands in the OpenROAD framework, while the retrieve_install deals with installation-related documentation. As depicted in the flowchart, a routing LLM call classifies the input query and forwards it to the appropriate retriever tool. Relevant documents are then sent back to the LLM for response generation.

graph TD __start__ --> router_agent router_agent -.-> retrieve_cmds router_agent -.-> retrieve_general router_agent -.-> retrieve_install router_agent -.-> retrieve_opensta retrieve_cmds --> generate retrieve_general --> generate retrieve_install --> generate retrieve_opensta --> generate generate --> __end__

Feel free to try out our chat assistant here. Instructions to set up and run our chatbot can be found here.

Here’s an example of our chatbot in action.

Future Plans

In the upcoming weeks, we aim to enhance our dataset by incorporating actionable information filtered from GitHub issues and discussions. We’ll be adding support to keep track of the conversation history as well.

Stay tuned for more updates!

Midterm Blogpost: HDEval's LLM Benchmarking for HDL Design

Thu, 18 Jul 2024 00:00:00 +0000

Introduction

Hello! My name is Ashwin Bardhwaj, an electrical engineering and computer science student based in San Diego, CA. For the past 6 weeks, I have been working closely with Professor Jose Renau on the HDEval project. The aim of this project is to create multiple project sized HDL benchmarks to evaluate how well existing LLMs can generate Verilog/Chisel code. These benchmarks will include my own “golden” HDL implementation of the project as well as respective English prompts to guide the LLM. I am excited to be able to work with these tools that have the potential to become a valuable resource for HDL design. So far, I have been successful in creating the first benchmark, a pipelined 3 stage RISC-V core, as well as working through by second project, a Gameboy Emulator.

RISC-V Implementation

Over this past month and a half, I have successfully completed my first benchmark which focuses on creating, modeling, and testing a pipelined 3-stage RISC-V core. The core uses the fetch, decode, and execute structure and is functional for most RV32I instructions. I synthesized and simulated my Verilog using Icarus Verilog and displayed the waveforms on GTKWave. After development, a good section of time was spent creating and tuning the English explanation of each Verilog module. After running these benchmark files through several LLM APIs, we compared the existing “golden” modules with the generated ones and noticed that more recent versions of LLMs such as GPT 4o and Claude 3 preform much better at creating syntactically correct and efficient code.

In addition, I have also created a tool that will parse the Verilog and instruction files into the necessary json structure to then test on various models.

Gameboy Emulator

I am also in the process of developing the second benchmark, which targets a Gameboy emulator. This will challenge the LLMs much more than the RISC-V project because apart from the custom CISC CPU, the model should also understand how to handle various other blocks of the hardware system including memory, picture processing unit (PPU), sound processing unit (SPU), various input/output systems like the buttons and cartridge, and interrupt handlers. As a result, it will challenge the model to understand the system as a whole when creating each individual module.

Next Steps

As we continue on to the second half of the project, I will continue working on my gameboy emulator. I have already completely developed and tested the Z80-esque CPU, DMA, and interrupt handler but need to continue working on the display and sound interfaces. Also, I will also continue to evaluate and run these tests over a wider range of LLMs to get a better picture of what models and versions are best suited for HDL design as well as the direction these models are going in.

Unveiling Medicine Patterns: 3D Clustering with Polyphy/Polyglot

Wed, 19 Jun 2024 00:00:00 +0000

Hello! My name is Ayush and this summer I’ll be contributing to Polyphy and Polyglot, a GPU oriented agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data. under the mentorship of Oskar Elek and Kiran Deol.

For the reference here’s my proposal for this project.

Polyglot offers an immersive 3D visualization experience, enabling users to zoom, rotate, and delve into complex datasets. My project aims to harness these capabilities to unlock hidden connections in the realm of medicine, specifically focusing on the relationships between drugs based on their shared salt compositions, rather than just their active ingredients. This approach promises to reveal intricate patterns and relationships that have the potential to revolutionize drug discovery, pharmacology, and personalized medicine.

In this project, I will create custom embeddings for a vast dataset of over 600,000 medicines, capturing the relationships between their salt compositions. By visualizing these embeddings in Polyglot’s 3D space, researchers can identify previously unknown connections between medicines, leading to new insights and breakthroughs. The dynamic and interactive nature of Polyglot will empower researchers to explore these complex relationships in a very efficient and cool way, potentially accelerating the discovery of new drug interactions and therapeutic applications.

I am really excited to work on this project. Keep following the blogs for further updates!.

Artificial Intelligence Explainability Accountability

Fri, 14 Jun 2024 00:00:00 +0000

Hey! I’m Sarthak Chowdhary(Shaburu), and I am thrilled to share my incredible journey with the Open Source Program Office of UC Santa Cruz! Association as part of Google Summer of Code (GSoC) 2024. This experience marks a pivotal milestone in my career, offering me the chance to delve into an intriguing project while learning from the brightest minds in the open-source community. Allow me to guide you through my adventure thus far, from the nerve-wracking wait for results to the exhilarating commencement of the coding period.

Before we start here’s my Proposal.

Pre-GSoC Application

I had shortlisted 3 Organizations that i was working on

OSPO UC Santa Cruz - Amplifying Research Impact Through Open Source
CVAT.AI - Computer Vision Data Annotation for AI
Emory University - Biomedical Research to Advance Medical Care

On the 1st of May, like many students eagerly anticipating the results of the Google Summer of Code (GSoC) 2024, I found myself glued to my screen, anxiously awaiting the clock to strike 11:30 PM IST. After what felt like an eternity of waiting, I finally received the email that changed everything: I had been selected for GSoC 2024 with the Open Source Program Office of UC Santa Cruz!

The first month of GSoC, known as the community bonding period, is for establishing rapport with the people working on the project. I researched about my mentor Dr. Leilani H. Gilpin and build a good rapport with her, who is an Assistant Professor in Computer Science and Engineering and an affiliate of the Science & Justice Research Center at UC Santa Cruz. She is also a part of the AI group @ UCSC and leads the AI Explainability and Accountability (AIEA) Lab. Her research focuses on the design and analysis of methods for autonomous systems to explain themselves. Her work has applications to robust decision-making, system debugging, and accountability. Her current work examines how generative models can be used in iterative XAIstress testing. She guided me through the necessary documentation and explained the Project demands and requirements in detail, which was invaluable for my project.

Project

The project aims to build a system that is capable of taking some input which will be the student’s code and explaining them their mistakes from low level syntax errors, compilation errors to high level issues such as overloaded variables.

My Proposal aims to create custom novel basic questions and take it up a notch by creating custom drivers for each problem, common drivers to detect low level errors and give baseline explanations for various error cases, combining these drivers to make a robust system and use third-party open source software (like monaco code editor - the editor of the web) where necessary. Write uniform and consistent feedback/explanations for Each coding problem while covering all the possible edge cases and a pipeline which will iterate the test cases and feedbacks. This benchmark suite will be used for testing the system.

Additionally I plan on building an interface that has a roadmap from basics such as arrays, hashmaps to advanced topics such as trees, heap, backtracking along with progress bars and throws confetti on successful unit tests (important). These will be using the same benchmark suite that will be built under the hood. I will be utilizing Judge0 (open-source online code execution system) for the code execution and Monaco(open-source The Editor of the Web) as the code editor for this.

Project goals:

Project Objective: By the end of summer the software should be a novel and robust tool for helping the community of beginner and advanced programmers alike in learning programming by hyper-focusing on the mistakes they make and using AI to explain to them the how, what and why of their code. Provide clear and concise explanations accompanied by actionable suggestions for debugging and improvement.
Expected deliverables: A Robust eXplainable AI benchmark suite which will be used extensively for the undergraduate AI courses and possibly the Graduate courses as well. Along with anyone interested in learning programming with the help of personalized AI.
Future work based on project: A beautiful Gamified interface that gets people excited to learn programming which utilizes the above benchmark suite would be awesome to build!

When I Started my programming journey (before ChatGPT😨) I personally encountered problems that were way above my skill set and I had no way of knowing so, which used to result in spending countless hours without proper feedback as to where I was going wrong. This project has a real impact on people in an innovative way which I wish I had access to at the start of my Programming journey, so working on it comes from a place of passion. Also this specific project will test my own understanding of programming and spending the summer solidifying it, that too under the guidance of Leilani H. Gilpin is a dream come true for me.

Developing Trustworthy Large Language Models

Fri, 14 Jun 2024 00:00:00 +0000

Hi! Thanks for stopping by.

In this first blog post of a series of three, I’d like to introduce myself, my mentor, and my project.

My name is Nikhil. I am an ML researcher who works at the intersection of NLP, ML, and HCI. I previously worked as a Machine Learning Engineer II at VMware and spent some wonderful summers interning with ML teams at NVIDIA and IIT Bombay. I also recently graduated from the University of Southern California (USC) with honors in Computer Science and a master’s thesis.

This year at Google Summer of Code (GSoC 24), I will be working on developing trustworthy large language models. I’m very grateful to be mentored by Leilani H. Gilpin at the AIEA lab, UC Santa Cruz. I truly admire the flexibility and ownership she allows me in pursuing my ideas independently within this project. Please feel free to peruse my accepted GSoC proposal here.

Project: My project has a tangible outcome: An open-source, end-to-end, full-stack web app with a hybrid trustworthy LLM in the backend.

This open-source web app will be a lightweight tool that not only has the ability to take diverse textual prompts and connect with several LLMs and a database but also the capability to gather qualitative and quantitative user feedback. Users will be able to see how this feedback affects the LLMs’ responses and impacts its reasoning and explanations (xAI). The tool will be thoroughly tested to ensure that the unit tests are passing and there is complete code coverage.

At the moment, we are investigating LLMs and making them more trustworthy in constraint satisfaction tasks like logical reasoning and misinformation detection tasks. However, our work has applicability in other areas of Responsible AI, such as Social Norms (toxicity detection and cultural insensitivity), Reliability (misinformation, hallucination, and inconsistency), Explainability & Reasoning (lack of interpretability, limited logical, and causal reasoning), Safety (privacy violation and violence), and Robustness (prompt attacks and distribution shifts).

Impact:

Responsible AI research teams across industry and academia can use this as a boilerplate for their user study projects.
Diverse PhD students and academic researchers looking to study LLM and user interaction research will find this useful.
LLM alignment researchers and practitioners can find this resourceful as user feedback affects the inherent rewards model of the internal LLMs.
Explainable AI (xAI) researchers can find value in the explanations that this tool generates, which reveal interpretable insights into how modern LLMs think and use their memory. These are just a few use cases; however, there are several others that we look forward to describing in the upcoming posts.

This was my first blog in the series of three for the UC OSPO. Stay tuned for the upcoming blogs, which will detail my progress at the halfway mark and the final one concluding my work.

If you find this work interesting and would love to share your thoughts, I am happy to chat! :) Feel free to connect on LinkedIn and mention that you are reaching out from this blog post.

It is great to meet the UC OSPO community, and thanks for reading. Bye for now.

Heterogeneous Graph Neural Networks for I/O Performance Bottleneck Diagnosis

Fri, 14 Jun 2024 00:00:00 +0000

Hello, I am Mahdi Banisharifdehkordi, a Ph.D. student in Computer Science at Iowa State University, specializing in Artificial Intelligence. This summer, I will be working on the project AIIO / Graph Neural Network under the mentorship of Bin Dong and Suren Byna.

High-Performance Computing (HPC) applications often face performance issues due to I/O bottlenecks. Manually identifying these bottlenecks is time-consuming and error-prone. My project aims to enhance the AIIO framework by integrating a Graph Neural Network (GNN) model to automatically diagnose I/O performance bottlenecks at the job level. This involves developing a comprehensive data pre-processing pipeline, constructing and validating a tailored GNN model, and rigorously testing the model’s accuracy using test cases from the AIIO dataset.

Through this project, I seek to provide a sophisticated, AI-driven approach to understanding and improving I/O performance in HPC systems, ultimately contributing to more efficient and reliable HPC applications.

LLM Assistant for OpenROAD - Data Engineering and Testing

Thu, 13 Jun 2024 00:00:00 +0000

Hello! My name is Aviral Kaintura, and I will be contributing to OpenROAD, a groundbreaking open-source toolchain for digital integrated circuit automation (RTL to GDSII) during GSoC 2024.

My project, LLM Assistant for OpenROAD - Data Engineering and Testing, is jointly mentored by Indira Iyer and Jack Luar.

The aim of this project is to develop a chat assistant to improve the user experience with OpenROAD. My focus will be on developing a well-curated dataset from OpenROAD’s knowledge base. This dataset will be fundamental for another project led by Palaniappan R, which involves building the chatbot’s architecture. It will be used for training and validating the model and ensuring efficient context retrieval to generate accurate user responses, aiding in troubleshooting, installation, and other common issues to reduce the maintainers’ workload.

In addition to dataset creation, I will be working on testing and evaluation. This includes developing metrics for model evaluation, incorporating both human and automated techniques.

Our human evaluation framework will utilize chatbot feedback for valuable insights, enhancing the model and dataset. An automated batch testing application is also used to further enhance the evaluation process.

Here is an early build of the evaluation framework.

By leveraging advanced data engineering and testing methodologies, we aim to build an assistant that combines high accuracy with optimal response times. Additionally, we will collaborate with research teams at NYU and ASU to contribute to the research on AI-based chat assistants for electronic design automation.

I am thrilled to be part of this journey and look forward to making a meaningful impact on the OpenROAD project.

Stay tuned for more updates on the project!

LLM Assistant for OpenROAD - Model Architecture and Prototype

Thu, 13 Jun 2024 00:00:00 +0000

Hi there!

I’m Palaniappan R, currently an undergraduate student at the Birla Institute of Technology & Science, Pilani, India.

I’ll be working on the LLM Assistant for OpenROAD - Model Architecture and Prototype project, under the mentorship of Indira Iyer and Jack Luar.

My project aims to develop the architecture for a chat assistant built for OpenROAD and its native flow, designed to assist beginners and experienced users by giving easy access to existing resources, offering troubleshooting assistance, and providing fast and accurate responses to common questions. I plan to do this by leveraging state-of-the-art retrieval and fine-tuning techniques.

As part of this project, I will be working alongside another project to build and test on a valid dataset for training and deployment. We will also be collaborating with other research teams at NYU and ASU, working on similar projects related to OpenROAD chat assistants and flow generation using Generative AI. Our primary objective is to minimize support overhead, improve user experience by reducing response times, and provide access to updated information about OpenROAD.

Upon completion, my project will offer a viable chat assistant architecture as part of OpenROAD that benefits both the users and tool developers of OpenROAD.

An early prototype developed along with a human evaluation framework shows promising results.

Here are some responses generated by the prototype,

I’m excited about the potential of ORAssistant as part of the OpenROAD tool suite to accelerate innovation in EDA and chip design by utilizing open-source tools along with Generative AI.

Stay tuned for more updates!

Stream Processing support for FasTensor

Thu, 13 Jun 2024 00:00:00 +0000

Hi, I’m Aditya Narayan,👋

I’m a frequent visitor to the town square of theoretical CS, operations (Ops), and robust high-performance systems. Sometimes I indulge myself with insights on Computing and Biology, and other times I enjoy the accounts of minefield experiences in the systems world. Luckily, this summer, OSRE offered an opportunity that happened to be at the perfect intersection of my interests.

This summer, I will be working on a scientific computing library called FasTensor that offers a parallel computing structure called Stencil, widely popular in the scientific computing world to solve PDEs for Physical Simulations and Convolutions on Signals, among its many uses. I am excited to introduce my mentors, Dr. Bin Dong and Dr. John Wu of the Scientific Data Management Group at Lawrence Berkeley National Laboratory (LBNL). They bring invaluable expertise to the project.

They recognized the need for a tensor processing library that provided dedicated support for big datasets with inherent structural locality, often found in the scientific computing world, which was lacking in popular open-source MapReduce or Key-Value based frameworks.

More often than not, the operations performed on these datasets are composed of computations involving neighboring elements. This motivated the development of the FasTensor library.

I will be working on providing a Stream Processing interface that enables online data processing of large-scale datasets as they arrive from Data Producers. The project focuses on offering rich interfaces for managing and composing streams, supporting common scientific data formats like HDF5, and integrating fault tolerance and reliability mechanisms.

I am thrilled to work on the FasTensor project because I believe it has the potential to make a significant impact by enabling researchers to implement a rich set of computations on their big datasets in an easy and intuitive manner.

After all, FasTensor has just one simple paradigm: A -> Transform(F(x), B),

and it handles all the behind-the-scenes grunt work of handling big datasets so you can focus on your research.

Stay tuned for updates and feel free to collaborate!

Drishti

Thu, 06 Jun 2024 00:00:00 +0000

Namaste everyone! 🙏🏻

I’m Joel Tony, a third-year Computer Science undergraduate at BITS Pilani, Goa, India. I’m truly honored to be part of this year’s Google Summer of Code program, working with the UC OSPO organization on a project that genuinely excites me. I’m particularly grateful to be working under the mentorship of Dr. Jean Luca Bez, a Research Scientist at Lawrence Berkeley National Laboratory, and Dr. Suren Byna, a Full Professor at the Ohio State University. Their expertise in high-performance computing and data systems is invaluable as I tackle this project.

My project, “Drishti: Visualization and Analysis of AI-based Applications”, aims to extend the Drishti framework to better support AI/ML workloads, focusing specifically on optimizing their Input/Output (I/O) performance. I/O refers to the data transfer between a computer’s memory and external storage devices like hard drives (HDDs) or solid-state drives (SSDs). As AI models and datasets continue to grow exponentially in size, efficient I/O management has become a critical bottleneck that can significantly impact the overall performance of these data-intensive workloads.

Drishti is an innovative, interactive web-based framework that helps users understand the I/O behavior of scientific applications by visualizing I/O traces and highlighting bottlenecks. It transforms raw I/O data into interpretable visualizations, making performance issues more apparent. Now, I’m working to adapt these capabilities for the unique I/O patterns of AI/ML workloads.

Through my studies in high-performance computing and working with tools like BeeGFS and Darshan, I’ve gained insights into the intricacies of I/O performance. However, adapting Drishti for AI/ML workloads presents new challenges. In traditional HPC, computing often dominates, but in the realm of AI, the tables have turned. As models grow by billions of parameters and datasets expand to petabytes, I/O has become the critical path. Training larger models or using richer datasets doesn’t just mean more computation; it means handling vastly more data. This shift makes I/O optimisation not just a performance tweak but a fundamental enabler of AI progress. By fine-tuning Drishti for AI/ML workloads, we aim to pinpoint I/O bottlenecks precisely, helping researchers streamline their data pipelines and unlock the full potential of their hardware.

As outlined in my proposal, my tasks are threefold:

Modularize Drishti’s codebase: Currently, it’s a single 1700-line file that handles multiple functionalities. I’ll be refactoring it into focused, maintainable modules, improving readability and facilitating future enhancements.
Enable multi-trace handling: Unlike traditional HPC apps that typically generate one trace file, most AI jobs produce multiple. I’ll build a layer to aggregate these, providing a comprehensive view of the application’s I/O behavior.
Craft AI/ML-specific recommendations: Current suggestions often involve MPI-IO or HDF5, which aren’t typical in ML frameworks like PyTorch or TensorFlow. I’ll create targeted recommendations that align with these frameworks’ data pipelines.

This summer, my mission is to make Drishti as fluent in AI/ML I/O patterns as it is in traditional HPC workloads. My goal is not just to adapt Drishti but to optimize it for the unique I/O challenges that AI/ML applications face. Whether it’s dealing with massive datasets, handling numerous small files, or navigating framework-specific data formats, we want Drishti to provide clear, actionable insights.

From classroom theories to hands-on projects, from understanding file systems to optimizing AI workflows, each step has deepened my appreciation for the complexities and potential of high-performance computing. This GSoC project is an opportunity to apply this knowledge in a meaningful way, contributing to a tool that can significantly impact the open-source community.

In today’s AI-driven world, the pace of innovation is often gated by I/O performance. A model that takes weeks to train due to I/O bottlenecks might, with optimized I/O, train in days—translating directly into faster iterations, more experiments, and ultimately, breakthroughs. By making I/O behavior in AI/ML applications more interpretable through Drishti, we’re not just tweaking code. We’re providing developers with the insights they need to optimize their data pipelines, turning I/O from a bottleneck into a catalyst for AI advancement.

I look forward to sharing updates as we adapt Drishti for the AI era, focusing squarely on optimizing I/O for AI/ML workloads. In doing so, we aim to accelerate not just data transfer but the very progress of AI itself. I’m deeply thankful to Dr. Jean Luca Bez and Prof. Suren Byna for their guidance in this endeavor and to the UC OSPO and GSoC communities for this incredible opportunity.

Enhancing h5bench with HDF5 Compression Capability

Mon, 27 May 2024 00:00:00 +0000

Hardware Hierarchical Dynamical Systems

Tue, 14 May 2024 00:00:00 +0000

As part of Micro Architecture Santa Cruz (MASC) my proposal under the mentorship of Jose Renau and Sakshi Garg aims to develop a tree data structure under HHDS to replace the current one offered by LHTree

The tree data structure is to be optimized for typical AST traversal and queries. Some queries that are made to this tree are much more frequent than others. Thus a flattening policy will be used to optimize the tree for these queries, at the potential cost of becoming slow for the infrequent queries. The tree will be benchmarked for scalability and performance and is expected to outperform the current version of the tree. Once the implementation is complete, the tree will be integrated into the LiveHD core repository.

HDEval: Benchmarking LLMs that Generate Verilog/Chisel Modules From Natural Language

Tue, 14 May 2024 00:00:00 +0000

Hi everyone!

I’m Ashwin Bardhwaj, currently pursuing a bachelors in Electrical Engineering and Computer Science at UC Berkeley. I was recently involved in a project to implement a secure hardware encryption enclave in Verilog. That’s why I was excited to work with the MASC group to evaluate how existing generalized LLMs (such as ChatGPT 4 or StarCoder) can generate accurate Verliog/Chisel code from English and assist in the hardware development process.

As part of Micro Architecture Santa Cruz (MASC) my proposal under the mentorship of Jose Renau and Sakshi Garg looks to create a suite of benchmark programs for HDEval.

The deliverable of this project is to create multiple large HDL benchmarks along with a respective set of prompts. Using yosys to implement Logic Equivalence Check, we are able to prove through formal verification that the generated code will exhibit the same behavior as the benchmark. In addition, we can also consider the performance and resource utilization of the generated code as a metric.

BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking

Sat, 17 Feb 2024 00:00:00 +0000

Topics: bioinformatics, spatial transcriptomics, gene imputation, benchmarking, cross-platform/species analysis
Skills:
- Programming Languages:
  - Proficient in Python and/or R, commonly used in bioinformatics.
- Data Analysis:
  - Experience with statistical data analysis and machine learning models.
- Bioinformatics Knowledge (not required but preferred):
  - Proficiency in bioinformatics and computational biology.
  - Familiarity with spatial transcriptomics datasets and platforms.
Difficulty: Advanced
Size: Large (350 hours). Given the scope of integrating multi-platform, multi-species datasets and the complexity of benchmarking gene imputation methods, this project is substantial. It requires extensive data preparation, analysis, and validation phases, making it suitable for a larger time investment.
Mentors: Ziheng Duan (contact person)

Project Idea Description

The orchestration of cellular life is profoundly influenced by the precise control of gene activation and silencing across different spatial and temporal contexts. Understanding these complex spatiotemporal gene expression patterns is vital for advancing our knowledge of biological processes, from development and disease progression to adaptation. While single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to profile gene expression across thousands of cells simultaneously, its requirement for cell dissociation strips away the critical spatial context, limiting our comprehension of cellular interactions within their native environments. Recent strides in spatial transcriptomics have started to bridge this gap by enabling spatially resolved gene expression measurements at single-cell or even sub-cellular resolutions. These advancements offer unparalleled opportunities to delineate the intricate tapestry of gene expression within tissues, shedding light on the dynamic interactions between cells and their surroundings.

Despite these technological advances, a significant challenge remains: the datasets generated by spatial transcriptomic technologies are often incomplete, marred by missing gene expression values due to various technical and biological constraints. This limitation severely impedes our ability to fully interpret these rich datasets and extract meaningful insights from them. Gene imputation emerges as a pivotal solution to this problem, aiming to fill in these missing data points, thereby enhancing the resolution, quality, and interpretability of spatial transcriptomic datasets.

Recognizing the critical importance of this task, there is a pressing need for a unified benchmarking platform that can facilitate the evaluation and comparison of gene imputation methods across a diverse array of samples, spanning multiple sampling platforms, species, and organs. Currently, the bioinformatics and spatial transcriptomics fields lack such a standardized framework, hindering progress and innovation. To address this gap, our project aims to establish a comprehensive gene imputation dataset that encompasses a wide range of conditions and parameters. We intend to reproduce known methods and assess their efficacy, providing a solid and reproducible foundation for future advancements in this domain.

Project Deliverable

A comprehensive, preprocessed benchmark dataset that spans multiple sampling platforms, species, and organs, aimed at standardizing gene imputation tasks in spatial transcriptomics.
An objective comparison of state-of-the-art gene imputation methodologies, enhancing the understanding of their performance and applicability across diverse biological contexts.
A user-friendly Python package offering a suite of gene imputation tools, designed to fulfill the research needs of the spatial transcriptomics community by improving data completeness and reproducibility.

GPEC: An Open Emulation Platform to Evaluate GPU/ML Workloads on Erasure Coding Storage

Thu, 08 Feb 2024 00:00:00 +0000

Project Idea Description

Topics: Storage Systems, Machine Learning, Erasure Coding
Skills: C/C++, Python, PyTorch, Bash scripting, Linux, Erasure Coding, Machine Learning
Difficulty: Hard
Size: Large (350 hours)
Mentors: Meng Wang (primary contact), John Bent

Large-scale data centers store immense amounts of user data across a multitude of disks, necessitating redundancy strategies like erasure coding (EC) to safeguard against disk failures. Numerous research efforts have sought to assess the performance and durability of various erasure coding approaches, including single-level erasure coding, locally recoverable coding, and multi-level erasure coding.

Despite its widespread adoption, a significant research gap exists regarding the performance of large-scale erasure-coded storage systems when exposed to machine learning (ML) workloads. While conventional practice often leans towards replication for enhanced performance, this project seeks to explore whether cost-effective erasure encoding can deliver comparable performance. In this context, several fundamental questions remain unanswered, including: Can a typical erasure-coded storage system deliver sufficient throughput for ML training tasks? Can an erasure-coded storage system maintain low-latency performance for ML training and inference workloads? How does disk failure and subsequent repair impact the throughput and latency of ML workloads? What influence do various erasure coding design choices, such as chunk placement strategies and repair methods, have on the aforementioned performance metrics?

To address these questions, the most straightforward approach would involve running ML workloads on large-scale erasure coded storage systems within HPC data centers. However, this presents challenges for researchers and students due to limited access to expensive GPUs and distributed storage systems, especially when dealing with large-scale evaluations. Consequently, there is a need for a cost-effective evaluation platform.

The objective of this project is to develop an open-source platform that facilitates cheap and reproducible evaluations of erasure-coded storage systems concerning ML workloads. This platform consists of two key components: GPU Emulator: This emulator is designed to simulate GPU performance for ML workloads. Development of the GPU emulator is near completion. EC Emulator: This emulator is designed to simulate the performance characteristics of erasure-coded storage systems. It is still in the exploratory phase and requires further development.

The student’s responsibilities will include documenting the GPU emulator, progressing the development of the EC emulator, and packaging the experiments to ensure easy reproducibility. It is anticipated that this platform will empower researchers and students to conduct cost-effective and reproducible evaluations of large-scale erasure-coded storage systems in the context of ML workloads.

Project Deliverable

Build an EC emulator to emulate the performance characteristics of large-scale erasure-coded storage systems
Incorporate the EC emulator into ML workloads and GPU emulator
Conduct reproducible experiments to evaluate the performance of erasure-coded storage systems in the context of ML workloads
Publish a Trovi artifact shared on Chameleon Cloud and a GitHub repository with open-source code

Turn on, Tune in, Listen up: Maximizing Side-Channel Recovery in Cross-Platform Time-to-Digital Converters

Thu, 08 Feb 2024 00:00:00 +0000

Turn on, Tune in, Listen Up Is an open-source framework for implementing voltage flucturation sensors in FPGA devices for use in side-channel security research. Side-channels are an ever present hardware security threat. The reconfigurability of FPGAs significantly broadens the side-channel attack surface in many cloud heterogeneous systems. We have developed a highly tunable side-channel sensor, which significantly improves side-channel attack time and resolution in multiple contexts. Concurrent users sharing the same device may attack one another through the power side-channel (check out our paper), while consecutive users may attack one another through measurement of the physical wear-out state of the FPGA device (check out our paper). We have demonstrated these attack surfaces on both Intel (Altera) and AMD (Xilinx) platforms. Currently, our open-sourced sensor design and side-channel analysis flow is limited to AMD devices. We are seeking CSE/CS/CE/ECE researchers interested in FPGA design, heterogeneous computing and/or hardware security to combine our Intel and AMD side-channel sensors into a unified attack framework and comparing capabilities between vendors.

Open-source sensor repository updates

Topics: Hardware security, cloud security, heterogeneous computing, temporal and spatial side-channels
Skills: Experience with GitHub, FPGA development (AMD or Intel), and Python
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Dustin Richmond, Tyler Sheaves

Update existing open-source voltage fluctuation sensor to support both AMD and Intel devices. Currently our repository exclusively supports AMD FPGAs. We have added new features to our sensor and have demonstrated an implementation on Intel. We would like to consolidate this work into a unified repository containing side-channel analysis demonstrations using open-source target benchmark designs.

Specific tasks:

Adapt existing tooling scripts to support multiple vendor tool flows.
Adapt existing test infrastructure to target multiple SoC-type FPGA platforms (i.e. DE10-Nano, Pynq Z2, etc.).
Evaluate cross-platform sensor architecture on a collection of benchmark designs. Demonstrate each benchmark using a cross-platform unified side-channel analysis framework.
Draw a comparison between sensor implementations on different architectures.

Artificial Intelligence Explainability Accountability

Wed, 07 Feb 2024 00:00:00 +0000

Trustworthy Logical Reasoning Large Language Models (LLMs)

Logical LLMs is a project to translate the output from large language models (LLM) into a logic-based programming language (prolog) to detect inconsistencies and hallucinations automatically . The goals of this project would be to build a user interface for users to be able to give feedback which can be incorporated into the system. The project goal is to create a trustworthy hybrid open-source LLM tool that can learn from user feedback and explain its mistakes.

Collect Hallucinations and Facts

Topics: AI/ML, data collection, logic, user interfaces
Skills: javascript, html, python, bash, git
Difficulty: Easy/Moderate
Size: Large
Mentors: Leilani H. Gilpin (and a PhD student TBD).

Specific Tasks

Run queries in an LLM API with various prompts.
Create a user interface system that collects user feedback in a web browser.
Create a pipeline for storing the user data in a common format that can be shared in our database.
Document the tool for future maintenance.

Explaining failures in autograding

The eXplainable autograder (XAutograder) is a tool for autograding student coding assignments, while providing personalized explanations or feedback. The goal of this project is to create an introductory set of coding assignment with explanations of wrong answers. This benchmark suite will be used for testing our system. The project goal is to create a dynamic autograding system that can learn from student’s code and explain their mistakes

Design introductory questions and explanations

Topics: AI/ML, AI for education, XAI (Explainable AI_
Skills: python, git
Difficulty: Moderate
Size: Large
Mentors: Leilani H. Gilpin (and a PhD student TBD).

Specific Tasks

Design 5-10 basic programming questions (aggregated from online, other courses, etc).
Create tests of correctness (unit tests), and a testing framework which can input a set of answers, and provide a final assessment
Create a set of baseline explanations for various error cases, e.g., out of bounds error, syntax error, etc.
Create a pipeline for iterating on the test cases and/or explanation feedback.
Document the tool for future maintenance.

Causeway: Learning Web Development Through Micro-Roles

Wed, 07 Feb 2024 00:00:00 +0000

Thus far, we have developed a version of the platform that walks learners through the process of developing presentational components of a web application as well as smart components / containers that contain multiple presentational components and are responsible for fetching data from the backend and handling events and updates to the database. This content is still using Angular 13 and needs to be updated to Angular 17, as well as to make some improvements in our use of RxJS, NgRx, and Firebase. We’d also like to extend the content in multiple ways including: 1) extending the walkthrough to more components and containers besides the single example we have, ideally in a way that covers a complete application, and 2) extending beyond components and containers to cover defining database entities and relationships. We’d also like to develop a learning dashboard where users can see the different micro-roles and lessons that they’ve completed or that are upcoming for the project they are working on.

Causeway / Improving the Core Infrastructure and Experience

The proposed work includes updating the platform and the example infrastructure within the platform to the latest version of Angular and other associated libraries, implementing and testing logging and analytics, implementing a learning dashboard for users, and time permitting, creating new modules to cover defining database entities and relationships. Both roles will also contribute to running usability studies and documenting the platform so that it can be open-sourced.

Topics: Web Development, Educational Technologies, Angular
Skills: Web development experience, HTML, CSS, Javascript, Angular, RxJS, NgRx, Firebase
Difficulty: Medium to Hard
Size: Large (350 hours)
Mentors: David Lee

Causeway / Extend the Learning Scope and Experience

The proposed work includes extending the component and container walkthroughs to cover a complete interactive application. This means writing a separate simple application, and organizing the code required to do so into units of work organized by our micro-role structure. Both roles will also contribute to running usability studies and documenting the platform so that it can be open-sourced.

Topics: Web Development, Educational Technologies, Angular
Skills: Web development experience, HTML, CSS, Javascript, Angular, RxJS, NgRx, Firebase
Difficulty: Medium
Size: Large (350 hours)
Mentors: David Lee

Open Sensing Platform (OSP)

Mon, 05 Feb 2024 00:00:00 +0000

Open Sensing Platform I: Software to enable large scale outdoor sensor networks

Topics: Data Visualization, Backend, Web Development, UI/UX, Analytics
Skills:
- Required: React, Javascript, Python, SQL, Git
- Nice to have: Flask, Docker, CI/CD, AWS, Authentication
Difficulty: Medium
Size: Large (350 hours)
Mentors: Colleen Josephson, John Madden, Aaron Wu

Open Sensing Platform (OSP) is a new initiative expanding from our prior project DirtViz, a data visualization web platform for monitoring microbial fuel cell sensors (see GitHub). The mission is to scale up the current platform to support other researchers or citizen scientists in integrating their novel sensing hardware or microbial fuel cell sensors for monitoring and data analysis. Examples of the types of sensors currently deployed are sensors measuring soil moisture, temperature, current, and voltage in outdoor settings. The focus of the software half of the project involves building upon our existing visualization web platform, and adding additional features to support the mission. A live version of the website is available here.

Deliverables:
- Create a system for remote collaborators/citizen scientists to set up their sensors and upload securely, eg. designing user flow to create sensors
- Craft an intuitive navigation system so that data from deployment sites around the world can be easily viewed, eg. designing experience/system to locate deployment sites.
- Refine our web-based visualization tools to add additional features for users to analyze collected data, eg. lazy loading out-of-range data or caching queried data.
- Document the tool thoroughly for future maintenance

Open Sensing Platform II: Hardware to enable large scale outdoor sensor networks

Topics: Embedded system, wireless communication, low-power remote sensing
Skills:
- Required: C/C++, Git, Github, Platformio
- Nice to have: PCB design and debugging experience, STM32 HAL, ESP32 Arduino, protobuf, python, knowledge of standard communication protocols (I2C, SPI, and UART)
Difficulty: Hard
Size: Large (350 hours)
Mentors: Colleen Josephson, John Madden, Stephen Taylor

The Open Sensing Platform hardware aims to be a general purpose hardware platform for outdoor sensing (e.g. agriculture, ecological monitoring, etc.). The typical use case involves a sensor deployment in an agricultural field, remotely uploading measurements without interfering with farming operations. The current hardware revision (Soil Power Sensor) was originally designed for monitoring power output of microbial fuel cells using high fidelity voltage and current measurement channels, as well as auxiliary sensors such as the SDI-12 TEROS-12 soil moisture sensor. The primary activities of this project will involve low-level firmware design and implementation, but may also incorporate hardware design revisions if necessary. We are looking to expand functionality to other external sensors, as well as optimize for power consumption, via significant firmware design activities.

Long-range, low-power wireless communication is achieved through a LoRa capable STM32 microcontroller with in-lab experiments using an ESP32 microcontroller to enable the simpler WiFi interface. Both wireless interfaces communicate upload measurements to our data visualization dashboard, Open Sensing Platform I. The combined goal across both of these projects is to create a system that enables researchers to test and evaluate novel sensing solutions. We are looking to make the device usable to a wide range of researchers which may not have a background in electronics, so are interested in design activities that enhance user friendliness.

Deliverables: Contribution via commits to the GitHub repository with documentation on completed work. A changelog of contributions to the firmware.

LiveHD

Thu, 01 Feb 2024 00:00:00 +0000

The goals is to enable a more productive flow where the ASIC/FPGA designer can work with multiple hardware description languages like CHISEL, Pyrope, or Verilog.

There are several projects, some compiler infrastructure around LiveHD. Others around how to interface LLMs to improve chip design productivity.

There are the following projects available:

Slang with LiveHD
Hardware Hierarchical Dynamic Structures (hdds)
HDLEval for LLMs
C++ Profiler Optimizer with LLMs
Decompiler from Assembly to C++ with LLMs

Slang with LiveHD

Project Idea

slang is one of the best open source Verilog front-ends available. LiveHD uses slang, but only a subset of Verilog is supported. The goal is to add more slang features.

Project Deliverable

The slang/LiveHD interface creates LiveHD IR (LNAST IR). The plan is to keep extending the translation to support more features. This is a project that allows small steps. The goal is to support all Verilog 2001, and potentially some System Verilog features.

Topics: SysteVerilog, Compilers
Skills Needed: Knowledge of Verilog, C++17, some compiler background.
Difficulty: Medium
Size: Large
Mentor: Jose Renau, Sakshi Garg

Hardware Hierarchical Dynamic Structures (hdds)

Project Idea

hdds aims to build efficient tree and graph data structures commonly used by hardware compilers. A key difference is the hierarchical nature, and patterns.

Project Deliverable

There are 2 main components: Graph and Tree.

For each, there is a hierarchical implementation that allows to connect tree/graphs in a hieararchy. For example, a graph can call another graph with input and outputs like a Verilog module calls other Verilog modules.

Both classes should have iterators for traversing in topological sort.

Topics: Data structures for compilers
Skills Needed: Data structures, C++17
Difficulty: Medium
Size: Large
Mentor: Jose Renau, Sakshi Garg

HDLEval for LLMs

Project Idea

LLMs can be used to create new hardware. The goal of this project is to create multiple prompts so that LLM/compiler designers can have examples to improve their flows.

Project Deliverable

The idea is to create many sample projects where a “input” creates a Verilog artifact. The specification should not assume Verilog as output because other HDLs like Chisel could be used.

The goal is to create many sample circuits that are realistic and practical. The description can have

Topics: Verilog, LLMs
Skills Needed: Verilog or Chisel
Difficulty: Low
Size: Small or medium
Mentor: Jose Renau

C++ Profiler Optimizer with LLMs

Project Idea

Fine-tune, and/or RAG, a LLM to leverage profiling tools so that it can provide code optimization recommendations for C++ and possibly Rust code.

Project Deliverable

Create a Python package (poetry?) called aiprof that analyzes the execution of a C++ or Rust program and provide code change recommendations to improve performance.

aiprof ./binary

aiprof uses perf tools but also other tools like redspy, zerospy, and loadspy to find problematic code areas and drive the GPT optimizer.

The plan is to find several examples of transformations to have a database so that a model like CodeLlama or mixtral can be fine-tuned with code optimization recomendations.

Topics: C++, perf tools
Skills Needed: C++17, Linux performance counters
Difficulty: Medium
Size: Large
Mentor: Jose Renau

Decompiler from Assembly to C++ with LLMs

Project Idea

There are several decompilers from assembly to C like ghidra and retdec. The idea is to enhance both outputs to feed an LLM to generate nicer C++ code.

Project Deliverable

ghidra and retdec generate C code out of assembly. The idea is to start with these tools as baseline, but feed it to a LLM to generate C++ code instead of plain C.

Create a Python package (poetry?) called aidecomp that integrates both decompilers. It allows to target C or C++17.

To check that the generated code is compatible with the function translated, a fuzzer could be used. This allows aidecomp to iterate the generation if the generated code is not equivalent.

Topics: C++, decompilers
Skills Needed: C++17
Difficulty: Medium
Size: Large
Mentor: Jose Renau

Drishti

Tue, 30 Jan 2024 10:15:00 -0700

Drishti / Server-side Visualization Service

The proposed work will include investigating and building server-side solutions to support the visualization of larger I/O traces and logs, while integrating with the existing analysis, reports, and recommendations.

Topics: I/O HPC visualization, performance analysis
Skills: Python, HTML/CSS, JavaScript
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

Drishti / Visualization and Analysis of AI-based Applications

Drishti to handle metrics from non-MPI applications, specifically, AI/ML codes and applications. This work entails adapting the existing framework, heuristics, and recommendations to support metrics collected from AI/ML workloads.

Topics: I/O HPC AI visualization, performance analysis
Skills: Python, AI, performance profiling
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

h5bench

Tue, 30 Jan 2024 10:15:00 -0700

h5bench / Reporting and Enhancing

The proposed work will include standardizing and enhancing the reports generated by the suite, and integrate additional I/O kernels (e.g., HACC-IO).

Topics: I/O HPC benchmarking
Skills: Python, C/C++, good communicator
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

h5bench / Compression

The proposed work will focus on including compression capabilities into the h5bench core access patterns through HDF5 filters.

Topics: I/O HPC benchmarking, compression
Skills: C/C++, Python, HDF5
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Jean Luca Bez and Suren Byna

OpenROAD - An Open-Source, Autonomous RTL-GDSII Flow for Chip Design

Mon, 22 Jan 2024 00:00:00 +0000

OpenROAD massively scales and supports EWD (Education and Workforce Development) and supports a broad ecosystem making it a vital tool that supports a rapidly growing Semiconductor Industry.

Create OpenROAD Tutorials and Videos

Topics: Documentation, Tutorials, Videos, VLSI design basics
Skills: Video/audio recording and editing, training and education
Difficulty: Medium
Size: Large (350 hours)
Mentor: Indira Iyer, Vitor Bandeira

Create short videos for training and course curriculum highlighting key features and flows in OpenROAD-flow-scripts.

Improve the OpenROAD AutoTuner Flow and documentation

Topics: OpenROAD-flow-scripts, AutoTuner, Design Exploration
Skills: Knowledge of ML for hyperparameter tuning, Cloud-based computation, Basic VLSI design and tools knowledge, python, C/C++
Difficulty: Medium
Size: Large (350 hours)
Mentor: Vitor Bandeira, Indira Iyer

Test, analyze and enhance the AutoTuner to improve usability, documentation and QoR. The Autotuner is an important tool in the OpenROAD flow - OpenROAD-flow-scripts for Chip design exploration that significantly reduces design time. You will use state-of-the-art ML tools to test the current tool exhaustively for good PPA (performance, power, area) results. You will also update existing documentation to reflect any changes to the tool and flow.

Implement a memory compiler in the OpenROAD Flow

Topics: OpenROAD-flow-scripts, Memory Compiler,
Skills: Basic VLSI design and tools knowledge, python, tcl, C/C++, memory design a plus
Difficulty: Medium
Size: Medium (175 hours)
Mentor: Matt Liberty, Austin Rovinski

Implement a memory compiler as part of the OpenROAD flow to improve the placement and layout efficiency of large, memory-intensive designs. You will start with an existing code base to develop this feature: https://github.com/The-OpenROAD-Project-staging/OpenROAD/tree/dffram This is another option: https://github.com/AUCOHL/DFFRAM Enhance code to support DFFRAM support for the OpenROAD native flow, OpenROAD-flow-scripts.

Integrate a tcl and python linter

Topics: Linting, Workflow
Skills: tcl, python, linting
Difficulty: Easy
Size: Small (90 hours)
Mentor: Vitor Bandeira, Austin Rovinski

Integrate a tcl and python linter for tools in OpenROAD and OpenROAD-flow-scripts to enforce error checking, style and best practices.

LLM assistant for OpenROAD - Create Model Architecture and Prototype

Topics: Large Language Model, Machine Learning, Model Architecture, Model Deployment
Skills: large language model engineering, prompt engineering, fine-tuning
Difficulty: Medium
Size: Medium (175 hours)
Mentor: Indira Iyer, Jack Luar

This project involves the creation of a conversational assistant designed around OpenROAD to answer user queries. You will be working in tandem with members of the OpenROAD team and other researchers to deliver a final deployable prototype. You will focus on the design and implementation of modular LLM architectures. You will be experimenting through different architectures and justifying which approach works the best on our domain-specific data. Open to proposals from all levels of ML practitioners.

LLM assistant for OpenROAD - Data Engineering and testing

Topics: Large Language Model, Machine Learning, Data Engineering, Model Deployment, Testing
Skills: large language model engineering, prompt engineering, fine-tuning
Difficulty: Medium
Size: Medium (175 hours)
Mentor: Indira Iyer, Jack Luar

This project involves the creation of a conversational assistant designed around OpenROAD to answer user queries. You will be working in tandem with members of the OpenROAD team and other researchers to deliver a final deployable prototype. This project will focus on the data engineering portion of the project. This may include: training pipelines specifically tailored for fine-tuning LLM models, data annotation, preprocessing and augmentation. Open to proposals from all levels of ML practitioners.

Create Unit tests for OpenROAD tools

Topics: OpenROAD-flow-scripts, unit testing
Skills: Basic VLSI design and tools knowledge, python, tcl, C/C++, Github
Difficulty: Medium
Size: Medium ( 175 hours)
Mentor: Vitor Bandeira, Indira Iyer

You will build unit tests to test specific features of the OpenROAD tool which will become part of the regression test. Here is an example of a test for UPF support: https://github.com/The-OpenROAD-Project/OpenROAD/blob/master/test/upf/mpd_aes.upf. This is a great way to learn VLSI flow basics and the art of testing them for practical applications.

AIIO / Graph Neural Network

Wed, 17 Jan 2024 10:15:56 -0700

[AIIO] (https://github.com/hpc-io/aiio) revolutionizes the way for users to automatically tune the I/O performance of applications on HPC systems. It currently works on linear regression models but has more opportunities to work on heterogeneous data, such as programming info. This requires extending the linear regression model to more complex models, such as heterogeneous graph neural networks. The proposed work will include developing the graph neural work-based model to predict the I/O performance and interpretation.

AIIO / Graph Neural Network

Topics: AIIO/Graph Neural Network`
Skills: Python, Github, Machine Learning
Difficulty: Difficult
Size: Large (350 hours)
Mentor: Bin Dong, Suren Byna

The Specific tasks of the project include:

Develop the data pre-processing pipeline to convert I/O logs into formats which are required by the Graph Neural Network
Build and test the Graph Neural Network to model the I/O performance for HPC applications.
Test and evaluate the accuracy of the Graph Neural Network with test cases from AIIO

FasTensor / Stream Processing

Wed, 17 Jan 2024 10:15:56 -0700

[FasTensor] (https://github.com/BinDong314/FasTensor) is a generic tensor processing engine with scalability from single nodes to thousands of nodes on HPC. FasTensor supports applications from traditional SQL query to complex DFT solver in scientific applications. It has a 1000X performance advantage over MapReduce and Spark in supporting generic data processing functions on tensor structure. In this project, we propose to expand FasTensor with streaming functionality to support online data processing. Specifically, participants of this project will develop a stream endpoint for retrieving live data output from applications, such as DAS. The stream endpoint performs the function to maintain the pointer of data, which could be a n-dimensional subset of a tensor.

FasTensor / Stream Processing

Topics: FasTensor/Streaming Processing
Skills: C++, github
Difficulty: Difficult
Size: Large (350 hours)
Mentor: Bin Dong, John Wu

The Specific tasks of the project include:

Building a mock workflow based on our DAS application (https://github.com/BinDong314/DASSA) to test stream processing. The mock workflow comprises a data producer, which generates DAS data, and a data consumer, which processes the data.
Developing a Stream Endpoint (e.g., I/O driver) to iteratively read dynamically increasing data from a directory. The stream endpoint essentially includes open, read, and write functions, and a pointer to remember current file pointer.
Integrating the Stream Endpoint into the FasTensor library.
Evaluating the performance of the mock workflow with the new Stream Endpoint.
Documenting the execution mechanism.

PolyPhy

Mon, 01 Jan 2024 00:00:00 +0000

PolyPhy is a GPU oriented agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data. Rooted in astronomy and inspired by nature, we have used an early prototype called Polyphorm to reconstruct the Cosmic web structure, but also to discover network-like patterns in natural language data. You can see an instructive overview of PolyPhy in our workshop and more details about our research here.

Under the hood, PolyPhy uses a richer 3D scalar field representation of the reconstructed network, instead of a typical discrete representation like a graph or a mesh. The ultimate purpose of PolyPhy is to become a toolkit for a range of specialists across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. PolyPhy aspires to be a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.

PolyPhy Web Presence

Topics: Web Development UX Social Media
Skills: full stack web development, Javascript, good communicator
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Oskar Elek, Ezra Huscher

The online presentation of a software project is without a doubt one of the core ingredients of its success. This project aims to develop a sustainable web presentce for PolyPhy, catering to interested contributors, active collaborators, and users alike.

Specific tasks:

Closely work with the mentors on understanding the context of the project and its detailed requirements in preparation of the proposal.
Port the existing website into a more modern Javascript framework (such as Next.js) that provides a user-friendly CMS and admin interface.
Update the contents of the website with new information from the repository repository page as well as other sources as directed by the mentors.
Develop a simple functional system for posting updates about the project to selected social media and other communication platforms (LinkedIn, Twitter/X or Mastodon, mailing list) which will also be reflected on the website.
Optional: improve the UX of the website where needed.
Optional: implement website analytics (visitor stats etc).

Data Visualization and Analysis with PolyPhy/Polyglot

Topics: Data Science Data Visualization Point Clustering 3D Neural Embeddings
Skills: data science, Python, Javascript, statistics, familiarity with AI and latent embedding spaces a big plus
Difficulty: Challenging
Size: Large (350+ hours)
Mentors: Oskar Elek, Kiran Deol

The aim of this project is to explore a novel data-scientific usecase using PolyPhy and its associated web visualization interface PolyGlot. The contributor is expected to identify a dataset they are already well familiar with, and that fits the application scope of the PolyPhy/PolyGlot tooling: a complex point cloud arising from a 3D or a higher dimensional process which will benefit from latent pattern identification and a subsequent visual as well as quantitative analysis. The contributor needs to have the rights for using the dataset - either by owning the copyright or via the open-source nature of the data.

Specific tasks:

Closely work with the mentors on understanding the context of the project and its detailed requirements in preparation of the proposal.
Become acquainted with the tooling (PolyPhy, PolyGlot) prior to the start of the project period.
Document the nature of the target dataset and define the complete data pipeline with assistance of the mentors, including the specific analytic tasks and objectives.
Implement the data pipeline in PolyPhy and PolyGlot.
Document the process and resulting findings in a publicly available report.

These 4 new features will change the way you use OpenROAD

Sun, 29 Oct 2023 00:00:00 +0000

Introduction

Welcome to the final blog post for my GSoC’23! Once again, my name is Jack and I am working under the open-source electronic design automation project - OpenROAD. We are a fast growing leading open-source foundational application for semiconductor digital design, as evidenced from our consistent star growth since inception. You may check us out at this link. Allow me to share the four significant contributions I made in this GSoC project.

1) Improving Ease of Installation

Firstly, OpenROAD is now able to support multiple operating systems. This is essential as one of our primary goals is to democratise chip implementation. And installation is often one of the hardest steps to get right, so that was one of our priorities. Today, we have provided options for different types of installation:

Prebuilt binaries: Local installations can often be riddled with incompatibilities or unexpected bugs, as well as taking a long compilation time. We sidestepped this by providing semi-regular updates to OpenROAD binary, reducing the time to installation.
Docker: Echoing previous concerns, we also enabled Docker installation for 9 major operating systems. Docker is extremely flexible and runs on many operating systems (as long as it is supported by Docker).

With these changes, we have observed 10% reduction of installation related Github issues posted on a weekly basis.

Figure 1: Supported OS matrix

2) Filling Missing Documentation

Next, we have made considerable improvements to over 20 tool-specific documentations, introducing consistent formatting styles for each page. We introduce default values and datatypes to allow users to use the tools with greater ease.

Figure 2: Helpful documentation defaults and datatype

Rather than having all arguments for a function under a common table, we separated out into developer arguments and developer commands. This is to further make our documentation more beginner-friendly to read, while not alienating our technical userbase. We have also added sections for example scripts and regression test, so as to help onboard newcomers to each tool of the flow.

Figure 3: Useful developer commands, example scripts, and regression test instructions

3) Extensible Documentation Framework

Thirdly, we have introduced extensible documentation frameworks. Now, what do we mean by extensible? It means we have created an infrastructure which is easy to use for developers, and allows for greater maintanability. Our goal is to create something that requires minimal changes to add content for documentation.

So, how did we do this?

We introduced 4 initiatives, namely: the warning/error messages glossary. We noticed that people were searching for error and warning messages, but our documentation did not have them. So we added a page where all the error/warning messages along with relevant code line number can be generated automatically. On top of that, developers can add useful debug information to help the end user.

Figure 4: Warning/Error messages glossary.

Next, we also introduced automatically generated Doxygen pages, which integrates nicely into our C++/Tcl source code framework. This automatic generation will make it much more convenient for developers to just insert comments into their source code, and allow Doxygen to generate documentation automatically.

Figure 5: Doxygen pages.

Next, we introduced cloud-based packaging. It is important that our framework is able to runnable on cloud, and the ever-popular notebook format. Our Colab based notebook was created with this in mind, and allows for easy transfer to other notebook providers with some modifications. Check out the notebooks here!

Figure 6: Google Colab can now run OpenROAD scripts.

Lastly, we have the changelog workflow which can be triggered manually. For our open-source project, we have chosen not to do software releases. This means it can be difficult to track the changes between commit numbers. Adding this workflow can help newcomers track the changes easier, by month.

Figure 7: Sample output of github changelog

4) OpenROAD Chatbot

Finally, we are also discussing the potential of creating a chatbot whose purpose is to answer user queries. We were thinking, there are lots of domain knowledge in Slack Channels, Github repos, and so on, so why not create a LLM-based chatbot. Stay tuned for updates!

Personal Reflections

To me, my most valuable takeaway is with regards to code quality. Often times, we as coders tend to opt for the best solution and “hack” something out quickly. Hacking is fine, as a proof of concept - but not for long term code development. Working in open-source projects like this, I have learnt to avoid creating unnecessary files, shortening the code and optimising runtime. In doing our job, we also wish to make life easier, not harder for future developers

Final Words

I would like to express my gratitude to my mentors Indira and Vitor for their guidance and insight throughout the project, as well as the OpenROAD dev team for their assistance. Would also like to thank the Google Summer of Code organising committee, and UCSC for creating such a wonderful program. Being able to contribute to actual real open-source projects with real needs, is truly the best of both worlds for aspiring programmers.

Final GSoC Blog - Polyglot

Mon, 25 Sep 2023 00:00:00 +0000

As I send in my final work submission for the final GSoC evaluation, I’m excited to share with you the progress we’ve made this summer (and future plans for Polyglot!). You can view the repository and web app here: https://polyphyhub.github.io/PolyGlot/. As a quick reminder of the project, we sought to extend the Polyglot web app, as developed by Hongwei (Henry) Zhou. For context, the web app follows this methodology:

Given a set of words, use an embedding model (such as Word2Vec, BERT, etc.) to generate a set of high dimensional points associated with each word.
Use a dimensionality reduction method (such as UMAP) to reduce the dimensionality of each word-vector point to 3 dimensions
Use the novel MCPM (Monte Carlo Physarum Machine) to compute the similarities between a set of anchor points and the rest of the point cloud. You could use any similarity metric here, too, such as the Euclidean distance.
The web app then displays the point cloud of 3-dimensional embeddings, but uses coloring to indicate the level of MCPM similarity each word has with the anchor point (e.g, if the anchor point is the word “dog”, the rest of the point cloud is colored such that words identified as similar to “dog” by the MCPM metric are brighter, whereas dissimilar words are darker.

The main results since the last blog are summarized as follows:

Novel timeline feature in which users can track the importance of certain words over time by watching the change in size of points (computes the IF-IDF metric for a word across all documents in a given year). Uses linear interpolation for years which do not have an explicit importance score.
An industrial collaboration with UK startup Lautonomy, where we have pre-processed and entered their data into Polyglot. Pre-processing consisted of first computing a high dimensional embedding of their set of words using OpenAI’s CLIP model https://openai.com/research/clip and the CLIP-as-service Python package https://clip-as-service.jina.ai. Next, we used UMAP to reduce the dimensionality of these embeddings to 3D. We computed the Euclidean distance on this data (in place of MCPM metric). Finally, we formatted the data to enter into Polyglot.

Although the app has developed a lot over the summer, we are planning to continue working on Polyglot, particularly with respect to one of our original goals: to set up a pipeline from PolyPhy to Polyglot. Unfortunately, with PolyPhy undergoing refactoring this summer, we weren’t able to set this pipeline up. However, that is one of our goals for the next few months. We are also moving forward with the industrial collaboration with legal analytics startup Lautonomy. We hope to release an output together soon!

If you’re curious about Polyglot or are interesting in getting involved, please feel free to reach out to myself, Oskar Elek, and Jasmine Otto!

KV store final Blog

Fri, 25 Aug 2023 00:00:00 +0000

Hello again! Before we get started, take a look at my previous blogs, Introduction and Mid Term. The goal of the project was to implement io_uring based backend driver for client side, which was at that time using traditional sockets. The objective was improving performance from the zero copy capabilities of io uring. In the process, I learnt about many things, about libkinetic and KV stores in general.

I started by writing a separate driver using io_uring in libkinetic/src in ktli_uring.c, most of which is similar to the sockets backend in ktli_sockets.c. The only difference was in the send and receive functions. For more detailed description about the implementation, refer to the mid term blog.

After the implementation, it was time to put it to test. We ran extensive benchmarks with a tool called fio, which is generally used to run tests on filesystems and other IO related things. Thanks to Philip, who had already written an IO engine for testing kinetic KV store (link), I didn’t have much problem in setting up the testbench. Again thanks to Philip, He set up a ubuntu server with the kinetic server and gave me access through ssh. We ran extensive tests on that server, with both socket and uring backends, with several different block sizes. The link to the benchmarks sheet can be found here.

We spent a lot of time in reading and discussing the numbers, probably the most time consuming part of the project, we had several long discussions analyzing numbers and their implications, for example in the initial tests, we were getting very high std dev in mean send times, then we figured it was because of the network bottleneck, as we were using large block sizes and filling up the 2.5G network bandwidth quickly.

In conclusion, we found out that there are many other major factors affecting the performance of the KV store, for example the network, and the server side of the KV store. Thus, though io_uring offers performance benefit at the userspace-kernel level, in this case, there were other factors that had more significant effect than the kernal IO stack on the client side. Thus, for increasing the performance, we need to look at the server side

I would like to thank Philip and Aldrin for their unwavering support and in depth discussions on the topic in our weekly meetings, I learned a lot from them throughout the entire duration of the project.

Grammar, Parsers, and Queries

Sat, 12 Aug 2023 00:00:00 +0000

Update on tree-sitter-pyrope

The pyrope hardware description language now has syntax highlighting available for neovim users. The repository includes a guide to installing the parser, and activating highlights. After we have tested the syntax highlighting, a pull request will be made to the nvim-treesitter repository. In this post, I will outline the highlighting process and reflect on a useful feature of neovim.

Syntax Trees

The pyrope language is described by a grammar. A grammar is a set of rules that describes the allowed structure of a language. A parser uses the grammar to generate a syntax tree. For example, consider this line of pyrope code.

var a:u32 = 0

Using the pyrope parser, we can get a syntax tree for this statement. The command tree-sitter parse file.prp gives us the following output.

(statement [1, 0] - [1, 13]
 (assignment_or_declaration_statement [1, 0] - [1, 13]
 decl: (var_or_let_or_reg [1, 0] - [1, 3])
 lvalue: (complex_identifier [1, 4] - [1, 5]
 (identifier [1, 4] - [1, 5]))
 type: (type_cast [1, 5] - [1, 9]
 type: (primitive_type [1, 6] - [1, 9]
 (sized_integer_type [1, 6] - [1, 9])))
 operator: (assignment_operator [1, 10] - [1, 11])
 rvalue: (constant [1, 12] - [1, 13])))

The nvim-treesitter syntax highlighting is based on this tree structure.

Queries

A query is an expression that selects nodes from the tree. For example,

(complex_identifier (identifier))

matches any identifier that is the child of a complex_identifier. Color schemes in neovim assign colors to different highlight groups. So, we can assign highlight groups to tree queries.

(constant) @number

Now, when a constant shows up in the syntax tree, it will highlight according to the @number group. Most of the work I did on this project involved studying the pyrope grammar, and writing queries based on it.

neovim

The text editor neovim is a popular choice among programmers. It allows advanced user control with configuration files. It also has an active community working on plugins to extend its functionality. Tools such as lazyvim allow for features like code completion and file management that give neovim the same functionality as IDEs. However, because neovim configuration is unique to each user, this may make it difficult to reproduce neovim instructions. For example, Professor Renau was going to test pyrope syntax highlighting in neovim. However, I did not know what configuration was necessary for him to see highlights in neovim. While I knew that syntax highlighting worked on my setup, I have lots of configuration files that may have contributed to that success. There is no guarantee that Professor Renau, or other potential users, have the same neovim configuration that I do.

NVIM_APPNAME

So, Professor Renau suggested I use the $NVIM_APPNAME variable to test the process on a fresh configuration. This feature allows the user to specify the configuration files used to launch neovim. For example, I installed lazyvim to the folder ~/.config/lazy. Then, I launched neovim with NVIM_APPNAME=lazy nvim. So instead of using my default configuration from ~/.config/nvim, the lazyvim configuration was used. This allowed me to use a neovim instance that was unaffected by my configuration files. I was able to preview the process of setting up syntax highlighting from the perspective of a lazyvim user. Similarly, the process can be done with an empty folder to mimic a brand new neovim installation The point is, configuration files can impact reproducibility in neovim. However, this feature allows us to bypass our individual configurations, and create reproducible guidelines.

Conclusion

In conclusion, most of my work involved writing queries for the pyrope tree-sitter grammar. This was for the purpose of syntax highlighting in neovim. However, an important part of any open source project is communicating the results and providing documentation. The NVIM_APPNAME feature helps view neovim from the perspective of different users, which helps for writing useful documentation.

Midpoint Blog Interactive Exploration of High-dimensional Datasets with PolyPhy and Polyglot

Thu, 03 Aug 2023 00:00:00 +0000

The last few months of my GSoC project have been very exciting and I hope to share why with you here in this blog post! To briefly summarize, my project has been focused on further developing the Polyglot app, a tool for visualizing 3D language embeddings. One important part of Polyglot is its utilization of the novel MCPM metric, where points are colored according to their MCPM similarity to a user-chosen “anchor point” (e.g., if “hat” is our anchor point, then similar words like “cap” or “fedora” will be colored more prominently).

The first issue we wanted to tackle was actually navigating the point cloud. With hundreds of thousands of points, it can be difficult to find what you’re looking for! Thus, the first few features added were a search bar for points and anchor points and a “jump to point” feature which changes a user’s center of rotation and “jumps” to a chosen point. There were a few hiccups with implementing these features, mainly due to the large number of points and the particular quirks of the graphics library Polyglot uses. In the end though, these simple features made it feel a lot easier to use Polyglot.

The next set of features related to our desire to actually annotate the point cloud. Similar to how one might annotate a Google doc (ie., highlight a chunk of text and leave a comment), we wanted to set up something similar, but with points! Indeed, this led to the development of a cool brush tool for coloring points, named and commented annotations (up to 5), a search bar within annotations, and finally a button to export annotations and comments to a CSV.

The next few weeks are looking bright as we strive to finish the PolyPhy-Polyglot pipeline (a notebook for quickly formatting MCPM data from PolyPhy and getting it into Polyglot). We also hope to add a unique “timeline” feature in which users can analyze sections of the point cloud based on the associated time of each point. Overall, it’s been a very stimulating summer and I’m excited to push this project even further!

Midterm: High Fidelity UAV Simulation Using Unreal Engine with specular reflections

Wed, 02 Aug 2023 00:00:00 +0000

As part of the Open Source Autonomous Vehicle Controller my proposal under the mentorship of Aaron Hunter and Carlos Espinosa aims to Develop a Unreal Engine based simulator for testing. The simulator will be using Unreal Engine for the physics and visualization.

What we have done so far

We found that we can use Unreal Engine as a physics simulator and co-simulate with Simulink using the tools provided by MathWorks.
Simulated a example provided by MathWorks but i wasn’t getting the expected behaviour and there were very few resource available.
So we decided with using Gazebo and ROS for simulation instead of Unreal Engine and Simulink for the example of a balancing bot which had been designed in Solidworks.
For using Gazebo, i had converted the Solidworks model into an URDF and imported it into Gazebo.

Future Work

Currently, i am working on using Gazebo and ROS for controling a balancing bot using a PID control algorithm. Afterwards document the process of import a model into Gazebo for testing a control algorithm.

ScaleBugs: Reproducible Scalability Bugs

Wed, 02 Aug 2023 00:00:00 +0000

Introduction

As part of the Scalebugs Project, we have worked on building a dataset of reproducible scalability bugs. To achieve this, we go through existing bug reports for popular distributed systems, which include Cassandra, HDFS, Ignite, and Kafka. Workloads are designed to reproduce these scalability bugs by triggering some functionalities of the system under different configurations (e.g., different numbers of nodes), for which we will observe the impact on performance.

So far we have worked on packaging the buggy and fixed versions of scalability systems, a runtime environment that ensures reproducibility, and the workloads used to trigger the symptoms of the bug inside docker containers. By packaging these versions together, we are simplifying the process of deployment and testing. This enables us to switch between different versions efficiently, aiding in the identification and comparison of the bug’s behavior. For each scalability system, we have carefully built a runtime environment that is consistent and reproducible. This approach ensures that each time we run tests or investigations, the conditions remain identical.

New Terms

In order to make sense of the various bug reports, we had to learn some terminologies associated with scalability systems:

Clusters: Clusters are groups of related or connected items, often found in various fields such as computer science, data analysis, or even social sciences. For example, in data analysis, clusters might represent groups of data points with similar characteristics, making it easier to understand patterns or trends in the data.

Cluster Membership: Cluster membership refers to the process of determining which items or entities belong to a particular cluster. This task can be done based on various criteria, such as similarity in attributes, spatial proximity, or shared characteristics.

Locks: In computer programming, locks are mechanisms used to manage access to shared resources, such as files, data structures, or hardware devices. When multiple processes or threads need to access a shared resource simultaneously, locks ensure that only one process or thread can access it at a time, preventing data corruption or conflicts.

Lock Contentions: Lock contention occurs when multiple processes or threads attempt to acquire the same lock simultaneously. When this happens, one process or thread must wait until the lock becomes available, leading to potential delays and reduced performance.

Critical Paths: In project management or process analysis, a critical path is the longest chain of dependent tasks that determines the overall duration of the project or process. Any delay in tasks along the critical path will directly impact the project’s completion time.

Tokens: Tokens can have various meanings depending on the context. In computer programming, tokens are the smallest units of source code recognized by a compiler or interpreter. In cryptography, tokens can represent digital certificates or authentication data used for secure communication.

Nodes: In the context of network theory or graph theory, nodes are individual points or entities that form a network or graph. In a computer network, nodes can be devices like computers or routers, and in a social network, nodes can represent individuals or entities.

Peers: Peers are entities within a network that have the same status or capabilities. In peer-to-peer networks, each node can act as both a client and a server, enabling direct communication between nodes without relying on a central server.

Gossipers, Gossip Protocol: In distributed systems, gossipers are nodes that share information with each other using the gossip protocol. The gossip protocol involves randomly selecting peers and exchanging information in a decentralized manner, allowing information to spread quickly across the network.

Threads: Threads are the smallest units of execution within a process in computer programming. Multiple threads can run concurrently within a single process, enabling multitasking and parallel processing. Threads can share the same resources within the process, making them more lightweight than separate processes. However, proper synchronization is essential to prevent data corruption or conflicts when multiple threads access shared resources.

Flush and Writes Contention: This refers to a situation where simultaneous operations involving data flushing (saving data to a storage medium) and data writing (updating or adding data) are causing conflicts or delays. This contention can arise when multiple processes or threads attempt to perform these operations concurrently, leading to performance bottlenecks or potential data integrity issues.

Accomplishments

We have been able to build docker containers for the following scalability bugs:

IGNITE 12087

This bug stems from the resolution of the IGNITE-5227 issue (another bug), which has led to a significant decline in the performance of a particular operation. Prior to addressing IGNITE-5227, the insertion of 30,000 entries displayed remarkable efficiency, completing in roughly 1 second. However, post the resolution, executing the same insertion process for 30,000 entries witnessed a considerable slowdown, taking approximately 130 seconds – a performance degradation of nearly 100 times.

CASSANDRA 14660

This bug is related to how clusters work together and how a lock is causing conflicts with the critical path. The issue arises from a method call that uses O(Peers * Tokens) resources while contending for a lock, which is causing problems in the write path. The lock is used to protect cached tokens that are essential for determining the correct replicas. The lock is implemented as a synchronized block in the TokenMetadata class.

How was this fixed?

It was fixed by reducing the complexity of the operation to O(Peers) taking advantage of some properties of the token list and the data structure.

CASSANDRA 12281

This bug is also related to how clusters work together and a lock conflict. The issue arises when a specific method is trying to access a lot of resources (O(Tokens^2)) while contending for a read lock. As reported, a cluster with around 300 nodes has around 300 * 256 (assuming the default number of tokens) tokens, thus joining a new member reportedly is taking more than 30 mins. This happens because due to the long execution time here, this lock makes every gossip message delayed, so the node never becomes active.

How was this fixed?

The granularity of the lock is decreased, meaning that the expensive function calls now do not take the problematic read lock and simply use a synchronized block, synchronizing on a specific field, that does the job much better.

HA16850

This is a bug related to obtaining thread information in the JvmMetrics package. When obtaining thread information, the original buggy version used MXBeans to obtain thread information. The call uses an underlying native implementation that holds a lock on threads, preventing thread termination or creation. This means that the more threads that we have to obtain information for, the longer the function call will hold a lock. The result is that the execution time scales on the number of active threads O(threads).

How was this fixed?

Developers utilized a ThreadGroup to keep track of obtaining metrics for threads. The result is that there is no lock held for every thread.

CA13923

This issue revolves around conflicts between the “flush” and “writes” processes. The main problem is that during the “flush” process, a resource-intensive function called “getAddressRanges” is invoked. This function has a high computational cost and its complexity is O(Tokens^2). In other words, the time it takes to complete this function grows quickly as the number of “tokens” increases. This situation is causing challenges and delays in the overall process.

How was this fixed?

This function call affected many paths and they made sure no one calls getAddressRanges in critical paths.

Challenges

Demanding Memory Requirements: Running certain builds consumes a significant amount of memory. This places a strain on system resources and can impact the overall performance and stability of the process.

Little Issues Impacting Execution: Often, seemingly minor details can obstruct the successful execution of a build. Resolving such issues requires thorough investigation and extensive research into similar problems faced by others in the past.

Complexities of Scalability Bugs: Identifying the underlying causes of scalability-related bugs is intricate. These bugs exhibit unique characteristics that can complicate the process of pinpointing and comprehending their root origins.

What is Docker? ( For those who don’t know about it )

Docker is a platform that facilitates the containerization of applications, leading to consistent and efficient deployment across diverse environments. Its benefits include portability, resource efficiency, isolation, and rapid development cycles. DockerHub complements Docker by providing a centralized hub for sharing and accessing container images, fostering collaboration and ease of use within the Docker ecosystem.

More about docker https://docs.docker.com/get-started/overview/

Midterm: Open Source Autonomous Vehicle Controller

Tue, 01 Aug 2023 00:00:00 +0000

As part of the Open Source Autonomous Vehicle Controller Project my proposal under the mentorship of Aaron Hunter and Carlos Espinosa aimed to create comprehensive technical documentation to help onboard new users of the OSAVC controller.

I have accomplished the following:

From the KiCad Schematic Editor, created pinouts of the I/O connectors on the OSAVC.
Detailed a hardware overview of the OSAVC by labeling and describing each electrical component.
Documented the setup for loading code on the OSAVC, including software such as Git, MPLAB X, XC32 Compiler, and serial terminal and hardware by showing how to connect the PICKit3 and OSAVC to a PC.
Tested the OSAVC by receiving and transmitting characters in the serial port into a buffer.
Fixed bugs/errors in the NEO_M8N GPS module library and PWM motors library.
Created a new library for the uni and bidirectional ESC brushless motors.
Created a user-interfaced test harness for all peripherals: serial, IMU, GPS, encoder, PWM actuators, radio telemetry, Mavlink heartbeat, radio controller, and LIDAR.
Incorporated new user interface element and fixed video streaming errors in the Flask app running on the Raspberry Pi 4 communicating with the OSAVC.
Documented both software and hardware steps to run the OSAVC with a companion computer such as a Raspberry Pi 4.
Highlighted common problems encountered with the OSAVC.
Created a contributor’s guide for others to create new libraries or contribute to the OSAVC project.
Designed a switching voltage regulator in SOLIDWORKS
Designed a self balancing bot that employs the OSAVC in SOLIDWORKS

Future Work

Currently, the laser cutter at UCSC is in maintenance, so we couldn’t assemble the self balancing bot yet. Once we assemble it, I will finish and document the control algorithms. We can also try incorporating ML models on the Raspberry Pi with the Coral USB accelerator on the self balancing bot.

Implemented IO uring for Key-Value Drives

Mon, 31 Jul 2023 00:00:00 +0000

Hi everyone!

I’m Manank Patel, (link to my Introduction post) and am currently working on Efficient Communication with Key/Value Storage Devices. The goal of the project was to leverage the capabilities of io_uring and implement a new backend driver.

In the existing sockets backend, we use non-blocking sockets with looping to ensure all the data is written. Here is a simplified flow diagram for the same. The reasoning behind using non blocking sockets and TCP_NODELAY is to get proper network utilization. This snippet from the code explains it further.


NODELAY means that segments are always sent as soon as possible,
even if there is only a small amount of data. When not set,
data is buffered until there is a sufficient amount to send out,
thereby avoiding the frequent sending of small packets, which
results in poor utilization of the network. This option is
overridden by TCP_CORK; however, setting this option forces
an explicit flush of pending output, even if TCP_CORK is
currently set.

In the above figure, we have a loop with a writev call, and we check the return value and if all the data has not been written, then we modify the offsets and then loop again, otherwise, if all the data has been written, we exit the loop and return from the function. Now this works well with traditional sockets, as we get the return value from the writev call as soon as it returns. In case of io_uring, if we try to follow the same design, we get the following flow diagram.

Here, as you can see, there are many additional steps/overhead if we want to check the return value before sending the next writev, as we need to know how many bytes has been written till now to change the offsets and issue the next request accordingly. Thus, in every iteration of the loop we need to to get an sqe, prep it for writev, then submit it, and then get a CQE, and then wait for the CQE to get the return value of writev call.

The alternate approach would be to write the full message/iovec atomically in one call, as shown in following diagram.

However, on trying this method, and running fio tests, we noticed that it worked well with smaller block sizes, like 16k, 32k and 64k, but was failing constantly with larger block sizes like 512k or 1m. This was because it was not able to write all the data to the socket in one go. This method showed good results as compared to sockets backend (for small BS i.e). We tried to increase the send/recv buffers to 1MiB-10MiB but it still struggled with larger blocksizes.

Going forward, we discussed a few ideas to understand the performance trade-offs. One is to use a static variable and increment it on every loop iteration, in this way we can find out if that is really the contirbuting factor to our problem. Another idea is to break down the message in small chunks, say 256k and and set up io uring with sqe polling and then link and submit those requests in loop, without calling io_uring_submit and waiting for CQE. The plan is to try these ideas, discuss and come up with new ideas on how we can leverage io_uring for ktli backend.

PDC Midterm Evaluation

Sun, 30 Jul 2023 00:00:00 +0000

Mid-Term Evaluation Update

Hello! I’m Nick, a GSoC contributor for the Proactive Data Containers (PDC) Project. Over the past few weeks I’ve worked on verifying the functionality of the Python API for the PDC project and ensuring the smooth onboarding for new users of the data containers.

I began by documenting the installation of the Ubuntu virtual machine in order to run the PDC repository, since the project wasn’t initially supported on Apple silicon hardware. The installation notes that I recorded for PDC would help contribute towards a more refined and precise process that can be seen updated on the github webpage.

After installing the dependencies of the project onto the VM, I would begin maintaining the existing Python API and making changes that would allow the tests to compile and run successfully. The manual setup had a few problems with file directories paths that prevented the installation of a few files on new devices, which I fixed by manually by linking the path and removing a few header files. However, this proved to only be a temporary fix as the prior issues was evidence of a hardcoded path, which was resolved by some alteration and fishing in the source code.

Now the PDC and PDCpy installations should go smoothly regardless of what OS is being used, and the instruction documentation can be found from the github page which should allow any user to access the data containers.

Building extensions between Python libraries for Biotechnology laboratories

Fri, 28 Jul 2023 00:00:00 +0000

Hello again! This is Luiza, a GSoC contributor for the LabOp Project. My task is to build bridges between programming languages for Biotechnology Laboratory automation.

When talking about life sciences, reproducibility is a issue amongst most research centers. Biotechnology focused laboratories usually have their own protocols developed in house for their own applications. Researchers rely on such protocols to perform their experiments and collect data but when it comes to sharing those protocols and performing them in different laboratories many difficulties arise. Whether it is by lack of equipment, reagents or even by having different orders of execution, replicating a protocol in another laboratory is a challenge. To address this issue LabOp was developed to represent a protocol and convert it in many ways possible, so it can be executed by humans and by machines.

PylabRobot and PyHamilton also come to the picture as such libraries exist to make it possible to write protocols for Hamilton robots(and Tecan machines as well for PylabRobot) but those libraries share the limitation of being able to only represent laboratory protocols at their lower levels, with the user having to write every single command in Python for the protocol to be executed. Thus I’m currently developing an extension for LabOp protocols to be converted into PylabRobot/PyHamilton scripts. This way the researcher writing the protocol can do it in a friendlier fashion, using human-friendly terms to write protocols for robot execution.

BehaviourSpecialization for Liquid Handling class

The first step is building a correspondence spreadsheet with a hello world protocol written in both languages (LabOp | PylabRobot ). This way we can make an equivalence between the functions, parameters and default commands of both Libraries, as well as their structure. This spreadsheet will serve as guidance for the conversion of the Liquid handling steps from their representation in LabOp to their representation in Pylabrobot.

The second step is to create a file that’ll do execute the conversion. In this file I will define a Labware map that’s basically a dictionary translating the resources LabOp names into Labware IDs recognizable by PylabRobots “resource” classes and a Behaviourspecialization class that should convert LabOp actions into PylabRobots Liquid Handler class operations as they’ll coordinate the commands sent from the script to the machines.(see featured images)

Dictionary for LabOp to Pylabrobot container correspondence

Then we move to the protocol that will be tested on the Hamilton Machines, this is a Plasmid purification protocol that is usually performed by a human at a very lower level, one sample at a time. This limitation is not present on Hamilton robots as they can handle many samples at the same time with only one protocol execution. The robot that will be running this protocol has two modules that are not yet present in PylabRobot’s extensions, a pressure pump module and a on deck heatershaker. I’ll be implemmenting this modules in PylabRobot based on their default commands present in PyHamilton and run the protocol on a Hamilton Starlet unit.

The steps of the protocol have been decoupled to facilitate the pilot testing, they are as follows:

Liquid handling - GOOD TO GO
Pressure pump module- requires adjustments
plate grippers(necessary to move the plasmid plate from one module to another) - requires adjustment
On deck heaterShaker- GOOD TO GO

The first pilot tests of the protocol will be run with water instead of plasmid to verify that all the steps are going smoothly, when that’s out of the way we will perform the protocol with dirty plasmids that require purification (which is what the protocol is for). The measurements for success will be sequencing the plasmid (if possible), performing a gel eletrophoresis and measuring absorbance of the DNA.

The goal of this tests is to gather data from the efectiveness of the protocol and its execution on the machine, thus confirming that it is in fact a useful mechanism for DNA purification.

PolyPhy Infrastructure Enhancement

Thu, 27 Jul 2023 00:00:00 +0000

As part of the Polyphy Project, my proposal was aimed at improving various aspects of the project, including CI/CD workflows, encapsulation, and security. Under the mentorship of Oskar Elek, I have made significant progress in the following areas:

Fixed GitHub CI Workflows and Release to PyPI: During the first phase, I focused on refining the GitHub CI workflows by implementing new flows that facilitate seamless releases to PyPI. This ensures that the project can be easily distributed and installed by users, making it more accessible and user-friendly.
Encapsulation from Jupyter into Module: I successfully encapsulated the code from Jupyter notebooks into a module. This step is crucial as it prepares the codebase to be released as a standalone module, making it easier for developers to use and integrate into their own projects.
SonarCloud Integration for Better Code Analysis: To ensure the codebase’s quality, I set up SonarCloud to perform comprehensive code analysis. This helps in identifying potential issues, bugs, and areas of improvement, leading to a more robust and reliable project.
Migration to Docker from Tox: In order to improve the containerization process, I replaced the existing solution, Tox, with Docker. Docker provides better container management and ensures a consistent development and deployment environment across different platforms.
Research on Community Platforms for Self-Hosting: I conducted extensive research on various community platforms suitable for self-hosting. This will enable the project to establish a thriving community and foster active collaboration among users and contributors.
Enhanced Security Measures: I implemented several security improvements to safeguard the project and its users. These include setting up a comprehensive security policy, implementing secret scanning to prevent unintentional exposure of sensitive information, code scanning to identify potential vulnerabilities, private vulnerability reporting to handle security issues responsibly, and Dependabot integration for monitoring and managing dependencies.
Upgraded Taichi to Utilize Class-Based Features: As part of the project’s development, I successfully upgraded Taichi to utilize class-based features available, thereby enhancing the codebase’s organization and maintainability.

Moving forward, I plan to continue working diligently to achieve the goals outlined in my proposal. The improvements made during the first half of the GSoC program have laid a strong foundation for the project’s growth and success.

Stay tuned for further updates and exciting developments as the project progresses!

Uncovering Actionable Insights using ReadTheDocs Analytics

Thu, 27 Jul 2023 00:00:00 +0000

Introduction

Hello again! This is Jack, a GSoC contributor for the OpenROAD Project. My task is to update and optimise the documentation to encourage user adoption and engagement.

For open-source repo maintainers, readthedocs is a godsend. One of its more underrated features are in providing search and traffic analytics of up to 90 days for the Community tier users. This is awesome, because ReadTheDocs is “always free for open source and community projects”.

Motivation

Why are analytics important?

Analytics are great as a proxy indicator for documentation engagement. For instance, traffic to a page, could highlight how popular the tool is, or it could also mean the tool is unclear and therefore people might need more visits to the page to further understand usage. But overall, it still indicates that the page needs to be taken care of due to the increased visits.

In what follows we aim to provide a quick tutorial as well as list out some of the actionable insights we uncovered in the OpenROAD/OpenROAD-flow-scripts documentation project.

Preamble

To download the analytics raw csv files, refer to this website.

You should also have the following packages installed: pandas, numpy, matplotlib, scipy.

Traffic Analytics

Traffic analytics are easy to understand. It comes in the format Date, Version, Path, DailyViews as follows:

df = pd.read_csv('ta_or.csv')[::-1].reset_index(drop=True)
df.Date = df.Date.apply(lambda x: x.split()[0])
df.head()

Figure 1: Loading traffic analytics dataframe

The raw data is not all that informative. Let us aggregate the data to obtain the weekly views.

weeklydf = df.copy()
weeklydf.Date = pd.to_datetime(weeklydf.Date) - pd.to_timedelta(7, unit='d')
weeklydf = weeklydf.groupby(['Path', pd.Grouper(key='Date', freq='W')])['Views']\
 .sum()\
 .reset_index()\
 .sort_values('Date')
weeklydf[weeklydf.Path == '/index.html']

Figure 2: Aggregated weekly traffic

Note that we can replace the page path with any interesting page path we desire. A useful command to obtain all possible page paths in this dataset is to use:

weeklydf.Path.unique()

Figure 3: Unique paths in dataset

With these neat data in our arsenal, let us do some plotting! For the visualisation, we have chosen to use the traffic aggregated on a daily scale. On top of this, we also plot a linear best-fit line of all the points to track the trendline over time.

The code below shows how to plot the top 20 pages.

def plot_views(df, numPages = 20):
 # Groupby Path, sum views
 pathResults = df.groupby('Path').Views.sum().sort_values(ascending=False)
 fig, ax = plt.subplots(numPages, figsize = (15,30))
 fig.tight_layout()

 for i in range(numPages):
 key = pathResults.index[i]
 temp = df[df.Path == key]
 ax[i].scatter(temp.Date, temp.Views)
 ax[i].set_xticks(np.arange(0,90, 7)) # this line is to not clutter the x-axis too much.
 ax[i].set_ylabel('Views')
 ax[i].set_title(key)

 # linear regression
 x, y = temp.Date, temp.Views
 bestfit = stats.linregress(range(len(y)),y)
 print(bestfit)
 equation = str(round(bestfit[0],2)) + "x + " + str(round(bestfit[1],2))
 ax[i].plot(range(len(y)), np.poly1d(np.polyfit(range(len(y)), y, 1))(range(len(y))), '--',label=equation)
 ax[i].legend(loc='upper right')

Figure 4: Top 20 pages by daily view counts (in descending order)

Also, we can aggregate the total views by day to plot daily traffic:

def plot_daily_traffic(df):
 # Groupby Date, sum views
 fig = plt.figure(figsize = (15,10))
 dateResults = df.groupby('Date').Views.sum()
 x, y = dateResults.index, dateResults.values
 plt.scatter(x, y)
 plt.xticks(np.arange(0,90, 7))
 plt.ylabel('Views')
 plt.title('Traffic by Day')

 # linear regression
 bestfit = stats.linregress(range(len(y)),y)
 print(bestfit)
 equation = str(round(bestfit[0],2)) + "x + " + str(round(bestfit[1],2))
 plt.plot(range(len(y)), np.poly1d(np.polyfit(range(len(y)), y, 1))(range(len(y))), '--',label=equation)
 plt.legend(loc='upper right')

Figure 5: Daily aggregated traffic

Key Trends:

Notice how there seems to be a cyclical pattern every week - rise in average view counts during Mon-Fri, then a falloff on weekends. This is most evident in the pages /index.html, /main/README.html. This could be attributed to the standard work or study week of Mon-Fri.
According to the gradient of the best-fit line for Figure 2, there seems to be a slow decline of traffic for the OpenROAD docs. For a gradient of -0.77, it translates roughly to decline of 22 views per month. The small decline could be attributed to the higher traffic from 19-29 March 2023, the dates for the OpenROAD 7nm design contest. Contest are always good for driving traffic.

Actionable insights:

Top pages are usually landing pages: index.html, main/README.html, main/src/README.html. We thus prioritised making these pages more readable and concise.
This is followed by tutorial /tutorials/index.html and /search.html. The prominence of the tutorials page made us shift the tutorials link to a higher position on the left navigation sidebar. Search tips were also included to obtain better search results. More about search in the next section.
Next, as OpenROAD consists of 20 tools: traffic analytics helps us come up with an order to update: ifp, gui, odb, ppl, sta, grt, mpl, gpl, rsz, rcx. pdn, cts, psm

Search Analytics

Search analytics come in the form of: Date, Query, TotalResults. Contrary to traffic analytics, TotalResults do not refer to search count for the query that day, but rather it corresponds to the total results returned by that query on that day. Separate aggregation still needs to be done to obtain the final count.

Firstly, let us load the dataset and perform a groupby on the column Date to obtain the daily count aggregates.

df = pd.read_csv('sa_or.csv')[::-1].reset_index(drop=True)
df = df.rename(columns ={'Created Date': 'Date', 'Total Results': 'TotalResults'})
df.Date = df.Date.apply(lambda x: x.split()[0])

dateResults = df.groupby('Date').TotalResults.count()
dateResults

Figure 6: Code output for daily aggregated search counts.

Now we are ready to plot the daily aggregated searches. This represents the number of times a search was performed on the documentation website.

def plot_daily_searches(df):
 dateResults = df.groupby('Date').TotalResults.count()
 x, y = dateResults.index, dateResults.values
 plt.scatter(x, y)
 plt.xticks(np.arange(0,90, 7))
 plt.ylabel('# Times Searched')
 plt.title('Search count by day')

 # linear regression
 bestfit = stats.linregress(range(len(y)),y)
 print(bestfit)
 equation = str(round(bestfit[0],2)) + "x + " + str(round(bestfit[1],2))
 plt.plot(range(len(y)), np.poly1d(np.polyfit(range(len(y)), y, 1))(range(len(y))), '--',label=equation)
 plt.legend(loc='upper right')

Figure 7: Daily aggregated search counts

We can also do an additional plot for queries that return zero results. In other words, we are interested in the terms people are curious about; but is not covered by our documentation currently. Think of it as an on-site search engine optimisation.

zeroResults = df[df.TotalResults == 0]
zeroResults = zeroResults.groupby('Query').Date.count().sort_values(ascending=False)
print('\nAll 0 results queries (desc)\n')
print(zeroResults.index.tolist())

Example output as follows:

['autotuner', 'tdms', '*macro*', 'rtlmp_max_inst', 'get_property',
'check_setup', 'centos', 'initialize_padring', 'core_utilization',
'pin_access', 'read_libraries', 'config', 'eco', 'rpt',
'improve_placement', 'define_process_corner', 'global_place',
'report_worst_slack', 'max_phi_cof', 'report_power', 'get_pins',
'registerfile', 'set_global_routing', 'prebuilt', 'env',
'repair_clock_inverters', 'set_thread_count', 'report_',
'partition_design', 'place_cell', 'blockage', 'partitionmgr',
'nmos', 'tuner', 'write_sdf', 'place_density', 'place_pins_args',
'size_cell', '*macor*', 'repair_clock_inverter', 'misk',
'readhaty', 'readhat', 'obstruct', 'odbpy', 'openpdn', 'openram',
'placement_cfg', 'read_macro_placement', 'output_drc', 'positon',
'pct', 'qrctechtable', 'qrctechfile', 'qrctech', 'qrc',
'properly covered', 'precision innovations', 'repeater', '"rcx-0487"',
'report_worst', 'report_area', 'report_clock_properties', 'skywater',
'study', 'sv', 'synth', 'synth_hierarchical', 'systemverilog',
'tdm', 'tdms_place', 'triton', 'ungroup', 'verilog_files',
'wrc', 'write_lef', 'write_partition_verilog', 'שואם',
'si2', 'sever', 'setrc', 'rtl_macro', 'report_dcalc', 'report_design',
'report_design_info', 'report_instance', 'report_slews', 'resize',
'rtlmp', 'set_power_activity', 'rtree', 'run_all', 'run_all.tcl',
'sc', 'set_all_input_output_delays', 'set_io_pin_constraints', 'metis',
'lefdef', 'make_result_file', 'macro_placement_cfg', 'clock__details',
'clocks__details', 'combinational', 'config.mk', 'coord',
'core_margin', 'db_process_node', 'dbblocjs', 'dbdatabase',
'dbr', 'dbrt', 'dbrttree', 'debian', 'define_pin_shape',
'densiy', 'desgin', 'diff_file', 'clk_period', 'clk_io_ptc',
'cdl', 'analog', './env.sh', '178', '6_final',
'6_final.odb', '_placement', 'abat', 'add_stripe', 'arch',
'ccs', 'binaries', 'bookshelf', 'buff_cell', 'buildwithdocker',
'busbitchars', 'buschar', 'captable', 'directoryobject',
'disallow_one_site_gaps', 'distribute', 'is_port', 'hierarch',
'hop', 'hyper', 'initialie_flooorplan', 'initialize_flooorplan',
'instance_count', 'is_chip', 'lean', 'gui_final', 'lec',
'*def*', 'limitation', 'lyp', 'maco', 'macro_pin',
'macro_place', 'harness', 'gui.py', 'dont', 'fill_cell',
'dreamplace', 'em', 'enable_dpo', 'energy', 'env.sh', 'erc',
'export', 'findmaste', 'grt_layer_adjustments', 'findmaster',
'freepdk45', 'gdt', 'global_', 'global_place_db',
'global_placementy', 'graph', '갲']

For our case we can roughly the problem with these zero-result queries fall under one of these categories:

Missing documentation: Either the parameter of functionality
Typo: User has the right keyword, but did not type it correctly. We will therefore provide them with search tips such as using fuzziness ~N operator for better matches.

Future Work

ReadTheDocs could also be linked with Google Analytics, but this remains for more advanced users.

Another rich source of information helpful to open-source maintainers are GitHub issues. These are the direct platform where users discuss their problems. Another great way to track documentation engagement is to use metrics such as: installation issues per unit week, or user-issue retention rate, which tracks the number of users that continue to file issues after their first.

Conclusion

This post showcases the amount of insight one can gather from parsing traffic and search analytics. It also provides useful Python functions that can be applied to the analytics dataset for fast prototyping and experimentation. If you are a contributor to open-source projects, try uncovering some insights for your doc pages today!

Highlighting and Formatting Pyrope HDL

Thu, 22 Jun 2023 00:00:00 +0000

As part of Micro Architecture Santa Cruz (MASC) my proposal under the mentorship of Jose Renau aims to develop syntax highlighting and a vertical alignment tool for Pyrope. Pyrope is a modern hardware description language under development by MASC. Code is parsed with the tree-sitter grammar for Pyrope. I am working on developing a query file for the nvim-treesitter plugin. This gives neovim users Pyrope syntax highlighting based on the parse tree. In addition to syntax highlighting, I am working on a vertical alignment tool to improve code readability. These features will improve the usability and convenience of Pyrope.

Proactive Data Containers

Tue, 20 Jun 2023 00:00:00 +0000

As part of the Proactive Data Containers (PDC) my proposal under the mentorship of Houjun Tang aims to novel data abstraction for managing science data in an object-oriented manner. PDC’s will provide efficient strategies for moving data in deep storage hierarchies and techniques for transforming and reorganizing data based on application requirements. The functionality of the container object themselves are already well developed, so my goal will be to verify the functionality tests regarding the Python API to ensure that it can be used with ease, as well as create command line tools so that it is a complete data object that can be used across platforms and is simple and helpful for the users.

Interactive Exploration of High-dimensional Datasets with PolyPhy and Polyglot

Fri, 16 Jun 2023 00:00:00 +0000

Hello! My name is Kiran and this summer I’ll be working with Polyphy and Polyglot under the mentorship of Oskar Elek. The full proposal is available online.

For a brief overview, the Polyglot app allows users to interact with a 3D network of high-dimensional language embeddings, specfically the Gensim Continuous Skipgram result of Wikipedia Dump of February 2017 (296630 words) dataset. The high-dimensional embeddings are reduced to 3 dimensions using UMAP. The novel MCPM slime mode metric is then used to compute the similarty levels between points (much like how you might compute the Euclidean distance between two points). These similarity levels are used to filter the network and enable users to find interesting patterns in their data they might not find using quantitative methods alone. For example, the network has a distinct branch in which only years are nearby! Users might find other clusters, such as ones with sports words or even software engineering words. Although such exploration may not lead to quantitatively significant conclusions, the ability to explore and test mini hypotheses about the data can lead to important insights that go on to incite quantitatively significant conclusions.

In our project, we aim to expand Polyglot such that any user can upload their own data, once they have computed the MCPM metric using PolyPhy. This will have important applications in building trust in our data and embeddings. This could also help with research on the MCPM metric, which presents a new, more naturalistic way of computing similarity by relying on the principle of least effort. Overall, there is an exciting summer ahead and if you’re interested in keeping up please feel free to check out the Polyglot app on Github!

Optimizing FasTensor: Enabling Efficient Tensor Execution on GPUs

Mon, 05 Jun 2023 00:00:00 +0000

Greetings,

I am Rishabh Singh, and I am excited to be part of the 2023 Google Summer of code program. My proposal under the mentorship of John Wu and Bin Dong focuses on optimizing the FasTensor tensor computing library for efficient usage on GPUs, specifically targeting tensor contraction while preserving structure-locality. This optimization is crucial for scientific applications and advanced AI model training. Throughout the project, I will develop custom computational operations for GPUs, implement FasTensor on GPUs, assess its performance, and provide comprehensive documentation. By the end, I aim to deliver a working implementation, a performance report, and a detailed execution mechanism guide. Leveraging my background in software engineering and machine learning, I will utilize languages like C++ and OpenMP to ensure efficient memory management and data movement. Stay tuned for regular updates and informative blogs as I progress through the summer.

ScaleBugs: Reproducible Scalability Bugs

Thu, 01 Jun 2023 00:00:00 +0000

Hello! As part of the ScaleBugs project our proposals (proposal from Goodness Ayinmode and proposal from Zahra Nabila Maharani) under the mentorship under the mentorship of Cindy Rubio González,Haryadi S. Gunawi and Hao-Nan Zhu aims to build a dataset of reproducible scalability bugs by analyzing bug reports from popular distributed systems like Cassandra, HDFS, Ignite, and Kafka. For each bug report, we will analyze whether the reported bug is influenced by the scale of the operation, such as the number of nodes being used or a number of requests. The resulting dataset will consist of bug artifacts containing the buggy and fixed versions of the scalability system, a reproducible runtime environment, and workload shell scripts designed to demonstrate bug symptoms under different scales. These resources will help support research and development efforts in addressing scalability issues and optimizing system performance.

Intro: Open Source Autonomous Vehicle Controller

Tue, 30 May 2023 00:00:00 +0000

As part of the Open Source Autonomous Vehicle Controller Project my proposal under the mentorship of Aaron Hunter and Carlos Espinosa aims to create comprehensive technical documentation to help onboard new users of the OSAVC controller. I will be writing tutorials and examples to demonstrate how to start with an OSAVC, programming it with the robotic equivalent of HelloWorld and later moving onto more sophisticated explanations. Hence, this will encourage more applications and wider adoption in the field of autonomous vehicles and expand the community of OSAVC users.

Enhancing and Validating LiveHD's Power Modeling Flow

Mon, 29 May 2023 00:00:00 +0000

As part of the Enhancing and Validating LiveHD’s Power Modeling Flow my proposal under the mentorship of Jose Renau and Sakshi Garg aims to enhance and validate LiveHD’s power modeling flow, a critical feature for estimating power consumption in modern hardware designs. The existing flow requires further refinement to ensure its stability, accuracy, compatibility with a wider range of netlists and VCD files, and overall performance. To address these challenges, the project will focus on methodically debugging the current implementation, establishing a comprehensive validation methodology for verifying the accuracy of power estimates, and optimizing the flow to handle larger netlists and VCD files efficiently. Additionally, the project aims to improve existing documentation by providing detailed explanations, examples, and tutorials to facilitate user adoption and understanding. Upon successful completion, the project will deliver a more reliable, accurate, and efficient power modeling flow within LiveHD, contributing to the development of energy-efficient hardware designs. This refined flow will not only enhance the capabilities of LiveHD but also encourage wider adoption and utilization by the hardware design community, fostering innovation in the field of energy-efficient devices and systems.

High Fidelity UAV Simulation Using Unreal Engine with specular reflections

Mon, 29 May 2023 00:00:00 +0000

As part of the Open Source Autonomous Vehicle Controller my proposal under the mentorship of Aaron Hunter and Carlos Espinosa aims to Develop a unreal engine based simulator for testing. The simulator will be using unreal engine for the physics and visualization.

The existing framework uses gazebo simulator with ROS which limit the developement to only Python and C++ programing languages. I intend to develope this simulator with intention connecting it with Python and C++, additionaly expanding support to Matlab so that in future the control algorithm design and validation process becomes easier. To smoothen future developement, i intent to add detailed documentation consisting of the developement period weekly report, examples and tutorial. Upon succesful completion, the project will deliver a powerful simulator with realistic simulation using unreal engine and additional support other programming languages like matlab.

For more information about the Open Source Autonomous Vehicle Controller and the UC OSPO organization, you can visit the OSAVC project repository and the UC OSPO website.

OpenRAM Layout verses Schematic (LVS) visualization

Mon, 29 May 2023 00:00:00 +0000

As part of the OpenRAM Layout verses Schematic (LVS) visualization my proposal under the mentorship of Jesse Cirimelli-Low and Matthew Guthaus aims to develop a comprehensive Python-based graphical user interface (GUI) with a robust backend system to effectively analyze, visualize, and debug layout versus schematic (LVS) mismatches in the OpenRAM framework. The proposed solution focuses on efficiently processing LVS report files in JSON format, identifying mismatched nets in the layout, and visually representing extra nets in the schematic graph using advanced backend algorithms. By implementing a powerful backend system, the GUI will streamline the debugging process and improve overall productivity, while maintaining high performance and reliability. The deliverables for this project include a fully-functional GUI with a performant backend, features for visualizing and navigating through LVS mismatches, comprehensive documentation, and user guides.

Efficient Communication with Key/Value Storage Devices

Fri, 26 May 2023 00:00:00 +0000

Hi everyone!

I’m Manank Patel, and am currently an undergraduate student at Birla Institute of Technology and Sciences - Pilani, KK Birla Goa Campus. As part of the Efficient Communication with Key/Value Storage Devices my proposal under the mentorship of Aldrin Montana and Philip Kufeldt aims to implement io_uring based communication backend for network based key-value store.

io_uring offers a new kernel interface that can improve performance and avoid the overhead of system calls and zero copy network transmission capabilities. The KV store clients utilize traditional network sockets and POSIX APIs for their communication with the KV store. A notable advancement that has emerged in the past two years is the introduction of a new kernel interface known as io_uring, which can be utilized instead of the POSIX API. This fresh interface employs shared memory queues to facilitate communication between the kernel and user, enabling data transfer without the need for system calls and promoting zero copy transfer of data. By circumventing the overhead associated with system calls, this approach has the potential to enhance performance significantly.

Update OpenROAD Documentation and Tutorials

Fri, 26 May 2023 00:00:00 +0000

Hi! I am Jack, a Masters student at the National University of Singapore. In GSoC 2023, I will be undertaking the project entitled Update OpenROAD Documentation and Tutorials to improve the user experience and documentation of this exciting open-source RTL-to-GDSII framework, jointly mentored by Indira Iyer Almeida and Vitor Bandeira. Check out my proposal here!

This project aims to review and update missing documentation and tutorials in OpenROAD-flow-scripts. A key focus will be on increasing ease-of-setup by updating documentation, setup scripts and docker-based commands. Next, we will also update documentation for the following OpenROAD components: Makefile flow variable, distributed detailed routing, Hier-RTLMP, Autotuner. If time permits, cloud enablement will be implemented, alongside notebook-based packaging to further increase ease of adoption.

Advancing Reproducible Science through Open Source Laboratory Protocols as Software

Thu, 25 May 2023 00:00:00 +0000

Hello everyone!

My name is Luiza, I am an eighth-semester Bsc Biological Sciences student from São Paulo, Brazil. As part of the LabOp working group, my proposal under the mentorship of Dan Bryce and Tim Fallon aims to build a conversor that takes normal laboratory protocols and translates them into machine executable protocols. This is possible thanks to LabOP’s versatility to represent what a Laboratory protocol should look like. I´ll be testing this specialization in Hamilton machines that are great for experimenting scalling up.

Nowadays we face a very common issue between Biotechnology laboratories, that is that protocols are difficult to share and to adapt for machine execution. Laboratory protocols are critical to biological research and development, yet complicated to communicate and reproduce across projects, investigators, and organizations. While many attempts have been made to address this challenge, there is currently no available protocol representation that is unambiguous enough for precise interpretation and automation, yet simultaneously abstract enough to enable reuse and adaptation.

With LabOP we can take a protocol and convert it in multiple ways depending on the needs of the researcher for automation or human experimentation and allowing flexibility for execution and experimentation so I`ll be building a specialization that translates protocols in a way that they can be executed by Hamilton machines.

PolyPhy Infrastructure Enhancement

Thu, 25 May 2023 00:00:00 +0000

Hey!

I’m Prashant Jha, from Pune, a recent undergraduate student from BITS Pilani. As part of the Polyphy my proposal under the mentorship of Oskar Elek aims to develop and improve the current infrastructure.

Polyphorm / PolyPhy - which is led by Oskar Elek. PolyPhy is an organization that focuses on developing a GPU oriented agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data. With its roots in astronomy and inspiration drawn from nature, PolyPhy has been instrumental in discovering network-like patterns in natural language data and reconstructing the Cosmic web structure using its early prototype called Polyphorm. The organization aims to provide a richer 2D / 3D scalar field representation of the reconstructed network, making it a toolkit for a range of specialists across different disciplines, including astronomers, neuroscientists, data scientists, and artists. PolyPhy’s ultimate purpose is to create quantitatively comparable structural analytics and discover connections between different disciplines. To achieve its goals, PolyPhy requires a robust infrastructure that is engineered using DevOps, Code Refactoring, and Continuous Integration/Continuous Deployment (CI/CD) practices. You can see an instructive overview of PolyPhy in our workshop and more details about our research here.

Strengthening Underserved Segments of the Open Source Pipeline

Thu, 25 May 2023 00:00:00 +0000

Namaste everyone🙏🏻!

I’m Nandini Saagar, from Mumbai. An undergraduate student at the Indian Institute of Technology, Banaras Hindu University, IIT (BHU), Varanasi. As part of the Strengthening Underserved Segments of the Open Source Pipeline my proposal under the mentorship of Emily Lovell aims to strengthen the underserved segment of the open source pipeline.

My interest in Open Source was first piqued as a freshman when I was introduced to Open Source as a place where people from all communities and backgrounds come together to create software that can have real-world impact, that too in a completely autonomous and self-governed manner! I am so glad that I could transition from just a person who imagined Open Source to be a fair-eyed dream to being a part of multiple such communities. This journey has been life-defining for me, and that’s why I want to help deliver the message of Open Source to all teenagers!

This project seeks to invite and support broader, more diverse participation in open source by supporting early contributors, especially those who have been historically minoritized within tech. It will aim to create content that anyone with some Open Source experience can use to help and guide new students to the journey of OpenSource, GitHub, and all the relevant technologies, provide a medium and platform for all contributors to share their various OpenSource experiences and testimonials, conduct an Open Source Themed Hackathon/Scavenger Hunt, and leverage the power of social media engagement to get young and brilliant minds acquainted with the technical and open-source world at an early age.

Stay tuned to explore the enormous world of Open Source with me!

Open Source Autonomous Vehicle Controller

Wed, 24 May 2023 00:00:00 +0000

As part of the Open Source Autonomous Vehicle Controller Project my proposal under the mentorship of Aaron Hunter and Carlos Espinosa aims to Develop a tutorial that serves as a comprehensive guide for new users of the OSAVC controller. The tutorial will start from scratch, demonstrating how to initialize and program the controller using the equivalent of a “Hello, World!” program. Subsequently, it will progress to more advanced applications.

Throughout the project, I will work closely with my mentors to ensure the accuracy, clarity, and usability of the documentation. Their guidance and expertise will be instrumental in achieving the project’s objectives effectively.

By creating comprehensive technical documentation, this project aims to empower new users to harness the capabilities of the OSAVC controller. It will facilitate their understanding of the controller’s functionalities and enable them to leverage its potential in the field of autonomous vehicle applications.

I am excited to embark on this journey, contribute to the open-source community, and make a valuable impact in the field of autonomous vehicles. Stay tuned for regular updates and progress reports as I work towards achieving the goals set forth in this project.

For more information about the Open Source Autonomous Vehicle Controller and the UC OSPO organization, you can visit the OSAVC project repository and the UC OSPO website.

Stay connected and join me in this exciting endeavor!

OSRE Catalyst

Thu, 23 Mar 2023 00:00:00 +0000

Contributing to an open source project is a great way to build a technical portfolio, learn industry tools/practices, and have real-world impact – all while embedded in a collaborative community. The UC Santa Cruz Open Source Program Office (OSPO) wants to support more students on this path, especially those who have been minoritized in tech. We are partnering with an HBCU for a pilot summer program offering, with hopes to expand our reach in 2024.

Through a hybrid (in-person/remote) model, participating students will spend four weeks on the UCSC campus learning about open source, followed by four weeks remotely contributing to an open source project. Participants will be well-supported by our instructional team, as well as their small peer cohort, through community-building and mentorship spanning the full eight weeks.

Pilot Program Mentor & Developer

Topics: Education, Broadening Participation, Mentorship and Support, Community
Skills: communication, organization, GitHub/Markdown, basic web programming (HTML, CSS, JavaScript), open source contribution, version control/git workflow, mentorship, teaching
Difficulty: Novice to Intermediate
Size: Medium or Large (175 or 350 hours)
Mentors: Emily Lovell, James Davis

Given that this is a program pilot, your involvement and feedback will directly help shape its future!

Possible tasks:

Help cultivate a welcoming and supportive learning community
Support students in completing hands-on activities related to open source contribution (e.g. evaluating potential projects/communities, using git, setting up a development environment)
Develop technology-specific tutorials to introduce students to languages/libraries/etc. employed by their project
Offer mentorship around how to navigate documentation, large codebases, and contributor communities
Share your own input and perspective on what it’s like to be a newcomer to open source!

eBPF Monitoring Tools

Tue, 21 Feb 2023 00:00:00 +0000

eBPF is a technology that allows sandboxed programs to run in a priviledged context such as a Linux kernel. eBPF is for operating systems what Javascript is for web browsers: new functionality can be safely loaded without restarting or continually upgrading the operating system or browser and executed efficiently. eBPF is used to introduce new functionality into a running Linux kernel, including next-generation networking, observability, and security functionality. The following is just one idea of many possible.

Implement Darshan functionality as eBPF tool

Topics: performance, I/O, workload characterization
Difficulty: Medium
Size: Medium or large (175 or 350 hours)
Mentors: Tyler Reddy

Darshan is an HPC I/O characterization tool that collect statistics using a lightweight design that makes it suitable for full time deployment. Darshan is an interposer library that catches and counts IO requests (open, write, read, etc.) to a file/file system and it keeps the counters in buckets in data structure that can be queried. How many reads of small size, medium size, large size) for example are the types of things that are counted.

Having this be an interposer library requires users to link their application with this library. Having this function in epbf would make this same function transparent to users. Darshan has all the functions and could provide the list of functions to implement and the programmer could build and test these functions in ebpf on a linux machine. This could be a broadly available open tool that would be generally useful and but one of perhaps hundreds of examples of where ebpf based tools that could be in the open community for all to leverage.

Proactive Data Containers (PDC)

Sun, 12 Feb 2023 00:00:00 +0000

Proactive Data Containers (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning.

Command line and python interface to an object-centric data management system

Topics: Python, object-centric data management, PDC
Skills: Linux, C, Python
Difficulty: Medium
Size: Large (350 hours)
Mentors: Houjun Tang, Suren Byna

Proactive Data Containers (PDC) is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. This project includes developing and updating efficient and user friendly command line and Python interfaces for PDC.

OpenRAM

Wed, 08 Feb 2023 00:00:00 +0000

OpenRAM is an award winning open-source Python framework to create the layout, netlists, timing and power models, placement and routing models, and other views necessary to use SRAMs in ASIC design. OpenRAM supports integration in both commercial and open-source flows with both predictive and fabricable technologies. Most recently, it has created memories that are included on all of the eFabless/Google/Skywater MPW tape-outs.

Layout verses Schematic (LVS) visualization

Topics: VLSI Design Basics, Python
Skills: Python, VLSI, JSON
Difficulty: Easy/Medium
Size: Medium or Large (175 or 350 hours)
Mentors: Jesse Cirimelli-Low, Matthew Guthaus
Contributor(s): Mahnoor Ismail

Create a visualization interface to debug layout verses schematic mismatches in Magic layout editor. Results will be parsed from a JSON output of Netgen.

ScaleBugs: Reproducible Scalability Bugs

Tue, 07 Feb 2023 00:00:00 +0000

Scalable systems lay essential foundations of the modern information industry. HPC data centers tend to have hundreds to thousands of nodes in their clusters. The use of “extreme-scale” distributed systems has given birth to a new type of bug: scalability bugs. As its name suggests, scalability bugs may be presented depending on the scale of a run, and thus, symptoms may only be observable in large-scale deployments, but not in small or median deployments. For example, Cassandra-6127 is a scalability bug detected in the popular distributed database Cassandra. The scalability bug causes unnecessary CPU usage, however, the symptom is not observed unless ~1000 nodes are deployed. This demonstrates the main challenge of studying scalability bugs: it is extremely challenging to reproduce without deploying the system at a large scale.

In this project, our goal is to build a dataset of reproducible scalability bugs. To achieve this, we will go through the existing bug reports for popular distributed systems, which include Cassandra, HDFS, Ignite, and Kafka. For each bug report, we determine if the reported bug depends on the scale of the run, such as the number of nodes utilized. With the collected scale-dependent bugs, we then will craft the workload to reproduce those scalability bugs. Our workloads will be designed to trigger some functionalities of the system under different configurations (e.g., different numbers of nodes), for which we will observe the impact on performance. For example, a successful reproduction should be able to show the performance drop along with an increasing number of nodes.

Building a Dataset of Reproducible Scalability Bugs

Topics: Scalability systems, bug patterns, reproducibility, bug dataset
Skills: Linux Shell, Docker, Java, Python
Difficulty: Medium
Size: Large (350 hours)
Mentors: Cindy Rubio González, Haryadi S. Gunawi, Hao-Nan Zhu
Contributor(s): Goodness Ayinmode, Zahra Nabila Maharani

The student will build a dataset of reproducible scalability bugs. Each bug artifact in the dataset will contain (1) the buggy and fixed versions of the scalability system, (2) a runtime environment that ensures reproducibility, and (3) a workload shell script that could demonstrate the symptoms of the bug under different scales.

Specific Tasks

Work with the mentors to understand the context of the project.
Learn the background of scalability systems.
Inspect the bug reports from Apache JIRA and identify scale-dependent bugs.
Craft shell scripts to trigger the exact scalability bug described by the bug report.
Organize the reproducible scalability bugs and write documentation to build the code and trigger the bug.

Strengthening Underserved Segments of the Open Source Pipeline

Tue, 07 Feb 2023 00:00:00 +0000

Contributing to an open source project offers novices the opportunity to join a community of practitioners, build a technical portfolio, gain experience with industry tools and technologies, and have real-world impact. This project seeks to invite and support broader, more diverse participation in open source by supporting early contributors – especially those who have been historically minoritized within tech.

This work builds upon a number of existing projects with similar or overlapping goals. Some examples:

The Teaching Open Source (TOS) community, which brings together instructors teaching open source
The Professors’ Open Source Software Experience (POSSE) workshops and wiki, for faculty teaching - or wanting to teach - open source
Internships such as Google Summer of Code (GSoC), Outreachy, and the MLH Fellowship
Open Source Comes to Campus, offering student workshops on tools and culture [no longer active]
Google Code-in, inviting pre-university students to make open source contributions [no longer active]

This project will investigate gaps in currently available resources/programs and seek to address them, beginning with the exploration of engaging high school students with open source. Depending on early findings, this project could also entail the development of resources for independent learners and/or mentors.

Learning Resource Development + Repository-Building

Topics: Education, Broadening Participation, Mentorship and Support, Community Development
Skills: independent research, communication, organization, GitHub/Markdown, basic web programming (HTML, CSS, JavaScript)
Difficulty: Novice to Intermediate
Size: Medium or Large (175 or 350 hours)
Mentors: Emily Lovell, James Davis
Contributor(s): Nandini Saagar

As an early contributor to this project, you will help gather information to inform the project direction – and then help bring it to life!

Possible tasks:

Meet with teachers and/or community members to identify new opportunities to engage with students (e.g. outside-of-school workshops, classroom visits, materials for teachers to use independently)
Evaluate and test existing learning activities with a high school audience in mind (e.g. consider necessary pre-requisites, time required, ideal activity format)
Evaluate and organize existing resources for newcomers (e.g. Up For Grabs, Hacktoberfest, internship/fellowship opportunites)
Help design and pilot new learning activities and/or workshops
Assist in curating an open source repository of the aforementioned resources
Conduct outreach to our target communities (e.g. brainstorm a catchy repository name, compose inviting and inclusive emails, design visual project elements)
Share your own input and perspective on what it’s like to be a newcomer to open source!

LabOP - an open specification for laboratory protocols, that solves common interchange problems stemming from variations in scale, labware, instruments, and automation.

Mon, 06 Feb 2023 00:00:00 +0000

Project idea 1: Software, hardware, and wetware building LabOP with simultaneous language & protocol development & test executions

Topics: Software standard development, Laboratory automation, Biology
Skills: Python, Semantic Web Technologies (RDF, OWL), interest to think about describing biological & chemical laboratory processes
Difficulty: Moderate
Size: Large (350 hours)
Mentors:
1. Tim Fallon
2. Dan Bryce

About: The Laboratory Open Protocol Language (LabOP)

See link: https://bioprotocols.github.io/labop/

LabOP is an open specification for laboratory protocols, that solves common interchange problems stemming from variations in scale, labware, instruments, and automation. LabOP was built from the ground-up to support protocol interchange. It provides an extensible library of protocol primitives that capture the control and data flow needed for simple calibration and culturing protocols to industrial control.

Software Ecosystem

LabOP’s rich representation underpins an ecosystem of several powerful software tools, including:

labop: the Python LabOP library, which supports:
- Programming LabOP protocols in Python,
- Serialization of LabOP protocols conforming to the LabOP RDF specification,
- Execution in the native LabOP semantics (rooted in the UML activity model),
- Specialization of protocols to 3rd-party protocol formats (including Autoprotocol, OpenTrons, and human readible formats), and
- Integration with instruments (including OpenTrons OT2, Echo, and SiLA-based automation).
laboped: the web-based LabOP Editor, which supports:
- Programming LabOP protocols quickly with low-code visual scripts,
- Storing protocols on the cloud,
- Exporting protocol specializations for use in other execution frameworks,

About the Bioprotocols Working Group

The Bioprotocols Working Group is an open community organization developing a free and open standard for representation of biological protocols.

To join the Bioprotocols Working Group:

Join the community mailing list at: https://groups.google.com/g/bioprotocols
Join the #collab-bioprotocols channel on the Bits in Bio Slack.

Leadership

Elected Term: August 24th, 2022 - August 23rd, 2023

Chair: Dan Bryce (SIFT)

Finance Committee:

Governance

Approved by community vote on August 16th, 2022

https://bioprotocols.github.io/labop/about#Governance

Mission:

The Bioprotocols Working Group is an open community organization developing free and open standards for representation of biological protocols. In support of that goal, the organization also develops tools and practices and works with other organizations to facilitate dissemination and adoption of these standards.

As an organization, the Bioprotocols Working Group holds the following values:

The standards developed by the community should be available under permissive free and open licenses.
Technical decisions of the community should be made following open and inclusive processes.
The community is strengthened by fostering a culture of diversity and inclusion, in which all constructive participants feel comfortable making their voices heard.

OpenROAD - An Open-Source, Autonomous RTL-GDSII Flow for VLSI Designs (2023)

Wed, 01 Feb 2023 00:00:00 +0000

The OpenROAD project is a non-profit, DARPA-funded and Google sponsored project committed to creating low-cost and innovative Electronic Design Automation (EDA) tools and flows for IC design. Our mission is to democratize IC design, break down barriers of cost and access and mitigate schedule risk through native and open source innovation and collaboration with ecosystem partners. OpenROAD provides an autonomous, no-human-in-the-loop, 24-hour, RTL-GDSII flow for fast ASIC design exploration, QoR estimation and physical implementation for a range of technologies above 12 nm. We welcome a diverse community of designers, researchers, enthusiasts, software engineers and entrepreneurs to use and contribute to OpenROAD and make a far-reaching impact. OpenROAD has been used in > 600 tapeouts across a range of ASIC applications with a rapidly growing and diverse user community.

Enhance OpenROAD GUI Flow Manager

Topics: GUI, Visualization, User Interfaces
Skills: C++, Qt
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matt Liberty, Ethan Mahintorabi

Develop custom features for analysis and visualizations in the [OpenROAD GUI] (https://openroad.readthedocs.io/en/latest/main/src/gui/README.html) to support native and third party flows. These include OpenROAD-flow-scripts, OpenLane and other third-party flows . Create documentation: commands, developer guide notes, tutorials to show GUI usage for supported flows.

Profile and tune OpenROAD flow for Runtime improvements

Topics: OpenROAD-flow-scripts, Flow Manager, Runtime Optimization
Skills: Knowledge about Computational resource optimization, Cloud-based computation, Basic VLSI design and tools knowledge
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matt Liberty, Ethan Mahintorabi

Test, analyze and develop verifiable and re-producible strategies to improve run times in OpenROAD-flow-scripts. These include optimizations of computational resources over the cloud, tuning of algorithmic and design flow parameters. Create test plans using existing or new designs to show runtime improvements.

Update OpenROAD Documentation and Tutorials

Topics: Documentation, Tutorials, VLSI design basics
Skills: Knowledge of EDA tools, basics of VLSI design flow, tcl, shell scripts, Documentation, Markdown
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Indira Iyer, Vitor Bandeira
Contributor(s): Jack Luar

Review and update missing documentation and tutorials in OpenROAD-flow-scripts for existing and new features. Here is an example Tutorial link: https://openroad-flow-scripts.readthedocs.io/en/latest/tutorials/FlowTutorial.html for reference.

LEF and Liberty Model Testing

Topics: Testing, LEF, ‘LIB’, VLSI design basics
Skills: Knowledge of EDA tools, basics of VLSI design, lef and lib model abstracts, tcl, shell scripts, Verilog, Layout
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matt Liberty

Test the accuracy of generated LIB and LEF models for signoff in OpenROAD-flow-scripts for flat and hierarchical design flows. Build test cases to validate and add to the regression suite.

Polyphorm / PolyPhy

Thu, 15 Dec 2022 00:00:00 +0000

PolyPhy infrastructure engineering and practices

Topics: DevOps Code Refactoring CI/CD
Skills: fluidity in Python, experience with OOP, experience with building and packaging libraries, understanding GitHub and its tools ecosystem
Difficulty: Challenging
Size: 350+ hours
Mentors: Oskar Elek, Anisha Goel
Contributor(s): Prashant Jha

Your responsibility in this project will be developing new infrastructure of the PolyPhy project as well as maintaining the existing codebases. This is a multifaceted role that will require coordination with the team and active approach to understanding the technical needs of the community.

Specific tasks:

Work with the technical lead to develop effective interfaces for PolyPhy, providing access to its functionality on the level of both Python/Jupyter code and the command line.
Maintain the existing codebase and configure it according to the team’s needs.
Develop and extend the current CI/CD functionality and related code metrics.
Document the best practices related to the above.

Write PolyPhy’s technical story and content

Topics: Writing Documentation Storytelling
Skills: experienced writing structured text, well read, technical or scientific education, webdev basics (preferably NodeJS)
Difficulty: Moderate
Size: 350 hours
Mentors: Oskar Elek, Ezra Huscher

Integral to PolyPhy’s presentation is a “story” - a narrative understanding - that the users and the project contributors can relate to. Your responsibility will be to develop the written part of that understanding, as well as major portions of technical documentation that match it.

Specific tasks:

Work with mentors on understanding the context of the project.
Write and edit diverse pages of the project website.
Work with mentors to improve project’s written community practices (diversity, communication).
Write and edit narrative and explanatory parts of PolyPhy’s documentation.
Create tutorials that present core functionality of the toolkit.

Community engagement and management

Topics: Community Management Social Media Networking
Skills: documented experience with current social media landscape, social and well spoken, ability to communicate technical concepts
Difficulty: Moderate
Size: 175 or 350 hours
Mentors: Oskar Elek, Ezra Huscher

Your responsibility will be to build and engage the community around PolyPhy. This includes its standing team and stakeholders, current expert users, potential adopters as well as the general public. The scope (size) of the project depends on the level of commitment during and beyond the Summer and is negotiable upfront.

Specific tasks:

Manage the team’s communication channels (Slack, Zoom, email) and maintain active presence therein.
Develop social media presence for PolyPhy on Twitter, LinkedIn and other selected social media platforms.
Manage and extend the online presence for the project, including its website, mailing list, and other applicable outreach activities.
Research and engage with new communities that would benefit from PolyPhy, both as its expert users and contributors.

Adaptive Load Balancers for Low-latency Multi-hop Networks

Mon, 07 Nov 2022 10:15:56 -0700

This project aims at designing efficient, adaptive link level load balancers for networks that handle different kinds of traffic, in particular networks where flows are heterogeneous in terms of their round trip times. Geo distributed data centers are one such example. With the large-scale deployments of 5G in the near future, there will be even more applications, including more bulk transfers of videos and photos, augmented reality applications and virtual reality applications which take advantage of 5G’s low latency service. With the development and discussion about Web3.0 and Metaverse, the network workloads across data centers are only going to get more varied and challenging. All these add to heavy, bulk of data being sent to the data centers and over the backbone network. These traffic have varying quality of service requirements, like low latency, high throughput and high definition video streaming. Wide area network (WAN) flows are typically data heavy tasks that consist of backup data taken for a particular data center. The interaction of the data center and WAN traffic creates a very interesting scenario with its own challenges to be addressed. WAN and data center traffic are characterized by differences in the link utilizations and round trip times. Based on readings and literature review, there seems to be very little work on load balancers that address the interaction of data center and WAN traffic. This in turn motivates the need for designing load balancers that take into account both WAN and data center traffic in order to create high performance for more realistic scenarios. This work proposes a load balancer that is adaptive to the kind of traffic it encounters by learning from the network conditions and then predicting the optimal route for a given flow.

Through this research we seek to contribute the following :

Designing a load balancer, that is adaptive to datacenter and WAN traffic, and in general can be adapted to varied traffic conditions
Real time learning of the network setup and predicting optimal paths
Low latency, high throughput and increased network utilization deliverables

Adaptive, Dynamic Load Balancing for data center and WAN traffic

Topics: ‘data center networking’, TCP/IP stack’, ‘congestion control’, ’load balancing’
Skills: C++, python, linux ; experience with network simulators would be helpful
Difficulty: moderate/ challenging
Size: Medium or Large (175 or 350 hours)
Mentors: Katia Obraczka,Abdul Kabbani, Lakshmi Krishnaswamy

Specific tasks:

Understanding the OMNeT++ network simulator and creating simple networks and data center topologies to understand the simulation environment.
Implementing existing load balancers on OMNeT++ and exploring the effect of different features of the load balancers with data center traffic and WAN traffic.
Finding and testing out WAN specific traffic that may exist, like video streaming traffic, large database queries etc.
Working with the mentors on developing a learning-based load balancer framework that learns from past sample traffic, network conditions, to adapt dynamically to current network conditions.

Apache AsterixDB

Mon, 07 Nov 2022 10:15:56 -0700

AsterixDB is an open source parallel big-data management system. AsterixDB is a well-established Apache project that has beedddn active in research for more than 10 years. It provides a flexible data model that supports modern NoSQL applications with a powerful query processor that can scale to billions of records and terabytes of data. Users can interact with AsterixDB through a power and easy to use declarative query language, SQL++, which provides a rich set of data types including timestamps, time intervals, text, and geospatial, in addition to traditional numerical and Boolean data types.

Geospatial Data Science on AsterixDB

Topics: Data science, SQL++, documentation
Skills: SQL, Writing, Spreadsheets
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentors: Ahmed Eldawy, Akil Sevim

Build a data science project using AsterixDB that analyzes geospatial data among other dimensions. Use Chicago Crimes as the main dataset and combine with other datasets including points of interests ZIP Code boundaries. During this project, we will answer interesting questions about the data and visualize the results such as:

What is the most common crime type in a specific date or over the weekends?
Where do most of the arrests happen?
How are the crime rates change over time for different regions?

The goals of this project are:

Understand how to build a scalable data science project using AsterixDB.
Translate common questions to SQL queries and run them on large data.
Learn how to visualize the results of queries and present them.
Write detailed documentation about the process of building a data science application in AsterixDB.
Improve the documentation of AsterixDB while working in the project to improve the experience for future users.

Machine Learning Integration

As a bonus task, and depending on the progress of the project, we can explore the integration of machine learning with AsterixDB through Python UDFs. We will utilize the AsterixDB Python integration through user-defined functions to connect AsterixDB backend with scikit-learn to build some unsupervised and supervised models for the data. For example, we can cluster the crimes based on their location and other attributes to find interesting patterns or hotspots.

CephFS

Mon, 07 Nov 2022 10:15:56 -0700

CephFS is a distributed file system on top of Ceph. It is implemented as a distributed metadata service (MDS) that uses dynamic subtree balancing to trade parallelism for locality during a continually changing workloads. Clients that mount a CephFS file system connect to the MDS and acquire capabilities as they traverse the file namespace. Capabilities not only convey metadata but can also implement strong consistency semantics by granting and revoking the ability of clients to cache data locally.

CephFS namespace traversal offloading

Topics: Ceph, filesystems, metadata, programmable storage
Skills: C++, Ceph / MDS
Difficulty: Medium
Size: Large (350 hours)
Mentor: Carlos Maltzahn

The frequency of metadata service (MDS) requests relative to the amount of data accessed can severely affect the performance of distributed file systems like CephFS, especially for workloads that randomly access a large number of small files as is commonly the case for machine learning workloads: they purposefully randomize access for training and evaluation to prevent overfitting. The datasets of these workloads are read-only and therefore do not require strong coherence mechanisms that metadata services provide by default.

The key idea of this project is to reduce the frequency of MDS requests by offloading namespace traversal, i.e. the need to open a directory, list its entries, open each subdirectory, etc. Each of these operations usually require a separate MDS request. Offloading namespace traversal refers to a client’s ability to request the metadata (and associated read-only capabilities) of an entire subtree with one request, thereby offloading the traversal work for tree discovery to the MDS.

Once the basic functionality is implemented, this project can be expanded to address optimization opportunities, e.g. describing regular tree structures as a closed form expression in the tree’s root, shortcutting tree discovery.

DirtViz (2022)

Mon, 07 Nov 2022 10:15:56 -0700

DirtViz is a project to visualize data collected from sensors deployed in sensor networks. We have deployed a number of sensors measuring qualities like soil moisture, temperature, current and voltage in outdoor settings. This project involves extending (or replacing) our existing plotting scripts to create a fully-feledged dataviz tool tailored to the types of data collected from embedded systems sensor networks.

Visualize Sensor Data

Topics: Data Visualization, Analytics
Skills: javascript, python, bash, webservers, git, embedded systems
Difficulty: Easy/Moderate
Size 175 hours
Mentor: Colleen Josephson

Develop set of visualization tools (ideally web based) that easily allows users to zoom in on date ranges, change axes, etc.
Document the tool thoroughly for future maintenance
If interested, we are also interested in investigating correlations between different data streams

Eusocial Storage Devices

Mon, 07 Nov 2022 10:15:56 -0700

As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. The Eusocial project aims to create a new interface to storage devices that can leverage existing and new CPU and main memory resources to take over data management tasks like availability, recovery, and migrations. The project refers to these storage devices as “eusocial” because we are inspired by eusocial insects like ants, termites, and bees, which as individuals are primitive but collectively accomplish amazing things.

Dynamic function injection for RocksDB

Skills: C/C++, Java
Difficulty: Challenging
Size 175 or 350 hours
Mentor: Jianshen Liu

Recent research reveals that the compaction process in RocksDB can be altered to optimize future data access by changing the data layout in compaction levels. The benefit of this approach can be extended to different data layout optimization based on application access patterns and requirements. In this project, we want to create an interface that would allow users to dynamically inject layout optimization functions to RockDB, using containerization technologies such as Webassembly.

Reference: Saxena, Hemant, et al. “Real-Time LSM-Trees for HTAP Workloads.” arXiv preprint arXiv:2101.06801 (2021).

Demonstrating a composable storage system accelerated by memory semantic technologies

Skills: C/C++, Bash, Python, System architecture, Network fabrics
Difficulty: Challenging
Size 350 hours
Mentor: Jianshen Liu

Since the last decade, the slowing down in the performance improvement of general-purpose processors is driving the system architecture to be increasingly heterogeneous. We have seen the kinds of domain-specific accelerator hardware (e.g., FPAG, SmartNIC, TPU, GPU) are growing to take over many different jobs from the general-purpose processors. On the other hand, the network and storage device performance have been tremendously improved with a trajectory much outweighed than that of processors. With this trend, a natural thought to continuously scale the storage system performance economically is to efficiently utilize and share different sources from different nodes over the network. There already exist different resource sharing protocols like CCIX, CXL, and GEN-Z. Among these GEN-Z is the most interesting because, unlike RDMA, it enables remote memory accessing without exposing details to applications (i.e., not application changes). Therefore, it would be interesting to see how/whether these technologies can help improve the performance of storage systems, and to what extent. This project would require building a demo system that uses some of these technologies (especially GEN-Z) and run selected applications/workloads to better understand the benefits.

References: Gen-Z: An Open Memory Fabric for Future Data Processing Needs: https://www.youtube.com/watch?v=JLb9nojNS8E, Pekon Gupta, SMART Modular; Gen-Z subsystem for Linux, https://github.com/linux-genz

When will Rotational Media Users abandon SATA and converge to NVMe?

Skills: Entrepreneurial mind, interest in researching high technology markets
Difficulty: Medium
Size: 350 hours
Mentor: Carlos Maltzahn

Goal: Determine the benefits in particular market verticals such as genomics and health care to converge the storage stack in data center computer systems to the NVMe device interface, even when devices include rotational media (aka disk drives). The key question: “When do people abandon SATA and SAS and converge to NVMe?”

Background: NVMe is a widely used device interface for fast storage devices such as flash that behave much more like random access memory than the traditional rotational media. Rotational media is accessed mostly via SATA and SAS which has served the industry well for close to two decades. SATA in particular is much cheaper than NVMe. Now that NVMe is widely available and quickly advancing in functionality, an interesting question is whether there is a market for rotational media devices with NVMe interfaces, converging the storage stack to only one logical device interface, thereby enabling a common ecosystem and more efficient connectivity from multiple processes to storage devices.

The NVMe 2.0 specification, which came out last year, has been restructured to support the increasingly diverse NVMe device environment (including rotational media). The extensibility of 2.0 encourages enhancements of independent command sets such as Zoned Namespaces (ZNS) and Key Value (NVMe-KV) while supporting transport protocols for NVMe over Fabrics (NVMe-oF). A lot of creative energy is now focused on advancing NVMe while SATA has not changed in 16 years. Having all storage devices connect the same way not only frees up space on motherboards but also enables new ways to manage drives, for example via NVMe-oF that allows drives to be networked without additional abstraction layers.

Suggested Project Structure: This is really just a suggestion for a starting point. As research progresses, a better structure might emerge.

Convergence of software stack: seamless integration between rotational media and hot storage

Direct tiering: one unified interface to place data among fast and slow devices on the same NVMe fabric depending on whether the data is hot or cold.

Computational storage:

What are the architectures of computational NVMe devices? For example, offloading compute to an FPGA vs an onboard processor in a disk drive?
Do market verticals such as genomics and health care for one over the other? When do people abandon SATA and converge to NVMe?

Project tasks:

Review current literature
Survey what the industry is doing
Join weekly meetings to discuss findings with Ph.D. students, experienced industry veterans, and faculty (Thursday’s 2-3pm, can be adjusted if necessary)
Product is a slide deck with lots of pictures

Interesting links:
https://www.opencompute.org/wiki/Storage/NVMeHDD
https://2021ocpglobal.fnvirtual.app/a/event/1714 (video and slides, requires $0 registration)
https://www.storagereview.com/news/nvme-hdd-edges-closer-to-reality
https://www.tomshardware.com/news/seagate-demonstrates-hdd-with-pcie-nvme-interface
https://nvmexpress.org/everything-you-need-to-know-about-the-nvme-2-0-specifications-and-new-technical-proposals/
https://www.tomshardware.com/news/nvme-2-0-supports-hard-disk-drives

FasTensor

Mon, 07 Nov 2022 10:15:56 -0700

FasTensor is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor used for scientific computing and is effective for expressing a wide range of computations for data analyses, including common aggregation operations from database management systems and advanced machine learning pipelines. FasTensor execution engine exploits the structural-locality in the multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, parallel execution, and so on.

Continuous Integration

Topics: Data Management, Analytics
Skills: C++, github
Difficulty: Medium
Size: Large (350 hours)
Mentor: John Wu, Bin Dong, Suren Byna

Develop a test suite for the public API of FasTensor
Automate execution of the test suite
Document the continuous integration process
Develop performance testing suite

FasTensor

Mon, 07 Nov 2022 10:15:56 -0700

Tensor execution engine on GPU

Topics: Data Management, Analytics
Skills: C++, github
Difficulty: Difficult
Size: Large (350 hours)
Mentor: John Wu, Bin Dong, Suren Byna

Tensor based computing is needed by scientific applications and now advanced AI model training. Most tensor libraries are hand customized and optimized on GPU, and most of they only serve one kind of application. For example, TensorFlow is only optimized for AI model training. Optimizing generic tensor computing libraries on GPU can benefit wide applications. Our FasTensor, as a generic tensor computing library, can only work efficiently on CPU now. How to run the FasTensor on GPU is still none-explored work. Research and development challenges will include but not limited to: 1) how to maintain structure-locality of tensor data on GPU; 2) how to reduce the performance loss when the structure-locality of tensor is broken on GPU.

Develop a mechanism to move user-define computing kernels onto GPU
Evaluate the performance of the execution engine
Document the execution mechanism
Develop performance testing suite

Continuous Integration

Topics: Data Management, Analytics
Skills: C++, github
Difficulty: Medium
Size: Large (300 hours)
Mentor: John Wu, Bin Dong, Suren Byna

Develop a test suite for the public API of FasTensor
Automate execution of the test suite
Document the continuous integration process

HDF5

Mon, 07 Nov 2022 10:15:56 -0700

HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections.

The HDF5 technology suite includes:

A versatile data model that can represent very complex data objects and a wide variety of metadata.
A completely portable file format with no limit on the number or size of data objects in the collection.
A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
A rich set of integrated performance features that allow for access time and storage space optimizations.
Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.

Python Interface to HDF5 Asynchronous I/O

Topics: Python, Async I/O, HDF5
Skills: Python, C, HDF5
Difficulty: Medium
Size: Large (350 hours)
Mentor: Suren Byna, Houjun Tang

HDF5 is a well-known library for storing and accessing (known as “Input and Output” or I/O) data on high-performance computing systems. Recently, new technologies, such as asynchronous I/O and caching, have been developed to utilize fast memory and storage devices and to hide the I/O latency. Applications can take advantage of an asynchronous interface by scheduling I/O as early as possible and overlapping computation with I/O operations to improve overall performance. The existing HDF5 asynchronous I/O feature supports the C/C++ interface. This project involves the development and performance evaluation of a Python interface that would allow more Python-based scientific codes to use and benefit from the asynchronous I/O.

LiveHD (2022)

Mon, 07 Nov 2022 10:15:56 -0700

Projects for LiveHD. Lead Mentors: Jose Renau and Sheng-Hong Wang.

HIF Tooling


Title	HIF tooling
Description	Tools around Hardware Interchange Format (HIF) files
Mentor(s)	Jose Renau
Skills	C++17
Difficulty	Medium
Size	Medium 175 hours
Link

HIF (https://github.com/masc-ucsc/hif) stands for Hardware Interchange Format. It is designed to be a efficient binary representation with simple API that allows to have generic graph and tree representations commonly used by hardware tools. It is not designer to be a universal format, but rather a storate and traversal format for hardware tools.

LiveHD has 2 HIF interfaces, the tree (LNAST) and the graph (Lgraph). Both can read/write HIF format. The idea of this project is to expand the hif repository to create some small but useful tools around hif. Some projects:

hif_diff + hif_patch: Create the equivalent of the diff/patch commands that exist for text but for HIF files. Since the HIF files have a more clear structure, some patches changes are more constrained or better understood (IOs and dependences are explicit).
hif_tree: Print the HIF hierarchy, somewhat similar to GNU tree but showing the HIF hieararchy.
hif_grep: capacity to grep for some tokens and outout a hif file only with those. Thena hif_tree/hif_cat can show the contents.

Mockturtle


Title	Mockturtle
Description	Perform synthesis for graph in LiveHD using Mockturtle
Mentor(s)	Jose Renau
Skills	C++17, synthesis
Difficulty	Medium
Size	Medium 175 hours
Link

There are some issues with Mockturtle integration (new cells) and it is not using the latest Mockturtle library versions. The goal is to use Mockturtle (https://github.com/lsils/mockturtle) with LiveHD. The main characteristics:

Use mockturtle to tmap to LUTs
Use mockturtle to synthesize (optimize) logic
Enable cut-rewrite as an option
Enable hierarchy cross optimization (hier:true option)
Use the graph labeling to find cluster to optimize
Re-timing
Map to LUTs only gates and non-wide arithmetic. E.g: 32bit add is not mapped to LUTS, but a 2-bit add is mapped.
List of resources to not map:
- Large ALUs. Large ALUs should have an OpenWare block (hardcoded in FPGAs and advanced adder options in ASIC)
- Multipliers and dividers
- Barrell shifters with not trivial shifts (1-2 bits) selectable at run-time
- memories, luts

Query Shell


Title	Query Shell
Description	Create a console app that interacts with LiveHD to query parameters about designs
Mentor(s)	Jose Renau
Skills	C++17
Difficulty	Medium
Size	Medium 175 hours
Link

Based on replxx (like lgshell)
Query bits, ports… like
- https://github.com/rubund/netlist-analyzer
- https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html
It would be cool if subsections (selected) parts can be visualized with something like https://github.com/nturley/netlistsvg
The shell may be expanded to support simulation in the future
Wavedrom/Duh dumps

Wavedrom and duh allows to dump bitfield information for structures. It would be interesting to explore to dump tables and bit fields for Lgraph IOs, and structs/fields inside the module. It may be a way to integrate with the documentation generation.

Example of queries: show path, show driver/sink of, do topo traversal,….

As an interesting extension would be to have some simple embedded language (TCL or ChaiScript or ???) to control queries more easily and allow to build functions/libraries.

Lgraph and LNAST check pass


Title	Lgraph and LNAST check pass
Description	Create a pass that check the integrity/correctness of Lgraph and LNAST
Mentor(s)	Jose Renau
Skills	C++17
Difficulty	Medium
Size	Large 350 hours
Link

Create a pass that checks that the Lgraph (and/or LNAST) is semantically correct. The LNAST already has quite a few tests (pass.semantic), but it can be further expanded. Some checks:

No combinational loops
No mismatch in bit widths
No disconnected nodes
Check for inefficient splits (do not split buses that can be combined)
Transformations stages should not drop names if same net is preserved
No writes in LNAST that are never read
All the edges are possible. E.g: no pin ‘C’ in Sum_op

unbitwidth


Title	unbitwidth
Description	Not all the variables need bitwidth information. Find the small subset
Mentor(s)	Jose Renau
Skills	C++17
Difficulty	Medium
Size	Medium 175 hours
Link

This pass is needed to create less verbose CHISEL and Pyrope code generation.

The LGraph can have bitwidth information for each dpin. This is needed for Verilog code generation, but not needed for Pyrope or CHISEL. CHISEL can perform local bitwidth inference and Pyrope can perform global bitwidth inference.

A new pass should remove redundant bitwidth information. The information is redundant because the pass/bitwidth can regenerate it if there is enough details. The goal is to create a pass/unbitwidth that removes either local or global bitwidth. The information left should be enough for the bitwidth pass to regenerate it.

Local bitwidth: It is possible to leave the bitwidth information in many places and it will have the same results, but for CHISEL the inputs should be sized. The storage (memories/flops) should have bitwidth when can not be inferred from the inputs.
Global bitwidth: Pyrope bitwidth inference goes across the call hierarchy. This means that a module could have no bitwidth information at all. We start from the leave nodes. If all the bits can be inferred given the inputs, the module should have no bitwidth. In that case the bitwidth can be inferred from outside.

LiveHD (2023)

Mon, 07 Nov 2022 10:15:56 -0700

Projects for LiveHD.
Lead Mentors: Jose Renau and Sakshi Garg.
Contributor(s): Shahzaib Kashif

LiveHD is a “compiler” infrastructure for hardware design optimized for synthesis and simulation. The goals is to enable a more productive flow where the ASIC/FPGA designer can work with multiple hardware description languages like CHISEL, Pyrope, or Verilog.

There are several projects available around LiveHD. A longer explanation and more project options are available at projects. Contact the mentors to find a project that fits your interests.

A sample of helpful projects:

Mockturtle


Title	Mockturtle
Description	Perform synthesis for graph in LiveHD using Mockturtle
Mentor(s)	Jose Renau and Sakshi Garg
Skills	C++17, synthesis
Difficulty	Medium
Size	Medium 175 hours
Link

Mockturtle (https://github.com/lsils/mockturtle) is a synthesis tool partially integrated with LiveHD. The goal of this task is to iron out bugs and issues and to use the LiveHD Tasks API to parallelize the synthesis.

Main features:

The current synthesis divides the circuit in partitions. Each partition can be synthesized in parallel.
Support hierarchical synthesis to optimize cross Lgraphs (cross verilog module optimization)

The goal is to use Mockturtle (https://github.com/lsils/mockturtle) with LiveHD. The main characteristics:

Use mockturtle to tmap to LUTs
Use mockturtle to synthesize (optimize) logic
Enable cut-rewrite as an option
Enable hierarchy cross optimization (hier:true option)
Use the graph labeling to find cluster to optimize
Re-timing
Map to LUTs only gates and non-wide arithmetic. E.g: 32bit add is not mapped to LUTS, but a 2-bit add is mapped.
List of resources to not map:
- Large ALUs. Large ALUs should have an OpenWare block (hardcoded in FPGAs and advanced adder options in ASIC)
- Multipliers and dividers
- Barrell shifters with not trivial shifts (1-2 bits) selectable at run-time
- memories, luts

LiveHD Console


Title	LiveHD Console
Description	Create a console app that interacts with LiveHD to query parameters about designs
Mentor(s)	Jose Renau and Sakshi Garg
Skills	C++17
Difficulty	Medium
Size	Medium 175 hours
Link

Current LiveHD uses replxx but it a no longer maintained shell/console. The result is that it fails in newer versions of OSX.

There is an alternative Crossline (https://github.com/jcwangxp/Crossline). This affects main/main.cpp and nothing else.

In addition to replace the current console with auto-completion, the plan is to add “query” capacity to visualize some of the LiveHD internals.

Query bits, ports… like
- https://github.com/rubund/netlist-analyzer
- https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html
It would be cool if subsections (selected) parts can be visualized with something like https://github.com/nturley/netlistsvg
The shell may be expanded to support simulation in the future
Wavedrom/Duh dumps

Example of queries: show path, show driver/sink of, do topo traversal,….

Compiler error generation pass


Title	Lgraph and LNAST check pass
Description	Create a pass that check the integrity/correctness of Lgraph and LNAST
Mentor(s)	Jose Renau and Sakshi Garg
Skills	C++17
Difficulty	Medium
Size	Large 350 hours
Link

Create a pass that checks that the Lgraph (and/or LNAST) is semantically correct. The LNAST already has quite a few tests (pass.semantic), but it can be further expanded. Some checks:

No combinational loops
No mismatch in bit widths
No disconnected nodes
Check for inefficient splits (do not split buses that can be combined)
Transformations stages should not drop names if same net is preserved
No writes in LNAST that are never read
All the edges are possible. E.g: no pin ‘C’ in Sum_op

Open Source Autonomous Vehicle Controller

Mon, 07 Nov 2022 10:15:56 -0700

The OSAVC is a vehicle-agnostic open source hardware and software project. This project is designed to provide a real-time hardware controller adaptable to any vehicle type, suitable for aerial, terrestrial, marine, or extraterrestrial vehicles. It allows control researchers to develop state estimation algorithms, sensor calibration algorithms, and vehicle control models in a modular fashion such that once the hardware set has been developed switching algorithms requires only modifying one C function and recompiling.

Lead mentor: Aaron Hunter

Projects for the OSAVC:

Vehicle/Craft sensor driver development

Topics: Driver code to integrate sensor to a microcontroller
Skills: C, I2C, SPI, UART interfaces
Size 175 hours
Difficulty Medium
Mentor Aaron Hunter

Help develop a sensor library for use in autonomnous vehicles. Possible sensors include range finders, ping sensors, IMUs, GPS receivers, RC receivers, barometers, air speed sensors, etc. Code will be written in C using state machine methodology and non-blocking algorithms. Test the drivers on a Microchip microncontroller.

Path finding algorithm using OpenCV and machine learning

Topics: Computer vision, blob detection
Skills: C/Python, OpenCV
Size 175 or 350 hours
Difficulty Medium
Mentor Aaron Hunter

Use OpenCV to identify a track for an autonomous vehicle to follow. Build on previous work by developing a new model using EfficientDet and an existing training set of images. Port the model to TFlite and implement on the Coral USB Accelerator. Evaluate its performance against our previous efforts.

State estimation/sensor fusion algorithm development

Topics: Kalman filtering, Mahoney
Skills: C/Python, Matlab/Simulink, numerical optimization algorithms
Size 350 hours
Difficulty Challenging
Mentor Aaron Hunter

Implement an optimal state estimation algorithm from a model. This model can be derived from a Kalman filter or some other state estimation filter (e.g., Mahoney filter). THe model takes sensor readings as input and provides an estimate of the state of a vehicle. Finally, convert the model to standard C using the Simulink code generation or implement in Python (for use on a single board computer, e.g., Raspberry Pi)

Open Source Autonomous Vehicle Controller

Mon, 07 Nov 2022 10:15:56 -0700

Lead mentor: Aaron Hunter

Projects for the OSAVC:

Vehicle/Craft sensor driver development

Topics: Driver code to integrate sensor to a microcontroller
Skills: C, I2C, SPI, UART interfaces
Size 175 hours
Difficulty Medium
Mentor Aaron Hunter, Carlos Espinosa, Pavlo Vlastos

Help develop sensor libraries for use in autonomous vehicles. We are in particular interested in sensors for UAVs: airspeed sensors (pitot tube) or barometers, but also proximity detectors (ultrasonic), and range sensors. Code will be written in C using state machine methodology and non-blocking algorithms. Test the drivers on a Microchip microncontroller.

Technical Documentation

Topics: Documentation
Skills: Technical writing, markdown language, website
Size 175 hours
Difficulty Medium
Mentor Aaron Hunter/Carlos Espinosa/Pavlo Vlastos
Contributor(s) Aniruddha Thakre

Technical Documentation: Write a tutorial to demonstrate how to start with an OSAVC and program it with the robotic equivalent of HelloWorld, moving onto more sophisticated applications. Create a web page interface to the OSAVC repo highlighting this tutorial. In this project you will start from scratch with an OSAVC PCB and bring it to life, while documenting it in a way to help new users.

ROS/Gazebo Robot Simulation

Topics: Robot simulation with ROS/Gazebo
Skills ROS/Gazebo, Python
Size 175 or 350 hours
Difficulty Medium to Hard
Mentor Aaron Hunter, Carlos Espinosa, Pavlo Vlastos
Contributor(s) Damodar Datta Kancharla

Generate a simulated world and a quadcopter model in ROS/Gazebo. Provide a link from Mavlink to ROS using the mavros package and simulate a real vehicle data stream to command the simulated quadcopter in Gazebo. At the same time return the image stream from Gazebo to allow for offline processing of ML models on the images.

OpenRAM

Mon, 07 Nov 2022 10:15:56 -0700

Replace logging framework with library

Topics: User Interfaces, Python APIs
Skills: Python
Difficulty: Easy
Size: Medium (175 hours)
Mentors: Matthew Guthaus,Jesse Cirimelli-Low

Replace the custom logging framework in OpenRAM with Python logging module. New logging should allow levels of detail as well as tags to enable/disable logging of particular features to aid debugging.

ROM generator

Topics: VLSI Design Basics, Memories, Python
Skills: Python, VLSI
Difficulty: Medium/Challenging
Size: Large (350 hours)
Mentors: Matthew Guthaus

Use the OpenRAM API to generate a Read-Only Memory (ROM) file from an input hex file. Project will automatically generate a Spice netlist, layout, Verilog model and timing characterization.

Register File generator

Topics: VLSI Design Basics, Memories, Python
Skills: Python, VLSI
Difficulty: Medium/Challenging
Size: Large (350 hours)
Mentors: Matthew Guthaus

Use the OpenRAM API to generate a Register File from standard library cells. Project will automatically generate a Spice netlist, layout, Verilog model and timing characterization.

Built-In Self Test and Repair

Topics: VLSI Design Basics, Python, Verilog, Testing
Skills: Python, Verilog
Difficulty: Medium/Challenging
Size: Medium (175 hours)
Mentors: Matthew Guthaus, Bugra Onal

Finish integration of parameterized Verilog modeule to support Built-In-Self-Test and Repair of OpenRAM memories using spare rows and columns in OpenRAM memories.

Layout verses Schematic (LVS) visualization

Topics: VLSI Design Basics, Python
Skills: Python, VLSI, JSON
Difficulty: Easy/Medium
Size: Medium or Large (175 or 350 hours)
Mentors: Matthew Guthaus,Jesse Cirimelli-Low

Create a visualization interface to debug layout verses schematic mismatches in Magic layout editor. Results will be parsed from a JSON output of Netgen.

OpenROAD - A Complete, Autonomous RTL-GDSII Flow for VLSI Designs

Mon, 07 Nov 2022 10:15:56 -0700

OpenROAD is a front-runner in open-source semiconductor design automation tools and know-how. OpenROAD reduces barriers of access and tool costs to democratize system and product innovation in silicon. The OpenROAD tool and flow provide an autonomous, no-human-in-the-loop, 24-hour RTL-GDSII capability to support low-overhead design exploration and implementation through tapeout. We welcome a diverse community of designers, researchers, enthusiasts and entrepreneurs who use and contribute to OpenROAD to make a far-reaching impact. Our mission is to democratize and advance design automation of semiconductor devices through leadership, innovation, and collaboration.

OpenROAD is the key enabler of successful Chip initiatives like the Google-sponsored Efabless that has made possible more than 150 successful tapeouts by a diverse and global user community. The OpenROAD project repository is https://github.com/The-OpenROAD-Project/OpenROAD.

Design of static RAMs in VLSI designs for good performance and area is generally time-consuming. Memory compilers significantly reduce design time for complex analog and mixed-signal designs by allowing designers to explore, verify and configure multiple variants and hence select a design that is optimal for area and performance. This project requires the support of memory compilers to OpenROAD-flow-scripts based on popular PDKS such as those provided by OpenRAM.

OpenLane Memory Design Macro Floorplanning

Topics: Memory Compilers, OpenRAM, Programmable RAM
Skills: python, basic knowledge of memory design, VLSI technology, PDK, Verilog
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matthew Guthaus, Mehdi Saligane

Improve and verify OpenLane design planning with OpenRAM memories. Specifically, this project will utilize the macro placer/floorplanner and resolve any issues for memory placement. Issues that will need to be addressed may include power supply connectivity, ability to rotate memory macros, and solving pin-access issues.

OpenLane Memory Design Timing Analysis

Topics: Memory Compilers, OpenRAM, Programmable RAM
Skills: python, basic knowledge of memory design, VLSI technology, PDK, Verilog
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matthew Guthaus, Mehdi Saligane

Improve and verify OpenLane Static Timing Analysis using OpenRAM generated library files. Specifically, this will include verifying setup/hold conditions as well as creating additional checks such as minimum period, minimum pulse width, etc. Also, the project will add timing information to Verilog behavioral model.

OpenLane Memory Macro PDK Support

Topics: Memory Compilers, OpenRAM, Programmable RAM
Skills: python, basic knowledge of memory design, VLSI technology, PDK, Verilog
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matthew Guthaus, Mehdi Saligane

Integrate and verify FreePDK45 OpenRAM memories with an OpenLane FreePDK45 design flow. OpenLane currently supports only Skywater 130nm PDK, but OpenROAD supports FreePDK45 (which is the same as Nangate45). This project will create a design using OpenRAM memories with the OpenLane flow using FreePDK45.

VLSI Power Planning and Analysis

Topics: Power Planning for VLSI, IR Drop Analysis, Power grid Creation and Analysis
Skills: C++, tcl, VLSI Layout
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Mehdi Saligane mailto:mehdi@umich.edu, Ming-Hung mailto:minghung@umich.edu

Take the existing power planning (pdngen.tcl) module of openroad and recode the functionality in C++ ensuring that all of the unit tests on the existing code pass correctly. Work with a senior member of the team at ARM. Ensure that designs created are of good quality for power routing and overall power consumption.

Demos and Tutorials

Topics: Demo Development, Documentation, VLSI design basics
Skills: Knowledge of EDA tools, basics of VLSI design flow, tcl, shell scripts, Documentation, Markdown
Difficulty: Medium
Size: Medium (175 hours)
Mentor: Indira Iyer, Vitor Bandeira

For OpenLane, develop demos showing: The OpenLane flow and highight key features GUI visualizations Design Explorations and Experiments Different design styles and particular challenges

Comprehensive Flow Testing

Topics: Testing, Documentation, VLSI design basics
Skills: Knowledge of EDA tools, basics of VLSI design, tcl, shell scripts, Verilog, Layout
Difficulty: Medium
Size: Medium (175 hours)
Mentor: Indira Iyer

Develop detailed test plans to test the OpenLane flow to expand coverage and advanced features. Add open source designs to the regression test suite to improve tool quality and robustness. This includes design specification, configuration and creation of all necessary files for regression testing. Suggested sources : ICCAS benchmarks, opencores, LSOracle for synthesis flow option.

Enhance GUI features

Topics: GUI, Visualization, User Interfaces
Skills: C++, Qt
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matt Liberty, Vitor Bandeira

For OpenROAD, develop and enhance visualizations for EDA data and algorithms in the OpenROAD GUI. Allow deeper understanding of the tool results for users and tool internals for developers.

Automate OpenDB code Generation

Topics: Database, EDA
Skills: C++, Python, JSON, Jinja templating
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Matt Liberty, Tom Spyrou

For OpenROAD- Automatic code generation for the OpenDB database which allows improvements to the data model with much less hand coding. Allow the generation of storage, serialization, and callback code from a custom schema description format. r

Implement an NLP based AI bot aimed at increasing users, enhancing usability and building a knowledge base

Topics: AI, ML, Analytics
Skills: Python. ML libraries (e.g., Tensorflow, PyTorch)
Difficulty: Medium
Size: Medium or Large (175 or 350 hours)
Mentor: Vitor Bandeira, Indira Iyer

The OpenROAD project contains a storehouse of knowledge in it’s Github repositories within Issues and Pull requests. Additionally, project related slack channels also hold useful information in the form of questions and answers, problems and solutions in conversation threads. Implement an AI analytics bot that filters, selects relevant discussions and classifies/records them into useful documentation and actionable issues. This should also directly track, increase project usage and report outcome metrics.

Package Management & Reproducibility

Mon, 07 Nov 2022 10:15:56 -0700

Project ideas related to reproducibility and package management, especially as it relates to store type package managers (NixOS, Guix or Spack).

Lead Mentor: Farid Zakaria mailto:fmzakari@ucsc.edu

Investigate the dynamic linking landscape

Topics: Operating Systems Compilers Linux Package Management NixOS
Skills: Experience with systems programming and Linux familiarity
Difficulty: Moderate to Challenging
Size: Large (350 hours)
Mentors: Farid Zakaria & Tom Scogland mailto:scogland1@llnl.gov

Dynamic linking as specified in the ELF file format has gone unchallenged since it’s invention. With many new package management models that eschew the filesystem hierarchy standard (i.e. Nix, Guix and Spack), many of the idiosyncrasies that define the way in which libraries are discovered are no longer useful and potentially harmful.

Specific tasks:

Continue development on Shrinkwrap a tool to make dynamic library loading simpler and more robust.
Evaluate it’s effectiveness across a wide range of binaries.
Upstream contributions to NixOS or Guix to leverage the improvement when suitable.
Investigate alternative improvements to dynamic linking by writing a dynamic linker “loadder wrapper” to explore new ideas.

Polyphorm / PolyPhy

Mon, 07 Nov 2022 10:15:56 -0700

Polyphorm is an agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data. Rooted in astronomy and inspired by nature, we have used Polyphorm to reconstruct the Cosmic web structure, but also to discover network-like patterns in natural language data. You can find more details about our research here. Under the hood, Polyphorm uses a richer 3D scalar field representation of the reconstructed network, instead of a discrete representation like a graph or a mesh.

PolyPhy will be a Python-based redesigned version of Polyphorm, currently in the beginning of its development cycle. PolyPhy will be a multi-platform toolkit meant for a wide audience across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. All of the offered projects focus on PolyPhy, with a variety of topics including design, coding, and even research. Ultimately, PolyPhy will become a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.

Develop website for PolyPhy

Topics: Web Development Dynamic Updates UX
Skills: web development experience, good communicator, (HTML/CSS), (Javascript)
Difficulty: Moderate
Size: Medium or large (175 or 350 hours)
Mentors: Oskar Elek

Develop a clean and welcoming website for the project. The organization needs to reflect the needs of PolyPhy users, but also provide a convenient entry point for interested project contributors. No excessive pop-ups or webjunk.

Specific tasks:

Work with mentors on understanding the context of the project.
Port the contents of the repository page to a dedicated website.
Design the structure of the website according to best OS practices.
Work with the visual designer (see below) in creating a coherent and organic presentation.
Interactively link important metrics from the project dev environment as well as documentation.

Design visual experience for PolyPhy’s website and presentations

Topics: Design Art UX
Skills: vector and bitmap drawing, sense for spatial symmetry and framing, (interactive content creation), (animation)
Difficulty: Moderate
Size: Medium (175 hours)
Mentors: Oskar Elek

Develop visual content for the project using its main themes: nature-inspired computation, biomimetics, interconnected structures. Aid in designing visual structure of the website as well as other public-facing artifacts.

Specific tasks:

Work with mentors on understanding the context of the project.
Design imagery and other graphical elements to visually (re-)present PolyPhy.
Work with the technical writer (see below) in designing a coherent story.
Work with the web developer (see above) in creating a coherent and organic presentation.

Write PolyPhy’s technical story and content

Topics: Writing Documentation Storytelling
Skills: experienced writing structured text over 10 pages, well read, (technical or scientific education)
Difficulty: Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Oskar Elek

Integral to PolyPhy’s presentation is a story that the users and the project contributors can relate to. The objective is to develop the verbal part of that story, as well as major portions of technical documentation that matches it. The difficulty of the project is scalable.

Specific tasks:

Work with mentors on understanding the context of the project.
Write different pages of the project website.
Work with mentors to improve project’s written community practices (diversity, communication).
Write and edit narrative and explanatory parts of PolyPhy’s documentation.
Work with the visual designer (see above) in designing a coherent story.

Video tutorials and presentation for PolyPhy

Topics: Video Presentation Tutorials Didactics
Skills: video editing, creating educational content, communication, (native or fluent in another language)
Difficulty: Easy-Moderate
Size: Medium or Large (175 or 350 hours)
Mentors: Oskar Elek, Drew Ehrlich

Create a public face for PolyPhy that reflects its history, context, and teaches its functionality to users in different degrees of familiarity.

Specific tasks:

Work with mentors on understanding the context and history of the project.
Interview diverse project contributors.
Create a video documenting PolyPhy’s history, with roots in astronomy, complex systems, fractals.
Create a set of tutorial videos for starting and intermediate PolyPhy users.
Create an accessible template for future tutorials.

Implement heterogeneous data I/O ops

Topics: I/O Operations File Conversion Numerics Testing
Skills: Python, experience working with scientific or statistical data, good debugging skills
Difficulty: Moderate-Challenging
Size: Medium or Large (175 or 350 hours)
Mentors: Oskar Elek, Anisha Goel

By default, PolyPhy operates with an unordered set of points as an input and scalar fields (float ndarrays) as an output, but others are applicable as well. Design and implement interfaces to load and export different data formats (CSV, OBJ, HDF5, FITS…) and modalities (points, meshes, density fields). The difficulty of the project can be scaled based on contributor’s interest.

Specific tasks:

Research which modalities are used by members of the target communities.
Implement modular loaders for the inputs and an interface to PolyPhy core.
Implement exporters for simulation datasets and visualization captures.
Write testing code for the above.
Integrate external packages as necessary.

Setup CI/CD for PolyPhy

Topics: Continuous Integration Continuous Deployment DevOps
Skills: experience with CI/CD, GitHub, Python package deployment
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Oskar Elek, Anisha Goel

The objective is to setup a CI/CD pipeline that automates the build testing and deployment of the software. The resulting process needs to be robust to contributor errors and work in the distributed conditions of a diverse contributor base.

Specific tasks:

Automate continuous building, testing, merging and deployment for PolyPhy in GitHub.
Publish the CI/CD metrics and build assets to the project webpage.
Work with other contributors in educating them about the best practices of using the developed CI/CD pipeline.
Add support for automated packaging using common management systems (pip, Anaconda).

Refine PolyPhy’s UI and develop new functional elements

Topics: UI/UX Visual Experience
Skills: Python programming, UI/UX development experience, (knowledge of graphics)
Difficulty: Moderate
Size: Large (350 hours)
Mentors: Oskar Elek, David Abramov

The key feature of PolyPhy is its interactivity. By interacting with the underlying simulation model, the user can adjust its parameters in real time and respond to its behavior. For instance, an astrophysics expert can load a dataset of 100k galaxies and reconstruct the large-scale structure of the intergalactic medium. A responsive UI combined with real-time visualization allows them to judge the fidelity of the reconstruction and make necessary changes.

Specific tasks:

Implement a platform-agnostic UI to house PolyPhy’s main rendering context as well as secondary analytics.
Work with the visualization developer (see below) to integrate the rendering functionality.
Optimize to UI’s performance.
Test the implementation on different OS platforms.

Create new data visualization regimes

Topics: Interactive Visualization Data Analytics 3D Rendering
Skills: basic graphics theory and math, Python, GPU programming, (previous experience visualizing novel datasets)
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Oskar Elek, David Abramov

Data visualization is one of the core components of PolyPhy, as it provides a real-time overview of the underlying MCPM simulation. Through the feedback provided by the visualization, PolyPhy users can adjust the simulation model and make new findings about the dataset. Various operations over the reconstructed data (e.g. spatial searching) as well as important statistical summaries also benefit from clear visual presentation.

Specific tasks:

Develop novel ways of visualizing scientific data in PolyPhy.
Work with diverse data modalities - point clouds, graphs, scalar and vector fields.
Add support for visualizing metadata, such as annotations and labels.
Create UI elements for plotting statistical summaries computed in real-time.

Discrete graph extraction from simulated scalar fields

Topics: Graph Theory Data Science
Skills: good understanding of discrete math and graph theory, Python, (GPU programming)
Difficulty: Challenging
Size: Large (350 hours)
Mentors: Oskar Elek, Farhanul Hasan

Develop a custom method for graph extraction from scalar field data produced by PolyPhy. Because PolyPhy typically produces network-like structures, representing these structures as weighted discrete graphs is very useful for efficiently navigating the data. The most important property of this abstracted representation is that it preserves the topology of the base scalar field by navigating the 1D ridges of the scalar field.

Specific tasks:

Become familiar with different algorithms for graph growing and skeleton extraction.
Implement the most suitable method in PolyPhy, interpreting the source scalar field as a throughput (transport) network. The weights of the resulting graph need to reflect the source throughputs between the respective node locations.
Implement common graph operations, e.g. hierarchical clustering and reduction, shortest path between two nodes, range queries.
Optimize the runtime of the implemented methods.
Work with the visualization developer (see above) to visualize the resulting graphs.

Proactive Data Containers (PDC)

Mon, 07 Nov 2022 10:15:56 -0700

Python interface to an object-centric data management system

Topics: Python, object-centric data management, PDC
Skills: Python, C, PDC
Difficulty: Medium
Size: Large (350 hours)
Mentor: Suren Byna, Houjun Tang

Proactive Data Containers (PDC) is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. Currently PDC has a C interface. Providing a python interface would make it easier for more Python applications to utilize it.

Skyhook Data Management

Mon, 07 Nov 2022 10:15:56 -0700

SkyhookDM

The Skyhook Data Management project extends object storage with data management functionality for tabular data. SkyhookDM enables storing and query tabular data in the Ceph distributed object storage system. It thereby turns Ceph into an Apache Arrow-native storage system, utilizing the Arrow Dataset API to store and query data with server-side data processing, including selection and projection that can significantly reduce the data returned to the client.

SkyhookDM is now part of Apache Arrow (see blog post).

Support reading from Skyhook in Dask/Ray using the Arrow Dataset API

Topics: Arrow, Dask/Ray
Skills: C++
Size: 175 hours
Difficulty: Medium

Mentor: Jayjeet Chakraboorty

Problem: Dask and Ray are parallel-computing frameworks similar to Apache Spark but in a Python ecosystem. Each of these frameworks support reading tabular data from different data sources such as a local filesystem, cloud object stores, etc. These systems have recently added support for the Arrow Dataset API to read data from different sources. Since, the Arrow dataset API supports Skyhook, we can leverage this capability to offload compute-heavy Parquet file decoding and decompression into the Ceph storage layer. This can help us speed up the queries significantly as CPU will get freed up in the Dask/Ray workers for other processing tasks.

Implement Gandiva based query executor in SkyhookDM

Topics: Arrow, Gandiva, SIMD
Skills: C++
Size: 350 hours
Difficulty: Hard

Mentor: Jayjeet Chakraboorty

Problem: Gandiva allows efficient evaluation of query expressions using runtime code generation using LLVM. The generated code leverages SIMD instructions and is highly optimized for parallel processing in modern CPUs. It is natively supported by Arrow for compiling and executing expressions. SkyhookDM currently uses the Arrow Dataset API (which internally uses Arrow Compute APIs) to execute query expressions inside the Ceph OSDs. Since, the Arrow Dataset API particularly does not support Gandiva currently, the goal of this project is to add support for Gandiva in the Arrow Dataset API in order to accelerate query processing when offloaded to the storage layer. This will help Skyhook combat some of the peformance issues due to the inefficient serialization interface of Arrow.

References:

Add Ability to create and save views from Datasets

Topics: Arrow, Database views, virtual datasets
Skills: C++
Size: 175 hours
Difficulty: Medium

Mentor: Jayjeet Chakraboorty

Problem - Workloads may repeat the same or similar queries over time. This causes repetition of IO and compute operations, wasting resources. Saving previous computation in the form of materialized views can provide benefit for future workload processing. Solution - Add a method to the Dataset API to create views from queries and save the view as an object in a separate pool with some object key that can be generated from the query that created it.

Reference: https://docs.dremio.com/working-with-datasets/virtual-datasets.html

Integrating Delta Lake on top of SkyhookDM

Topics: data lakes, lake house, distributed query processing
Skills: C++
NSize: 175 or 350 hours
Difficulty: Medium

Mentor: Jayjeet Chakraboorty

Delta Lake is a new architecture for querying big data lakes through Spark, providing transactions. An important benefit of this integration will be to provide an SQL interface for SkyhookDM functionality, through Spark SQL. This project will further build upon our current work connecting Spark to SkyhookDM through the Arrow Dataset API. This would allow us to run some of the TPC-DS queries (popular set of SQL queries for benchmarking databases) on SkyhookDM easily.

Reference: [Delta Lake paper] (https://databricks.com/jp/wp-content/uploads/2020/08/p975-armbrust.pdf)

Efficient Communication with Key/Value Storage Devices

Sun, 27 Feb 2022 00:00:00 +0000

Network key value stores are used throughout the cloud as a storage backends (eg AWS ShardStore) and are showing up in devices (eg NVMe KV SSD). The KV clients use traditional network sockets and POSIX APIs to communicate with the KV store. An advancement that has occurred in the last 2 years is a new kernel interface that can be used in lieu of the POSIX API, namely io_uring. This new interface uses a set of shared memory queues to provide for kernel-to-user communication and permits zero copy transfer of data. This scheme avoids the overhead of system calls and can improve performance.

Implement `io_uring` communication backend

Topics: performance, I/O, network, key-value, storage
Difficulty: Medium
Size: Medium or large (120 or 150 hours)
Mentors: Philip Kufeldt (Seagate), Aldrin Montana (UC Santa Cruz) Contributor(s): Manank Patel

Seagate has been using a network-based KV HDD as a research vehicle for computational storage. This research vehicle uses open-source user library that implements a KV API by sending network protobuf-based RPCs to a network KV store. Currently it is implemented with the standard socket and POSIX APIs to communicate with the KV backend. This project would implement an io_uring communication backend and compare the results of both implementations.

DirtViz 2.0 (2023)

Mon, 07 Feb 2022 00:00:00 +0000

DirtViz is a project to visualize data collected from sensors deployed in sensor networks. We have deployed a number of sensors measuring qualities like soil moisture, temperature, current and voltage in outdoor settings. This project involves extending our existing visualization stack, DirtViz 1.0 (see github), and expanding it to version 2.0. The project goal is to create a fully-fledged dataviz tool tailored to the types of data collected from embedded systems sensor networks.

Visualize Sensor Data

Topics: Data Visualization, Analytics
Skills: javascript, python, bash, webservers, git, embedded systems
Difficulty: Easy/Moderate
Size: Large, 350 hours
Mentors: Colleen Josephson, Sonia Naderi, Stephen Taylor, John Madden

Specific tasks:

Refine our web-based visualization tools to easily allow users to zoom in on date ranges, change axes, etc.
Create a system for remote collaborators/citizen scientists to upload their own data in a secure manner
Craft an intuitive navigation system so that data from deployment sites around the world can be easily viewed
Document the tool thoroughly for future maintenance
If interested, we are also open to you investigating correlations between different data streams and doing self-directed data analysis