<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>osre26 | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/osre26/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/osre26/index.xml" rel="self" type="application/rss+xml"/><description>osre26</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Thu, 05 Feb 2026 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>osre26</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/osre26/</link></image><item><title>NETAI: AI-Powered Network Anomaly Detection and Diagnostics Platform</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/netai/</link><pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/netai/</guid><description>&lt;p>NETAI (Network AI) is an AI-powered network anomaly detection and diagnostics platform for the National Research Platform (NRP). This project combines Kubernetes-native LLM integration, network performance monitoring, and predictive analytics to create an intelligent assistant for network operators. Students will work with cutting-edge technologies including Large Language Models (LLMs), Kubernetes, perfSONAR network measurements, time-series analysis, and containerized AI/ML workloads, while contributing to real-world applications in network operations and diagnostics.&lt;/p>
&lt;p>The project involves developing a &lt;strong>Kubernetes chatbot&lt;/strong> that leverages NRP&amp;rsquo;s managed LLM service (providing access to models like Qwen3-VL, GLM-4.7, and GPT-OSS) to help network operators understand complex network behaviors, diagnose anomalies, and receive natural language explanations of network issues. Students will integrate perfSONAR measurement data with traceroute path analysis to create an interactive network topology visualization, and develop &lt;strong>AI/ML models&lt;/strong> for predictive network performance analysis using NRP&amp;rsquo;s GPU resources.&lt;/p>
&lt;p>In addition, students will gain hands-on experience with &lt;strong>fine-tuning LLMs&lt;/strong> on historical network diagnostics data, developing &lt;strong>time-series forecasting models&lt;/strong> for network metrics, and implementing &lt;strong>anomaly detection&lt;/strong> using deep learning techniques. The entire AI/ML pipeline will be containerized and deployed as Kubernetes workloads, utilizing GPU-enabled pods for model training and inference, ensuring scalability and seamless integration with existing NRP infrastructure.&lt;/p>
&lt;p>The platform builds upon existing network diagnostics capabilities, combining end-to-end throughput measurements with detailed traceroute data to enable operators to visualize network paths, identify performance bottlenecks, and understand relationships between metrics and underlying infrastructure. The AI enhancement will provide predictive capabilities, automated incident reporting, and intelligent recommendations for network remediation strategies.&lt;/p>
&lt;h3 id="netai--llm-integration--kubernetes-chatbot">NETAI / LLM Integration &amp;amp; Kubernetes Chatbot&lt;/h3>
&lt;p>The proposed work includes developing a &lt;strong>Kubernetes-native chatbot&lt;/strong> that integrates with NRP&amp;rsquo;s managed LLM service to provide intelligent network diagnostics assistance. Students will create a conversational interface that can answer questions about network performance, explain anomalies in natural language, and suggest remediation strategies. They will fine-tune LLMs on historical network diagnostics data, test results, and traceroute information to create domain-specific assistants. Students will implement &lt;strong>RESTful APIs&lt;/strong> for chatbot interactions, develop &lt;strong>prompt engineering&lt;/strong> strategies for network diagnostics, and create &lt;strong>context-aware responses&lt;/strong> that incorporate real-time network telemetry. The chatbot will be deployed as Kubernetes services, utilizing GPU pods for inference and integrating with the existing diagnostics platform.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Large Language Models, Kubernetes, Chatbots, Natural Language Processing, Network Diagnostics, API Development&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Kubernetes, LLM APIs (Qwen3-VL, GLM-4.7, GPT-OSS), Prompt Engineering, REST APIs, Docker, GPU Computing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
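&lt;p>As a minimal sketch of the chatbot&amp;rsquo;s LLM integration, assuming the managed service exposes an OpenAI-compatible chat-completions endpoint (the URL and model name below are placeholders, not the real NRP values), a context-aware prompt could be assembled like this:&lt;/p>

```python
import json

# Hypothetical endpoint and model name -- the real NRP LLM service
# publishes its own URL and model list in the NRP documentation.
API_URL = "https://llm.example.nrp.ai/v1/chat/completions"
MODEL = "gpt-oss"

def build_messages(telemetry: dict, question: str) -> list:
    """Pack recent telemetry into the system prompt so answers are context-aware."""
    context = json.dumps(telemetry, indent=2)
    system = (
        "You are a network diagnostics assistant for NRP operators. "
        "Use the telemetry below when explaining anomalies.\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

def to_request_body(messages: list) -> str:
    """Serialize an OpenAI-style chat-completions request body."""
    return json.dumps({"model": MODEL, "messages": messages, "temperature": 0.2})
```

&lt;p>The resulting body would be POSTed to the endpoint with the operator&amp;rsquo;s API token; consult the NRP LLM service documentation for the actual URL and available models.&lt;/p>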
&lt;h3 id="netai--network-anomaly-detection-models">NETAI / Network Anomaly Detection Models&lt;/h3>
&lt;p>The proposed work includes developing &lt;strong>deep learning models&lt;/strong> for network anomaly detection using historical perfSONAR and traceroute data. Students will create models that can identify slow links, high packet loss, excessive retransmits, and failed network tests automatically. They will implement &lt;strong>anomaly detection algorithms&lt;/strong> using techniques such as autoencoders, LSTM networks, and transformer architectures. Students will train models on NRP&amp;rsquo;s GPU clusters using historical network telemetry stored in SQLite databases, develop &lt;strong>feature engineering&lt;/strong> pipelines for network metrics, and create &lt;strong>real-time inference services&lt;/strong> deployed as Kubernetes workloads. The models will be integrated into the diagnostics platform to provide automated anomaly detection alongside the interactive visualization.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Deep Learning, Anomaly Detection, Time-Series Analysis, Network Monitoring, Model Training, GPU Computing&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch/TensorFlow, scikit-learn, Pandas, NumPy, SQLite, Kubernetes, GPU Pods, MLOps&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
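&lt;p>Before training autoencoders or LSTMs, a simple statistical baseline is useful for sanity-checking features and labeling candidate anomalies. One possible z-score detector over a throughput series (illustrative only, not part of the existing platform):&lt;/p>

```python
from statistics import mean, stdev

def zscore_anomalies(samples, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the series mean -- a baseline to compare deep models against."""
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]
```

&lt;p>Deep detectors trained on the historical perfSONAR telemetry should beat this baseline; if they do not, the features or labels deserve another look.&lt;/p>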
&lt;h3 id="netai--predictive-analytics--forecasting">NETAI / Predictive Analytics &amp;amp; Forecasting&lt;/h3>
&lt;p>The proposed work includes developing &lt;strong>predictive models&lt;/strong> that can forecast network performance degradation and identify patterns in network anomalies before they impact users. Students will create &lt;strong>time-series forecasting models&lt;/strong> for network metrics such as throughput, latency, and packet loss, using techniques like ARIMA, Prophet, and deep learning-based forecasting. They will implement &lt;strong>few-shot learning approaches&lt;/strong> to adapt models to new network topologies and measurement patterns, develop &lt;strong>early warning systems&lt;/strong> for potential network issues, and create &lt;strong>automated incident report generation&lt;/strong> using LLMs. Students will leverage NRP&amp;rsquo;s GPU resources for training forecasting models and deploy them as Kubernetes services for real-time predictions integrated with the diagnostics dashboard.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Time-Series Forecasting, Predictive Analytics, Machine Learning, Network Performance, Early Warning Systems, LLM Integration&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch/TensorFlow, Prophet, ARIMA, Pandas, NumPy, Time-Series Analysis, Kubernetes, GPU Computing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
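&lt;p>As a baseline against which ARIMA, Prophet, and deep forecasters can be compared, simple exponential smoothing is a reasonable first model. A minimal sketch:&lt;/p>

```python
def ses_forecast(series, alpha=0.5, horizon=3):
    """Simple exponential smoothing: fold the series into a smoothed level,
    then emit a flat forecast for `horizon` future steps."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return [level] * horizon
```

&lt;p>An early-warning system would then alert when observed metrics diverge persistently from the forecast band, whatever model produces it.&lt;/p>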
&lt;h3 id="netai--kubernetes-deployment--infrastructure">NETAI / Kubernetes Deployment &amp;amp; Infrastructure&lt;/h3>
&lt;p>The proposed work includes setting up &lt;strong>Kubernetes-based infrastructure&lt;/strong> for deploying the entire NETAI platform, including LLM services, ML models, and the diagnostics dashboard. Students will create &lt;strong>Helm charts&lt;/strong> for deploying containerized AI/ML workloads, configure &lt;strong>GPU-enabled pods&lt;/strong> for model training and inference, and implement &lt;strong>persistent storage&lt;/strong> solutions for maintaining historical network telemetry. They will develop &lt;strong>GitLab CI/CD pipelines&lt;/strong> for automated testing and deployment, set up &lt;strong>monitoring and observability&lt;/strong> using Prometheus and Grafana for tracking model performance and resource usage, and create &lt;strong>scalable deployment strategies&lt;/strong> that leverage NRP&amp;rsquo;s distributed computing resources. Students will also integrate the platform with existing perfSONAR infrastructure and ensure seamless operation within the NRP cluster.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Kubernetes, DevOps, CI/CD, GPU Computing, Container Orchestration, Infrastructure as Code, Monitoring&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Kubernetes, Helm, GitLab CI/CD, Prometheus, Grafana, Docker, GPU Pods, Persistent Storage, Infrastructure Automation&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/dmitry-mishin/">Dmitry Mishin&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/derek-weitzel/">Derek Weitzel&lt;/a>&lt;/li>
&lt;/ul>
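&lt;p>To illustrate the GPU-enabled pods mentioned above, a minimal pod spec might request an NVIDIA GPU via the standard &lt;code>nvidia.com/gpu&lt;/code> extended resource (the image, namespace, and claim names below are hypothetical):&lt;/p>

```yaml
# Sketch only: names are placeholders, not the real NETAI manifests.
apiVersion: v1
kind: Pod
metadata:
  name: netai-train
  namespace: netai
spec:
  containers:
    - name: trainer
      image: gitlab-registry.nrp-nautilus.io/netai/trainer:latest
      resources:
        limits:
          nvidia.com/gpu: 1
          memory: 16Gi
          cpu: "4"
      volumeMounts:
        - name: telemetry
          mountPath: /data
  volumes:
    - name: telemetry
      persistentVolumeClaim:
        claimName: netai-telemetry
```

&lt;p>In practice such specs would be templated through the project&amp;rsquo;s Helm charts rather than applied by hand.&lt;/p>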
&lt;h2 id="project-resources">Project Resources&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>National Research Platform&lt;/strong>: &lt;a href="https://nrp.ai/" target="_blank" rel="noopener">https://nrp.ai/&lt;/a>&lt;/li>
&lt;li>&lt;strong>NRP LLM Service&lt;/strong>: &lt;a href="https://nrp.ai/documentation/userdocs/ai/llm-managed/" target="_blank" rel="noopener">https://nrp.ai/documentation/userdocs/ai/llm-managed/&lt;/a>&lt;/li>
&lt;li>&lt;strong>perfSONAR&lt;/strong>: &lt;a href="https://www.perfsonar.net/" target="_blank" rel="noopener">https://www.perfsonar.net/&lt;/a>&lt;/li>
&lt;li>&lt;strong>MaDDash&lt;/strong>: &lt;a href="https://github.com/esnet/maddash" target="_blank" rel="noopener">https://github.com/esnet/maddash&lt;/a>&lt;/li>
&lt;li>&lt;strong>Network Monitoring Documentation&lt;/strong>: &lt;a href="https://nrp.ai/documentation/" target="_blank" rel="noopener">https://nrp.ai/documentation/&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>This project addresses critical gaps in network performance monitoring for the National Research Platform by integrating AI/ML capabilities with existing perfSONAR-based diagnostics. The platform combines end-to-end network measurements with detailed path-level analysis, enhanced by intelligent AI assistants that can help operators understand complex network behaviors and predict potential issues. By leveraging NRP&amp;rsquo;s managed LLM service and GPU resources, students will create a Kubernetes-native system that scales across the distributed research network infrastructure, providing both real-time diagnostics and predictive analytics to improve network reliability and performance for researchers nationwide.&lt;/p></description></item><item><title>VINE: Precision Agriculture Data Platform &amp; Digital Twin</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/vine/</link><pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsd/vine/</guid><description>&lt;p>VINE (Vineyard Intelligence Network &amp;amp; Environment) is an AI/ML research project focused on precision agriculture using the &lt;strong>National Research Platform (NRP)&lt;/strong>. This project leverages the innovative demonstration at Iron Horse Vineyards to study how AI and machine learning can optimize agricultural practices through data-driven insights. Students will work with cutting-edge AI/ML technologies, distributed computing on NRP, and large-scale data analysis, while contributing to real-world applications in sustainable agriculture and climate adaptation.&lt;/p>
&lt;p>The project involves &lt;strong>AI/ML research&lt;/strong> using agricultural data from Iron Horse Vineyards, leveraging the computational resources of the &lt;strong>National Research Platform&lt;/strong> for training and deploying machine learning models. Students will work with agricultural datasets including sensor data, multi-spectral drone imagery, and historical records, developing models for predictive analytics, computer vision, and time-series forecasting. The integration of &lt;strong>NRP&amp;rsquo;s distributed infrastructure&lt;/strong> enables scalable AI research across these large and heterogeneous datasets.&lt;/p>
&lt;p>Students will gain hands-on experience with &lt;strong>AI/ML model development&lt;/strong> for agricultural applications, learning how to analyze multi-spectral drone imagery, process time-series sensor data, and build predictive models for irrigation scheduling, pest detection, and harvest timing. They will deploy and train models on &lt;strong>NRP&amp;rsquo;s Kubernetes clusters&lt;/strong>, utilize &lt;strong>GPU resources&lt;/strong> for deep learning workloads, and work with agricultural datasets for comprehensive research. The project emphasizes using &lt;strong>distributed computing&lt;/strong> on NRP to scale AI/ML experiments and create open, shareable datasets for collaborative research.&lt;/p>
&lt;p>The platform builds upon the success demonstrated at Iron Horse Vineyards, where AI-driven analytics have shown potential for &lt;strong>10% water use reduction&lt;/strong> and improved yield optimization. This project aims to advance AI/ML research in precision agriculture by utilizing NRP&amp;rsquo;s computational capabilities, creating reproducible research that can benefit the broader agricultural and research communities.&lt;/p>
&lt;h3 id="vine--data-pipeline--integration">VINE / Data Pipeline &amp;amp; Integration&lt;/h3>
&lt;p>The proposed work includes building &lt;strong>data pipelines&lt;/strong> to ingest, process, and prepare agricultural data from Iron Horse Vineyards and other sources for AI/ML research. Students will develop pipelines to collect sensor data (soil moisture, temperature, CO2, weather), multi-spectral drone imagery, and historical agricultural records. They will create &lt;strong>data validation and quality assurance&lt;/strong> processes, implement &lt;strong>data preprocessing&lt;/strong> for ML model training, and develop &lt;strong>data integration&lt;/strong> workflows that connect agricultural datasets with NRP computational resources. Students will also work on &lt;strong>data sharing&lt;/strong> mechanisms to make processed datasets available for the research community.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data Engineering, Time-Series Data, Data Preprocessing, Data Sharing, ML Data Pipelines&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Pandas, NumPy, Data Validation, REST APIs, Docker, Kubernetes, Data Processing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
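&lt;p>A small illustration of the validation step, with assumed field names and sensor ranges (the real schema would come from the Iron Horse deployment):&lt;/p>

```python
# Assumed fields and ranges -- replace with the actual sensor schema.
EXPECTED_RANGES = {
    "soil_moisture": (0.0, 100.0),   # percent
    "temperature_c": (-20.0, 60.0),
}

def validate_record(record):
    """Return a list of issues for one sensor reading; empty means clean."""
    issues = []
    for field, (lo, hi) in EXPECTED_RANGES.items():
        value = record.get(field)
        if value is None:
            issues.append(f"missing {field}")
        elif value > hi or lo > value:
            issues.append(f"{field} out of range: {value}")
    return issues
```

&lt;p>Records that fail validation would be quarantined rather than dropped, so quality problems remain visible downstream.&lt;/p>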
&lt;h3 id="vine--aiml-models-for-agricultural-analytics-on-nrp">VINE / AI/ML Models for Agricultural Analytics on NRP&lt;/h3>
&lt;p>The proposed work includes developing and training &lt;strong>machine learning models&lt;/strong> for agricultural applications using the &lt;strong>National Research Platform (NRP)&lt;/strong>. Students will create models for &lt;strong>predictive irrigation scheduling&lt;/strong> based on soil moisture, weather forecasts, and historical data. They will develop &lt;strong>computer vision models&lt;/strong> for analyzing multi-spectral drone imagery to detect plant health, identify pests, and estimate yield. Students will also work on &lt;strong>time-series forecasting&lt;/strong> models for predicting harvest timing and optimizing resource allocation. The project will involve training models on &lt;strong>NRP&amp;rsquo;s GPU clusters&lt;/strong>, utilizing distributed training capabilities, and deploying models for real-time inference. Students will leverage agricultural datasets for training and validation, and contribute model outputs and insights for the research community.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Machine Learning, Computer Vision, Time-Series Analysis, Predictive Analytics, Agricultural AI, Distributed Training&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, PyTorch/TensorFlow, scikit-learn, OpenCV, Pandas, NumPy, MLOps, NRP Kubernetes, GPU Computing&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
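&lt;p>One concrete building block for the multi-spectral imagery work is the NDVI vegetation index, computed per pixel from the near-infrared and red bands. A minimal sketch over flattened band arrays:&lt;/p>

```python
def ndvi(nir, red, eps=1e-9):
    """Per-pixel Normalized Difference Vegetation Index, (NIR - RED) / (NIR + RED);
    values near 1 indicate dense, healthy vegetation. `eps` avoids division by zero."""
    return [(n - r) / (n + r + eps) for n, r in zip(nir, red)]
```

&lt;p>In practice this would run vectorized (NumPy or a GPU tensor library) over full drone orthomosaics, with NDVI maps feeding the plant-health and yield models.&lt;/p>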
&lt;h3 id="vine--digital-twin--ai-driven-visualization">VINE / Digital Twin &amp;amp; AI-Driven Visualization&lt;/h3>
&lt;p>The proposed work includes creating &lt;strong>AI-enhanced digital twin&lt;/strong> systems for agricultural sites using computational resources on NRP. Students will develop &lt;strong>3D visualization&lt;/strong> systems (potentially using Omniverse or similar platforms) to represent vineyards and farms, integrate &lt;strong>AI model predictions&lt;/strong> into the digital twin for real-time insights, and create &lt;strong>interactive dashboards&lt;/strong> for monitoring and analysis. They will implement &lt;strong>spatial data processing&lt;/strong> using ML models to map sensor locations and readings to geographic coordinates, and develop &lt;strong>AI-driven simulation capabilities&lt;/strong> for testing different agricultural strategies (irrigation patterns, planting layouts, etc.) before implementation. Students will deploy visualization services on &lt;strong>NRP infrastructure&lt;/strong> and integrate with agricultural data sources for real-time updates.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Digital Twin, AI-Enhanced Visualization, GIS, Spatial Data, ML-Driven Simulation, Real-Time Systems&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, 3D Graphics (Omniverse/Unity/Blender), GIS tools, WebGL, React/Three.js, ML Integration, NRP Deployment&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="vine--web-dashboard--nrp-integration-platform">VINE / Web Dashboard &amp;amp; NRP Integration Platform&lt;/h3>
&lt;p>The proposed work includes building a &lt;strong>comprehensive web dashboard&lt;/strong> for visualizing agricultural data, AI model predictions, and research insights. Students will develop a &lt;strong>full-stack web application&lt;/strong> using modern frameworks (React, Flask/FastAPI) deployed on the &lt;strong>National Research Platform (NRP)&lt;/strong>. The dashboard will display real-time sensor readings, historical trends from agricultural datasets, AI model predictions, and digital twin visualizations. Students will create &lt;strong>API endpoints&lt;/strong> that integrate with &lt;strong>NRP computational resources&lt;/strong> and agricultural data sources, implement &lt;strong>role-based access control&lt;/strong> for researchers, and enable &lt;strong>data export/sharing&lt;/strong> with the broader research community. The platform will support &lt;strong>interactive data exploration&lt;/strong> tools and provide programmatic access to AI/ML models running on NRP.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Full-Stack Web Development, Data Visualization, API Development, NRP Deployment, ML Model Serving&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> React, Flask/FastAPI, PostgreSQL, D3.js/Plotly, Bootstrap/Tailwind CSS, REST APIs, Kubernetes, NRP APIs&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium to Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/mohammad-firas-sada/">Mohammad Firas Sada&lt;/a>&lt;/li>
&lt;/ul>
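&lt;p>One practical concern for such a dashboard is payload size when plotting long sensor histories. A sketch of a server-side downsampling helper (the endpoint name is hypothetical) that a Flask/FastAPI route could call:&lt;/p>

```python
def downsample(points, max_points=500):
    """Thin a time series for plotting: keep every k-th point so the JSON
    payload stays bounded. Would back a hypothetical GET /api/readings route."""
    if len(points) > max_points:
        step = -(-len(points) // max_points)   # ceiling division
        points = points[::step]
    return points
```

&lt;p>Stride-based thinning is the simplest choice; min/max bucketing preserves spikes better and could be swapped in behind the same interface.&lt;/p>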
&lt;h2 id="project-resources">Project Resources&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>National Research Platform&lt;/strong>: &lt;a href="https://nrp.ai/" target="_blank" rel="noopener">https://nrp.ai/&lt;/a>&lt;/li>
&lt;li>&lt;strong>Iron Horse Vineyards Project&lt;/strong>: &lt;a href="https://gitlab.nrp-nautilus.io/ihv" target="_blank" rel="noopener">https://gitlab.nrp-nautilus.io/ihv&lt;/a>&lt;/li>
&lt;li>&lt;strong>Omniverse Integration&lt;/strong>: &lt;a href="https://gitlab.nrp-nautilus.io/omniverse" target="_blank" rel="noopener">https://gitlab.nrp-nautilus.io/omniverse&lt;/a>&lt;/li>
&lt;li>&lt;strong>CENIC Network&lt;/strong>: &lt;a href="https://cenic.org/" target="_blank" rel="noopener">https://cenic.org/&lt;/a>&lt;/li>
&lt;li>&lt;strong>CENIC Precision Agriculture Blog&lt;/strong>: &lt;a href="https://nrp.ai/cenic-precision-agriculture-2025" target="_blank" rel="noopener">https://nrp.ai/cenic-precision-agriculture-2025&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>This project builds upon the successful demonstration at Iron Horse Vineyards, where CENIC, UC San Diego, and partners have created a living laboratory for precision agriculture. The VINE project focuses on &lt;strong>AI/ML research&lt;/strong> using the &lt;strong>National Research Platform (NRP)&lt;/strong> for computational resources. By leveraging NRP&amp;rsquo;s distributed infrastructure and GPU clusters, students can train and deploy sophisticated ML models for agricultural applications. The project works with agricultural datasets from Iron Horse Vineyards and aims to create open, shareable datasets for the research community. This approach creates a scalable, reproducible framework for AI/ML research in precision agriculture that can benefit researchers, educators, and practitioners nationwide.&lt;/p></description></item><item><title>Reconfigurable and Placement-Aware Replication for Edge Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/umass/edge-replication/</link><pubDate>Sat, 31 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/umass/edge-replication/</guid><description>&lt;h2 id="project-description">Project Description&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Distributed systems&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Rust, Java, Go, Python, Bash scripting, Linux, Docker.&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Hard&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="mailto:fikurnia@cs.umass.edu">Fadhil I. Kurnia&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Modern replicated systems are typically evaluated under static configurations with fixed replica placement. However, real-world edge deployments are highly dynamic: workloads shift geographically, edge nodes join or fail, and latency conditions change over time. Our existing testbed provides reproducible evaluation for replicated systems but lacks support for dynamic reconfiguration and adaptive edge placement policies.&lt;/p>
&lt;p>This project extends the existing open testbed to support:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Dynamic Replica Reconfiguration&lt;/p>
&lt;ul>
&lt;li>Membership changes (add/remove replicas)&lt;/li>
&lt;li>Leader migration and shard movement&lt;/li>
&lt;li>Online reconfiguration cost measurement (latency spikes, recovery overhead, state transfer cost)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Edge-Aware Placement Policies&lt;/p>
&lt;ul>
&lt;li>Demand-aware placement based on geographic workload skew&lt;/li>
&lt;li>Latency-aware and bandwidth-aware replica selection&lt;/li>
&lt;li>Comparison of static vs. adaptive placement strategies&lt;/li>
&lt;li>Evaluation under real-world latency matrices (e.g., US metro-level or cloud region traces)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>What-if Simulation Framework&lt;/p>
&lt;ul>
&lt;li>Replay workload traces with time-varying demand&lt;/li>
&lt;li>Simulate hundreds of edge sites with realistic network conditions&lt;/li>
&lt;li>Quantify trade-offs between consistency, availability, reconfiguration overhead, and cost&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
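&lt;p>To make the placement-policy idea concrete, a greedy latency-minimizing heuristic (one of the plugin strategies such a framework could ship; sketched here, not taken from the testbed) might look like:&lt;/p>

```python
def greedy_placement(latency, k):
    """Greedy facility-location-style heuristic: repeatedly add the candidate
    site that most reduces total client-to-nearest-replica latency.
    `latency[site][client]` is an RTT matrix over candidate sites and clients."""
    chosen = []
    sites = list(range(len(latency)))
    clients = range(len(latency[0]))

    def cost(placement):
        # Each client is served by its nearest chosen replica.
        return sum(min(latency[s][c] for s in placement) for c in clients)

    for _ in range(k):
        best = min((s for s in sites if s not in chosen),
                   key=lambda s: cost(chosen + [s]))
        chosen.append(best)
    return chosen
```

&lt;p>A plugin interface would let this greedy policy, k-means, and cost-aware variants be swapped against each other under identical traces.&lt;/p>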
&lt;p>The outcome will be an &lt;a href="https://distrobench.org" target="_blank" rel="noopener">open-source framework&lt;/a> that enables researchers to evaluate not only steady-state replication performance but also how systems behave under churn, scaling events, and demand shifts, all of which are central challenges in real edge environments.&lt;/p>
&lt;h3 id="expected-deliverables">Expected Deliverables&lt;/h3>
&lt;ul>
&lt;li>Reconfiguration abstraction layer (API for membership &amp;amp; placement changes)&lt;/li>
&lt;li>Placement policy plugin framework (k-means, facility-location heuristics, latency-minimizing, cost-aware)&lt;/li>
&lt;li>Trace-driven dynamic workload engine&lt;/li>
&lt;li>Public benchmark scenarios and reproducible experiment scripts&lt;/li>
&lt;li>Artifact-ready documentation and evaluation report&lt;/li>
&lt;/ul></description></item><item><title>AI Data Readiness Inspector (AIDRIN)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/aidrin/</link><pubDate>Fri, 30 Jan 2026 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/aidrin/</guid><description>&lt;p>Garbage In, Garbage Out (GIGO) is a widely accepted quote in computer science across various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest considerable time and effort in preparing the data for AI.&lt;/p>
&lt;p>&lt;a href="https://arxiv.org/pdf/2406.19256" target="_blank" rel="noopener">AIDRIN&lt;/a> (AI Data Readiness INspector) is a framework that provides a quantifiable assessment of data readiness for AI processes, covering a broad range of dimensions from the literature. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, to evaluate data. Furthermore, AIDRIN uses metrics specific to assessing AI data, such as feature importance, feature correlations, class imbalance, fairness, privacy, and compliance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles. AIDRIN provides visualizations and reports to assist data scientists in further investigating data readiness.&lt;/p>
&lt;h3 id="aidrin-multiple-file-formats">AIDRIN Multiple File Formats&lt;/h3>
&lt;p>The proposed work will include improvements to the AIDRIN framework to (1) add support for new file formats such as Zarr, ROOT, and HDF5; and (2) allow users to provide custom data ingestion mechanisms.&lt;/p>
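&lt;p>A sketch of what a custom-ingestion hook could look like, using a simple extension-based reader registry; the decorator API here is hypothetical, not AIDRIN&amp;rsquo;s current interface:&lt;/p>

```python
# Hypothetical plug-in registry: maps a file extension to a loader function.
READERS = {}

def register_reader(extension):
    """Decorator registering a loader for one file extension."""
    def wrap(fn):
        READERS[extension] = fn
        return fn
    return wrap

@register_reader(".csv")
def read_csv(path):
    import csv
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def load(path):
    """Dispatch to the registered reader; new formats (Zarr, ROOT, HDF5)
    would register their own loaders the same way."""
    for ext, reader in READERS.items():
        if path.endswith(ext):
            return reader(path)
    raise ValueError(f"no reader registered for {path}")
```

&lt;p>The point of the pattern is that format support becomes additive: a Zarr or HDF5 reader plugs in without touching the assessment metrics.&lt;/p>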
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>data readiness&lt;/code>, &lt;code>AI&lt;/code>, &lt;code>data analysis&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, C/C++, data analysis, good communicator&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/suren-byna/">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Drishti</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/drishti/</link><pubDate>Fri, 30 Jan 2026 10:15:00 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/lbl/drishti/</guid><description>&lt;p>&lt;a href="https://github.com/hpc-io/drishti" target="_blank" rel="noopener">Drishti&lt;/a> is a novel interactive web-based analysis framework to visualize I/O traces, highlight bottlenecks, and help understand the I/O behavior of scientific applications. Drishti aims to fill the gap between the trace collection, analysis, and tuning phases. The framework contains an interactive I/O trace analysis component that lets end-users visually inspect their applications&amp;rsquo; I/O behavior, focus on areas of interest, and get a clear picture of common root causes of I/O performance bottlenecks. Building on this automatic detection, the framework maps common, well-known bottlenecks to solution recommendations that users can implement.&lt;/p>
&lt;h3 id="drishti-comparisons-and-heatmaps">Drishti Comparisons and Heatmaps&lt;/h3>
&lt;p>The proposed work will include investigating and building a solution to allow comparing and finding differences between two I/O trace files (similar to a &lt;code>diff&lt;/code>), covering the analysis and visualization components. It will also explore additional metrics and counters such as Darshan heatmaps in the analysis and visualization components of the framework.&lt;/p>
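&lt;p>At its simplest, the comparison could operate on aggregated counters from each trace, reporting only those that changed. An illustrative sketch (the counter names are hypothetical, not Darshan&amp;rsquo;s actual field names):&lt;/p>

```python
def diff_counters(trace_a, trace_b):
    """Report per-counter differences between two I/O trace summaries,
    analogous to running `diff` over aggregated counters."""
    keys = sorted(set(trace_a) | set(trace_b))
    changes = {}
    for key in keys:
        a = trace_a.get(key, 0)
        b = trace_b.get(key, 0)
        if a != b:
            changes[key] = {"a": a, "b": b, "delta": b - a}
    return changes
```

&lt;p>A real implementation would also align traces in time so that heatmap-style per-interval comparisons are possible, not just end-of-run totals.&lt;/p>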
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O&lt;/code>, &lt;code>HPC&lt;/code>, &lt;code>data analysis&lt;/code>, &lt;code>visualization&lt;/code>, &lt;code>profiling&lt;/code>, &lt;code>tracing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, data analysis, performance profiling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jean-luca-bez/">Jean Luca Bez&lt;/a> and &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>EnergyAPI: An End-to-End API for Energy-Aware Forecasting and Scheduling</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/energy-api/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/energy-api/</guid><description>&lt;p>Over the past decades, electricity demand has increased steadily, driven by structural shifts such as the electrification of transportation and, more recently, the rapid expansion of artificial intelligence (AI). Power grids have responded by expanding generation capacity, integrating renewable energy sources such as solar and wind, and deploying demand-response mechanisms. However, the current pace of demand growth is increasingly outstripping grid expansion, leading to integration delays, greater reliance on behind-the-meter consumption, and rising operational complexity.&lt;/p>
&lt;p>To mitigate the environmental and socioeconomic impacts of electricity consumption, large consumers such as cloud data centers and electric vehicle (EV) charging infrastructures are increasingly participating in demand-response programs. These programs provide consumers with grid signals indicating favorable periods for electricity usage, such as when energy is cheapest or has the lowest carbon intensity. Consumers can then shift workloads across time and location to better align with grid conditions and their own operational constraints. A key challenge, however, is the online nature of this problem: operators must make real-time decisions without full knowledge of future grid conditions. While forecasting and optimization techniques exist, their effectiveness depends heavily on workload characteristics, such as whether tasks are delay-tolerant cloud jobs or EV charging sessions with route and deadline constraints.&lt;/p>
&lt;p>This project proposes the design and implementation of a modular, extensible API for energy-aware workload scheduling. The API will ingest grid signals alongside workload Service Level Objectives (SLOs) and operational requirements, and produce execution plans that adapt to changing grid conditions. It will support multiple pluggable scheduling strategies and heuristics, enabling developers to compare real-time and forecast-based approaches across different workload classes. By providing a reusable, open-source interface for demand-response-aware scheduling, this project aims to lower the barrier for developers to integrate energy-aware decision-making into distributed systems and applications.&lt;/p>
&lt;h3 id="building-an-end-to-end-service-for-energy-forecasting-and-scheduling">Building an End-to-End Service for Energy Forecasting and Scheduling&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Databases&lt;/code> &lt;code>Machine Learning&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, command line tools (bash), SQL (MySQL or SQLite), FastAPI, time-series analysis, basic machine learning&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/abel-souza/">Abel Souza&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a containerized, end-to-end platform consisting of a backend, API, and web-based frontend for collecting, estimating, and visualizing real-time and forecasted electrical grid signals. These signals include electricity demand, prices, energy production, grid saturation, and carbon intensity. The system will support scalable data ingestion, region-specific forecasting models, and interactive visualizations to enable energy-aware application development and analysis.&lt;/p>
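&lt;p>As a minimal sketch of the relational data model and ingestion steps (the table and column names here are assumptions, not a prescribed schema), a single SQLite table keyed by region, signal, timestamp, and kind can hold observed and forecasted values side by side:&lt;/p>
&lt;pre>&lt;code class="language-python">import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE grid_signals ("
    " region TEXT, signal TEXT, ts TEXT, kind TEXT, value REAL,"
    " PRIMARY KEY (region, signal, ts, kind))"
)
# Hypothetical carbon-intensity readings for one region (gCO2/kWh).
rows = [
    ("DE", "carbon_intensity", "2026-01-30T00:00", "observed", 310.0),
    ("DE", "carbon_intensity", "2026-01-30T01:00", "observed", 295.5),
    ("DE", "carbon_intensity", "2026-01-30T02:00", "forecast", 280.0),
]
con.executemany("INSERT INTO grid_signals VALUES (?, ?, ?, ?, ?)", rows)

# Latest observed value, as a REST endpoint might serve it.
latest = con.execute(
    "SELECT ts, value FROM grid_signals"
    " WHERE region = ? AND signal = ? AND kind = 'observed'"
    " ORDER BY ts DESC LIMIT 1",
    ("DE", "carbon_intensity"),
).fetchone()
print(latest)
&lt;/code>&lt;/pre>
&lt;p>Keeping &lt;code>kind&lt;/code> in the key lets forecasts be stored and later compared against the observations that arrive for the same timestamps.&lt;/p>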
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Study electrical grid signals and demand-response data sources (e.g., demand, price, carbon intensity, grid saturation) and identify their requirements for real-time and forecast-based consumption planning.&lt;/li>
&lt;li>Design and implement a relational data model for storing historical, real-time, and forecasted grid signals.&lt;/li>
&lt;li>Ingest and validate grid signal data into a MySQL or SQLite database, ensuring data quality and time alignment across regions.&lt;/li>
&lt;li>Implement baseline time-series forecasting models for grid signals (e.g., demand, price, or carbon intensity), with support for region-specific configurations.&lt;/li>
&lt;li>Query the European Network of Transmission System Operators for Electricity (ENTSO-E) and U.S. Energy Information Administration (EIA) APIs to collect grid data.&lt;/li>
&lt;li>Develop a RESTful API that exposes both raw and forecasted grid signals for use by energy-aware applications and schedulers.&lt;/li>
&lt;li>Build a web-based user interface to visualize historical trends, forecasts, and regional differences in grid conditions.&lt;/li>
&lt;li>Implement an interactive choropleth map to display spatial variations in grid signals such as carbon intensity and electricity prices.&lt;/li>
&lt;li>Design an extensible architecture that allows different regions to plug in custom forecasting models or heuristics.&lt;/li>
&lt;li>Containerize the backend, API, and frontend components using Docker to enable reproducible deployment and easy integration by external users.&lt;/li>
&lt;/ul></description></item><item><title>Environmental NeTworked Sensor (ENTS)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/</guid><description>&lt;h3 id="ents-i-usability-improvements-for-visualization-dashboard">ENTS I: Usability improvements for visualization dashboard&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Data Visualization Dashboard" srcset="
/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp 400w,
/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_baae6484e015277af7b09e866b6869f5.webp 760w,
/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/osp1_huda3c1d46887767e16b865c47973b8288_360491_2d797937cbe25a879de96b44cb5c65b3.webp"
width="760"
height="759"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Data Visualization, Backend, Frontend, UI/UX, Analytics&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> React, JavaScript, Python, SQL, Git&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> Flask, Docker, CI/CD, AWS, Authentication&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:alevy1@ucsc.edu">Alec Levy&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Environmental NeTworked Sensor (ENTS) platform, formerly the Open Sensing Platform (OSP), implements a data visualization website for monitoring microbial fuel cell sensors (see &lt;a href="https://github.com/jlab-sensing/ENTS-backend" target="_blank" rel="noopener">GitHub&lt;/a>). The mission is to scale up the current platform so that other researchers and citizen scientists can integrate their novel sensing hardware or microbial fuel cell sensors for monitoring and data analysis. Examples of the sensor types currently deployed include sensors measuring soil moisture, temperature, current, and voltage in outdoor settings. The software half of the project focuses on building upon our existing visualization web platform and adding features to support the mission. A live version of the website is available &lt;a href="https://dirtviz.jlab.ucsc.edu/" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Below is a list of project ideas that would benefit the ENTS project. You are not limited to the following projects, and we encourage new ideas that enhance the platform:&lt;/p>
&lt;ul>
&lt;li>Drag and drop charts functionality&lt;/li>
&lt;li>Creation of unique charts by users (with unique equations)&lt;/li>
&lt;li>Customizable chart options (color, line width, data point/line style, axis labels)&lt;/li>
&lt;li>Exportable charts (with customizable options)&lt;/li>
&lt;li>Saving layouts via URL&lt;/li>
&lt;/ul>
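&lt;p>For the user-defined chart equations idea, one possible approach (a sketch, not the existing ENTS backend) is to whitelist a small arithmetic subset of Python's AST and evaluate the user's expression pointwise over the selected sensor columns:&lt;/p>
&lt;pre>&lt;code class="language-python">import ast

# Only plain arithmetic over named columns is permitted.
ALLOWED = (ast.Expression, ast.BinOp, ast.Name, ast.Constant,
           ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Load)

def derived_series(columns, equation):
    """Pointwise-evaluate a user chart equation over sensor data columns."""
    tree = ast.parse(equation, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("unsupported syntax: " + type(node).__name__)
    code = compile(tree, "chart_equation", "eval")
    length = min(len(values) for values in columns.values())
    out = []
    for i in range(length):
        row = {name: values[i] for name, values in columns.items()}
        out.append(eval(code, {"__builtins__": {}}, row))
    return out

# Hypothetical column names: derive power (mW) from voltage and current.
data = {"voltage": [0.42, 0.45, 0.44], "current": [0.010, 0.012, 0.011]}
print(derived_series(data, "voltage * current * 1000"))
&lt;/code>&lt;/pre>
&lt;p>Rejecting any node outside the whitelist blocks function calls and attribute access, so arbitrary user input cannot escape the arithmetic sandbox.&lt;/p>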
&lt;h3 id="ents-ii-migration-to-tockos">ENTS II: Migration to TockOS&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="ENTS in the wild" srcset="
/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_c34f23edec4789d86dcf04482fa38282.webp 400w,
/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_8a4ed9b7cf50d0c7493779c714094459.webp 760w,
/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/ents/flower_bed_hua65f08ca6bedf0f2d60c653056e1b3a7_800588_c34f23edec4789d86dcf04482fa38282.webp"
width="760"
height="369"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> Embedded system, operating system&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;em>Required:&lt;/em> Rust, C/C++, Git, GitHub&lt;/li>
&lt;li>&lt;em>Nice to have:&lt;/em> STM32 HAL, Python&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>, &lt;a href="mailto:jtmadden@ucsc.edu">John Madden&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The current firmware is implemented bare-metal using STM hardware
abstraction layer (HAL) drivers. We are interested in porting the firmware to
an operating system (OS) to enable additional functionality for environmental
data logging. &lt;a href="https://tockos.org/" target="_blank" rel="noopener">TockOS&lt;/a>, the OS that will be used, is an embedded
operating system designed for running multiple concurrent, mutually
distrustful applications on low-memory and low-power microcontrollers. TockOS
supports over-the-air (OTA) updates, dynamic app loading, hardware
multiplexing, and more. We envision multiple users sharing ENTS hardware that
provides communication and measurement capabilities, reducing the initial
cost of deploying wireless sensor networks.&lt;/p>
&lt;p>The TockOS kernel is written in &lt;a href="https://rust-lang.org/" target="_blank" rel="noopener">Rust&lt;/a> to enhance
security. Userspace apps can be written in either C, C++, or Rust. Development
will be done through a remote development server to access the hardware. See
the following repos for the current status of the project:&lt;/p>
&lt;ul>
&lt;li>Userspace library: &lt;a href="https://github.com/jlab-sensing/libtock-c" target="_blank" rel="noopener">libtock-c&lt;/a>&lt;/li>
&lt;li>Kernel: &lt;a href="https://github.com/jlab-sensing/tock" target="_blank" rel="noopener">tock&lt;/a>&lt;/li>
&lt;li>Baremetal: &lt;a href="https://github.com/jlab-sensing/ENTS-node-firmware" target="_blank" rel="noopener">ENTS-node-firmware&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scope of work:&lt;/p>
&lt;ul>
&lt;li>Writing kernel peripheral drivers.
&lt;ul>
&lt;li>Done entirely in Rust.&lt;/li>
&lt;li>Requires a low-level understanding of the microcontroller.&lt;/li>
&lt;li>Requires basic knowledge of kernel functionality.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Porting baremetal components to userland apps.
&lt;ul>
&lt;li>Involves porting STM HAL calls to TockOS syscalls.&lt;/li>
&lt;li>Primarily done in C.&lt;/li>
&lt;li>Requires an understanding of syscalls.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Reproducible CXL Emulation</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucmerced/cxl_emu/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucmerced/cxl_emu/</guid><description>&lt;p>Compute Express Link (CXL) is an emerging memory interconnect standard that enables shared, coherent memory across CPUs, accelerators, and multiple hosts, unlocking new possibilities in hyperscale, HPC, and disaggregated systems. However, because access to real multi-host CXL hardware is limited, it is difficult for researchers and students to experiment with, evaluate, and reproduce results on advanced CXL topologies.
OCEAN (Open-source CXL Emulation At Hyperscale; see &lt;a href="https://github.com/cxl-emu/OCEAN" target="_blank" rel="noopener">GitHub&lt;/a>) is a full-stack CXL emulation platform built on QEMU that enables detailed emulation of CXL 3.0 memory systems, including multi-host shared memory pools, coherent fabric topologies, and latency modeling. This project will create reproducible experiment pipelines, automated deployment workflows, and user-friendly tutorials so that others can reliably run and extend CXL emulation experiments without requiring specialized hardware.&lt;/p>
&lt;h3 id="reproducible-cxl-emulation-for-multi-host-memory-systems">Reproducible CXL Emulation for Multi-Host Memory Systems&lt;/h3>
&lt;p>Streamline multi-host CXL emulation without specialized hardware.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>CXL emulation&lt;/code> &lt;code>Memory Systems&lt;/code> &lt;code>Reproducibility&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Virtualization (QEMU), Scripting, Performance Modeling&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrafi@ucmerced.edu">Mujahid Al Rafi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Create automated deployment scripts and configuration templates for OCEAN-based CXL emulation topologies (single-host and multi-host).&lt;/li>
&lt;li>Develop a standardized experiment harness for running memory performance benchmarks (e.g., OSU micro-benchmarks, STREAM-style tests) in emulated CXL environments.&lt;/li>
&lt;li>Build reproducible experiment pipelines that others can run to evaluate latency, bandwidth, and scaling properties of CXL memory systems.&lt;/li>
&lt;li>Produce tutorials, documentation, and reproducibility artifacts to guide new users through setup, execution, and analysis.&lt;/li>
&lt;li>Package and contribute all scripts, configurations, and documentation back to the OCEAN open-source repository.&lt;/li>
&lt;/ul>
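&lt;p>To make the deployment-script task concrete, upstream QEMU already exposes generic CXL device options; a single-host topology script might look like the following (illustrative only, based on QEMU's documented CXL flags rather than OCEAN's own configuration, which may differ; paths, sizes, and the guest image are placeholders):&lt;/p>
&lt;pre>&lt;code class="language-shell">#!/bin/sh
# Illustrative single-host CXL topology using upstream QEMU CXL options:
# one CXL host bridge, one root port, one Type 3 memory device.
qemu-system-x86_64 \
  -M q35,cxl=on -m 4G -smp 4 \
  -drive file=guest.img,format=raw \
  -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxl-mem0.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/cxl-lsa0.raw,size=256M \
  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
  -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
&lt;/code>&lt;/pre>
&lt;p>A multi-host OCEAN topology would add shared memory backends and additional guests; the deployment scripts in this project would template and validate such invocations so experiments stay reproducible.&lt;/p>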
&lt;h3 id="exploring-security-and-isolation-in-cxl-based-memory-systems">Exploring Security and Isolation in CXL-Based Memory Systems&lt;/h3>
&lt;p>Investigate security and isolation properties of CXL-based memory systems using software emulation.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>CXL Systems&lt;/code> &lt;code>Security&lt;/code> &lt;code>Memory Isolation&lt;/code> &lt;code>Side Channel&lt;/code> &lt;code>Emulation&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Virtualization (QEMU), Scripting, Computer Architecture, Security&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrafi@ucmerced.edu">Mujahid Al Rafi&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luanzheng-lenny-guo/">Luanzheng &amp;quot;Lenny&amp;quot; Guo&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Tasks:&lt;/p>
&lt;ul>
&lt;li>Study the CXL memory model and fabric architecture to identify potential security and isolation risks in multi-host shared memory environments (e.g., contention, timing variation, and resource interference).&lt;/li>
&lt;li>Set up multi-host or multi-VM CXL emulation environments using OCEAN that mimic realistic multi-tenant deployments.&lt;/li>
&lt;li>Design and implement reproducible micro-benchmarks to measure timing, bandwidth contention, or observable interference through shared CXL memory pools.&lt;/li>
&lt;li>Analyze how fabric configuration choices (e.g., topology, latency injection, memory partitioning, or allocation policies) affect isolation and leakage behavior.&lt;/li>
&lt;li>Explore and prototype mitigation strategies—such as memory partitioning, throttling, or policy-driven allocation—and evaluate their effectiveness using the emulation platform.&lt;/li>
&lt;/ul></description></item><item><title>Omni-ST: Instruction-Driven Any-to-Any Multimodal Modeling for Spatial Transcriptomics</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci-ics/omni-st/</link><pubDate>Thu, 29 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci-ics/omni-st/</guid><description>&lt;h2 id="project-description">Project description&lt;/h2>
&lt;p>Spatial transcriptomics (ST) integrates spatially resolved gene expression with tissue morphology, enabling the study of cellular organization, tissue architecture, and disease microenvironments. Modern ST datasets are inherently multimodal, combining histology images (H&amp;amp;E / IF), gene expression vectors, spatial graphs, cell annotations, and free-text pathology descriptions.&lt;/p>
&lt;p>However, most existing ST methods are task-specific and modality-siloed: separate models are trained for image-to-gene prediction, spatial domain identification, cell type classification, or text-based interpretation. This fragmentation limits cross-task generalization and scalability.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Omni-ST overview" srcset="
/project/osre26/uci-ics/omni-st/omni-st-overview_hu23ddd3d57afcbc47e213a42520991f5c_1307894_4023f7915e2a557bcacee3aecd015061.webp 400w,
/project/osre26/uci-ics/omni-st/omni-st-overview_hu23ddd3d57afcbc47e213a42520991f5c_1307894_8d4e33b30dc811f95fb70a843df58532.webp 760w,
/project/osre26/uci-ics/omni-st/omni-st-overview_hu23ddd3d57afcbc47e213a42520991f5c_1307894_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci-ics/omni-st/omni-st-overview_hu23ddd3d57afcbc47e213a42520991f5c_1307894_4023f7915e2a557bcacee3aecd015061.webp"
width="760"
height="664"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;strong>Omni-ST&lt;/strong> proposes a single &lt;strong>instruction-driven any-to-any multimodal backbone&lt;/strong> that treats each spatial transcriptomics modality as a “language” and formulates all tasks as:&lt;/p>
&lt;p>&lt;strong>Instruction + Input Modality → Output Modality&lt;/strong>&lt;/p>
&lt;p>Natural language is elevated from auxiliary metadata to a &lt;strong>unifying interface&lt;/strong> that specifies task intent, target modality, and biological context. This paradigm enables flexible, interpretable, and extensible spatial reasoning within a single model.&lt;/p>
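&lt;p>The any-to-any formulation can be sketched in plain Python as a registry of modality adapters behind one dispatch function; everything here (adapter names, modalities, and outputs) is a hypothetical stand-in for the learned backbone:&lt;/p>
&lt;pre>&lt;code class="language-python">ADAPTERS = {}

def adapter(src, dst):
    """Register a function as the src-to-dst modality adapter."""
    def register(fn):
        ADAPTERS[(src, dst)] = fn
        return fn
    return register

@adapter("image", "gene_expression")
def image_to_genes(patch):
    # Stand-in for an image encoder plus expression decoder.
    return {"CD3E": 0.8, "EPCAM": 0.1}

@adapter("gene_expression", "cell_type")
def genes_to_cell_type(profile):
    # Stand-in for expression-based cell-type classification.
    return max(profile, key=profile.get)

def run(instruction, src, dst, payload):
    # In Omni-ST the instruction would condition a shared backbone;
    # here it only documents intent while the registry routes the call.
    fn = ADAPTERS[(src, dst)]
    return fn(payload)

profile = run("Predict expression for this histology patch",
              "image", "gene_expression", object())
print(run("Classify the dominant cell type",
          "gene_expression", "cell_type", profile))
&lt;/code>&lt;/pre>
&lt;p>In the actual model, a single instruction-conditioned backbone would replace the hand-written adapters, but the interface stays the same: instruction plus input modality in, target modality out.&lt;/p>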
&lt;hr>
&lt;h3 id="project-idea-instruction-driven-any-to-any-modeling-for-spatial-transcriptomics">Project Idea: Instruction-Driven Any-to-Any Modeling for Spatial Transcriptomics&lt;/h3>
&lt;p>&lt;strong>Topics:&lt;/strong> spatial transcriptomics, multimodal learning, instruction tuning, computational pathology&lt;br>
&lt;strong>Skills:&lt;/strong> PyTorch, deep learning, Transformers, multimodal representation learning&lt;br>
&lt;strong>Difficulty:&lt;/strong> Hard&lt;br>
&lt;strong>Size:&lt;/strong> 350 hours&lt;/p>
&lt;p>&lt;strong>Mentor:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Xi Li&lt;/strong> — &lt;a href="mailto:xil43@uci.edu">xil43@uci.edu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Essential information:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Design a unified multimodal backbone with lightweight modality adapters for histology images, gene expression vectors, spatial graphs, and text.&lt;/li>
&lt;li>Use natural language instructions to condition model behavior, enabling any-to-any translation without task-specific heads.&lt;/li>
&lt;li>Support core tasks including image → gene expression prediction, gene expression → cell type / spatial domain identification, region → text-based biological explanation, and text-based spatial retrieval.&lt;/li>
&lt;li>Evaluate the model across multiple spatial transcriptomics tasks within a single framework, emphasizing generalization and interpretability.&lt;/li>
&lt;li>Develop visualization and interpretation tools such as spatial maps and language-grounded explanations.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Expected deliverables:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>An open-source PyTorch implementation of the Omni-ST framework.&lt;/li>
&lt;li>Unified multitask benchmarks for spatial transcriptomics.&lt;/li>
&lt;li>Visualization and interpretation tools for spatial predictions.&lt;/li>
&lt;li>Documentation and tutorials demonstrating how to add new tasks via instructions.&lt;/li>
&lt;/ul></description></item><item><title>StatWrap</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/northwestern/statwrap/</link><pubDate>Thu, 29 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/northwestern/statwrap/</guid><description>&lt;p>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap&lt;/a> is a free and open-source assistive, non-invasive discovery and inventory tool to document research projects. It inventories project assets (e.g., code files, data files, manuscripts, documentation) and organizes information without additional input from the user. It also provides structure for users to add searchable and filterable notes connected to files to help communicate metadata about intent and analysis steps.&lt;/p>
&lt;p>At its core, StatWrap helps investigators identify and track changes in a research project as it evolves - which may affect reproducibility. For example: (1) people on the project can change over time, so processes may not be consistently executed due to transitions in employment; (2) data changes over time, due to accruing additional cases, adding new variables, or correcting mistakes in existing data; (3) software (e.g. used for data preparation and statistical analysis) evolves as it is edited, improved, and optimized; and (4) software can break or produce different results due to changes &amp;lsquo;under the hood&amp;rsquo; such as updates to statistical packages, compilers, or interpreters. StatWrap passively and actively documents these changes to support reproducibility.&lt;/p>
&lt;p>Additional information:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://sites.northwestern.edu/statwrap/" target="_blank" rel="noopener">StatWrap home&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/stattag/statwrap" target="_blank" rel="noopener">StatWrap code (GitHub)&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="group-and-individual-customizations">Group and Individual Customizations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>configuration&lt;/code>, &lt;code>user interface&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: JavaScript, React&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/luke-rasmussen/">Luke Rasmussen&lt;/a>, &lt;a href="mailto:ewhitley@northwestern.edu">Eric Whitley&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of this project is to expand the existing capabilities of StatWrap to provide more flexibility to individual users and groups. Currently, features within StatWrap such as the directory template for creating new projects and the reproducibility checklist are static, meaning everyone who downloads StatWrap has the same configuration. However, each user and team works differently and should be able to configure StatWrap to support their needs.&lt;/p>
&lt;p>When a user creates a new project, StatWrap provides a collection of project templates. These create a directory hierarchy, along with some seed files (e.g., a README.md file in the project root). Different groups have their own conventions for creating project directories. While StatWrap can be released with additional project templates defined, there are many situations in which users would want to keep their project template local. StatWrap should allow a user to create a project template configuration, either from scratch or seeded from the contents of an existing project. A user should then be able to export this configuration and share it with others, and other users should be able to import the configuration into their instance of StatWrap.&lt;/p>
&lt;p>Similarly, StatWrap provides a reproducibility checklist that includes six existing checklist items. However, individual users and groups may have their own checklists, including institution-specific steps. Similar to the project template, a user should be able to configure additional items for the checklist. A user should be able to create a &amp;ldquo;checklist template&amp;rdquo; that can be used and applied in multiple projects. A specific project&amp;rsquo;s template should also be modifiable once the checklist has been created.&lt;/p>
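&lt;p>As a sketch of what a shareable template might look like (the JSON fields here are assumptions, not StatWrap's actual configuration format), a project template could be a small document that the application materializes on disk when a new project is created:&lt;/p>
&lt;pre>&lt;code class="language-python">import json, tempfile
from pathlib import Path

# Hypothetical project-template configuration a user could export and share.
TEMPLATE = json.loads("""
{
  "name": "Lab default",
  "directories": ["code", "data/raw", "data/derived", "manuscript"],
  "seed_files": {"README.md": "# New Project"}
}
""")

def create_project(root, template):
    """Materialize a template's directories and seed files under root."""
    root = Path(root)
    for rel in template["directories"]:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel, content in template["seed_files"].items():
        (root / rel).write_text(content)

with tempfile.TemporaryDirectory() as tmp:
    create_project(tmp, TEMPLATE)
    print(sorted(p.name for p in Path(tmp).iterdir()))
&lt;/code>&lt;/pre>
&lt;p>A checklist template could follow the same pattern: a list of question objects in a shareable document, imported into a project and modifiable afterwards.&lt;/p>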
&lt;p>The specific tasks of the project include:&lt;/p>
&lt;ul>
&lt;li>Developing a configuration scheme for New Project templates&lt;/li>
&lt;li>Provide a way for a user to import/export a template for New Projects&lt;/li>
&lt;li>Develop a configuration scheme for Reproducibility Checklist questions&lt;/li>
&lt;li>Provide a way for a user to import/export a template for the Reproducibility Checklist&lt;/li>
&lt;li>Develop a configuration scheme for asset (file) attributes&lt;/li>
&lt;li>Develop unit tests and conduct system testing&lt;/li>
&lt;/ul></description></item><item><title>Network Simulation Bridge • Enabling Interactive Network Models</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/nsb-network-models/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/nsb-network-models/</guid><description>&lt;p>The Network Simulation Bridge &amp;ndash; &lt;a href="https://github.com/nsb-ucsc/nsb" target="_blank" rel="noopener">NSB&lt;/a> &amp;ndash; is a network co-simulation framework that bridges applications and network simulators, enabling students, researchers, and developers to prototype their applications and systems on simulated networks. It consists of a message server and client endpoint interfaces which together form a bridge, routing application message payloads through the network simulator. NSB is designed to be extensible through modular interfaces that allow users to contribute new features and modules for evolving and emerging use cases. NSB is application-, network simulator-, and platform-agnostic, so users and developers can integrate any application front-end with any network simulator back-end, providing versatility and flexibility when NSB is used alongside other tools in larger systems and applications.&lt;/p>
&lt;p>NSB was created in-house by the &lt;a href="https://inrg.engineering.ucsc.edu/" target="_blank" rel="noopener">Inter-Networking Research Group&lt;/a> and is now being developed into a more full-featured open-source tool and ecosystem in partnership with the &lt;a href="https://ucsc-ospo.github.io/" target="_blank" rel="noopener">UCSC OSPO&lt;/a> and as part of the &lt;a href="https://www.nsf.gov/funding/opportunities/pose-pathways-enable-open-source-ecosystems" target="_blank" rel="noopener">NSF Pathways to Enable Open-Source Ecosystems&lt;/a> program. In this transition to a more polished and feature-rich product, the next phase of NSB development will involve the engineering of new quality-of-life features, testing and iteration of the core tool itself, and user-centric refinement via implementation in interdisciplinary system models.&lt;/p>
&lt;h3 id="develop-a-user-centric-website-for-nsb">Develop a User-Centric Website for NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>Dynamic Updates&lt;/code> &lt;code>UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> web development experience, good communication, (HTML/CSS), (JavaScript)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a clean and welcoming landing page and website for the project. Its organization needs to reflect the needs of both users and potential project contributors. This website will be the first impression for people new to the project, and should make it easy for them to understand what NSB does and how to get started.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project and the expected needs of the users.&lt;/li>
&lt;li>Port relevant documentation and tutorials from the &lt;a href="https://github.com/nsb-ucsc/nsb" target="_blank" rel="noopener">repository page&lt;/a>, ensuring updates in the repository are reflected in the website.&lt;/li>
&lt;li>Study existing open source product websites and draw insights to include in our own design.&lt;/li>
&lt;li>Design the structure of the website according to best OS, visual design, and accessibility design practices.&lt;/li>
&lt;li>Include visual content that showcases NSB integration and testimonials (if applicable).&lt;/li>
&lt;/ul>
&lt;h3 id="improve-the-user-experience-of-nsb">Improve the User Experience of NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Software Engineering&lt;/code> &lt;code>User-Centric Development&lt;/code> &lt;code>Visualization&lt;/code> &lt;code>UI/UX&lt;/code> &lt;code>Documentation&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> package management, toolchain implementation, process automation, technical writing, (visualization), (bash), (Python), (C++)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Our goal has always been to keep NSB streamlined and out of the way of the users and developers. In line with that, we want our tool to be easily available and installable, and we want the experience of using it to feel minimal and non-intrusive while providing sufficient observability of NSB&amp;rsquo;s internals for those who want it.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors and potential users to identify aspects of the user experience that can be refined for a better quality-of-life experience.&lt;/li>
&lt;li>Verify and iterate on existing software packaging methods for NSB to ensure that tool setup is stress-free.&lt;/li>
&lt;li>Refine and update existing documentation and tutorials to reflect improvements in the setup, installation, and usage processes.&lt;/li>
&lt;li>Work with mentors and other contributors to design the user interface by working backwards from what the user wants to see.&lt;/li>
&lt;li>Work with other contributors (see below) to develop a &lt;em>Network-in-a-Box&lt;/em> experience with NSB.&lt;/li>
&lt;/ul>
&lt;h3 id="create-a-network-in-a-box-experience-with-nsb">Create a &lt;em>Network-in-a-Box&lt;/em> Experience with NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Software Engineering&lt;/code>, &lt;code>Simulation&lt;/code>, &lt;code>System Modeling&lt;/code>, &lt;code>System Design&lt;/code>, &lt;code>Visualization&lt;/code>, &lt;code>UI/UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software integration and interfacing, toolchain implementation, process automation, C++, (visualization), (LLM-enabled code generation), (technical writing)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>NSB was originally designed for networking graduate students to interface with application-layer programs. But since then, there&amp;rsquo;s been more of an appetite for a simpler &lt;em>network-in-a-box&lt;/em> approach that would allow users to quickly deploy baseline or generated network simulations that are ready for use with NSB.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Learn how to use one of the major open-source network simulators (&lt;a href="https://www.nsnam.org/" target="_blank" rel="noopener">ns3&lt;/a> or &lt;a href="https://omnetpp.org/" target="_blank" rel="noopener">OMNeT++&lt;/a>).&lt;/li>
&lt;li>Work with mentors in designing a simpler, minimal user experience of operating NSB.&lt;/li>
&lt;li>Develop tools to automatically create network simulations given input parameters (type of network, number of nodes, description of infrastructure).&lt;/li>
&lt;li>Create documentation aimed at new users.&lt;/li>
&lt;li>Implement or embed network visualizations to enrich the user experience.&lt;/li>
&lt;/ul>
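&lt;p>The automatic scenario-creation task above could start from something as simple as mapping input parameters to a topology description that a later stage translates into ns3 or OMNeT++ configuration. The sketch below is a hypothetical illustration in Python; the function and field names are assumptions, not part of NSB.&lt;/p>

```python
# Hypothetical sketch: turn input parameters (type of network, number of
# nodes) into a neutral node/link description. A later stage could render
# this into an ns3 script or an OMNeT++ NED file. Not NSB's actual API.
def generate_topology(kind: str, n_nodes: int) -> dict:
    """Return a node/link description for a few baseline network shapes."""
    nodes = [f"node{i}" for i in range(n_nodes)]
    if kind == "star":
        # node0 acts as the hub; every other node links to it
        links = [("node0", n) for n in nodes[1:]]
    elif kind == "ring":
        # each node links to its successor, wrapping around at the end
        links = [(nodes[i], nodes[(i + 1) % n_nodes]) for i in range(n_nodes)]
    else:
        raise ValueError(f"unknown topology kind: {kind}")
    return {"kind": kind, "nodes": nodes, "links": links}
```

&lt;p>Keeping the description simulator-agnostic is what would let the same generator target either ns3 or OMNeT++.&lt;/p>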
&lt;h3 id="implement-networked-system-models-to-evaluate-quality-of-nsb">Implement Networked System Models to Evaluate Quality of NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>System Modeling&lt;/code> &lt;code>Simulation&lt;/code> &lt;code>System Design&lt;/code> &lt;code>Software Development&lt;/code> &lt;code>Product Testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software integration, good communication, qualitative research, (proficiency in Python and/or C++), (processing scientific and technical literature)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>NSB is a relatively new tool and has not been extensively tested outside of the core contributors, who know a bit too much about the tool. We need to better understand what the external user and contributor experience will be like, and the best way to do that is to start developing with NSB to build models of connected systems, such as sensor networks, smart homes, and smart farms.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Survey academic literature and related work to identify distributed applications worth modeling.&lt;/li>
&lt;li>Work with mentors and collaborators to plan implementation of selected system models.&lt;/li>
&lt;li>Track and report issues and concerns in quality-of-life experiences, critical errors, or difficulties.&lt;/li>
&lt;li>Work with mentors and contributors to address issues and concerns.&lt;/li>
&lt;li>Refine and update existing documentation and tutorials to reflect improvements in the setup, installation, and usage processes.&lt;/li>
&lt;li>Work with other contributors (see below) in reviewing and cross-referencing model implementations.&lt;/li>
&lt;/ul>
&lt;h3 id="model-autonomous-vehicle-networks-to-drive-new-feature-development-in-nsb">Model Autonomous Vehicle Networks to Drive New Feature Development in NSB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>System Modeling&lt;/code> &lt;code>Simulation&lt;/code> &lt;code>System Design&lt;/code> &lt;code>Software Development&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> requirement-based software design, message parsing interfaces, server-client communication, (proficiency in Python and/or C++), (processing scientific and technical literature)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:hkuttive@ucsc.edu">Harikrishna Kuttivelil&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>NSB today serves its named purpose &amp;ndash; message relaying. However, modeling complex systems can sometimes involve synchronizing other simulation features, like &lt;em>mobility&lt;/em> when dealing with vehicle networks. Implementing a generic layer for synchronizing user-defined features across endpoints would be a powerful, enabling feature in NSB. In the process, we may also uncover opportunities for improving the NSB developer experience.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Research academic literature and relevant works to identify and design potential autonomous vehicle network models.&lt;/li>
&lt;li>Work with mentors and collaborators to iterate on system designs, ensuring they serve the purpose of furthering NSB development.&lt;/li>
&lt;li>Help mentors design and develop the &lt;em>new&lt;/em> feature-synchronization capability in NSB, driven by the autonomous vehicle system model.&lt;/li>
&lt;li>Develop and iterate feature synchronization, using mobility as the synchronized feature.&lt;/li>
&lt;li>Create documentation and tutorials to serve as resources for future users, contributors, and developers.&lt;/li>
&lt;li>Work with other contributors (see above) in reviewing and cross-referencing model implementations.&lt;/li>
&lt;/ul></description></item><item><title>Peersky Browser</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/peersky/</link><pubDate>Mon, 26 Jan 2026 12:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/peersky/</guid><description>&lt;p>&lt;a href="https://peersky.p2plabs.xyz/" target="_blank" rel="noopener">Peersky Browser&lt;/a> is an experimental personal gatekeeper to a new way of accessing web content. In a world where a handful of big companies control most of the internet, Peersky leverages distributed web technologies—&lt;a href="https://ipfs.tech/" target="_blank" rel="noopener">IPFS&lt;/a>, &lt;a href="https://holepunch.to/" target="_blank" rel="noopener">Hypercore&lt;/a>, and &lt;a href="https://www.bittorrent.com/" target="_blank" rel="noopener">BitTorrent&lt;/a>—to return control to the users. With integrated local P2P applications, Peersky offers a fresh, community-driven approach to browsing.&lt;/p>
&lt;h3 id="implement-p2p-extension-store">Implement P2P Extension Store&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Browser Extensions&lt;/code>, &lt;code>P2P&lt;/code>, &lt;code>Electron&lt;/code>, &lt;code>IPFS&lt;/code>, &lt;code>Hypercore&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> JavaScript, Electron.js, HTML/CSS, P2P&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/akhilesh-thite/">Akhilesh Thite&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build a decentralized extension distribution flow that archives WebExtensions into a predictable P2P-friendly layout and installs directly from P2P URLs.&lt;/p>
&lt;p>&lt;strong>Tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Define the P2P extension layout:&lt;/strong>
&lt;ul>
&lt;li>Standardize &lt;code>/extensions/{name}/{version}/extension.zip&lt;/code> and &lt;code>/extensions/{name}/index.json&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Design install compatibility for P2P URLs:&lt;/strong>
&lt;ul>
&lt;li>Support &lt;code>peersky://extensions/...&lt;/code> and P2P links from IPFS or Hypercore.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Archive Chrome Web Store extensions to P2P:&lt;/strong>
&lt;ul>
&lt;li>Use &lt;a href="https://github.com/akhileshthite/chrome-extension-fetch" target="_blank" rel="noopener">chrome-extension-fetch&lt;/a> to fetch CRX, convert to ZIP, and store it in the layout.&lt;/li>
&lt;li>Update &lt;code>index.json&lt;/code> with metadata like version, &lt;code>P2P_URL&lt;/code>, and &lt;code>fetchedAt&lt;/code>.&lt;/li>
&lt;li>Publish the folder to IPFS or Hypercore and feed the link into the install flow.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Add settings and trust model:&lt;/strong>
&lt;ul>
&lt;li>Add a “Load from P2P” settings toggle.&lt;/li>
&lt;li>Support curated extension hoards (&lt;code>index.json&lt;/code>) and automated updates.&lt;/li>
&lt;li>Clarify integrity assumptions and sandboxing expectations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
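&lt;p>The layout and &lt;code>index.json&lt;/code> update described above can be sketched as follows. This is an illustrative Python sketch (Peersky itself is JavaScript/Electron); the paths and the &lt;code>P2P_URL&lt;/code> and &lt;code>fetchedAt&lt;/code> fields come from the task list, while everything else is an assumption.&lt;/p>

```python
# Sketch of the proposed on-disk layout:
#   /extensions/{name}/{version}/extension.zip
#   /extensions/{name}/index.json
# Field names beyond P2P_URL and fetchedAt are illustrative assumptions.
import json
import zipfile
from datetime import datetime, timezone
from pathlib import Path

def archive_extension(root: Path, name: str, version: str,
                      files: dict) -> Path:
    """Write extension.zip into the layout and update index.json metadata."""
    ext_dir = root / "extensions" / name / version
    ext_dir.mkdir(parents=True, exist_ok=True)
    zip_path = ext_dir / "extension.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        for relpath, data in files.items():
            zf.writestr(relpath, data)
    # Update the per-extension index with version metadata.
    index_path = root / "extensions" / name / "index.json"
    index = (json.loads(index_path.read_text())
             if index_path.exists() else {"name": name, "versions": {}})
    index["versions"][version] = {
        # In the real flow this would be the published IPFS/Hypercore link.
        "P2P_URL": f"peersky://extensions/{name}/{version}/extension.zip",
        "fetchedAt": datetime.now(timezone.utc).isoformat(),
    }
    index_path.write_text(json.dumps(index, indent=2))
    return zip_path
```

&lt;p>In the actual pipeline, the zip would come from converting a fetched CRX, and the folder would then be published to IPFS or Hypercore.&lt;/p>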
&lt;p>&lt;strong>More details in the issue:&lt;/strong> &lt;a href="https://github.com/p2plabsxyz/peersky-browser/issues/42" target="_blank" rel="noopener">https://github.com/p2plabsxyz/peersky-browser/issues/42&lt;/a>&lt;/p>
&lt;h3 id="backup--restore-system-p2p-json--tabs-restore">Backup &amp;amp; Restore System (P2P JSON + Tabs Restore)&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>P2P&lt;/code>, &lt;code>Backup&lt;/code>, &lt;code>Session Restore&lt;/code>, &lt;code>Electron&lt;/code>, &lt;code>Onboarding&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> JavaScript, Electron.js, HTML/CSS, P2P&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/akhilesh-thite/">Akhilesh Thite&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Implement a backup and restore pipeline for Peersky’s P2P app data and session state, including an onboarding import flow for tabs from other browsers.&lt;/p>
&lt;p>&lt;strong>Tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Generate a P2P backup bundle:&lt;/strong>
&lt;ul>
&lt;li>Create a single &lt;code>.zip&lt;/code> that contains &lt;code>lastOpened.json&lt;/code>, &lt;code>tabs.json&lt;/code>, &lt;code>ensCache.json&lt;/code>, and the &lt;code>ipfs/&lt;/code> and &lt;code>hyper/&lt;/code> directories.&lt;/li>
&lt;li>Add an option to generate a CID for the backup zip for instant sharing.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Restore from settings:&lt;/strong>
&lt;ul>
&lt;li>Upload a P2P backup zip file.&lt;/li>
&lt;li>Load a backup from an IPFS or Hyper CID.&lt;/li>
&lt;li>Import Chrome/Firefox tab exports produced by a helper extension.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Define the helper extension export format:&lt;/strong>
&lt;ul>
&lt;li>Create a small extension under &lt;a href="https://github.com/p2plabsxyz/" target="_blank" rel="noopener">p2plabsxyz&lt;/a> to export windows and tabs (URLs, titles, window grouping, active tab indexes).&lt;/li>
&lt;li>Ensure the export format is compatible with Peersky’s import pipeline.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Add onboarding import flow:&lt;/strong>
&lt;ul>
&lt;li>Show &lt;code>onboarding.html&lt;/code> on first launch and prompt “Import tabs from another browser?”.&lt;/li>
&lt;li>Guide users to install the helper extension and import the generated file.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Align with existing persistence:&lt;/strong>
&lt;ul>
&lt;li>Reuse &lt;code>lastOpened.json&lt;/code>, &lt;code>tabs.json&lt;/code>, and &lt;code>peersky-browser-tabs&lt;/code> localStorage for restores.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
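&lt;p>The backup bundle described above can be sketched as below. File and directory names come from the task list; the SHA-256 digest is only a stand-in for the CID step, since the real flow would pin the zip to IPFS or Hypercore and share its CID. Peersky itself is JavaScript/Electron, so this Python sketch is purely illustrative.&lt;/p>

```python
# Bundle lastOpened.json, tabs.json, ensCache.json and the ipfs/ and hyper/
# directories into a single zip, then hash it. The hash stands in for the
# CID the real implementation would get from pinning the archive.
import hashlib
import zipfile
from pathlib import Path

BACKUP_FILES = ["lastOpened.json", "tabs.json", "ensCache.json"]
BACKUP_DIRS = ["ipfs", "hyper"]

def create_backup(data_dir: Path, out_zip: Path) -> str:
    """Bundle Peersky state into one zip; return a content hash for sharing."""
    with zipfile.ZipFile(out_zip, "w") as zf:
        for name in BACKUP_FILES:
            f = data_dir / name
            if f.exists():
                zf.write(f, name)
        for dname in BACKUP_DIRS:
            dpath = data_dir / dname
            if not dpath.exists():
                continue
            for f in sorted(dpath.rglob("*")):
                if f.is_file():
                    zf.write(f, f.relative_to(data_dir))
    return hashlib.sha256(out_zip.read_bytes()).hexdigest()
```

&lt;p>Restore would be the inverse: unpack the zip (fetched by upload or by CID) back into the data directory and reload the persisted tab state.&lt;/p>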
&lt;p>&lt;strong>More details in the issue:&lt;/strong> &lt;a href="https://github.com/p2plabsxyz/peersky-browser/issues/60" target="_blank" rel="noopener">https://github.com/p2plabsxyz/peersky-browser/issues/60&lt;/a>&lt;/p></description></item><item><title>Scenic: A Language for Design and Verification of Autonomous Cyber-Physical Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/scenic/</link><pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/scenic/</guid><description>&lt;p>&lt;a href="https://scenic-lang.org/" target="_blank" rel="noopener">Scenic&lt;/a> is a probabilistic programming language for the design and verification of autonomous cyber-physical systems like self-driving cars.
Scenic allows users to define &lt;em>scenarios&lt;/em> for testing or training their system by putting a probability distribution on the system&amp;rsquo;s environment: the positions, orientations, and other properties of objects and agents, as well as their behaviors over time.
Sampling these scenarios and running them in a simulator yields synthetic data which can be used to train or test a system.
Since Scenic was released open-source in 2019, our group and many others in academia have used Scenic to find, diagnose, and fix bugs in autonomous cars, aircraft, robots, and other kinds of systems.
In industry, it is being used by companies including Boeing, Meta, Deutsche Bahn, and Toyota in domains spanning autonomous driving, aviation, household robotics, railways, maritime, and virtual reality.&lt;/p>
&lt;p>Our long-term goal is for Scenic to become a widely-used common representation and toolkit supporting the entire design lifecycle of AI-based cyber-physical systems.
Towards this end, we have many summer projects available, ranging from adding new application domains to working on the Scenic compiler and sampler:&lt;/p>
&lt;ol>
&lt;li>Extensions to the Scenic driving domain&lt;/li>
&lt;li>Interfacing Scenic to new simulators&lt;/li>
&lt;li>Scenic distribution visualizer&lt;/li>
&lt;/ol>
&lt;p>See the sections below for details.&lt;/p>
&lt;h3 id="extensions-to-the-scenic-driving-domain">Extensions to the Scenic Driving Domain&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Autonomous Driving&lt;/code> &lt;code>3D modeling&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python; basic vector geometry&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scenic scenarios written to test autonomous vehicles use the &lt;a href="https://docs.scenic-lang.org/en/latest/modules/scenic.domains.driving.html" target="_blank" rel="noopener">driving domain&lt;/a>, a Scenic library defining driving-specific concepts including cars, pedestrians, roads, lanes, and intersections.
The library extracts information about road networks, such as the shapes of lanes, from files in the standard &lt;a href="https://www.asam.net/standards/detail/opendrive/" target="_blank" rel="noopener">OpenDRIVE&lt;/a> format.&lt;/p>
&lt;p>There are several potential goals of this project, including:&lt;/p>
&lt;ul>
&lt;li>Supporting importing complex object information from simulators like CARLA.&lt;/li>
&lt;li>Extending the domain to incorporate additional metadata, such as highway entrances and exits.&lt;/li>
&lt;li>Fixing various bugs and limitations that exist in the driving domain (e.g. &lt;a href="https://github.com/BerkeleyLearnVerify/Scenic/issues/274" target="_blank" rel="noopener">Issue #274&lt;/a> and &lt;a href="https://github.com/BerkeleyLearnVerify/Scenic/issues/295" target="_blank" rel="noopener">Issue #295&lt;/a>).&lt;/li>
&lt;/ul>
&lt;h3 id="interfacing-scenic-to-new-simulators">Interfacing Scenic to New Simulators&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Simulation&lt;/code> &lt;code>Autonomous Driving&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Scenic is designed to be &lt;a href="https://docs.scenic-lang.org/en/latest/new_simulator.html" target="_blank" rel="noopener">easily interfaced to new simulators&lt;/a>.
Depending on student interest, we could pick a simulator which would open up new kinds of applications for Scenic and write an interface for it.
Some possibilities include:&lt;/p>
&lt;ul>
&lt;li>The &lt;a href="https://github.com/tier4/AWSIM" target="_blank" rel="noopener">AWSIM&lt;/a> driving simulator (to allow testing the &lt;a href="https://autoware.org/" target="_blank" rel="noopener">Autoware&lt;/a> open-source autonomous driving software stack)&lt;/li>
&lt;li>The &lt;a href="https://www.ipg-automotive.com/solutions/product-portfolio/carmaker/" target="_blank" rel="noopener">CarMaker&lt;/a> driving simulator&lt;/li>
&lt;/ul>
&lt;p>The goal of the project would be to create an interface between Scenic and the new simulator and write scenarios demonstrating it.
If time allows, we could do a case study on a realistic system for publication at an academic conference.&lt;/p>
&lt;h3 id="tool-to-visualize-scenario-distributions">Tool to Visualize Scenario Distributions&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Visualization&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python; basic visualization and graphics&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/daniel-fremont/">Daniel Fremont&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/eric-vin/">Eric Vin&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>A Scenic scenario represents a distribution over scenes, but it can be difficult to interpret what exactly this distribution represents. Being able to visualize this distribution would be helpful for understanding and reasoning about scenarios.&lt;/p>
&lt;p>The goal of this project would be to build on an existing prototype for visualizing these distributions, and to create a tool that can be used by the wider Scenic community.&lt;/p></description></item><item><title>CauST: Causal Gene Intervention for Robust Spatial Domain Identification</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/caust/</link><pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/caust/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> spatial transcriptomics, spatial domain identification, causal inference, gene intervention&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python (PyTorch preferred)&lt;/li>
&lt;li>&lt;strong>Machine Learning:&lt;/strong> causal inference, representation learning, clustering&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong> spatial transcriptomics preprocessing and evaluation (ARI, cross-slice generalization)&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (preferred):&lt;/strong> spatial transcriptomics, scRNA-seq, gene perturbation analysis&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/lijinghua-zhang/">Lijinghua Zhang&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Spatial domain identification is a core task in spatial transcriptomics (ST), aiming to segment tissue sections into biologically meaningful regions based on spatially resolved gene expression profiles. These spatial domains often correspond to anatomical layers, functional niches, or microenvironmental states, and are widely used as the basis for downstream biological interpretation.&lt;/p>
&lt;p>Despite strong empirical performance, most existing spatial domain identification methods rely on &lt;strong>purely correlational gene signals&lt;/strong>. Genes are selected or weighted based on association with spatial patterns, without distinguishing whether they &lt;em>causally drive&lt;/em> domain formation or merely reflect downstream or confounded effects. As a result, current models often suffer from limited robustness and poor generalization across tissue sections or donors.&lt;/p>
&lt;h3 id="problem-correlation-driven-gene-usage-and-limited-generalization">&lt;strong>Problem: Correlation-Driven Gene Usage and Limited Generalization&lt;/strong>&lt;/h3>
&lt;p>In standard pipelines, gene expression features are typically used wholesale or filtered using heuristic criteria (e.g., highly variable genes). However, many genes that are strongly correlated with spatial domains are not causally responsible for domain structure. Including such non-causal or confounded genes can:&lt;/p>
&lt;ul>
&lt;li>Reduce robustness across slices and donors&lt;/li>
&lt;li>Obscure true domain-driving biological signals&lt;/li>
&lt;li>Limit interpretability of spatial domain assignments&lt;/li>
&lt;/ul>
&lt;p>Empirically, domain identification performance often degrades substantially in cross-slice or cross-donor evaluation settings, underscoring the need for causally informed feature selection.&lt;/p>
&lt;h3 id="proposed-solution-caust">&lt;strong>Proposed Solution: CauST&lt;/strong>&lt;/h3>
&lt;p>This project proposes &lt;strong>CauST&lt;/strong>, a &lt;strong>Causal Gene Intervention framework&lt;/strong> for robust spatial domain identification.&lt;/p>
&lt;p>CauST aims to identify &lt;strong>domain-driving genes&lt;/strong> by estimating their causal influence on spatial domain assignments via &lt;strong>in-silico gene interventions&lt;/strong>. Instead of relying on observational correlations, CauST approximates counterfactual gene knockouts by perturbing individual gene expressions while controlling for confounding factors.&lt;/p>
&lt;p>In addition, CauST leverages &lt;strong>cross-slice invariance&lt;/strong> as a practical criterion for causal gene discovery, prioritizing genes whose effects on spatial domain identification remain stable across tissue sections and donors.&lt;/p>
&lt;p>By filtering or reweighting genes based on estimated causal influence, CauST improves the robustness, generalizability, and interpretability of spatial domain identification models.&lt;/p>
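&lt;p>The core intervention idea can be illustrated with a toy example: zero out one gene, re-run domain assignment, and score the gene by the fraction of spots whose domain label flips. A nearest-centroid assigner stands in for a trained model here; all names and numbers are illustrative, not part of CauST.&lt;/p>

```python
# Toy in-silico gene intervention: knock out one gene (set it to zero),
# redo a stand-in nearest-centroid domain assignment, and measure the
# fraction of spots whose assigned domain changes.
def assign_domains(spots, centroids):
    """Assign each spot (gene-expression vector) to its nearest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: dist2(s, centroids[k]))
            for s in spots]

def intervention_effect(spots, centroids, gene_idx):
    """Fraction of spots whose domain flips when gene_idx is zeroed."""
    base = assign_domains(spots, centroids)
    knocked = [[0.0 if j == gene_idx else v for j, v in enumerate(s)]
               for s in spots]
    after = assign_domains(knocked, centroids)
    return sum(b != a for b, a in zip(base, after)) / len(spots)
```

&lt;p>In CauST the assigner would be the actual domain identification model, interventions would control for confounders, and effects would additionally be checked for stability across slices and donors.&lt;/p>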
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Causal Gene Effect Estimation&lt;/strong>
&lt;ul>
&lt;li>Design in-silico intervention strategies to estimate gene-level causal effects on spatial domain assignments.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Invariant Effect Analysis&lt;/strong>
&lt;ul>
&lt;li>Identify genes with stable effects across tissue sections or donors.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Causal Gene Filtering&lt;/strong>
&lt;ul>
&lt;li>Develop filtering or reweighting schemes based on estimated causal influence.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Integration with Existing Methods&lt;/strong>
&lt;ul>
&lt;li>Integrate CauST into state-of-the-art spatial domain identification pipelines.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Evaluation and Validation&lt;/strong>
&lt;ul>
&lt;li>Benchmark robustness, cross-slice generalization, and interpretability on public spatial transcriptomics datasets.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>CauST Framework Implementation&lt;/strong>
&lt;ul>
&lt;li>Open-source Python implementation compatible with common spatial transcriptomics toolchains.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Causal Gene Benchmarks&lt;/strong>
&lt;ul>
&lt;li>Quantitative evaluation of causal gene filtering and its impact on domain identification.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Visualization Tools&lt;/strong>
&lt;ul>
&lt;li>Tools for visualizing gene interventions, causal scores, and spatial effects.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials&lt;/strong>
&lt;ul>
&lt;li>Clear examples enabling adoption of CauST by the broader community.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>CauST introduces a causally grounded perspective to spatial domain identification by explicitly modeling gene-level interventions. By shifting from correlation-driven gene usage to causal gene selection, this project improves robustness, generalizability, and biological interpretability in spatial transcriptomics analysis. CauST has the potential to serve as a foundational framework for integrating causal reasoning into spatial omics representation learning.&lt;/p></description></item><item><title>Agent4Target: An Agent-based Evidence Aggregation Toolkit for Therapeutic Target Identification</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/agent4target/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/agent4target/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> therapeutic target identification, drug discovery, evidence aggregation, AI agents, biomedical knowledge integration&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python; experience with modern ML tooling preferred&lt;/li>
&lt;li>&lt;strong>Machine Learning / AI:&lt;/strong> agent-based systems, workflow orchestration, weak supervision (basic), representation learning&lt;/li>
&lt;li>&lt;strong>Software Engineering:&lt;/strong> modular system design, APIs, CLI tools, documentation&lt;/li>
&lt;li>&lt;strong>Biomedical Knowledge (preferred):&lt;/strong> familiarity with drug–target databases (e.g., PHAROS, DepMap, Open Targets)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Identifying and prioritizing high-quality therapeutic targets is a foundational yet challenging task in drug discovery. Modern target identification relies on aggregating heterogeneous evidence from multiple sources, including genetic perturbation screens, disease associations, chemical biology, and biomedical literature. These evidence sources are highly fragmented, noisy, and heterogeneous in both format and reliability.&lt;/p>
&lt;p>While large language models and AI agents have recently shown promise in automating scientific workflows, many existing approaches focus on end-to-end prediction or conversational interfaces. Such systems are often difficult to reproduce, extend, or integrate into existing research pipelines, limiting their practical adoption by the biomedical community.&lt;/p>
&lt;p>This project proposes &lt;strong>Agent4Target&lt;/strong>, an &lt;strong>agent-based evidence aggregation toolkit&lt;/strong> that reframes therapeutic target identification as a &lt;strong>structured, modular workflow&lt;/strong>. Instead of using agents for free-form reasoning, Agent4Target employs agents as &lt;strong>orchestrated components&lt;/strong> that systematically collect, normalize, score, and explain evidence supporting candidate therapeutic targets.&lt;/p>
&lt;p>The goal is to deliver a &lt;strong>reusable, open-source toolchain&lt;/strong> that can be integrated into diverse drug discovery workflows, independent of any single downstream prediction model or publication.&lt;/p>
&lt;hr>
&lt;h3 id="key-idea-and-technical-approach">&lt;strong>Key Idea and Technical Approach&lt;/strong>&lt;/h3>
&lt;p>Agent4Target models target identification as a multi-stage, agent-driven pipeline, coordinated by a central orchestrator:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Evidence Collector Agents&lt;/strong>&lt;br>
Specialized agents retrieve target-level evidence from heterogeneous sources, such as:&lt;/p>
&lt;ul>
&lt;li>Genetic perturbation and dependency data (e.g., DepMap)&lt;/li>
&lt;li>Target annotation and development status (e.g., PHAROS)&lt;/li>
&lt;li>Disease association scores (e.g., Open Targets)&lt;/li>
&lt;li>Automatically summarized literature evidence&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Normalization &amp;amp; Scoring Agent&lt;/strong>&lt;br>
Collected evidence is converted into a unified, structured schema using typed data models (e.g., JSON / Pydantic).&lt;br>
This agent performs:&lt;/p>
&lt;ul>
&lt;li>Evidence normalization across sources&lt;/li>
&lt;li>Confidence-aware scoring and aggregation&lt;/li>
&lt;li>Optional weighting or calibration strategies&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Explanation Agent&lt;/strong>&lt;br>
Rather than free-text generation, this agent produces &lt;strong>structured explanations&lt;/strong> that explicitly link scores to supporting evidence, enabling transparency and interpretability for downstream users.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Workflow Orchestrator&lt;/strong>&lt;br>
A lightweight orchestration layer (e.g., LangGraph or a state-machine-based controller) manages agent execution, dependencies, and failure handling, ensuring reproducibility and extensibility.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>This modular design allows individual agents to be replaced, extended, or reused without altering the overall system.&lt;/p>
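&lt;p>As a concrete illustration of the normalization and scoring stage, the sketch below uses stdlib dataclasses in place of the proposed typed schema (JSON / Pydantic); the field names are assumptions standing in for whatever schema the project settles on.&lt;/p>

```python
# Sketch of the Normalization & Scoring Agent's core operation: map
# heterogeneous evidence records into one schema, then compute a
# confidence-weighted aggregate score per target. Field names are assumed.
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str        # e.g. "DepMap", "PHAROS", "Open Targets"
    target: str        # gene/protein identifier
    score: float       # normalized to [0, 1] by the collector agent
    confidence: float  # reliability weight assigned to this source

def aggregate(evidence: list) -> dict:
    """Confidence-weighted mean score per target."""
    totals = {}  # target -> [weighted score sum, weight sum]
    for ev in evidence:
        num, den = totals.setdefault(ev.target, [0.0, 0.0])
        totals[ev.target] = [num + ev.score * ev.confidence,
                             den + ev.confidence]
    return {t: num / den for t, (num, den) in totals.items()}
```

&lt;p>Keeping scoring separate from collection is what lets individual collector agents be swapped without touching the aggregation logic.&lt;/p>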
&lt;hr>
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Design a Modular Agent-based Architecture&lt;/strong>
&lt;ul>
&lt;li>Define clear interfaces for evidence collection, normalization, scoring, and explanation agents.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Implement a Standardized Evidence Schema&lt;/strong>
&lt;ul>
&lt;li>Develop a unified data model for heterogeneous target-level evidence.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Build a Reproducible Orchestration Framework&lt;/strong>
&lt;ul>
&lt;li>Implement a deterministic, inspectable workflow for agent coordination.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Deliver a Community-Ready Toolkit&lt;/strong>
&lt;ul>
&lt;li>Provide CLI tools, example notebooks, and clear documentation to support adoption.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Benchmark and Case Studies&lt;/strong>
&lt;ul>
&lt;li>Demonstrate the toolkit on representative target identification scenarios using public datasets.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Open-Source Agent4Target Codebase&lt;/strong>
&lt;ul>
&lt;li>A well-documented Python package with modular agent components.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Command-Line Interface (CLI)&lt;/strong>
&lt;ul>
&lt;li>Tools for running end-to-end evidence aggregation pipelines.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Standardized Output Schema&lt;/strong>
&lt;ul>
&lt;li>Machine-readable evidence summaries suitable for downstream modeling.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Example Notebooks and Benchmarks&lt;/strong>
&lt;ul>
&lt;li>Demonstrations of usage and performance on real-world target identification tasks.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation&lt;/strong>
&lt;ul>
&lt;li>Installation guides, extension tutorials, and developer documentation.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>Agent4Target provides a practical bridge between AI agents and real-world drug discovery workflows. By emphasizing structured evidence aggregation, reproducibility, and interpretability, this project enables researchers to systematically reason about therapeutic targets rather than relying on opaque, end-to-end models. The resulting toolkit can serve as a foundation for future work in AI-assisted drug discovery, weak supervision, and biomedical knowledge integration.&lt;/p></description></item><item><title>HistoMoE: A Histology-Guided Mixture-of-Experts Framework for Gene Expression Prediction</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/histomoe/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/histomoe/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> computational pathology, spatial transcriptomics, gene expression prediction, mixture-of-experts, multimodal learning&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python; experience with PyTorch preferred&lt;/li>
&lt;li>&lt;strong>Machine Learning:&lt;/strong> CNNs / vision encoders, mixture-of-experts, multimodal representation learning&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong> handling large-scale histology image patches and gene expression matrices&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (preferred):&lt;/strong> familiarity with spatial transcriptomics or scRNA-seq data&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Histology imaging is one of the most widely available data modalities in biomedical research and clinical practice, capturing rich morphological information about tissues and disease states. In parallel, spatial transcriptomics (ST) technologies provide spatially resolved gene expression measurements, enabling unprecedented insights into tissue organization and cellular heterogeneity. However, the high cost and limited accessibility of ST experiments remain a major barrier to their widespread adoption.&lt;/p>
&lt;p>Predicting gene expression directly from histology images offers a promising alternative, enabling molecular-level inference from routinely collected pathology data. Existing approaches typically rely on a single global model that maps image embeddings to gene expression profiles. While effective to some extent, these models struggle to capture the strong organ-, tissue-, and cancer-specific heterogeneity that underlies gene expression patterns.&lt;/p>
&lt;p>This project proposes &lt;strong>HistoMoE&lt;/strong>, a &lt;strong>histology-guided mixture-of-experts (MoE) framework&lt;/strong> that explicitly models biological heterogeneity by learning &lt;strong>specialized expert models&lt;/strong> for different cancer types or organs, and dynamically routing histology image patches to the most relevant experts.&lt;/p>
&lt;h3 id="key-idea-and-technical-approach">&lt;strong>Key Idea and Technical Approach&lt;/strong>&lt;/h3>
&lt;p>As illustrated in the figure above, HistoMoE integrates multiple data modalities and learning components:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Vision Encoder&lt;/strong>&lt;br>
Histology image patches are encoded into high-dimensional visual representations using a convolutional or transformer-based vision backbone.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Text / Metadata Encoder&lt;/strong>&lt;br>
Sample-level metadata (e.g., tissue type, organ, disease context) is encoded using a lightweight text or embedding model.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Gating Network&lt;/strong>&lt;br>
A gating network jointly considers image and metadata embeddings to infer routing weights over multiple &lt;strong>cancer- or organ-specific expert models&lt;/strong>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Expert Models&lt;/strong>&lt;br>
Each expert specializes in modeling gene expression patterns for a specific biological context (e.g., CCRCC, COAD, LUAD), producing patch-level gene expression predictions.&lt;/p>
&lt;/li>
&lt;/ol>
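&lt;p>The four components above can be sketched end to end. The following NumPy snippet is an illustrative, framework-agnostic toy (the project itself would use PyTorch); all shapes, parameter names, and the linear &amp;ldquo;experts&amp;rdquo; are assumptions for exposition.&lt;/p>

```python
# Toy sketch of HistoMoE's routing step: a gate mixes per-context
# expert predictions from joint image + metadata embeddings.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

n_patches, img_dim, meta_dim, n_genes, n_experts = 8, 32, 4, 10, 3

# Toy parameters: one linear "expert" per biological context, plus a gate.
experts = [rng.normal(size=(img_dim, n_genes)) for _ in range(n_experts)]
gate_w = rng.normal(size=(img_dim + meta_dim, n_experts))

img_emb = rng.normal(size=(n_patches, img_dim))    # vision encoder output
meta_emb = rng.normal(size=(n_patches, meta_dim))  # metadata encoder output

# Gating network: routing weights from the joint embedding.
weights = softmax(np.concatenate([img_emb, meta_emb], axis=1) @ gate_w)

# Each expert predicts patch-level expression; the gate mixes them.
preds = np.stack([img_emb @ w for w in experts], axis=1)   # (P, E, G)
mixed = (weights[:, :, None] * preds).sum(axis=1)          # (P, G)

assert mixed.shape == (n_patches, n_genes)
```

The routing weights double as the interpretability signal: inspecting which expert dominates for a patch reveals which biological context drove the prediction.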
&lt;p>By explicitly modeling biological structure through expert specialization, HistoMoE aims to improve both &lt;strong>prediction accuracy&lt;/strong> and &lt;strong>interpretability&lt;/strong>, allowing researchers to understand which biological experts drive each prediction.&lt;/p>
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Design and Implement the HistoMoE Framework&lt;/strong>
&lt;ul>
&lt;li>Build a modular MoE architecture with pluggable vision encoders, gating networks, and expert models.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Multimodal Routing and Expert Specialization&lt;/strong>
&lt;ul>
&lt;li>Explore how image features and metadata jointly inform expert selection.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Benchmarking and Evaluation&lt;/strong>
&lt;ul>
&lt;li>Compare HistoMoE against single-model baselines on multiple cancer and organ-specific spatial transcriptomics datasets.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Interpretability Analysis&lt;/strong>
&lt;ul>
&lt;li>Analyze expert routing behavior to reveal biologically meaningful patterns.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Open-Source HistoMoE Codebase&lt;/strong>
&lt;ul>
&lt;li>Well-documented Python implementation with training, evaluation, and visualization tools.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Benchmark Results&lt;/strong>
&lt;ul>
&lt;li>Quantitative comparisons demonstrating improvements over non-expert baselines.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Visualization and Analysis Tools&lt;/strong>
&lt;ul>
&lt;li>Tools for inspecting expert usage, routing weights, and gene-level predictions.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials&lt;/strong>
&lt;ul>
&lt;li>Clear instructions and examples to enable adoption by the research community.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>HistoMoE introduces an expert-system perspective to histology-based gene expression prediction, bridging morphological and molecular representations through biologically informed specialization. By combining multimodal learning with mixture-of-experts modeling, this project advances the interpretability and accuracy of computational pathology methods and contributes toward scalable, cost-effective alternatives to spatial transcriptomics experiments.&lt;/p></description></item><item><title>StaR: A Stability-Aware Representation Learning Framework for Spatial Domain Identification</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/star/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/uci/star/</guid><description>&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> spatial transcriptomics, spatial domain identification, representation learning, model robustness&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong>
&lt;ul>
&lt;li>&lt;strong>Programming Languages:&lt;/strong> Python; PyTorch experience preferred&lt;/li>
&lt;li>&lt;strong>Machine Learning:&lt;/strong> representation learning, clustering, robustness and stability analysis&lt;/li>
&lt;li>&lt;strong>Data Analysis:&lt;/strong> spatial transcriptomics preprocessing and evaluation (ARI, clustering metrics)&lt;/li>
&lt;li>&lt;strong>Bioinformatics Knowledge (preferred):&lt;/strong> familiarity with spatial transcriptomics or scRNA-seq data&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Advanced&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ziheng-duan/">Ziheng Duan&lt;/a> (contact person)&lt;/li>
&lt;/ul>
&lt;h3 id="project-idea-description">&lt;strong>Project Idea Description&lt;/strong>&lt;/h3>
&lt;p>Spatial domain identification is a fundamental task in spatial transcriptomics (ST), aiming to partition tissue sections into biologically meaningful regions based on spatially resolved gene expression profiles. These spatial domains often correspond to distinct anatomical structures, cellular compositions, or functional microenvironments, and serve as a critical foundation for downstream biological analysis.&lt;/p>
&lt;p>Despite rapid methodological progress, &lt;strong>most existing spatial domain identification methods are highly sensitive to random initialization&lt;/strong>. In practice, simply changing the random seed can lead to substantially different clustering results and large performance fluctuations, even when using identical hyperparameters and datasets. This instability severely undermines the reliability, reproducibility, and interpretability of spatial transcriptomics analyses.&lt;/p>
&lt;h3 id="problem-seed-sensitivity-and-unstable-representations">&lt;strong>Problem: Seed Sensitivity and Unstable Representations&lt;/strong>&lt;/h3>
&lt;p>Empirical evidence shows that state-of-the-art spatial domain identification models can exhibit substantial performance variance across random seeds. For example, the Adjusted Rand Index (ARI) may vary from relatively strong performance (e.g., ARI ≈ 0.65) to noticeably degraded yet still reasonable outcomes (e.g., ARI ≈ 0.50) solely due to different random initializations.&lt;/p>
&lt;p>By systematically evaluating models across &lt;strong>hundreds to thousands of random seeds&lt;/strong>, we observe that:&lt;/p>
&lt;ul>
&lt;li>Model performance landscapes are highly &lt;strong>rugged&lt;/strong>, with sharp cliffs and isolated high-performing regions.&lt;/li>
&lt;li>Standard training objectives implicitly favor brittle representations that are not robust to small perturbations in initialization or optimization trajectories.&lt;/li>
&lt;/ul>
&lt;p>These observations suggest that instability is not a peripheral issue, but rather a &lt;strong>structural limitation of current representation learning approaches&lt;/strong> for spatial transcriptomics.&lt;/p>
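&lt;p>A minimal sketch of this kind of seed-sensitivity study, using k-means on a synthetic embedding as a stand-in for a full spatial domain identification model (the model, data, and seed count here are illustrative only):&lt;/p>

```python
# Sweep random seeds and summarize the spread of the Adjusted Rand
# Index (ARI); real studies sweep hundreds to thousands of seeds.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# Toy "embedding" with 3 loose ground-truth domains
labels_true = np.repeat([0, 1, 2], 50)
emb = rng.normal(size=(150, 8)) + labels_true[:, None] * 1.5

aris = []
for seed in range(30):
    # n_init=1 exposes initialization sensitivity directly
    pred = KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(emb)
    aris.append(adjusted_rand_score(labels_true, pred))

print(f"ARI mean={np.mean(aris):.3f}, std={np.std(aris):.3f}, "
      f"min={np.min(aris):.3f}, max={np.max(aris):.3f}")
```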
&lt;h3 id="proposed-solution-star">&lt;strong>Proposed Solution: StaR&lt;/strong>&lt;/h3>
&lt;p>This project proposes &lt;strong>StaR&lt;/strong>, a &lt;strong>Stability-Aware Representation Learning framework&lt;/strong> designed to explicitly address seed sensitivity in spatial domain identification.&lt;/p>
&lt;p>The core idea of StaR is to &lt;strong>learn representations that are robust to perturbations in model parameters and training dynamics&lt;/strong>, rather than optimizing solely for peak performance under a single random seed. Concretely, StaR introduces controlled noise or perturbations into the training process and encourages consistency across multiple perturbed model instances, guiding the model toward flatter and more stable regions of the parameter space.&lt;/p>
&lt;p>By prioritizing stability during representation learning, StaR aims to produce embeddings that:&lt;/p>
&lt;ul>
&lt;li>Yield consistent spatial domain assignments across random seeds&lt;/li>
&lt;li>Maintain competitive or improved clustering accuracy&lt;/li>
&lt;li>Better reflect underlying biological structure&lt;/li>
&lt;/ul>
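&lt;p>As a rough illustration of the perturbation-consistency idea (not the actual StaR objective), the sketch below embeds the same input under several weight perturbations and penalizes disagreement across the resulting views:&lt;/p>

```python
# Illustrative perturbation-consistency penalty: embeddings that stay
# stable under small weight noise sit in flatter regions of parameter
# space, which is what StaR-style training encourages.
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    return np.tanh(x @ w)  # stand-in encoder

x = rng.normal(size=(64, 16))        # spots/cells by features
w = rng.normal(size=(16, 8)) * 0.5   # encoder weights

K, sigma = 4, 0.05
views = [encode(x, w + rng.normal(size=w.shape) * sigma) for _ in range(K)]
mean_view = np.mean(views, axis=0)

# Mean squared deviation of each perturbed embedding from the average;
# minimizing this alongside the task loss favors stable representations.
consistency_loss = np.mean([np.mean((v - mean_view) ** 2) for v in views])
print(f"consistency loss: {consistency_loss:.5f}")
```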
&lt;h3 id="project-objectives">&lt;strong>Project Objectives&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Characterize Instability in Existing Methods&lt;/strong>
&lt;ul>
&lt;li>Systematically quantify seed sensitivity across popular spatial domain identification models.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Develop Stability-Aware Training Objectives&lt;/strong>
&lt;ul>
&lt;li>Design perturbation-based or consistency-driven losses that encourage robust representations.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Integrate StaR into Existing Pipelines&lt;/strong>
&lt;ul>
&lt;li>Apply StaR to widely used spatial transcriptomics workflows with minimal architectural changes.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Evaluation and Benchmarking&lt;/strong>
&lt;ul>
&lt;li>Evaluate StaR using clustering metrics (e.g., ARI) and stability metrics across multiple datasets and random seeds.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Biological Validation&lt;/strong>
&lt;ul>
&lt;li>Assess whether stability-aware representations preserve biologically meaningful spatial patterns.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="project-deliverables">&lt;strong>Project Deliverables&lt;/strong>&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>StaR Framework Implementation&lt;/strong>
&lt;ul>
&lt;li>An open-source Python implementation compatible with common spatial transcriptomics toolchains.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Stability Benchmarks&lt;/strong>
&lt;ul>
&lt;li>Comprehensive evaluations demonstrating reduced performance variance across seeds.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Visualization Tools&lt;/strong>
&lt;ul>
&lt;li>Tools for visualizing performance landscapes, stability surfaces, and spatial domain consistency.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation and Tutorials&lt;/strong>
&lt;ul>
&lt;li>Clear examples enabling researchers to adopt StaR in their own analyses.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="impact">&lt;strong>Impact&lt;/strong>&lt;/h3>
&lt;p>StaR addresses a critical yet underexplored challenge in spatial transcriptomics: &lt;strong>model instability and poor reproducibility&lt;/strong>. By shifting the focus from single-run performance to stability-aware representation learning, this project improves the reliability and trustworthiness of spatial domain identification methods. StaR has the potential to become a foundational component in robust spatial transcriptomics pipelines and to inspire broader adoption of stability-aware principles in biological representation learning.&lt;/p></description></item><item><title>MedJEPA: Self-Supervised Medical Image Representation Learning with JEPA</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/medjepa/</link><pubDate>Mon, 19 Jan 2026 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/medjepa/</guid><description>&lt;h3 id="project-description">Project Description&lt;/h3>
&lt;p>[MedJEPA] Medical image analysis is fundamental to modern healthcare, enabling disease diagnosis, treatment planning, and patient monitoring across diverse clinical applications. In radiology and pathology, deep learning models support automated detection of abnormalities, tumor segmentation, and diagnostic assistance. Medical imaging modalities including X-rays, CT scans, MRI, ultrasound, and histopathology slides generate vast amounts of unlabeled data that could benefit from self-supervised representation learning. Clinical applications include cancer detection and staging, cardiovascular disease assessment, neurological disorder diagnosis, and infectious disease screening. In drug discovery and clinical research, analyzing medical images helps evaluate treatment efficacy, predict patient outcomes, and identify biomarkers for disease progression. Telemedicine and point-of-care diagnostics benefit from AI-powered image analysis that extends expert-level interpretation to underserved regions. However, medical imaging faces unique challenges: limited labeled datasets due to expensive expert annotation, patient privacy concerns restricting data sharing, domain shift across different imaging equipment and protocols, and the need for models that generalize across hospitals and populations.&lt;/p>
&lt;p>Traditional medical image analysis relies heavily on supervised learning with manually annotated labels, creating bottlenecks due to the scarcity and cost of expert annotations. Existing self-supervised methods applied to medical imaging often employ complex training procedures with numerous heuristics (momentum encoders, stop-gradients, teacher-student architectures, and carefully tuned augmentation strategies) that may not translate well across different medical imaging modalities and clinical contexts. These approaches struggle with domain-specific challenges such as subtle pathological features, high-resolution images, 3D volumetric data, and the need for interpretable representations that clinicians can trust. To address these challenges, we propose MedJEPA: Self-Supervised Medical Image Representation Learning with Joint-Embedding Predictive Architecture, which leverages the theoretically grounded LeJEPA framework for 2D medical images and V-JEPA principles for medical video and volumetric data, creating a unified, scalable, and heuristics-free approach specifically tailored for medical imaging applications.&lt;/p>
&lt;p>By utilizing the principled JEPA frameworks with objectives like Sketched Isotropic Gaussian Regularization (SIGReg), MedJEPA eliminates complex training heuristics while learning clinically meaningful representations from unlabeled medical images. Unlike conventional self-supervised methods that require extensive hyperparameter tuning and may not generalize across medical imaging modalities, MedJEPA provides a clean, theoretically motivated framework with minimal hyperparameters that adapts to diverse medical imaging contexts, from chest X-rays to histopathology slides to cardiac MRI sequences. The learned representations can support downstream tasks including disease classification, lesion detection, organ segmentation, and survival prediction, while requiring significantly fewer labeled examples for fine-tuning. This approach democratizes access to state-of-the-art medical AI by enabling effective learning from the vast amounts of unlabeled medical imaging data available in hospital archives, addressing the annotation bottleneck that has limited progress in medical AI.&lt;/p>
&lt;h3 id="project-objectives">Project Objectives&lt;/h3>
&lt;p>Aligned with the vision of the 2026 Open Source Research Experience (OSRE), this project aims to apply Joint-Embedding Predictive Architecture (JEPA) frameworks to medical image representation learning, addressing the critical challenge of learning from limited labeled medical data. Medical imaging generates enormous amounts of unlabeled data, but supervised learning approaches are bottlenecked by the scarcity and cost of expert annotations. Existing self-supervised methods often rely on complex heuristics that don&amp;rsquo;t generalize well across diverse medical imaging modalities, equipment vendors, and clinical protocols.
&lt;/p>
&lt;p>This project will leverage the theoretically grounded LeJEPA framework for 2D medical images (X-rays, histopathology slides, fundus images) and V-JEPA principles for temporal and volumetric medical data (cardiac MRI sequences, CT scans, surgical videos). The core challenge lies in adapting these heuristics-free, stable frameworks to medical imaging&amp;rsquo;s unique characteristics: subtle pathological features requiring fine-grained representations, high-resolution images demanding efficient processing, domain shift across hospitals and equipment, and the need for interpretable features that support clinical decision-making. The learned representations will be evaluated on diverse downstream clinical tasks including disease classification, lesion detection, organ segmentation, and prognosis prediction, with emphasis on few-shot learning scenarios that reflect real-world annotation constraints. Below is an outline of the methodologies and models that will be developed in this project.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Step 1: Medical Data Preparation&lt;/strong>:
Develop data processing pipelines for diverse medical imaging modalities, implementing DICOM/NIfTI parsing, standardized preprocessing, and efficient data loading for self-supervised pre-training.&lt;/p>
&lt;ul>
&lt;li>Prepare 2D medical image datasets:
&lt;ul>
&lt;li>Chest X-rays: ChestX-ray14, MIMIC-CXR, and CheXpert for lung disease detection&lt;/li>
&lt;li>Histopathology: Camelyon16/17 (breast cancer), PCam (patch-level classification)&lt;/li>
&lt;li>Retinal imaging: EyePACS, APTOS (diabetic retinopathy), Messidor&lt;/li>
&lt;li>Dermatology: HAM10000, ISIC (skin lesion classification)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Prepare 3D volumetric and temporal medical data:
&lt;ul>
&lt;li>CT scans: LIDC-IDRI (lung nodules), Medical Segmentation Decathlon datasets&lt;/li>
&lt;li>MRI sequences: BraTS (brain tumors), ACDC (cardiac MRI), UK Biobank cardiac videos&lt;/li>
&lt;li>Medical video: surgical procedure videos, endoscopy recordings, ultrasound sequences&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Implement medical imaging-specific preprocessing: intensity normalization, resolution standardization, handling of multi-channel medical images (different MRI sequences, RGB histopathology), and privacy-preserving anonymization.&lt;/li>
&lt;li>Design masking strategies appropriate for medical imaging: spatial masking for 2D images, volumetric masking for 3D scans, temporal masking for sequences, and anatomy-aware masking that respects organ boundaries.&lt;/li>
&lt;li>Create data loaders supporting high-resolution medical images, 3D volumes, and multi-modal inputs (e.g., multiple MRI sequences).&lt;/li>
&lt;/ul>
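&lt;p>As an illustration of the masking-strategy design, the sketch below implements a simple spatial block mask of the kind JEPA-style pre-training uses on 2D images; the grid and block sizes are arbitrary choices for the example:&lt;/p>

```python
# Hide one contiguous block of patches; the context encoder sees the
# rest and the predictor must infer the hidden block in embedding space.
import numpy as np

rng = np.random.default_rng(0)

def block_mask(grid_h, grid_w, block_h, block_w):
    """Return a boolean (grid_h, grid_w) mask with one hidden block."""
    mask = np.zeros((grid_h, grid_w), dtype=bool)
    top = rng.integers(0, grid_h - block_h + 1)
    left = rng.integers(0, grid_w - block_w + 1)
    mask[top:top + block_h, left:left + block_w] = True
    return mask

mask = block_mask(14, 14, 6, 6)          # 14x14 patch grid, 6x6 target block
context_idx = np.flatnonzero(~mask)      # patches the context encoder sees
target_idx = np.flatnonzero(mask)        # patches the predictor must infer
print(f"context patches: {context_idx.size}, target patches: {target_idx.size}")
```

Volumetric and temporal masking extend the same idea to a third axis, and anatomy-aware variants would constrain the block to respect organ boundaries.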
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 2: JEPA Model Implementation for Medical Imaging&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Implement LeJEPA for 2D medical images:
&lt;ul>
&lt;li>Adapt the joint-embedding predictive architecture to medical image characteristics (high resolution, subtle features, domain-specific patterns)&lt;/li>
&lt;li>Apply Sketched Isotropic Gaussian Regularization (SIGReg) to learn clinically meaningful embedding distributions&lt;/li>
&lt;li>Maintain a single trade-off hyperparameter and heuristics-free training for reproducibility across medical imaging centers&lt;/li>
&lt;li>Support various encoder architectures: Vision Transformers for global context, ConvNets for local features, and hybrid approaches&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Extend to V-JEPA for medical video and volumetric data:
&lt;ul>
&lt;li>Spatiotemporal encoding for cardiac MRI sequences, surgical videos, and time-series medical imaging&lt;/li>
&lt;li>Temporal prediction objectives for understanding disease progression and treatment response&lt;/li>
&lt;li>3D volume processing for CT and MRI scans with efficient memory management&lt;/li>
&lt;li>Multi-slice and multi-sequence learning for comprehensive medical imaging contexts&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Develop medical domain-specific enhancements:
&lt;ul>
&lt;li>Multi-scale representation learning to capture both fine-grained pathological details and global anatomical context&lt;/li>
&lt;li>Interpretability mechanisms: attention visualization, feature attribution, and embedding-space analysis for clinical validation&lt;/li>
&lt;li>Robustness to domain shift: training strategies that generalize across different scanners, protocols, and institutions&lt;/li>
&lt;li>Privacy-preserving training compatible with medical data regulations (HIPAA, GDPR)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Implement efficient training infrastructure:
&lt;ul>
&lt;li>Distributed training across multiple GPUs for large medical imaging datasets&lt;/li>
&lt;li>Memory-efficient processing of high-resolution images and 3D volumes&lt;/li>
&lt;li>Checkpoint management and model versioning for clinical deployment pipelines&lt;/li>
&lt;li>A minimal-code implementation (≈50-100 lines) demonstrating the framework&amp;rsquo;s simplicity&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
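&lt;p>As a rough, simplified stand-in for the SIGReg idea (not the paper&amp;rsquo;s exact statistic), the sketch below projects embeddings onto random directions and penalizes deviations of each 1D projection from standard-Gaussian moments, which gives some intuition for why such a regularizer discourages representation collapse:&lt;/p>

```python
# Moment-matching toy version of sketched isotropic-Gaussian
# regularization: 1D random projections should look like N(0, 1).
import numpy as np

rng = np.random.default_rng(0)

def sigreg_moment_penalty(z, n_dirs=16):
    # Random unit directions: cheap 1D "sketches" of the embedding cloud
    dirs = rng.normal(size=(z.shape[1], n_dirs))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    proj = z @ dirs                      # (batch, n_dirs)
    # Penalize deviation of each projection's mean/variance from N(0, 1)
    mean_pen = np.mean(proj.mean(axis=0) ** 2)
    var_pen = np.mean((proj.var(axis=0) - 1.0) ** 2)
    return mean_pen + var_pen

p_iso = sigreg_moment_penalty(rng.normal(size=(512, 64)))      # healthy
p_collapsed = sigreg_moment_penalty(np.ones((512, 64)) * 0.1)  # collapsed
print(f"isotropic: {p_iso:.4f}, collapsed: {p_collapsed:.4f}")
```

Collapsed embeddings have zero variance along every projection, so the penalty is large for them and near zero for an isotropic cloud.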
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 3: Evaluation &amp;amp; Safety Validation&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Disease classification tasks:
&lt;ul>
&lt;li>Multi-label chest X-ray classification: 14 pathology classes on ChestX-ray14 and MIMIC-CXR&lt;/li>
&lt;li>Diabetic retinopathy grading: 5-class classification on EyePACS and APTOS&lt;/li>
&lt;li>Skin lesion classification: 7-class classification on HAM10000&lt;/li>
&lt;li>Brain tumor classification: glioma grading on the BraTS dataset&lt;/li>
&lt;li>Evaluation with linear probing, few-shot learning (5-shot, 10-shot), and full fine-tuning&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Lesion detection and segmentation:
&lt;ul>
&lt;li>Lung nodule detection on the LIDC-IDRI dataset&lt;/li>
&lt;li>Tumor segmentation on Medical Segmentation Decathlon tasks&lt;/li>
&lt;li>Polyp detection in colonoscopy videos&lt;/li>
&lt;li>Cardiac structure segmentation in MRI sequences&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Clinical prediction tasks:
&lt;ul>
&lt;li>Survival prediction from histopathology slides&lt;/li>
&lt;li>Disease progression prediction from longitudinal imaging&lt;/li>
&lt;li>Treatment response assessment from pre/post imaging pairs&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Few-shot and low-data regime evaluation:
&lt;ul>
&lt;li>Systematic evaluation with 1%, 5%, 10%, 25%, and 50% of labeled training data&lt;/li>
&lt;li>Comparison against supervised baselines and ImageNet pre-training&lt;/li>
&lt;li>Analysis of annotation efficiency: performance vs. number of labeled examples required&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
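&lt;p>The few-shot protocol above can be sketched as a frozen-encoder linear probe swept over label fractions; the synthetic features, label counts, and probe choice below are illustrative assumptions, not the project&amp;rsquo;s benchmark code:&lt;/p>

```python
# Fit a linear probe on increasing label fractions and track accuracy
# vs. annotation budget; synthetic features stand in for embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
y = rng.integers(0, 5, size=2000)                  # e.g., 5 DR grades
emb = rng.normal(size=(2000, 32)) + np.eye(5)[y] @ rng.normal(size=(5, 32))

emb_tr, emb_te, y_tr, y_te = train_test_split(emb, y, test_size=0.5, random_state=0)

results = {}
for frac in [0.01, 0.05, 0.10, 0.25, 0.50]:
    n = max(10, int(frac * len(y_tr)))             # labeled subset size
    probe = LogisticRegression(max_iter=1000).fit(emb_tr[:n], y_tr[:n])
    results[frac] = probe.score(emb_te, y_te)

for frac, acc in results.items():
    print(f"{frac:>5.0%} labels: accuracy {acc:.3f}")
```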
&lt;/li>
&lt;/ul>
&lt;h3 id="project-deliverables">Project Deliverables&lt;/h3>
&lt;p>This project will deliver three components: software implementation, clinical evaluation, and practical deployment resources. The software implementing MedJEPA will be hosted on GitHub as an open-access repository with modular code supporting multiple medical imaging modalities (2D images, 3D volumes, videos), pre-trained model checkpoints on major medical imaging datasets (chest X-rays, histopathology, MRI), training and evaluation scripts with medical imaging-specific preprocessing pipelines, privacy-preserving training implementations compatible with clinical data regulations, and comprehensive documentation including tutorials for medical AI researchers and clinicians. The evaluation results will include benchmarks on 10+ medical imaging datasets across diverse modalities and clinical tasks, few-shot learning analysis demonstrating annotation efficiency gains, cross-institutional validation studies showing robustness to domain shift, interpretability visualizations enabling clinical validation of learned representations, and detailed comparisons against supervised baselines and existing medical self-supervised methods.&lt;/p>
&lt;h3 id="neurohealth">NeuroHealth&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Self-Supervised Medical Image Representation Learning with JEPA&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Proficiency in Python, PyTorch, GitHub, JEPA&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/linsey-pang/">Linsey Pang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="references">References:&lt;/h3>
&lt;ul>
&lt;li>LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics - Randall Balestriero and Yann LeCun, arXiv 2024&lt;/li>
&lt;li>Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA) - Adrien Bardes et al., arXiv 2024&lt;/li>
&lt;li>Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture - Mahmoud Assran et al., CVPR 2023 (I-JEPA)&lt;/li>
&lt;li>ChestX-ray14: Hospital-Scale Chest X-Ray Database - &lt;a href="https://nihcc.app.box.com/v/ChestXray-NIHCC" target="_blank" rel="noopener">https://nihcc.app.box.com/v/ChestXray-NIHCC&lt;/a>&lt;/li>
&lt;li>Medical Segmentation Decathlon - &lt;a href="http://medicaldecathlon.com/" target="_blank" rel="noopener">http://medicaldecathlon.com/&lt;/a>&lt;/li>
&lt;li>MIMIC-CXR Database - &lt;a href="https://physionet.org/content/mimic-cxr/" target="_blank" rel="noopener">https://physionet.org/content/mimic-cxr/&lt;/a>&lt;/li>
&lt;li>The Cancer Imaging Archive (TCIA) - &lt;a href="https://www.cancerimagingarchive.net/" target="_blank" rel="noopener">https://www.cancerimagingarchive.net/&lt;/a>&lt;/li>
&lt;li>UK Biobank Imaging Study - &lt;a href="https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/imaging-data" target="_blank" rel="noopener">https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/imaging-data&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>NeuroHealth: AI-Powered Health Assistant</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/neurohealth/</link><pubDate>Mon, 19 Jan 2026 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/nelbl/neurohealth/</guid><description>&lt;h3 id="project-description">Project Description&lt;/h3>
&lt;p>[NeuroHealth] Intelligent health assistance systems are increasingly essential for improving healthcare accessibility, patient engagement, and clinical decision support. In primary care and preventive medicine, AI assistants help users understand symptoms, schedule appropriate appointments, and receive preliminary health guidance. Telemedicine applications include triage support, appointment scheduling optimization, and patient education based on health inquiries. In chronic disease management, these systems provide medication reminders, lifestyle recommendations, and timely alerts for medical follow-ups. Healthcare navigation applications include finding appropriate specialists, understanding treatment options, and coordinating care across multiple providers. In wellness and preventive care, intelligent assistants enhance health literacy by delivering personalized health information, screening recommendations, and proactive health management strategies. By leveraging natural language understanding and medical knowledge integration, these systems enhance healthcare access, reduce unnecessary emergency visits, and empower users to make informed health decisions across diverse populations.&lt;/p>
&lt;p>Traditional health information systems often provide generic responses that fail to account for individual health contexts, medical history, and personal circumstances. Existing symptom checkers and health chatbots primarily rely on rule-based logic or simple decision trees, limiting their ability to understand nuanced health inquiries, reason about complex symptom patterns, or provide contextually appropriate guidance. These systems struggle with interpreting ambiguous descriptions, adapting to users&amp;rsquo; health literacy levels, and generating personalized recommendations that account for individual medical constraints and preferences. To address these challenges, we propose NeuroHealth: AI-Powered Health Assistant, which leverages Large Language Models (LLMs) to create an intelligent conversational agent that synthesizes user health inquiries, symptom descriptions, and contextual information into actionable, personalized health guidance and appointment recommendations.&lt;/p>
&lt;p>By integrating LLM-based medical reasoning with structured clinical knowledge bases, NeuroHealth enhances symptom interpretation, appointment routing, and health education delivery. Unlike conventional systems that provide static responses from predetermined templates, NeuroHealth dynamically understands user intent, asks clarifying questions, assesses urgency levels, and generates appropriate recommendations, whether scheduling a doctor appointment, suggesting self-care measures, or directing users to emergency services. This fusion of LLM intelligence with validated medical knowledge enables a more accessible, adaptive, and helpful health assistance platform, bridging the gap between users seeking health information and appropriate medical care.&lt;/p>
&lt;h3 id="project-objectives">Project Objectives&lt;/h3>
&lt;p>Aligned with the vision of the 2026 Open Source Research Experience (OSRE), this project aims to develop an AI-Powered Health Assistant (NeuroHealth) to improve healthcare accessibility and patient engagement through intelligent conversational guidance. Healthcare systems face significant challenges in providing timely, personalized health information and connecting patients with appropriate care resources. Traditional symptom checkers and health information systems often deliver generic, rule-based responses that fail to account for individual contexts and struggle with natural language understanding.
&lt;/p>
&lt;p>To address these limitations, this project will leverage Large Language Models (LLMs) to create an intelligent health assistant that understands user health inquiries, interprets symptom descriptions, assesses urgency, and provides personalized recommendations including doctor appointment suggestions, self-care guidance, and healthcare navigation support. The core challenge lies in designing NeuroHealth as a safe, accurate, and user-friendly system capable of natural conversation, medical knowledge retrieval, and appropriate response generation while maintaining clinical safety guardrails. Unlike conventional health chatbots that follow rigid conversation flows, NeuroHealth will reason over user inputs, ask clarifying questions, and dynamically adapt responses based on context, resulting in more helpful, accurate, and appropriate health assistance. Below is an outline of the methodologies and models that will be developed in this project.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Step 1: Data Collection &amp;amp; Knowledge Base Construction&lt;/strong>:
Develop a comprehensive medical knowledge base integrating validated health information sources, symptom databases, condition descriptions, and appointment routing guidelines.
Collect and curate conversational health inquiry datasets from public medical Q&amp;amp;A forums, symptom checker logs, and healthcare chatbot interactions to create training and evaluation data.
Design structured representations for symptoms, conditions, urgency levels, and appointment recommendations to enable effective retrieval and reasoning.
Extract common health inquiry patterns, symptom descriptions, and user intent categories to inform conversation flow design.
Data sources can include public medical knowledge bases such as MedlinePlus, Mayo Clinic health information, clinical practice guidelines, and synthetic patient inquiry scenarios based on common healthcare use cases.
Implement data validation mechanisms to ensure medical accuracy and clinical safety compliance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 2: Model Development&lt;/strong>:
Design and implement an LLM-based conversational health assistant that integrates medical knowledge retrieval with natural language understanding and generation.
Develop a Retrieval-Augmented Generation (RAG) architecture that grounds LLM responses in validated medical information sources, reducing hallucination risks and ensuring factual accuracy.
Create prompt engineering strategies and reasoning frameworks that enable the system to: interpret symptom descriptions, assess urgency levels, ask appropriate clarifying questions, and generate personalized health guidance.
Implement a multi-component architecture including: intent recognition, symptom extraction, urgency assessment, appointment recommendation generation, and response formatting modules.
Develop clinical safety guardrails that detect high-risk scenarios requiring immediate medical attention and provide appropriate emergency guidance.
Design conversation management strategies that maintain context across multi-turn dialogues and adapt to users&amp;rsquo; health literacy levels.
The baseline architecture can leverage state-of-the-art models such as GPT-4, Claude, or open-source alternatives like Llama, Qwen, combined with medical knowledge retrieval systems.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Step 3: Evaluation &amp;amp; Safety Validation&lt;/strong>:
Benchmark NeuroHealth against existing symptom checkers and health chatbots, evaluating on metrics including response accuracy, appropriateness of appointment recommendations, urgency assessment precision, and user satisfaction.
Conduct human evaluation studies with healthcare professionals to assess clinical safety, response quality, and appropriateness of medical guidance.
Perform adversarial testing to identify potential failure modes, unsafe responses, or inappropriate recommendations under edge cases.
Conduct ablation studies to analyze the impact of retrieval-augmented generation, safety guardrails, and conversation management strategies on system performance.
Evaluate system performance across diverse health inquiry types including acute symptoms, chronic condition management, preventive care questions, and healthcare navigation requests.
Assess response quality across different user demographics and health literacy levels to ensure equitable access.
Optimize inference efficiency and response latency for real-time conversational interaction across web and mobile platforms.&lt;/p>
&lt;/li>
&lt;/ul>
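To make the retrieval-augmented flow in Step 2 concrete, here is a minimal sketch of grounding responses in a knowledge base with a safety guardrail. Everything in it is hypothetical (the toy knowledge base, urgency labels, and prompt format); a real system would use a vector store, an LLM API, and clinically validated content.

```python
# Minimal RAG-style sketch for a health assistant (illustrative only).
# The knowledge base, urgency rules, and prompt format are hypothetical.

KNOWLEDGE_BASE = [
    {"condition": "migraine", "keywords": {"headache", "nausea", "light"},
     "urgency": "routine"},
    {"condition": "possible cardiac event", "keywords": {"chest", "pain", "breath"},
     "urgency": "emergency"},
]

def retrieve(query: str, top_k: int = 1):
    """Rank knowledge-base entries by keyword overlap with the user's description."""
    words = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda e: len(e["keywords"] & words), reverse=True)
    return scored[:top_k]

def build_prompt(query: str) -> str:
    """Ground the LLM prompt in retrieved entries; escalate emergencies first."""
    entries = retrieve(query)
    if any(e["urgency"] == "emergency" for e in entries):
        return "SAFETY OVERRIDE: advise the user to seek emergency care."
    context = "; ".join(e["condition"] for e in entries)
    return f"Context: {context}\nUser: {query}\nAnswer with grounded guidance."

print(build_prompt("I have chest pain and shortness of breath"))
```

The guardrail check runs before any prompt is assembled, illustrating why urgency assessment sits upstream of response generation in the proposed architecture.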
&lt;h3 id="project-deliverables">Project Deliverables&lt;/h3>
&lt;p>This project will deliver three components: model development, evaluation and validation, and interactive demonstration. The software implementing the NeuroHealth system will be hosted on GitHub as an open-access repository with comprehensive documentation, deployment guides, and API specifications. The evaluation results, including benchmark comparisons against existing systems, clinical safety assessments, and user study findings, will be published alongside the GitHub repository. An interactive demo showcasing the conversational interface, symptom interpretation capabilities, and appointment recommendation generation will be provided to illustrate real-world application scenarios.&lt;/p>
&lt;h3 id="neurohealth">NeuroHealth&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: AI-Powered Health Assistant&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Proficiency in Python, GitHub, and LLMs&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/linsey-pang/">Linsey Pang&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="references">References:&lt;/h3>
&lt;ul>
&lt;li>Large Language Models in Healthcare - Singhal et al., Nature 2023&lt;/li>
&lt;li>Med-PaLM: Large Language Models for Medical Question Answering - Singhal et al., arXiv 2022&lt;/li>
&lt;li>Capabilities of GPT-4 on Medical Challenge Problems - Nori et al., arXiv 2023&lt;/li>
&lt;li>MedlinePlus Medical Encyclopedia - &lt;a href="https://medlineplus.gov/" target="_blank" rel="noopener">https://medlineplus.gov/&lt;/a>&lt;/li>
&lt;li>Clinical Practice Guidelines Database - &lt;a href="https://www.guidelines.gov/" target="_blank" rel="noopener">https://www.guidelines.gov/&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>LMS Toolkit</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/lms-toolkit/</link><pubDate>Tue, 13 Jan 2026 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/lms-toolkit/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/lms-toolkit" target="_blank" rel="noopener">EduLinq LMS Toolkit&lt;/a> is a suite of tools used by several courses at UCSC
to interact with LMSs (e.g., Canvas) from the command line or Python.
A &lt;a href="https://en.wikipedia.org/wiki/Learning_management_system" target="_blank" rel="noopener">Learning Management System&lt;/a> (LMS) is a system that institutions use to manage courses, assignments, students, and grades.
The most popular LMSs are
&lt;a href="https://en.wikipedia.org/wiki/Instructure#Canvas" target="_blank" rel="noopener">Canvas&lt;/a>,
&lt;a href="https://en.wikipedia.org/wiki/Blackboard_Learn" target="_blank" rel="noopener">Blackboard&lt;/a>,
&lt;a href="https://en.wikipedia.org/wiki/Moodle" target="_blank" rel="noopener">Moodle&lt;/a>,
and &lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">Brightspace&lt;/a>.
These tools can be very helpful, especially from an administrative standpoint, but can be hard to interact with.
They can be especially difficult when instructors and TAs want to do something that is not explicitly supported by their built-in GUIs
(e.g., when an instructor wants to use a special grading policy).
The LMS Toolkit project is an effort to create a single suite of command-line tools (along with a Python interface)
to connect to all the above mentioned LMSs in a simple and uniform way.
So, not only can instructors and TAs easily access and modify the data held in an LMS (like a student&amp;rsquo;s grades),
but they can also do it the same way on any LMS.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the LMS Toolkit.&lt;/p>
&lt;p>Currently, the LMS Toolkit supports Canvas, Moodle, and Blackboard.
But, the degree of support for each LMS varies.&lt;/p>
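The &amp;ldquo;simple and uniform&amp;rdquo; design can be pictured as a common interface with one backend per LMS. The class and method names below are hypothetical sketches, not the toolkit&amp;rsquo;s actual API:

```python
# Hypothetical sketch of an LMS-agnostic interface: each backend implements
# the same methods, so instructor scripts work unchanged on any LMS.
import abc

class LMSBackend(abc.ABC):
    @abc.abstractmethod
    def fetch_grades(self, course_id: str) -> dict:
        """Return {student_email: score} for a course."""

class CanvasBackend(LMSBackend):
    def fetch_grades(self, course_id: str) -> dict:
        # A real backend would call the Canvas REST API here.
        return {"student@ucsc.edu": 95.0}

def mean_grade(backend: LMSBackend, course_id: str) -> float:
    """Works with any backend, since it only depends on the shared interface."""
    grades = backend.fetch_grades(course_id)
    return sum(grades.values()) / len(grades)

print(mean_grade(CanvasBackend(), "cse101"))
```

Adding Moodle or Blackboard support then means writing another subclass, not changing any instructor-facing code.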
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2026 should fill out &lt;a href="https://forms.gle/Mr4YR3N35pWDb4uz7" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., comment on an issue that you want to tackle it as a part of OSRE/GSoC 2026).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="advanced-lms-support">Advanced LMS Support&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The LMS Toolkit already has basic read-write support for many core pieces of LMS functionality (e.g., working with grades and assignments).
However, there are still many more features that can be supported such as
&lt;a href="https://github.com/edulinq/lms-toolkit/issues/17" target="_blank" rel="noopener">group management&lt;/a>,
&lt;a href="https://github.com/edulinq/lms-toolkit/issues/7" target="_blank" rel="noopener">quiz management&lt;/a>,
&lt;a href="https://github.com/edulinq/lms-toolkit/issues/10" target="_blank" rel="noopener">quiz statistics&lt;/a>,
and &lt;a href="https://github.com/edulinq/lms-toolkit/issues/19" target="_blank" rel="noopener">assignment statuses&lt;/a>.&lt;/p>
&lt;p>The task for this project is to choose a set of advanced features
(not limited to those features mentioned above),
design an LMS-agnostic way to support those features,
and implement those features.
The flexibility in which features are chosen accounts for the variable size of this project.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>GitHub Issues
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/17" target="_blank" rel="noopener">Group Management&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/7" target="_blank" rel="noopener">Quiz Management&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/10" target="_blank" rel="noopener">Quiz Statistics&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/19" target="_blank" rel="noopener">Assignment Statuses&lt;/a>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="new-lms-support-brightspace">New LMS Support: Brightspace&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Batuhan Salih&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The goal of the LMS toolkit is to provide a single interface for all LMSs.
&lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">D2L Brightspace&lt;/a> is one of the more popular LMSs.
Naturally, the LMS Toolkit wants to support Brightspace as well.
However, a challenge in supporting Brightspace is that it is not open source (unlike Canvas and Moodle).
Therefore, support and testing on Brightspace may be very challenging.&lt;/p>
&lt;p>The task for this project is to add basic support for the Brightspace LMS.
It is not necessary to support all the same features that are supported for other LMSs,
but at least the core features of score and assignment management should be implemented.
The closed-source nature of Brightspace makes this a challenging and uncertain project.&lt;/p>
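Brightspace exposes its Valence REST API under the &lt;code>/d2l/api/&lt;/code> route prefix, which a backend would need to target. The sketch below only builds a route string; the version segment and endpoint shown are assumptions to verify against an institution&amp;rsquo;s instance and the official API reference:

```python
# Sketch of constructing a Brightspace (Valence) API route.
# The /d2l/api/ prefix follows D2L's documented route style, but the
# exact version and endpoint here are assumptions, not verified routes.

def grades_route(host: str, org_unit_id: int, version: str = "1.0") -> str:
    """Build a URL for a course's grade objects (org_unit_id = course)."""
    return f"https://{host}/d2l/api/le/{version}/{org_unit_id}/grades/"

print(grades_route("example.brightspace.com", 12345))
```

Authentication (Brightspace uses signed requests) is the harder part and is deliberately omitted here.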
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit" target="_blank" rel="noopener">Repository for LMS Toolkit&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://en.wikipedia.org/wiki/D2L#Brightspace" target="_blank" rel="noopener">Brightspace Wiki Page&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/lms-toolkit/issues/23" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Lynx Grader</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/autograder/</link><pubDate>Tue, 13 Jan 2026 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/autograder/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/autograder-server" target="_blank" rel="noopener">EduLinq Lynx Grader&lt;/a> (also referred to as &amp;ldquo;autograder&amp;rdquo;) is an open source tool used by several courses at UCSC
to safely and quickly grade programming assignments.
Grading student code is something that may seem simple at first (you just need to run their code!),
but quickly becomes exceedingly complex as you get into the details.
Specifically, grading a student&amp;rsquo;s code securely while providing the &amp;ldquo;last mile&amp;rdquo; service of getting code from students
and sending results to instructors/TAs and the course&amp;rsquo;s LMS (e.g., Canvas) can be very difficult.
The Lynx Grader provides all of this in a free and open source project.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the Lynx Grader.&lt;/p>
&lt;p>As an open source project, there are endless opportunities for development, improvements, and collaboration.
Here, we highlight some specific projects that will work well in the summer mentorship setting.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2026 should fill out &lt;a href="https://forms.gle/Mr4YR3N35pWDb4uz7" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., comment on an issue that you want to tackle it as a part of OSRE/GSoC 2026).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="llm-detection">LLM Detection&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>AI/ML&lt;/code> &lt;code>LLM&lt;/code> &lt;code>Research&lt;/code> &lt;code>Backend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, systems, data munging, go, docker&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>As &lt;a href="https://en.wikipedia.org/wiki/Large_language_model" target="_blank" rel="noopener">Large Language Model (LLM)&lt;/a> tools like ChatGPT become more common and powerful,
instructors need tools to help determine if students are the actual authors of the code they submit.
More classical instances of plagiarism are often discovered by code similarity tools like &lt;a href="https://theory.stanford.edu/~aiken/moss/" target="_blank" rel="noopener">MOSS&lt;/a>.
However, these tools are not sufficient for detecting code written not by a student,
but by an AI model like &lt;a href="https://en.wikipedia.org/wiki/ChatGPT" target="_blank" rel="noopener">ChatGPT&lt;/a> or &lt;a href="https://en.wikipedia.org/wiki/GitHub_Copilot" target="_blank" rel="noopener">GitHub Copilot&lt;/a>.&lt;/p>
&lt;p>The task for this project is to create a system that provides a score indicating the system&amp;rsquo;s confidence that a given piece of code was written by an AI tool and not a student.
This will supplement the existing code analysis tools in the Lynx Grader.
There are many approaches to completing this task that will be considered.
A more software-development-oriented approach could leverage existing systems to create a production-ready tool,
whereas a more research-oriented approach could develop a novel detection method, complete with a paper and experiments.&lt;/p>
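As a rough illustration of what the confidence score might look like, here is a toy heuristic combining simple stylometric signals. It is purely hypothetical, not a real detector, and the signals it uses (comment density, identifier-length uniformity) are only examples of features exploratory work might consider:

```python
# Toy authorship-confidence score (hypothetical heuristic, NOT a real
# detector): combines simple stylometric signals as an illustration.
import statistics

def ai_likelihood(code: str) -> float:
    """Return a score in [0, 1]; higher = more 'AI-like' by this toy heuristic."""
    lines = [line for line in code.splitlines() if line.strip()]
    if not lines:
        return 0.0
    comment_ratio = sum(line.lstrip().startswith("#") for line in lines) / len(lines)
    idents = [w for line in lines for w in line.split() if w.isidentifier()]
    # Very uniform identifier lengths is one (weak) signal of generated code.
    spread = statistics.pstdev([len(w) for w in idents]) if idents else 0.0
    uniformity = 1.0 / (1.0 + spread)
    return min(1.0, 0.5 * comment_ratio + 0.5 * uniformity)

score = ai_likelihood("def add(a, b):\n    # add two numbers\n    return a + b\n")
print(round(score, 2))
```

A real system would likely replace these hand-picked features with model-based signals (e.g., perplexity under a code LLM), which is exactly the kind of design decision this project explores.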
&lt;p>There has been &lt;a href="https://github.com/anvichip/AI-code-detection-ML/blob/main/experiment/report.md" target="_blank" rel="noopener">previous work on this issue&lt;/a>,
where a student did a survey of existing solutions, collection of initial datasets, and exploratory experiments on possible directions.
This project would build off of this previous work.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server" target="_blank" rel="noopener">Repository for Lynx Grader Server&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/140" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="code-analysis-gui">Code Analysis GUI&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Frontend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, frontend, data munging, js, css, go&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Lynx Grader has existing functionality to analyze the code in a student&amp;rsquo;s submission for malicious content.
Relevant to this project is that the Lynx Grader can run a pairwise similarity analysis against all submitted code.
This is how most existing software plagiarism systems detect offending code.
The existing infrastructure provides detailed statistics on code similarity,
but does not currently have a visual way to display this data.&lt;/p>
&lt;p>The task for this project is to create a web GUI using the Lynx Grader REST API
to display the results of a code analysis.
The size of this project depends on how many of the existing features are going to be supported by the web GUI.&lt;/p>
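One natural first step for the GUI is turning the pairwise results into a matrix that can be rendered as a heatmap. The record format below is a guess for illustration; the real schema is in the Lynx Grader&amp;rsquo;s sample API data linked below:

```python
# Sketch of turning pairwise-analysis results into a symmetric matrix a
# GUI could render as a heatmap. The input record format is hypothetical.

def similarity_matrix(pairs):
    """pairs: [{'a': user, 'b': user, 'score': float}] -> (users, matrix)."""
    users = sorted({p["a"] for p in pairs} | {p["b"] for p in pairs})
    index = {u: i for i, u in enumerate(users)}
    matrix = [[0.0] * len(users) for _ in users]
    for p in pairs:
        i, j = index[p["a"]], index[p["b"]]
        matrix[i][j] = matrix[j][i] = p["score"]  # similarity is symmetric
    return users, matrix

users, m = similarity_matrix([{"a": "alice", "b": "bob", "score": 0.87}])
print(users, m)
```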
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">Repository for Lynx Grader Web GUI&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/142" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/blob/main/internal/model/analysis.go#L78" target="_blank" rel="noopener">Pairwise Code Analysis Type&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-py/blob/v0.6.16/tests/api/testdata/courses/assignments/analysis/courses_assignments_submissions_analysis_pairwise_wait.json" target="_blank" rel="noopener">Sample API Data&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="web-gui">Web GUI&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Frontend&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, frontend, js, css&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Fabrice Kurmann&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Lynx Grader contains dozens of &lt;a href="https://github.com/edulinq/autograder-server/blob/main/resources/api.json" target="_blank" rel="noopener">API endpoints&lt;/a>,
most directly representing a piece of functionality exposed to the user.
All of these features are exposed in the &lt;a href="https://github.com/edulinq/autograder-py" target="_blank" rel="noopener">Lynx Grader&amp;rsquo;s Python Interface&lt;/a>.
However, the Python interface is a purely command-line interface.
And although command-line interfaces are objectively (read: subjectively) the best,
a web GUI would be more accessible to a wider audience.
The autograder already has a &lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">web GUI&lt;/a>,
but it does not cover all the features available in the Lynx Grader.&lt;/p>
&lt;p>The task for this project is to augment the Lynx Grader&amp;rsquo;s web GUI with more features.
Specifically, add support for more tools used to create and administer courses.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-web" target="_blank" rel="noopener">Repository for Lynx Grader Web GUI&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/issues/61" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-server/blob/main/resources/api.json" target="_blank" rel="noopener">Lynx Grader API Endpoints&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/autograder-py" target="_blank" rel="noopener">Lynx Grader&amp;rsquo;s Python Interface&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Quiz Composer</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/quiz-composer/</link><pubDate>Tue, 13 Jan 2026 13:00:00 -0800</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre26/ucsc/quiz-composer/</guid><description>&lt;p>The &lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">EduLinq Quiz Composer&lt;/a> (also called the &amp;ldquo;Quiz Generator&amp;rdquo;) is a tool used by several courses at UCSC
to create and maintain platform-agnostic quizzes (including exams and worksheets).
Knowledge assessments like quizzes, exams, and tests are a core part of the learning process for many courses.
However, maintaining banks of questions, collaborating on new questions, and converting quizzes to new formats can consume a lot of time,
taking time away from actually improving course materials.
The Quiz Composer helps by providing a single text-based format that can be stored in a repository and &amp;ldquo;compiled&amp;rdquo; into many different formats including:
HTML, LaTeX, PDF, Canvas, GradeScope, and QTI.
The &lt;a href="https://linqs.org" target="_blank" rel="noopener">LINQS Lab&lt;/a> has made many contributions to maintain and improve the Quiz Composer.&lt;/p>
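To give a feel for the single-source idea, here is a hypothetical question file sketched as a Python dict. The field names are illustrative only; the actual schema is documented in the Quiz Composer repository:

```python
# Hypothetical sketch of a text-based question (field names are
# illustrative; see the Quiz Composer docs for the real schema).
# Storing questions as JSON + Markdown lets one source "compile" to
# HTML, LaTeX, PDF, Canvas, GradeScope, or QTI.
import json

question = {
    "type": "multiple_choice",                 # one of several question types
    "prompt": "What does **LMS** stand for?",  # Markdown allowed in prompts
    "choices": {
        "Learning Management System": True,    # correct answer
        "Large Model Service": False,
    },
}

print(json.dumps(question, indent=2))
```

Because the source is plain text, quizzes can live in a git repository and be reviewed like any other code.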
&lt;p>As an open source project, there are endless opportunities for development, improvements, and collaboration.
Here, we highlight some specific projects that will work well in the summer mentorship setting.&lt;/p>
&lt;p>All students interested in LINQS projects for OSRE/GSoC 2026 should fill out &lt;a href="https://forms.gle/Mr4YR3N35pWDb4uz7" target="_blank" rel="noopener">this form&lt;/a>.
Towards the end of the application window, we will contact those who we believe to be a good fit for a LINQS project.
The form will stop accepting responses once the application window closes.
Do not post on any of the project repositories about OSRE/GSoC
(e.g., comment on an issue that you want to tackle it as a part of OSRE/GSoC 2026).
Remember, these are active repositories that were not created for OSRE/GSoC.&lt;/p>
&lt;h3 id="canvas-import">Canvas Import&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, http request inspection, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Quiz Composer houses quizzes and quiz questions in a simple and unambiguous format based
on &lt;a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON&lt;/a> and &lt;a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown&lt;/a> (specifically, the &lt;a href="https://commonmark.org" target="_blank" rel="noopener">CommonMark specification&lt;/a>).
This allows the Quiz Composer to unambiguously create versions of the same quiz in many different formats.
However, creating a quiz in the Quiz Composer format can be a daunting task for those not familiar with JSON or Markdown.
Instead, it would be easier for people to import quizzes from another format into the Quiz Composer format,
and then edit it as they see fit.
Unfortunately, not all other quiz formats are unambiguous; Canvas, in this case, is not.&lt;/p>
&lt;p>The task for this project is to implement the functionality of importing quizzes from Canvas to the standard Quiz Composer format.
The ambiguous nature of Canvas quizzes makes this task non-trivial,
and adds an additional element of design decisions to this task.
It will be impossible to import quizzes 100% correctly,
but we want to be able to get close enough that most people can import their quizzes without issue.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer/issues/27" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="google-forms-export">Google Forms Export&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, rest api, data munging, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The Quiz Composer can export quizzes to many different formats,
each with a varying level of interactivity and feature support.
For example, quizzes can be exported to PDFs that are printed out, with students writing down their answers to be checked later.
Quizzes can also be exported to interactive platforms like Canvas where students can enter answers that may be automatically checked with feedback immediately provided to the student.
One potential platform with functionality somewhere between the above two examples is &lt;a href="https://workspace.google.com/products/forms/" target="_blank" rel="noopener">Google Forms&lt;/a>.
A &amp;ldquo;form&amp;rdquo; (the base entity on Google Forms) can be something like a survey or, more recently, a quiz.&lt;/p>
&lt;p>The task for this project is to add support for exporting quizzes from the Quiz Composer to Google Forms.
There is a large overlap in the quiz features supported in Canvas (which the Quiz Composer already supports) and Google Forms,
so most settings should be fairly straightforward.
There may be some design work around deciding what features are specific to one quiz platform
and what features can be abstracted to work across several platforms.&lt;/p>
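The Forms API creates questions through &lt;code>batchUpdate&lt;/code> requests. Below is a sketch of building one such request body for a multiple-choice question; the field names follow the Forms API&amp;rsquo;s &lt;code>createItem&lt;/code> shape as we understand it, but should be verified against the current API reference before being relied on:

```python
# Sketch of a Google Forms batchUpdate payload for one multiple-choice
# question. Field names follow the Forms API's createItem request shape,
# but verify them against the current API reference before relying on them.

def choice_question_request(title, options, index=0):
    """Build one createItem request inserting a radio-button question."""
    return {
        "createItem": {
            "item": {
                "title": title,
                "questionItem": {
                    "question": {
                        "required": True,
                        "choiceQuestion": {
                            "type": "RADIO",
                            "options": [{"value": o} for o in options],
                        },
                    }
                },
            },
            "location": {"index": index},
        }
    }

body = {"requests": [choice_question_request("2 + 2 = ?", ["3", "4"])]}
print(body["requests"][0]["createItem"]["item"]["title"])
```

The exporter would send this body to the Forms API (e.g., via the Google API client for Python) after authenticating, which is omitted here.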
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer/issues/19" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="template-questions">Template Questions&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Backend&lt;/code> &lt;code>Teaching Tools&lt;/code> &lt;code>API&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> software development, backend, data munging, python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate-Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:linqs.osre26@gmail.com">Eriq Augustine&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lucas Ellenberger&lt;/a>, &lt;a href="mailto:linqs.osre26@gmail.com">Lise Getoor&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Questions in the Quiz Composer are described using &lt;a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON&lt;/a> and &lt;a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown&lt;/a>
files which contain the question prompt, possible answers, and the correct answer.
(Of course, there are many different &lt;a href="https://github.com/edulinq/quiz-composer/blob/main/docs/question-types.md" target="_blank" rel="noopener">question types&lt;/a>,
each with different semantics and requirements.)
However, a limitation of this is that each question is always the same.
You can have multiple copies of a question with slightly different prompts, numbers, and answers;
but you are still limited to each question being static and unchanging.
It would be useful to have &amp;ldquo;template questions&amp;rdquo; that can dynamically create static questions from a template
and collection of replacement data.&lt;/p>
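The template idea can be sketched in a few lines. The placeholder syntax (Python &lt;code>str.format&lt;/code>-style) and field names below are hypothetical; the actual design lives in the linked GitHub issue:

```python
# Sketch of expanding a "template question" into static questions.
# The template syntax and field names are hypothetical illustrations.

TEMPLATE = {
    "prompt": "What is {a} + {b}?",
    "answer": "{total}",
}

def instantiate(template, data):
    """Fill one template with one row of replacement data."""
    return {key: value.format(**data) for key, value in template.items()}

# Each row of replacement data yields one static, unchanging question.
rows = [{"a": "2", "b": "3", "total": "5"}, {"a": "10", "b": "4", "total": "14"}]
questions = [instantiate(TEMPLATE, row) for row in rows]
print(questions[0])
```

The interesting design work is in what the replacement data may contain (literal values, computed expressions, per-student seeds) and how instantiated questions flow into the existing export pipeline.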
&lt;p>The task for this project is to add support for the &amp;ldquo;template questions&amp;rdquo; discussed above.
Much of the high-level design work for this issue has &lt;a href="https://github.com/edulinq/quiz-composer/issues/26" target="_blank" rel="noopener">already been completed&lt;/a>.
But the implementation and low-level design decisions are still left to do.&lt;/p>
&lt;p>See Also:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer" target="_blank" rel="noopener">Repository for Quiz Composer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/edulinq/quiz-composer/issues/26" target="_blank" rel="noopener">GitHub Issue&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Writing a blog about your OSRE 2026 project</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre26/ucsc/admin/20241021-admin/</link><pubDate>Mon, 21 Oct 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre26/ucsc/admin/20241021-admin/</guid><description>&lt;p>OSRE participants are required to blog three times during their summer program. The first blog is a chance to introduce yourself and your project. The second blog occurs around the mid-point of the project, and a final blog post is expected as part of your final project deliverable. The organization administrator will send emails with specific dates. Instructions for the blog are indicated below. All blogs should include links to proposals, presentations, and any deliverables/products, as well as an overview of the student&amp;rsquo;s experience. Check out the student pages from previous years to get an idea of content and size.&lt;/p>
&lt;p>We will also ask students and contributors to provide regular status updates, which will help track your activities. The organization administrator will provide more details once the program work begins.&lt;/p>
&lt;h2 id="making-a-pull-request-for-your-blog">Making a pull request for your blog&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Fork the &lt;a href="https://github.com/ucsc-ospo/ucsc-ospo.github.io" target="_blank" rel="noopener">git repository&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If you haven&amp;rsquo;t already done so, add your profile using &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/osredocs/formentors/#instructions-for-adding-a-mentor">these instructions&lt;/a>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>IMPORTANT&lt;/strong>: Under &lt;code>user_groups:&lt;/code> add &lt;code>- 2026 Contributors&lt;/code> (as opposed to either of the two mentor groups)&lt;/li>
&lt;li>The short bio and any other information go below the frontmatter&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Post your blog&lt;/p>
&lt;ul>
&lt;li>Add &lt;code>/content/report/osre26/ORGANIZATION/PROJECTNAME/DATE-USERNAME/index.md&lt;/code>&lt;/li>
&lt;li>Add frontmatter to &lt;code>index.md&lt;/code>, using the labels below&lt;/li>
&lt;li>Blog text goes below the frontmatter&lt;/li>
&lt;li>In that same directory include a picture and call it &lt;code>featured.png&lt;/code> (also supports &lt;code>.jpg&lt;/code>, &lt;code>.jpeg&lt;/code>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Commit to your fork and make a pull request. &lt;a href="mailto:ospo-info-group@ucsc.edu">Email OSRE Admins&lt;/a> with questions.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="example-frontmatter-and-text-body">Example frontmatter and text body&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">---
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">title: &amp;#34;YOUR TITLE&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">subtitle: &amp;#34;YOUR SUBTITLE (OPTIONAL)&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">summary:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">authors:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> - USERNAME1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> - USERNAME2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">tags: [&amp;#34;osre26&amp;#34;]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">categories: []
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">date: YYYY-MM-DD
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">lastmod: YYYY-MM-DD
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">featured: false
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">draft: false
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Featured image
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># To use, add an image named `featured.jpg/png` to your page&amp;#39;s folder.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">image:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> caption: &amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> focal_point: &amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> preview_only: false
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">---
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">As part of the [PROJECTNAME](/project/osre26/ORGANIZATION/PROJECTNAME) my [proposal](https://...) under the mentorship of MENTOR aims to ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item></channel></rss>