gsoc | UCSC OSPO

Scenic-RoboSuite Integration: Building the First Working Prototype

Mon, 29 Sep 2025 00:00:00 +0000

I’m Sahil, presenting the first working prototype of the Scenic-RoboSuite integration. This project is being mentored by Daniel Fremont and Eric Vin.

After months of development, we have achieved a functional prototype of the Scenic-RoboSuite interface. Researchers can now write basic declarative robotic manipulation scenarios in Scenic that execute with physics simulation in RoboSuite. While still in development, the prototype demonstrates the feasibility and potential of bridging probabilistic scenario generation with detailed robot control.

Major Achievements

MJCF XML Injection

The interface introduces direct MJCF XML support, allowing Scenic to build RoboSuite-native manipulable objects from raw XML definitions. Users can define custom objects with complex mesh geometries, textures, and physics properties directly in their Scenic scenarios:

dragon_xml = '''
<mujoco>
  <asset>
    <!-- The mesh is named explicitly so the geom below can reference it -->
    <mesh name="dragon_mesh" file="dragon.stl" scale="0.01 0.01 0.01"/>
    <texture file="dragon_texture.png"/>
  </asset>
  <worldbody>
    <body name="object">
      <geom mesh="dragon_mesh" type="mesh"/>
    </body>
  </worldbody>
</mujoco>
'''

dragon = new CustomObject with mjcfXml dragon_xml

The system automatically handles collision geometry generation, joint creation for physics, and asset file resolution.

Complex Mesh Object Support

Import and manipulate arbitrary 3D models (STL, OBJ) with automatic mesh repair and texture mapping. The interface resolves file paths relative to Scenic files, copies assets to temporary directories for MuJoCo, and converts textures (JPG to PNG) when needed. This enables using custom robotic tools, industrial parts, or any 3D model in manipulation scenarios.

Custom Arena Definition

Define complete custom environments using MJCF XML, extending beyond RoboSuite’s built-in arenas:

custom_arena = new CustomArena with arenaXml localPath("warehouse.xml")

This allows creating specialized workspaces, factory floors, or research-specific environments while maintaining full physics simulation.

Multi-Robot Support

The interface handles multiple robots operating in the same workspace:

robot1 = new Panda at (-0.5, 0, 0)
robot2 = new UR5e at (0.5, 0, 0)
table = new Table at (0, 0, 0.425)

Each robot maintains independent control and can execute coordinated or individual behaviors.

Built-in Manipulation Behaviors

Ready-to-use behaviors for immediate testing and development:

MoveToPosition - Precise end-effector positioning
PickObject - Automated grasping with approach and closure
LiftToHeight - Controlled lifting to target heights
PickAndLift - Complete pick-and-lift sequence

These behaviors use Operational Space Control (OSC) for intuitive 3D movement commands.

Extended Environment Configuration

The interface extends RoboSuite’s configurability through Scenic’s parameter system:

param controller_config = {'type': 'OSC_POSITION', 'impedance': 'low'}
param camera_view = 'robot0_eye_in_hand'
param lite_physics = True # Faster simulation for testing

Example: Probabilistic Pick-and-Place

model scenic.simulators.robosuite.model

# Randomly position cube on table
table = new Table at (0.6, 0, 0.425)
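cube = new Box on table,
 with color (1, 0, 0, 1)

# 'on table' samples the position; keep it within 0.2 m of the table center
require abs(cube.position.x - table.position.x) <= 0.2
require abs(cube.position.y - table.position.y) <= 0.2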

# Robot adapts to random cube position
behavior AdaptivePickup():
 do PickAndLift(cube, height=1.1)

ego = new Panda at (0, 0, 0),
 with behavior AdaptivePickup()

Each scenario run generates a different cube position, testing the robot’s adaptive capabilities.

Challenges Overcome

Understanding Dual Architecture Paradigms

RoboSuite and Scenic operate on fundamentally different principles. RoboSuite builds environments imperatively through MuJoCo XML composition, expecting complete scene specification upfront. Scenic generates scenes probabilistically through constraint solving, requiring geometric knowledge before simulation. Bridging these required developing a two-pass system where we first extract geometry from a temporary RoboSuite environment, update Scenic’s understanding, then create the final simulation. This architectural mismatch touched every aspect of the integration, from object creation to property updates.

Discovering and Extending ManipulationEnv

RoboSuite’s documentation focuses on using pre-built tasks, not creating custom environments. Through extensive source code analysis, we discovered that ManipulationEnv was the key - it accepts robots as configuration while allowing customizable arenas and objects as components. This class became our foundation, but required significant extension. We implemented ScenicManipulationEnv to intercept Scenic’s object configurations, handle dynamic arena selection (EmptyArena vs MultiTableArena based on scene content), and manage the complex initialization sequence where robots, arenas, and objects must be assembled in a specific order for MuJoCo compilation.

XML to 3D Mesh Pipeline

Converting MJCF XML to usable 3D meshes proved complex. MuJoCo uses XML to describe geometry, but Scenic needs actual mesh data for collision checking. We built a multi-stage pipeline: First, ElementTree parses the XML to extract mesh references and primitive definitions. Then, we handle two paths - for mesh files, we load STL/OBJ files with trimesh and apply XML-specified transformations; for primitives (boxes, cylinders), we generate meshes programmatically. The challenge intensified with composite objects - a table might have a box tabletop and four cylinder legs. We developed ComponentExtractor to analyze the MuJoCo scene graph, identify related geometries through naming patterns and hierarchy, and export each component as a separate GLB file with proper world transforms preserved.
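The first stage of that pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name extract_geometry and the returned dictionary layout are assumptions made for this example, and the later stages (loading files with trimesh, generating primitive meshes, exporting GLB) are left out.

```python
import xml.etree.ElementTree as ET


def _floats(text):
    """Parse a whitespace-separated MJCF attribute such as "0.01 0.01 0.01"."""
    return [float(v) for v in text.split()]


def extract_geometry(mjcf_text):
    """Split the geoms of an MJCF document into mesh-backed and primitive ones.

    Hypothetical helper: the names and the return layout are illustrative.
    """
    root = ET.fromstring(mjcf_text)

    # Index mesh assets by name. When no name is given, fall back to the
    # file name without directory or extension.
    assets = {}
    for mesh in root.findall("./asset/mesh"):
        file = mesh.get("file", "")
        name = mesh.get("name") or file.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        assets[name] = {"file": file, "scale": _floats(mesh.get("scale", "1 1 1"))}

    mesh_geoms, primitives = [], []
    for geom in root.iter("geom"):
        if geom.get("mesh") is not None:
            # Mesh path: a real pipeline would load the file and apply the scale.
            mesh_geoms.append({"geom": geom.get("name"), **assets[geom.get("mesh")]})
        else:
            # Primitive path: a real pipeline would generate the mesh from type and size.
            primitives.append({
                "geom": geom.get("name"),
                "type": geom.get("type", "sphere"),
                "size": _floats(geom.get("size", "")),
            })
    return {"mesh_geoms": mesh_geoms, "primitives": primitives}


MJCF_EXAMPLE = """
<mujoco>
  <asset>
    <mesh name="dragon_mesh" file="dragon.stl" scale="0.01 0.01 0.01"/>
  </asset>
  <worldbody>
    <body name="table">
      <geom name="top" type="box" size="0.4 0.4 0.02"/>
      <geom name="leg0" type="cylinder" size="0.02 0.2"/>
      <geom name="dragon" type="mesh" mesh="dragon_mesh"/>
    </body>
  </worldbody>
</mujoco>
"""

if __name__ == "__main__":
    print(extract_geometry(MJCF_EXAMPLE))
```

Keeping the two paths separate at parse time means each later stage only has to handle one kind of input.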

File Path Resolution Discrepancies

Scenic and RoboSuite handle file paths completely differently. Scenic uses localPath() for paths relative to the scenario file, while RoboSuite expects paths relative to its package structure or absolute paths. MJCF XML compounds this - mesh references can be relative to the XML file location, not the calling code. We implemented a sophisticated path resolution system: detect whether paths come from embedded XML (relative to Scenic file) or external XML files (relative to XML location), copy all referenced assets (meshes, textures) to temporary directories accessible to MuJoCo, and handle texture format conversion (JPG to PNG) when needed. This system transparently manages assets whether they’re in the Scenic project, RoboSuite package, or absolute paths, making the interface truly portable.
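The asset-staging step described above could look roughly like this. It is a sketch under stated assumptions: stage_assets is a hypothetical name, base_dir stands for either the Scenic file's directory (embedded XML) or the XML file's directory (external XML), and texture conversion is omitted.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def stage_assets(mjcf_text, base_dir, staging_dir=None):
    """Copy every file referenced by an MJCF string into one directory.

    Relative references are resolved against base_dir, and the XML is
    rewritten to point at the staged copies. Hypothetical helper for
    illustration only.
    """
    base_dir = Path(base_dir)
    staging = Path(staging_dir) if staging_dir else Path(tempfile.mkdtemp(prefix="mjcf_assets_"))
    staging.mkdir(parents=True, exist_ok=True)

    root = ET.fromstring(mjcf_text)
    for elem in root.iter():
        ref = elem.get("file")
        if ref is None:
            continue
        source = Path(ref)
        if not source.is_absolute():
            source = base_dir / source
        if not source.is_file():
            raise FileNotFoundError(f"MJCF asset not found: {source}")
        # Note: two assets with the same file name would collide here;
        # a fuller version would disambiguate the target names.
        target = staging / source.name
        shutil.copy2(source, target)
        elem.set("file", str(target))

    return ET.tostring(root, encoding="unicode"), staging
```

Rewriting references to absolute staged paths means the simulator never needs to know where the scenario file lives.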

Impact and Applications

This bridge enables:

Research: Generate diverse manipulation scenarios for robot learning algorithms
Testing: Validate robotic systems against probabilistic task variations
Development: Rapid prototyping of manipulation tasks without manual scene setup
Education: Teach robotics concepts through declarative scenario specification

The integration makes complex robotic simulations accessible through Scenic’s intuitive language while preserving RoboSuite’s detailed physics and control capabilities.

Documentation and Resources

The project includes:

Example scenarios demonstrating all features
Comprehensive STATUS.md tracking working features and known issues
Technical documentation in docs/ covering architecture and troubleshooting
Mesh extraction utilities for pre-processing and caching

Current Status and Future Work

This prototype demonstrates that the Scenic-RoboSuite bridge is viable and functional. Basic features are working reliably:

Single-robot manipulation scenarios execute successfully
MJCF XML injection creates custom objects
Pick-and-place behaviors operate consistently
Multi-robot support functions in controlled scenarios

However, significant work remains:

Stability improvements: Some features work intermittently and need refinement
Velocity tracking: Full implementation awaits framework updates
Multi-robot coordination: Advanced synchronization primitives needed
Performance optimization: Mesh extraction and caching can be streamlined
Extended testing: More diverse scenarios and edge cases need validation

The prototype serves as a proof of concept, showing that probabilistic scenario specification can successfully drive physics-based robot simulation. The architecture is sound, the core features function, and the path forward is clear.

Conclusion

This working prototype of the Scenic-RoboSuite integration represents significant progress toward bridging probabilistic programming with robotic simulation. We’ve successfully demonstrated that declarative scenario specification can control detailed physics simulation, opening new possibilities for robotic system development and testing.

While not yet production-ready, the prototype provides a solid foundation for future development. Researchers can begin experimenting with basic manipulation scenarios, developers can test the interface with their use cases, and the community can contribute to making this bridge more robust and feature-complete.

The challenges overcome - from understanding dual architectures to implementing XML-to-mesh pipelines - have resulted in a functional system that validates our approach. This prototype proves that Scenic’s elegant scenario language and RoboSuite’s detailed physics can work together, setting the stage for a powerful new tool in robotics research and development.

Robot Manipulation with Scenic-RoboSuite

Wed, 30 Jul 2025 00:00:00 +0000

I’m Sahil, continuing work on the Scenic-RoboSuite integration for GSoC 2025. This project is mentored by Daniel Fremont and Eric Vin.

Since the last update, the Scenic-RoboSuite interface has made significant progress. The bidirectional bridge is now functional - robots can read sensor data and execute behaviors based on observations. However, these features are still in early stages and we’re working on making them more stable and consistent.

We’ve integrated RoboSuite’s Operational Space Control into Scenic. This control method lets you command the robot’s hand directly in 3D space (like “move 10cm left”) instead of calculating complex joint rotations. While the integration works, it’s rough around the edges and we’re currently focused on stabilizing it across different scenarios.
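To make the "move the hand in 3D space" idea concrete, here is a small sketch of turning a target hand position into a position-control action. It assumes an action layout of three normalized position deltas followed by one gripper value, and the function name, gain, and gripper convention are illustrative choices, not the interface's actual API.

```python
def osc_position_action(current, target, gain=2.0, gripper=-1.0):
    """Build a [dx, dy, dz, gripper] action that moves the hand toward target.

    Assumption for this sketch: each delta is a normalized command in
    [-1, 1], so large position errors are clipped and the hand approaches
    the target over several control steps.
    """
    deltas = [max(-1.0, min(1.0, gain * (t - c))) for c, t in zip(current, target)]
    return deltas + [gripper]


if __name__ == "__main__":
    # Hand at (0.5, 0, 1.0), target 25 cm further along x and 1 m lower.
    print(osc_position_action([0.5, 0.0, 1.0], [0.75, 0.0, 0.0]))
```

The behavior only reasons about where the hand should go; the controller handles the joint-level computation.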

The main challenge was architectural - RoboSuite expects all robot commands bundled together each timestep, while Scenic processes them one by one. We solved this with a pending actions system that collects everything first, then executes in one go. Time synchronization was another challenge, matching Scenic’s steps with MuJoCo’s physics.
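The pending actions idea can be sketched like this. The class and method names are hypothetical, and filling missing commands with zeros is an assumption made for the example (a zero delta meaning "hold still").

```python
class PendingActions:
    """Buffer per-robot actions and release them as one combined vector.

    Illustrative sketch: action_dims maps each robot name to the length of
    its action, in the order the simulator expects them.
    """

    def __init__(self, action_dims):
        self.action_dims = dict(action_dims)
        self._pending = {}

    def submit(self, robot, action):
        """Record one robot's action for the current timestep."""
        expected = self.action_dims[robot]
        if len(action) != expected:
            raise ValueError(f"{robot} expects {expected} values, got {len(action)}")
        self._pending[robot] = list(action)

    def flush(self):
        """Return the concatenated action for all robots and reset the buffer."""
        combined = []
        for robot, dim in self.action_dims.items():
            # Robots that issued no command this step get a zero action.
            combined.extend(self._pending.get(robot, [0.0] * dim))
        self._pending.clear()
        return combined


if __name__ == "__main__":
    buffer = PendingActions({"robot0": 4, "robot1": 4})
    buffer.submit("robot1", [0.1, 0.0, 0.0, 1.0])
    print(buffer.flush())  # one vector covering both robots
```

In this arrangement the simulator interface would call flush() once per timestep and pass the result to a single environment step.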

We’ve implemented a basic pick-and-place behavior for initial testing. The robot reads sensor data, calculates where to move, and adjusts continuously. It can successfully grasp and lift objects, though consistency varies between runs. The system supports three robot models and works with RoboSuite’s pre-built environments.

Custom world building is currently on hold. We’ve decided to focus on integrating existing RoboSuite features into Scenic first, then build Scenic’s capabilities like dynamic scenario randomization on top. For our first prototype, we’re aiming to extend the pick-and-place behavior into a full randomization demo - Scenic will randomly position the cube each run, and the robot will adapt to find and grasp it regardless of location.

The next two weeks focus on stabilizing current features and preparing this randomized scenario prototype. Expanding the behavior library and supporting additional environments will come in future phases after we have a solid foundation.

The core bridge between Scenic and RoboSuite is operational, but there’s significant work ahead to make it reliable and user-friendly.

Midway Through GSoC

Mon, 14 Jul 2025 00:00:00 +0000

Hello everyone! I’m Pratham Devadiga, and I’m thrilled to share a midterm progress update on my GSoC 2025 project with the Open Source Research Experience (OSRE). My project is focused on building the first open-source billion-scale vector embeddings dataset from real-world open source code to support benchmarking of Approximate Nearest Neighbor (ANN) algorithms and facilitate research in Retrieval-Augmented Generation (RAG).

Project Overview

The goal of this project is to address a critical gap in the ecosystem: existing ANN benchmarks are either synthetic or limited in scale. With the explosion of code-focused LLMs and embedding models, there’s a pressing need for:

High-volume, high-dimensional vector datasets built from real-world data (open-source codebases).
Open, reproducible benchmarks that reflect realistic RAG workloads.
A dataset that can be used to evaluate ANN libraries like FAISS, HNSW, and Annoy on massive and practical retrieval tasks.

Our approach is to use high-quality open-source code repositories to extract meaningful code chunks, encode them into vector embeddings using open models, and make these datasets publicly available with metadata for downstream benchmarking and analysis.

Progress So Far

We’ve made substantial foundational progress in the first half of the coding period. Key highlights:

Tested multiple embedding models such as CodeBERT, MiniLM-L6-v2, and all-mpnet-base-v2, evaluating trade-offs in speed, dimensionality, and GPU memory.
Selected codebert-base (768d) as the current model for phase one due to its stable performance and manageable resource footprint.
Implemented and validated a complete script pipeline to:
- Traverse large open-source repositories.
- Extract and chunk code intelligently (functions, classes, modules).
- Encode code into embeddings and attach metadata (repo, file path, license).
- Store results efficiently in parquet and NumPy formats.
Tested all components of the pipeline on sample datasets using multi-GPU setups, ensuring compatibility and robustness.
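As an illustration of the extract-and-chunk step, here is a minimal sketch that splits one Python file into function and class chunks and attaches metadata. It is not the project's chunker: it handles only Python, only top-level definitions, and the function name and record fields are assumptions for this example.

```python
import ast


def chunk_python_source(source, repo, path, license_id):
    """Split a Python file into top-level function and class chunks.

    Each chunk carries the metadata needed to trace it back to its origin.
    Hypothetical helper for illustration only.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                "name": node.name,
                "text": ast.get_source_segment(source, node),
                "repo": repo,
                "path": path,
                "license": license_id,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
    return chunks


if __name__ == "__main__":
    example = "import os\n\ndef f(x):\n    return x + 1\n\nclass A:\n    def m(self):\n        pass\n"
    for chunk in chunk_python_source(example, "demo/repo", "a.py", "MIT"):
        print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Chunking on syntactic boundaries keeps each embedded unit self-contained, which is the motivation for splitting by functions and classes rather than by fixed line counts.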

Challenges and Learnings

Building a billion-scale dataset from real-world codebases is no small task. Here’s what we’ve encountered and learned along the way:

1. Multi-GPU Pipeline Design

Naively parallelizing the embedding process caused memory overflow and deadlocks due to model reloading across processes. We refactored the code using torch.multiprocessing and pinned GPU contexts to avoid such issues, improving throughput on multi-GPU machines.
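The general shape of such a setup is one long-lived process per GPU, each loading the model once and working on its own slice of the data. The sketch below is not the project's code: the function names, the batch size, the first-token pooling, and the "microsoft/codebert-base" Hub identifier are all assumptions for illustration.

```python
def shard(items, rank, world_size):
    """Return the strided slice of items owned by worker `rank`."""
    return items[rank::world_size]


def embed_worker(rank, world_size, chunks, out_dir, batch_size=32):
    """Embed one shard of code chunks on GPU `rank`, loading the model once."""
    # Heavy imports live inside the worker so this module can be imported
    # on machines without GPU libraries installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    my_chunks = shard(chunks, rank, world_size)
    if not my_chunks:
        return

    torch.cuda.set_device(rank)  # pin this process to a single GPU
    device = torch.device("cuda", rank)
    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModel.from_pretrained("microsoft/codebert-base").to(device).eval()

    vectors = []
    with torch.no_grad():
        for start in range(0, len(my_chunks), batch_size):
            batch = my_chunks[start:start + batch_size]
            inputs = tokenizer(batch, padding=True, truncation=True,
                               max_length=512, return_tensors="pt").to(device)
            hidden = model(**inputs).last_hidden_state
            vectors.append(hidden[:, 0].cpu())  # first-token vector, 768 dims
    torch.save(torch.cat(vectors), f"{out_dir}/embeddings_rank{rank}.pt")


def run(chunks, out_dir):
    """Spawn one worker per visible GPU."""
    import torch
    import torch.multiprocessing as mp

    world_size = torch.cuda.device_count()
    mp.spawn(embed_worker, args=(world_size, chunks, out_dir),
             nprocs=world_size, join=True)
```

Because every worker owns a disjoint shard and its own device, no process reloads the model or competes for another GPU's memory.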

2. Embedding Trade-offs

We experimented with larger models but found that their generation time and memory use were too high to be practical in early phases. This helped us narrow down to scalable configurations for initial dataset generation.

3. Preparing for Scale

Although the full-scale embeddings have not been generated yet, all scripts are now modular, parallelized, and reproducible, ensuring a smooth transition to billion-scale data generation in the second half.

What’s Next

The second half of the project will focus on:

Scaling up embedding generation to >1B code chunks across hundreds of open-source repositories.
Running benchmarks using FAISS, HNSW, and Annoy on these embeddings.
Releasing the dataset on Hugging Face and AWS S3 with sharded access and metadata.
Writing a detailed benchmarking report comparing speed, accuracy, and memory trade-offs across ANN algorithms.
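For the accuracy side of that comparison, a common measure is recall@k: the fraction of the true k nearest neighbors that an ANN index returns. Whether the report will use exactly this metric is an assumption. The sketch below computes the exact neighbors by brute force in pure Python, which is only practical for small samples, and the function names are illustrative.

```python
def exact_top_k(query, vectors, k):
    """Return the indices of the k vectors closest to query (squared L2)."""
    distances = [
        (sum((q - v) ** 2 for q, v in zip(query, vector)), index)
        for index, vector in enumerate(vectors)
    ]
    return [index for _, index in sorted(distances)[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    truth = exact_top_k([0.1, 0.0], vectors, k=2)
    # Pretend an ANN index returned ids 0 and 2 for the same query.
    print(truth, recall_at_k([0, 2], truth))
```

Averaging this value over many queries gives one accuracy number per index configuration, which can then be set against query latency and memory use.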

Final Thoughts

This journey so far has taught me a lot about building large-scale ML pipelines, managing real-world compute constraints, and ensuring reproducibility for research-grade datasets. I’m grateful to my mentor Jayjeet Chakraborty and the OSRE team for their continuous support and guidance.

Excited for the next half, where the real scale begins!

Stay tuned for updates. You can find more about the project on my OSRE project page.

Building a Billion-Scale Vector Embeddings Dataset

Sun, 15 Jun 2025 00:00:00 +0000

Billion Vector Embeddings Dataset

As part of the Billion-Scale Embeddings Dataset project, my proposal under the mentorship of Jayjeet Chakraborty aims to create the first large-scale, real-world vector embeddings dataset—bridging the critical gap in Approximate Nearest Neighbor (ANN) benchmarks and Retrieval-Augmented Generation (RAG) systems.

Motivation

Existing ANN benchmarks often fall short—they’re either synthetic (like SIFT) or too small-scale (≤1M vectors). With the rapid evolution of LLM-based vector search systems (e.g., OpenAI’s 3072d text-embedding-3-large), there’s a growing need for:

High-dimensional (>1000d), large-scale (>100M) embeddings
Real-world distributions (Wikipedia-scale text)
Open, reproducible benchmarks for the community

Project Goals

Generate 1 billion embeddings from English Wikipedia using open-source models.
Create multiple dimensional variants: 1024d, 4096d, and 8192d.
Deduplicate, compress, and store embeddings with rich metadata (URL, timestamps, models).
Benchmark ANN performance on FAISS, HNSW, and Annoy.
Distribute the dataset via HuggingFace & AWS S3 with shard-level access.

Open Source Impact

ANN Libraries: Enable reproducible benchmarking for real-world workloads.
RAG Systems: Evaluate and optimize retrieval at scale using real Wikipedia text.
Researchers: Conduct large-scale studies on dimensionality, ANN accuracy, and compression trade-offs.

Introducing Scenic-RoboSuite Interface

Sun, 15 Jun 2025 00:00:00 +0000

Hey! I’m Sahil, working on integrating Scenic with RoboSuite for GSoC 2025. My project is mentored by Daniel Fremont and Eric Vin.

I’m connecting Scenic (a probabilistic programming language for scenarios) with RoboSuite (a robotics simulation framework). Basically, you write simple scenario descriptions and get complex 3D robot simulations automatically.

While building things and learning how Scenic works, I’ve gotten the basic skeleton of the simulator interface working. I’ve implemented the simulator class and built a world model that translates Scenic objects into RoboSuite’s simulator (which is MuJoCo-based). The interface now handles precise object placement in the world pretty well.

One of the trickier parts was figuring out the translation logic between Scenic and RoboSuite. I managed to overcome this by building a system that automatically detects the shape of objects when moving between the two frameworks, which lays a foundation for more complex object mapping later on.

I’ve also built some basic example scenarios to run and test with. I’m currently working on more complex examples and testing Scenic’s features like probabilistic object placement, constraint satisfaction, and spatial relationships between objects.

In summary, the “Scenic to RoboSuite” part of the interface is pretty much done. For next week, I need to work on the “RoboSuite to Scenic” part - basically getting feedback and state information flowing back from the simulation. Achieving this will make a complete bridge and give us a working simulator interface, which is the first major milestone for the project.