Data Engineering and Automated Evaluation for OpenROAD's Chat Assistant: Midterm Update

Sun, 21 Jul 2024 00:00:00 +0000

Hello everyone! We’ve reached the halfway point of our Google Summer of Code 2024 journey, and it’s time for an update on our project to build a conversational chat assistant for OpenROAD. Under the guidance of our mentors, Indira Iyer and Jack Luar, we’re making significant strides in enhancing OpenROAD’s user support capabilities.

Project Focus

My project focuses on two crucial aspects of our chat assistant:

Data Engineering: Ensuring our assistant has access to comprehensive and relevant information.
Evaluation: Developing robust methods to assess and improve the assistant’s performance.

The ultimate goal is to create a more responsive and accurate chat assistant capable of aiding users with troubleshooting, installation, and general queries about OpenROAD. I’m working in tandem with Palaniappan R, who is developing the RAG architecture for our assistant.

Progress

Since our initial deployment, I’ve been concentrating on implementing automated evaluation systems for our RAG architecture. We’ve developed two primary evaluation methods:

Basic Abbreviation Evaluation

This method assesses the model’s ability to accurately identify and explain common abbreviations used within the OpenROAD community. It ensures that our assistant can effectively communicate using domain-specific terminology.
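Here is a minimal sketch of how such a check could be automated. It assumes a hypothetical `ask` callable that sends a question to the assistant and returns its text answer. It also uses a small, illustrative table of abbreviations and expected expansions. The project's actual abbreviation list and scoring rule may differ.

```python
import re

# Hypothetical test set: abbreviation -> expansion a correct answer should contain.
ABBREVIATION_CASES = {
    "RTL": "register transfer level",
    "DRC": "design rule check",
    "LVS": "layout versus schematic",
    "STA": "static timing analysis",
}


def normalize(text):
    """Lowercase and turn punctuation into spaces so matching is lenient."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def score_abbreviations(ask, cases=ABBREVIATION_CASES):
    """Ask about each abbreviation; return (accuracy, list of failed abbreviations)."""
    failures = []
    for abbr, expansion in cases.items():
        answer = ask(f"What does {abbr} stand for in OpenROAD?")
        if normalize(expansion) not in normalize(answer):
            failures.append(abbr)
    return 1 - len(failures) / len(cases), failures
```

A substring match on normalized text is deliberately simple. It accepts "Register-Transfer Level" and "register transfer level" alike, but it cannot tell whether the surrounding explanation is correct.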

LLM Judge-Based Evaluation

For this more comprehensive evaluation, we:

Prepared a dataset of question-answer pairs relevant to OpenROAD.
Queried our model with these questions to generate answers.
Employed LLMs (including GPT-4o and Gemini 1.5 Flash) to act as judges.
Evaluated our model’s responses against ground truth answers.
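The judging loop can be sketched as follows, under stated assumptions. `model` and `judge` are hypothetical callables that take a prompt string and return text, for example thin wrappers around our assistant and around GPT-4o or Gemini 1.5 Flash. The prompt wording and the 1-5 scale are illustrative and are not the rubric behind our benchmark.

```python
import re
from statistics import mean

# Illustrative judge prompt and 1-5 scale; not the rubric used for our benchmark.
JUDGE_TEMPLATE = """You are grading an answer about the OpenROAD toolchain.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Rate how well the candidate agrees with the reference on a scale of 1 to 5.
Start your reply with "Score: N", then give a one-sentence justification."""

SCORE_RE = re.compile(r"score:\s*([1-5])", re.IGNORECASE)


def judge_answer(judge, question, reference, candidate):
    """Ask the judge LLM for a score; return None if no score can be parsed."""
    prompt = JUDGE_TEMPLATE.format(question=question, reference=reference, candidate=candidate)
    match = SCORE_RE.search(judge(prompt))
    return int(match.group(1)) if match else None


def evaluate(model, judge, qa_pairs):
    """Answer each question with `model` and grade it with `judge` against the ground truth.

    Returns (mean score over parsed judgements or None, number of unparsed replies).
    """
    scores, unparsed = [], 0
    for question, reference in qa_pairs:
        score = judge_answer(judge, question, reference, model(question))
        if score is None:
            unparsed += 1
        else:
            scores.append(score)
    return (mean(scores) if scores else None), unparsed
```

Unparseable replies are counted separately. This way, a judge that ignores the requested output format is reported rather than silently skewing the average.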

Here’s a glimpse of our early benchmark results:

Exploratory Data Analysis (EDA) on GitHub OpenROAD issues

To gather more data, I performed EDA on OpenROAD’s GitHub issues, which I retrieved using GitHub’s GraphQL API. This allowed us to:

Filter data based on parameters such as:
- Minimum number of comments
- Date range
- Mentioned PRs
- Open or closed status
Structure the data, focusing on issues tagged with Build, Query, Installation, and Runtime.
Process the data into JSONL format with key fields including:
- url: URL of the GitHub issue
- id: Unique issue number
- title: Issue title
- author: Username of the issue creator
- description: Initial issue description
- content: Array of messages related to the issue
- category: General category of the issue
- subcategory: More specific category of the issue
- tool: Relevant tools or components
- date: Issue creation timestamp
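The collection and filtering steps above can be sketched roughly as follows. This is a minimal sketch, not the script we used. The queried fields, the page sizes, and the assumption that Build, Query, Installation, and Runtime appear as GitHub labels are all choices made for illustration. `subcategory` and `tool` are left as hypothetical placeholders. The mentioned-PR filter is omitted, since it would likely require querying each issue's timeline.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Fields requested from GitHub's GraphQL API; this selection is an assumption for the sketch.
ISSUES_QUERY = """
query($cursor: String) {
  repository(owner: "The-OpenROAD-Project", name: "OpenROAD") {
    issues(first: 50, after: $cursor, states: [OPEN, CLOSED]) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number title url state createdAt body
        author { login }
        labels(first: 10) { nodes { name } }
        comments(first: 50) { totalCount nodes { body author { login } } }
      }
    }
  }
}
"""

# Hypothetical: assumes these categories appear as GitHub labels on the issues.
CATEGORIES = ("Build", "Query", "Installation", "Runtime")


def fetch_issues(token):
    """Yield raw issue nodes, following the pagination cursor (needs a GitHub token)."""
    cursor = None
    while True:
        body = json.dumps({"query": ISSUES_QUERY, "variables": {"cursor": cursor}}).encode()
        request = urllib.request.Request(
            "https://api.github.com/graphql", data=body, method="POST",
            headers={"Authorization": f"bearer {token}", "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            issues = json.load(response)["data"]["repository"]["issues"]
        yield from issues["nodes"]
        if not issues["pageInfo"]["hasNextPage"]:
            return
        cursor = issues["pageInfo"]["endCursor"]


def keep(node, min_comments=1, since=None, until=None, states=("OPEN", "CLOSED")):
    """Apply the comment-count, date-range and open/closed filters to one issue."""
    created = datetime.fromisoformat(node["createdAt"].replace("Z", "+00:00"))
    return (
        node["comments"]["totalCount"] >= min_comments
        and node["state"] in states
        and (since is None or created >= since)
        and (until is None or created <= until)
    )


def to_record(node):
    """Map a GraphQL issue node onto the JSONL fields listed above."""
    labels = {label["name"] for label in node["labels"]["nodes"]}
    return {
        "url": node["url"],
        "id": node["number"],
        "title": node["title"],
        "author": (node["author"] or {}).get("login"),  # author is null for deleted accounts
        "description": node["body"],
        "content": [
            {"author": (c["author"] or {}).get("login"), "body": c["body"]}
            for c in node["comments"]["nodes"]
        ],
        "category": next((c for c in CATEGORIES if c in labels), None),
        "subcategory": None,  # hypothetical placeholder, e.g. filled by later annotation
        "tool": None,  # hypothetical placeholder
        "date": node["createdAt"],
    }


def iter_jsonl(nodes, **filters):
    """Yield one JSON line per issue that passes the filters."""
    for node in nodes:
        if keep(node, **filters):
            yield json.dumps(to_record(node))
```

To write the dataset, join the output of `iter_jsonl(fetch_issues(token), min_comments=2)` with newlines into a `.jsonl` file. Here `token` is a GitHub personal access token, and the `min_comments` value is only an example.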

After curating this dataset, I ran an analysis of OpenROAD’s GitHub issues, identifying multiple categories of issues and visualizing their distribution as a pie chart.
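A category breakdown of this kind can be computed directly from the JSONL records. The sketch below assumes each line carries the `category` field described above, with missing values grouped as "Uncategorized". It counts issues per category and plots them with matplotlib's `pie`. It is illustrative only and does not reproduce our actual chart or its numbers.

```python
import json
from collections import Counter


def category_shares(jsonl_lines):
    """Return {category: percent of issues}, most common first."""
    counts = Counter(
        json.loads(line).get("category") or "Uncategorized" for line in jsonl_lines
    )
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.most_common()}


def plot_pie(shares):
    """Draw the distribution as a pie chart (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    plt.pie(list(shares.values()), labels=list(shares), autopct="%1.1f%%")
    plt.axis("equal")  # keep the pie circular
    plt.show()
```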

Looking Ahead

As we move into the second half of the GSoC period, our plans include:

Incorporating GitHub Discussions data into our knowledge base.
Utilizing this expanded dataset to enhance our RAG architecture.
Continually refining and improving our model’s performance based on evaluation results.

We’re excited about the progress we’ve made and look forward to delivering an even more capable and helpful chat assistant for the OpenROAD community. Stay tuned for more updates as we continue this exciting journey!

LLM Assistant for OpenROAD - Data Engineering and Testing

Thu, 13 Jun 2024 00:00:00 +0000

Hello! My name is Aviral Kaintura, and during GSoC 2024 I will be contributing to OpenROAD, a groundbreaking open-source toolchain for digital integrated circuit automation (RTL to GDSII).

My project, LLM Assistant for OpenROAD - Data Engineering and Testing, is jointly mentored by Indira Iyer and Jack Luar.

The aim of this project is to develop a chat assistant that improves the user experience with OpenROAD. My focus will be on building a well-curated dataset from OpenROAD’s knowledge base. This dataset will be fundamental to a related project led by Palaniappan R, which involves building the chatbot’s architecture. It will be used to train and validate the model and to enable efficient context retrieval. This lets the assistant generate accurate responses that help users with troubleshooting, installation, and other common issues, reducing the maintainers’ workload.

In addition to dataset creation, I will be working on testing and evaluation. This includes developing metrics for model evaluation, incorporating both human and automated techniques.

Our human evaluation framework will use feedback collected through the chatbot to gain insights that improve both the model and the dataset. An automated batch-testing application will complement this by running the model against many test queries at once.
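As an illustration of the batch-testing side, here is a minimal sketch of a harness that sends a list of questions to the assistant concurrently and records each answer with its response time. The `ask` callable and the worker count are assumptions for this example. This is not the project's actual application.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def run_one(ask, question):
    """Query the assistant once and time the round trip."""
    start = time.perf_counter()
    answer = ask(question)
    return {"question": question, "answer": answer, "latency_s": time.perf_counter() - start}


def batch_test(ask, questions, max_workers=4):
    """Run all questions concurrently; results keep the input order."""
    if not questions:
        raise ValueError("need at least one question")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda q: run_one(ask, q), questions))
    latencies = [r["latency_s"] for r in results]
    summary = {
        "n": len(results),
        "median_latency_s": median(latencies),
        "max_latency_s": max(latencies),
    }
    return results, summary
```

Threads suit this workload because a call to a hosted assistant mostly waits on network I/O. The recorded latencies also give a direct measure of response time alongside accuracy.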

Here is an early build of the evaluation framework.

By leveraging advanced data engineering and testing methodologies, we aim to build an assistant that combines high accuracy with fast response times. Additionally, we will collaborate with research teams at NYU and ASU to contribute to research on AI-based chat assistants for electronic design automation.

I am thrilled to be part of this journey and look forward to making a meaningful impact on the OpenROAD project.

Stay tuned for more updates on the project!

Aviral Kaintura | UCSC OSPO