<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Aviral Kaintura | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aviral-kaintura/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aviral-kaintura/index.xml" rel="self" type="application/rss+xml"/><description>Aviral Kaintura</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aviral-kaintura/avatar_hu4363ede9fb8598bf1f3325844d9b3840_198955_270x270_fill_q75_lanczos_center.jpg</url><title>Aviral Kaintura</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aviral-kaintura/</link></image><item><title>Data Engineering and Automated Evaluation for OpenROAD's Chat Assistant: Midterm Update</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/</link><pubDate>Sun, 21 Jul 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/</guid><description>&lt;p>Hello everyone! We&amp;rsquo;ve reached the halfway point of our Google Summer of Code 2024 journey, and it&amp;rsquo;s time for an update on our project to build a conversational chat assistant for OpenROAD. Under the guidance of our mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>, we&amp;rsquo;re making significant strides in enhancing OpenROAD&amp;rsquo;s user support capabilities.&lt;/p>
&lt;h2 id="project-focus">Project Focus&lt;/h2>
&lt;p>My project focuses on two crucial aspects of our chat assistant:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Data Engineering&lt;/strong>: Ensuring our assistant has access to comprehensive and relevant information.&lt;/li>
&lt;li>&lt;strong>Evaluation&lt;/strong>: Developing robust methods to assess and improve the assistant&amp;rsquo;s performance.&lt;/li>
&lt;/ol>
&lt;p>The ultimate goal is to create a more responsive and accurate chat assistant capable of aiding users with troubleshooting, installation, and general queries about OpenROAD. I&amp;rsquo;m working in tandem with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, who is developing the RAG architecture for our assistant.&lt;/p>
&lt;h2 id="progress">Progress&lt;/h2>
&lt;p>Since our initial deployment, I&amp;rsquo;ve been concentrating on implementing automated evaluation systems for our RAG architecture. We&amp;rsquo;ve developed two primary evaluation methods:&lt;/p>
&lt;h3 id="basic-abbreviation-evaluation">Basic Abbreviation Evaluation&lt;/h3>
&lt;p>This method assesses the model&amp;rsquo;s ability to accurately identify and explain common abbreviations used within the OpenROAD community. It ensures that our assistant can effectively communicate using domain-specific terminology.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 1: Flow Chart of Basic Abbreviation Evaluation" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_7793f2944668d59749f48f3848acfba7.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_c0340ef0448a8f440bce5566986a10ef.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure1_basic_abbreviation_evaluation_hud808c3411b9bf24258c9c6d4950618ae_122195_7793f2944668d59749f48f3848acfba7.webp"
width="469"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Examples" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_f04196ec40b94ffced2a574cbd37ad44.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_1a776103bd42be9525343172ad16d2a2.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure2_sample_examples_hu78acdf5642b62d8d730a2574d861f211_90900_f04196ec40b94ffced2a574cbd37ad44.webp"
width="760"
height="431"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
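&lt;p>As a rough illustration, an abbreviation check of this kind could be scored as in the sketch below. It assumes a simple case-insensitive substring match, which may differ from the scoring shown in the flow chart above. The function name &lt;code>abbreviation_accuracy&lt;/code>, the question wording, and the &lt;code>ask_model&lt;/code> callable are hypothetical stand-ins, not part of the project&amp;rsquo;s code.&lt;/p>

```python
from typing import Callable


def abbreviation_accuracy(
    expected: dict[str, str], ask_model: Callable[[str], str]
) -> float:
    """Fraction of abbreviations whose expected expansion appears in the reply."""
    hits = 0
    for abbreviation, expansion in expected.items():
        # Hypothetical question wording, used only for this illustration.
        reply = ask_model(f"What does {abbreviation} stand for in OpenROAD?")
        # Assumed scoring rule: case-insensitive substring match.
        if expansion.lower() in reply.lower():
            hits += 1
    return hits / len(expected) if expected else 0.0


# Two common chip-design abbreviations, chosen here as examples.
EXPECTED = {
    "RTL": "register-transfer level",
    "DRC": "design rule check",
}
```

&lt;p>Passing the assistant&amp;rsquo;s query function as &lt;code>ask_model&lt;/code> would return a score between 0 and 1. Substring matching is strict about wording, so a judge model may be a better fit when replies paraphrase the expansion.&lt;/p>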
&lt;h3 id="llm-judge-based-evaluation">LLM Judge-Based Evaluation&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 2: Flow Chart of LLM Judge-Based Evaluation" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_8dfc4bba33d8ad8d797f27f1c7a1eaaf.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_6ef7c0153c7e61298bbf98aa15f5d69d.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure3_llm_judge_evaluation_hu385b71952e0ff054a9e7c96b25e3d452_269894_8dfc4bba33d8ad8d797f27f1c7a1eaaf.webp"
width="689"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>For this more comprehensive evaluation, we:&lt;/p>
&lt;ol>
&lt;li>Prepared a dataset of question-answer pairs relevant to OpenROAD.&lt;/li>
&lt;li>Queried our model with these questions to generate answers.&lt;/li>
&lt;li>Employed LLMs (including GPT-4o and Gemini 1.5 Flash) to act as judges.&lt;/li>
&lt;li>Evaluated our model&amp;rsquo;s responses against ground truth answers.&lt;/li>
&lt;/ol>
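&lt;p>The four steps above can be sketched roughly as follows. This is a minimal outline rather than the project&amp;rsquo;s implementation: the prompt wording, the function names, and the &lt;code>ask_model&lt;/code> and &lt;code>ask_judge&lt;/code> callables are assumptions, and &lt;code>toy_judge&lt;/code> is a literal string comparison standing in for a real call to a judge such as GPT-4o or Gemini 1.5 Flash.&lt;/p>

```python
from typing import Callable

# Hypothetical prompt wording; the prompt used in the project is not shown here.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Ground truth: {truth}\n"
    "Model answer: {answer}\n"
    "Reply with CORRECT or INCORRECT."
)


def judge_answers(
    pairs: list[dict],
    ask_model: Callable[[str], str],
    ask_judge: Callable[[str], str],
) -> float:
    """Return the fraction of model answers that the judge marks CORRECT."""
    verdicts = []
    for pair in pairs:
        # Step 2: query the model under test.
        answer = ask_model(pair["question"])
        # Steps 3 and 4: ask the judge to compare it with the ground truth.
        prompt = JUDGE_PROMPT.format(
            question=pair["question"], truth=pair["answer"], answer=answer
        )
        reply = ask_judge(prompt).strip().upper()
        verdicts.append(reply.startswith("CORRECT"))
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def toy_judge(prompt: str) -> str:
    """Stand-in for an LLM judge: compares the two answers literally."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    same = fields["Ground truth"].lower() == fields["Model answer"].lower()
    return "CORRECT" if same else "INCORRECT"
```

&lt;p>Keeping the judge behind a plain callable makes it straightforward to swap one judge model for another and compare their verdicts on the same question-answer pairs.&lt;/p>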
&lt;p>Here&amp;rsquo;s a glimpse of our early benchmark results:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Benchmark" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_06ea37525851a60dad5bd072a03cd329.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_d9a11b8b08e2634c01f9063cc78ab134.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure4_model_performance_comparison_hu7cf4636aada9c277e08b0256b02e5dd8_206498_06ea37525851a60dad5bd072a03cd329.webp"
width="760"
height="701"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_f63055fd0281e09d0ef800e1e444c7f9.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_91c683a3ebadbf3ce5a21099a81b1836.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure5_sample_examples_hu7826377a5560a22c15aefc27866894bb_575795_f63055fd0281e09d0ef800e1e444c7f9.webp"
width="577"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="exploratory-data-analysis-eda-on-github-openroad-issues">Exploratory Data Analysis (EDA) on GitHub OpenROAD issues&lt;/h2>
&lt;p>To gather more data, I performed Exploratory Data Analysis (EDA) on GitHub OpenROAD issues using GitHub&amp;rsquo;s GraphQL API. This allowed us to:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Filter data based on parameters such as:&lt;/p>
&lt;ul>
&lt;li>Minimum number of comments&lt;/li>
&lt;li>Date range&lt;/li>
&lt;li>Mentioned PRs&lt;/li>
&lt;li>Open or closed status&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Structure the data, focusing on issues tagged with Build, Query, Installation, and Runtime.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Process the data into JSONL format with key fields including:&lt;/p>
&lt;ul>
&lt;li>&lt;code>url&lt;/code>: URL of the GitHub issue&lt;/li>
&lt;li>&lt;code>id&lt;/code>: Unique issue number&lt;/li>
&lt;li>&lt;code>title&lt;/code>: Issue title&lt;/li>
&lt;li>&lt;code>author&lt;/code>: Username of the issue creator&lt;/li>
&lt;li>&lt;code>description&lt;/code>: Initial issue description&lt;/li>
&lt;li>&lt;code>content&lt;/code>: Array of messages related to the issue&lt;/li>
&lt;li>&lt;code>category&lt;/code>: General category of the issue&lt;/li>
&lt;li>&lt;code>subcategory&lt;/code>: More specific category of the issue&lt;/li>
&lt;li>&lt;code>tool&lt;/code>: Relevant tools or components&lt;/li>
&lt;li>&lt;code>date&lt;/code>: Issue creation timestamp&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
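&lt;p>To make the record layout concrete, here is a small sketch that filters already-fetched issues and serializes them as JSONL with the fields listed above. The helper names, the sample values, and the ISO 8601 date format are assumptions for illustration. Only two of the filters are shown, the comment count is approximated by the length of &lt;code>content&lt;/code>, and fetching through the GraphQL API is left out.&lt;/p>

```python
import json
from datetime import date

FIELDS = ["url", "id", "title", "author", "description",
          "content", "category", "subcategory", "tool", "date"]


def keep_issue(issue: dict, min_comments: int, start: date, end: date) -> bool:
    """Keep issues with enough messages that were created within [start, end]."""
    # Assumes the date field starts with YYYY-MM-DD.
    created = date.fromisoformat(issue["date"][:10])
    return len(issue["content"]) >= min_comments and end >= created >= start


def to_jsonl(issues: list[dict]) -> str:
    """Serialize issues as JSON Lines: one JSON object per line."""
    lines = []
    for issue in issues:
        missing = [name for name in FIELDS if name not in issue]
        if missing:
            raise ValueError(f"issue is missing fields: {missing}")
        lines.append(json.dumps({name: issue[name] for name in FIELDS}))
    return "\n".join(lines)


def from_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Made-up record for illustration only; not taken from the real dataset.
sample = {
    "url": "https://example.com/issues/1",
    "id": 1,
    "title": "Build fails on a fresh checkout",
    "author": "example-user",
    "description": "The build stops with a compiler error.",
    "content": ["Which compiler version are you using?", "Upgrading fixed it."],
    "category": "Build",
    "subcategory": "Compiler",
    "tool": "OpenROAD",
    "date": "2024-05-01T12:00:00Z",
}
```

&lt;p>Because each line is an independent JSON object, records can be appended or streamed one at a time without loading the whole file.&lt;/p>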
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 5: Sample structure of our processed JSONL data" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_fd103ea5ef1fa131b8bc806db99a24d1.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_c30d5d185fec144cfca686499f464f19.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure6_jsonl_data_structure_hua8b5c15add3c3268be11381a14b4e3cd_555437_fd103ea5ef1fa131b8bc806db99a24d1.webp"
width="692"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>After curating this dataset, I analyzed the OpenROAD GitHub issues and grouped them by category; the pie chart below shows the resulting distribution.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 6: Distribution of OpenROAD issue types" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_d788906f5395b26ab2030fb056e45941.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_ebae2b4145d035c9521679314911236b.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure7_issue_type_distribution_hu54d4d8b580cc8ae9261f464a2e9181da_130314_d788906f5395b26ab2030fb056e45941.webp"
width="760"
height="504"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Figure 7: Breakdown of issues by specific OpenROAD tools" srcset="
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_3af195a89fadc1379452709cdea50d22.webp 400w,
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_e171fcc132e7c13ef62f2a192ed18b62.webp 760w,
/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240621-aviral/figure8_issues_by_openroad_tools_hu9e03c2c37c64c392ac5e783cdb492b5c_174856_3af195a89fadc1379452709cdea50d22.webp"
width="760"
height="511"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="looking-ahead">Looking Ahead&lt;/h2>
&lt;p>As we move into the second half of the GSoC period, our plans include:&lt;/p>
&lt;ul>
&lt;li>Incorporating GitHub Discussions data into our knowledge base.&lt;/li>
&lt;li>Utilizing this expanded dataset to enhance our RAG architecture.&lt;/li>
&lt;li>Continually refining and improving our model&amp;rsquo;s performance based on evaluation results.&lt;/li>
&lt;/ul>
&lt;p>We&amp;rsquo;re excited about the progress we&amp;rsquo;ve made and look forward to delivering an even more capable and helpful chat assistant for the OpenROAD community. Stay tuned for more updates as we continue this exciting journey!&lt;/p></description></item><item><title>LLM Assistant for OpenROAD - Data Engineering and Testing</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/</link><pubDate>Thu, 13 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/</guid><description>&lt;p>Hello! My name is Aviral Kaintura, and I will be contributing to &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a>, a groundbreaking open-source toolchain for digital integrated circuit automation (RTL to GDSII) during &lt;a href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener">GSoC 2024&lt;/a>.&lt;/p>
&lt;p>My project, &lt;a href="https://summerofcode.withgoogle.com/programs/2024/projects/J8uAFNCu" target="_blank" rel="noopener">LLM Assistant for OpenROAD - Data Engineering and Testing&lt;/a>, is jointly mentored by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jack-luar/">Jack Luar&lt;/a>.&lt;/p>
&lt;p>The aim of this project is to develop a chat assistant that improves the user experience with OpenROAD. My focus will be on building a well-curated dataset from OpenROAD&amp;rsquo;s knowledge base. This dataset will be fundamental for another project led by &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/palaniappan-r/">Palaniappan R&lt;/a>, which involves building the chatbot&amp;rsquo;s architecture. The dataset will be used to train and validate the model and to support efficient context retrieval, so that the assistant can give accurate answers on troubleshooting, installation, and other common issues and reduce the maintainers&amp;rsquo; workload.&lt;/p>
&lt;p>In addition to dataset creation, I will be working on testing and evaluation. This includes developing metrics for model evaluation, incorporating both human and automated techniques.&lt;/p>
&lt;p>Our human evaluation framework will collect user feedback on chatbot responses and use it to improve the model and dataset. An automated batch testing application will complement this human evaluation.&lt;/p>
&lt;p>Here is an early build of the evaluation framework.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Screenshots" srcset="
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_ccb0a69833aa5c774f30b616a038edd6.webp 400w,
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_25ece2ab19d666f60342ed2d6dcb217f.webp 760w,
/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsd/openroad/20240613-aviral/image_hu3257e6557164f1033894cc91760eaec1_709148_ccb0a69833aa5c774f30b616a038edd6.webp"
width="760"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
By leveraging advanced data engineering and testing methodologies, we aim to build an assistant that combines high accuracy with optimal response times. Additionally, we will collaborate with research teams at NYU and ASU to contribute to the research on AI-based chat assistants for electronic design automation.&lt;/p>
&lt;p>I am thrilled to be part of this journey and look forward to making a meaningful impact on the OpenROAD project.&lt;/p>
&lt;p>Stay tuned for more updates on the project!&lt;/p></description></item></channel></rss>