<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>final | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/final/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/final/index.xml" rel="self" type="application/rss+xml"/><description>final</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Thu, 25 Sep 2025 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>final</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/final/</link></image><item><title>Final Update: Building Intelligent Observability for NRP</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsd/seam/intelligent-observability/20250925-manish-reddy/</link><pubDate>Thu, 25 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/ucsd/seam/intelligent-observability/20250925-manish-reddy/</guid><description>&lt;p>I&amp;rsquo;m excited to share the completion of my OSRE 2025 project, &amp;ldquo;&lt;em>Intelligent Observability for NRP: A GenAI Approach&lt;/em>&amp;rdquo; and the significant learning journey it has been. We&amp;rsquo;ve successfully developed a novel InfoAgent architecture that delivers on our core goal: building an ML-powered service for NRP that analyzes monitoring data, detects anomalies, and provides trustworthy GenAI explanations.&lt;/p>
&lt;h2 id="how-our-novel-infoagent-architecture-advances-the-observability-mission">How Our Novel InfoAgent Architecture Advances the Observability Mission&lt;/h2>
&lt;p>Through extensive development and testing, I&amp;rsquo;ve learned a great deal about building production-ready AI systems and have implemented a novel InfoAgent architecture that orchestrates our specialized agents:&lt;/p>
&lt;h3 id="1-prometheus-metrics-analysis-agent">1. Prometheus Metrics Analysis Agent&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Function&lt;/strong>: Continuously ingests and processes NRP&amp;rsquo;s Prometheus metrics&lt;/li>
&lt;li>&lt;strong>Progress&lt;/strong>: Fully implemented data pipelines handling multiple metric types with optimized latency&lt;/li>
&lt;li>&lt;strong>Purpose&lt;/strong>: Provides the foundation for anomaly detection by establishing normal behavior baselines&lt;/li>
&lt;/ul>
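&lt;p>As a rough illustration of how such a baseline could be derived, the sketch below parses a payload shaped like a Prometheus HTTP API range-query result and summarizes each series by its mean and standard deviation. The helper name &lt;code>compute_baselines&lt;/code>, the choice of statistics, and the sample payload are assumptions made for this example, not the project code.&lt;/p>

```python
import json
import statistics


def compute_baselines(payload: dict) -> dict:
    """Return {series_label: (mean, spread)} for a range-query style payload.

    Hypothetical helper: assumes the matrix layout returned by the
    Prometheus HTTP API, where each value is a [timestamp, "string"] pair.
    """
    baselines = {}
    for series in payload["data"]["result"]:
        # Use the sorted label set as a stable key for the series.
        label = json.dumps(series["metric"], sort_keys=True)
        samples = [float(value) for _timestamp, value in series["values"]]
        mean = statistics.fmean(samples)
        # pstdev is defined even for a single sample, unlike stdev.
        spread = statistics.pstdev(samples)
        baselines[label] = (mean, spread)
    return baselines


# Made-up sample data for illustration only.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "node-a"},
                "values": [[1, "1.0"], [2, "2.0"], [3, "3.0"]],
            }
        ],
    },
}

if __name__ == "__main__":
    for label, (mean, spread) in compute_baselines(SAMPLE).items():
        print(label, round(mean, 3), round(spread, 3))
```

&lt;p>A mean and spread per series is only one possible notion of normal behavior; seasonal or rolling baselines would follow the same parsing step.&lt;/p>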
&lt;h3 id="2-query-refinement-agent-croq">2. Query Refinement Agent (CROQ)&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Function&lt;/strong>: Clarifies ambiguous metrics or patterns before generating explanations&lt;/li>
&lt;li>&lt;strong>Progress&lt;/strong>: Completed the implementation of Conformal Revision of Questions (CROQ) for disambiguation&lt;/li>
&lt;li>&lt;strong>Purpose&lt;/strong>: Ensures explanations address the right system behaviors (e.g., distinguishing CPU saturation from memory pressure)&lt;/li>
&lt;li>&lt;strong>Deliverable Impact&lt;/strong>: Improved the accuracy of GenAI explanations by eliminating misinterpretations&lt;/li>
&lt;/ul>
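&lt;p>The post names the technique but does not show how it is wired in, so the following is only a minimal sketch of the general idea: use a split conformal cutoff to prune unlikely interpretations of a question, then either commit to the single survivor or ask the operator to choose. The function names, the calibration scores, and the candidate interpretations are all hypothetical.&lt;/p>

```python
import math


def conformal_threshold(true_label_scores: list[float], alpha: float) -> float:
    """Score cutoff for prediction sets with roughly 1 - alpha coverage.

    Assumed setup: each calibration entry is the model score given to the
    correct interpretation of a past question (split conformal prediction).
    """
    n = len(true_label_scores)
    # Nonconformity = 1 - score assigned to the correct interpretation.
    errors = sorted(1.0 - s for s in true_label_scores)
    # Finite-sample corrected rank, clamped to n as a simplification.
    rank = min(n, math.ceil((n + 1) * (1.0 - alpha)))
    return 1.0 - errors[rank - 1]


def revise_question(question: str, scores: dict[str, float], cutoff: float) -> str:
    """Keep only interpretations scoring at or above the cutoff."""
    kept = [name for name, s in scores.items() if s >= cutoff]
    if not kept:
        # Fall back to the single most likely interpretation.
        kept = [max(scores, key=scores.get)]
    if len(kept) == 1:
        return f"{question} (interpreted as: {kept[0]})"
    options = ", ".join(kept)
    return f"{question} Which of these applies: {options}?"


if __name__ == "__main__":
    cutoff = conformal_threshold([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3], alpha=0.25)
    candidates = {"CPU saturation": 0.48, "memory pressure": 0.45, "disk I/O wait": 0.07}
    print(revise_question("Why is node-a slow?", candidates, cutoff))
```

&lt;p>When two interpretations survive the cutoff, as in the demo call, the revised question asks for clarification instead of guessing.&lt;/p>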
&lt;h3 id="3-explanation-generation-agent-ais">3. Explanation Generation Agent (AIS)&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Function&lt;/strong>: Creates human-readable explanations and root-cause analysis&lt;/li>
&lt;li>&lt;strong>Progress&lt;/strong>: Finalized the Automated Information Seeker (AIS) with a complete Plan→Validate→Execute→Assess→Revise cycle&lt;/li>
&lt;li>&lt;strong>Purpose&lt;/strong>: Transforms technical anomalies into actionable insights for operators&lt;/li>
&lt;li>&lt;strong>Deliverable Impact&lt;/strong>: Delivers GenAI explanations with uncertainty quantification&lt;/li>
&lt;/ul>
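&lt;p>The Plan→Validate→Execute→Assess→Revise cycle can be pictured as a bounded loop. The sketch below is one possible shape for such a loop, with every stage passed in as a plain callable; &lt;code>run_cycle&lt;/code>, &lt;code>CycleResult&lt;/code>, the confidence threshold, and the round limit are illustrative assumptions and do not describe the actual AIS implementation.&lt;/p>

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CycleResult:
    answer: str
    confidence: float
    rounds: int
    history: list[str] = field(default_factory=list)


def run_cycle(
    question: str,
    plan: Callable[[str], list[str]],
    validate: Callable[[list[str]], bool],
    execute: Callable[[list[str]], str],
    assess: Callable[[str], float],
    revise: Callable[[str, str], str],
    min_confidence: float = 0.8,
    max_rounds: int = 3,
) -> CycleResult:
    """Iterate Plan, Validate, Execute, Assess, Revise until confident."""
    history: list[str] = []
    answer, confidence = "", 0.0
    for round_number in range(1, max_rounds + 1):
        steps = plan(question)
        if not validate(steps):
            # A rejected plan skips execution and goes straight to revision.
            history.append(f"round {round_number}: plan rejected")
            question = revise(question, "invalid plan")
            continue
        answer = execute(steps)
        confidence = assess(answer)
        history.append(f"round {round_number}: confidence {confidence:.2f}")
        if confidence >= min_confidence:
            return CycleResult(answer, confidence, round_number, history)
        # Feed the weak answer back so the next plan can improve on it.
        question = revise(question, answer)
    return CycleResult(answer, confidence, max_rounds, history)


if __name__ == "__main__":
    result = run_cycle(
        "Why did GPU utilization drop?",
        plan=lambda q: [q],
        validate=lambda steps: len(steps) > 0,
        execute=lambda steps: "draft explanation for: " + steps[0],
        assess=lambda ans: 0.9,
        revise=lambda q, feedback: q + " (refined)",
    )
    print(result.rounds, result.history)
```

&lt;p>Capping the number of rounds keeps the loop from running indefinitely when the assessment never reaches the threshold; the last answer is then returned together with its low confidence.&lt;/p>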
&lt;h2 id="completed-integration-the-novel-infoagent-pipeline">Completed Integration: The Novel InfoAgent Pipeline&lt;/h2>
&lt;p>We&amp;rsquo;ve successfully integrated all agents into a unified observability pipeline that represents our novel contribution:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Data Collection&lt;/strong>: Prometheus metrics → Analysis Agent (comprehensive metrics support)&lt;/li>
&lt;li>&lt;strong>Anomaly Detection&lt;/strong>: Flagging anomalies with statistical confidence bounds from conformal prediction&lt;/li>
&lt;li>&lt;strong>Query Refinement&lt;/strong>: Resolving ambiguities before explanation&lt;/li>
&lt;li>&lt;strong>Explanation Generation&lt;/strong>: Human-readable analysis with uncertainty awareness&lt;/li>
&lt;li>&lt;strong>Feedback Loop&lt;/strong>: System learning from operator interactions (implemented and tested)&lt;/li>
&lt;/ol>
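&lt;p>To make the anomaly detection step more concrete, here is a minimal split conformal sketch: a center is fitted on one set of normal samples, a radius is calibrated on a second set, and a new value is flagged when it falls outside that band. The function names, the median-based score, and the sample numbers are assumptions for illustration; the pipeline itself may use a different nonconformity score.&lt;/p>

```python
import math
import statistics


def calibrate(train: list[float], calibration: list[float], alpha: float) -> tuple[float, float]:
    """Return (center, radius) for a band with roughly 1 - alpha coverage.

    Assumes train and calibration hold exchangeable samples of normal
    behavior and are kept separate (split conformal prediction).
    """
    center = statistics.median(train)
    # Nonconformity score: absolute distance from the fitted center.
    scores = sorted(abs(value - center) for value in calibration)
    n = len(scores)
    # Finite-sample corrected rank, clamped to n as a simplification.
    rank = min(n, math.ceil((n + 1) * (1.0 - alpha)))
    return center, scores[rank - 1]


def is_anomaly(value: float, center: float, radius: float) -> bool:
    """Flag values that fall strictly outside center +/- radius."""
    return abs(value - center) > radius


if __name__ == "__main__":
    # Made-up metric samples for illustration only.
    train = [10.0, 11.0, 9.0, 10.0, 12.0]
    calibration = [10.5, 9.5, 11.0, 8.0, 10.0, 12.0, 9.0]
    center, radius = calibrate(train, calibration, alpha=0.25)
    for reading in (11.5, 25.0):
        print(reading, is_anomaly(reading, center, radius))
```

&lt;p>The value of &lt;code>alpha&lt;/code> sets the tolerated false-alarm rate on normal data, which is what gives the band its statistical interpretation.&lt;/p>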
&lt;h2 id="hardware-testing-results">Hardware Testing Results&lt;/h2>
&lt;p>This project taught me valuable lessons about optimizing AI workloads on specialized hardware. We successfully tested our observability framework on Qualcomm Cloud AI 100 Ultra hardware:&lt;/p>
&lt;ul>
&lt;li>Achieved significant performance improvements over the baseline CPU implementation&lt;/li>
&lt;li>Successfully ported and optimized GLM-4.5 for observability-specific tasks&lt;/li>
&lt;li>Validated that specialized AI hardware significantly enhances real-time anomaly detection&lt;/li>
&lt;/ul>
&lt;h2 id="learning-journey-and-novel-contributions">Learning Journey and Novel Contributions&lt;/h2>
&lt;p>Throughout OSRE 2025, I&amp;rsquo;ve learned extensively about:&lt;/p>
&lt;ol>
&lt;li>Building hierarchical agent coordination systems for complex reasoning&lt;/li>
&lt;li>Implementing conformal prediction for trustworthy AI outputs&lt;/li>
&lt;li>Creating self-correcting explanation pipelines&lt;/li>
&lt;li>Developing adaptive learning systems from operator feedback&lt;/li>
&lt;/ol>
&lt;p>The novel InfoAgent architecture shows promising results in our testing environment; the evaluation metrics and benchmarks are still being refined.&lt;/p>
&lt;h2 id="ongoing-work-continuing-beyond-osre">Ongoing Work: Continuing Beyond OSRE&lt;/h2>
&lt;p>While OSRE 2025 is concluding, I&amp;rsquo;m actively continuing to contribute to this project:&lt;/p>
&lt;ol>
&lt;li>Preparing the InfoAgent framework for open-source release with comprehensive documentation&lt;/li>
&lt;li>Running extended evaluation tests on the Nautilus platform (work in progress)&lt;/li>
&lt;li>Writing a research paper detailing our novel architecture&lt;/li>
&lt;li>Creating tutorials to help others implement intelligent observability&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Project Updates and Code&lt;/strong>: You can follow my ongoing contributions and access the latest code at &lt;a href="https://mreddy10.pages.nrp-nautilus.io/gsocnrp/" target="_blank" rel="noopener">https://mreddy10.pages.nrp-nautilus.io/gsocnrp/&lt;/a>&lt;/p>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>I&amp;rsquo;m deeply grateful to my lead mentor &lt;strong>Mohammad Firas Sada&lt;/strong> for his exceptional guidance throughout this transformative learning experience. His insights have been invaluable in helping me develop the novel InfoAgent architecture and navigate the complexities of building production-ready AI systems.&lt;/p>
&lt;p>The OSRE 2025 program has been an incredible journey of growth and discovery. I&amp;rsquo;ve learned not just how to build AI systems, but how to make them trustworthy, explainable, and genuinely useful for real-world operations. The novel InfoAgent architecture we&amp;rsquo;ve developed serves the original mission: creating an intelligent observability tool that helps NRP operators solve problems faster and keep complex research systems running smoothly.&lt;/p>
&lt;p>I&amp;rsquo;m excited to continue contributing to this project and look forward to seeing how the community adopts and extends these ideas. Check out my contributions and ongoing updates at &lt;a href="https://mreddy10.pages.nrp-nautilus.io/gsocnrp/" target="_blank" rel="noopener">https://mreddy10.pages.nrp-nautilus.io/gsocnrp/&lt;/a>!&lt;/p></description></item></channel></rss>