GSoC2025 | UCSC OSPO

AIDRIN Privacy-Centric Enhancements: Backend & UX Upgrades

Fri, 25 Jul 2025 00:00:00 +0000

⏱️ Reading time: 5–6 minutes

Hey everyone,

If you’ve ever wondered what it takes to make AI data pipelines not just smarter, but safer and more transparent, you’re in the right place. The last few weeks of GSoC have been a deep dive into AIDRIN’s engine room: the core privacy infrastructure and backend features behind its ability to give users real, actionable insights about their data. It’s been challenging, sometimes messy, but incredibly rewarding to see these changes make a tangible difference.

Having Dr. Jean Luca Bez and Prof. Suren Byna as mentors, along with the support of the entire team, has truly made all the difference. Their guidance, encouragement, and collaborative spirit have been a huge part of this journey, whether I’m brainstorming new ideas or just trying to untangle a tricky bug.

Privacy Metrics: Making Data Safer

A major part of my work has been putting data privacy front and center in AIDRIN. I focused on integrating essential privacy metrics such as k-anonymity, l-diversity, and t-closeness, making sure they’re not just theoretical checkboxes, but real tools that users can interact with and understand. These metrics are now fully wired up in the backend and visualized in AIDRIN, so privacy risks are no longer just a vague concern. They are something AI data preparers can actually see and act on. Getting these metrics to work seamlessly with different datasets and ensuring their accuracy took some serious backend engineering, but the payoff has been worth it.
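For readers new to these metrics, here is a minimal sketch of how k-anonymity and l-diversity can be computed over a small table. It is not AIDRIN's implementation: the function names, the column names, and the sample rows are all made up for illustration, and it uses the simplest ("distinct") form of l-diversity.

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_identifiers):
    """Group rows into equivalence classes that share the same quasi-identifier values."""
    if not rows:
        raise ValueError("need at least one row")
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class: every record hides among at least k rows."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any equivalence class."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


# Hypothetical, already-generalized records (illustration only).
rows = [
    {"age": "30-39", "zip": "941**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "941**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "947**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "947**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "947**", "diagnosis": "diabetes"},
]

print(k_anonymity(rows, ["age", "zip"]))               # smallest group has 2 rows
print(l_diversity(rows, ["age", "zip"], "diagnosis"))  # each group has 2 distinct diagnoses
```

Low values of either number flag a risk: a small k means some people are nearly unique on their quasi-identifiers, and a small l means that knowing someone's group almost reveals their sensitive value.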

Speeding Things Up (So You Don’t Have To Wait Around)

As AIDRIN started handling bigger datasets, some calculations became time-consuming because the data had to be read again every time a metric was computed. To address this, I added caching for previously computed metrics, like class imbalance and privacy checks, and set up asynchronous execution with Celery and Redis. This should make the app much more responsive. Rather than waiting for heavy computations to finish, you can take notes on other metrics or explore different parts of the app while your results load in the background. It’s a small change, but it helps keep the workflow moving smoothly.
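To make the idea concrete, here is a rough sketch of the caching pattern using only the Python standard library. It is not the AIDRIN code: an in-memory dictionary stands in for Redis, a thread pool stands in for a Celery worker, and `MetricCache`, `make_key`, and the toy `class_imbalance` function are hypothetical names chosen for this example.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


class MetricCache:
    """In-memory stand-in for a shared cache such as Redis (assumption)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def make_key(dataset_bytes, metric_name, params):
        # Hash the data so a changed file never reuses a stale result;
        # sort_keys makes the key independent of parameter order.
        digest = hashlib.sha256(dataset_bytes).hexdigest()
        return f"{metric_name}:{digest}:{json.dumps(params, sort_keys=True)}"

    def get_or_compute(self, dataset_bytes, metric_name, params, compute):
        key = self.make_key(dataset_bytes, metric_name, params)
        if key not in self._store:
            self._store[key] = compute(dataset_bytes, **params)
        return self._store[key]

    def clear(self):
        self._store.clear()


def class_imbalance(dataset_bytes, label_column):
    """Toy metric: ratio of the rarest label count to the most common one in a CSV."""
    lines = dataset_bytes.decode().strip().splitlines()
    column_index = lines[0].split(",").index(label_column)
    counts = {}
    for line in lines[1:]:
        label = line.split(",")[column_index]
        counts[label] = counts.get(label, 0) + 1
    return min(counts.values()) / max(counts.values())


cache = MetricCache()
data = b"id,label\n1,a\n2,a\n3,a\n4,b\n"

# A single worker thread plays the role of the background task queue here.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(
        cache.get_or_compute, data, "class_imbalance",
        {"label_column": "label"}, class_imbalance,
    )
    # The caller is free to do other work until it asks for the result.
    print(future.result())
```

Keying the cache on a hash of the data plus the metric name and parameters means a repeated request is a lookup, while any change to the file or the settings triggers a fresh computation. The dictionary above is not safe for several concurrent writers, which is one reason a real deployment would lean on an external store.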

Small Touch-Ups That (Hopefully) Make a Big Difference

I also spent time on the details that make the app easier to use. Tooltips now explain what the privacy metrics actually mean, error messages are clearer, and there’s a new cache info page where you can see and clear your cached data. The sensitive attribute dropdown is less confusing now, especially if you’re working with quasi-identifiers. These tweaks might seem minor, but they add up and make the app friendlier for everyone.

Docs, Docs, Docs

I’m a big believer that good documentation is just as important as good code. I updated the docs to cover all the new features, added citations for the privacy metrics, and made the install process a bit more straightforward. Hopefully, this means new users and contributors can get up to speed without too much hassle.

Huge Thanks to My Mentors and the Team

I really want to shine a light on Dr. Bez, Prof. Byna, and the entire AIDRIN team here. Their encouragement, practical advice, and collaborative spirit have been a huge part of my progress. Whether I’m stuck on a bug, brainstorming a new feature, or just need a second opinion, there’s always someone ready to help me think things through. Their experience and support have shaped not just the technical side of my work, but also how I approach problem-solving and teamwork.

What’s Next?

Looking ahead, I’m planning to expand AIDRIN’s support for multimodal datasets and keep refining the privacy and fairness modules. There’s always something new to learn or improve, and I’m excited to keep building. If you’re interested in data quality, privacy, or open-source AI tools, I’d love to connect and swap ideas.

Thanks for reading and for following along with my GSoC journey. I’ll be back soon with more updates!

This is the second post in my 3-part GSoC series with AIDRIN. Stay tuned for the final update.

Midway Through GSoC: ENTS

Thu, 24 Jul 2025 00:00:00 +0000

Midway Through GSoC

Hi everyone! I’m Devansh Kukreja, and I’m excited to share a midterm update on my Google Summer of Code 2025 project with the University of California, Santa Cruz Open Source Program Office (UC OSPO) under the Open Source Research Experience (OSRE). I’m contributing to ENTS, a platform that supports real-time monitoring and visualization of environmental sensor networks.

Project Overview

The Environmental NeTworked Sensor (ENTS) platform is an open-source web portal designed to collect, visualize, and analyze data from distributed sensor networks. It’s used by researchers and citizen scientists to monitor field-deployed sensors measuring soil moisture, temperature, voltage, and current—supporting critical research on sustainability and environmental change.

My project focuses on improving the platform’s stability, usability, and extensibility through:

Fixing bugs in the data visualization components.
Enhancing real-time chart synchronization and data point selection.
Improving overall system error handling and reliability.
Building a Logger Registration System that enables users to register and configure their logging devices.
Exploring integration with The Things Network (TTN) to support LoRaWAN-based wireless sensor connectivity.

Progress So Far

During the first half of the GSoC period, I focused on laying the groundwork for a more robust and user-friendly system. Highlights include:

Enhanced date range logic: Improved the way the dashboard selects time periods by automatically choosing a recent two-week window with valid sensor data. This ensures charts always display meaningful insights and avoids showing blank states.
Improved chart rendering: Refined how charts behave when there’s no data or when unusual values (like negatives) are present. This includes smoother axis alignment and fallback messaging when data is unavailable.
Refactored cell management UI: Cleaned up and improved the modals used to manage cells and sensors, fixing several UI/UX issues and bugs to make interactions more intuitive and consistent.
Enabled smart URL syncing: The dashboard state now stays in sync with the URL, making it easier to share specific views or navigate back to previous states without losing context.
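As an illustration of the date range idea, here is one way a default window could be picked. This is a sketch under assumptions rather than the ENTS code: the function name `pick_default_window` is hypothetical, and it assumes the window should end at the most recent reading instead of at the current time.

```python
from datetime import datetime, timedelta


def pick_default_window(reading_times, now, days=14):
    """Return (start, end) for a window ending at the latest reading, or None if there is no data."""
    # Ignore readings stamped in the future (for example, from a misconfigured clock).
    past = [t for t in reading_times if t <= now]
    if not past:
        return None  # the caller can show a "no data" message instead of an empty chart
    end = max(past)
    return end - timedelta(days=days), end


readings = [datetime(2025, 6, 1), datetime(2025, 6, 20)]
print(pick_default_window(readings, now=datetime(2025, 7, 24)))
```

Anchoring the window to the latest reading, rather than to today, is what keeps a sensor that went quiet a month ago from opening on a blank chart.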

What’s Next

In the second half of the program, I’ll be focusing on:

Building out and polishing the Logger Registration UI based on the backend schema and wireframes.
Finalizing the onboarding flow for field loggers, linking registration data to ingestion and dashboard views.
Continuing work on LoRaWAN support with TTN, aiming to enable basic OTA provisioning for future deployments.
Exploring an admin dashboard that helps visualize device health, sync status, and alert on any anomalies.

Final Thoughts

Working on ENTS has been incredibly rewarding—it’s more than just code. It’s about making tools that help scientists and conservationists understand our changing environment, and I’m honored to be a part of that.

Big thanks to my mentors Colleen Josephson, John Madden, and Alec Levy for their support and thoughtful feedback throughout. I’ve learned a ton already, and I can’t wait to keep building.

Scaling Sensor Networks for Environmental Research

Sun, 15 Jun 2025 00:00:00 +0000

Hi! I’m Devansh Kukreja, a researcher, indie developer, and Computer Science undergrad. I’m interested in distributed systems, orchestration services, and real-time data platforms. I enjoy working on systems that help different components connect and run smoothly at scale.

This summer, I’m contributing to the ENTS (Environmental NeTworked Sensor) platform with the University of California, Santa Cruz Open Source Program Office as part of Google Summer of Code 2025.

ENTS is an open-source web portal designed to collect, visualize, and analyze data from large-scale environmental sensor networks. It helps researchers and citizen scientists monitor sensors that measure soil moisture, temperature, current, and voltage, supporting real-time environmental research in outdoor settings.

My work this summer focuses on improving the platform’s reliability and usability. I’ll be fixing visualization bugs, enhancing chart synchronization, making data point selection more intuitive, and improving error handling. Alongside that, I’m building a Logger Registration System that lets users easily add and configure their data loggers, with potential support for over-the-air provisioning via The Things Network (TTN) for LoRaWAN-based devices.

You can check out my full proposal here. I’m grateful to be mentored by Colleen Josephson, John Madden, and Alec Levy, who are guiding the project with incredible insight and support.

By the end of the summer, ENTS will be a more stable, user-friendly, and extensible platform—better equipped to support environmental research at scale. I’m super excited to learn, build, and contribute to something meaningful!

Improving AI Data Pipelines in AIDRIN: A Privacy-Centric and Multimodal Expansion

Thu, 12 Jun 2025 00:00:00 +0000

⏱️ Reading time: 4–5 minutes

Hi 👋

I’m Harish Balaji, a Master’s student at NYU with a focus on Artificial Intelligence, Machine Learning, and Cybersecurity. I’m especially interested in building scalable systems that reflect responsible AI principles. For me, data quality isn’t just a technical detail. It’s a foundational aspect of building models that are reliable, fair, and reproducible in the real world.

This summer, I’m contributing to AIDRIN (AI Data Readiness Inspector) as part of Google Summer of Code 2025. I’m grateful to be working under the mentorship of Dr. Jean Luca Bez and Prof. Suren Byna from the Scientific Data Management Group at Lawrence Berkeley National Laboratory (LBNL).

AIDRIN is an open-source framework that helps researchers and practitioners evaluate whether a dataset is truly ready to be used in production-level AI workflows. From fairness to privacy, it provides a structured lens through which we can understand the strengths and gaps in our data.

Why this work matters

In machine learning, one principle always holds true:

“Garbage in, garbage out.”

Even the most advanced models can underperform or amplify harmful biases if trained on incomplete, imbalanced, or poorly understood data. This is where AIDRIN steps in. It provides practical tools to assess datasets across key dimensions like privacy, fairness, class balance, interpretability, and support for multiple modalities.

By making these characteristics measurable and transparent, AIDRIN empowers teams to make informed decisions early in the pipeline. It helps ensure that datasets are not only large or complex, but also trustworthy, representative, and fit for purpose.

My focus this summer

As part of my GSoC 2025 project, I’ll be focusing on extending AIDRIN’s evaluation capabilities. A big part of this involves strengthening its support for privacy metrics and designing tools that can handle non-tabular datasets, such as image-based data.

The goal is to expand AIDRIN’s reach without compromising on interpretability or ease of use. More technical insights and updates will follow in the next posts as the summer progresses.

What comes next

As the AI community continues to evolve, there’s a growing shift toward data-centric practices. I believe frameworks like AIDRIN are essential for helping us move beyond the question of “Does the model work?” toward a deeper and more meaningful one: “Was the data ready in the first place?”

Over the next few weeks, I’ll be working on development, testing, and integration. I’m excited to contribute to a tool that emphasizes transparency and reproducibility across the AI lifecycle, and to share lessons and ideas with others who care about responsible AI.

If you’re exploring similar challenges or working in the space of dataset evaluation and readiness, I’d love to connect and exchange thoughts. You can also read my full GSoC 2025 proposal below for more context around the project scope and vision:

👉 Read my GSoC 2025 proposal here

This is the first in a 3-part blog series documenting my GSoC journey with AIDRIN. Stay tuned for technical updates and behind-the-scenes insights as the summer unfolds!