<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kiran Deol | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/index.xml" rel="self" type="application/rss+xml"/><description>Kiran Deol</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/avatar_hu096299a10fa1f493bdbf7f876ee18ac3_26178_270x270_fill_q75_lanczos_center.jpg</url><title>Kiran Deol</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/</link></image><item><title>Mediglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/polyphy/</link><pubDate>Tue, 04 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre25/ucsc/polyphy/</guid><description>&lt;p>&lt;a href="https://github.com/PolyPhyHub/PolyPhy" target="_blank" rel="noopener">PolyPhy&lt;/a> is a GPU-oriented agent-based system for reconstructing and visualizing &lt;em>optimal transport networks&lt;/em> defined over sparse data. Rooted in astronomy and inspired by nature, we have used an early prototype called &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">Polyphorm&lt;/a> to reconstruct the &lt;a href="https://youtu.be/5ILwq5OFuwY" target="_blank" rel="noopener">Cosmic web&lt;/a> structure, but also to discover network-like patterns in natural language data. You can see an instructive overview of PolyPhy in our &lt;a href="https://elek.pub/workshop_cross2022.html" target="_blank" rel="noopener">workshop&lt;/a> and more details about our research &lt;a href="https://elek.pub/projects/Rhizome-Cosmology" target="_blank" rel="noopener">here&lt;/a>. 
Recent projects, such as &lt;a href="https://github.com/PolyPhyHub/PolyGlot" target="_blank" rel="noopener">Polyglot&lt;/a> and &lt;a href="https://github.com/Ayush-Sharma410/MediGlot" target="_blank" rel="noopener">Mediglot&lt;/a>, have focused on using PolyPhy to better visualize language embeddings.&lt;/p>
&lt;h3 id="medicinal-language-embeddings">Medicinal Language Embeddings&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Large Language Models&lt;/code> &lt;code>NLP&lt;/code> &lt;code>Embeddings&lt;/code> &lt;code>Medicine&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, JavaScript, Data Science, Technical Communication&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:kdeol@ualberta.ca">Kiran Deol&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project aims to refine and enhance Mediglot, a web application for visualizing 3D medicinal embeddings, which extends the Polyglot app and leverages the PolyPhy toolkit for network-inspired data science. Mediglot currently enables users to explore high-dimensional vector representations of medicines (derived from their salt compositions) in a 3D space using UMAP, as well as analyze similarity through the innovative Monte-Carlo Physarum Machine (MCPM) metric. Unlike traditional language data, medicinal embeddings do not have an inherent sequential structure. Instead, we must work with the salt compositions of each medicine to create embeddings that are faithful to the intended purpose of each medicine.&lt;/p>
&lt;p>This year, we would like to focus on exploring and integrating state-of-the-art AI techniques and algorithms to improve Mediglot&amp;rsquo;s clustering capabilities and its representation of medicinal data in 3D. The contributor will experiment with advanced large language models (LLMs) and cutting-edge AI methods to develop innovative approaches for refining clustering and extracting deeper insights from medicinal embeddings. Beyond LLMs, we would like to experiment with more traditional language processing methods to design novel embedding procedures. Additionally, we would like to experiment with other similarity metrics. While the similarity of two medicines depends on the initial embedding, we would like to examine the effects of different metrics on the kinds of insights a user can extract. Finally, the contributor is expected to evaluate and compare different algorithms for dimensionality reduction to enhance the faithfulness of the visualization and its interpretability.&lt;/p>
&lt;p>The ideal contributor for this project has experience with Python (and common scientific toolkits such as NumPy, Pandas, and SciPy). They will also need some experience with JavaScript and web development (Mediglot is distributed as a vanilla JS web app). Knowledge of embedding techniques for language processing is highly recommended.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Work closely with the mentors to understand the context of the project and its detailed requirements in preparation for the proposal.&lt;/li>
&lt;li>Become acquainted with the tooling (PolyPhy, PolyGlot, Mediglot) prior to the start of the project period.&lt;/li>
&lt;li>Explore different embedding techniques for medicinal data (including implementing novel embedding procedures).&lt;/li>
&lt;li>Explore different dimensionality reduction techniques, with a focus on faithful visualizations.&lt;/li>
&lt;li>Document the process and resulting findings in a publicly available report.&lt;/li>
&lt;/ul>
&lt;h3 id="enhancing-polyphy-web-application">Enhancing PolyPhy Web Application&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>UI/UX Design&lt;/code> &lt;code>Full Stack Development&lt;/code> &lt;code>JavaScript&lt;/code> &lt;code>Next.js&lt;/code> &lt;code>Node.js&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Full Stack Web Development, UI/UX Design, JavaScript, Next.js, Node.js, Technical Communication&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:kdeol@ualberta.ca">Kiran Deol&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>This project aims to revamp and enhance the PolyPhy web platform to better support contributors, users, and researchers. The goal is to optimize the website’s UI/UX, improve its performance, and integrate Mediglot to provide users with a seamless experience in visualizing both general network structures and 3D medicinal embeddings.&lt;/p>
&lt;p>The contributor will be responsible for improving the website’s overall look, feel, and functionality, ensuring a smooth and engaging experience for both contributors and end-users. This includes addressing front-end and back-end challenges, optimizing the platform for better accessibility, and ensuring seamless integration with Mediglot.&lt;/p>
&lt;p>The ideal candidate should have experience in full-stack web development, particularly with &lt;strong>Next.js&lt;/strong>, &lt;strong>JavaScript&lt;/strong>, and &lt;strong>Node.js&lt;/strong>, and should be familiar with UI/UX design principles. A strong ability to communicate effectively, both in writing and through code, is essential for this role.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Collaborate with mentors&lt;/strong> to understand the project&amp;rsquo;s goals and the specific requirements for the website improvements.&lt;/li>
&lt;li>&lt;strong>UI/UX Redesign&lt;/strong>:
&lt;ul>
&lt;li>Redesign and enhance the website’s navigation, layout, and visual elements to create an intuitive and visually engaging experience.&lt;/li>
&lt;li>Improve mobile responsiveness for broader accessibility across devices.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Website Performance &amp;amp; Stability&lt;/strong>:
&lt;ul>
&lt;li>Identify and resolve performance bottlenecks, bugs, or issues affecting speed, stability, and usability.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Mediglot Integration&lt;/strong>:
&lt;ul>
&lt;li>Integrate the Mediglot web application with PolyPhy, ensuring seamless functionality and a unified user experience for visualizing medicinal data alongside general network reconstructions.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Documentation&lt;/strong>:
&lt;ul>
&lt;li>Document the development process, challenges, and solutions in a clear and organized manner, ensuring transparent collaboration with mentors and the community.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol></description></item><item><title>PolyPhy</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/polyphy/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/polyphy/</guid><description>&lt;p>&lt;a href="https://github.com/PolyPhyHub/PolyPhy" target="_blank" rel="noopener">PolyPhy&lt;/a> is a GPU-oriented agent-based system for reconstructing and visualizing &lt;em>optimal transport networks&lt;/em> defined over sparse data. Rooted in astronomy and inspired by nature, we have used an early prototype called &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">Polyphorm&lt;/a> to reconstruct the &lt;a href="https://youtu.be/5ILwq5OFuwY" target="_blank" rel="noopener">Cosmic web&lt;/a> structure, but also to discover network-like patterns in natural language data. You can see an instructive overview of PolyPhy in our &lt;a href="https://elek.pub/workshop_cross2022.html" target="_blank" rel="noopener">workshop&lt;/a> and more details about our research &lt;a href="https://elek.pub/projects/Rhizome-Cosmology" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>Under the hood, PolyPhy uses a richer 3D scalar field representation of the reconstructed network, instead of a typical discrete representation like a graph or a mesh. The ultimate purpose of PolyPhy is to become a toolkit for a range of specialists across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. PolyPhy aspires to be a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.&lt;/p>
&lt;h3 id="polyphy-web-presence">PolyPhy Web Presence&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>UX&lt;/code> &lt;code>Social Media&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> full stack web development, JavaScript, good communication&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="mailto:ez@nmsu.edu">Ezra Huscher&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The online presentation of a software project is without a doubt one of the core ingredients of its success. This project aims to develop a sustainable web presence for PolyPhy, catering to interested contributors, active collaborators, and users alike.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work closely with the mentors on understanding the context of the project and its detailed requirements in preparation for the proposal.&lt;/li>
&lt;li>Port the existing &lt;a href="https://polyphy.io" target="_blank" rel="noopener">website&lt;/a> into a more modern JavaScript framework (such as Next.js) that provides a user-friendly CMS and admin interface.&lt;/li>
&lt;li>Update the contents of the website with new information from the &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">repository page&lt;/a> as well as other sources as directed by the mentors.&lt;/li>
&lt;li>Develop a simple functional system for posting updates about the project to selected social media and other communication platforms (LinkedIn, Twitter/X or Mastodon, mailing list) which will also be reflected on the website.&lt;/li>
&lt;li>Optional: improve the UX of the website where needed.&lt;/li>
&lt;li>Optional: implement website analytics (visitor stats etc).&lt;/li>
&lt;/ul>
&lt;h3 id="data-visualization-and-analysis-with-polyphypolyglot">Data Visualization and Analysis with PolyPhy/Polyglot&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Data Science&lt;/code> &lt;code>Data Visualization&lt;/code> &lt;code>Point Clustering&lt;/code> &lt;code>3D&lt;/code> &lt;code>Neural Embeddings&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> data science, Python, JavaScript, statistics, familiarity with AI and latent embedding spaces a big plus&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350+ hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/">Kiran Deol&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The aim of this project is to explore a novel data-scientific use case using PolyPhy and its associated web visualization interface &lt;a href="https://github.com/PolyPhyHub/PolyGlot" target="_blank" rel="noopener">PolyGlot&lt;/a>. The contributor is expected to identify a dataset they are already very familiar with and that fits the application scope of the PolyPhy/PolyGlot tooling: a complex point cloud arising from a 3D or higher-dimensional process which will benefit from latent pattern identification and a subsequent visual as well as quantitative analysis. The contributor needs to have the rights to use the dataset, either by owning the copyright or via the open-source nature of the data.&lt;/p>
&lt;p>&lt;strong>Specific tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Work closely with the mentors on understanding the context of the project and its detailed requirements in preparation for the proposal.&lt;/li>
&lt;li>Become acquainted with the tooling (PolyPhy, PolyGlot) prior to the start of the project period.&lt;/li>
&lt;li>Document the nature of the target dataset and define the complete data pipeline with assistance of the mentors, including the specific analytic tasks and objectives.&lt;/li>
&lt;li>Implement the data pipeline in PolyPhy and PolyGlot.&lt;/li>
&lt;li>Document the process and resulting findings in a publicly available report.&lt;/li>
&lt;/ul></description></item><item><title>Final GSoC Blog - Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230925-kirandeol/</link><pubDate>Mon, 25 Sep 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230925-kirandeol/</guid><description>&lt;p>As I send in my final work submission for the final GSoC evaluation, I&amp;rsquo;m excited to share with you the progress we&amp;rsquo;ve made this summer (and future plans for Polyglot!). You can view the repository and web app here: &lt;a href="https://polyphyhub.github.io/PolyGlot/" target="_blank" rel="noopener">https://polyphyhub.github.io/PolyGlot/&lt;/a>. As a quick reminder of the project, we sought to extend the Polyglot web app, as developed by Hongwei (Henry) Zhou. For context, the web app follows this methodology:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Given a set of words, use an embedding model (such as Word2Vec, BERT, etc.) to generate a set of high dimensional points associated with each word.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use a dimensionality reduction method (such as UMAP) to reduce the dimensionality of each word-vector point to 3 dimensions.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the novel MCPM (Monte Carlo Physarum Machine) to compute the similarities between a set of anchor points and the rest of the point cloud. You could use any similarity metric here, too, such as the Euclidean distance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The web app then displays the point cloud of 3-dimensional embeddings, but uses coloring to indicate the level of MCPM similarity each word has with the anchor point (e.g., if the anchor point is the word “dog”, the rest of the point cloud is colored such that words identified as similar to “dog” by the MCPM metric are brighter, whereas dissimilar words are darker).&lt;/p>
&lt;/li>
&lt;/ol>
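&lt;p>Steps 3 and 4 can be sketched as follows. This is only a minimal illustration, not the code Polyglot uses: it swaps in the Euclidean distance for MCPM (as step 3 allows), and the function names, the sample coordinates, and the linear distance-to-brightness mapping are all assumptions made for the example.&lt;/p>

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def brightness_from_anchor(points, anchor_word):
    """Map each word to a brightness in [0, 1]; 1.0 means closest to the anchor.

    The linear mapping is an assumption for illustration only.
    """
    anchor = points[anchor_word]
    distances = {word: euclidean(p, anchor) for word, p in points.items()}
    max_dist = max(distances.values())
    if max_dist == 0:
        # Every point coincides with the anchor.
        return {word: 1.0 for word in distances}
    return {word: 1.0 - d / max_dist for word, d in distances.items()}


# Hypothetical 3D coordinates standing in for UMAP output.
points = {
    "dog": (0.0, 0.0, 0.0),
    "puppy": (0.3, 0.0, 0.4),
    "car": (3.0, 4.0, 0.0),
}
brightness = brightness_from_anchor(points, "dog")
```

&lt;p>With these sample points, “puppy” ends up much brighter than “car” when “dog” is the anchor, which mirrors the coloring described in step 4.&lt;/p>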
&lt;p>The main results since the last blog are summarized as follows:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>A novel timeline feature in which users can track the importance of certain words over time by watching the change in size of points. The feature computes the TF-IDF metric for a word across all documents in a given year and uses linear interpolation for years which do not have an explicit importance score.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>An industrial collaboration with UK startup Lautonomy, where we have pre-processed and entered their data into Polyglot. Pre-processing consisted of first computing a high dimensional embedding of their set of words using OpenAI&amp;rsquo;s CLIP model &lt;a href="https://openai.com/research/clip" target="_blank" rel="noopener">https://openai.com/research/clip&lt;/a> and the CLIP-as-service Python package &lt;a href="https://clip-as-service.jina.ai" target="_blank" rel="noopener">https://clip-as-service.jina.ai&lt;/a>. Next, we used UMAP to reduce the dimensionality of these embeddings to 3D. We computed the Euclidean distance on this data (in place of the MCPM metric). Finally, we formatted the data to enter into Polyglot.&lt;/p>
&lt;/li>
&lt;/ol>
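&lt;p>The linear interpolation mentioned for the timeline feature can be sketched as below. This is a hedged illustration rather than the Polyglot implementation: the function name, the sample scores, and the choice to reuse the nearest known score outside the known range are assumptions.&lt;/p>

```python
def interpolate_scores(known, years):
    """Linearly interpolate importance scores for years without an explicit value.

    known maps year -> score and must not be empty. Years outside the known
    range reuse the nearest known score (an assumption for this sketch).
    """
    known_years = sorted(known)
    result = {}
    for year in years:
        if year in known:
            result[year] = known[year]
            continue
        earlier = [y for y in known_years if year > y]
        later = [y for y in known_years if y > year]
        if not earlier:
            result[year] = known[later[0]]
        elif not later:
            result[year] = known[earlier[-1]]
        else:
            # Interpolate between the closest known years on either side.
            y0, y1 = earlier[-1], later[0]
            t = (year - y0) / (y1 - y0)
            result[year] = known[y0] + t * (known[y1] - known[y0])
    return result


# Hypothetical TF-IDF scores for one word; 2019 and 2020 have no explicit score.
known = {2018: 0.2, 2021: 0.8}
scores = interpolate_scores(known, range(2018, 2022))
```

&lt;p>Here the scores for 2019 and 2020 fall on the straight line between the 2018 and 2021 values, so a point whose size tracks the score would grow steadily across those years.&lt;/p>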
&lt;p>Although the app has developed a lot over the summer, we are planning to continue working on Polyglot, particularly with respect to one of our original goals: to set up a pipeline from PolyPhy to Polyglot. Unfortunately, with PolyPhy undergoing refactoring this summer, we weren&amp;rsquo;t able to set this pipeline up. However, that is one of our goals for the next few months. We are also moving forward with the industrial collaboration with legal analytics startup Lautonomy. We hope to release an output together soon!&lt;/p>
&lt;p>If you&amp;rsquo;re curious about Polyglot or are interested in getting involved, please feel free to reach out to me, Oskar Elek, and Jasmine Otto!&lt;/p></description></item><item><title>Midpoint Blog Interactive Exploration of High-dimensional Datasets with PolyPhy and Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230803-kirandeol/</link><pubDate>Thu, 03 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230803-kirandeol/</guid><description>&lt;p>The last few months of my GSoC project have been very exciting and I hope to share why with you here in this blog post! To briefly summarize, my project has been focused on further developing the Polyglot app, a tool for visualizing 3D language embeddings. One important part of Polyglot is its utilization of the novel MCPM metric, where points are colored according to their MCPM similarity to a user-chosen “anchor point” (e.g., if “hat” is our anchor point, then similar words like “cap” or “fedora” will be colored more prominently).&lt;/p>
&lt;p>The first issue we wanted to tackle was actually navigating the point cloud. With hundreds of thousands of points, it can be difficult to find what you’re looking for! Thus, the first few features added were a search bar for points and anchor points and a “jump to point” feature which changes a user’s center of rotation and “jumps” to a chosen point. There were a few hiccups with implementing these features, mainly due to the large number of points and the particular quirks of the graphics library Polyglot uses. In the end though, these simple features made it feel a lot easier to use Polyglot.&lt;/p>
&lt;p>The next set of features related to our desire to actually annotate the point cloud. Similar to how one might annotate a Google Doc (i.e., highlight a chunk of text and leave a comment), we wanted to set up something similar, but with points! Indeed, this led to the development of a cool brush tool for coloring points, named and commented annotations (up to 5), a search bar within annotations, and finally a button to export annotations and comments to a CSV.&lt;/p>
&lt;p>The next few weeks are looking bright as we strive to finish the PolyPhy-Polyglot pipeline (a notebook for quickly formatting MCPM data from PolyPhy and getting it into Polyglot). We also hope to add a unique “timeline” feature in which users can analyze sections of the point cloud based on the associated time of each point. Overall, it’s been a very stimulating summer and I’m excited to push this project even further!&lt;/p></description></item><item><title>Interactive Exploration of High-dimensional Datasets with PolyPhy and Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230616-kirandeol/</link><pubDate>Fri, 16 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/polyphy/20230616-kirandeol/</guid><description>&lt;p>Hello! My name is Kiran and this summer I&amp;rsquo;ll be working with &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/polyphy">Polyphy&lt;/a> and &lt;a href="https://normand-1024.github.io/Bio-inspired-Exploration-of-Language-Embedding/" target="_blank" rel="noopener">Polyglot&lt;/a> under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a>.
The full &lt;a href="https://drive.google.com/file/d/1iwKU938uzUHn0oY2tM0jPADOYoF0kqbh/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a> is available online.&lt;/p>
&lt;p>For a brief overview, the Polyglot app allows users to interact with a 3D network of high-dimensional language embeddings, specifically the
&lt;a href="http://vectors.nlpl.eu/repository/" target="_blank" rel="noopener">Gensim Continuous Skipgram result of Wikipedia Dump of February 2017 (296630 words)&lt;/a> dataset. The high-dimensional
embeddings are reduced to 3 dimensions using UMAP. The novel &lt;a href="https://iopscience.iop.org/article/10.3847/2041-8213/ab700c/pdf" target="_blank" rel="noopener">MCPM slime mold metric&lt;/a> is then used
to compute the similarity levels between points (much like how you might compute the Euclidean distance between two points). These similarity levels are used
to filter the network and enable users to find interesting patterns in their data they might not find using quantitative methods alone. For example, the network has
a distinct branch in which only years are nearby! Users might find other clusters, such as ones with sports words or even software engineering words.
Although such exploration may not lead to quantitatively significant conclusions, the ability to explore and test mini hypotheses about the data can lead to
important insights that go on to incite quantitatively significant conclusions.&lt;/p>
&lt;p>In our project, we aim to expand Polyglot such that any user can upload their own data, once they have computed the MCPM metric using PolyPhy. This will have
important applications in building trust in our data and embeddings. This could also help with research on the MCPM metric, which presents a new, more naturalistic
way of computing similarity by relying on the principle of least effort. Overall, there is an exciting summer ahead, and if you&amp;rsquo;re interested in keeping up, please
feel free to check out the Polyglot app on GitHub!&lt;/p></description></item></channel></rss>