<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>gsco24 | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/gsco24/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/gsco24/index.xml" rel="self" type="application/rss+xml"/><description>gsco24</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 12 Aug 2024 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>gsco24</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/gsco24/</link></image><item><title>Midterm Report: Halfway through medicinal data visualization using PolyPhy/Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello! My name is &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ayush-sharma/">Ayush Sharma&lt;/a>, and I am a machine learning engineer and researcher based in Chandigarh, a beautiful city in Northern India known for its modern architecture and green spaces.
For the last month and a half, I have been working closely with my mentors &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/">Kiran Deol&lt;/a> on the project titled &lt;a href="%5cproject%5cosre24%5cucsc%5cpolyphy">Unveiling Medicine Patterns: 3D Clustering with Polyphy/Polyglot&lt;/a> as part of GSoC 2024.&lt;/p>
&lt;h2 id="progress-and-challenges">Progress and Challenges&lt;/h2>
&lt;p>The project focuses on developing effective clustering algorithms to visualize medicine data in three dimensions using PolyPhy and Polyglot. My journey began with data preprocessing and cleaning, where unnecessary data points were removed, and missing values were addressed.&lt;/p>
&lt;p>One of the primary techniques we&amp;rsquo;ve employed is UMAP (Uniform Manifold Approximation and Projection). UMAP&amp;rsquo;s ability to preserve the global structure of the data while providing meaningful clusters proved advantageous. Initial experiments with UMAP on datasets of various sizes (ranging from 1,500 to 15,000 medicines) provided valuable insights into the clustering patterns. By iteratively halving the dimensions and refining the parameters, we achieved more accurate clustering results.&lt;/p>
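&lt;p>As a rough illustration of the dimension-halving idea, here is a minimal sketch. It is not the project code: the names halving_schedule, reduce_iteratively, and truncate are hypothetical, and the reducer is a simple placeholder standing in for UMAP so that the example runs with the standard library alone.&lt;/p>

```python
from typing import Callable, List

Matrix = List[List[float]]


def halving_schedule(start_dim: int, target_dim: int) -> List[int]:
    """Return the dimensions visited when halving from start_dim down to target_dim."""
    if target_dim > start_dim or 1 > target_dim:
        raise ValueError("target_dim must be between 1 and start_dim")
    dims = []
    current = start_dim
    # Keep halving while the next half is still above the target.
    while current // 2 > target_dim:
        current //= 2
        dims.append(current)
    # Always finish exactly at the target dimension.
    if not dims or dims[-1] != target_dim:
        dims.append(target_dim)
    return dims


def reduce_iteratively(data: Matrix, target_dim: int,
                       reducer: Callable[[Matrix, int], Matrix]) -> Matrix:
    """Apply the reducer once per step of the halving schedule."""
    for dim in halving_schedule(len(data[0]), target_dim):
        data = reducer(data, dim)
    return data


def truncate(data: Matrix, n_components: int) -> Matrix:
    """Placeholder reducer (an assumption for this sketch): keep the first columns.

    With umap-learn installed, a real reducer could instead be
    lambda X, d: umap.UMAP(n_components=d).fit_transform(X)
    """
    return [row[:n_components] for row in data]


if __name__ == "__main__":
    # Toy data: 5 rows with 64 features each.
    toy = [[float(i + j) for j in range(64)] for i in range(5)]
    print(halving_schedule(64, 3))
    reduced = reduce_iteratively(toy, 3, truncate)
    print(len(reduced), len(reduced[0]))
```

&lt;p>Passing the reducer as a function keeps the schedule separate from the embedding method, so the same loop could alternate between different reducers from step to step.&lt;/p>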
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="UMAP on a dataset of 15000 medicines" srcset="
/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_48eb545fa0673e23a0ff289b6fdac6cd.webp 400w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_12b5cf998e90e476fdd4e6c9800cc63e.webp 760w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/umap_hua68b3da7cb5e27475c0ecf687ad0d87a_123755_48eb545fa0673e23a0ff289b6fdac6cd.webp"
width="679"
height="603"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To complement UMAP, we explored t-SNE (t-distributed Stochastic Neighbor Embedding). t-SNE&amp;rsquo;s focus on local relationships helped in understanding finer details within the clusters. By adjusting t-SNE parameters and conducting perturbations, we could better comprehend the data&amp;rsquo;s behavior. Combining UMAP with t-SNE in a loop, halving dimensions iteratively, showed promise, allowing us to leverage the strengths of both techniques to enhance clustering accuracy.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="t-SNE on a dataset of 15000 medicines" srcset="
/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_505feb5f73fb8656ef98cfa71acfb53b.webp 400w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_fc473d7fb06ab1b2e2bafbb3b86db867.webp 760w,
/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240719-ayushsharma/t-SNE_hu27c25081a80397a68d5439e1a165b2a0_67619_505feb5f73fb8656ef98cfa71acfb53b.webp"
width="760"
height="527"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>We also experimented with pre-trained models like BERT and GloVe to create embeddings for the medicines. However, BERT’s splitting of salt names into subword pieces and GloVe’s limitations in recognizing specific salts led to inaccurate clustering, which we are still working to improve.&lt;/p>
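&lt;p>To show why subword splitting can hurt here, the sketch below mimics the greedy longest-match-first splitting used by WordPiece-style tokenizers such as the one in BERT. The vocabulary is a tiny hypothetical one made up for this example, so the exact pieces will differ from what a real BERT vocabulary produces.&lt;/p>

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first subword split, in the style of WordPiece."""
    pieces = []
    start = 0
    while len(word) > start:
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits, so the whole word becomes the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, invented for illustration only.
TOY_VOCAB = {"para", "##ce", "##tam", "##ol", "ibu", "##pro", "##fen", "aspirin"}

if __name__ == "__main__":
    for salt in ["paracetamol", "ibuprofen", "aspirin", "metformin"]:
        print(salt, wordpiece_tokenize(salt, TOY_VOCAB))
```

&lt;p>A salt that is split into several generic pieces, or mapped to the unknown token, no longer has a single vector of its own, which is one plausible reason the resulting clusters were inaccurate.&lt;/p>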
&lt;h2 id="next-steps">Next Steps&lt;/h2>
&lt;p>Moving forward, I will focus on refining our clustering and embedding techniques to enhance overall accuracy. This involves integrating Jaccard distance alongside other distance measures to improve similarity assessments between medicines and clusters. Additionally, I&amp;rsquo;ll continue experimenting with advanced models such as GPT, CLIP, and Gemini for better embeddings, while addressing the limitations of BERT and GloVe by leveraging custom embeddings created with transformers and one-hot encoding. Optimizing the UMAP and t-SNE parameters will also be crucial to ensure their effectiveness in clustering and visualization. These steps aim to overcome current challenges and further advance the project&amp;rsquo;s goals.&lt;/p></description></item><item><title>Unveiling Medicine Patterns: 3D Clustering with Polyphy/Polyglot</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240619-ayushsharma/</link><pubDate>Wed, 19 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/polyphy/20240619-ayushsharma/</guid><description>&lt;p>Hello! My name is Ayush, and this summer I&amp;rsquo;ll be contributing to &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre24/ucsc/polyphy/">PolyPhy&lt;/a> and &lt;a href="https://normand-1024.github.io/Bio-inspired-Exploration-of-Language-Embedding/" target="_blank" rel="noopener">Polyglot&lt;/a>, a GPU-oriented agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/oskar-elek/">Oskar Elek&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/kiran-deol/">Kiran Deol&lt;/a>.&lt;/p>
&lt;p>For reference, here&amp;rsquo;s my &lt;a href="https://summerofcode.withgoogle.com/media/user/7a1cc1c971c5/proposal/gAAAAABmV3hljjurQ8HAS8PRRRZB2_c5vQ3clWisqad85y-gO7rNvpssnzqGlFeiYQkAb5qY5WDUoRKkxUoTHLLDXLwBvrAjSsRs1qNTYmMrFfsbs1aQrjo=.pdf" target="_blank" rel="noopener">proposal&lt;/a> for this project.&lt;/p>
&lt;p>Polyglot offers an immersive 3D visualization experience, enabling users to zoom, rotate, and delve into complex datasets.
My project aims to harness these capabilities to unlock hidden connections in the realm of medicine, specifically focusing on the relationships between drugs based on their shared salt compositions, rather than just their active ingredients. This approach promises to reveal intricate patterns and relationships that have the potential to revolutionize drug discovery, pharmacology, and personalized medicine.&lt;/p>
&lt;p>In this project, I will create custom embeddings for a vast dataset of over 600,000 medicines, capturing the relationships between their salt compositions. By visualizing these embeddings in Polyglot&amp;rsquo;s 3D space, researchers can identify previously unknown connections between medicines, leading to new insights and breakthroughs. The dynamic and interactive nature of Polyglot will empower researchers to explore these complex relationships in a very efficient and cool way, potentially accelerating the discovery of new drug interactions and therapeutic applications.&lt;/p>
&lt;p>I am really excited to work on this project. Keep following the blog for further updates!&lt;/p></description></item></channel></rss>