<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>zeyuzou | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zeyuzou/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zeyuzou/index.xml" rel="self" type="application/rss+xml"/><description>zeyuzou</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Thu, 17 Jul 2025 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>zeyuzou</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/zeyuzou/</link></image><item><title>Halfway Through GSoC: My Experience and Progress</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uci/rag-st/07172025-zeyu/</link><pubDate>Thu, 17 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre25/uci/rag-st/07172025-zeyu/</guid><description>&lt;p>As part of the &lt;a href="https://ucsc-ospo.github.io/project/osre25/uci/rag-st/" target="_blank" rel="noopener">RAG-ST&lt;/a> project, my &lt;a href="https://drive.google.com/file/d/1_yUf1NlVRpBXERCqnOby7pgP4WrWrZsr/view" target="_blank" rel="noopener">proposal&lt;/a> under the mentorship of &lt;strong>Ziheng Duan&lt;/strong> aims to build a &lt;strong>retrieval-augmented generation&lt;/strong> framework to predict spatial gene expression from histology images.&lt;/p>
&lt;hr>
&lt;h2 id="-achievements">🚀 Achievements&lt;/h2>
&lt;h3 id="-ran-the-hest-1k-pipeline">✅ Ran the HEST-1K Pipeline&lt;/h3>
&lt;p>I ran gene expression prediction models on the &lt;strong>HEST-1K&lt;/strong> dataset. This reproduced the baseline image-to-expression workflows and set up data loaders, evaluation metrics, and tools for visually inspecting the outputs.&lt;/p>
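&lt;p>The post does not say which evaluation metrics the baseline uses. A common choice for image-to-expression models is the Pearson correlation between predicted and measured expression for each gene, computed across spots. The NumPy sketch below assumes that metric. The function name &lt;code>per_gene_pearson&lt;/code> is hypothetical, and the data is synthetic.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def per_gene_pearson(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Pearson correlation for each gene (column), computed across spots (rows).

    pred, true: arrays of shape (n_spots, n_genes) with matching gene order.
    Genes with zero variance in either array get NaN.
    """
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


# Example on synthetic data: predictions are noisy copies of the targets.
rng = np.random.default_rng(0)
true = rng.random((200, 5))
pred = true + 0.1 * rng.standard_normal((200, 5))
r = per_gene_pearson(pred, true)
print("per-gene r:", np.round(r, 3), "mean r:", round(float(np.nanmean(r)), 3))
&lt;/code>&lt;/pre>
&lt;p>Averaging the per-gene values with &lt;code>np.nanmean&lt;/code> skips genes that never vary. Without that, such genes would produce NaN and break the summary score.&lt;/p>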
&lt;h3 id="-explored-tangrams-alignment-code">✅ Explored Tangram’s Alignment Code&lt;/h3>
&lt;p>I studied and ran &lt;strong>Tangram&lt;/strong>, a widely used method for aligning single-cell RNA-seq (scRNA-seq) data to spatial transcriptomics (ST) data. This gave me key insights into cross-modality mapping, and these ideas will inform our strategy for aligning histology images to scRNA-seq data.&lt;/p>
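&lt;p>For readers new to Tangram, here is the core idea in simplified form. It learns a cell-to-spot mapping matrix in which each cell’s row is a softmax distribution over spots. It projects single-cell expression onto the spots through that matrix, then rewards cosine similarity between the projected and measured spatial expression. The NumPy sketch below only evaluates this objective, using a gene-wise and a spot-wise cosine term. It omits Tangram’s density and regularization terms and its gradient-based optimization. It is an illustration, not Tangram’s API, and &lt;code>tangram_like_loss&lt;/code> and &lt;code>lambda_spot&lt;/code> are hypothetical names.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def softmax_rows(logits: np.ndarray) -> np.ndarray:
    # Turn each cell's row of scores into a probability distribution over spots.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cosine(a: np.ndarray, b: np.ndarray, axis: int, eps: float = 1e-12) -> np.ndarray:
    # Cosine similarity between a and b along the given axis.
    num = (a * b).sum(axis=axis)
    den = np.linalg.norm(a, axis=axis) * np.linalg.norm(b, axis=axis) + eps
    return num / den


def tangram_like_loss(logits, sc_expr, st_expr, lambda_spot=1.0):
    """Simplified Tangram-style objective (lower is better).

    logits:  (n_cells, n_spots) learnable mapping scores
    sc_expr: (n_cells, n_genes) single-cell expression
    st_expr: (n_spots, n_genes) spatial expression, same gene order as sc_expr
    """
    mapping = softmax_rows(logits)        # (n_cells, n_spots), rows sum to 1
    projected = mapping.T @ sc_expr       # (n_spots, n_genes) predicted spatial expression
    gene_term = cosine(projected, st_expr, axis=0).mean()  # each gene across spots
    spot_term = cosine(projected, st_expr, axis=1).mean()  # each spot across genes
    return -(gene_term + lambda_spot * spot_term)
&lt;/code>&lt;/pre>
&lt;p>With &lt;code>lambda_spot=1.0&lt;/code>, mapping each cell to its own spot on identical data scores approximately -2, the best possible value. A uniform mapping scores higher, which is worse.&lt;/p>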
&lt;h3 id="-designed-the-rag-st-architecture">✅ Designed the RAG-ST Architecture&lt;/h3>
&lt;p>I drafted the architecture for the RAG-ST pipeline, including:&lt;/p>
&lt;ul>
&lt;li>Vision encoder to process image patches.&lt;/li>
&lt;li>Retrieval module to find relevant examples from a curated database.&lt;/li>
&lt;li>Generation head that conditions predictions on the retrieved examples, so outputs are context-aware and can be traced back to the examples that informed them.&lt;/li>
&lt;/ul>
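&lt;p>The following minimal retrieve-then-predict sketch shows how data could flow through the three components. It assumes that a vision encoder has already computed the patch embeddings, and that each reference patch in the database is paired with its measured expression profile. A stand-in replaces the learned generation head: it averages the retrieved profiles, weighting each by a softmax over cosine similarity. The names &lt;code>retrieve_and_predict&lt;/code>, &lt;code>k&lt;/code>, and &lt;code>temperature&lt;/code> are hypothetical and are not taken from the RAG-ST design.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def retrieve_and_predict(query_emb, db_emb, db_expr, k=3, temperature=0.1):
    """Predict expression for one patch from its k most similar reference patches.

    query_emb: (d,) embedding of the query patch, e.g. from a vision encoder
    db_emb:    (n, d) embeddings of the curated reference patches
    db_expr:   (n, n_genes) measured expression paired with each reference patch
    Returns the prediction together with the retrieved indices and their weights.
    """
    sims = l2_normalize(db_emb) @ l2_normalize(query_emb)  # (n,) cosine similarities
    top = np.argsort(-sims)[:k]                             # k most similar, best first
    w = np.exp((sims[top] - sims[top].max()) / temperature)
    w = w / w.sum()                                         # softmax weights over neighbors
    pred = w @ db_expr[top]                                 # (n_genes,) weighted average
    return pred, top, w
&lt;/code>&lt;/pre>
&lt;p>The function returns the retrieved indices and weights alongside the prediction. An interpretability view could use them to show which reference samples contributed to each output.&lt;/p>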
&lt;hr>
&lt;h2 id="-challenges">🧠 Challenges&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Data Alignment&lt;/strong>: Spatial transcriptomics datasets often lack perfect alignment between histology, gene expression, and scRNA-seq, requiring custom preprocessing and normalization.&lt;/li>
&lt;li>&lt;strong>Trade-off Between Interpretability and Accuracy&lt;/strong>: Retrieval-augmented designs allow us to trace the origin of predictions but require care to avoid overfitting or performance drops.&lt;/li>
&lt;li>&lt;strong>Computation&lt;/strong>: High-resolution images and large-scale retrieval can be computationally expensive. I’ve begun exploring downsampling and vector database indexing strategies.&lt;/li>
&lt;/ul>
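&lt;p>Several of these preprocessing steps fit in a few lines of NumPy. The sketch below makes three assumptions. First, gene panels are matched by name, and names are assumed unique. Second, normalization is the common library-size scaling followed by log1p. Third, downsampling is plain block averaging of an (H, W, C) image tile. The post does not say which normalization or downsampling scheme the project uses, and all function names here are hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def align_genes(st_expr, st_genes, sc_expr, sc_genes):
    """Keep only genes present in both matrices, in the same column order.

    Assumes gene names are unique within each list.
    """
    st_index = {g: i for i, g in enumerate(st_genes)}
    sc_index = {g: i for i, g in enumerate(sc_genes)}
    shared = [g for g in st_genes if g in sc_index]
    st_aligned = st_expr[:, [st_index[g] for g in shared]]
    sc_aligned = sc_expr[:, [sc_index[g] for g in shared]]
    return st_aligned, sc_aligned, shared


def normalize_log1p(counts, target_sum=1e4):
    """Scale each row (spot or cell) to target_sum total counts, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals = np.where(totals == 0, 1.0, totals)  # empty rows stay zero
    return np.log1p(counts / totals * target_sum)


def downsample_tile(tile, factor):
    """Block-average an (H, W, C) image tile by an integer factor, cropping any remainder."""
    h = tile.shape[0] // factor * factor
    w = tile.shape[1] // factor * factor
    t = tile[:h, :w].astype(np.float32)
    return t.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))
&lt;/code>&lt;/pre>
&lt;p>Block averaging with a factor of 2 cuts the pixel count by a factor of 4. This trades fine texture for lower memory use and faster encoding.&lt;/p>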
&lt;hr>
&lt;h2 id="-whats-next">🔜 What&amp;rsquo;s Next&lt;/h2>
&lt;ul>
&lt;li>🔧 Build the &lt;strong>end-to-end retrieval-generation pipeline&lt;/strong>&lt;/li>
&lt;li>🧬 Prototype &lt;strong>histology-to-scRNA-seq&lt;/strong> alignment using adapted Tangram ideas&lt;/li>
&lt;li>📊 Benchmark &lt;strong>RAG-ST vs. MLP baselines&lt;/strong>&lt;/li>
&lt;li>👁️ Develop &lt;strong>interpretability visualizations&lt;/strong> to show which samples were retrieved for each prediction&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="-deliverables-progress">🧾 Deliverables Progress&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Deliverable&lt;/th>
&lt;th>Status&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>HEST-1K Baseline Pipeline&lt;/td>
&lt;td>✅ Completed&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Tangram Exploration&lt;/td>
&lt;td>✅ Completed&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Data Curation&lt;/td>
&lt;td>🟡 In Progress&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RAG-ST Architecture&lt;/td>
&lt;td>✅ Drafted&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Full Pipeline&lt;/td>
&lt;td>⏳ Planned&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Evaluation &amp;amp; Comparison&lt;/td>
&lt;td>⏳ Planned&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="-closing-thoughts">🙌 Closing Thoughts&lt;/h2>
&lt;p>It&amp;rsquo;s been a rewarding first half of GSoC. I’ve gained hands-on experience with spatial transcriptomics datasets, explored state-of-the-art tools like Tangram, and laid the groundwork for a new interpretable gene prediction model.&lt;/p>
&lt;p>I’m excited to continue building RAG-ST and look forward to sharing more results soon. Huge thanks to my mentor &lt;strong>Ziheng Duan&lt;/strong> for the guidance and support throughout!&lt;/p>
&lt;p>If you have questions or want to discuss spatial modeling, feel free to reach out.&lt;/p></description></item></channel></rss>