<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>data management | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/data-management/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/data-management/index.xml" rel="self" type="application/rss+xml"/><description>data management</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 05 Jun 2023 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>data management</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/data-management/</link></image><item><title>Optimizing FasTensor: Enabling Efficient Tensor Execution on GPUs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/fastensor/20230605-ris0801/</link><pubDate>Mon, 05 Jun 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/lbl/fastensor/20230605-ris0801/</guid><description>&lt;p>Greetings,&lt;/p>
&lt;p>I am Rishabh Singh, and I am excited to be part of the 2023 Google Summer of Code program. My &lt;a href="https://docs.google.com/document/d/14DRkbF1S0VnPcopd37Io0pgKVQ1bDSN3QMf3Os6JyBA/edit?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, focuses on optimizing the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/fastensor">FasTensor&lt;/a> tensor computing library for efficient use on GPUs, specifically targeting tensor contraction while preserving structure locality. This optimization is crucial for scientific applications and advanced AI model training. Throughout the project, I will develop custom computational operations for GPUs, implement FasTensor on GPUs, assess its performance, and provide comprehensive documentation. By the end, I aim to deliver a working implementation, a performance report, and a detailed guide to the execution mechanism. Drawing on my background in software engineering and machine learning, I will use C++ and OpenMP to ensure efficient memory management and data movement. Stay tuned for regular updates and blog posts as I progress through the summer.&lt;/p></description></item><item><title>Proactive Data Containers (PDC)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/pdc/</link><pubDate>Sun, 12 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/pdc/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers&lt;/a> (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. 
Managing data as objects creates powerful optimization opportunities for data movement and transformation, and it enables storage mechanisms that exploit the deep storage hierarchy and support automated performance tuning.&lt;/p>
&lt;h3 id="command-line-and-python-interface-to-an-object-centric-data-management-system">Command line and python interface to an object-centric data management system&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Python&lt;/code>, &lt;code>object-centric data management&lt;/code>, &lt;code>PDC&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Linux, C, Python&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/houjun-tang/">Houjun Tang&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://github.com/hpc-io/pdc" target="_blank" rel="noopener">Proactive Data Containers (PDC)&lt;/a> is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. This project includes developing and updating efficient and user friendly command line and Python interfaces for PDC.&lt;/p></description></item><item><title>FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/fastensor/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/fastensor/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/fastensor/" target="_blank" rel="noopener">FasTensor&lt;/a> is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor used for scientific computing and is effective for expressing a wide range of computations for data analyses, including common aggregation operations from database management systems and advanced machine learning pipelines. FasTensor execution engine exploits the structural-locality in the multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, parallel execution, and so on.&lt;/p>
&lt;h3 id="continuous-integration">Continuous Integration&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Management&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, GitHub&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:kwu@lbl.gov">John Wu&lt;/a>, &lt;a href="mailto:dbin@lbl.gov">Bin Dong&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>Develop a test suite for the public API of FasTensor&lt;/li>
&lt;li>Automate execution of the test suite&lt;/li>
&lt;li>Document the continuous integration process&lt;/li>
&lt;li>Develop performance testing suite&lt;/li>
&lt;/ul></description></item><item><title>FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/fastensor/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/lbl/fastensor/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/fastensor/" target="_blank" rel="noopener">FasTensor&lt;/a> is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor common in scientific computing and are effective for expressing a wide range of data-analysis computations, including common aggregation operations from database management systems and advanced machine learning pipelines. The FasTensor execution engine exploits the structural locality of multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, and parallel execution.&lt;/p>
&lt;h3 id="tensor-execution-engine-on-gpu">Tensor execution engine on GPU&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Management&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, GitHub&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Difficult&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/bin-dong/">Bin Dong&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Tensor-based computing is needed by scientific applications and, increasingly, by advanced AI model training. Most tensor libraries are hand-customized and optimized for GPUs, and most of them serve only one kind of application; for example, TensorFlow is optimized specifically for AI model training. Optimizing a generic tensor computing library on GPUs could benefit a much wider range of applications. FasTensor, our generic tensor computing library, currently runs efficiently only on CPUs, and how to run it on GPUs remains unexplored. Research and development challenges include, but are not limited to: 1) how to maintain the structure locality of tensor data on GPUs; and 2) how to reduce the performance loss when the structure locality of a tensor is broken on GPUs.&lt;/p>
&lt;ul>
&lt;li>Develop a mechanism to move user-defined computing kernels onto GPUs&lt;/li>
&lt;li>Evaluate the performance of the execution engine&lt;/li>
&lt;li>Document the execution mechanism&lt;/li>
&lt;li>Develop performance testing suite&lt;/li>
&lt;/ul>
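&lt;p>One common way to keep structure locality when an array is split across independent workers (for example, GPU thread blocks) is to give each chunk a halo of ghost cells copied from its neighbors. The Python sketch below is an illustrative assumption, not FasTensor's GPU design, and all function names in it are hypothetical: it applies a 3-point stencil chunk by chunk, with each chunk reading only its own cells plus one halo cell per side, and the result matches applying the stencil to the whole array at once.&lt;/p>

```python
# Hypothetical sketch: split a 1-D array into chunks, give each chunk a
# one-cell halo copied from its neighbors, and apply a 3-point stencil to
# each chunk independently (as separate GPU blocks might). Not FasTensor code.


def three_point_mean(left, center, right):
    return (left + center + right) / 3.0


def stencil_global(data):
    """Reference result: apply the stencil to the whole array at once.

    The first and last cells are copied unchanged.
    """
    out = list(data)
    for i in range(1, len(data) - 1):
        out[i] = three_point_mean(data[i - 1], data[i], data[i + 1])
    return out


def stencil_chunked(data, chunk_size):
    """Same computation, but each chunk only reads its own cells plus a halo."""
    n = len(data)
    out = list(data)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Copy the chunk together with one halo cell on each side (if present).
        lo, hi = max(start - 1, 0), min(stop + 1, n)
        local = data[lo:hi]  # this is all the chunk is allowed to see
        offset = start - lo
        for k in range(stop - start):
            i = start + k  # global index
            if i == 0 or i == n - 1:
                continue  # global boundary cells stay unchanged
            j = offset + k  # matching index inside `local`
            out[i] = three_point_mean(local[j - 1], local[j], local[j + 1])
    return out
```

&lt;p>The halo width must be at least the stencil radius (one cell here); a wider stencil needs a wider halo and therefore more data copied per chunk.&lt;/p>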
&lt;h3 id="continuous-integration">Continuous Integration&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Management&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, GitHub&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (300 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/john-wu/">John Wu&lt;/a>, &lt;a href="mailto:dbin@lbl.gov">Bin Dong&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>Develop a test suite for the public API of FasTensor&lt;/li>
&lt;li>Automate execution of the test suite&lt;/li>
&lt;li>Document the continuous integration process&lt;/li>
&lt;/ul></description></item><item><title>Proactive Data Containers (PDC)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/pdc/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/pdc/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers&lt;/a> (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. Managing data as objects creates powerful optimization opportunities for data movement and transformation, and it enables storage mechanisms that exploit the deep storage hierarchy and support automated performance tuning.&lt;/p>
&lt;h3 id="python-interface-to-an-object-centric-data-management-system">Python interface to an object-centric data management system&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Python&lt;/code>, &lt;code>object-centric data management&lt;/code>, &lt;code>PDC&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, C, PDC&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>, &lt;a href="mailto:htang4@lbl.gov">Houjun Tang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers (PDC)&lt;/a> is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. Currently PDC has a C interface. Providing a python interface would make it easier for more Python applications to utilize it.&lt;/p></description></item></channel></rss>