<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>osre22 | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/osre22/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/osre22/index.xml" rel="self" type="application/rss+xml"/><description>osre22</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 07 Nov 2022 10:15:56 -0700</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>osre22</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/tag/osre22/</link></image><item><title>Adaptive Load Balancers for Low-latency Multi-hop Networks</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/adaptiveload/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/adaptiveload/</guid><description>&lt;p>This project aims to design efficient, adaptive link-level load balancers for networks that handle different kinds of traffic, in particular networks where flows are heterogeneous in terms of their round-trip times; geo-distributed data centers are one such example. With large-scale deployments of 5G in the near future, there will be even more applications, including more bulk transfers of videos and photos as well as augmented and virtual reality applications that take advantage of 5G’s low-latency service. With the development of and discussion about Web3.0 and the Metaverse, network workloads across data centers are only going to get more varied and challenging. All of these add up to heavy bulk data being sent to the data centers and over the backbone network.
This traffic has varying quality-of-service requirements, such as low latency, high throughput, and high-definition video streaming. Wide-area network (WAN) flows are typically data-heavy tasks, such as backups taken of a particular data center. The interaction of data center and WAN traffic creates a very interesting scenario with its own challenges to be addressed: the two are characterized by differences in link utilization and round-trip times. Based on readings and literature review, there seems to be very little work on load balancers that address the interaction of data center and WAN traffic. This in turn motivates the need for load balancers that take into account both WAN and data center traffic in order to deliver high performance in more realistic scenarios. This work proposes a load balancer that adapts to the kind of traffic it encounters by learning from network conditions and then predicting the optimal route for a given flow.&lt;/p>
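&lt;p>The observe-and-predict loop described above can be sketched in a few lines. The following is a minimal, illustrative toy (not the proposed design): it keeps a smoothed round-trip-time estimate per candidate path and routes each new flow down the path with the lowest estimate. The class, path names, and EWMA update rule are all hypothetical stand-ins.&lt;/p>

```python
import random

class AdaptiveBalancer:
    """Toy link-level balancer: keeps a smoothed RTT estimate per path
    and sends each new flow down the path with the lowest estimate."""

    def __init__(self, paths, alpha=0.2):
        self.alpha = alpha                   # EWMA smoothing factor
        self.est = {p: None for p in paths}  # smoothed RTT per path (ms)

    def observe(self, path, rtt_ms):
        # Fold a measured RTT sample into the smoothed estimate.
        prev = self.est[path]
        self.est[path] = rtt_ms if prev is None else (1 - self.alpha) * prev + self.alpha * rtt_ms

    def pick(self):
        # Explore unmeasured paths first; otherwise exploit the best estimate.
        unmeasured = [p for p, e in self.est.items() if e is None]
        if unmeasured:
            return random.choice(unmeasured)
        return min(self.est, key=self.est.get)

balancer = AdaptiveBalancer(["dc-path", "wan-path"])
balancer.observe("dc-path", 0.3)    # data-center-like RTT
balancer.observe("wan-path", 40.0)  # WAN-like RTT
print(balancer.pick())              # dc-path
```

&lt;p>A real adaptive balancer would replace the EWMA with a learned model over richer network state, but the observe/predict structure stays the same.&lt;/p>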
&lt;p>Through this research we seek to contribute the following:&lt;/p>
&lt;ul>
&lt;li>Designing a load balancer that is adaptive to data center and WAN traffic and, in general, can be adapted to varied traffic conditions&lt;/li>
&lt;li>Real-time learning of the network setup and prediction of optimal paths&lt;/li>
&lt;li>Deliverables of low latency, high throughput, and increased network utilization&lt;/li>
&lt;/ul>
&lt;h3 id="adaptive-dynamic-load-balancing-for-data-center-and-wan-traffic">Adaptive, Dynamic Load Balancing for data center and WAN traffic&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &amp;lsquo;data center networking&amp;rsquo;, &amp;lsquo;TCP/IP stack&amp;rsquo;, &amp;lsquo;congestion control&amp;rsquo;, &amp;lsquo;load balancing&amp;rsquo;&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> C++, Python, Linux; experience with network simulators would be helpful&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate/Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:katia@soe.ucsc.edu">Katia Obraczka&lt;/a>, &lt;a href="mailto:akabbani@gmail.com">Abdul Kabbani&lt;/a>, &lt;a href="mailto:lakrishn@ucsc.edu">Lakshmi Krishnaswamy&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Understanding the OMNeT++ network simulator and creating simple networks and data center topologies to understand the simulation environment.&lt;/li>
&lt;li>Implementing existing load balancers on OMNeT++ and exploring the effect of different features of the load balancers with data center traffic and WAN traffic.&lt;/li>
&lt;li>Finding and testing WAN-specific traffic that may exist, such as video streaming traffic, large database queries, etc.&lt;/li>
&lt;li>Working with the mentors on developing a learning-based load balancer framework that learns from past sample traffic, network conditions, to adapt dynamically to current network conditions.&lt;/li>
&lt;/ul></description></item><item><title>Apache AsterixDB</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucr/asterixdb/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucr/asterixdb/</guid><description>&lt;p>&lt;a href="http://asterixdb.apache.org/" target="_blank" rel="noopener">AsterixDB&lt;/a> is an open source parallel big-data management system. AsterixDB is a well-established Apache project that has been active in research for more than 10 years. It provides a flexible data model that supports modern NoSQL applications with a powerful query processor that can scale to billions of records and terabytes of data. Users can interact with AsterixDB through a powerful and easy-to-use declarative query language, SQL++, which provides a rich set of data types including timestamps, time intervals, text, and geospatial, in addition to traditional numerical and Boolean data types.&lt;/p>
&lt;h3 id="geospatial-data-science-on-asterixdb">Geospatial Data Science on AsterixDB&lt;/h3>
&lt;ul>
&lt;li>&lt;em>Topics&lt;/em>: Data science, SQL++, documentation&lt;/li>
&lt;li>&lt;em>Skills&lt;/em>: SQL, Writing, Spreadsheets&lt;/li>
&lt;li>&lt;em>Difficulty&lt;/em>: Medium&lt;/li>
&lt;li>&lt;em>Size&lt;/em>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;em>Mentors&lt;/em>: &lt;a href="mailto:eldawy@ucr.edu">Ahmed Eldawy&lt;/a>, &lt;a href="mailto:asevi006@ucr.edu">Akil Sevim&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Build a data science project using AsterixDB that analyzes geospatial data among other dimensions. Use &lt;a href="https://star.cs.ucr.edu/?Chicago%20Crimes#center=41.8313,-87.6830&amp;amp;zoom=11" target="_blank" rel="noopener">Chicago Crimes&lt;/a> as the main dataset and combine it with other datasets including &lt;a href="https://star.cs.ucr.edu/?osm21/pois#center=41.8313,-87.6830&amp;amp;zoom=11" target="_blank" rel="noopener">points of interest&lt;/a> and &lt;a href="https://star.cs.ucr.edu/?TIGER2018/ZCTA5#center=41.8313,-87.6830&amp;amp;zoom=11" target="_blank" rel="noopener">ZIP Code boundaries&lt;/a>. During this project, we will answer interesting questions about the data and visualize the results, such as:&lt;/p>
&lt;ul>
&lt;li>What is the most common crime type on a specific date or over the weekends?&lt;/li>
&lt;li>Where do most of the arrests happen?&lt;/li>
&lt;li>How do crime rates change over time for different regions?&lt;/li>
&lt;/ul>
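&lt;p>Each of these questions is essentially a filter followed by a group-and-count aggregation. As a toy illustration (the real project would express these as SQL++ queries over the actual Chicago Crimes schema; the field names below are made up), here is the same logic in plain Python:&lt;/p>

```python
from collections import Counter

# Toy records standing in for Chicago Crimes rows (field names are
# illustrative, not the dataset's actual schema).
crimes = [
    {"type": "THEFT",   "weekend": True,  "arrest": True,  "region": "Loop"},
    {"type": "THEFT",   "weekend": True,  "arrest": True,  "region": "Loop"},
    {"type": "BATTERY", "weekend": False, "arrest": True,  "region": "Austin"},
]

# "Most common crime type over the weekends": filter, then group-and-count.
weekend_counts = Counter(c["type"] for c in crimes if c["weekend"])
print(weekend_counts.most_common(1)[0][0])  # THEFT

# "Where do most arrests happen?" groups by region instead.
arrest_counts = Counter(c["region"] for c in crimes if c["arrest"])
print(arrest_counts.most_common(1)[0][0])   # Loop
```

&lt;p>In SQL++ the same shape is a WHERE clause plus GROUP BY and ORDER BY on the count.&lt;/p>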
&lt;h4 id="the-goals-of-this-project-are">The goals of this project are:&lt;/h4>
&lt;ul>
&lt;li>Understand how to build a scalable data science project using AsterixDB.&lt;/li>
&lt;li>Translate common questions to SQL queries and run them on large data.&lt;/li>
&lt;li>Learn how to visualize the results of queries and present them.&lt;/li>
&lt;li>Write detailed documentation about the process of building a data science application in AsterixDB.&lt;/li>
&lt;li>Improve the documentation of AsterixDB while working on the project to improve the experience for future users.&lt;/li>
&lt;/ul>
&lt;h4 id="machine-learning-integration">Machine Learning Integration&lt;/h4>
&lt;p>As a bonus task, and depending on the progress of the project, we can explore the integration of machine learning with AsterixDB through Python UDFs. We will utilize the AsterixDB Python integration through &lt;a href="https://asterixdb.apache.org/docs/0.9.7/udf.html" target="_blank" rel="noopener">user-defined functions&lt;/a> to connect the AsterixDB backend with &lt;a href="https://scikit-learn.org/stable/index.html" target="_blank" rel="noopener">scikit-learn&lt;/a> to build some unsupervised and supervised models for the data. For example, we can cluster the crimes based on their location and other attributes to find interesting patterns or hotspots.&lt;/p></description></item><item><title>CephFS</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/cephfs/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/cephfs/</guid><description>&lt;p>&lt;a href="https://docs.ceph.com/en/latest/cephfs/" target="_blank" rel="noopener">CephFS&lt;/a> is a distributed file system on top of &lt;a href="https://ceph.io" target="_blank" rel="noopener">Ceph&lt;/a>. It is implemented as a distributed metadata service (MDS) that uses dynamic subtree balancing to trade parallelism for locality during continually changing workloads. Clients that mount a CephFS file system connect to the MDS and acquire capabilities as they traverse the file namespace. Capabilities not only convey metadata but can also implement strong consistency semantics by granting and revoking the ability of clients to cache data locally.&lt;/p>
&lt;h3 id="cephfs-namespace-traversal-offloading">CephFS namespace traversal offloading&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Ceph&lt;/code>, &lt;code>filesystems&lt;/code>, &lt;code>metadata&lt;/code>, &lt;code>programmable storage&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Ceph / MDS&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:carlosm@ucsc.edu">Carlos Maltzahn&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The frequency of metadata service (MDS) requests relative to the amount of data accessed can severely affect the performance of distributed file systems like CephFS, especially for workloads that randomly access a large number of small files as is commonly the case for machine learning workloads: they purposefully randomize access for training and evaluation to prevent overfitting. The datasets of these workloads are read-only and therefore do not require strong coherence mechanisms that metadata services provide by default.&lt;/p>
&lt;p>The key idea of this project is to reduce the frequency of MDS requests by offloading namespace traversal, i.e. the need to open a directory, list its entries, open each subdirectory, etc. Each of these operations usually requires a separate MDS request. Offloading namespace traversal refers to a client’s ability to request the metadata (and associated read-only capabilities) of an entire subtree with one request, thereby offloading the traversal work for tree discovery to the MDS.&lt;/p>
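&lt;p>The savings can be seen with a toy model (a plain dict stands in for the namespace, and the request-counting is purely illustrative of the protocol difference, not of the MDS implementation): per-directory traversal issues one metadata request per directory discovered, while an offloaded traversal needs exactly one subtree request.&lt;/p>

```python
# Toy namespace: each directory maps to its child entries.
TREE = {
    "/data":       ["/data/train", "/data/val"],
    "/data/train": ["a.img", "b.img"],
    "/data/val":   ["c.img"],
}

def walk_per_dir(root):
    """Classic traversal: one metadata request per directory opened."""
    rpcs = 0
    stack = [root]
    while stack:
        d = stack.pop()
        rpcs += 1                      # readdir request for this directory
        for entry in TREE.get(d, []):
            if entry in TREE:          # descend into subdirectories
                stack.append(entry)
    return rpcs

def walk_offloaded(root):
    """Offloaded traversal: the MDS returns the whole subtree at once."""
    return 1                           # a single subtree request

print(walk_per_dir("/data"), walk_offloaded("/data"))  # 3 1
```

&lt;p>For ML datasets with millions of small files spread over deep directory trees, the gap between the two columns grows with the number of directories.&lt;/p>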
&lt;p>Once the basic functionality is implemented, this project can be expanded to address optimization opportunities, e.g. describing regular tree structures as a closed form expression in the tree’s root, shortcutting tree discovery.&lt;/p></description></item><item><title>DirtViz (2022)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/dirtviz/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/dirtviz/</guid><description>&lt;p>DirtViz is a project to visualize data collected from
sensors deployed in sensor networks. We have deployed a number of
sensors measuring quantities like soil moisture, temperature, current
and voltage in outdoor settings. This project involves extending (or
replacing) our existing plotting scripts to create a fully-fledged
dataviz tool tailored to the types of data collected from embedded
systems sensor networks.&lt;/p>
&lt;h3 id="visualize-sensor-data">Visualize Sensor Data&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Visualization&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: javascript, python, bash, webservers, git, embedded systems&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Easy/Moderate&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/colleen-josephson/">Colleen Josephson&lt;/a>&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>Develop a set of visualization tools (ideally web-based) that easily allows users to zoom in on date ranges, change axes, etc.&lt;/li>
&lt;li>Document the tool thoroughly for future maintenance&lt;/li>
&lt;li>If interested, the project can also investigate correlations between different data streams&lt;/li>
&lt;/ul></description></item><item><title>Eusocial Storage Devices</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/eusocial/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/eusocial/</guid><description>&lt;p>As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. The &lt;a href="https://cross.ucsc.edu/projects/eusocialpage.html" target="_blank" rel="noopener">Eusocial project&lt;/a> aims to create a new interface to storage devices that can leverage existing and new CPU and main memory resources to take over data management tasks like availability, recovery, and migrations. The project refers to these storage devices as “eusocial” because we are inspired by eusocial insects like ants, termites, and bees, which as individuals are primitive but collectively accomplish amazing things.&lt;/p>
&lt;h3 id="dynamic-function-injection-for-rocksdb">Dynamic function injection for RocksDB&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Java&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="mailto:jliu120@ucsc.edu">Jianshen Liu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Recent research reveals that the compaction process in RocksDB can be altered to optimize future data access by changing the data layout in compaction levels. The benefit of this approach can be extended to different data layout optimizations based on application access patterns and requirements. In this project, we want to create an interface that would allow users to dynamically inject layout optimization functions into RocksDB, using sandboxing technologies such as WebAssembly.&lt;/p>
&lt;ul>
&lt;li>Reference: Saxena, Hemant, et al. &amp;ldquo;Real-Time LSM-Trees for HTAP Workloads.&amp;rdquo; arXiv preprint arXiv:2101.06801 (2021).&lt;/li>
&lt;/ul>
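&lt;p>To make the injection idea concrete, here is a hypothetical sketch of the plug-in shape (the registry, function names, and the mock compaction are invented for illustration; the real interface would live inside RocksDB's compaction path and run the function in a sandbox such as WebAssembly):&lt;/p>

```python
# Registry of user-supplied layout functions that reorder records
# during compaction to optimize future reads (all names illustrative).
layout_functions = {}

def register_layout(name):
    def deco(fn):
        layout_functions[name] = fn
        return fn
    return deco

@register_layout("hot_keys_first")
def hot_keys_first(records, hot):
    # Place frequently read keys ahead of cold ones within the level.
    # sorted() is stable, so relative key order is otherwise preserved.
    return sorted(records, key=lambda kv: kv[0] not in hot)

def compact(records, layout="hot_keys_first", **kw):
    merged = sorted(records, key=lambda kv: kv[0])   # the usual merge step
    return layout_functions[layout](merged, **kw)    # injected layout step

out = compact([("k3", 1), ("k1", 2), ("k2", 3)], hot={"k2"})
print([k for k, _ in out])  # ['k2', 'k1', 'k3']
```

&lt;p>The point of the project is exactly this seam: compaction stays fixed, while the layout policy is swappable at run time.&lt;/p>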
&lt;h3 id="demonstrating-a-composable-storage-system-accelerated-by-memory-semantic-technologies">Demonstrating a composable storage system accelerated by memory semantic technologies&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Skills:&lt;/strong> C/C++, Bash, Python, System architecture, Network fabrics&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="mailto:jliu120@ucsc.edu">Jianshen Liu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Over the last decade, the slowdown in performance improvements of general-purpose processors has driven system architectures to become increasingly heterogeneous. Many kinds of domain-specific accelerator hardware (e.g., FPGA, SmartNIC, TPU, GPU) have grown to take over jobs from general-purpose processors. At the same time, network and storage device performance has improved tremendously, on a trajectory that far outpaces that of processors. With this trend, a natural way to continue scaling storage system performance economically is to efficiently utilize and share resources from different nodes over the network. Several resource-sharing protocols already exist, such as CCIX, CXL, and Gen-Z. Among these, Gen-Z is the most interesting because, unlike RDMA, it enables remote memory access without exposing details to applications (i.e., no application changes). Therefore, it would be interesting to see how, whether, and to what extent these technologies can help improve the performance of storage systems. This project would require building a demo system that uses some of these technologies (especially Gen-Z) and running selected applications/workloads to better understand the benefits.&lt;/p>
&lt;ul>
&lt;li>References: Gen-Z: An Open Memory Fabric for Future Data Processing Needs: &lt;a href="https://www.youtube.com/watch?v=JLb9nojNS8E" target="_blank" rel="noopener">https://www.youtube.com/watch?v=JLb9nojNS8E&lt;/a>, Pekon Gupta, SMART Modular; Gen-Z subsystem for Linux, &lt;a href="https://github.com/linux-genz" target="_blank" rel="noopener">https://github.com/linux-genz&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="when-will-rotational-media-users-abandon-sata-and-converge-to-nvme">When will Rotational Media Users abandon SATA and converge to NVMe?&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Skills:&lt;/strong> Entrepreneurial mind, interest in researching high technology markets&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Mentor:&lt;/strong> &lt;a href="mailto:carlosm@ucsc.edu">Carlos Maltzahn&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Goal:&lt;/strong> Determine the benefits in particular market verticals such as genomics and health care to converge the storage stack in data center computer systems to the NVMe device interface, even when devices include rotational media (aka disk drives). The key question: “When do people abandon SATA and SAS and converge to NVMe?”&lt;/p>
&lt;p>&lt;strong>Background:&lt;/strong> NVMe is a widely used device interface for fast storage devices such as flash that behave much more like random access memory than traditional rotational media. Rotational media is accessed mostly via SATA and SAS, which have served the industry well for close to two decades. SATA in particular is much cheaper than NVMe. Now that NVMe is widely available and quickly advancing in functionality, an interesting question is whether there is a market for rotational media devices with NVMe interfaces, converging the storage stack to only one logical device interface, thereby enabling a common ecosystem and more efficient connectivity from multiple processes to storage devices.&lt;/p>
&lt;p>The NVMe 2.0 specification, which came out last year, has been restructured to support the increasingly diverse NVMe device environment (including rotational media). The extensibility of 2.0 encourages enhancements of independent command sets such as Zoned Namespaces (ZNS) and Key Value (NVMe-KV) while supporting transport protocols for NVMe over Fabrics (NVMe-oF). A lot of creative energy is now focused on advancing NVMe while SATA has not changed in 16 years. Having all storage devices connect the same way not only frees up space on motherboards but also enables new ways to manage drives, for example via NVMe-oF that allows drives to be networked without additional abstraction layers.&lt;/p>
&lt;p>&lt;strong>Suggested Project Structure:&lt;/strong> This is really just a suggestion for a starting point. As research progresses, a better structure might emerge.&lt;/p>
&lt;ol>
&lt;li>Convergence of software stack: seamless integration between rotational media and hot storage&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>Direct tiering: one unified interface to place data among fast and slow devices on the same NVMe fabric depending on whether the data is hot or cold.&lt;/li>
&lt;/ul>
&lt;ol start="2">
&lt;li>Computational storage:&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>What are the architectures of computational NVMe devices? For example, offloading compute to an FPGA vs an onboard processor in a disk drive?&lt;/li>
&lt;li>Do market verticals such as genomics and health care favor one over the other? When do people abandon SATA and converge to NVMe?&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Project tasks:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Review current literature&lt;/li>
&lt;li>Survey what the industry is doing&lt;/li>
&lt;li>Join weekly meetings to discuss findings with Ph.D. students, experienced industry veterans, and faculty (Thursdays 2-3pm, can be adjusted if necessary)&lt;/li>
&lt;li>The product is a slide deck with lots of pictures&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Interesting links:&lt;/strong>&lt;br>
&lt;a href="https://www.opencompute.org/wiki/Storage/NVMeHDD" target="_blank" rel="noopener">https://www.opencompute.org/wiki/Storage/NVMeHDD&lt;/a>&lt;br>
&lt;a href="https://2021ocpglobal.fnvirtual.app/a/event/1714" target="_blank" rel="noopener">https://2021ocpglobal.fnvirtual.app/a/event/1714&lt;/a> (video and slides, requires $0 registration)&lt;br>
&lt;a href="https://www.storagereview.com/news/nvme-hdd-edges-closer-to-reality" target="_blank" rel="noopener">https://www.storagereview.com/news/nvme-hdd-edges-closer-to-reality&lt;/a>&lt;br>
&lt;a href="https://www.tomshardware.com/news/seagate-demonstrates-hdd-with-pcie-nvme-interface" target="_blank" rel="noopener">https://www.tomshardware.com/news/seagate-demonstrates-hdd-with-pcie-nvme-interface&lt;/a>&lt;br>
&lt;a href="https://nvmexpress.org/everything-you-need-to-know-about-the-nvme-2-0-specifications-and-new-technical-proposals/" target="_blank" rel="noopener">https://nvmexpress.org/everything-you-need-to-know-about-the-nvme-2-0-specifications-and-new-technical-proposals/&lt;/a>&lt;br>
&lt;a href="https://www.tomshardware.com/news/nvme-2-0-supports-hard-disk-drives" target="_blank" rel="noopener">https://www.tomshardware.com/news/nvme-2-0-supports-hard-disk-drives&lt;/a>&lt;/p></description></item><item><title>FasTensor</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/fastensor/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/fastensor/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/fastensor/" target="_blank" rel="noopener">FasTensor&lt;/a> is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor used for scientific computing and is effective for expressing a wide range of computations for data analyses, including common aggregation operations from database management systems and advanced machine learning pipelines. FasTensor execution engine exploits the structural-locality in the multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, parallel execution, and so on.&lt;/p>
&lt;h3 id="continuous-integration">Continuous Integration&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Data Management&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, github&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:kwu@lbl.gov">John Wu&lt;/a>, &lt;a href="mailto:dbin@lbl.gov">Bin Dong&lt;/a>, &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>Develop a test suite for the public API of FasTensor&lt;/li>
&lt;li>Automate execution of the test suite&lt;/li>
&lt;li>Document the continuous integration process&lt;/li>
&lt;li>Develop performance testing suite&lt;/li>
&lt;/ul></description></item><item><title>HDF5</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/hdf5/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/hdf5/</guid><description>&lt;p>&lt;a href="https://portal.hdfgroup.org/display/knowledge/What&amp;#43;is&amp;#43;HDF5" target="_blank" rel="noopener">HDF5&lt;/a> is a unique technology suite that makes possible the management of extremely large and complex data collections.&lt;/p>
&lt;p>The HDF5 technology suite includes:&lt;/p>
&lt;ul>
&lt;li>A versatile data model that can represent very complex data objects and a wide variety of metadata.&lt;/li>
&lt;li>A completely portable file format with no limit on the number or size of data objects in the collection.&lt;/li>
&lt;li>A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.&lt;/li>
&lt;li>A rich set of integrated performance features that allow for access time and storage space optimizations.&lt;/li>
&lt;li>Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.&lt;/li>
&lt;/ul>
&lt;h3 id="python-interface-to-hdf5-asynchronous-io">Python Interface to HDF5 Asynchronous I/O&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Python&lt;/code>, &lt;code>Async I/O&lt;/code>, &lt;code>HDF5&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, C, HDF5&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>, &lt;a href="mailto:htang4@lbl.gov">Houjun Tang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>HDF5 is a well-known library for storing and accessing (known as &amp;ldquo;Input and Output&amp;rdquo; or I/O) data on high-performance computing systems. Recently, new technologies, such as asynchronous I/O and caching, have been developed to utilize fast memory and storage devices and to hide the I/O latency. Applications can take advantage of an asynchronous interface by scheduling I/O as early as possible and overlapping computation with I/O operations to improve overall performance. The existing HDF5 asynchronous I/O feature supports the C/C++ interface. This project involves the development and performance evaluation of a Python interface that would allow more Python-based scientific codes to use and benefit from the asynchronous I/O.&lt;/p></description></item><item><title>LiveHD (2022)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/livehd/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/livehd/</guid><description>&lt;p>Projects for &lt;a href="https://github.com/masc-ucsc/livehd" target="_blank" rel="noopener">LiveHD&lt;/a>. Lead Mentors: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="mailto:swang203@ucsc.edu">Sheng-Hong Wang&lt;/a>.&lt;/p>
&lt;h3 id="hif-tooling">HIF Tooling&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>HIF tooling&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Tools around Hardware Interchange Format (HIF) files&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/hif" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>HIF (&lt;a href="https://github.com/masc-ucsc/hif" target="_blank" rel="noopener">https://github.com/masc-ucsc/hif&lt;/a>) stands for Hardware Interchange Format.
It is designed to be an efficient binary representation with a simple API that
allows generic graph and tree representations commonly used by hardware
tools. It is not designed to be a universal format, but rather a storage and
traversal format for hardware tools.&lt;/p>
&lt;p>LiveHD has two HIF interfaces, the tree (LNAST) and the graph (Lgraph). Both can
read and write the HIF format. The idea of this project is to expand the hif repository
with some small but useful tools around HIF. Some projects:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>hif_diff + hif_patch: Create the equivalent of the diff/patch commands that
exist for text, but for HIF files. Since HIF files have a clearer
structure, some patch changes are more constrained or better understood
(IOs and dependences are explicit).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>hif_tree: Print the HIF hierarchy, somewhat similar to GNU tree but showing the HIF hierarchy.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>hif_grep: The capacity to grep for some tokens and output a HIF file with only those. Then hif_tree/hif_cat can show the contents.&lt;/p>
&lt;/li>
&lt;/ul>
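&lt;p>The hif_tree idea reduces to a recursive indented dump of the hierarchy. A minimal sketch (a nested dict stands in for a parsed HIF file; the real tool would read the binary format through the HIF C++ API):&lt;/p>

```python
def hif_tree(node, name="top", depth=0, lines=None):
    """Render a nested mapping as an indented hierarchy, in the spirit of
    the proposed hif_tree tool."""
    if lines is None:
        lines = []
    lines.append("  " * depth + name)      # indent by nesting depth
    for child, sub in node.items():
        hif_tree(sub, child, depth + 1, lines)
    return lines

# A toy design hierarchy: top contains an ALU (with adder and shifter)
# and a register file.
design = {"alu": {"adder": {}, "shifter": {}}, "regfile": {}}
print("\n".join(hif_tree(design)))
```

&lt;p>hif_grep would be the inverse direction: walk the same structure, keep only subtrees matching a token, and emit a reduced file that hif_tree/hif_cat can then display.&lt;/p>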
&lt;h3 id="mockturtle">Mockturtle&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Mockturtle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Perform synthesis for graph in LiveHD using Mockturtle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17, synthesis&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#mockturtle" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>There are some issues with Mockturtle integration (new cells) and it is not using the latest Mockturtle library versions.
The goal is to use Mockturtle (&lt;a href="https://github.com/lsils/mockturtle" target="_blank" rel="noopener">https://github.com/lsils/mockturtle&lt;/a>) with LiveHD. The main characteristics:&lt;/p>
&lt;ul>
&lt;li>Use mockturtle to tmap to LUTs&lt;/li>
&lt;li>Use mockturtle to synthesize (optimize) logic&lt;/li>
&lt;li>Enable cut-rewrite as an option&lt;/li>
&lt;li>Enable hierarchy cross optimization (hier:true option)&lt;/li>
&lt;li>Use the graph labeling to find clusters to optimize&lt;/li>
&lt;li>Re-timing&lt;/li>
&lt;li>Map to LUTs only gates and non-wide arithmetic. E.g., a 32-bit add is not mapped to LUTs, but a 2-bit add is.&lt;/li>
&lt;li>List of resources to not map:
&lt;ul>
&lt;li>Large ALUs. Large ALUs should have an OpenWare block (hardcoded in FPGAs and advanced adder options in ASIC)&lt;/li>
&lt;li>Multipliers and dividers&lt;/li>
&lt;li>Barrel shifters with non-trivial shifts (beyond 1-2 bits) selectable at run-time&lt;/li>
&lt;li>Memories, LUTs&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
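&lt;p>The mapping policy in the list above amounts to a predicate over cell type and bit width. A toy sketch of that decision (cell-type names and the 2-bit threshold are illustrative; the real pass would inspect Lgraph cells):&lt;/p>

```python
def should_map_to_luts(cell_type, width):
    """Toy predicate for the mapping policy: map plain gates and narrow
    arithmetic to LUTs; keep wide ALUs, multipliers/dividers, run-time
    barrel shifters, memories, and existing LUTs out."""
    never_map = {"mult", "div", "barrel_shift", "memory", "lut"}
    if cell_type in never_map:
        return False
    if cell_type == "add":
        return width <= 2        # a 2-bit add maps, a 32-bit add does not
    return True                  # plain logic gates always map

print(should_map_to_luts("add", 2), should_map_to_luts("add", 32))  # True False
```

&lt;p>Cells for which the predicate is false would instead go to OpenWare blocks or hard macros, as described above.&lt;/p>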
&lt;h3 id="query-shell">Query Shell&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Query Shell&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Create a console app that interacts with LiveHD to query parameters about designs&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#query-shell-not-lgshell-to-query-graphs" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>Based on replxx (like lgshell)&lt;/li>
&lt;li>Query bits, ports&amp;hellip; similar to:
&lt;ul>
&lt;li>&lt;a href="https://github.com/rubund/netlist-analyzer" target="_blank" rel="noopener">https://github.com/rubund/netlist-analyzer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html" target="_blank" rel="noopener">https://www.jameswhanlon.com/querying-logical-paths-in-a-verilog-design.html&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>It would be useful if selected subsections could be visualized with something like &lt;a href="https://github.com/nturley/netlistsvg" target="_blank" rel="noopener">https://github.com/nturley/netlistsvg&lt;/a>&lt;/li>
&lt;li>The shell may be expanded to support simulation in the future&lt;/li>
&lt;li>Wavedrom/Duh dumps&lt;/li>
&lt;/ul>
&lt;p>Wavedrom and duh allow dumping bitfield information for structures. It would be interesting to explore dumping tables and bit
fields for Lgraph IOs, and structs/fields inside the module. It may be a way to integrate with the documentation generation.&lt;/p>
&lt;p>Examples of queries: show a path, show the driver/sink of a net, do a topological traversal, &amp;hellip;&lt;/p>
&lt;p>An interesting extension would be a simple embedded language (TCL or ChaiScript or ???) to control queries more
easily and allow building functions/libraries.&lt;/p>
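&lt;p>A minimal sketch of such a query shell over a toy netlist model follows. The data model and command names are hypothetical and only sketch the shape of the tool; LiveHD&amp;rsquo;s actual Lgraph API differs:&lt;/p>

```python
# Toy netlist with three queries: driver of a net, sinks of a net,
# and a topological traversal (Kahn's algorithm).
from collections import defaultdict, deque

class Netlist:
    def __init__(self):
        self.driver = {}                       # net -> driving node
        self.sinks = defaultdict(list)         # net -> reader nodes
        self.node_outputs = defaultdict(list)  # node -> output nets

    def connect(self, src_node, net, dst_node):
        self.driver[net] = src_node
        self.sinks[net].append(dst_node)
        self.node_outputs[src_node].append(net)

    def topo(self):
        # Kahn's algorithm over the node->node edges implied by nets
        nodes = set(self.node_outputs) | {d for s in self.sinks.values() for d in s}
        indeg = {n: 0 for n in nodes}
        for dsts in self.sinks.values():
            for d in dsts:
                indeg[d] += 1
        ready = deque(n for n in nodes if indeg[n] == 0)
        order = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for net in self.node_outputs.get(n, []):
                for d in self.sinks[net]:
                    indeg[d] -= 1
                    if indeg[d] == 0:
                        ready.append(d)
        return order

def query(nl, line):
    """Dispatch one shell command, e.g. 'driver n1' or 'topo'."""
    cmd, *args = line.split()
    if cmd == "driver":
        return nl.driver.get(args[0])
    if cmd == "sinks":
        return nl.sinks.get(args[0], [])
    if cmd == "topo":
        return nl.topo()
    return None
```

A replxx front-end would wrap `query` in a read-eval loop with line editing and history, as lgshell does.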
&lt;h3 id="lgraph-and-lnast-check-pass">Lgraph and LNAST check pass&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>Lgraph and LNAST check pass&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Create a pass that checks the integrity/correctness of Lgraph and LNAST&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Large 350 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#lgraph-and-lnast-check-pass" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Create a pass that checks that the Lgraph (and/or LNAST) is semantically
correct. The LNAST already has quite a few tests (pass.semantic), but it can be
further expanded. Some checks:&lt;/p>
&lt;ul>
&lt;li>No combinational loops&lt;/li>
&lt;li>No mismatch in bit widths&lt;/li>
&lt;li>No disconnected nodes&lt;/li>
&lt;li>Check for inefficient splits (do not split buses that can be combined)&lt;/li>
&lt;li>Transformation stages should not drop names if the same net is preserved&lt;/li>
&lt;li>No writes in LNAST that are never read&lt;/li>
&lt;li>All the edges are valid. E.g., no pin &amp;lsquo;C&amp;rsquo; in a Sum_op&lt;/li>
&lt;/ul>
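&lt;p>The combinational-loop check above can be sketched as a depth-first search with three-color marking. The graph representation here is a hypothetical stand-in for Lgraph cells, with registers already excluded from the edge set:&lt;/p>

```python
# Detect combinational loops with an iterative DFS (three-color marking).
WHITE, GRAY, BLACK = 0, 1, 2

def has_comb_loop(edges):
    """edges: dict node -> list of fanout nodes (paths through flops
    excluded, since a path through a register is not combinational)."""
    color = {n: WHITE for n in edges}
    for root in edges:
        if color[root] != WHITE:
            continue
        stack = [(root, iter(edges[root]))]
        color[root] = GRAY
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                color[node] = BLACK       # fully explored
                stack.pop()
            elif color.get(nxt, WHITE) == GRAY:
                return True               # back edge: combinational cycle
            elif color.get(nxt, WHITE) == WHITE:
                color[nxt] = GRAY
                stack.append((nxt, iter(edges.get(nxt, []))))
    return False
```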
&lt;h3 id="unbitwidth">unbitwidth&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Title&lt;/td>
&lt;td>unbitwidth&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Description&lt;/td>
&lt;td>Not all the variables need bitwidth information. Find the minimal subset that does&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mentor(s)&lt;/td>
&lt;td>Jose Renau&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Skills&lt;/td>
&lt;td>C++17&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Difficulty&lt;/td>
&lt;td>Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Size&lt;/td>
&lt;td>Medium 175 hours&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/masc-ucsc/livehd/blob/master/docs/cross.md#unbitwidth-local-and-global-bitwidth" target="_blank" rel="noopener">Link&lt;/a>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>This pass is needed to generate less verbose CHISEL and Pyrope code.&lt;/p>
&lt;p>The LGraph can have bitwidth information for each dpin. This is needed for
Verilog code generation, but not needed for Pyrope or CHISEL. CHISEL can
perform local bitwidth inference and Pyrope can perform global bitwidth
inference.&lt;/p>
&lt;p>A new pass should remove redundant bitwidth information. The information is
redundant because pass/bitwidth can regenerate it when enough detail
remains. The goal is to create a pass/unbitwidth that removes either local or
global bitwidth. The information left should be enough for the bitwidth pass to
regenerate it.&lt;/p>
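&lt;p>A sketch of why the stored information is redundant: with simple forward inference rules, output widths can be recomputed from input widths alone. The rules below are illustrative, not the actual pass/bitwidth rules:&lt;/p>

```python
# Illustrative forward bitwidth inference; the real pass/bitwidth
# handles many more operations, plus signedness.
def infer_width(op, in_bits):
    if op == "add":              # n-bit + m-bit needs max(n, m) + 1 bits
        return max(in_bits) + 1
    if op == "and":              # bitwise ops keep the widest input
        return max(in_bits)
    if op == "mul":              # a product needs the sum of the widths
        return sum(in_bits)
    raise ValueError(f"no rule for {op}")
```

Since `infer_width("add", [2, 2])` recomputes 3, a stored 3-bit annotation on that adder's output is redundant and a pass/unbitwidth-style cleanup can drop it.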
&lt;ul>
&lt;li>
&lt;p>Local bitwidth: It is possible to leave the bitwidth information in many
places with the same results, but for CHISEL the inputs should be
sized. The storage (memories/flops) should have a bitwidth when it cannot be
inferred from the inputs.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Global bitwidth: Pyrope bitwidth inference goes across the call hierarchy.
This means that a module could have no bitwidth information at all. We start
from the leaf nodes. If all the bits can be inferred given the inputs, the
module should have no bitwidth information. In that case the bitwidth can be
inferred from outside.&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Open Source Autonomous Vehicle Controller</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/osavc/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/osavc/</guid><description>&lt;p>The OSAVC is a vehicle-agnostic open source hardware and software project. This project is designed to provide a real-time hardware controller adaptable to any vehicle type, suitable for aerial, terrestrial, marine, or extraterrestrial vehicles. It allows control researchers to develop state estimation algorithms, sensor calibration algorithms, and vehicle control models in a modular fashion such that once the hardware set has been developed switching algorithms requires only modifying one C function and recompiling.&lt;/p>
&lt;p>Lead mentor: &lt;a href="mailto:aamuhunt@ucsc.edu">Aaron Hunter&lt;/a>&lt;/p>
&lt;p>Projects for the OSAVC:&lt;/p>
&lt;h3 id="vehiclecraft-sensor-driver-development">Vehicle/Craft sensor driver development&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Driver code to integrate sensor to a microcontroller&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C, I2C, SPI, UART interfaces&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong> 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong> Aaron Hunter&lt;/li>
&lt;/ul>
&lt;p>Help develop a sensor library for use in autonomous vehicles. Possible sensors include range finders, ping sensors, IMUs, GPS receivers, RC receivers, barometers, air speed sensors, etc. Code will be written in C using state machine methodology and non-blocking algorithms. Test the drivers on a Microchip microcontroller.&lt;/p>
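&lt;p>The non-blocking state-machine pattern can be sketched as follows. This is Python for illustration only; the real drivers are C on a Microchip MCU, and the bus object here is a hypothetical I2C/SPI transaction interface:&lt;/p>

```python
# Non-blocking driver skeleton: each call to run() advances the state
# machine one step and returns immediately, never waiting on hardware.
IDLE, START, WAIT_DATA, DONE = range(4)

class SensorDriver:
    def __init__(self, bus):
        self.bus = bus           # hypothetical I2C/SPI bus object
        self.state = IDLE
        self.value = None

    def run(self):
        """Called from the main loop; returns immediately every time."""
        if self.state == IDLE:
            self.state = START
        elif self.state == START:
            self.bus.request_read()      # kick off the transfer and return
            self.state = WAIT_DATA
        elif self.state == WAIT_DATA:
            if self.bus.ready():         # poll; never spin-wait
                self.value = self.bus.read()
                self.state = DONE
        elif self.state == DONE:
            self.state = IDLE            # ready for the next sample
```

Because no state blocks, many such drivers can be interleaved in a single main loop without an RTOS.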
&lt;h3 id="path-finding-algorithm-using-opencv-and-machine-learning">Path finding algorithm using OpenCV and machine learning&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Computer vision, blob detection&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C/Python, OpenCV&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong> 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong> Medium&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong> Aaron Hunter&lt;/li>
&lt;/ul>
&lt;p>Use OpenCV to identify a track for an autonomous vehicle to follow. Build on previous work by developing a new model using EfficientDet and an existing training set of images. Port the model to TFLite and implement it on the Coral USB Accelerator. Evaluate its performance against our previous efforts.&lt;/p>
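&lt;p>For comparison, a classical (non-learned) baseline for track following can be as simple as per-row thresholding and centroiding. The sketch below is pure Python and illustrative only, not the EfficientDet pipeline above:&lt;/p>

```python
# Classical baseline: threshold each image row and take the centroid
# of dark (track) pixels, giving a per-row steering reference.
def track_centerline(gray, thresh=64):
    """gray: rows of 0-255 pixel values; returns per-row center or -1."""
    centers = []
    for row in gray:
        cols = [c for c, px in enumerate(row) if px < thresh]
        centers.append(sum(cols) // len(cols) if cols else -1)
    return centers
```

Such a baseline gives a cheap sanity check to compare learned detectors against.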
&lt;h3 id="state-estimationsensor-fusion-algorithm-development">State estimation/sensor fusion algorithm development&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: Kalman filtering, Mahony filter&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C/Python, Matlab/Simulink, numerical optimization algorithms&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong> 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong> Aaron Hunter&lt;/li>
&lt;/ul>
&lt;p>Implement an optimal state estimation algorithm from a model. This model can be derived from a Kalman filter or some other state estimation filter (e.g., Mahony filter). The model takes sensor readings as input and provides an estimate of the state of a vehicle. Finally, convert the model to standard C using Simulink code generation, or implement it in Python (for use on a single-board computer, e.g., Raspberry Pi).&lt;/p></description></item><item><title>OpenRAM</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openram/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openram/</guid><description>&lt;p>&lt;a href="https://github.com/VLSIDA/OpenRAM" target="_blank" rel="noopener">OpenRAM&lt;/a> is an award-winning open-source Python framework to create the layout, netlists, timing and power models, placement and routing models, and other views necessary to use SRAMs in ASIC design. OpenRAM supports integration in both commercial and open-source flows with both predictive and fabricable technologies. Most recently, it has created memories that are included on all of the &lt;a href="https://efabless.com/open_shuttle_program/" target="_blank" rel="noopener">eFabless/Google/Skywater MPW tape-outs&lt;/a>.&lt;/p>
&lt;h3 id="replace-logging-framework-with-library">Replace logging framework with library&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>User Interfaces&lt;/code>, &lt;code>Python APIs&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>,&lt;a href="mailto:jcirimel@ucsc.edu">Jesse Cirimelli-Low&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Replace the custom logging framework in OpenRAM with the &lt;a href="https://docs.python.org/3/library/logging.html" target="_blank" rel="noopener">Python logging&lt;/a> module. The new logging should support levels of detail as well as tags to enable/disable logging of particular features to aid debugging.&lt;/p>
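&lt;p>A sketch of the target design, using the standard logging module plus a &lt;code>Filter&lt;/code> keyed on a per-record tag attribute. The tag vocabulary here ("router", "placer") is hypothetical:&lt;/p>

```python
import logging

# Standard logging plus a Filter that enables/disables per-feature tags.
class TagFilter(logging.Filter):
    def __init__(self, enabled_tags):
        super().__init__()
        self.enabled = set(enabled_tags)

    def filter(self, record):
        tag = getattr(record, "tag", None)
        return tag is None or tag in self.enabled  # untagged always passes

logger = logging.getLogger("openram")
logger.setLevel(logging.DEBUG)               # level of detail
handler = logging.StreamHandler()
handler.addFilter(TagFilter({"router"}))     # enable only router messages
logger.addHandler(handler)

logger.debug("routing supply rails", extra={"tag": "router"})  # emitted
logger.debug("placing bitcells", extra={"tag": "placer"})      # suppressed
```

Tags ride on the record via `extra`, so call sites stay one-liners and filtering is configured centrally.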
&lt;h3 id="rom-generator">ROM generator&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Memories&lt;/code>, &lt;code>Python&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, VLSI&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium/Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Use the OpenRAM API to generate a Read-Only Memory (ROM) from an input hex file. The project
will automatically generate a Spice netlist, layout, Verilog model and timing characterization.&lt;/p>
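&lt;p>A sketch of the first step, turning the hex file into per-row bit patterns for the array generator. The input format assumed here (one hex word per line, with optional $readmemh-style comments) is an assumption; the actual project may define it differently:&lt;/p>

```python
# Parse a hex file into fixed-width binary row patterns.
def read_hex_words(text, word_bits):
    rows = []
    for line in text.splitlines():
        line = line.split("//")[0].strip()   # drop // comments
        if line:
            rows.append(format(int(line, 16), f"0{word_bits}b"))
    return rows
```

For example, `read_hex_words("3\nA // data\n", 4)` yields the patterns `['0011', '1010']` that a ROM array generator would place as programmed cells.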
&lt;h3 id="register-file-generator">Register File generator&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Memories&lt;/code>, &lt;code>Python&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, VLSI&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium/Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Use the OpenRAM API to generate a Register File from standard library cells. The project
will automatically generate a Spice netlist, layout, Verilog model and timing characterization.&lt;/p>
&lt;h3 id="built-in-self-test-and-repair">Built-In Self Test and Repair&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Python&lt;/code>, &lt;code>Verilog&lt;/code>, &lt;code>Testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium/Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:bonal@ucsc.edu">Bugra Onal&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Finish the integration of a parameterized Verilog module to support Built-In Self-Test and Repair
of OpenRAM memories using spare rows and columns.&lt;/p>
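&lt;p>For reference, a software sketch of a March C- style element sequence, the kind of pattern the BIST hardware would run over the array in Verilog before triggering spare row/column repair. The plain-dictionary memory interface is for illustration only:&lt;/p>

```python
# Software model of March C-: six elements, alternating address order.
def march_cminus(mem, n):
    faults = []
    def w(i, v): mem[i] = v
    def r(i, exp):
        if mem[i] != exp:
            faults.append(i)
    for i in range(n): w(i, 0)                      # up   (w0)
    for i in range(n): r(i, 0); w(i, 1)             # up   (r0, w1)
    for i in range(n): r(i, 1); w(i, 0)             # up   (r1, w0)
    for i in reversed(range(n)): r(i, 0); w(i, 1)   # down (r0, w1)
    for i in reversed(range(n)): r(i, 1); w(i, 0)   # down (r1, w0)
    for i in range(n): r(i, 0)                      # up   (r0)
    return faults
```

The returned fault addresses are what the repair logic would map onto spare rows and columns.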
&lt;h3 id="layout-verses-schematic-lvs-visualization">Layout verses Schematic (LVS) visualization&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>VLSI Design Basics&lt;/code>, &lt;code>Python&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, VLSI, JSON&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy/Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>,&lt;a href="mailto:jcirimel@ucsc.edu">Jesse Cirimelli-Low&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Create a visualization interface to debug layout versus schematic mismatches in the &lt;a href="https://github.com/RTimothyEdwards/magic" target="_blank" rel="noopener">Magic&lt;/a> layout editor. Results will be parsed from the JSON output of &lt;a href="https://github.com/RTimothyEdwards/netgen" target="_blank" rel="noopener">Netgen&lt;/a>.&lt;/p></description></item><item><title>OpenROAD - A Complete, Autonomous RTL-GDSII Flow for VLSI Designs</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openroad/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/openroad/</guid><description>&lt;p>&lt;a href="https://theopenroadproject.org" target="_blank" rel="noopener">OpenROAD&lt;/a> is a front-runner in open-source semiconductor design automation tools and know-how. OpenROAD reduces barriers of access and tool costs to democratize system and product innovation in silicon. The OpenROAD tool and flow provide an autonomous, no-human-in-the-loop, 24-hour RTL-GDSII capability to support low-overhead design exploration and implementation through tapeout. We welcome a diverse community of designers, researchers, enthusiasts and entrepreneurs who use and contribute to OpenROAD to make a far-reaching impact.
Our mission is to democratize and advance design automation of semiconductor devices through leadership, innovation, and collaboration.&lt;/p>
&lt;p>OpenROAD is the key enabler of successful chip initiatives like the Google-sponsored &lt;a href="https://efabless.com" target="_blank" rel="noopener">Efabless&lt;/a>, which has made possible more than 150 successful tapeouts by a diverse and global user community. The OpenROAD project repository is &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">https://github.com/The-OpenROAD-Project/OpenROAD&lt;/a>.&lt;/p>
&lt;p>Design of static RAMs in VLSI designs for good performance and area is generally time-consuming. Memory compilers significantly reduce design time for complex analog and mixed-signal designs by allowing designers to explore, verify and configure multiple variants and hence select a design that is optimal for area and performance. This project adds support for memory compilers such as &lt;a href="https://github.com/vlsida/openram" target="_blank" rel="noopener">OpenRAM&lt;/a> to &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts" target="_blank" rel="noopener">OpenROAD-flow-scripts&lt;/a>, based on popular PDKs.&lt;/p>
&lt;h3 id="openlane-memory-design-macro-floorplanning">OpenLane Memory Design Macro Floorplanning&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Memory Compilers&lt;/code>, &lt;code>OpenRAM&lt;/code>, &lt;code>Programmable RAM&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: python, basic knowledge of memory design, VLSI technology, PDK, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:mehdi@umich.edu">Mehdi Saligane&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Improve and verify &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a> design planning with OpenRAM memories. Specifically, this project will utilize the macro placer/floorplanner and resolve any issues for memory placement. Issues that will need to be addressed may include power supply connectivity, ability to rotate memory macros, and solving pin-access issues.&lt;/p>
&lt;h3 id="openlane-memory-design-timing-analysis">OpenLane Memory Design Timing Analysis&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Memory Compilers&lt;/code>, &lt;code>OpenRAM&lt;/code>, &lt;code>Programmable RAM&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: python, basic knowledge of memory design, VLSI technology, PDK, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:mehdi@umich.edu">Mehdi Saligane&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Improve and verify &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a> Static Timing Analysis using OpenRAM-generated library files. Specifically, this will include verifying setup/hold conditions as well as creating additional checks such as minimum period, minimum pulse width, etc. Also, the project will add timing information to the Verilog behavioral model.&lt;/p>
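&lt;p>The checks involved can be sketched with a simplified single-stage register-to-register timing model. Real STA uses liberty lookup tables, derating, and clock-network delays, so this is only the underlying arithmetic:&lt;/p>

```python
# Simplified single-stage timing model for setup/hold/min-period checks.
def setup_slack(t_period, t_clk2q, t_comb_max, t_setup):
    # data must arrive t_setup before the next capturing clock edge
    return t_period - (t_clk2q + t_comb_max + t_setup)

def hold_slack(t_clk2q, t_comb_min, t_hold):
    # data must stay stable for t_hold after the launching edge
    return (t_clk2q + t_comb_min) - t_hold

def min_period(t_clk2q, t_comb_max, t_setup):
    # smallest clock period that leaves zero setup slack
    return t_clk2q + t_comb_max + t_setup
```

A negative slack from either function is a timing violation; `min_period` is one of the extra checks the project would add for memory macros.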
&lt;h3 id="openlane-memory-macro-pdk-support">OpenLane Memory Macro PDK Support&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Memory Compilers&lt;/code>, &lt;code>OpenRAM&lt;/code>, &lt;code>Programmable RAM&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: python, basic knowledge of memory design, VLSI technology, PDK, Verilog&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:mrg@ucsc.edu">Matthew Guthaus&lt;/a>, &lt;a href="mailto:mehdi@umich.edu">Mehdi Saligane&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Integrate and verify FreePDK45 OpenRAM memories with an &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a> FreePDK45 design flow. OpenLane currently supports only the Skywater 130nm PDK, but OpenROAD supports FreePDK45 (which is the same as Nangate45). This project will create a design using OpenRAM memories with the OpenLane flow using FreePDK45.&lt;/p>
&lt;h3 id="vlsi-power-planning-and-analysis">VLSI Power Planning and Analysis&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Power Planning for VLSI&lt;/code>, &lt;code>IR Drop Analysis&lt;/code>, &lt;code>Power grid Creation and Analysis&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, tcl, VLSI Layout&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: Mehdi Saligane &lt;a href="mailto:mehdi@umich.edu">mailto:mehdi@umich.edu&lt;/a>, Ming-Hung &lt;a href="mailto:minghung@umich.edu">mailto:minghung@umich.edu&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Take the existing power planning (pdngen.tcl) module of OpenROAD and recode the functionality in C++, ensuring that all of the unit tests on the existing code pass. Work with a senior member of the team at ARM. Ensure that the designs created are of good quality for power routing and overall power consumption.&lt;/p>
&lt;h3 id="demos-and-tutorials">Demos and Tutorials&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Demo Development&lt;/code>, &lt;code>Documentation&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design flow, tcl, shell scripts, Documentation, Markdown&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>For &lt;a href="https://github.com/The-OpenROAD-Project/OpenLane" target="_blank" rel="noopener">OpenLane&lt;/a>, develop demos showing:&lt;/p>
&lt;ul>
&lt;li>The OpenLane flow, highlighting key features&lt;/li>
&lt;li>GUI visualizations&lt;/li>
&lt;li>Design explorations and experiments&lt;/li>
&lt;li>Different design styles and particular challenges&lt;/li>
&lt;/ul>
&lt;h3 id="comprehensive-flow-testing">Comprehensive Flow Testing&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Testing&lt;/code>, &lt;code>Documentation&lt;/code>, &lt;code>VLSI design basics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Knowledge of EDA tools, basics of VLSI design, tcl, shell scripts, Verilog, Layout&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop detailed test plans to test the OpenLane flow to expand coverage and advanced features. Add open-source designs to the regression test suite to improve tool quality and robustness. This includes design specification, configuration and creation of all necessary files for regression testing. Suggested sources: ISCAS benchmarks, OpenCores, LSOracle for the synthesis flow option.&lt;/p>
&lt;h3 id="enhance-gui-features">Enhance GUI features&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>GUI&lt;/code>, &lt;code>Visualization&lt;/code>, &lt;code>User Interfaces&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Qt&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>For &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a>, develop and enhance visualizations for EDA data and algorithms in the OpenROAD GUI. Allow deeper understanding of the tool results for users and tool internals for developers.&lt;/p>
&lt;h3 id="automate-opendb-code-generation">Automate OpenDB code Generation&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Database&lt;/code>, &lt;code>EDA&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++, Python, JSON, Jinja templating&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/matt-liberty/">Matt Liberty&lt;/a>, &lt;a href="mailto:aspyrou@eng.ucsd.edu">Tom Spyrou&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>For &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a>, automate code generation for the OpenDB database, which allows improvements to the data model with much less hand coding. Allow the generation of storage, serialization, and callback code from a custom schema description format.&lt;/p>
&lt;h3 id="implement-an-nlp-based-ai-bot-aimed-at-increasing-users-enhancing-usability-and-building-a-knowledge-base">Implement an NLP based AI bot aimed at increasing users, enhancing usability and building a knowledge base&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>AI&lt;/code>, &lt;code>ML&lt;/code>, &lt;code>Analytics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python. ML libraries (e.g., Tensorflow, PyTorch)&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/vitor-bandeira/">Vitor Bandeira&lt;/a>, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/indira-iyer/">Indira Iyer&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The &lt;a href="https://github.com/The-OpenROAD-Project/OpenROAD" target="_blank" rel="noopener">OpenROAD&lt;/a> project contains a storehouse of knowledge in its GitHub repositories within Issues and Pull requests. Additionally, project-related Slack channels hold useful information in the form of questions and answers, problems and solutions in conversation threads. Implement an AI analytics bot that filters and selects relevant discussions and classifies/records them into useful documentation and actionable issues. It should also directly track and increase project usage, and report outcome metrics.&lt;/p></description></item><item><title>Package Management &amp; Reproducibility</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/packaging/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/packaging/</guid><description>&lt;p>Project ideas related to reproducibility and package management, especially as it relates to &lt;em>store type package managers&lt;/em> (&lt;a href="http://nixos.org/" target="_blank" rel="noopener">NixOS&lt;/a>, &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> or &lt;a href="https://spack.io/" target="_blank" rel="noopener">Spack&lt;/a>).&lt;/p>
&lt;p>Lead Mentor: &lt;a href="https://users.soe.ucsc.edu/~fmzakari" target="_blank" rel="noopener">Farid Zakaria&lt;/a> &lt;a href="mailto:fmzakari@ucsc.edu">mailto:fmzakari@ucsc.edu&lt;/a>&lt;/p>
&lt;h3 id="investigate-the-dynamic-linking-landscape">Investigate the dynamic linking landscape&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Operating Systems&lt;/code> &lt;code>Compilers&lt;/code> &lt;code>Linux&lt;/code> &lt;code>Package Management&lt;/code> &lt;code>NixOS&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Experience with systems programming and Linux familiarity&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate to Challenging&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:fmzakari@ucsc.edu">Farid Zakaria&lt;/a> &amp;amp; &lt;a href="https://people.llnl.gov/scogland1" target="_blank" rel="noopener">Tom Scogland&lt;/a> &lt;a href="mailto:scogland1@llnl.gov">mailto:scogland1@llnl.gov&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Dynamic linking as specified in the ELF file format has gone unchallenged since its invention. With many new package management models that eschew the filesystem hierarchy standard (e.g., Nix, Guix and Spack), many of the idiosyncrasies that define the way in which libraries are discovered are no longer useful and are potentially harmful.&lt;/p>
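&lt;p>The core idea behind Shrinkwrap can be sketched in a few lines: resolve a binary&amp;rsquo;s dependencies once (here by parsing &lt;code>ldd&lt;/code> output) so they can be pinned to absolute paths, e.g. with &lt;code>patchelf --replace-needed&lt;/code>, instead of being searched through the linker&amp;rsquo;s path machinery at every program start:&lt;/p>

```python
import re

# Map each soname in `ldd` output to its resolved absolute path; a
# Shrinkwrap-style tool would then freeze these paths into the binary.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")

def parse_ldd(output):
    deps = {}
    for line in output.splitlines():
        m = LDD_LINE.match(line)
        if m and m.group(2).startswith("/"):
            deps[m.group(1)] = m.group(2)
    return deps

sample = """\
    linux-vdso.so.1 (0x00007ffd8a9f2000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f21d3a00000)
"""
```

Entries without a resolved path (like the vdso) are skipped; the remaining map is exactly the per-library pinning a store-based package manager can compute at build time.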
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Continue development on &lt;a href="https://github.com/fzakaria/shrinkwrap" target="_blank" rel="noopener">Shrinkwrap&lt;/a> a tool to make dynamic library loading simpler and more robust.&lt;/li>
&lt;li>Evaluate its effectiveness across a wide range of binaries.&lt;/li>
&lt;li>Upstream contributions to &lt;a href="http://nixos.org/" target="_blank" rel="noopener">NixOS&lt;/a> or &lt;a href="https://guix.gnu.org/" target="_blank" rel="noopener">Guix&lt;/a> to leverage the improvement when suitable.&lt;/li>
&lt;li>Investigate alternative improvements to dynamic linking by writing a dynamic linker &amp;ldquo;loader wrapper&amp;rdquo; to explore new ideas.&lt;/li>
&lt;/ul></description></item><item><title>Polyphorm / PolyPhy</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/polyphorm/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/polyphorm/</guid><description>&lt;p>&lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">Polyphorm&lt;/a> is an agent-based system for reconstructing and visualizing &lt;em>optimal transport networks&lt;/em> defined over sparse data. Rooted in astronomy and inspired by nature, we have used Polyphorm to reconstruct the &lt;a href="https://youtu.be/5ILwq5OFuwY" target="_blank" rel="noopener">Cosmic web&lt;/a> structure, but also to discover network-like patterns in natural language data. You can find more details about our research &lt;a href="https://elek.pub/projects/Rhizome-Cosmology" target="_blank" rel="noopener">here&lt;/a>. Under the hood, Polyphorm uses a richer 3D scalar field representation of the reconstructed network, instead of a discrete representation like a graph or a mesh.&lt;/p>
&lt;p>&lt;strong>PolyPhy&lt;/strong> will be a redesigned, Python-based version of Polyphorm, currently at the beginning of its development cycle. PolyPhy will be a multi-platform toolkit meant for a wide audience across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. All of the offered projects focus on PolyPhy, with a variety of topics including design, coding, and even research. Ultimately, PolyPhy will become a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.&lt;/p>
&lt;h3 id="develop-website-for-polyphy">Develop website for PolyPhy&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Web Development&lt;/code> &lt;code>Dynamic Updates&lt;/code> &lt;code>UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> web development experience, good communicator, (HTML/CSS), (Javascript)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a clean and welcoming website for the project. The organization needs to reflect the needs of PolyPhy users, but also provide a convenient entry point for interested project contributors. No excessive pop-ups or web junk.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Port the contents of the &lt;a href="https://github.com/CreativeCodingLab/Polyphorm" target="_blank" rel="noopener">repository page&lt;/a> to a dedicated website.&lt;/li>
&lt;li>Design the structure of the website according to best OS practices.&lt;/li>
&lt;li>Work with the visual designer (see below) in creating a coherent and organic presentation.&lt;/li>
&lt;li>Interactively link important metrics from the project dev environment as well as documentation.&lt;/li>
&lt;/ul>
&lt;h3 id="design-visual-experience-for-polyphys-website-and-presentations">Design visual experience for PolyPhy&amp;rsquo;s website and presentations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Design&lt;/code> &lt;code>Art&lt;/code> &lt;code>UX&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> vector and bitmap drawing, sense for spatial symmetry and framing, (interactive content creation), (animation)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium (175 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop visual content for the project using its main themes: nature-inspired computation, biomimetics, interconnected structures. Aid in designing visual structure of the website as well as other public-facing artifacts.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Design imagery and other graphical elements to visually (re-)present PolyPhy.&lt;/li>
&lt;li>Work with the technical writer (see below) in designing a coherent story.&lt;/li>
&lt;li>Work with the web developer (see above) in creating a coherent and organic presentation.&lt;/li>
&lt;/ul>
&lt;h3 id="write-polyphys-technical-story-and-content">Write PolyPhy&amp;rsquo;s technical story and content&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Writing&lt;/code> &lt;code>Documentation&lt;/code> &lt;code>Storytelling&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> experienced writing structured text over 10 pages, well read, (technical or scientific education)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Integral to PolyPhy&amp;rsquo;s presentation is a story that the users and the project contributors can relate to. The objective is to develop the verbal part of that story, as well as major portions of technical documentation that matches it. The difficulty of the project is scalable.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context of the project.&lt;/li>
&lt;li>Write different pages of the project website.&lt;/li>
&lt;li>Work with mentors to improve project&amp;rsquo;s written community practices (diversity, communication).&lt;/li>
&lt;li>Write and edit narrative and explanatory parts of PolyPhy&amp;rsquo;s documentation.&lt;/li>
&lt;li>Work with the visual designer (see above) in designing a coherent story.&lt;/li>
&lt;/ul>
&lt;h3 id="video-tutorials-and-presentation-for-polyphy">Video tutorials and presentation for PolyPhy&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Video Presentation&lt;/code> &lt;code>Tutorials&lt;/code> &lt;code>Didactics&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> video editing, creating educational content, communication, (native or fluent in another language)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Easy-Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:deehrlic@ucsc.edu">Drew Ehrlich&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Create a public face for PolyPhy that reflects its history and context, and teaches its functionality to users with different degrees of familiarity.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Work with mentors on understanding the context and history of the project.&lt;/li>
&lt;li>Interview diverse project contributors.&lt;/li>
&lt;li>Create a video documenting PolyPhy&amp;rsquo;s history, with roots in astronomy, complex systems, fractals.&lt;/li>
&lt;li>Create a set of tutorial videos for starting and intermediate PolyPhy users.&lt;/li>
&lt;li>Create an accessible template for future tutorials.&lt;/li>
&lt;/ul>
&lt;h3 id="implement-heterogeneous-data-io-ops">Implement heterogeneous data I/O ops&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>I/O Operations&lt;/code> &lt;code>File Conversion&lt;/code> &lt;code>Numerics&lt;/code> &lt;code>Testing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python, experience working with scientific or statistical data, good debugging skills&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate-Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Medium or Large (175 or 350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:anishagoel14@gmail.com">Anisha Goel&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>By default, PolyPhy operates with an unordered set of points as an input and scalar fields (float ndarrays) as an output, but other formats and modalities are applicable as well. Design and implement interfaces to load and export different data formats (CSV, OBJ, HDF5, FITS&amp;hellip;) and modalities (points, meshes, density fields). The difficulty of the project can be scaled based on the contributor&amp;rsquo;s interest.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Research which modalities are used by members of the target communities.&lt;/li>
&lt;li>Implement modular loaders for the inputs and an interface to PolyPhy core.&lt;/li>
&lt;li>Implement exporters for simulation datasets and visualization captures.&lt;/li>
&lt;li>Write testing code for the above.&lt;/li>
&lt;li>Integrate external packages as necessary.&lt;/li>
&lt;/ul>
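&lt;p>As an illustration of the modular-loader tasks above, here is a minimal sketch in Python. All names (&lt;code>LOADERS&lt;/code>, &lt;code>load_points&lt;/code>) are hypothetical and not part of PolyPhy&amp;rsquo;s actual interface; real loaders for OBJ, HDF5, or FITS would plug into the same registry.&lt;/p>

```python
import csv

import numpy as np

# Hypothetical loader registry: maps file extensions to reader functions
# that all return an (N, D) float array of points for the simulation core.
LOADERS = {}

def register_loader(ext):
    def wrap(fn):
        LOADERS[ext] = fn
        return fn
    return wrap

@register_loader(".csv")
def load_csv(path):
    # One point per row, comma-separated coordinates.
    with open(path, newline="") as f:
        rows = [[float(v) for v in row] for row in csv.reader(f)]
    return np.asarray(rows, dtype=np.float64)

@register_loader(".npy")
def load_npy(path):
    return np.load(path).astype(np.float64)

def load_points(path):
    # Dispatch on file extension; unknown formats fail loudly.
    for ext, fn in LOADERS.items():
        if path.endswith(ext):
            return fn(path)
    raise ValueError(f"no loader registered for {path}")
```

Exporters would mirror this registry in the opposite direction, and the testing task reduces to round-tripping small synthetic datasets through each pair.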
&lt;h3 id="setup-cicd-for-polyphy">Setup CI/CD for PolyPhy&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Continuous Integration&lt;/code> &lt;code>Continuous Deployment&lt;/code> &lt;code>DevOps&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> experience with CI/CD, GitHub, Python package deployment&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:anishagoel14@gmail.com">Anisha Goel&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The objective is to set up a CI/CD pipeline that automates the building, testing, and deployment of the software. The resulting process needs to be robust to contributor errors and work in the distributed conditions of a diverse contributor base.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Automate continuous building, testing, merging and deployment for PolyPhy in GitHub.&lt;/li>
&lt;li>Publish the CI/CD metrics and build assets to the project webpage.&lt;/li>
&lt;li>Work with other contributors in educating them about the best practices of using the developed CI/CD pipeline.&lt;/li>
&lt;li>Add support for automated packaging using common management systems (pip, Anaconda).&lt;/li>
&lt;/ul>
&lt;h3 id="refine-polyphys-ui-and-develop-new-functional-elements">Refine PolyPhy&amp;rsquo;s UI and develop new functional elements&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>UI/UX&lt;/code> &lt;code>Visual Experience&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> Python programming, UI/UX development experience, (knowledge of graphics)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Moderate&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:dabramov@ucsc.edu">David Abramov&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The key feature of PolyPhy is its interactivity. By interacting with the underlying simulation model, the user can adjust its parameters in real time and respond to its behavior. For instance, an astrophysics expert can load a dataset of 100k galaxies and reconstruct the large-scale structure of the intergalactic medium. A responsive UI combined with real-time visualization allows them to judge the fidelity of the reconstruction and make necessary changes.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Implement a platform-agnostic UI to house PolyPhy&amp;rsquo;s main rendering context as well as secondary analytics.&lt;/li>
&lt;li>Work with the visualization developer (see below) to integrate the rendering functionality.&lt;/li>
&lt;li>Optimize the UI&amp;rsquo;s performance.&lt;/li>
&lt;li>Test the implementation on different OS platforms.&lt;/li>
&lt;/ul>
&lt;h3 id="create-new-data-visualization-regimes">Create new data visualization regimes&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Interactive Visualization&lt;/code> &lt;code>Data Analytics&lt;/code> &lt;code>3D Rendering&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> basic graphics theory and math, Python, GPU programming, (previous experience visualizing novel datasets)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:dabramov@ucsc.edu">David Abramov&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Data visualization is one of the core components of PolyPhy, as it provides a real-time overview of the underlying MCPM simulation. Through the feedback provided by the visualization, PolyPhy users can adjust the simulation model and make new findings about the dataset. Various operations over the reconstructed data (e.g. spatial searching) as well as important statistical summaries also benefit from clear visual presentation.&lt;/p>
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Develop novel ways of visualizing scientific data in PolyPhy.&lt;/li>
&lt;li>Work with diverse data modalities - point clouds, graphs, scalar and vector fields.&lt;/li>
&lt;li>Add support for visualizing metadata, such as annotations and labels.&lt;/li>
&lt;li>Create UI elements for plotting statistical summaries computed in real-time.&lt;/li>
&lt;/ul>
&lt;h3 id="discrete-graph-extraction-from-simulated-scalar-fields">Discrete graph extraction from simulated scalar fields&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics:&lt;/strong> &lt;code>Graph Theory&lt;/code> &lt;code>Data Science&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills:&lt;/strong> good understanding of discrete math and graph theory, Python, (GPU programming)&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Challenging&lt;/li>
&lt;li>&lt;strong>Size:&lt;/strong> Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentors:&lt;/strong> &lt;a href="mailto:oelek@ucsc.edu">Oskar Elek&lt;/a>, &lt;a href="mailto:farhasan@nmsu.edu">Farhanul Hasan&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Develop a custom method for graph extraction from scalar field data produced by PolyPhy. Because PolyPhy typically produces network-like structures, representing these structures as weighted discrete graphs is very useful for efficiently navigating the data. The most important property of this abstracted representation is that it preserves the topology of the base scalar field by following its 1D ridges.&lt;/p>
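&lt;p>A minimal sketch of the field-to-graph abstraction, assuming a 2D NumPy scalar field and 4-connected adjacency. The function name and the throughput-as-mean edge weighting are illustrative choices, not PolyPhy&amp;rsquo;s actual method; a real implementation would trace the 1D ridges rather than every above-threshold cell.&lt;/p>

```python
import numpy as np

def extract_graph(field, threshold):
    """Abstract a 2D scalar (deposit) field into a weighted graph.

    Cells above `threshold` become nodes; 4-connected neighbours are
    joined by edges whose weight approximates the throughput between
    the two locations (here: the mean of the two field values).
    Returns an adjacency dict {node: {neighbour: weight}}.
    """
    h, w = field.shape
    nodes = {(i, j) for i in range(h) for j in range(w)
             if field[i, j] > threshold}
    adj = {n: {} for n in nodes}
    for (i, j) in nodes:
        for (di, dj) in ((1, 0), (0, 1)):  # visit each undirected edge once
            nb = (i + di, j + dj)
            if nb in nodes:
                wgt = 0.5 * (field[i, j] + field[nb])
                adj[(i, j)][nb] = wgt
                adj[nb][(i, j)] = wgt
    return adj
```

Graph operations such as shortest paths or hierarchical clustering would then run on this adjacency structure instead of the dense field.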
&lt;p>Specific tasks:&lt;/p>
&lt;ul>
&lt;li>Become familiar with different algorithms for graph growing and skeleton extraction.&lt;/li>
&lt;li>Implement the most suitable method in PolyPhy, interpreting the source scalar field as a throughput (transport) network. The weights of the resulting graph need to reflect the source throughputs between the respective node locations.&lt;/li>
&lt;li>Implement common graph operations, e.g. hierarchical clustering and reduction, shortest path between two nodes, range queries.&lt;/li>
&lt;li>Optimize the runtime of the implemented methods.&lt;/li>
&lt;li>Work with the visualization developer (see above) to visualize the resulting graphs.&lt;/li>
&lt;/ul></description></item><item><title>Proactive Data Containers (PDC)</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/pdc/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/lbl/pdc/</guid><description>&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers&lt;/a> (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. Managing data as objects enables powerful optimization opportunities for data movement and
transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning.&lt;/p>
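&lt;p>To make the object-centric idea concrete, here is a toy in-memory sketch of container, object, and metadata management. All class and method names here are hypothetical; the real PDC exposes a C API, not this interface.&lt;/p>

```python
# Illustrative sketch only: a toy, in-memory model of object-centric
# storage in the spirit of PDC. Real PDC containers live in a locus of
# storage (memory, NVRAM, disk, ...) and are managed through a C API.
class Container:
    def __init__(self, name, locus="memory"):
        self.name = name
        self.locus = locus          # e.g. "memory", "nvram", "disk"
        self._objects = {}

    def put(self, obj_name, data, **metadata):
        # Store the object together with arbitrary key/value metadata.
        self._objects[obj_name] = {"data": data, "meta": metadata}

    def get(self, obj_name):
        return self._objects[obj_name]["data"]

    def query(self, **criteria):
        # Metadata query: return names of objects whose metadata matches.
        return [n for n, o in self._objects.items()
                if all(o["meta"].get(k) == v for k, v in criteria.items())]
```

Because data and metadata travel together as objects, the system can relocate or transform objects across the storage hierarchy without the application tracking file offsets.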
&lt;h3 id="python-interface-to-an-object-centric-data-management-system">Python interface to an object-centric data management system&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Python&lt;/code>, &lt;code>object-centric data management&lt;/code>, &lt;code>PDC&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: Python, C, PDC&lt;/li>
&lt;li>&lt;strong>Difficulty&lt;/strong>: Medium&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: Large (350 hours)&lt;/li>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:sbyna@lbl.gov">Suren Byna&lt;/a>, &lt;a href="mailto:htang4@lbl.gov">Houjun Tang&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://sdm.lbl.gov/pdc/about.html" target="_blank" rel="noopener">Proactive Data Containers (PDC)&lt;/a> is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. Currently PDC has a C interface. Providing a python interface would make it easier for more Python applications to utilize it.&lt;/p></description></item><item><title>Skyhook Data Management</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/skyhookdm/</link><pubDate>Mon, 07 Nov 2022 10:15:56 -0700</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre22/ucsc/skyhookdm/</guid><description>&lt;p>&lt;a href="https://iris-hep.org/projects/skyhookdm.html" target="_blank" rel="noopener">SkyhookDM&lt;/a>&lt;/p>
&lt;p>The Skyhook Data Management project extends object storage with data
management functionality for tabular data. SkyhookDM enables storing and querying
tabular data in the &lt;a href="https://ceph.io" target="_blank" rel="noopener">Ceph&lt;/a> distributed object storage system. It thereby
turns Ceph into an &lt;a href="https://arrow.apache.org" target="_blank" rel="noopener">Apache Arrow&lt;/a>-native
storage system, utilizing the Arrow Dataset API to store and query data with server-side data processing, including selection and projection that can significantly reduce the data returned to the client.&lt;/p>
&lt;p>SkyhookDM is now part of Apache Arrow (see &lt;a href="https://arrow.apache.org/blog/2022/01/31/skyhook-bringing-computation-to-storage-with-apache-arrow/" target="_blank" rel="noopener">blog post&lt;/a>).&lt;/p>
&lt;hr>
&lt;h3 id="support-reading-from-skyhook-in-daskray-using-the-arrow-dataset-api">Support reading from Skyhook in Dask/Ray using the Arrow Dataset API&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Arrow&lt;/code>, &lt;code>Dask/Ray&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Problem:&lt;/strong> Dask and Ray are parallel-computing frameworks similar to Apache Spark, but native to the Python ecosystem. Each of these frameworks supports reading tabular data from different data sources such as a local filesystem, cloud object stores, etc. These systems have recently added support for the Arrow Dataset API to read data from different sources. Since the Arrow Dataset API supports Skyhook, we can leverage this capability to offload compute-heavy Parquet file decoding and decompression into the Ceph storage layer. This can speed up queries significantly, as CPU in the Dask/Ray workers is freed up for other processing tasks.&lt;/p>
&lt;h3 id="implement-gandiva-based-query-executor-in-skyhookdm">Implement Gandiva based query executor in SkyhookDM&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Arrow&lt;/code>, &lt;code>Gandiva&lt;/code>, &lt;code>SIMD&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Hard&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Problem:&lt;/strong> &lt;a href="https://arrow.apache.org/blog/2018/12/05/gandiva-donation/" target="_blank" rel="noopener">Gandiva&lt;/a> allows efficient evaluation of query expressions through runtime code generation with LLVM. The generated code leverages SIMD instructions and is highly optimized for parallel processing on modern CPUs. It is natively supported by Arrow for compiling and executing expressions. SkyhookDM currently uses the Arrow Dataset API (which internally uses the Arrow Compute APIs) to execute query expressions inside the Ceph OSDs. Since the Arrow Dataset API does not currently support Gandiva, the goal of this project is to add Gandiva support to the Arrow Dataset API in order to accelerate query processing when offloaded to the storage layer. This will help Skyhook combat some of the performance issues caused by the inefficient serialization interface of Arrow.&lt;/p>
&lt;p>&lt;strong>References:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://arrow.apache.org/blog/2018/12/05/gandiva-donation/" target="_blank" rel="noopener">https://arrow.apache.org/blog/2018/12/05/gandiva-donation/&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.dremio.com/subsurface/increasing-performance-with-arrow-and-gandiva/" target="_blank" rel="noopener">https://www.dremio.com/subsurface/increasing-performance-with-arrow-and-gandiva/&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/apache/arrow/tree/master/cpp/src/gandiva" target="_blank" rel="noopener">https://github.com/apache/arrow/tree/master/cpp/src/gandiva&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="add-ability-to-create-and-save-views-from-datasets">Add Ability to create and save views from Datasets&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>Arrow&lt;/code>, &lt;code>Database views&lt;/code>, &lt;code>virtual datasets&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Problem:&lt;/strong> Workloads may repeat the same or similar queries over time. This repeats I/O and compute operations, wasting resources. Saving previous computation in the form of materialized views can benefit future workload processing.&lt;/p>
&lt;p>&lt;strong>Solution:&lt;/strong> Add a method to the Dataset API to create views from queries and save each view as an object in a separate pool, under an object key generated from the query that created it.&lt;/p>
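&lt;p>One way to generate such object keys is to hash a canonical form of the query, so that a logically identical query deterministically finds its saved view. A minimal sketch; the names here are hypothetical and not part of the actual Dataset API.&lt;/p>

```python
import hashlib
import json

def view_key(query):
    """Derive a deterministic object key for a materialized view.

    `query` is a dict describing the scan (columns, filter, source).
    Serializing with sorted keys normalizes the query, so two dicts
    with the same contents in different order hash to the same key.
    """
    canonical = json.dumps(query, sort_keys=True)
    return "view-" + hashlib.sha256(canonical.encode()).hexdigest()[:16]

class ViewCache:
    """Toy stand-in for a separate pool holding materialized views."""
    def __init__(self):
        self._pool = {}

    def get_or_compute(self, query, compute):
        key = view_key(query)
        if key not in self._pool:
            self._pool[key] = compute()   # run the scan only on a miss
        return self._pool[key]
```

A production version would also need invalidation when the underlying objects change, which is part of what makes the project interesting.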
&lt;p>Reference:
&lt;a href="https://docs.dremio.com/working-with-datasets/virtual-datasets.html" target="_blank" rel="noopener">https://docs.dremio.com/working-with-datasets/virtual-datasets.html&lt;/a>&lt;/p>
&lt;hr>
&lt;h3 id="integrating-delta-lake-on-top-of-skyhookdm">Integrating Delta Lake on top of SkyhookDM&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Topics&lt;/strong>: &lt;code>data lakes&lt;/code>, &lt;code>lake house&lt;/code>, &lt;code>distributed query processing&lt;/code>&lt;/li>
&lt;li>&lt;strong>Skills&lt;/strong>: C++&lt;/li>
&lt;li>&lt;strong>Size&lt;/strong>: 175 or 350 hours&lt;/li>
&lt;li>&lt;strong>Difficulty:&lt;/strong> Medium&lt;/li>
&lt;/ul>
&lt;ul>
&lt;li>&lt;strong>Mentor&lt;/strong>: &lt;a href="mailto:jayjeetc@ucsc.edu">Jayjeet Chakraboorty&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://delta.io/" target="_blank" rel="noopener">Delta Lake&lt;/a> is a new architecture for querying big data lakes through Spark, providing transactions.
An important benefit of this integration will be to provide an SQL interface for SkyhookDM functionality, through Spark SQL.
This project will further build upon our current work connecting Spark to SkyhookDM through the Arrow Dataset API.
This would allow us to run some of the TPC-DS queries (popular set of SQL queries for benchmarking databases) on SkyhookDM easily.&lt;/p>
&lt;p>Reference: &lt;a href="https://databricks.com/jp/wp-content/uploads/2020/08/p975-armbrust.pdf" target="_blank" rel="noopener">Delta Lake paper&lt;/a>&lt;/p></description></item></channel></rss>