<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ujjwal Shekhar | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ujjwal-shekhar/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ujjwal-shekhar/index.xml" rel="self" type="application/rss+xml"/><description>Ujjwal Shekhar</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sat, 24 Aug 2024 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/media/logo_hub6795c39d7c5d58c9535d13299c9651f_74810_300x300_fit_lanczos_3.png</url><title>Ujjwal Shekhar</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/ujjwal-shekhar/</link></image><item><title>Hardware Hierarchical Dynamical Systems</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/</link><pubDate>Sat, 24 Aug 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/</guid><description>&lt;p>Hi everyone! I am Ujjwal Shekhar, a Computer Science student at the International Institute of Information Technology - Hyderabad. I am excited to share my work on the project titled &lt;strong>&amp;ldquo;Hardware Hierarchical Dynamical Systems&amp;rdquo;&lt;/strong> as part of the &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/osre/">Open Source Research Experience (OSRE) program&lt;/a> and &lt;a href="https://summerofcode.withgoogle.com/" target="_blank" rel="noopener">Google Summer of Code&lt;/a>. 
This project has been an incredible journey, and I&amp;rsquo;ve had the privilege of working with my mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a>.&lt;/p>
&lt;h1 id="project-overview-and-goals">Project Overview and Goals&lt;/h1>
&lt;blockquote>
&lt;p>Abstract Syntax Trees (ASTs) are fundamental to modern compilers, serving as the backbone for parsing and transforming code. When compiling hardware code, the sheer volume of data can make compilation times a significant bottleneck. My project focuses on building a memory-optimized tree data structure specifically tailored for AST-typical queries.&lt;/p>
&lt;/blockquote>
&lt;p>The &lt;a href="https://github.com/masc-ucsc/livehd" target="_blank" rel="noopener">LiveHD&lt;/a> repository, developed by the &lt;a href="https://masc.soe.ucsc.edu" target="_blank" rel="noopener">Micro Architecture Lab&lt;/a> at UCSC, offers a compiler infrastructure optimized for hardware synthesis and simulation. The existing &lt;a href="https://github.com/masc-ucsc/livehd/blob/master/core/lhtree.hpp" target="_blank" rel="noopener">LHTree&lt;/a> data structure provides a foundation, but there was significant potential for further optimization, which I explored throughout this project.&lt;/p>
&lt;h3 id="key-ast-queries">Key AST Queries&lt;/h3>
&lt;p>The core queries that the tree is optimized for include:&lt;/p>
&lt;ul>
&lt;li>Finding the parent of a node.&lt;/li>
&lt;li>Finding the first and last child of a node.&lt;/li>
&lt;li>Locating the previous and next sibling of a node.&lt;/li>
&lt;li>Adding a child to a node.&lt;/li>
&lt;li>Inserting a sibling to a node.&lt;/li>
&lt;li>Performing preorder, postorder, and sibling order traversal.&lt;/li>
&lt;li>Removing a leaf or an entire subtree from the tree.&lt;/li>
&lt;/ul>
&lt;p>The primary goal was to create a tree class that excels at handling these queries efficiently, while still being robust enough to support less frequent operations. The new HHDS tree structure has demonstrated superior performance for specific tree configurations and continues to show potential across other types, particularly in memory consumption and cache efficiency, compared to the current LHTree.&lt;/p>
&lt;p>I benchmarked the tree with Google Benchmark to test its scalability and performance, and profiled it with Callgrind and KCachegrind to find bottlenecks. The new version of the tree is currently being integrated into the LiveHD core repository.&lt;/p>
&lt;h2 id="background-and-motivation">Background and Motivation&lt;/h2>
&lt;h3 id="naive-approach">Naive approach&lt;/h3>
&lt;p>A straightforward method for storing an n-ary tree is to maintain pointers from each node to its parent, children, and immediate siblings. While simple, this approach is memory-intensive and has poor cache efficiency due to the non-contiguous nature of nodes in memory. The variable memory usage per node, depending on the number of children, can also introduce significant overhead.&lt;/p>
&lt;h3 id="enhancements-to-the-naive-approach">Enhancements to the Naive Approach&lt;/h3>
&lt;p>To reduce memory overhead, one optimization is to store only pointers to the first and last child within each node. This reduces memory usage to a constant per node. Additionally, since many AST-related queries focus on the tree&amp;rsquo;s structure rather than the data itself, we can separate the data from the structure. The tree would store only pointers to the data, allowing the tree structure to be optimized independently of the data storage.&lt;/p>
&lt;blockquote>
&lt;p>While separating the data and the structure may seem like an obvious improvement, we will see that it can be extended to provide greater benefits.&lt;/p>
&lt;/blockquote>
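&lt;p>As a rough illustration of this layout, here is a minimal sketch. The names &lt;code>Node_links&lt;/code>, &lt;code>Simple_tree&lt;/code>, &lt;code>add_root&lt;/code> and &lt;code>add_child&lt;/code> are hypothetical and are not taken from LiveHD. Vector indices stand in for pointers, and the payload type is assumed to be a string purely for the example.&lt;/p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: constant-size bookkeeping per node, no matter how
// many children the node has. Index 0 is reserved to mean "none".
struct Node_links {
  std::uint32_t parent       = 0;
  std::uint32_t first_child  = 0;
  std::uint32_t last_child   = 0;
  std::uint32_t prev_sibling = 0;
  std::uint32_t next_sibling = 0;
};

// Structure and payload live in separate arrays that share the same index,
// so structural queries never touch the payload.
struct Simple_tree {
  std::vector<Node_links>  links{Node_links{}};   // slot 0 is the invalid node
  std::vector<std::string> data{std::string{}};

  std::uint32_t add_root(const std::string& value) {
    links.push_back(Node_links{});
    data.push_back(value);
    return static_cast<std::uint32_t>(links.size() - 1);
  }

  std::uint32_t add_child(std::uint32_t parent, const std::string& value) {
    const auto id = static_cast<std::uint32_t>(links.size());
    Node_links node;
    node.parent       = parent;
    node.prev_sibling = links[parent].last_child;  // 0 if this is the first child
    links.push_back(node);
    data.push_back(value);

    if (links[parent].last_child != 0) {
      links[links[parent].last_child].next_sibling = id;
    } else {
      links[parent].first_child = id;
    }
    links[parent].last_child = id;
    return id;
  }
};
```

&lt;p>Each node costs five 32-bit indices here regardless of its fan-out, and a query such as finding the last child reads only &lt;code>links&lt;/code>, never &lt;code>data&lt;/code>.&lt;/p>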
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Naive and improved methods of storing the tree" srcset="
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_7d57a7eca121eafa6de264160253597d.webp 400w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_ad09a2aa9614ada2d18b11fd703737e7.webp 760w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig1_hu78c649c062d309c5f78b4b25d06f11c2_90521_7d57a7eca121eafa6de264160253597d.webp"
width="760"
height="686"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="improving-the-cache-efficiency">Improving the cache efficiency&lt;/h3>
&lt;p>While reducing memory consumption is beneficial, the tree&amp;rsquo;s cache efficiency can still be suboptimal if the children of a node are scattered in memory. To enhance cache efficiency, storing children in contiguous memory locations is crucial. This improves spatial locality, which in turn boosts cache performance. Additionally, this approach eliminates the need to store explicit data pointers in the tree: the data for a node can live in a separate contiguous array at the same index as its bookkeeping entry.&lt;/p>
&lt;p>By storing children contiguously, we can also eliminate the need for previous and next sibling pointers, as siblings are inherently adjacent in memory. Similarly, we can avoid storing the parent pointer for every child, since all children share the same parent.&lt;/p>
&lt;h2 id="optimizations-in-lhtree-old-method">Optimizations in LHTree (Old method)&lt;/h2>
&lt;p>The &lt;a href="https://github.com/masc-ucsc/livehd/blob/master/core/lhtree.hpp" target="_blank" rel="noopener">LHTree&lt;/a> class in LiveHD was designed with these optimizations in mind. It groups siblings into &lt;em>chunks&lt;/em> of four, storing the parent pointer only in the first sibling of each chunk. The last sibling in each chunk points to the next chunk, minimizing the number of pointers required and thus reducing memory overhead.&lt;/p>
&lt;p>LHTree organizes the entire tree as a 2-dimensional array, where the first dimension represents the tree level and the second dimension represents the node index at that level. This structure improves cache efficiency by storing nodes contiguously in memory. Each tree position is a 48-bit ID, with the last 32 bits representing the node&amp;rsquo;s index and the first 16 bits indicating the tree level.&lt;/p>
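&lt;p>The 48-bit position can be pictured with the following sketch. Only the 16/32-bit split comes from the description above; the helper names &lt;code>lh_make_position&lt;/code>, &lt;code>lh_level_of&lt;/code> and &lt;code>lh_index_of&lt;/code> are made up for illustration and are not the LHTree API.&lt;/p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: the upper 16 bits hold the level, the lower 32 bits
// hold the node index within that level.
constexpr unsigned      LH_INDEX_BITS = 32;
constexpr std::uint64_t LH_INDEX_MASK = (std::uint64_t{1} << LH_INDEX_BITS) - 1;

constexpr std::uint64_t lh_make_position(std::uint16_t level, std::uint32_t index) {
  return (static_cast<std::uint64_t>(level) << LH_INDEX_BITS) | index;
}

constexpr std::uint16_t lh_level_of(std::uint64_t pos) {
  return static_cast<std::uint16_t>(pos >> LH_INDEX_BITS);
}

constexpr std::uint32_t lh_index_of(std::uint64_t pos) {
  return static_cast<std::uint32_t>(pos & LH_INDEX_MASK);
}
```

&lt;p>Under this encoding, the largest level and index together fill exactly 48 bits.&lt;/p>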
&lt;p>Tracking the level explicitly limits the tree&amp;rsquo;s scalability for deeper trees, since only a fixed number of bits is allocated to the level.&lt;/p>
&lt;blockquote>
&lt;p>Despite these optimizations, LHTree has some limitations, particularly in cache alignment and flexibility, which the HHDS tree aims to address.&lt;/p>
&lt;/blockquote>
&lt;p>Unfortunately, the number of bits required by each &amp;ldquo;chunk&amp;rdquo; happens to be slightly bigger than a single cache line (512 bits). This means that the cache efficiency of the tree is not optimal.&lt;/p>
&lt;h2 id="hhds-tree--a-new-approach">HHDS Tree: A New Approach&lt;/h2>
&lt;h3 id="eliminating-levels">Eliminating Levels&lt;/h3>
&lt;p>The HHDS tree stores everything in a single vector, removing the need for explicit level information. This simplification not only improves cache efficiency but also eliminates restrictions on the number of nodes per level and the total number of levels.&lt;/p>
&lt;h3 id="enhanced-cache-alignment">Enhanced Cache Alignment&lt;/h3>
&lt;p>In the HHDS tree, each node has a 46-bit ID. Chunks in the HHDS tree contain up to eight children, with the first 43 bits of the absolute ID serving as the chunk ID and the last three bits indicating the node&amp;rsquo;s offset within the chunk.&lt;/p>
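&lt;p>A small sketch of how an absolute ID splits into a chunk ID and an offset. The constant names &lt;code>CHUNK_BITS&lt;/code> and &lt;code>CHUNK_SHIFT&lt;/code> mirror the ones used in the class definition in this post, with values inferred from the 43 + 3 split. The alias for &lt;code>Tree_pos&lt;/code> and the three helper functions are assumptions for illustration only.&lt;/p>

```cpp
#include <cassert>
#include <cstdint>

using Tree_pos = std::uint64_t;  // assumption: the real alias may differ

constexpr unsigned CHUNK_SHIFT = 3;   // 2^3 = 8 slots per chunk
constexpr unsigned CHUNK_BITS  = 43;  // bits in a chunk ID
constexpr Tree_pos CHUNK_MASK  = (Tree_pos{1} << CHUNK_SHIFT) - 1;

static_assert(CHUNK_BITS + CHUNK_SHIFT == 46, "absolute IDs are 46 bits wide");

// Hypothetical helpers, not part of the HHDS API.
constexpr Tree_pos chunk_id_of(Tree_pos abs_id) { return abs_id >> CHUNK_SHIFT; }
constexpr Tree_pos offset_of(Tree_pos abs_id)   { return abs_id & CHUNK_MASK; }

constexpr Tree_pos absolute_id(Tree_pos chunk_id, Tree_pos offset) {
  return (chunk_id << CHUNK_SHIFT) | offset;
}
```

&lt;p>For example, absolute ID 13 decodes to chunk 1, offset 5, and chunk 1 with offset 0 encodes back to absolute ID 8.&lt;/p>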
&lt;p>For each chunk, which is exactly 64 bytes (or 512 bits) long—matching the size of a cache line—the following information is stored:&lt;/p>
&lt;ul>
&lt;li>A 46-bit parent pointer (absolute ID).&lt;/li>
&lt;li>A 43-bit first child long pointer (chunk ID).&lt;/li>
&lt;li>A 43-bit last child long pointer (chunk ID).&lt;/li>
&lt;li>43-bit previous and next sibling chunk pointers.&lt;/li>
&lt;li>Seven 21-bit short delta pointers for the first child.&lt;/li>
&lt;li>Seven 21-bit short delta pointers for the last child.&lt;/li>
&lt;/ul>
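&lt;p>The field widths listed above add up to exactly one cache line. The check below is a sketch that redoes this arithmetic at compile time; the constant and function names are chosen for this example only.&lt;/p>

```cpp
#include <cassert>

// Field widths in bits, as listed in the text.
constexpr unsigned PARENT_BITS       = 46;  // absolute ID of the parent
constexpr unsigned CHUNK_ID_BITS     = 43;  // any long (chunk ID) pointer
constexpr unsigned SHORT_DELTA_BITS  = 21;  // one short delta pointer
constexpr unsigned SHORT_DELTA_COUNT = 7;   // per first-child / last-child group

constexpr unsigned chunk_bits_total() {
  return PARENT_BITS
       + 2 * CHUNK_ID_BITS                          // first and last child (long)
       + 2 * CHUNK_ID_BITS                          // previous and next sibling chunk
       + 2 * SHORT_DELTA_COUNT * SHORT_DELTA_BITS;  // first-child and last-child deltas
}

static_assert(chunk_bits_total() == 512, "a chunk must fill one 64-byte cache line");
```

&lt;p>That is, 46 + 86 + 86 + 294 = 512 bits, or 64 bytes.&lt;/p>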
&lt;blockquote>
&lt;p>&lt;strong>NOTE&lt;/strong>: The 0th chunk is reserved as the INVALID node. Real nodes start from the 1st chunk, so the root node has an absolute ID of 8 (chunk ID 1, offset 0).&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Overview of the HHDS tree book-keeping" srcset="
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_30c5181b8def0cc33b1b86e98f51c9db.webp 400w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_dbc8dcac70e873bb719beedc7adf4645.webp 760w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig2_hucb9a27b986f748027535f10fe0848fa0_79213_30c5181b8def0cc33b1b86e98f51c9db.webp"
width="760"
height="359"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;blockquote>
&lt;p>Refer to the next section for more information on the short delta pointers.&lt;/p>
&lt;/blockquote>
&lt;p>Because a chunk always occupies the full 512 bits, the memory cost per node depends on how full the chunk is. In the worst case a chunk holds a single node, which then costs 512 bits. In the best case all eight slots are used, which amortizes to 64 bits per node.&lt;/p>
&lt;blockquote>
&lt;p>We used the GCC/Clang &lt;code>__attribute__((packed, aligned(64)))&lt;/code> extension so that each chunk is packed without padding and aligned to a 64-byte cache line. Bit-fields pack the individual pointers within the chunk.&lt;/p>
&lt;/blockquote>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nf">__attribute__&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="n">packed&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">aligned&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">)))&lt;/span> &lt;span class="n">Tree_pointers&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">private&lt;/span>&lt;span class="o">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// We only store the exact ID of parent
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">parent&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">CHUNK_SHIFT&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">next_sibling&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">prev_sibling&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Long child pointers
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">first_child_l&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nl">last_child_l&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">CHUNK_BITS&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Short (delta) child pointers
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// C++ does not allow arrays of bit-fields, so each 21-bit
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// delta is declared as its own field to keep the chunk
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// packed into exactly 512 bits.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_0&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_1&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_2&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_3&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_4&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_5&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">first_child_s_6&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_0&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_1&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_2&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_3&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_4&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_5&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Short_delta&lt;/span> &lt;span class="nl">last_child_s_6&lt;/span> &lt;span class="p">:&lt;/span> &lt;span class="n">SHORT_DELTA&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="build-append---short-delta-heuristic">Build Append - Short Delta Heuristic&lt;/h3>
&lt;p>Empirical observations show that children are often added to a node shortly after the parent, meaning they are stored close to the parent in memory. This allows children to be stored as a delta from the parent, reducing the need for full chunk IDs.&lt;/p>
&lt;p>When adding a child:&lt;/p>
&lt;ul>
&lt;li>Attempt to store the child as a delta from the parent.&lt;/li>
&lt;li>If not feasible, allocate a new chunk for the parent and store the pointer to the child chunk in the newly created parent chunk.&lt;/li>
&lt;/ul>
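&lt;p>The decision between the two cases above can be sketched as a range check. This is only an illustration under stated assumptions: the delta is assumed to be a signed 21-bit difference between chunk IDs, and the names &lt;code>MIN_SHORT_DELTA&lt;/code>, &lt;code>MAX_SHORT_DELTA&lt;/code> and &lt;code>fits_short_delta&lt;/code> are hypothetical. The actual encoding in HHDS may differ.&lt;/p>

```cpp
#include <cassert>
#include <cstdint>

constexpr int SHORT_DELTA = 21;  // width of one short delta pointer, in bits

// Assumption: deltas are stored as signed two's-complement values.
constexpr std::int64_t MAX_SHORT_DELTA = (std::int64_t{1} << (SHORT_DELTA - 1)) - 1;
constexpr std::int64_t MIN_SHORT_DELTA = -(std::int64_t{1} << (SHORT_DELTA - 1));

// True when the child chunk is close enough to the parent chunk to be
// referenced by a short delta. When this returns false, the caller has to
// fall back to a long pointer, which means moving the parent to a new chunk.
constexpr bool fits_short_delta(std::int64_t parent_chunk, std::int64_t child_chunk) {
  return (child_chunk - parent_chunk) >= MIN_SHORT_DELTA
      && (child_chunk - parent_chunk) <= MAX_SHORT_DELTA;
}
```

&lt;p>With 21 signed bits, a child chunk can be referenced by a delta as long as it lies within roughly one million chunks of its parent in either direction.&lt;/p>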
&lt;p>Implementing chunk breaking required careful handling to ensure that when a parent moves to a new chunk, its new chunk can still be referenced efficiently by its parent, potentially requiring recursive adjustments.&lt;/p>
&lt;blockquote>
&lt;p>This is because the grandparent might not be able to store the parent as a delta from itself after the parent moves to a new chunk.&lt;/p>
&lt;/blockquote>
&lt;h2 id="compliance-with-the-livehd-core-repository">Compliance with the LiveHD core repository&lt;/h2>
&lt;p>Since the HHDS tree is an evolution of the LHTree, it was crucial to maintain compatibility with the LiveHD core repository. All necessary methods were implemented in the HHDS tree to ensure seamless integration. Naming conventions and syntax were kept consistent with the LHTree to facilitate a smooth transition.&lt;/p>
&lt;p>Exposed methods in the HHDS tree are:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="cm">/**
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> * Query based API (no updates)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_parent&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">curr_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_last_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">parent_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_first_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">parent_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">bool&lt;/span> &lt;span class="nf">is_last_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">self_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">bool&lt;/span> &lt;span class="nf">is_first_child&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">self_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_sibling_next&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">get_sibling_prev&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">bool&lt;/span> &lt;span class="nf">is_leaf&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">leaf_index&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">const&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm">/**
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> * Update based API (Adds and Deletes from the tree)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// FREQUENT UPDATES
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">append_sibling&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">add_child&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">parent_index&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">add_root&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">delete_leaf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">leaf_index&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">delete_subtree&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">subtree_root&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// INFREQUENT UPDATES
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Tree_pos&lt;/span> &lt;span class="nf">insert_next_sibling&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">const&lt;/span> &lt;span class="n">Tree_pos&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">sibling_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">const&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h1 id="benchmarking-results">Benchmarking Results&lt;/h1>
&lt;p>Preliminary benchmarks indicate that the HHDS tree outperforms the LHTree in both runtime efficiency (for certain cases, more on this in a later section) and memory consumption. The HHDS tree demonstrates enhanced performance across various tests, offering a more optimized solution for handling Abstract Syntax Tree (AST) operations.&lt;/p>
&lt;p>I constructed identical trees using both the LHTree and HHDS tree structures and executed a series of queries on each. The benchmarks were performed using Google Benchmark to ensure accurate and consistent results. Below, I detail the specific tests conducted.&lt;/p>
&lt;h3 id="benchmark-tests-overview">Benchmark Tests Overview&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Deep Tree Test&lt;/strong>&lt;br>
This test builds a single chain (a path graph) by repeatedly adding a child to the last node in the tree. It is designed to assess the tree&amp;rsquo;s performance when handling deep structures, where each node has a single child.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Wide Tree Test&lt;/strong>&lt;br>
In this scenario, a single root node is created, followed by the addition of numerous child nodes directly under the root. This test evaluates the tree&amp;rsquo;s efficiency in managing wide structures with many immediate children.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chip-Typical Tree Test&lt;/strong>&lt;br>
This test models a tree commonly seen in hardware design. For each node, a random number of children (ranging from 1 to 7) are added, and the process is recursively applied to the leaf nodes up to a certain depth. This test measures the tree&amp;rsquo;s performance in realistic, varied conditions.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chip-Typical (Long) Tree Test&lt;/strong>&lt;br>
Similar to the Chip-Typical Tree Test, but with a broader range of children per node (1 to 20). This test is particularly useful for examining performance when the tree is more complex and chunk splitting is more likely.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>These tests provide a comprehensive analysis of the HHDS tree&amp;rsquo;s capabilities, highlighting its superiority over the LHTree for deeper trees.&lt;/p>
&lt;h2 id="addappend-benchmarks">Add/Append Benchmarks&lt;/h2>
&lt;h3 id="deep-tree-test">Deep Tree Test&lt;/h3>
&lt;blockquote>
&lt;p>&lt;code>test_deep_tree_100_hhds&lt;/code> indicates the time taken to run a benchmark on a deep tree of 100 nodes using the HHDS tree structure. This nomenclature is consistent across all tests.&lt;/p>
&lt;/blockquote>
&lt;h4 id="disabled-compiler-optimizations">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 11704 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 19541 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 85317 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 163058 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 760260 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 1442391 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 9889199 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 16215232 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 84650074 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 163255882 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds 877646208 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh 1659725904 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds 9256118059 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh 1.4431e+10 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 1443 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 1462 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 7398 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 17455 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 79544 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 165656 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 1337406 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 1494153 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 12288324 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 14897463 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds 116810846 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh 188815892 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds 2338596582 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh 2238844395 ns
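&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here, the HHDS tree structure outperforms the LHTree in the Deep Tree Test in all but one configuration, showcasing its efficiency in handling deep tree structures. The exception is the 10,000,000-node tree with compiler optimizations enabled, where the LHTree is about 4% faster.&lt;/p>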
&lt;h3 id="wide-tree-test">Wide Tree Test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-1">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 6581 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 6235 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 34911 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 35734 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 323228 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 312755 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 3547963 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 2975894 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 33800125 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 32538424 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds 332509041 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh 336261868 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds 3527352810 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh 8774024963 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-1">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 837 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 512 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 3394 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 2675 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 26019 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 20141 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 319068 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 245964 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 3369183 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 2910862 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds 39243340 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh 26777306 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds 454508781 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh 331688046 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Without compiler optimizations, the two trees are within about 20% of each other in the Wide Tree Test up to 1,000,000 nodes, with the LHTree slightly ahead at most sizes. At 10,000,000 nodes, the HHDS tree is roughly 2.5 times faster. With compiler optimizations, the LHTree is faster than HHDS at every size.&lt;/p>
&lt;blockquote>
&lt;p>The HHDS tree&amp;rsquo;s advantage on the largest trees can likely be attributed to its large chunk size, which allows for better cache utilization and lower memory overhead. The LH Tree, however, has been in use for longer and has been through more tuning, which could explain its better performance with compiler optimizations. In the future, the HHDS tree could be optimized further to match or exceed the LH Tree&amp;rsquo;s performance.&lt;/p>
&lt;/blockquote>
&lt;h3 id="chip-typical-tree-test">Chip Typical Tree Test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-2">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 7109 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 6803 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 22728 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 22064 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 75398 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 70910 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 270062 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 254423 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 1110254 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 1074439 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 5024264 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 3900709 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds/iterations:5 13290739 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh/iterations:5 22145462 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds/iterations:5 83438683 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh/iterations:5 105475664 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-2">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 938 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 387 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 1877 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 1351 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 7095 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 5052 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 35019 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 21569 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 130915 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 78010 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 522385 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 278223 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds/iterations:5 4015636 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh/iterations:5 1648426 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds/iterations:5 9873724 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh/iterations:5 4607773 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For the Chip Typical test without compiler optimizations, the LHTree is faster for the smaller trees (cases 1 to 6), while the HHDS tree is faster for the two largest (cases 7 and 8). With compiler optimizations, the LH Tree is faster than the HHDS tree in every case.&lt;/p>
&lt;h3 id="chip-typical-long-tree-test">Chip Typical (long) Tree test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-3">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 8875 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 8479 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 62490 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 64620 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 625064 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 654787 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 6128047 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 6528778 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 71345448 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 77170587 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds/iterations:5 656595039 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh/iterations:5 860193491 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-3">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 1139 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 692 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 8666 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 5238 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 90856 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 48758 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 1034346 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 472964 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 13040238 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 5025192 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds/iterations:3 131143411 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh/iterations:3 68739573 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Similar to the previous case, the HHDS tree is generally faster in debug mode (without compiler optimizations), losing only in the smallest case. With compiler optimizations, however, the LH Tree is faster in every case.&lt;/p>
&lt;blockquote>
&lt;p>We see that the HHDS tree has shown better overall performance without compiler optimizations, whereas the LH Tree has shown better performance with them. The exception is the Deep Tree test, where the HHDS tree is faster in nearly all cases regardless of optimization settings. This indicates an inherent trade-off between the two trees. To investigate this behaviour further, I conducted some profiling, which is described in a later section.&lt;/p>
&lt;/blockquote>
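&lt;p>To make the comparison above concrete, here is a minimal sketch that computes the ratio of LH time to HHDS time for a few rows copied from the add/append tables. It is plain Python used only for the arithmetic; the names &lt;code>ROWS&lt;/code> and &lt;code>lh_over_hhds&lt;/code> are hypothetical helpers for this illustration and are not part of HHDS or the benchmark suite.&lt;/p>

```python
# Timings copied from the add/append tables above, in nanoseconds.
# Each entry: (benchmark, optimizations enabled?, HHDS time, LH time)
ROWS = [
    ("deep_tree_1000000", False, 877646208, 1659725904),
    ("deep_tree_1000000", True, 116810846, 188815892),
    ("wide_tree_10000000", False, 3527352810, 8774024963),
    ("wide_tree_10000000", True, 454508781, 331688046),
    ("chip_typical_tree_8", False, 83438683, 105475664),
    ("chip_typical_tree_8", True, 9873724, 4607773),
]


def lh_over_hhds(hhds_ns, lh_ns):
    """Return LH time divided by HHDS time; above 1.0 means HHDS is faster."""
    return lh_ns / hhds_ns


for name, optimized, hhds_ns, lh_ns in ROWS:
    ratio = lh_over_hhds(hhds_ns, lh_ns)
    mode = "optimized" if optimized else "unoptimized"
    winner = "HHDS" if ratio > 1.0 else "LH"
    print(f"{name:22s} {mode:12s} LH/HHDS = {ratio:.2f} ({winner} faster)")
```

&lt;p>For the 10,000,000-node wide tree, for example, the ratio is about 2.49 without optimizations and about 0.73 with them, which is the reversal described above.&lt;/p>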
&lt;h2 id="iterators-benchmarks">Iterators Benchmarks&lt;/h2>
&lt;h3 id="deep-tree-test-1">Deep Tree test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-4">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 884 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 1356 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 7987 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 11191 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 86991 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 105809 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 894127 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 1076983 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 7927102 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 11177187 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds/iterations:4 80470145 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh/iterations:4 145763040 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds/iterations:3 1055529435 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh/iterations:3 995416880 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-4">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_hhds 202 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10_lh 93.1 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_hhds 1595 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100_lh 1039 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_hhds 15663 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000_lh 11000 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_hhds 164778 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000_lh 107293 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_hhds 1615928 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_100000_lh 1260507 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_hhds 19582402 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_1000000_lh 15954697 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_hhds 214887559 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_deep_tree_10000000_lh 179118729 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="wide-tree-test-1">Wide Tree test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-5">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 7171 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 7098 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 6204 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 10372 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 62762 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 106132 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 622999 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 1124283 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 6118490 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 9550170 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds/iterations:10 59438777 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh/iterations:10 97842431 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds/iterations:7 778347697 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh/iterations:7 1163215808 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-5">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_hhds 2103 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10_lh 1284 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_hhds 1563 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100_lh 632 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_hhds 15627 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000_lh 6410 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_hhds 149588 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000_lh 56030 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_hhds 1511278 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_100000_lh 563926 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_hhds 17056051 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_1000000_lh 7754815 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_hhds 143994848 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_wide_tree_10000000_lh 55040231 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="chip-typical-test">Chip typical test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-6">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">--------------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 344 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 892 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 2192 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 1691 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 13628 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 14235 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 34049 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 84096 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 206482 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 203680 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 848996 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 708212 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds/iterations:5 3645372 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh/iterations:5 6657982 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds/iterations:5 7375050 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh/iterations:5 4577351 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-6">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">-------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">-------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_hhds 93.1 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_1_lh 50.1 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_hhds 149 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_2_lh 212 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_hhds 1166 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_3_lh 554 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_hhds 7385 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_4_lh 3138 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_hhds 54477 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_5_lh 10643 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_hhds 215050 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_6_lh 53043 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_hhds 492555 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_7_lh 577120 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_hhds 2630675 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_tree_8_lh 1278702 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="chip-typical-long-test">Chip typical (long) test&lt;/h3>
&lt;h4 id="disabled-compiler-optimizations-7">Disabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 911 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 1435 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 8161 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 8619 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 76618 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 132467 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 1644808 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 1962406 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 7199648 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 9195894 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds 169002499 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh 207296570 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="enabled-compiler-optimizations-7">Enabled compiler optimizations&lt;/h4>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Benchmark Time
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">------------------------------------------------
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_hhds 223 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_1_lh 101 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_hhds 2270 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_2_lh 719 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_hhds 38291 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_3_lh 12547 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_hhds 294222 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_4_lh 187010 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_hhds 4721230 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_5_lh 835256 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_hhds 30302468 ns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">test_chip_typical_long_tree_6_lh 10057136 ns
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Overall, the add/append and iterator benchmarks show the same pattern. Without compiler optimizations, the HHDS tree is faster than the LH Tree in most cases. With compiler optimizations, the traversal benchmarks show the same reversal as the add/append benchmarks, with the LH Tree faster in most cases. We will now look at some profiling that was done to identify the bottlenecks in the HHDS tree.&lt;/p>
&lt;h2 id="exceptions-and-a-reminder-of-why-they-are-slow">Exceptions, and a reminder of why they are slow.&lt;/h2>
&lt;p>When looking at the performance difference between the HHDS tree and the LH tree (after enabling compiler optimizations), I was shocked to see that the HHDS tree performed worse than the LH tree by multiple orders of magnitude when exceptions were used. This was a surprise to me, as I had not expected exceptions to have such a large impact on performance.&lt;/p>
&lt;p>This happens because throwing an exception is expensive. When an exception is thrown, the stack is unwound and control jumps to the matching catch block. This is a slow process and should be avoided in performance-critical code. Moreover, the compiler cannot optimize code that uses exceptions as well as code that does not. This is why the HHDS tree performs so much worse than the LH tree when exceptions are enabled. Even after accounting for this, the HHDS tree still wasn&amp;rsquo;t performing as well as it should have been.&lt;/p>
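&lt;p>To illustrate the trade-off, here is a minimal sketch of one common alternative to throwing on a hot path: returning a sentinel value from a &lt;code>noexcept&lt;/code> function. The names &lt;code>Pos&lt;/code>, &lt;code>Index&lt;/code>, &lt;code>INVALID&lt;/code>, &lt;code>Out_of_range&lt;/code>, &lt;code>parent_or_throw&lt;/code>, &lt;code>parent_or_invalid&lt;/code> and &lt;code>count_failed_lookups&lt;/code> are all hypothetical and are not taken from the HHDS code base; this is not necessarily how HHDS handles invalid positions.&lt;/p>

```cpp
// Hypothetical stand-ins for illustration; these are not the HHDS types or API.
using Pos   = long long;
using Index = unsigned long long;
constexpr Pos INVALID = -1;

struct Out_of_range {};  // minimal exception type for the throwing variant

// Variant 1: reports a bad index by throwing.
// Every failed lookup pays for stack unwinding.
Pos parent_or_throw(const Pos* parents, Index size, Index idx) {
  if (idx >= size) throw Out_of_range{};
  return parents[idx];
}

// Variant 2: reports a bad index with a sentinel value.
// noexcept promises the compiler that no exception leaves this function.
Pos parent_or_invalid(const Pos* parents, Index size, Index idx) noexcept {
  return idx >= size ? INVALID : parents[idx];
}

// Probes indices 0 .. probes-1 and counts failed lookups without any try/catch.
Index count_failed_lookups(const Pos* parents, Index size, Index probes) noexcept {
  Index failed = 0;
  for (Index idx = 0; idx != probes; ++idx) {
    if (parent_or_invalid(parents, size, idx) == INVALID) ++failed;
  }
  return failed;
}
```

&lt;p>In this sketch the caller checks the return value against &lt;code>INVALID&lt;/code> instead of wrapping each lookup in a try/catch block, so a failed lookup costs a comparison rather than a stack unwind. The sentinel only works when &lt;code>INVALID&lt;/code> can never be a legitimate stored value.&lt;/p>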
&lt;h1 id="profiling">Profiling&lt;/h1>
&lt;p>I used &lt;code>callgrind&lt;/code> to profile the HHDS tree and identify potential bottlenecks. The profiling results provided valuable insights into the tree&amp;rsquo;s performance and areas for optimization. I generated a call graph using &lt;code>KCachegrind&lt;/code> and analyzed the function calls to determine the most time-consuming operations.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Profiling results" srcset="
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_06163f8afdc871f89387a8c1724d9e28.webp 400w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_731e96b9cf72b9d02381dec918d2530f.webp 760w,
/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre24/ucsc/livehd/20240824-ujjwalshekhar/fig3_hubc3fa7f2ca383621c0ea38621e28abe1_254926_06163f8afdc871f89387a8c1724d9e28.webp"
width="760"
height="683"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The call graph clearly shows that the bottleneck is the &lt;code>_create_space&lt;/code> call that is tasked with creating space for a new node. This function is called when a new node is added to the tree, and its performance directly impacts the tree&amp;rsquo;s efficiency.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">inline Tree_pos _create_space(const X&amp;amp; data) {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Make space for CHUNK_SIZE number of entries at the end
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> data_stack.emplace_back(data);
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> for (int i = 0; i &amp;lt; CHUNK_MASK; i++) {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> data_stack.emplace_back();
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Add the single pointer node for all CHUNK_SIZE entries
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> pointers_stack.emplace_back();
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> return pointers_stack.size() - 1;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, the &lt;code>_create_space&lt;/code> function is relatively simple and should not be causing such a significant performance hit. This indicates that the issue may lie in the memory allocation process or the data structure itself. One possible way of dealing with this would be to increase chunk sizes, or enable dynamic chunk sizing, which would allow for more efficient memory allocation.&lt;/p>
&lt;p>Another possible bottleneck seems to be the computation needed to find the next vacant space in the chunk (as in &lt;code>get_last_child()&lt;/code>). Because the chunk has a fixed size, when it is full the program has to search for the next chunk that has space. This is a linear operation and can be slow for wide trees. To fix this, I tried adding extra bookkeeping to the &lt;code>Tree_pointers&lt;/code> node structure:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">class __attribute__((packed, aligned(64))) Tree_pointers {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">private:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // We only store the exact ID of parent
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos parent : CHUNK_BITS + CHUNK_SHIFT;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos next_sibling : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos prev_sibling : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Long child pointers
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos first_child_l : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Tree_pos last_child_l : CHUNK_BITS;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Storing the last occupied index in the short delta
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // This is to avoid iterating over all short deltas
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // to find the last occupied index
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> unsigned short last_occupied : CHUNK_SHIFT;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> // Short (delta) child pointers
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Short_delta first_child_s_0 : SHORT_DELTA;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> Short_delta first_child_s_1 : SHORT_DELTA;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, the improvement in performance was marginal after making this change. This indicates that the issue may be more complex and require further investigation. This tree has also been added to the repository, in case a future contributor might be able to make use of it.&lt;/p>
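&lt;p>The bookkeeping idea can be shown in isolation with a simplified sketch. It is not the real &lt;code>Tree_pointers&lt;/code> layout: the struct &lt;code>Chunk_children&lt;/code>, the constant &lt;code>kSlots&lt;/code> (assumed to be 7 here) and all member names are hypothetical, and plain integers are used instead of packed bit-fields. A delta of 0 is assumed to mean that a slot is empty.&lt;/p>

```cpp
// Simplified, hypothetical model of one chunk's short-delta slots.
// It does not reproduce the packed bit-field layout used in HHDS.
constexpr int kSlots = 7;  // assumed number of short-delta slots

struct Chunk_children {
  short deltas[kSlots] = {};  // 0 marks an empty slot
  int last_occupied = -1;     // extra bookkeeping: index of the last used slot

  // Linear scan from the back: O(kSlots) work on every query.
  int last_by_scan() const {
    for (int i = kSlots - 1; i >= 0; --i) {
      if (deltas[i] != 0) return i;
    }
    return -1;
  }

  // With bookkeeping the answer is a single field read.
  int last_by_bookkeeping() const { return last_occupied; }

  // Appends a non-zero delta; returns false when the chunk is already full.
  bool append(short delta) {
    if (last_occupied == kSlots - 1) return false;
    deltas[++last_occupied] = delta;
    return true;
  }
};
```

&lt;p>Both queries return the same index as long as slots are filled front to back with non-zero deltas. With only a handful of slots per chunk, the scan that the extra field replaces is already short, which may be one reason a change like this yields only a small gain.&lt;/p>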
&lt;p>Another possible bottleneck is storing the short deltas as separate fields instead of reducing the size of each delta and packing them all into a single large integer type. I plan to implement this idea in the future.&lt;/p>
&lt;h1 id="code-contributions">Code contributions&lt;/h1>
&lt;p>All of my pull requests and code changes were made on the &lt;a href="https://github.com/masc-ucsc/hhds/graphs/contributors" target="_blank" rel="noopener">HHDS repository&lt;/a>. Each contribution has undergone thorough review and been successfully merged into the main repository:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/32" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/32&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/37" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/37&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/38" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/38&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/41" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/41&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/47" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/47&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/48" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/48&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/masc-ucsc/hhds/pull/54" target="_blank" rel="noopener">https://github.com/masc-ucsc/hhds/pull/54&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Additionally, we are planning to integrate these changes into the LiveHD repository in the near future.&lt;/p>
&lt;h1 id="conclusion-and-future-work">Conclusion and Future Work&lt;/h1>
&lt;p>Working on this project has been a valuable learning experience, particularly in applying core C++ features. I discovered that simple, fundamentally sound optimizations often outperform more complex ones. The greatest challenge for me was steering through the changes to our original Plan of Action. Thanks to the support and guidance of my mentors, I was able to manage it.&lt;/p>
&lt;p>There are still areas where the HHDS tree can be improved to make it more robust. One area of future exploration is dynamic chunk sizing:&lt;/p>
&lt;blockquote>
&lt;p>Dynamic Chunk Sizing: Instead of using fixed 8-sized chunks as we did, we could implement multiple chunk sizes. This would allow users to &amp;ldquo;hint&amp;rdquo; the HHDS tree to use specific chunk types, potentially reducing memory consumption further.&lt;/p>
&lt;/blockquote>
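&lt;p>As a rough sketch of what such a hint could look like, the snippet below computes how many slots are left unused for a given number of children under different chunk sizes. Everything in it is hypothetical: &lt;code>Chunk_hint&lt;/code>, &lt;code>slots_per_chunk&lt;/code>, &lt;code>chunks_needed&lt;/code> and &lt;code>wasted_slots&lt;/code> do not exist in HHDS, and the sizes 4 and 16 are only illustrative companions to the fixed size of 8 mentioned above.&lt;/p>

```cpp
// Hypothetical API sketch; none of these names exist in HHDS.
enum class Chunk_hint { Narrow, Standard, Wide };

// Slots per chunk for each hint. 8 matches the fixed chunk size described
// in the post; 4 and 16 are illustrative values only.
constexpr int slots_per_chunk(Chunk_hint hint) {
  return hint == Chunk_hint::Narrow ? 4 : hint == Chunk_hint::Wide ? 16 : 8;
}

// Number of chunks needed to hold `children` siblings (ceiling division).
constexpr int chunks_needed(int children, Chunk_hint hint) {
  return (children + slots_per_chunk(hint) - 1) / slots_per_chunk(hint);
}

// Slots that are allocated but left empty.
constexpr int wasted_slots(int children, Chunk_hint hint) {
  return chunks_needed(children, hint) * slots_per_chunk(hint) - children;
}
```

&lt;p>Under these assumptions, a node with two children leaves 2 slots unused with &lt;code>Chunk_hint::Narrow&lt;/code> and 6 with &lt;code>Chunk_hint::Standard&lt;/code>, while a node with 16 children fits in a single &lt;code>Chunk_hint::Wide&lt;/code> chunk instead of two standard ones.&lt;/p>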
&lt;p>Overall, the HHDS tree has shown promise in handling deep tree structures efficiently. With further optimization and enhancements, it can become a powerful tool for handling complex tree operations.&lt;/p>
&lt;h1 id="acknowledgements">Acknowledgements&lt;/h1>
&lt;p>I would like to thank my mentors, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/jose-renau/">Jose Renau&lt;/a> and &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/sakshi-garg/">Sakshi Garg&lt;/a> for their guidance and support throughout the project. It would not have been possible without their help. Their insights and mentorship have significantly contributed to my learning and the success of this work.&lt;/p></description></item></channel></rss>