<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Manank Patel | UCSC OSPO</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/manank-patel/</link><atom:link href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/manank-patel/index.xml" rel="self" type="application/rss+xml"/><description>Manank Patel</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/manank-patel/avatar_hu2d9e0dcae77518c9aee7e231d85bf8c2_721322_270x270_fill_q75_lanczos_center.jpg</url><title>Manank Patel</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/author/manank-patel/</link></image><item><title>KV store final Blog</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230825-manank/</link><pubDate>Fri, 25 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230825-manank/</guid><description>&lt;p>Hello again!
Before we get started, take a look at my previous blogs, &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank">Introduction&lt;/a> and
&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank">Mid Term&lt;/a>. The goal of the project was to implement an io_uring-based backend driver for the client side, which at
that time used traditional sockets. The objective was to improve performance through the zero-copy capabilities of io_uring. In the process, I learned a lot
about &lt;a href="https://gitlab.com/kinetic-storage/libkinetic/-/tree/develop" target="_blank" rel="noopener">libkinetic&lt;/a> and KV stores in general.&lt;/p>
&lt;p>I started by writing a separate io_uring driver in libkinetic/src/ktli_uring.c, most of which mirrors the sockets backend in ktli_socket.c. The only
difference is in the send and receive functions. For a more detailed description of the implementation, refer to the mid-term blog.&lt;/p>
&lt;p>After the implementation, it was time to put it to the test. We ran extensive benchmarks with &lt;a href="https://fio.readthedocs.io/en/latest/fio_doc.html" target="_blank" rel="noopener">fio&lt;/a>, a tool
generally used to benchmark filesystems and other IO paths. Thanks to Philip, who had already written an IO engine for testing the kinetic KV store (&lt;a href="https://github.com/pkufeldt/fio" target="_blank" rel="noopener">link&lt;/a>), I had little trouble setting up the testbench. Philip also set up an Ubuntu server running the kinetic server
and gave me access over SSH. We ran extensive tests on that server, with both the socket and uring backends, across several different block sizes. The benchmark spreadsheet can be found &lt;a href="https://docs.google.com/spreadsheets/d/1HE7-KbxSqYZ3vmTZiJYoq21P7zfymU7N/edit?usp=sharing&amp;amp;ouid=116274960434137108384&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">here&lt;/a>.&lt;/p>
&lt;p>We spent a lot of time reading and discussing the numbers, which was probably the most time-consuming part of the project. We had several long discussions analyzing the results
and their implications. For example, in the initial tests we saw a very high standard deviation in mean send times; it turned out to be a network
bottleneck, since we were using large block sizes and quickly saturating the 2.5G network bandwidth.&lt;/p>
&lt;p>In conclusion, we found that several other major factors affect the performance of the KV store, such as the network and the server side of the KV
store. So although io_uring offers a performance benefit at the userspace-kernel boundary, in this case other factors had a more significant effect than the
kernel IO stack on the client side. To increase performance further, we need to look at the server side.&lt;/p>
&lt;p>I would like to thank Philip and Aldrin for their unwavering support and the in-depth discussions in our weekly meetings; I learned a lot from them
throughout the entire duration of the project.&lt;/p></description></item><item><title>Implemented IO uring for Key-Value Drives</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/</link><pubDate>Mon, 31 Jul 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/</guid><description>&lt;p>Hi everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Manank Patel, (&lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank">link&lt;/a> to my Introduction post) and am currently working on &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore">Efficient Communication with Key/Value Storage Devices&lt;/a>. The goal of the project was to leverage the capabilities of io_uring and implement a new backend driver.&lt;/p>
&lt;p>In the existing sockets backend, we use non-blocking sockets and loop until all the data is written; a simplified flow diagram is shown
below. The reasoning behind using non-blocking sockets and TCP_NODELAY is to get proper network utilization. This snippet from the code explains it further:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">NODELAY means that segments are always sent as soon as possible,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">even if there is only a small amount of data. When not set,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">data is buffered until there is a sufficient amount to send out,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">thereby avoiding the frequent sending of small packets, which
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">results in poor utilization of the network. This option is
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">overridden by TCP_CORK; however, setting this option forces
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">an explicit flush of pending output, even if TCP_CORK is
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">currently set.
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Sockets flow" srcset="
/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_fe3f3d8030752b92e5fb87ea1d67e0c2.webp 400w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_44c789c0dc2dbae770c40595d35ae941.webp 760w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/ktli_socket_huf9f86d17a6f220de349bb1b61ce1052f_93743_fe3f3d8030752b92e5fb87ea1d67e0c2.webp"
width="469"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>In the above figure, we have a &lt;a href="https://gitlab.com/kinetic-storage/libkinetic/-/blob/manank/src/ktli_socket.c?ref_type=heads#L436" target="_blank" rel="noopener">loop&lt;/a> around a writev call. We check the return value: if not all the data has been written, we adjust the
offsets and loop again; once everything has been written, we exit the loop and return from the function. This works well with traditional sockets, since writev hands us its return value as soon as it returns. If we try to follow the same design with io_uring, we get the
following flow diagram.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="uring flow" srcset="
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_eaf262f65651ce613bf0a033f897afde.webp 400w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_bc898fc227145dff9464f87e8f66363f.webp 760w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_nonb_huf47400b8be9e2650586ffc8c37d95fc6_108831_eaf262f65651ce613bf0a033f897afde.webp"
width="417"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Here, as you can see, there is considerable additional overhead if we want to check the return value before issuing the
next writev: we need to know how many bytes have been written so far in order to adjust the offsets and issue
the next request accordingly. Thus, in every iteration of the loop we need to get an SQE, prep it for writev,
submit it, and then wait for the corresponding CQE to obtain the return value of the writev call.&lt;/p>
&lt;p>The alternative approach is to write the full message/iovec in a single call, as shown in the following diagram.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="possible uring flow" srcset="
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_df20a0788e55e56bf7af70d91c7275c6.webp 400w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_056949985d6ef71540ba0c4992f11376.webp 760w,
/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230730-manank/ktli_uring_ideal_hu2d99f0bee974127b66eb083c255358d0_60614_df20a0788e55e56bf7af70d91c7275c6.webp"
width="535"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>However, when we tried this method and ran the fio tests, we noticed that it worked well with smaller block sizes such as
16k, 32k and 64k, but failed consistently with larger block sizes such as 512k or 1m, because it was not able to
write all the data to the socket in one go. For small block sizes, this method compared favorably with the sockets
backend. We tried increasing the send/recv buffers to 1 MiB and up to 10 MiB, but it still struggled with larger block sizes.&lt;/p>
&lt;p>Going forward, we discussed a few ideas to understand the performance trade-offs. One is to use a static counter incremented on
every loop iteration, so we can find out whether the loop overhead really is the contributing factor to our problem. Another idea
is to break the message into smaller chunks, say 256k, set up io_uring with SQ polling, and then link and submit
those requests in a loop without calling io_uring_submit and waiting for a CQE each time. The plan is to try these ideas, discuss, and
come up with new ideas on how we can leverage io_uring for ktli backend.&lt;/p></description></item><item><title>Efficient Communication with Key/Value Storage Devices</title><link>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank/</link><pubDate>Fri, 26 May 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-1007--ucsc-ospo.netlify.app/report/osre23/ucsc/kvstore/20230526-manank/</guid><description>&lt;p>Hi everyone!&lt;/p>
&lt;p>I&amp;rsquo;m Manank Patel, currently an undergraduate student at Birla Institute of Technology and Science - Pilani, KK Birla Goa Campus. As part of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/project/osre23/ucsc/kvstore">Efficient Communication with Key/Value Storage Devices&lt;/a>, my &lt;a href="https://drive.google.com/file/d/1iJIlHuCpnvDeOyr5DphDDimqdl9s4hKH/view?usp=sharing" target="_blank" rel="noopener">proposal&lt;/a>, under the mentorship of &lt;a href="https://deploy-preview-1007--ucsc-ospo.netlify.app/author/aldrin-montana/">Aldrin Montana&lt;/a> and &lt;strong>Philip Kufeldt&lt;/strong>, aims to implement an io_uring-based communication backend for a network-based key-value store.&lt;/p>
&lt;p>io_uring is a new kernel interface that can improve performance by avoiding system-call overhead and by enabling zero-copy network transmission. The KV store clients currently use traditional network sockets and the POSIX API to communicate with the KV store. io_uring, introduced in the past two years, can be used in place of the POSIX API: it employs shared memory queues to facilitate communication between the kernel and user space, enabling data transfer without system calls and promoting zero-copy transfer of data. By circumventing the overhead associated with system calls, this approach has the potential to enhance performance significantly.&lt;/p></description></item></channel></rss>