<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Leetcode on Dear Fortuna</title><link>http://dearfortuna.blog/tags/leetcode/</link><description>Recent content in Leetcode on Dear Fortuna</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><copyright>© 2026 Vasyl Bodnar | Text: CC BY 4.0 | Code: BSD-3</copyright><lastBuildDate>Sun, 18 Jan 2026 16:00:00 -0400</lastBuildDate><atom:link href="http://dearfortuna.blog/tags/leetcode/index.xml" rel="self" type="application/rss+xml"/><item><title>Finding out My Hashtable is Awful</title><link>http://dearfortuna.blog/posts/finding-out-my-hashtable-is-awful/</link><pubDate>Sun, 18 Jan 2026 16:00:00 -0400</pubDate><guid>http://dearfortuna.blog/posts/finding-out-my-hashtable-is-awful/</guid><description>&lt;h2 id="intro">Intro&lt;/h2>
&lt;p>I once found myself bored, though, not quite the useful kind of boredom.
I did not want to do my projects or something nice.
At the same time, I did not want to just spend it watching youtube or similar.
Thus, I thought, might as well do some leetcode.&lt;/p>
&lt;p>I am not particularly fond of leetcode generally.
Some algorithms are nice, most are not, and I rarely learn as opposed to &amp;ldquo;memorize&amp;rdquo; patterns.
Doing union-find on leetcode rarely feels as nice as using it for constant-folding optimizations in a compiler.
Still, leetcode is necessary for a lot of interviews (for now, given AI and grindflation).&lt;/p></description><content:encoded><![CDATA[<h2 id="intro">Intro</h2>
<p>I once found myself bored, though, not quite the useful kind of boredom.
I did not want to do my projects or something nice.
At the same time, I did not want to just spend it watching youtube or similar.
Thus, I thought, might as well do some leetcode.</p>
<p>I am not particularly fond of leetcode generally.
Some algorithms are nice, most are not, and I rarely learn as opposed to &ldquo;memorize&rdquo; patterns.
Doing union-find on leetcode rarely feels as nice as using it for constant-folding optimizations in a compiler.
Still, leetcode is necessary for a lot of interviews (for now, given AI and grindflation).</p>
<p>I went through a couple of algorithms, doing all of them in C to provide a modicum of joy.
Well, as close as you can get to joy given the extra annoyances.
E.g. the leetcode C compiler setup fails on signed integer overflow.
Reproducing that locally requires <code>-fsanitize=signed-integer-overflow</code> on my gcc setup.
This setting has its uses, but not when I just wanted to do a quick fnv1a.</p>
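<p>For illustration, a minimal sketch of a 32-bit FNV-1a that stays clear of that sanitizer by doing all arithmetic on unsigned integers. The name <code>fnv1a_32</code> is an assumption made for this example, not the code from the post.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a byte buffer.
 * All arithmetic is done on uint32_t, so the multiplication wraps
 * modulo 2^32 by definition and the signed-overflow check stays quiet. */
static uint32_t fnv1a_32(const void *data, size_t len) {
    const unsigned char *bytes = data;
    uint32_t hash = 2166136261u;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= bytes[i];             /* xor the byte in first ... */
        hash *= 16777619u;            /* ... then multiply by the FNV prime */
    }
    return hash;
}
```

<p>Hashing an <code>int</code> key would then be something like <code>fnv1a_32(&amp;key, sizeof key)</code>.</p>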
<p>Anyway, some problems went well, usually those I knew or those that are obvious to me.
Some did not, and I had to give up and look up the solution and try to understand it.
There were a few that relied on stuff that would take me a while to implement in C too.</p>
<p>One problem I had was <a href="https://leetcode.com/problems/contains-duplicate-ii/description/">&ldquo;219. Contains Duplicate II&rdquo;</a>.
I started with a simple naive double loop, essentially doing sliding window, but it left me wanting.
I was nearing the end of my leetcode energy, so I decided to look up the solution.
Hashtable, obviously. Very simple too, just a get and a put in a loop.</p>
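<p>For reference, the naive double loop could look roughly like the sketch below. It is a reconstruction under assumptions, not the original submission, and the name <code>contains_nearby_duplicate_naive</code> is made up for this example.</p>

```c
#include <stdbool.h>

/* Naive O(n * k) check: for each index i, scan at most the next k elements. */
bool contains_nearby_duplicate_naive(const int *nums, int numsSize, int k) {
    for (int i = 0; i < numsSize; i++) {
        /* j - i <= k keeps the pair inside the window */
        for (int j = i + 1; j < numsSize && j - i <= k; j++) {
            if (nums[i] == nums[j]) {
                return true;
            }
        }
    }
    return false;
}
```

<p>With a large <code>k</code> this degrades towards a full quadratic scan, which is what the hashtable avoids.</p>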
<p>C does not have a hashtable natively.
Leetcode apparently provides <a href="https://support.leetcode.com/hc/en-us/articles/360011833974-What-are-the-environments-for-the-programming-languages">uthash</a>,
though I had never seen it in the wild (it becomes obvious why when you see the ergonomics).
There is also the libc <a href="https://linux.die.net/man/3/hsearch">hash table</a>, but that is just one of POSIX&rsquo;s April Fools jokes.</p>
<p>Anyway, I implemented one recently while testing out my C build system (Guile script, nothing too fancy, though it does cache).
It even has SSE2 SIMD and 64bit SWAR (SIMD Within A Register).
So I thought it would be good practice to do another one, and it very much was in hindsight.</p>
<h2 id="spec-matters">Spec matters</h2>
<p><code>got</code>, good-old-table as I called it, based itself on Abseil SwissTable.
Except I tried to avoid just reimplementing someone&rsquo;s solution. I only read an overview and skimped on the details.
It sounded simple enough.</p>
<p>There was a point though where I wondered why Abseil seemed to use tombstones.
But I did not overthink it and figured I would learn why eventually.</p>
<p>I also thought about doing benchmarks to compare to at least C++ STL <code>std::unordered_map</code>.
But, that could have been too much work for a not too serious hashtable.
Especially since it was more useful to compare to more than one library.</p>
<p>Thus, I decided to do something similar for my solution.
But doing SWAR, and especially with <code>int</code>s as opposed to 64 bit values seemed like a waste of time.
A speedup, sure, but not that serious.
As such, I took to just doing linear probing.
It should be good enough, that&rsquo;s the base for a SwissTable anyway.</p>
<h2 id="absolute-failure">Absolute failure</h2>
<p>So straight from memory I implemented a simple linear probing hashtable.</p>
<p>Here is its struct:</p>





<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">struct</span> HashMap {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">int</span> capacity;
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">int</span> length;
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">struct</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">int</span> valid, key, val;
</span></span><span style="display:flex;"><span>    } arr[];
</span></span><span style="display:flex;"><span>};</span></span></code></pre></div><p>Nothing too involved besides the flexible array member (the standardized form of the zero length array trick). That is just used to put the header and the entries into a single allocation.</p>
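<p>A minimal sketch of how such a struct can be allocated in one go. The post only shows the call <code>create_ht(64)</code>, so the body below is an assumption about what a constructor like that might do, not the actual implementation.</p>

```c
#include <stdlib.h>

struct HashMap {
    int capacity;
    int length;
    struct {
        int valid, key, val;
    } arr[];                 /* trailing array with no size, must be last */
};

/* Hypothetical constructor: header and slots come from one calloc call.
 * calloc zeroes the memory, so every slot starts with valid == 0. */
struct HashMap *create_ht(int capacity) {
    if (capacity <= 0) {
        return NULL;
    }
    struct HashMap *map =
        calloc(1, sizeof *map + (size_t)capacity * sizeof map->arr[0]);
    if (map) {
        map->capacity = capacity;   /* length is already 0 from calloc */
    }
    return map;
}
```

<p>A single <code>free(map)</code> releases the header and the slots together.</p>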
<p>I originally did not include <code>length</code> for funny reasons, but it is necessary.
Well, you can optimize it out given that leetcode tests will never get that bad, but let&rsquo;s not get into that.</p>
<p>So after finishing up and cleaning up any immediate compile-time errors, I ran the solution on basic tests.
Success, and given how simple the solution really is, that was to be expected.</p>
<p>Here is how the &ldquo;solution function&rdquo; looks:</p>





<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">bool</span> <span style="color:#a6e22e">containsNearbyDuplicate</span>(<span style="color:#66d9ef">int</span> <span style="color:#f92672">*</span>nums, <span style="color:#66d9ef">int</span> numsSize, <span style="color:#66d9ef">int</span> k) {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">struct</span> HashMap <span style="color:#f92672">*</span>map <span style="color:#f92672">=</span> <span style="color:#a6e22e">create_ht</span>(<span style="color:#ae81ff">64</span>);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> (<span style="color:#66d9ef">int</span> i <span style="color:#f92672">=</span> <span style="color:#ae81ff">0</span>; i <span style="color:#f92672">&lt;</span> numsSize; i<span style="color:#f92672">++</span>) {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">int</span> <span style="color:#f92672">*</span>p <span style="color:#f92672">=</span> <span style="color:#a6e22e">get_ht</span>(map, nums[i]);
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> (p <span style="color:#f92672">&amp;&amp;</span> i <span style="color:#f92672">-</span> <span style="color:#f92672">*</span>p <span style="color:#f92672">&lt;=</span> k) {
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> true;
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>        <span style="color:#a6e22e">put_ht</span>(<span style="color:#f92672">&amp;</span>map, nums[i], i);
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> false;
</span></span><span style="display:flex;"><span>}</span></span></code></pre></div><p>Freeing wastes cycles. Anyway,
satisfied, I hit the submit button&hellip; Timed out.</p>
<p>That was weird. Even if my table is not particularly fast, that was orders of magnitude too slow.
This was on a testcase of only 54500 <code>int</code>s, so it should not take that long.</p>
<p>I tried a couple of optimizations. E.g. the <code>valid</code> field is not necessary since a key never goes beyond ±2^30, which leaves spare <code>int</code> values to mark an empty slot.
I even tried to just increase the preallocated memory to see if that could improve runtime, even if just for this one.</p>
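<p>One way to drop the flag is a sentinel key, sketched below. The choice of <code>INT_MIN</code> and the names <code>EMPTY_KEY</code>, <code>struct Slot</code>, <code>alloc_slots</code> and <code>slot_is_empty</code> are assumptions made for illustration. The post does not say which value marked empty slots.</p>

```c
#include <limits.h>
#include <stdlib.h>

/* Assumption: INT_MIN never appears as a real key, which holds as long
 * as keys stay within +/- 2^30. */
#define EMPTY_KEY INT_MIN

struct Slot {
    int key, val;   /* two ints per slot instead of three with a valid flag */
};

/* Hypothetical helper: allocate n slots and mark all of them empty. */
struct Slot *alloc_slots(size_t n) {
    struct Slot *slots = malloc(n * sizeof *slots);
    if (!slots) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        slots[i].key = EMPTY_KEY;
    }
    return slots;
}

/* A slot is occupied exactly when its key is not the sentinel. */
static int slot_is_empty(const struct Slot *s) {
    return s->key == EMPTY_KEY;
}
```

<p>The tradeoff is that the table can no longer store the sentinel itself, which is fine here only because the problem constraints rule it out.</p>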
<p>All of them were bandaids at best. This was a fundamental issue.</p>
<h2 id="evil-assumptions">Evil assumptions</h2>
<p>So what was at the core of my <code>put</code> and <code>get</code> that made this many times slower than it should be?</p>
<p>Well I had a simple assumption. The valid entry could be anywhere after the initial index from hash.
If you do not see the problem immediately, think about it.
What made it worse was that I do not need to delete anything.</p>
<p>With no deletions, every entry sits in the first free slot after its initial index, with no gaps before it.
So hitting an empty slot already means the key is not there, yet I was forcibly and completely needlessly going through the rest of the hash table.
For every <code>get</code> that missed and most calls to <code>put</code>.
Only &ldquo;most&rdquo; for <code>put</code>, because I had this useful thing called early return.</p>
<p>Ironic that the problem itself also has an early return.</p>
<p>The fix to <code>get</code> was adding an <code>else return 0;</code>. Immediate improvement.</p>
<p>Here it is so you can see the error of my ways:</p>





<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">for</span> (<span style="color:#66d9ef">int</span> i <span style="color:#f92672">=</span> init; i <span style="color:#f92672">&lt;</span> map<span style="color:#f92672">-&gt;</span>capacity; i<span style="color:#f92672">++</span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> (map<span style="color:#f92672">-&gt;</span>arr[i].valid) {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> (map<span style="color:#f92672">-&gt;</span>arr[i].key <span style="color:#f92672">==</span> key) {
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">return</span> <span style="color:#f92672">&amp;</span>map<span style="color:#f92672">-&gt;</span>arr[i].val;
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  } <span style="color:#66d9ef">else</span> { <span style="color:#75715e">// just this part
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">0</span>;
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}</span></span></code></pre></div><p>This is also a case of a small optimization hiding a better one.
The original code used <code>map-&gt;arr[i].valid &amp;&amp; map-&gt;arr[i].key == key</code>.
This couples them when they really should have been separate.</p>
<p>The fix to <code>put</code> was slightly longer, but essentially the same.</p>
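<p>A sketch of what the same idea looks like on the insert side, under assumptions: the function name <code>put_from</code>, its return convention and the lack of resizing are all made up for this example, and the real <code>put_ht</code> is not shown in the post.</p>

```c
#include <stdlib.h>   /* calloc and free, for the caller that owns the map */

struct HashMap {
    int capacity;
    int length;
    struct {
        int valid, key, val;
    } arr[];
};

/* Hypothetical insert-or-update that starts probing at slot init.
 * Returns 1 on success, 0 if the probe ran off the end of the array,
 * which is where a caller would have to grow the table and retry. */
int put_from(struct HashMap *map, int init, int key, int val) {
    for (int i = init; i < map->capacity; i++) {
        if (!map->arr[i].valid) {
            /* First empty slot: the key cannot be stored any further
             * along, so claim this slot instead of scanning on. */
            map->arr[i].valid = 1;
            map->arr[i].key = key;
            map->arr[i].val = val;
            map->length++;
            return 1;
        }
        if (map->arr[i].key == key) {
            map->arr[i].val = val;  /* existing key: overwrite the value */
            return 1;
        }
    }
    return 0;
}
```

<p>As with the <code>get</code> fix, stopping at the first empty slot is only correct because nothing is ever deleted from the table.</p>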
<p>I only realized this after being very annoyed by it and testing it locally.</p>
<p>Thus, the problem was submitted and solved in reasonable time.</p>
<h2 id="unreasonable-time">Unreasonable time</h2>
<p>Ok, no. The time was ~800ms.
This counts as &ldquo;solved&rdquo;, but is realistically extremely slow. This is linear array search in a loop speed, if not worse.
The other solutions that leetcode presented were 100ms at worst. I was bottom 0.20%.</p>
<p>This prompted me to do some extra testing. Nothing too precise or involved, but enough to see the problems.
You can find the code for it <a href="https://github.com/Vasyl-Bodnar/contains-duplicate2-shenanigans">here</a>.
I will include the (incomplete) table here anyway to avoid spoilers:</p>
<table>
  <thead>
      <tr>
          <th>Name</th>
          <th>Time</th>
          <th>Time (-O2)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>fine.c</td>
          <td>5 ms</td>
          <td>4 ms</td>
      </tr>
      <tr>
          <td>fine.cpp</td>
          <td>26 ms</td>
          <td>6 ms</td>
      </tr>
      <tr>
          <td>with_got.c</td>
          <td>2227 ms</td>
          <td>691 ms</td>
      </tr>
      <tr>
          <td>awful.c</td>
          <td>8292 ms</td>
          <td>1893 ms</td>
      </tr>
  </tbody>
</table>
<p><code>fine.c</code> is the good solution, <code>awful.c</code> is the original bad solution.
There is also <code>fine.cpp</code> which uses <code>unordered_map</code> and <code>with_got.c</code> which uses my <code>got</code> library.</p>
<p>The only test case I used was the one I timed out on.
Thus, this table is a little useless for comparing my table and <code>unordered_map</code> in my opinion.
But it does show you the magnitudes of difference from the bad ones.</p>
<p>Still, it is interesting to see that my <code>fine.c</code> is better or comparable to <code>unordered_map</code>.
Yet on leetcode the C++ solution finishes in ~80ms, and mine does not.</p>
<p>I then decided to apply the <code>valid</code> field removal I talked about.
I was able to get the time down to ~350ms, which is still far from ideal, but more manageable.</p>
<p>I also tried the <code>uthash</code> that leetcode provides.
That one gave me ~90ms.
The ergonomics and documentation were questionable. Lots of macros, which are not friendly to leetcode.
You have to create your own entry and even include a magic field for a hash handle.
The primary purpose of which seemed to be iteration, but could be more.
Well now that is a little sad.</p>
<p>I then went on to do some stronger optimizations.
I started using static preallocated memory.
Got rid of <code>length</code> for good with that.
Tested different preallocation sizes.
I found that allocating 256KB is enough to pass the tests with flying colors.
Any more slowed it down, and so did any less.</p>
<p>Done in 2ms, and 16MB of memory per what leetcode reports. C++ uses ~100MB and uthash uses ~60MB for comparison.
Beats ~98% and ~96% respectively.
Pride restored, technically.</p>
<p>I included this as <code>best.c</code> in that <a href="https://github.com/Vasyl-Bodnar/contains-duplicate2-shenanigans">same repo</a>.</p>
<h2 id="what-happened">What happened</h2>
<p>Going from ~800ms to ~350ms by just removing the valid field is not too surprising.
This simply uses less memory which means we have to seek less if we hit a collision.
Better cache usage and whatnot, possibly compiler optimizations too (leetcode uses -O2).</p>
<p>Going from ~350ms to 2ms is a different question.
But the trick is that by having so much capacity (256KB),
I basically turned <code>get</code> and <code>put</code> into array access operations, not &ldquo;amortized&rdquo;, actual O(1).
At that point you could just create buckets for every used number.</p>
<p>I could and probably should do some <code>perf</code> testing to see whether there is another obvious mistake.
E.g. if anything hashes to the end of the array that would always force a resize, a potentially serious issue.
Still, I am satisfied with getting 2ms for now.</p>
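<p>One common way around the end-of-array problem is to let the probe wrap around to the start. The sketch below assumes a power-of-two capacity, and the names <code>probe_start</code> and <code>probe_next</code> are hypothetical. It is not taken from <code>got</code> or from the repo.</p>

```c
#include <stdint.h>

/* Assumption: capacity is a power of two, so (capacity - 1) is a bit mask. */
static uint32_t probe_start(uint32_t hash, uint32_t capacity) {
    return hash & (capacity - 1);
}

/* Step to the next slot, wrapping from the last slot back to slot 0. */
static uint32_t probe_next(uint32_t index, uint32_t capacity) {
    return (index + 1) & (capacity - 1);
}
```

<p>With wraparound, the probe loop ends at the first empty slot rather than at the end of the array, so the table has to keep at least one slot free.</p>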
<h2 id="what-did-we-learn">What did we learn</h2>
<p>I should go fix my <code>got</code> library.
I call it not too serious, but I cannot allow this level of underperformance.</p>
<p>The main lesson would probably be the importance of assumptions.
If you take in wrong or expensive assumptions, you may suffer.
On the other hand, if you take in correct assumptions, you can benefit a lot from it.
This often involves a tradeoff with generality like what I did for <code>best.c</code>, but other kinds exist too.</p>
<p>Update: <code>got</code> library should now be fixed (for now, before more bad assumptions show up).
I updated the repo by adding the (new) version, it is roughly on par with <code>unordered_map</code> there.
Though, as I said, comparing hash tables based on one testcase and only on sorted integers is not a good idea.</p>
]]></content:encoded></item></channel></rss>