TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Show HN: Generic dynamic array in almost 2k lines of C

1 pointsby 6equj5over 1 year ago
Yet another C vector implementation! For both it and the title, I took inspiration from an implementation posted a few months ago: <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=34973210">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=34973210</a><p>That implementation&#x27;s impressive, over my head, and unsafe, containing overflows and footguns. Its approach, to quote the author, is &quot;all C code is unsafe&quot;. ;p As a Rust fan, I agree, and that&#x27;s in line with my impression of the intrepid spirit of C. But, in Sisyphean defiance, I tried to make an implementation that&#x27;s the boring antithesis of that: long, straightforward, free of macros, and full of overflow checks and even agonizing nanny hand-holding like forcing the caller to check the size of the elements they&#x27;re adding to the vector or `memset()`ing removed elements&#x27; bytes to zero (which totally won&#x27;t get optimized out by the compiler, I&#x27;m sure).<p>While doing it, I learned two C techniques which I ended up using:<p>1)<p>You can have &quot;abstract data types&quot; (almost like classes!) in C, which is how the vector struct&#x27;s exposed.<p>The header just has the following typedef with no struct definition after it:<p><pre><code> typedef struct Vec Vec; </code></pre> <a href="https:&#x2F;&#x2F;github.com&#x2F;qiolol&#x2F;just-c-things&#x2F;blob&#x2F;main&#x2F;Vec&#x2F;Vec.h#L51">https:&#x2F;&#x2F;github.com&#x2F;qiolol&#x2F;just-c-things&#x2F;blob&#x2F;main&#x2F;Vec&#x2F;Vec.h#...</a><p>`struct Vec` is never defined in the header at all -- only in the `.c` file! This way, the internals are encapsulated in the implementation and only accessible&#x2F;modifiable via the functions exposed in the header, like a class&#x27;s private members would be!<p>There are apparently downsides to this, such as not being able to control how vectors are allocated and only being able to include vectors in other structs via such pointers. Someone described this as winding up with &quot;an everything-is-heap-allocated nest of pointers&quot;: <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=35594787">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=35594787</a><p>2)<p>You can use <i>contiguous memory</i> to minimize allocations (and perhaps improve cache performance):<p><pre><code> struct Vec { uint8_t* data; &#x2F;&#x2F; ... }; </code></pre> <a href="https:&#x2F;&#x2F;github.com&#x2F;qiolol&#x2F;just-c-things&#x2F;blob&#x2F;main&#x2F;Vec&#x2F;Vec.c#L12">https:&#x2F;&#x2F;github.com&#x2F;qiolol&#x2F;just-c-things&#x2F;blob&#x2F;main&#x2F;Vec&#x2F;Vec.c#...</a><p>The vector stores everything in `data`, a `uint8_t` block.<p>`uint8_t`s are 8-bit (1-byte) integers, making `data` a block of bytes. This lets you do bytewise addressing with pointer arithmetic (unlike `void*`).<p>A vector&#x27;s elements are generic; all it can do is treat elements as generic bytes, so it would make sense to use `void*`s. But we can&#x27;t do pointer arithmetic with `void` pointers, meaning we can&#x27;t do things like traverse the data block or return pointers to arbitrary places within it. Hence `uint8_t`. `unsigned char` would also work (it&#x27;s also 1 byte); `uint8_t` is just more explicit.<p>SO: you can just allocate a big block of bytes called `data`! Whenever you wanna add an element, instead of doing an (expensive) allocation for that element, you <i>just</i> write the element&#x27;s bytes into the data block! When you wanna remove an element, you zero out its bytes or update your metadata so you remember those bytes are now unoccupied.<p>The thing that looks most useful about it is that all the elements&#x27; memory is &quot;contiguous&quot; (nearby, in one big block), unlike it might be with multiple, per-element allocations. And having to fetch different blocks of memory scattered throughout RAM is slower than loading one chunk of memory from RAM and having all (or many) elements in the CPU cache is faster: <a href="https:&#x2F;&#x2F;youtu.be&#x2F;WDIkqP4JbkE?t=1464" rel="nofollow noreferrer">https:&#x2F;&#x2F;youtu.be&#x2F;WDIkqP4JbkE?t=1464</a><p>FOR ALL THAT, though, it still gets CRUSHED by `std::vector`, which my benchmarking reports as 2 to 20 times faster (depending on operation)! The only time my vector beat `std::vector` was when I had forgotten to employ the erase-remove idiom (<a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Erase%E2%80%93remove_idiom#Example" rel="nofollow noreferrer">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Erase%E2%80%93remove_idiom#Exa...</a>) for `std::vector` (<a href="https:&#x2F;&#x2F;github.com&#x2F;qiolol&#x2F;just-c-things&#x2F;blob&#x2F;main&#x2F;Vec&#x2F;benchmark&#x2F;benchmark.cpp#L216">https:&#x2F;&#x2F;github.com&#x2F;qiolol&#x2F;just-c-things&#x2F;blob&#x2F;main&#x2F;Vec&#x2F;benchm...</a>) despite having implemented it in my vector&#x27;s multi-removal function. Fixing that crushed my brief sense of triumph. Haha. :B

no comments

no comments