An informal comparison of the three major implementations of std::string

127 points by tyoma about 1 year ago

10 comments

gte525u about 1 year ago
Somewhat related: a ridiculous number of platforms/systems ship with a derivative of the Dinkum C++ standard library, MSVC included. When you look up the company behind it, it's basically a small shop that seems to be primarily one guy up in MA.
tialaramex about 1 year ago
The SSO capacity tables seem obviously wrong to me. For clang/libc++ we see 11 or 22 bytes; in each case it's the size of the whole data structure, minus one byte for the bit flag and length value, and another byte for the zero (ASCII NUL) that's obligatory in C++.

But for MSVC and GCC we're told 16 bytes, as each has a 16-byte buffer. However, they still need that obligatory zero byte for ASCII NUL, so surely the table should show 15.

The most popular string in C++, the one which makes SSO almost obligatory as a design feature for the language's string object, is the empty string. On a modern (64-bit) machine Clang can store that inline in a local 24-byte object; MSVC and GCC need 32 bytes. Without SSO it would mean probably 24 bytes *and* a heap allocation. That's hard to swallow.
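One rough way to check the capacity figures discussed above on a given toolchain is to ask the library directly. The sketch below is only a probe, not a statement about how any implementation works internally: `empty_string_capacity` and `stored_inline` are hypothetical helper names, and the inline check assumes (as is the case for the three implementations discussed) that a short string's `data()` points into the string object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: the capacity reported by a default-constructed string.
// With SSO this is the inline capacity, excluding the NUL terminator.
inline std::size_t empty_string_capacity() {
    return std::string().capacity();
}

// Hypothetical helper: true if data() points inside the string object itself,
// i.e. the characters live in the inline buffer rather than on the heap.
// std::less / std::less_equal give a well-defined order for unrelated pointers.
inline bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* p = s.data();
    return std::less_equal<const char*>()(begin, p) &&
           std::less<const char*>()(p, end);
}
```

Printing `sizeof(std::string)` next to `empty_string_capacity()` on each platform is a quick way to see whether a table should read 15 or 16; the results depend on the standard library and ABI in use.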
blt about 1 year ago
This is great. It would be a good C/C++ interview question to compare these. Of course you can't expect a Raymond Chen level performance, but it should give some insight into experience with low level programming.
dmazzoni about 1 year ago
This is so interesting!

I wonder if anyone has ever tried an implementation that prioritizes even more minimal memory usage for small strings?

Of the three implementations discussed, the smallest (on a 64-bit system) is 24 bytes.

How about 8 bytes: the union of a single pointer and a char[8]?

For strings of 6 or fewer characters, it uses the first 6 bytes to store the string, one for the null terminator, and the last byte to store the length, but with a trick (see below).

For strings of 7 or more characters, it uses all 8 to store the address of a larger block of memory storing the capacity, size, and string bytes.

The trick is to use the last byte as a way to know whether the bytes are encoding a short string or a pointer. Since pointers will never be odd, we just store an odd number in that last byte. For example, you could store (size << 1) | 0x1

So if the last bit is 1, it's a short string. The size is bytes[7] >> 1, and the data() pointer is just equal to the string itself.

If the last bit is 0, treat the whole data as a pointer to a structure that encodes the capacity and size and then string bytes as usual.
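The 8-byte idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production design: `TinyString`, `HeapRep`, and `kMaxInline` are hypothetical names, the code assumes 64-bit pointers and heap blocks whose addresses have a zero low bit, and it is deliberately non-copyable and immutable to stay short. One deviation from the comment: the tag `(size << 1) | 1` is kept in the low byte of the integer value rather than in `bytes[7]`, because on a little-endian machine the "pointers are never odd" bit lives in the first byte in memory, not the last.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>
#include <string_view>

// Hypothetical 8-byte string handle: either a tagged inline string (up to
// 6 chars) or a pointer to a heap block holding size, capacity, and chars.
class TinyString {
    static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes 64-bit pointers");

    struct HeapRep {                 // header placed in front of the characters
        std::size_t size;
        std::size_t capacity;
        char* chars() { return reinterpret_cast<char*>(this + 1); }
    };

    std::uintptr_t word_ = 1;        // low bit set: empty inline string

    HeapRep* rep() const { return reinterpret_cast<HeapRep*>(word_); }

public:
    static constexpr std::size_t kMaxInline = 6;

    explicit TinyString(std::string_view s) {
        if (s.size() <= kMaxInline) {
            // Low byte holds (size << 1) | 1; the next bytes hold the chars.
            // The top byte is never written, so it acts as the terminator.
            word_ = (static_cast<std::uintptr_t>(s.size()) << 1) | 1u;
            for (std::size_t i = 0; i < s.size(); ++i) {
                auto c = static_cast<unsigned char>(s[i]);
                word_ |= static_cast<std::uintptr_t>(c) << (8 * (i + 1));
            }
        } else {
            // operator new returns suitably aligned memory, so the low bit
            // of the address is assumed to be 0 (the "pointer" tag).
            void* raw = ::operator new(sizeof(HeapRep) + s.size() + 1);
            HeapRep* r = new (raw) HeapRep{s.size(), s.size()};
            std::memcpy(r->chars(), s.data(), s.size());
            r->chars()[s.size()] = '\0';
            word_ = reinterpret_cast<std::uintptr_t>(r);
        }
    }

    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    ~TinyString() {
        if (!is_small()) ::operator delete(rep());
    }

    bool is_small() const { return (word_ & 1u) != 0; }

    std::size_t size() const {
        return is_small() ? (word_ & 0xFFu) >> 1 : rep()->size;
    }

    // Copies the characters out; a real design would expose data() instead.
    std::string str() const {
        if (!is_small()) return std::string(rep()->chars(), rep()->size);
        std::string out;
        for (std::size_t i = 0; i < size(); ++i)
            out.push_back(static_cast<char>((word_ >> (8 * (i + 1))) & 0xFFu));
        return out;
    }
};
```

Packing the characters with shifts keeps the sketch independent of byte order, at the cost of not being able to hand out a `data()` pointer into the object as the comment proposes; doing that would mean choosing the tag byte's position per platform.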
dataflow about 1 year ago
The most interesting thing imo is what they all do similarly: they all store the size, instead of the end pointer -- unlike, say, std::vector. Exercise for the reader as to why this is the right tradeoff.
whoopdedo about 1 year ago
GCC putting a pointer at the top of the structure seems reminiscent of the way Pascal stored strings. A PString is the address of a character buffer like C, but the length of the string is stored at a negative offset. I may be remembering wrong but I think there was an older C++ STL that also used negative offsets.

As much as these snippets make clang look heavier, I wonder what it compiles to in practice when the compiler can make better inferences. If you can prove the state of the `is_small` bit those branches disappear. Even at runtime, which implementation is more performant? Real-world profiling may favor clang with branch prediction and speculative processing. Then again, speculation has become a dirty word lately.[1]

[1] Get it? "Dirty" because of the cache. I'm sorry, that pun was entirely unintentional.
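The "length at a negative offset" layout mentioned above can be illustrated with a small sketch. It is not the layout of any particular Pascal runtime or C++ library: `Header`, `make_prefixed`, `prefixed_length`, and `free_prefixed` are hypothetical names chosen for the example. The caller holds an ordinary `char*` that works with C string functions, while the length sits just before the first character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical header stored immediately before the characters.
struct Header {
    std::size_t length;
};

// Allocates [Header][chars...][NUL] and returns a pointer to the chars.
char* make_prefixed(const char* src) {
    std::size_t n = std::strlen(src);
    auto* block = static_cast<unsigned char*>(std::malloc(sizeof(Header) + n + 1));
    if (!block) return nullptr;
    Header h{n};
    std::memcpy(block, &h, sizeof h);            // write the header
    char* chars = reinterpret_cast<char*>(block + sizeof(Header));
    std::memcpy(chars, src, n + 1);              // copy chars plus the NUL
    return chars;
}

// Reads the length from the negative offset, without scanning for the NUL.
std::size_t prefixed_length(const char* chars) {
    Header h;
    std::memcpy(&h, chars - sizeof(Header), sizeof h);
    return h.length;
}

// Frees the whole block by stepping back to where the allocation began.
void free_prefixed(char* chars) {
    if (chars) std::free(chars - sizeof(Header));
}
```

The appeal of this arrangement is that the handle stays one pointer wide and remains usable as a plain C string, at the price of an allocation even for short strings.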
kingsleyopara about 1 year ago
Obligatory link to a must-watch, the CppCon 2016 talk on the complexities of std::string: https://youtu.be/kPR8h4-qZdk?si=x2DbgNIZcKyK5PKt
gary_0 about 1 year ago
Note that EASTL (an alternative STL used in game development) does std::string the "Clang way": https://github.com/electronicarts/EASTL/blob/master/include/EASTL/string.h
archermarks about 1 year ago
Nice article, thanks for sharing! Been implementing my own string type for fun recently and this is a useful reference!
ajross about 1 year ago
tl;dr: libc++ is just bad, libstdc++ and MSVC trade punches for first place, with the eyeball win going to the FSF.

Though really the performance gates on string-heavy code tend to be in the heap and not the string library itself.