Never create Ruby strings longer than 23 characters

54 points by ctaglia, over 11 years ago

17 comments

nly, over 11 years ago
This is known as the "small string optimisation" in C++, so you can see a similar implementation in Clang's libc++[1].<p>One interesting corollary is that <i>moving</i> short strings in an implementation that does this could actually be ever so slightly (negligibly) slower than moving long ones (since byte copies are slower than word copies). But generally, this is a free lunch optimisation and can save you hundreds of megs of memory when writing programs dealing with millions of short strings.<p>[1] <a href="http://llvm.org/svn/llvm-project/libcxx/trunk/include/string" rel="nofollow">http://llvm.org/svn/llvm-project/libcxx/trunk/include/string</a> - search for "union"
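A minimal sketch of the small string optimisation nly describes, written in C. All names here (`sso_string`, `sso_init`, `SSO_EMBED_MAX`) are hypothetical and the layout is an assumption for illustration; it is not the actual libc++ or MRI definition. A flag selects between bytes embedded in the struct itself and bytes held in a separate heap buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit: three pointer-sized words, minus one byte for the NUL. */
#define SSO_EMBED_MAX (3 * sizeof(uintptr_t) - 1)

struct sso_string {
    int embedded;                     /* 1: bytes live in as.ary, 0: in as.heap.ptr */
    union {
        struct {
            size_t len;
            char *ptr;
            size_t capa;
        } heap;
        char ary[SSO_EMBED_MAX + 1];  /* +1 for the terminating NUL */
    } as;
};

/* Initialise *s with a copy of src. Returns 0 on success, -1 if malloc fails. */
int sso_init(struct sso_string *s, const char *src)
{
    assert(s != NULL && src != NULL);
    size_t len = strlen(src);

    if (len <= SSO_EMBED_MAX) {
        /* Short string: no second allocation, the bytes sit inside the struct. */
        s->embedded = 1;
        memcpy(s->as.ary, src, len + 1);
        return 0;
    }

    /* Long string: the bytes need their own heap buffer. */
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, src, len + 1);
    s->embedded = 0;
    s->as.heap.len = len;
    s->as.heap.ptr = buf;
    s->as.heap.capa = len;
    return 0;
}

/* Return the string bytes regardless of where they are stored. */
const char *sso_cstr(const struct sso_string *s)
{
    return s->embedded ? s->as.ary : s->as.heap.ptr;
}

/* Release the heap buffer, if there is one. */
void sso_free(struct sso_string *s)
{
    if (!s->embedded)
        free(s->as.heap.ptr);
}
```

Because the embedded array shares storage with the heap fields, a short string costs no extra memory beyond the struct that a long string would need anyway.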
Someone, over 11 years ago
<a href="http://www.slideshare.net/nirusuma/what-lies-beneath-the-beautiful-code" rel="nofollow">http://www.slideshare.net/nirusuma/what-lies-beneath-the-bea...</a> (from March 2012) also discusses this.<p>Also (pedantic):<p><pre><code> #define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1)) </code></pre> sizeof(char) is always 1, so that division is superfluous.
danielweber, over 11 years ago
More like "ruby optimizes for short strings, and chose 23 as the cut-off point for Reasons."
ben0x539, over 11 years ago
There's some discussion at <a href="https://news.ycombinator.com/item?id=3425164" rel="nofollow">https://news.ycombinator.com/item?id=3425164</a>, including some interesting technical/benchmarky comments.
ra88it, over 11 years ago
Title: "Never create Ruby strings longer than 23 characters"<p>Conclusion: "Don’t worry! I don’t think you should refactor all your code to be sure you have strings of length 23 or less."
spoiler, over 11 years ago
This is behaviour of MRI (C Ruby), though, and <i>not</i> of Ruby the language. However, this is still interesting information.
anon4, over 11 years ago
Wouldn't it be better to use this declaration though:<p><pre><code>struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[];
    } as;
};

/* apologies if I messed up the syntax here */
#define RSTRING_EMBED_LEN_MAX (sizeof(((struct RString*)(0))->as) - 1)
</code></pre> Then you can even use the padding the compiler added, if any, plus you can add more things to heap and the embed length will grow automatically.
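One way anon4's idea could be made to compile is sketched below. ISO C does not permit a flexible array member (`char ary[]`) directly inside a union, so this version names the heap struct and sizes the array from it. The names `rstring_heap`, `rstring_as`, `EMBED_LEN_MAX` and `embed_len_max` are hypothetical, and `VALUE` is assumed to be pointer-sized; this is not MRI's actual declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;  /* stand-in for MRI's VALUE; assumed pointer-sized */

/* The heap-side fields, named so that sizeof can refer to them. */
struct rstring_heap {
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};

union rstring_as {
    struct rstring_heap heap;
    /* Embedded bytes reuse the heap fields' storage, compiler padding included. */
    char ary[sizeof(struct rstring_heap)];
};

/* One byte is reserved for the terminating NUL. */
#define EMBED_LEN_MAX (sizeof(union rstring_as) - 1)

/* Function form of the macro; it also checks that the union is no larger
   than the heap struct, so embedding never grows the object. */
size_t embed_len_max(void)
{
    assert(sizeof(union rstring_as) == sizeof(struct rstring_heap));
    return EMBED_LEN_MAX;
}
```

Adding a field to `rstring_heap` would enlarge `ary` and `EMBED_LEN_MAX` without any further edits, which is the property anon4 is after.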
markburns, over 11 years ago
For anyone interested: he points to an older translation of the Ruby Hacking Guide, but there is a pretty much complete translation at<p><a href="http://ruby-hacking-guide.github.com" rel="nofollow">http://ruby-hacking-guide.github.com</a>
gaius, over 11 years ago
I suppose the thing to do is analyse your app for the average string length, and just recompile your Ruby with that. It would be even better if it were a command line parameter.
pedrocr, over 11 years ago
Why does "str2 = str" actually allocate a new RString instead of just pointing both str and str2 to the same RString?
grosbisou, over 11 years ago
Extremely interesting. But I cannot quite understand why RSTRING_EMBED_LEN_MAX is calculated that way.<p>VALUE seems to be an unsigned int defined via "typedef uintptr_t VALUE;" and "typedef unsigned __int64 uintptr_t;"<p>But I don't get why it is calculated like that. Can anyone explain?
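The arithmetic behind the macro can be sketched as follows. On a typical 64-bit platform, the heap form of the string needs three word-sized fields (len, ptr, and the capa/shared union shown in anon4's comment), so the same space can hold 3 × sizeof(VALUE) bytes of characters, minus one byte for the terminating NUL. The helper `embed_len_max_for` is hypothetical, and `VALUE` is assumed to be pointer-sized as in the typedef quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;  /* assumption: pointer-sized, as in the quoted typedef */

/* The macro as quoted in the thread. */
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))

/* Hypothetical helper: the same arithmetic for an arbitrary word size in bytes. */
size_t embed_len_max_for(size_t word_size)
{
    assert(word_size > 0);
    /* three words (len, ptr, aux) minus one byte for the terminating NUL */
    return word_size * 3 - 1;
}
```

With 8-byte words, `embed_len_max_for(8)` gives 23, the number in the title; with 4-byte words it gives 11.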
gesman, over 11 years ago
I wonder why they didn't put the cut-off point at 33?<p>When programmers don't know in advance how long a name/email/input/whatever field is going to be, they just use the magic "power of two" length :)<p>So 32 (or 33) in this case would be more reasonable.
badman_ting, over 11 years ago
Reminds me of this Mr Show sketch :) <a href="https://www.youtube.com/watch?v=RkP_OGDCLY0" rel="nofollow">https://www.youtube.com/watch?v=RkP_OGDCLY0</a>
throwaway0094, over 11 years ago
Is Ruby's internal encoding UTF-8, then?
jokoon, over 11 years ago
"never use ruby" works well for me
drakaal, over 11 years ago
Who needs more than 23?
corresation, over 11 years ago
This all sounds rather terrible for Ruby, doesn't it? It isn't so much that the short string is faster (I'm left unclear whether the RString itself is on the stack or the heap, though given the GC nature of Ruby and practical considerations of the language, it must be the heap), but rather that the cost of the short string is <i>also</i> added to the long string in the (assumed) heap allocation of the RString, which becomes larger and thus more difficult to malloc.<p>If this is intended to sit on the stack, then maybe, but I find that highly unlikely: the timings seem to be the delta between one malloc and two, and the difference would be much more significant if it were a stack allocation versus a heap allocation. So this is not comparable to small string optimizations for the stack in C++, and otherwise it seems like a poorly considered hack.<p>The string type could as easily have been dynamically allocated based upon the length of the string, with the ptr by default pointing inside that same allocated block. If the string is expanded, it can then be realloced and the string data alloced somewhere else. No waste, a single allocation, etc.
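The alternative corresation proposes could look roughly like the sketch below: header and bytes come from one allocation, and `ptr` initially points into that same block. Every name here (`inline_string`, `inline_string_new`, `inline_string_append`, `inline_string_free`) is hypothetical, and the growth policy is an assumption; nothing here reflects how MRI actually works.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct inline_string {
    size_t len;
    char *ptr;    /* points at data[] until the string outgrows it */
    char data[];  /* flexible array member: the bytes follow the header */
};

/* Allocate header and bytes in a single malloc. Returns NULL on failure. */
struct inline_string *inline_string_new(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    struct inline_string *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    s->ptr = s->data;
    memcpy(s->ptr, src, len + 1);
    return s;
}

/* Append extra. The header stays where it is (other references may hold its
   address); only the bytes move to a separate buffer. Returns 0 or -1. */
int inline_string_append(struct inline_string *s, const char *extra)
{
    assert(s != NULL && extra != NULL);
    size_t extra_len = strlen(extra);
    size_t new_len = s->len + extra_len;
    char *buf;

    if (s->ptr == s->data) {
        /* First growth: copy the inline bytes out to a new buffer. */
        buf = malloc(new_len + 1);
        if (buf == NULL)
            return -1;
        memcpy(buf, s->data, s->len);
    } else {
        /* Already external: let realloc grow the buffer. */
        buf = realloc(s->ptr, new_len + 1);
        if (buf == NULL)
            return -1;
    }
    memcpy(buf + s->len, extra, extra_len + 1);
    s->ptr = buf;
    s->len = new_len;
    return 0;
}

/* Free the external buffer, if any, then the header block. */
void inline_string_free(struct inline_string *s)
{
    if (s == NULL)
        return;
    if (s->ptr != s->data)
        free(s->ptr);
    free(s);
}
```

One trade-off in this sketch: once a string has grown, the inline bytes after the header stay allocated but unused, because shrinking the block could move the header.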