Every time I see String.intern() my mind leaps to the problem of new Java programmers who are misled into this:<p><pre><code> String a = "hello";
String b = "world";
assert a != b;
b = "hello";
assert a == b;
// OH NICE I'LL USE == FOR STRING COMPARISONS NOW
</code></pre>
It works because string literals in source code are interned, so identical literals end up as the same object, but that's a special case that won't apply to strings created at runtime.
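A minimal sketch of where the trick stops working, assuming a string that is assembled at runtime rather than written as a literal. The class and helper names here are made up for illustration.

```java
public class LiteralVsRuntime {
    // Hypothetical helper: assembles "hello" at runtime, so the result is a fresh object
    // rather than the interned literal.
    public static String buildAtRuntime() {
        return new StringBuilder("hel").append("lo").toString();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String runtime = buildAtRuntime();

        System.out.println(literal == runtime);          // false: different objects
        System.out.println(literal.equals(runtime));     // true: same characters
        System.out.println(literal == runtime.intern()); // true: intern() returns the pooled copy
    }
}
```

So == only "works" when both sides happen to be interned; equals() is the comparison that holds in general.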
This is really an unscientific claim, but I ran a hand-crafted, hacked-together benchmark just to get a feeling for the numbers. For 5 to 35 character Strings, == is 20 to 40 times faster than String.equals().<p>Given that s1.equals(s2) if and only if s1.intern() == s2.intern() (assuming you haven't filled the string table), this looks like an opportunity for a significant optimization.<p>Before doing this, I had hoped that String.equals might check whether both strings were interned and, if so, shortcut the character-by-character comparison by just comparing references. But my rough benchmark results suggest this isn't what is happening, which agrees with the source provided for the String.equals method.<p>Java String comparison is absolutely ubiquitous, so I would have expected that an optimization like this might have been considered?<p>Having said that, the supplied rt.jar source also suggests that the String.hashCode() computation isn't cached/memoized. This strikes me as odd given that Strings are immutable and are one of the most common key types for Maps.
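For what it's worth, the "if and only if" relationship can be spot-checked with a few lines. This is only a sketch over a handful of sample strings, not a proof or a benchmark, and the class and method names are invented for the example.

```java
public class InternEquivalence {
    // True when equals() and reference equality of the interned copies give the same answer.
    public static boolean agrees(String s1, String s2) {
        return s1.equals(s2) == (s1.intern() == s2.intern());
    }

    public static void main(String[] args) {
        // A mix of literals and freshly allocated strings, some equal and some not.
        String[] samples = { new String("abc"), new String("abc"), "abd", "", new String("") };
        boolean allAgree = true;
        for (String x : samples) {
            for (String y : samples) {
                allAgree &= agrees(x, y);
            }
        }
        System.out.println(allAgree); // true
    }
}
```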
Total aside from main topic, I love shipilev's posts.<p>If there are other core JVM developers that have similar blogs, I'd love to hear about them here.
We built an in-memory map and we were using String.intern for both keys and values. We could see that we were saving lots of memory but we had the problems described in the article.
We then built our own 'String.intern' by using yet another static HashMap. It worked.
It was the simplest alternative and it just did the job.
Thanks alekskey for the nice article.
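Roughly what such a home-grown interner can look like. This is a sketch, not the code described above: it swaps in a ConcurrentHashMap instead of a plain HashMap so it stays safe under concurrent access, and the class name is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MapInterner {
    // Maps each distinct string value to the first instance seen with that value.
    private static final ConcurrentMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering s itself if no equal string is pooled yet.
    public static String intern(String s) {
        String existing = POOL.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    public static void main(String[] args) {
        String a = MapInterner.intern(new String("key-1"));
        String b = MapInterner.intern(new String("key-1"));
        System.out.println(a == b); // true: both calls return the same pooled object
    }
}
```

One thing to keep in mind with a static map like this: entries are never removed, so it only makes sense when the set of distinct values is bounded.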
I'd never seen the @Benchmark annotation before so I looked it up.<p>The blog author is also one of JMH's authors.<p><a href="http://openjdk.java.net/projects/code-tools/jmh/" rel="nofollow">http://openjdk.java.net/projects/code-tools/jmh/</a>
Is "Anatomy Park" a Rick and Morty reference? <a href="http://rickandmorty.wikia.com/wiki/Anatomy_Park_(episode)" rel="nofollow">http://rickandmorty.wikia.com/wiki/Anatomy_Park_(episode)</a>
The code interns unique strings, which most likely isn't what would happen in a real-world application (unless, you know... code without thought); you'd usually intern strings with low variance.
Not saying that it won't be slower, but the memory usage might be lower.
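To illustrate the low-variance case, here is a small sketch that counts how many distinct String objects survive with and without interning. The "status-N" values and the sizes are made up, and it counts objects by identity rather than measuring heap usage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class LowVarianceInterning {
    // Counts distinct String objects by identity, not by value.
    public static int distinctObjects(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(strings);
        return identities.size();
    }

    // Builds `count` strings at runtime, drawn from only `distinctValues` different values.
    public static List<String> build(int count, int distinctValues, boolean intern) {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String s = "status-" + (i % distinctValues); // runtime concatenation: a new object each time
            out.add(intern ? s.intern() : s);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(build(10_000, 10, false))); // 10000
        System.out.println(distinctObjects(build(10_000, 10, true)));  // 10
    }
}
```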
This, and the few other articles up, are a great series. Having done Java development now for 30% of my life these are some amazing pointers.<p>I'd love to buy a hard copy of these if they ever get up to a few dozen articles. Would be good to give to middle-experience devs (like myself) in the future.
"The performance is at the mercy of the native HashTable implementation, which may lag behind what is available in high-performance Java world, especially under concurrent access."<p>What native HashTable is used? Shouldn't the JVM be using an optimized one?
String.intern() would suck much less if strings had an "IS_INTERNED" flag which would prevent hashtable lookups for already interned strings. Really sad given the insane overhead Java strings have.
> in OpenJDK, String.intern() is native, and it actually calls into JVM, to intern the String in the native JVM String pool.<p>How much of this also applies when using the standard Oracle JDK?
The instrumentation here is impressive. The amount of data inspection done with just a few simple commands is a bit overwhelming. Frankly, I hope I rarely find myself looking at this level of metrics.<p>There's a lot down there I like to take for granted. But more likely I'll try to use methods like String.intern() exactly never.<p>Use code you know and understand. Frankly, use code you can trust. And who would trust a method called String.intern() to do... exactly what?<p>If you are writing a function to <i>do something</i>, the name of the function must be the thing being done. What the heck is a 'static internalize'? The explicit HashMap was a few lines of code, and it's the most basic, obvious, and surprisingly performant approach. So I definitely agree that you should use your own HashMap and not a static internalizer.