Breaking java.lang.String

220 points by coekie, almost 2 years ago

12 comments

gizmo686, almost 2 years ago
Is this actually a bug? The default assumption in Java is that types are not thread-safe unless otherwise specified. Attempting to use types in a way that exceeds their documented thread safety has always been allowed to leave your program in an inconsistent state.
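To make the race concrete, here is a simplified, single-threaded model of the check-then-copy pattern under discussion. It is a hypothetical sketch, not the JDK source: the class `RaceModel`, its `compress`/`toBytes` helpers, the big-endian byte layout, and the `between` hook (which stands in for another thread writing to the shared array at the worst possible moment) are all assumptions made for illustration.

```java
// Hypothetical, simplified model of a compact-strings constructor.
// Not the JDK source: names, the big-endian UTF-16 layout and the
// 'between' hook are assumptions made for illustration.
class RaceModel {
    static final byte LATIN1 = 0, UTF16 = 1;

    final byte[] value;
    final byte coder;

    // Latin-1 bytes, or null if any char is above 0xFF.
    static byte[] compress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] > 0xFF) {
                return null;
            }
            out[i] = (byte) chars[i];
        }
        return out;
    }

    // Unconditional UTF-16 encoding, two bytes per char.
    static byte[] toBytes(char[] chars) {
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            out[2 * i] = (byte) (chars[i] >> 8);
            out[2 * i + 1] = (byte) chars[i];
        }
        return out;
    }

    // Check, then copy. 'between' simulates another thread writing to the
    // shared array after the Latin-1 check fails but before the UTF-16 copy.
    RaceModel(char[] chars, Runnable between) {
        byte[] latin1 = compress(chars);
        if (latin1 != null) {
            value = latin1;
            coder = LATIN1;
            return;
        }
        between.run();
        value = toBytes(chars);
        coder = UTF16;
    }

    public static void main(String[] args) {
        char[] shared = {'f', 'o', 'o', '\u0100'};
        RaceModel s = new RaceModel(shared, () -> shared[3] = '!');
        // Labelled UTF-16, yet every char now fits in Latin-1.
        System.out.println(s.coder == UTF16);          // true
        System.out.println(compress(shared) != null);  // true
    }
}
```

The resulting object is labelled UTF-16 even though every character it holds would fit in Latin-1, which is the kind of internally inconsistent object the thread is arguing about. Whether that counts as a bug in `String` or in the caller is exactly the question raised above.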
dundarious, almost 2 years ago
Calling this a "bug in java.lang.String" is silly. The same "bug" exists for all functions that take mutable objects. If you take a map and look up two different keys while another thread modifies it, yep, that's a "bug".

The bug is the other piece of code that introduces the data race in the first place. You can argue the case for languages like Rust with its borrow system, or others that use linear types or something along those lines, to eliminate the possibility of this happening, but it's quite misleading to say that the innocent user of a mutable object is the source of a bug. You may as well say there's a bug in `printf("Hello, World!\n");` in C because you could have another thread writing random values to random memory, running `while(1) { *((unsigned char*)(void*)rand()) = rand(); }`
cogman10, almost 2 years ago
This is exactly why Java needs frozen arrays [1].

The safe thing to do is freeze the array before doing anything with it. Then you can rely on copy-on-write (COW) to copy the array if someone modifies it concurrently while you are reading it. In the common case you'd get fast string creation, and in the tricky case you simply pay the clone cost as a penalty for being dumb.

[1] https://openjdk.org/jeps/8261007#:~:text=How%20do%20I%20use%20a%20frozen%20array%3F%201,an%20array%20is%20frozen%20or%20modifiable.%20More%20items
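Frozen arrays only exist as a draft JEP, so there is no real API to show for them. What can be sketched today is the "pay the clone cost" path from the comment: copy the caller's array first, then make every decision from the private copy. The class `SnapshotModel` and its big-endian byte layout are hypothetical, not the JDK implementation.

```java
// Hypothetical model, not the JDK's String: the names and the big-endian
// UTF-16 layout are assumptions made for illustration.
class SnapshotModel {
    static final byte LATIN1 = 0, UTF16 = 1;

    final byte[] value;
    final byte coder;

    SnapshotModel(char[] chars) {
        // The only read of the shared array; everything below uses the copy.
        char[] snapshot = chars.clone();

        boolean fitsLatin1 = true;
        for (char c : snapshot) {
            if (c > 0xFF) {
                fitsLatin1 = false;
                break;
            }
        }

        if (fitsLatin1) {
            byte[] out = new byte[snapshot.length];
            for (int i = 0; i < snapshot.length; i++) {
                out[i] = (byte) snapshot[i];
            }
            value = out;
            coder = LATIN1;
        } else {
            byte[] out = new byte[snapshot.length * 2];
            for (int i = 0; i < snapshot.length; i++) {
                out[2 * i] = (byte) (snapshot[i] >> 8);
                out[2 * i + 1] = (byte) snapshot[i];
            }
            value = out;
            coder = UTF16;
        }
    }

    public static void main(String[] args) {
        char[] shared = {'a', 'b'};
        SnapshotModel s = new SnapshotModel(shared);
        shared[0] = '\u0100'; // later writes cannot affect s
        System.out.println(s.coder == LATIN1); // true
        System.out.println(s.value.length);    // 2
    }
}
```

Unlike the copy-on-write scheme described above, this copies on every construction, not only when a write actually races. If a writer does race with `clone()`, the snapshot may mix old and new characters, but the resulting object is still internally consistent, because the encoding check and the encoded bytes both come from the same snapshot.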
vbezhenar, almost 2 years ago
That's a very interesting finding. Nowadays Java security is a joke, but back in the day Java security was a serious topic. Users were able to run downloaded applets in their browsers, so protecting the sandbox was important. It's very likely that these kinds of "corrupted" strings would have allowed breaking out of that sandbox, because the protection code definitely relied on strings being sane and correct.

I can't imagine this behaviour causing many problems in modern Java, since nobody runs untrusted code anyway. But good to know.
m_0x, almost 2 years ago
Every time, without fail, that somebody shows a bug in a piece of code we take for granted (in this case, the String class), the bug is related to concurrent modification.

Concurrency is so hard that even the OpenJDK developers can't prevent these kinds of bugs.
josephcsible, almost 2 years ago
This is the exact kind of bug that Rust solves with its borrowing system. The problem is that Java has no way to express the concept of "something that nothing else can modify while I'm looking at it".
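Java does have read-only views, but they illustrate the gap described above rather than closing it: a view stops the holder of the view from writing, not everyone else. A small demo using only standard `java.util` APIs (`List.copyOf` needs Java 10 or later; the class name `ViewDemo` is just for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ViewDemo {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(List.of("a", "b"));
        List<String> readOnly = Collections.unmodifiableList(backing);

        // The view itself rejects writes...
        try {
            readOnly.add("c");
        } catch (UnsupportedOperationException e) {
            System.out.println("view rejected write");
        }

        // ...but whoever holds the backing list can still change it underneath.
        backing.set(0, "z");
        System.out.println(readOnly.get(0)); // z

        // Only a copy is insulated from later writes to the original.
        List<String> snapshot = List.copyOf(backing);
        backing.set(0, "y");
        System.out.println(snapshot.get(0)); // z
    }
}
```

Short of Rust-style ownership, Java code that needs "nothing else can modify this" generally has to take its own copy.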
CJefferson, almost 2 years ago
Out of interest, how should this be handled?

Is this a bug in Java that should be fixed (it looks like one to me)? My understanding was that Java generally doesn't do "you did an undefined behaviour, so it's your fault", except for specifically marked, very low-level interfaces.
coekie, almost 2 years ago
I just added solutions to the empty String challenge in the blog post. This includes a very interesting find from Xavier Cooney that causes the same problem without involving any concurrency. It instead makes StringBuilder misbehave by throwing an exception at an unexpected place: https://gist.github.com/XavierCooney/e9f6235f05479ac6bf962ca25e31d8d0
nayuki, almost 2 years ago
It is possible to fix this String constructor implementation without creating a defensive copy of the input array or having a TOCTOU (time-of-check to time-of-use) vulnerability.

```java
// Change this implementation to a loop.
public String(char[] value) {
    while (true) {
        byte[] temp = StringUTF16.compress(value);
        if (temp != null) {
            this.value = temp;
            this.coder = LATIN1;
            break;
        }
        temp = StringUTF16.toBytes(value);
        if (temp != null) {
            this.value = temp;
            this.coder = UTF16;
            break;
        }
    }
}

// This implementation stays the same.
static byte[] StringUTF16.compress(char[] value) { ... }

// Change this contract and implementation so that it returns null
// if all characters are below 256, otherwise it returns byte[].
// The difference is that previously, this function would never return null.
// Now, we make sure that the function succeeds if and only if the
// char array *requires* UTF-16 as opposed to Latin-1.
static byte[] StringUTF16.toBytes(char[] value) { ... }
```
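The retry loop above can be tried out in a self-contained model. Everything here is hypothetical: `RetryModel` stands in for `String`, and its `compress`/`toBytes` follow the contracts described in the comment (Latin-1 bytes, or null if any char is 256 or above; UTF-16 bytes, or null if every char is below 256), with a big-endian byte layout chosen for simplicity. Two details matter. Each helper reads every array element exactly once, so its success/failure decision and its output bytes always agree. And because `String`'s `value` and `coder` fields are `final`, and Java rejects assigning a blank final field inside a loop, the sketch keeps the result in locals and assigns the fields once after the loop.

```java
// Hypothetical model of the retry-loop idea; not the JDK source.
class RetryModel {
    static final byte LATIN1 = 0, UTF16 = 1;

    final byte[] value;
    final byte coder;

    // Latin-1 bytes, or null if any char is >= 256.
    static byte[] compress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i]; // read each element once
            if (c > 0xFF) {
                return null;
            }
            out[i] = (byte) c;
        }
        return out;
    }

    // UTF-16 bytes, or null if every char is < 256.
    static byte[] toBytes(char[] chars) {
        byte[] out = new byte[chars.length * 2];
        boolean needsUtf16 = false;
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i]; // the check and the bytes use the same read
            if (c > 0xFF) {
                needsUtf16 = true;
            }
            out[2 * i] = (byte) (c >> 8);
            out[2 * i + 1] = (byte) c;
        }
        return needsUtf16 ? out : null;
    }

    RetryModel(char[] chars) {
        byte[] v;
        byte c;
        while (true) {
            v = compress(chars);
            if (v != null) {
                c = LATIN1;
                break;
            }
            v = toBytes(chars);
            if (v != null) {
                c = UTF16;
                break;
            }
            // Both failed: the array changed between the two calls; retry.
        }
        value = v;
        coder = c;
    }

    public static void main(String[] args) {
        System.out.println(new RetryModel("foo".toCharArray()).coder == LATIN1);        // true
        System.out.println(new RetryModel(new char[] {'f', '\u0100'}).coder == UTF16);  // true
    }
}
```

For any fixed array contents exactly one of the two helpers succeeds, so a racing writer can at worst force another iteration; it cannot produce a Latin-1-only string labelled UTF-16. A sufficiently adversarial writer could, in principle, keep the loop spinning.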
doodpants, almost 2 years ago
I enjoyed the article, but if I may express a peeve of mine... In the code listings, can we please not use a syntax coloring scheme that makes the comments nearly unreadable? Especially in blog posts like this, where the code deliberately contains numerous explanatory comments. Such low-contrast text slows down my tired old eyes.
robertlagrant, almost 2 years ago
> Why is "foo!".equals("foo⁉") false?

I don't really understand this question. They... look different? One is an exclamation mark, and the other is an exclamation mark/question mark combo?
JediPig, almost 2 years ago
In the words of Linus, Java is a horrible language.