TechEcho
Java's URL.equals() Performs DNS Resolution

186 points by khc over 5 years ago

24 comments

yzmtf2008 over 5 years ago
TL;DR: use URI instead. URL is behaving this way due to backwards compatibility.

---

To quote a JDK bug ticket: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4434494

> We are well aware of the problem with URL.equals and URL.hashCode. The cause of the problem is due to the existing spec and implementation, where we will try to compare the two URLs by resolving the host IP addresses, instead of just doing a string comparison. Because hashCode has to maintain certain relationships with equals, namely if two objects are equal, they should have the same hashCode, the implementation of hashCode also tries to resolve the host in the URL into an IP address. As a result, we are facing problems with http virtual hosting, as described in the Description part, and performance hit due to DNS name resolutions.

> Unfortunately, changing the behavior now would break backward compatibility in a serious way, plus Java Security mechanism depends on it in some parts of the implementation. We can't change it now.

> However, to address URI parsing in general, we introduced a new class called URI in Merlin (jdk1.4). People are encouraged to use URI for parsing and URI comparison, and leave URL class for accessing the URI itself, getting at the protocol handler, interacting with the protocol etc. So, at present, we don't plan on changing the URL.equals/hashCode behavior and we will leave the bug open until Tiger, when we re-investigate our options.
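
A minimal sketch of the "use URI instead" advice. The class name `UriEqualityDemo`, the helper `sameUri`, and the example hosts are illustrative and not from the thread. `URI.equals()` compares the parsed components as text (scheme and host without regard to case) and never consults DNS, so it is safe to call in a hot loop.

```java
import java.net.URI;

class UriEqualityDemo {
    // Compares two URI strings with URI.equals(), which is purely textual:
    // no host name is resolved.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Host comparison ignores case.
        System.out.println(sameUri("http://example.com/a", "http://EXAMPLE.com/a")); // prints true
        // Different host names are never equal, whatever they resolve to.
        System.out.println(sameUri("http://example.com/a", "http://example.org/a")); // prints false
    }
}
```

The trade-off is that two names for the same server compare as different, which is usually what a cache key or deduplication step wants anyway.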

strictnein over 5 years ago
> "Two hosts are considered equivalent if both host names can be resolved into the same IP addresses"

Uhmm... yikes. Why are they resolving anything? A URL is a string, this should just be doing a string comparison. All of the parts of the URL object are strings or ints, so at worst they should just be comparing all of those individually, not resolving domains and comparing IP addresses. That makes no sense at all.

burtonator over 5 years ago
Story time...

My company Datastreamer has been around for a decade. We provide crawl data (usually a massive amount of crawl data, north of 300GB per day) to our customers.

... so we have a LOT of real-world experience pushing data to customers in production over long time periods.

Here's what we've learned.

Networking libraries around HTTP are and have been fundamentally broken for a long time and they're broken in pathological ways that you don't realize until years later in production.

DNS caching is a good one. A lot of systems do infinite DNS caching. Java, until at least Java 8, does infinite DNS caching.

Some do infinite HTTP timeout. Timeouts are awesome. You should use a timeout. Without a timeout, if the network breaks your code just locks up.

Some libs provide no API to change TCP buffer sizes (which you have to do at the kernel level).

So about 5 years ago we took a harsh stance. NO CUSTOM CLIENTS.

We have a streaming firehose client that we implemented from the ground up to do everything properly. The API is literally that we just stream JSON files to disk.

It's a docker container now so not too hard to deploy.

Your job is just to listen to the disk and wait for new files to be written. We do a move from a tmpdir to the final dir so the entire file is written and you don't have to worry about partial reads.

About 80% of our customers love it. The other 20% of customers seem to initially hate it and we have to explain to their CTO or senior architect that, no, you DO NOT want to implement this from the ground up.

What happens is that it works immediately, but then 18 months in it will break pathologically and everyone running it has moved on or it's in some datacenter that no one has access to.

This causes us to break our SLAs and means we have upset customers.

This decision by far was one of the best decisions I've ever made and has really helped our growth and stability over time.

It's really really really nice to keep customers for 5-10 years. They're happy and you get steady checks and predictable growth.
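
The "move from a tmpdir to the final dir" step in this comment can be sketched roughly as follows. This is an illustration under assumptions, not Datastreamer's actual client: the class `AtomicPublish`, the method `publish`, the directory layout, and the file names are all hypothetical. It relies on `Files.move` with `StandardCopyOption.ATOMIC_MOVE`, which generally requires both directories to be on the same filesystem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicPublish {
    // Writes the content to a temporary file first, then moves it into finalDir.
    // A reader watching finalDir therefore never observes a half-written file.
    // Assumption: tmpDir and finalDir are on the same filesystem; otherwise
    // ATOMIC_MOVE may fail with AtomicMoveNotSupportedException.
    static Path publish(Path tmpDir, Path finalDir, String name, String content)
            throws IOException {
        Path tmp = Files.createTempFile(tmpDir, name, ".part");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        return Files.move(tmp, finalDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("firehose-demo");
        Path tmpDir = Files.createDirectory(base.resolve("tmp"));
        Path finalDir = Files.createDirectory(base.resolve("final"));
        Path out = publish(tmpDir, finalDir, "batch-0001.json", "{\"items\":[]}");
        System.out.println("published " + out);
    }
}
```

A consumer then only needs to watch the final directory, since any file that appears there is already complete.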

peeters over 5 years ago
Been there. I remember fixing a performance bug ages ago where we had URL (or maybe some other address class) used as a key in a HashMap (which seemed like a perfectly valid thing to do with a value class). We were doing literally thousands of DNS lookups for what seemed like the most trivial algorithms.

equals() and hashCode() are probably one of the weakest points of Java. While it seems like an obvious candidate for a contract, the issue has always been that one person ends up defining equality for everyone, when often different use cases will warrant different definitions of equality. Are objects equal if they have the same identity? If they have the same data? If they resolve to the same thing? It's easy enough to leave them unimplemented but the issue then is a lack of standard library support for providing custom hash and equality functions for Maps and Sets.
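
Since `HashMap` offers no hook for a custom hash or equality function, one common workaround is a small wrapper key that fixes the definition of equality you want. The sketch below is hypothetical (the `UrlKey` class and its `of` factory are not from the thread) and defines equality as "same text", using `URL.toExternalForm()`, which only formats the URL and does not resolve the host.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: two keys are equal when their URLs have the same text.
final class UrlKey {
    private final String text;

    UrlKey(URL url) {
        // toExternalForm() builds the string form only; no DNS lookup happens here.
        this.text = url.toExternalForm();
    }

    // Convenience factory for the demo; wraps the checked exception.
    static UrlKey of(String spec) {
        try {
            return new UrlKey(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof UrlKey && text.equals(((UrlKey) o).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }

    public static void main(String[] args) {
        Map<UrlKey, String> icons = new HashMap<>();
        icons.put(UrlKey.of("http://example.com/a.png"), "first");
        icons.put(UrlKey.of("http://example.com/a.png"), "second"); // same text, overwrites
        icons.put(UrlKey.of("http://example.org/a.png"), "third");
        System.out.println(icons.size()); // prints 2
    }
}
```

If the code does not need a `URL` at all, keying the map by `URI` or by the plain string avoids the wrapper entirely.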

ajkjk over 5 years ago
Java has a bunch of relics like this: holdovers from the more zealous OOP days of the past, when updating objects was supposed to update the database, URLs were the concept of domain names instead of the data of a physical URL, etc. I think most people have moved away from this mindset because it leads to code being hard to reason about.

oldgeek over 5 years ago
Ok kids. URL was deprecated in favor of URI before some of you were born. Easy to be a harsh judge now, but back when Java was first developed the concept of a "design pattern" did not even exist. In fact the development of design patterns was initially driven primarily by people figuring out good ways of doing things in Java.

khc over 5 years ago
Submitter here. Someone asked me how often this is actually a problem given there's already a URI class. Unfortunately GitHub doesn't do global exact match in searching, but randomly clicking around I found this in a minute:

https://github.com/biomics/icef/blob/d69f9be9b1f773598b47de5736f16c0daaffccea/coreASIM/org.coreasim.eclipse/src/org/coreasim/eclipse/util/IconManager.java#L23

Query I used: https://github.com/search?q=%22Map%3CURL%2C%3E%22+language%3AJava+language%3AJava&type=Code

AndrewStephens over 5 years ago
Three of the truisms in my varied programming career are:

* DNS will fail even if it is implemented correctly on your clients' network
* It probably isn't implemented correctly on your clients' network
* Your software is probably doing more DNS queries than you think

This seems like a particularly unfortunate example but things like this are not uncommon. Doing any kind of RPC, even just to a server on your local machine? Half the time your library is doing DNS queries under the hood for no good reason. Or performing reverse DNS queries just to display a hostname in a log file.

It's easy to accidentally trigger a lookup. If it happens in a loop, a query that should take .2 seconds now takes 30 minutes.
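
On the log-file case in this comment: in Java, one way to format a peer address without a reverse lookup is `InetSocketAddress.getHostString()`, which returns the host name only if one is already known and otherwise the literal IP. The sketch below is an assumption-laden illustration; the class `NoLookupLogging`, the method `describe`, and the addresses are made up for the example.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class NoLookupLogging {
    // Formats a peer address for a log line without triggering a reverse lookup.
    static String describe(InetSocketAddress peer) {
        // getHostString() returns the already-known host name, or the literal IP.
        // Unlike getHostName(), it does not attempt a reverse DNS query.
        return peer.getHostString() + ":" + peer.getPort();
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress() builds an InetAddress from raw bytes; no name service is consulted.
        InetAddress ip = InetAddress.getByAddress(new byte[] {10, 0, 0, 5});
        System.out.println(describe(new InetSocketAddress(ip, 8080)));
        // createUnresolved() keeps the name as given and skips forward resolution.
        System.out.println(describe(InetSocketAddress.createUnresolved("example.com", 443)));
    }
}
```

Whether a given RPC or logging library takes this path internally is a separate question; the sketch only covers code you control.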

layoutIfNeeded over 5 years ago
This is similar to how NSURL on macOS/iOS accesses the filesystem in its constructor: https://developer.apple.com/documentation/foundation/nsurl/1410301-init

"This method assumes that path is a directory if it ends with a slash. If path does not end with a slash, the method examines the file system to determine if path is a file or a directory."

People often overlook this and then wonder why their app stutters randomly when the UI thread gets blocked on this NSURL ctor :^)

nostromo over 5 years ago
Anyone who has used Java knows that the URL class is a total mess.

Just avoid it entirely.

TimTheTinker over 5 years ago
I can see why a method like this could be helpful. The URL RFC is incredibly flexible about what constitutes a URL. The hostname/address section can be particularly hairy, to the point that it likely becomes impractical or infeasible to compare hostnames for equality. Comparison via DNS resolution is likely a simple, desirable solution to a real problem.

That being said, URL.equals() is a *terribly* opaque and non-obvious method signature for performing DNS-based comparisons of hostnames. It lacks any indication that calling it involves network IO.

thfuran over 5 years ago
If you think that's crazy, check this out:

    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
    - locked <0x00000000800a1578> (a java.lang.Object)
    at sun.font.FcFontConfiguration.getFcInfoFile(FcFontConfiguration.java:352)
    at sun.font.FcFontConfiguration.readFcInfo(FcFontConfiguration.java:425)
    at sun.font.FcFontConfiguration.init(FcFontConfiguration.java:94)
    - locked <0x00000000d5af3c58> (a sun.font.FcFontConfiguration)
    at sun.font.FcFontConfiguration.<init>(FcFontConfiguration.java:76)
    at sun.awt.X11FontManager.createFontConfiguration(X11FontManager.java:768)
    at sun.font.SunFontManager$2.run(SunFontManager.java:431)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.font.SunFontManager.<init>(SunFontManager.java:376)
    at sun.awt.FcFontManager.<init>(FcFontManager.java:35)
    at sun.awt.X11FontManager.<init>(X11FontManager.java:57)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at sun.font.FontManagerFactory$1.run(FontManagerFactory.java:83)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.font.FontManagerFactory.getInstance(FontManagerFactory.java:74)
    - locked <0x00000000d5abbb00> (a java.lang.Class for sun.font.FontManagerFactory)
    at sun.font.SunFontManager.getInstance(SunFontManager.java:250)
    at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:264)
    at sun.swing.SwingUtilities2.getFontMetrics(SwingUtilities2.java:1113)
    at javax.swing.JComponent.getFontMetrics(JComponent.java:1626)
    at javax.swing.plaf.basic.BasicLabelUI.getPreferredSize(BasicLabelUI.java:227)
    at javax.swing.JComponent.getPreferredSize(JComponent.java:1662)

shellac over 5 years ago
This is very old news now, but still worth repeating.

*Use URI*

KoenDG over 5 years ago
IntelliJ's code analysis (and other tools as well) warns about this. In particular, they recommend not sticking URL objects in Set or Map structures, due to the inherent equality check.

jrochkind1 over 5 years ago
Well THAT's clearly a legacy inherited mistake.

The semantics don't even seem desirable in 2019, let alone the performance characteristics.

Someone1234 over 5 years ago
Why is URL.equals(url) doing DNS Resolution comparison under the hood? That seems wildly outside the scope of what the URL library should be doing. It might be a useful feature, but there's no rational reason it should be *here*.

Some quick googling suggests people are using java.net.URI instead to bypass this poor design.

ceejayoz over 5 years ago
> Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

That seems... less than useful.

gaul over 5 years ago
URL's behavior is unfortunate and users should prefer URI introduced in Java 1.4 in 2002. error-prone warns about dangerous uses of URL:

https://errorprone.info/bugpattern/URLEqualsHashCode

JaDogg over 5 years ago
There's even a warning about this in Effective Java. Glad I read that.

divyekapoor over 5 years ago
And even with the DNS resolution, they'll get it wrong.

mrkeen over 5 years ago
This has bitten me. I inserted two URLs into a set. Then the set contained one item, so I tried to debug the set.
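
A small sketch of how to avoid this surprise by collecting `URI` values rather than `URL` values. The class `UriSetDemo`, the method `distinctCount`, and the hosts are hypothetical; the point is only that `URI` membership in a `HashSet` is decided by text, so two different host names stay two entries regardless of what they resolve to.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

class UriSetDemo {
    // Counts distinct URIs using URI.equals()/hashCode(), which never resolve hosts.
    static int distinctCount(String... specs) {
        Set<URI> seen = new HashSet<>();
        for (String spec : specs) {
            seen.add(URI.create(spec));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount("http://a.example/", "http://b.example/"));
        System.out.println(distinctCount("http://a.example/", "http://a.example/"));
    }
}
```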

zmzrr over 5 years ago
This is like when `strings' was found to be vulnerable to code execution because it was parsing ELF files.

skissane over 5 years ago
java.net.URL needs @Deprecated(forRemoval=true)

anfilt over 5 years ago
Wow, that seems like a bad idea. Who thought that was a good idea, let alone expected behavior?