科技回声

URLs: It's Complicated

165 points by salutonmundo almost 4 years ago

9 comments

mananaysiempre almost 4 years ago
Just to share a little more of the weirdness (discovered while reading a couple of the historical URL & URI RFCs several days ago):

Per the original spec, in FTP URLs,

- ftp://example.net/foo/bar will get you bar inside the foo directory inside the default directory of the FTP server at example.net (i.e. CWD foo, RETR bar);

- ftp://example.net//foo/bar will get you bar inside the foo directory *inside the empty string directory* inside the default directory of the FTP server at example.net (i.e. CWD, CWD foo, RETR bar; what do FTP servers even do with this?);

- and it's ftp://example.net/%2Ffoo/bar that you must use if you want bar inside the foo directory inside the root directory of the FTP server at example.net (i.e. CWD /foo, RETR bar; %2F being the result of percent-encoding a slash character).
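The mapping described above can be sketched roughly as follows. This is a minimal illustration, not an FTP client: the function name `ftp_commands` is made up for this example, and it assumes the URL carries no `;type=` suffix. The first slash after the host is treated purely as a separator, the remaining path is split on "/", and each piece is percent-decoded only after splitting.

```python
from urllib.parse import urlsplit, unquote


def ftp_commands(url):
    """Return the (command, argument) pairs an FTP URL path maps to.

    Hypothetical helper for illustration; ignores any ";type=" suffix.
    """
    path = urlsplit(url).path
    # The first "/" only separates the host from the url-path.
    if path.startswith("/"):
        path = path[1:]
    # Split first, decode afterwards, so that %2F stays inside one segment.
    segments = [unquote(segment) for segment in path.split("/")]
    *directories, name = segments
    commands = [("CWD", directory) for directory in directories]
    commands.append(("RETR", name))
    return commands


for example in (
    "ftp://example.net/foo/bar",
    "ftp://example.net//foo/bar",
    "ftp://example.net/%2Ffoo/bar",
):
    print(example, ftp_commands(example))
```

Decoding after the split is the important design choice here: decoding first would turn %2F into an ordinary separator and lose the distinction the comment is pointing at.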
leifg almost 4 years ago
It seems like the colon is too ambiguous (it is used as a protocol delimiter, as the delimiter for user/pass, and as the delimiter for the port).

Reminds me a little of Java labels, where you can do this:

```java
public class Labels {
    public static void main(String args[]) {
        https://hn.ycombinator.com
        for (int i = 0; i < 10; i++) {
            System.out.println("......." + i);
        }
    }
}
```

The https: is a label named https and everything after the colon is a comment, so this is valid code.
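To make the three colon roles concrete, here is a small sketch using Python's standard `urllib.parse.urlsplit`. The URL and the helper name `colon_roles` are invented for this example; the path deliberately contains a fourth colon that is not a delimiter at all.

```python
from urllib.parse import urlsplit


def colon_roles(url):
    """Show which URL component each colon ends up delimiting."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # colon after the scheme
        "username": parts.username,  # colon between user and password
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,          # colon between host and port
        "path": parts.path,          # a colon here is plain data
    }


print(colon_roles("https://user:secret@example.com:8080/a:b"))
```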
zepearl almost 4 years ago
All extremely useful: the overview, the examples and the comments.

A few months ago while writing a bot/crawler I searched for hours for something like this, but I found only full specs or just bits and pieces scattered around that used different terminology and/or had different opinions.

In the end I didn't even clearly understand what the max total URL length should be (e.g. mixed opinions here https://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers - come on, a xGiB long URL?) => most of the time 2000 bytes is mentioned but it's not 100% clear.

Writing a bot made me understand 1) why browsers are so complicated and 2) that the Internet is a mess (e.g. once I even found a page that used multiple character encodings...).

My personal opinion is that everything is too lax. Browsers try to be the best ones by implementing workarounds for stuff that does not (yet) have a spec or does not comply with one => this way it can only end up in a mess. A simple example is the HTTP header "Content-Encoding" ( https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding ) which I think should only indicate what kind of compression is being used, but I keep seeing in there stuff like "utf8"/"image/jpeg"/"base64"/"8bit"/"none"/"binary"/etc... and all those pages/files work perfectly in the browsers even if with those values they should actually be rejected...
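A strict crawler could flag such values with a check along these lines. This is only a sketch: `KNOWN_CODINGS` is a hand-picked set of common content codings assumed for this example, not an authoritative registry, and `unknown_codings` is a made-up helper name.

```python
# Assumption: a hand-picked set of common content codings, not a full registry.
KNOWN_CODINGS = {"gzip", "compress", "deflate", "br", "zstd", "identity"}


def unknown_codings(header_value):
    """Return the Content-Encoding tokens that are not recognised codings."""
    tokens = [token.strip().lower() for token in header_value.split(",")]
    return [token for token in tokens if token and token not in KNOWN_CODINGS]


print(unknown_codings("gzip, br"))    # nothing to complain about
print(unknown_codings("image/jpeg"))  # a media type, not a content coding
```

Whether to reject such a response or just log it is a policy choice; browsers evidently choose to carry on.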
surfingdino almost 4 years ago
I have come across even more issues caused by IRIs used incorrectly in place of URIs by a popular web framework, causing havoc with OAuth redirects.

https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier
jfrunyon almost 4 years ago
> making this is a valid URL: https://!$%:)(*&^@www.netmeister.org/blog/urls.html

Uh, no. "%:)" is not <"%" HEXDIG HEXDIG> nor is % allowed outside of that. (Although your browser will likely accept it)

> This includes spaces, and the following two URLs lead to the same file located in a directory that's named " ":
> https://www.netmeister.org/blog/urls/ /f
> https://www.netmeister.org/blog/urls/%20/f
> Your client may automatically percent-encode the space, but e.g., curl(1) lets you send the raw space:

Uh, no. Just because one of your clients is wrong and some servers allow it doesn't mean it's allowed by the spec.

In fact, the HTTP/1.1 RFC defers to RFC2396 for the meaning of <abs_path>: <path_segments> which begin with a /.

What is <path_segments>? A bunch of slash-delimited <segment>s.

What is <segment>? A bunch of <pchar> and maybe a semicolon.

What is <pchar>? <unreserved>, <escaped>, or some special characters (not including space).

What is <unreserved>? Letters, digits, and some special characters (not including space).

What is <escaped>? <"%" hex hex>.

Most HTTP clients and servers are pretty forgiving about what they accept, because other people do broken stuff, like sending them literal spaces. But that doesn't mean it's "allowed", that doesn't mean every server allows it, and that doesn't mean it's a good idea.

> That is, if your web server supports (and has enabled) user directories, and you submit a request for "~username": [it does stuff]

Uh, no. If you're using Apache, that might be true. As you mentioned, this is implementation-defined (as are all pathnames).

> Now with all of this long discussion, let's go back to that silly URL from above: ... Now this really looks like the Buffalo buffalo equivalent of a URL.

Not really.

> Now we start to play silly tricks: "⁄ ⁄www.netmeister.org" uses the fraction slash characters

You are aware that URLs predate Unicode, right? Not to mention that Unicode lookalike characters are a Unicode (or UI) problem, not a URL problem?

> The next "https" now is the hostname component of the authority: a partially qualified hostname, that relies on /etc/hosts containing an entry pointing https to the right IP address.

Or on a search domain (which could be configured locally, or through GPO on Windows, or through DHCP!). Or maybe your resolver has a local zone for it. Or maybe ...
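The grammar chain walked through above (abs_path, path_segments, segment, pchar) is small enough to transcribe into a regular expression. The following is a sketch of the RFC 2396 path rules as I read them, with names (`PCHAR`, `SEGMENT`, `is_valid_abs_path`) chosen for this example; it says nothing about what any particular server accepts.

```python
import re

# pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
# unreserved = letters, digits, and the marks - _ . ! ~ * ' ( )
PCHAR = r"(?:[A-Za-z0-9\-_.!~*'():@&=+$,]|%[0-9A-Fa-f]{2})"
# segment = *pchar *( ";" param ), where param is again *pchar
SEGMENT = rf"{PCHAR}*(?:;{PCHAR}*)*"
# abs_path = "/" path_segments; path_segments = segment *( "/" segment )
ABS_PATH = re.compile(rf"/{SEGMENT}(?:/{SEGMENT})*")


def is_valid_abs_path(path):
    """Check a path against the RFC 2396 abs_path grammar sketched above."""
    return ABS_PATH.fullmatch(path) is not None


print(is_valid_abs_path("/blog/urls/%20/f"))  # percent-encoded space
print(is_valid_abs_path("/blog/urls/ /f"))    # literal space
```

Under this grammar the percent-encoded form matches and the literal space does not, which is the point being made about "accepted" versus "allowed".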
sitdown almost 4 years ago
Layouts using <table>s are complicated too. For example, this page has a ~7800px-wide <pre> tag in a <table> that's 720px wide.
scandinavian almost 4 years ago
Specifically using a different font for the code tag than for the rest of the blog to hide the difference between ⁄⁄ and // seems weird. I get that it wouldn't be interesting otherwise, but doesn't that just show that it's really not as complicated as you make it out to be?
teknopaul almost 4 years ago
URLs are not complicated, unless you complicate them.

foo|foo -foo 's^foo^foo^'"">foo 2>>foo

is not a very good example for teaching the structure of the command line.

Pick a better one.

It's simple.
prepend almost 4 years ago
It doesn’t seem complicated at all. Complicated to me means difficult to understand. This just involves reading the spec and it all seems pretty simple and consistent.

Complicated doesn’t mean “new to me.” If I haven’t read a man page, that doesn’t mean the command is complicated.