> making this is a valid URL: <a href="https://!$%:)(*&^@www.netmeister.org/blog/urls.html" rel="nofollow">https://!$%:)(*&^@www.netmeister.org/blog/urls.html</a><p>Uh, no. "%:)" is not <"%" HEXDIG HEXDIG>, and "%" isn't allowed anywhere outside of that. (Your browser will likely accept it anyway.)<p>> This includes spaces, and the following two URLs lead to the same file located in a directory that's named " ":
> <a href="https://www.netmeister.org/blog/urls/" rel="nofollow">https://www.netmeister.org/blog/urls/</a> /f
> <a href="https://www.netmeister.org/blog/urls/%20/f" rel="nofollow">https://www.netmeister.org/blog/urls/%20/f</a>
> Your client may automatically percent-encode the space, but e.g., curl(1) lets you send the raw space:<p>Uh, no. Just because one of your clients gets it wrong and some servers accept it doesn't mean the spec allows it.<p>In fact, the HTTP/1.1 RFC (RFC 2616) defers to RFC 2396 for the meaning of <abs_path>: a "/" followed by <path_segments>.<p>What is <path_segments>? One or more <segment>s delimited by slashes.<p>What is <segment>? Zero or more <pchar>s, optionally followed by ";"-separated parameters, which are themselves made of <pchar>s.<p>What is <pchar>? <unreserved>, <escaped>, or one of : @ & = + $ , (no space).<p>What is <unreserved>? Letters, digits, and the marks - _ . ! ~ * ' ( ) (no space).<p>What is <escaped>? <"%" hex hex>.<p>Most HTTP clients and servers are pretty forgiving about what they accept, because other people do broken stuff, like sending them literal spaces. But that doesn't mean it's "allowed", that every server accepts it, or that it's a good idea.<p>> That is, if your web server supports (and has enabled) user directories, and you submit a request for "~username": [it does stuff]<p>Uh, no. That might be true if you're using Apache. As you mentioned, this is implementation-defined (as are all pathnames).<p>> Now with all of this long discussion, let's go back to that silly URL from above: ... Now this really looks like the Buffalo buffalo equivalent of a URL.<p>Not really.<p>> Now we start to play silly tricks: "⁄ ⁄www.netmeister.org" uses the fraction slash characters<p>You are aware that URLs predate Unicode, right? Not to mention that Unicode lookalike characters are a Unicode (or UI) problem, not a URL problem?<p>> The next "https" now is the hostname component of the authority: a partially qualified hostname, that relies on /etc/hosts containing an entry pointing https to the right IP address.<p>Or on a search domain (which could be configured locally, through GPO on Windows, or through DHCP!). Or maybe your resolver has a local zone for it. Or maybe ...
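<p>To make the "%:)" point concrete, here's a rough sketch of a userinfo checker. It assumes the RFC 3986 grammar (userinfo = *( unreserved / pct-encoded / sub-delims / ":" ), which is where the HEXDIG notation comes from). `first_bad_userinfo_char` is my own hypothetical helper, not from any library. It returns the index of the first character the grammar rejects:

```python
import string
from typing import Optional

# RFC 3986 section 2.1 / 2.2 / 2.3 character classes
HEXDIG = set(string.hexdigits)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")


def first_bad_userinfo_char(userinfo: str) -> Optional[int]:
    """Hypothetical helper: index of the first character not allowed by
    userinfo = *( unreserved / pct-encoded / sub-delims / ":" ),
    or None if the whole string matches."""
    i = 0
    while i < len(userinfo):
        c = userinfo[i]
        if c == "%":
            # pct-encoded = "%" HEXDIG HEXDIG; a bare "%" is never allowed
            if (i + 2 < len(userinfo)
                    and userinfo[i + 1] in HEXDIG
                    and userinfo[i + 2] in HEXDIG):
                i += 3
                continue
            return i
        if c in UNRESERVED or c in SUB_DELIMS or c == ":":
            i += 1
            continue
        return i
    return None


print(first_bad_userinfo_char("!$%:)(*&^"))    # 2: the "%" followed by ":)"
print(first_bad_userinfo_char("user%3Apass"))  # None: "%3A" is a valid escape
```

The "!" and "$" before it are fine (sub-delims); it's the "%" that isn't followed by two hex digits that makes the userinfo invalid.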
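<p>The RFC 2396 grammar chain for <abs_path> is short enough to transcribe directly. A minimal sketch, assuming you've already stripped any "?query" from the request target. The function names are my own, not from any library:

```python
import string

# RFC 2396 section 2 character classes
ALPHANUM = set(string.ascii_letters + string.digits)
MARK = set("-_.!~*'()")
UNRESERVED = ALPHANUM | MARK
PCHAR_EXTRA = set(":@&=+$,")
HEX = set(string.hexdigits)


def is_pchar_run(s: str) -> bool:
    """True if s matches *pchar (unreserved | escaped | : @ & = + $ ,)."""
    i = 0
    while i < len(s):
        c = s[i]
        if c == "%":
            # escaped = "%" hex hex
            if i + 2 < len(s) and s[i + 1] in HEX and s[i + 2] in HEX:
                i += 3
                continue
            return False
        if c in UNRESERVED or c in PCHAR_EXTRA:
            i += 1
            continue
        return False  # includes a literal space
    return True


def is_segment(seg: str) -> bool:
    # segment = *pchar *( ";" param ), param = *pchar
    # ";" is not a pchar, so splitting on it is unambiguous.
    return all(is_pchar_run(part) for part in seg.split(";"))


def is_abs_path(path: str) -> bool:
    # abs_path = "/" path_segments; path_segments = segment *( "/" segment )
    if not path.startswith("/"):
        return False
    return all(is_segment(seg) for seg in path[1:].split("/"))


print(is_abs_path("/blog/urls/%20/f"))  # True
print(is_abs_path("/blog/urls/ /f"))    # False: raw space
```

Nothing in that chain admits a literal space; the only way to get one into the path is "%20".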
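<p>As for the search domain: here's a simplified sketch of how a search list can turn the bare "https" into a name that actually resolves. It loosely follows the ndots/search behavior that resolv.conf(5) documents for glibc: names with at least `ndots` dots are tried as-is first, otherwise the search list is tried first. `candidate_names` and the example domains are hypothetical. Other resolvers (Windows, systemd-resolved, etc.) have their own rules, and /etc/hosts is consulted separately via nsswitch:

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Hypothetical sketch: the fully qualified names a glibc-style stub
    resolver would try, in order. Not a real resolver API."""
    if name.endswith("."):
        # A trailing dot means "already fully qualified": no search list.
        return [name]
    as_is = name + "."
    searched = [f"{name}.{domain.strip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is before appending search domains.
        return [as_is] + searched
    # Too few dots (e.g. a single label): search list first, as-is last.
    return searched + [as_is]


print(candidate_names("https", ["example.com", "corp.example"]))
# ['https.example.com.', 'https.corp.example.', 'https.']
```

So whether "https" reaches the right host depends on whatever search list the machine got handed (locally, by GPO, or by DHCP), not on /etc/hosts alone.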