Fun with IP address parsing

505 points by mr_tyzic over 4 years ago

38 comments

geoffpado over 4 years ago

> This is the same IP address: 3232271615. You get that by interpreting the 4 bytes of the IP address as a big-endian unsigned 32-bit integer, and print that. This leads to a classic parlor trick: if you try to visit http://3232271615, Chrome will load http://192.168.140.255.

This was the source of one of my favorite “bugs” ever. I was working on multiple mobile apps for a company, and they had a deep link setup that was incredibly basic: <scheme>://<integer>, which would take you to an article with a simple incrementing ID. This deep link system “just worked” on iOS and Android; take the URL, grab the host, parse it as an int, grab that story ID. Windows Phone, however… the integers we were parsing out were totally wrong, returning incredibly old stories!

Turned out that the host we were given by the frameworks from the URL was auto-converted to an IP in dotted-quad format, and then the int parser was just grabbing the last segment… which meant that we were always getting stories <256, instead of the ~40000 range we were expecting.
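A minimal Python sketch of the conversion and of the failure mode described in this comment, using only the standard `ipaddress` module. The function names are hypothetical, the story ID 40000 is just an illustration of the "~40000 range" mentioned above, and `buggy_story_id` is a guess at the shape of the bug, not the actual Windows Phone framework code.

```python
import ipaddress


def int_to_dotted_quad(value: int) -> str:
    """Render an unsigned 32-bit integer as a big-endian dotted quad."""
    return str(ipaddress.IPv4Address(value))


def dotted_quad_to_int(text: str) -> int:
    """Inverse operation: pack the four octets back into one integer."""
    return int(ipaddress.IPv4Address(text))


def buggy_story_id(host: str) -> int:
    """Hypothetical reconstruction of the bug: parse only the last segment."""
    return int(host.rsplit(".", 1)[-1])


print(int_to_dotted_quad(3232271615))   # 192.168.140.255
host = int_to_dotted_quad(40000)        # the framework's "helpful" rewrite
print(host)                             # 0.0.156.64
print(buggy_story_id(host))             # 64
```

Because only the low byte survives, every story ID collapses into the range 0 to 255, which matches the "stories <256" symptom.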


arkadiyt over 4 years ago

These different representations also lead to frequent server side request forgery (SSRF) bypasses - someone might be blocking local IPv4 but you can still access their AWS metadata endpoint at ::ffff:169.254.169.254, etc.

For anyone using Ruby, I'm the author of a gem [1] that comprehensively protects against SSRF bugs. For anyone using Golang I recommend this [2] blog post.

[1]: https://github.com/arkadiyt/ssrf_filter

[2]: https://www.agwa.name/blog/post/preventing_server_side_request_forgery_in_golang
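To illustrate the bypass, here is a small Python sketch that unwraps IPv4-mapped IPv6 addresses before checking them. The function name `is_internal` is hypothetical. This is a simplified illustration under the assumption that the address has already been resolved; it is not a complete SSRF defense (it ignores DNS rebinding, redirects, and other encodings) and it is not how the `ssrf_filter` gem is implemented.

```python
import ipaddress


def is_internal(address: str) -> bool:
    """Return True for loopback, link-local, or private addresses.

    IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) are unwrapped first, so they
    are judged by the IPv4 address they carry rather than by IPv6 rules.
    """
    ip = ipaddress.ip_address(address)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_loopback or ip.is_link_local or ip.is_private


print(is_internal("169.254.169.254"))          # True
print(is_internal("::ffff:169.254.169.254"))   # True
print(is_internal("8.8.8.8"))                  # False
```

A filter that only compares the textual form against an IPv4 blocklist would miss the second case, which is the gap the comment describes.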


lrossi over 4 years ago

Can confirm that visiting http://127.1 on an iPad indeed works and redirects to http://127.0.0.1. This is very surprising and, at least for me, humbling.

I think I will quote this article any time I see someone using regex to validate or parse IPs.
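For readers wondering why `127.1` resolves at all, here is a Python sketch of a legacy-style IPv4 text parser. It is modeled on the classic BSD `inet_aton` behavior as the article describes it (one to four fields, each decimal, octal, or hex, with the last field filling all remaining bytes). The function name `parse_legacy_ipv4` is hypothetical, and real libc implementations differ in edge cases, so treat this as an approximation rather than a reference.

```python
def parse_legacy_ipv4(text: str) -> str:
    """Parse inet_aton-style IPv4 text and return the canonical dotted quad."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 fields, got {len(parts)}")

    values = []
    for part in parts:
        lowered = part.lower()
        if lowered.startswith("0x"):
            digits, base = lowered[2:], 16      # 0x prefix means hex
        elif len(lowered) > 1 and lowered.startswith("0"):
            digits, base = lowered[1:], 8       # leading zero means octal
        else:
            digits, base = lowered, 10
        allowed = "0123456789abcdef"[:base]
        if not digits or any(ch not in allowed for ch in digits):
            raise ValueError(f"bad field {part!r}")
        values.append(int(digits, base))

    # Every field but the last is one byte; the last fills the remaining bytes.
    *leading, last = values
    if any(v > 0xFF for v in leading):
        raise ValueError("leading fields must fit in one byte")
    remaining_bytes = 4 - len(leading)
    if last >= 1 << (8 * remaining_bytes):
        raise ValueError("last field does not fit in the remaining bytes")

    number = last
    for position, v in enumerate(leading):
        number |= v << (8 * (3 - position))
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(parse_legacy_ipv4("127.1"))            # 127.0.0.1
print(parse_legacy_ipv4("192.168.36095"))    # 192.168.140.255
print(parse_legacy_ipv4("3232271615"))       # 192.168.140.255
print(parse_legacy_ipv4("24.7.365"))         # 24.7.1.109
print(parse_legacy_ipv4("192.168.123.034"))  # 192.168.123.28
```

None of these inputs match the usual "four groups of one to three digits" regex, yet a legacy parser accepts all of them.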


z3t4 over 4 years ago

I'm now going to change my LAN to use 10.0.0.1 instead of 192.168.0.1 so that I can just type 10.1. This will help not only when testing stuff on mobiles, only to have to rewrite the whole address again because you forgot http://, but also when telling the kids what IP to connect to when setting up LAN games. Or coworkers, when telling them some LAN/router IP. Time server is on 10.36.


chungy over 4 years ago

> I’m on the fence about that last one, the “IPv6 with an embedded dotted decimal” form. My reference parser (Go’s net.ParseIP) understands it, but it’s not really that useful any more in the real world. At the dawn of IPv6, the idea was that you could upgrade an address to IPv6 by prepending a pair of colons, as in ::1.2.3.4, but modern transition mechanisms no longer offer anything as clear-cut as this, so the notation doesn’t really show up in the wild.

I have to disagree with this conclusion. I see it very frequently on Linux. It turns out that programs can bind their listen address to just ::, and the kernel will still allow connections from IPv4, with the address mapped into ::ffff:0.0.0.0/96 -- outbound connections use the same notation.


AnthonyMouse over 4 years ago

> 1:2:3:4:5:6:77.77.88.88 means 1:2:3:4:5:6:7777:8888

Wait, what? 77.77.88.88 is in dotted decimal. It doesn't correspond to 7777:8888 in hex.

edit: Somebody else already noticed on Twitter:

> And as @alanjmcf noticed, I messed up one of the representations above.

> 1:2:3:4:5:6:77.77.88.88 means 1:2:3:4:5:6:4d4d:5858, not 1:2:3:4:5:6:7777:8888. I missed out a decimal-to-hex conversion in there.
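The decimal-to-hex step is easy to check by hand or in a few lines of Python. In this sketch the helper name `dotted_tail_to_hex` is hypothetical and does no range validation; the second check leans on the standard `ipaddress` module, which accepts the embedded dotted-decimal form.

```python
import ipaddress


def dotted_tail_to_hex(dotted: str) -> str:
    """Convert a dotted-decimal IPv4 tail into the two hex groups it occupies."""
    a, b, c, d = (int(octet) for octet in dotted.split("."))
    # Each pair of octets forms one 16-bit IPv6 group.
    return f"{(a << 8) | b:x}:{(c << 8) | d:x}"


print(dotted_tail_to_hex("77.77.88.88"))                   # 4d4d:5858
print(ipaddress.IPv6Address("1:2:3:4:5:6:77.77.88.88"))    # 1:2:3:4:5:6:4d4d:5858
```

Decimal 77 is 0x4d and decimal 88 is 0x58, which is where `4d4d:5858` comes from.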


j1elo over 4 years ago

> It does not process Class A/B notation, or hex or octal notation.

I got to find that notation useful once, to make a shorter one-liner... without even knowing that there were different classes of IPv4 address, and that I was looking at one of them.

It's a tiny function that gives me the IP address of my machine in the LAN, for either Linux or Mac:

<pre><code># Get main local IP address from the default external route (Internet gateway)
iplan() {
    # Note: "1" is shorthand for "1.0.0.0"
    case "$OSTYPE" in
        linux*)  ip -4 -oneline route get 1 | grep -Po 'src \K([\d.]+)' ;;
        darwin*) ipconfig getifaddr "$(route -n get 1 | sed -n 's/.*interface: //p')" ;;
    esac
}
</code></pre>

(sorry to people reading on small screens)

Full disclosure, I got the "1 is shorthand for 1.0.0.0" from here (which didn't get into explaining why it is a shorthand): https://stackoverflow.com/a/25851186


gumby over 4 years ago

> So, it’s a de-facto standard that boils down to mostly “what did 4.2BSD understand?”

By the way, 4.2BSD was being compatible with older or contemporary implementations, like ITS, which was running TCP before any Unix was.

For example, plenty of machines back then used octal as a preferred human representation. In fact that’s why octal is the default format of numeric constants in C: C, like Unix, was initially developed for an 18-bit (six octal digits) PDP-7. The smaller 16-bit PDP-11 version came later.

lucb1e over 4 years ago

"All possible notations of this IPv4 address": https://lucb1e.com/rp/php/funnip.php?link&ip=80.100.131.150

It was a surprising amount of work to figure out all the different formats an IP address can be shown in and convert a given IP into all those formats.


octoberfranklin over 4 years ago

How about the PGP word list? https://en.wikipedia.org/wiki/PGP_word_list

<pre><code>$ ping stairway scavenger tracker upcoming
PING 209.216.230.240 (209.216.230.240) 56(84) bytes of data.
64 bytes from 209.216.230.240: icmp_seq=1 ttl=50 time=68.2 ms
64 bytes from 209.216.230.240: icmp_seq=2 ttl=50 time=69.5 ms
64 bytes from 209.216.230.240: icmp_seq=3 ttl=50 time=67.2 ms
</code></pre>


phoe-krk over 4 years ago

> Fully canonically, :: is 0000:0000:0000:000:0000:0000:0000:0000.

Nitpick: missed a single zero in the middle there.
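One way to avoid miscounting zeros by hand is to let a library expand the address. A short sketch using Python's standard `ipaddress` module (the variable name is arbitrary):

```python
import ipaddress

unspecified = ipaddress.IPv6Address("::")
print(unspecified.exploded)    # 0000:0000:0000:0000:0000:0000:0000:0000
print(unspecified.compressed)  # ::
```

The exploded form always has eight groups of exactly four hex digits.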


jpxw over 4 years ago

As Go’s net package IP parsing was mentioned, here’s a fun fact: under their API it is impossible to distinguish between an IPv4-mapped IPv6 address and the equivalent normal IPv4 address.


strenholme over 4 years ago

Since I write a Lua-parsed DNS server which works with IPv6, even when compiled for an ancient version of MINGW on Windows XP (which has IPv6 support but no built-in IPv6 parser), I had to write an IPv6 address parser (no inet_pton(), which is what most programs use for IPv6 parsing, on that system).

No, I did not add dotted quad notation to the parser. No, you cannot have more than four hex digits in a single quad; 00000001:2::3 is a syntax error. It supports “normal” stuff like ::, ::1, 2001:db8::1, and even non-normal stuff like “2001-0db8-1234-5678 0000-0000-0000-0005” (to be compatible with the really basic IPv6 parser I put in MaraDNS’s recursive resolver nearly two years ago), but does not support any of the IPv6 corner cases in the linked article.

The IPv6 test cases in the automated test for the parser are at: https://github.com/samboy/MaraDNS/blob/master/deadwood-github/tools/coLunacyDNS/sqa/sqa_ip6Parse/Input (The final three lines are supposed to return errors)
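As a rough illustration of the rules listed in this comment (at most four hex digits per group, `::` supported, no dotted quad), here is a Python sketch of a deliberately narrow IPv6 parser. It is not the commenter's Lua code, the name `parse_strict_ipv6` is hypothetical, and it does not handle the dash-and-space form mentioned above.

```python
import string
from typing import List

HEX_DIGITS = set(string.hexdigits)


def parse_strict_ipv6(text: str) -> List[int]:
    """Parse colon-hex IPv6 text into eight 16-bit groups.

    Deliberately narrow: at most four hex digits per group, at most one "::",
    and no embedded dotted-quad suffix.
    """
    if text.count("::") > 1:
        raise ValueError("more than one '::'")

    def parse_groups(chunk: str) -> List[int]:
        if chunk == "":
            return []
        groups = []
        for field in chunk.split(":"):
            if not 1 <= len(field) <= 4 or not set(field) <= HEX_DIGITS:
                raise ValueError(f"bad group {field!r}")
            groups.append(int(field, 16))
        return groups

    if "::" in text:
        head_text, tail_text = text.split("::")
        head, tail = parse_groups(head_text), parse_groups(tail_text)
        missing = 8 - len(head) - len(tail)
        if missing < 1:
            raise ValueError("'::' must stand for at least one zero group")
        return head + [0] * missing + tail

    groups = parse_groups(text)
    if len(groups) != 8:
        raise ValueError("expected eight groups")
    return groups


print(parse_strict_ipv6("2001:db8::1"))  # [8193, 3512, 0, 0, 0, 0, 0, 1]
```

With these rules, `00000001:2::3` fails on the group-length check and `1:2:3:4:5:6:77.77.88.88` fails because the dotted tail is not a valid hex group.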

thomashabets2 over 4 years ago

I especially love it when address parsers on the same OS don't agree: http://openbsd-archive.7691.n7.nabble.com/inet-net-pton-seems-broken-when-used-with-octal-or-hex-td193971.html


proactivesvcs over 4 years ago

I'm not convinced these are "cursed". They may be the result of bygone networking conventions, implementation ideas that never came to mainstream fruition, flexibility for use-cases etc. Just because we don't understand something that looks strange, doesn't mean it's cursed, nor that one can simply turn one's nose up and say "I don't understand why these exist so I'll just ignore them when I implement x".


skeletonjelly over 4 years ago

I think they've got Class A/B/C wrong? Or at least they're using it in a way that I never learnt.

> The familiar 192.168.140.255 notation is technically the “Class C” notation. You can also write that address in “class B” notation as 192.168.36095, or in “Class A” notation as 192.11046143. What we’re doing is coalescing the final bytes of the address into either a 16-bit or a 24-bit integer field.

According to this article, which matches my understanding, classes refer to the address ranges, not so much to grouping the latter part: https://www.digitalocean.com/community/tutorials/understanding-ip-addresses-subnets-and-cidr-notation-for-networking

Happy to be corrected!
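The coalescing in the quoted passage is plain bit arithmetic. The Python sketch below reproduces the numbers from the quote; the function name `class_notations` is hypothetical and the dictionary keys follow the article's wording. As a hedged aside, the naming appears to mirror the classful network/host split (a class A address has a 24-bit host part, a class B address a 16-bit one), which may be how the two usages connect.

```python
def class_notations(dotted: str) -> dict:
    """Rewrite a dotted quad by coalescing its trailing bytes into one field."""
    a, b, c, d = (int(octet) for octet in dotted.split("."))
    return {
        "class C": f"{a}.{b}.{c}.{d}",                        # four 8-bit fields
        "class B": f"{a}.{b}.{(c << 8) | d}",                 # last field is 16 bits
        "class A": f"{a}.{(b << 16) | (c << 8) | d}",         # last field is 24 bits
        "integer": str((a << 24) | (b << 16) | (c << 8) | d), # one 32-bit field
    }


for name, text in class_notations("192.168.140.255").items():
    print(f"{name}: {text}")
# class C: 192.168.140.255
# class B: 192.168.36095
# class A: 192.11046143
# integer: 3232271615
```

For example, 140 * 256 + 255 = 36095, and 168 * 65536 + 36095 = 11046143.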


m463 over 4 years ago

A "fun" use of IP addresses is in NTP. In the NTP config file, you will have stuff like this:

<pre><code>server 127.127.1.0 # local clock
</code></pre>

or:

<pre><code>server 127.127.20.0 minpoll 4 iburst prefer # gps clock
</code></pre>

where the "IP address" is of the form 127.127.<clocktype>.<instance>. Here's a page explaining the clock types: https://www.eecis.udel.edu/~mills/ntp/html/refclock.html

But basically it's a weird anachronism. I'm not sure if NTP will actually bind to those addresses using the TCP/IP stack, or if someone just got lazy and co-opted the IP address parser for off-label use.
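To make the 127.127.<clocktype>.<instance> layout concrete, here is a small Python sketch that splits such a pseudo-address into its two meaningful fields. The names `RefClock` and `parse_refclock` are hypothetical, and this says nothing about how ntpd itself handles these addresses internally.

```python
from typing import NamedTuple


class RefClock(NamedTuple):
    clock_type: int  # third octet, e.g. 1 for the local clock in the example above
    instance: int    # fourth octet, the instance number of that clock type


def parse_refclock(address: str) -> RefClock:
    """Split an NTP reference-clock pseudo-address of the form 127.127.t.u."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or octets[:2] != [127, 127]:
        raise ValueError(f"not a 127.127.t.u reference clock address: {address!r}")
    if not all(0 <= octet <= 255 for octet in octets[2:]):
        raise ValueError("clock type and instance must each fit in one byte")
    return RefClock(clock_type=octets[2], instance=octets[3])


print(parse_refclock("127.127.1.0"))   # RefClock(clock_type=1, instance=0)
print(parse_refclock("127.127.20.0"))  # RefClock(clock_type=20, instance=0)
```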

kortilla over 4 years ago

What is the use-case of a decimal representation of a v6 address or a 32-bit int representation of an ipv4 address?

I’ve never had someone tell me, “see if you can ping 143267841”. I’ve worked in networking for coming up on 30 years now and just haven’t found the use.


tomcooks over 4 years ago

Boomers like me know all of the IPv4 obfuscation techniques thanks to Fravia's Searchlores, may he forever rest in peace. https://www.theoryforce.com/fravia/searchlores/obscure


abotsis over 4 years ago

Wow, this. One thing I didn’t see mentioned was “0”. You mentioned it, but didn’t tie it to something I know works in some implementations: “ping 0” behaves like “ping 127.0.0.1”.


vzaliva over 4 years ago

That's why things like the textual representation of IP addresses need to be rigorously and formally specified using an unambiguous syntax notation. Implementations can then be formally verified to comply with this syntax spec. In the end, I would love to have a formally verified library implementation of an IP address parser for the major mainstream programming languages, which everybody could rely upon instead of writing their own parser. That's a dream.

jtvjan over 4 years ago

I wrote a little applet where you can put in a class A decimal IP address, and it gives you the 3×4 representations mentioned in the article: https://jtvjan.nl/tools/cursed_ipv4.html

If you count mixed representations, there would be 120 possibilities, but the tool doesn't generate those.

beaugunderson over 4 years ago

I maintain a JavaScript library that does exactly this (called ip-address). Unit tests are very important for handling the esoteric formats, though there are a couple that were new to me in David's post.

One of my motivations for writing the library was being able to grep for IPv6 addresses in text files; it's surprisingly difficult to match all valid representations of a simple IPv6 address, as seen in the example here: https://twitter.com/beaugunderson/status/527393872909828096

I also maintain a site for examining IPv6 addresses that may be useful to people working with IPv6: http://v6decode.com/

peteretep over 4 years ago

Raging debate recently at our coworking space about whether 24.7.365 is a valid IP (you can certainly ping it).


jweather over 4 years ago

I spent hours debugging an issue that boiled down to an IPv4 parser that treated leading zeroes as octal. Connections to 192.168.123.100 worked as expected. Connections to 192.168.123.034 went to 192.168.123.28. I was sure it was an issue in my TCP client code, which was handling connections to hundreds of different devices.

The guilty party was the Poco::Net library, if I recall correctly. I can maybe see this making sense if you provide four octal digits (0377), but not three, and I have a hard time believing anybody has ever used this on purpose.
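One defensive option against this class of bug is a strict dotted-quad parser that refuses leading zeros outright instead of guessing between decimal and octal. The Python sketch below shows the idea; the name `parse_strict_dotted_quad` is hypothetical and this is not what Poco::Net does.

```python
from typing import Tuple


def parse_strict_dotted_quad(text: str) -> Tuple[int, int, int, int]:
    """Accept only four plain decimal octets; reject anything ambiguous."""
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("expected exactly four fields")
    octets = []
    for field in fields:
        if not (field.isascii() and field.isdigit()):
            raise ValueError(f"non-decimal field {field!r}")
        if len(field) > 1 and field[0] == "0":
            # "034" could mean 34 (decimal) or 28 (octal), so refuse it.
            raise ValueError(f"ambiguous leading zero in {field!r}")
        value = int(field)
        if value > 255:
            raise ValueError(f"field {field!r} out of range")
        octets.append(value)
    return (octets[0], octets[1], octets[2], octets[3])


print(int("034"), int("034", 8))                    # 34 28
print(parse_strict_dotted_quad("192.168.123.100"))  # (192, 168, 123, 100)
```

Under this rule `192.168.123.034` raises an error instead of silently connecting to `.28` or `.34`.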


SoSoRoCoCo over 4 years ago

> a big-endian unsigned 32-bit integer

This is how embedded stacks (lwIP) store IPv4. Didn't know browsers could respond to it, though.

Mixing IPv4 and IPv6 is just evil.


Sami_Lehtinen over 4 years ago

Reminds me of email addresses; most sites are doing it wrong. *There clearly should be a common library to take care of these things, which are way too complex for most developers.* https://en.wikipedia.org/wiki/Email_address#Examples

sloshnmosh over 4 years ago

This dec to hex to ASCII online converter might be helpful: https://www.rapidtables.com/convert/number/ascii-hex-bin-dec-converter.html

rkagerer over 4 years ago

This is great! If I'm honest with myself, one thing keeping me from configuring IPv6 as an option locally was the intimidating addresses. This is a great explainer, I finally feel like I "get it".

ChrisMarshallNY over 4 years ago

Might find this project interesting: https://github.com/RiftValleySoftware/RVS_IPAddress

intc over 4 years ago

Somewhat related: A simple IPv6 subnet calculator written in Lua: https://github.com/intc/ip6snetc

ipv4dhcp over 4 years ago

At what point is the format parsed? Is http://3232249868 sent to the router or converted to 192.168.56.12 in the browser?

tzs over 4 years ago

I've long thought it would be amusing to arrange to have both the phone number xxx-yyy-zzzz and the IP addresses xxx.yyy.zzzz and xxx.yyyzzzz.

jancsika over 4 years ago

Where are the weirdo IPv4 forms used in practice?


kpcyrd over 4 years ago

It would be nice if we could deprecate some of them instead of embracing those "cute" standards.

erk__ over 4 years ago

They also wrote it up as a blog post, which may be nicer to read: https://blog.dave.tf/post/ip-addr-parsing/


known over 4 years ago

https://www.php.net/manual/en/function.ip2long and https://www.php.net/manual/en/function.long2ip.php in PHP

londons_explore over 4 years ago

Writing a parser and saying "I'm dropping support for all these old ways of doing things" seems like poor form.

Unless there is a big reason, never drop backwards compatibility. In this case, supporting all those forms would be very doable. The best way to support them would be to find some old BSD parsing code and port it; then you can be sure every corner case is handled the exact same way. Handling corner cases differently is a great way to introduce security vulnerabilities and crash/DoS bugs that every user of your library will have to be aware of.

The cost of maintaining such code isn't really a good excuse here either: the code is only going to be a few thousand lines, is self-contained with no dependencies, is easy to test, is not going to change much with time, etc.

Basically, there is no benefit to removing this feature, so don't break what isn't broken.
