Potholes to avoid when migrating to IPv6

146 points by deafcalculus over 6 years ago

22 comments

GlitchMr over 6 years ago

Don't manually parse those things. If your database provides a way to parse those (like PostgreSQL does), make use of it. Use what your programming language provides, and if it really doesn't provide the functionality to parse IPv6 addresses, use a library to do so.

For example, in Rust you can write this:

<pre><code>use std::error::Error;
use std::net::{AddrParseError, SocketAddr};

// Try the "address:port" form first; if that fails, treat the input as a
// bare IP address and attach the default port 10443.
fn parse_address(addr: &str) -> Result<SocketAddr, AddrParseError> {
    addr.parse()
        .or_else(|_| Ok(SocketAddr::new(addr.parse()?, 10443)))
}

fn main() -> Result<(), Box<dyn Error>> {
    let addresses = [
        "[2001:0db8:f00f::0553:1211:0088]:10444",
        "2001:0db8:f00f::0553:1211:0088",
    ];
    for address in &addresses {
        println!("{:?}", parse_address(address)?);
    }
    Ok(())
}
</code></pre>

This isn't specific to IPv6, by the way. It also applies to other standards like CSV (although I suppose with CSV it does vary; I saw so many broken CSV files that sometimes a custom implementation is the best way to go to parse those).

zAy0LfpBZLC8mAC over 6 years ago

Well, I mean, yes, those are all mistakes people are going to make, no doubt about that. But somehow the solution is missing?

The core mistake here is using string manipulation at all. The only correct way to handle string data is to parse it, and then either operate on the parsed data structure, or re-serialize into a canonical format and use that for comparisons and stuff.

And in the best case, you don't try to build your own parser, but use a well-tested one that's already there. For the particular use case of parsing IP addresses, it's probably best to use getaddrinfo() with AI_NUMERICHOST, and for re-serialization getnameinfo() with the corresponding NI_NUMERICHOST flag. If those don't understand the address, you most likely won't be able to connect anyway. And they will handle stuff like link-local addresses correctly, at least as long as you are on the host that actually has the respective interface.

For databases, you use column types intended for storing IP addresses, so the database will do the parsing and canonicalization for you.

And when you actually have to build a parser yourself, read the damn spec for the format instead of going by what you think the format is, because most likely it's not that.

And mind you that most of those problems are not really IPv6-specific. There are also many ways to write an IPv4 address that your average parser will understand. Most of those are generally frowned upon, so they don't occur often, but if you want to reliably compare IPv4 addresses, you actually need to do the same as for IPv6.
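
The getaddrinfo()/getnameinfo() round trip described in this comment can be sketched in Python, whose socket module wraps both calls. This is only an illustration: the function name `canonicalize` is made up here, and the exact output text comes from the platform's libc, so it may differ slightly between systems.

```python
import socket


def canonicalize(addr: str) -> str:
    """Parse a numeric IP address and re-serialize it via the system resolver.

    AI_NUMERICHOST forbids DNS lookups, so anything that is not a numeric
    address raises socket.gaierror instead of being resolved.
    """
    infos = socket.getaddrinfo(
        addr, None, type=socket.SOCK_STREAM, flags=socket.AI_NUMERICHOST
    )
    sockaddr = infos[0][4]  # (host, port) for IPv4, (host, port, flow, scope) for IPv6
    # NI_NUMERICHOST is the getnameinfo() counterpart: no reverse DNS lookup.
    host, _service = socket.getnameinfo(
        sockaddr, socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    )
    return host


if __name__ == "__main__":
    for text in ("2001:0DB8:0000:0000:0000:0000:0000:0001", "192.168.2.8"):
        print(text, "->", canonicalize(text))
```

Comparing the returned strings (rather than the raw input) is what makes two spellings of the same address compare equal.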

nieve over 6 years ago

It won't solve most of the problems, but if you're using PostgreSQL, its native inet datatype (which supports 4 or 6) instead of a text or binary string can save a world of pain, and there are alternatives in some other RDBMSs:

<a href="https://www.postgresql.org/docs/current/datatype-net-types.html#DATATYPE-INET" rel="nofollow">https://www.postgresql.org/docs/current/datatype-net-types.h...</a>
<a href="https://www.postgresql.org/docs/current/functions-net.html" rel="nofollow">https://www.postgresql.org/docs/current/functions-net.html</a>

First-class network and IP types that properly support contains and exclusion operators make a lot of things less error-prone and potentially much faster.

peterwwillis over 6 years ago

Don't try to parse things without a standard. Even CSV and e-mail addresses are more complicated than they seem.

Also, it's pretty silly that we still use these unintuitive conventions from 40 years ago for modern systems. Is 192.168.2.8:10443 an address? A phone number and extension? Is it TCP, UDP? IPv4, IPv6? An HTTPS service, or just something resembling its decimal notation assigned service number? Are there multiple services proxied behind this one address? Can I route between them? When I request a URI, does the application know what I really want/expect? What about a timeout for my request? What about authentication/authorization? Consistency requirements? Idempotence? Security guarantees?

Operating systems don't even take <host>:<port> arguments for network syscalls; that's just a convention we sort of came up with and later stuck to. But as a URL it's pretty crap. I suggest we replace them with modern URLs that can embed tiered information such as session IDs, service types, routes, security requirements, operational parameters, etc. Most people may only need <a href="https://google.com/" rel="nofollow">https://google.com/</a>, but sometimes we may also want to request webv2+uquic+v6:/SC,TLSv1.3/userid[s:84742049]@google.com(r:NA)/ . I know that's ugly as sin, but hopefully people wouldn't need to specify all of that all of the time (service name/version, transport, address, strong consistency, TLS 1.3, userid, session id, host/namespace, North American region).

donatj over 6 years ago

It’s certainly been said, but IPv6 to my eyes is awash in second-system syndrome, which has largely slowed its adoption.

The complexity of handling addresses is plainly a failure of the design. Using colons for separators when they’re already being used for ports served no purpose but to confuse. Having more than a single valid form of an address again only serves to confuse.

If there’s anything to learn from the UNIX principles, it’s that there is great power in making things easily manipulated as strings. The design of IPv6 makes this impossible.

mgerdts over 6 years ago

Not just IPv6.

<pre><code>$ ping 127.1
PING 127.1 (127.0.0.1): 56 data bytes
...
$ ping $(((127 << 24) + 1))
PING 2130706433 (127.0.0.1): 56 data bytes
...
</code></pre>

I've primarily seen the second form in the early days of the web when spammers were presumably trying to bypass mail or web filters that were scanning for blacklisted IPs.

Edit: fix formatting
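
For filters that want to avoid being fooled by these spellings, one option is a parser that only accepts the four-octet form. A small Python sketch, assuming the standard ipaddress module is acceptable (the helper name `is_strict_ipv4` is invented for this example):

```python
import ipaddress


def is_strict_ipv4(text: str) -> bool:
    """Return True only for the usual four-octet dotted-decimal spelling."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


# The arithmetic from the shell example above.
packed = (127 << 24) + 1

if __name__ == "__main__":
    print(packed)                         # 2130706433
    print(ipaddress.IPv4Address(packed))  # 127.0.0.1 (integer input, not a string)
    for candidate in ("127.0.0.1", "127.1", "2130706433"):
        print(candidate, is_strict_ipv4(candidate))
```

The short and decimal string forms are rejected here, while ping and other tools built on the older inet_aton()-style parsing typically accept them, which is exactly the mismatch a filter has to watch for.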

swinglock over 6 years ago

The problem was (and remains) programmers thinking addresses are strings.

magicalhippo over 6 years ago

I'm really stumped as to why they didn't eliminate ports when designing IPv6. Having the last bits in the address take the role of ports makes sense given the hierarchical nature of IPv6 addresses. The first bits specify the network (my home), next bits the device, and last bits the service/endpoint on that device.

It would also make parsing much easier given that they chose : as separator in the addresses.

z3t4 over 6 years ago

Another gotcha is that IPv4 and IPv6 have different firewall tables. So if you have, for example, blocked all traffic besides port 80, you also need to do it on IPv6!

fanf2 over 6 years ago

All of this flailing around would have been avoided by either thinking carefully about canonical representations, or reading <a href="https://tools.ietf.org/html/rfc5952" rel="nofollow">https://tools.ietf.org/html/rfc5952</a>

signa11 over 6 years ago

fwiw, RFC 5952, specifically section 6, outlines recommended string representations. Also, it is generally much nicer to use 'sockaddr_storage' as the underlying representation of AF_{UNIX,INET,INET6} endpoints rather than building abstractions to tide over the differences between the 'sockaddr_*' types.
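
RFC 5952 section 6 recommends the bracketed form when an IPv6 address is combined with a port. A minimal Python sketch of a formatter that follows that convention; `format_endpoint` is a hypothetical helper name, and the canonical address text is whatever the ipaddress module produces:

```python
import ipaddress


def format_endpoint(host: str, port: int) -> str:
    """Join host and port, bracketing the host only when it is an IPv6 literal."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: treat it as a hostname and leave it untouched.
        return f"{host}:{port}"
    if ip.version == 6:
        return f"[{ip}]:{port}"
    return f"{ip}:{port}"


if __name__ == "__main__":
    for host in ("2001:0db8::1", "192.168.2.8", "bigdata.example.org"):
        print(format_endpoint(host, 10443))
```

Because the address goes through a parser before being printed, the output also picks up the compressed lower-case spelling instead of echoing the input.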

contravariant over 6 years ago

Maybe this is a stupid question at this point, but if using colons introduces this degree of ambiguity, why not stick with simple periods? Or any other separator, for that matter.

nly over 6 years ago

Even putting aside the ambiguity of using ':' as an address:port delimiter, there are differences between platforms in parsing just IPv6 addresses.

E.g., on Linux/glibc 2.28:

<pre><code>[cling]$ #include <arpa/inet.h>
[cling]$ ::in6_addr dst;
[cling]$ dst
(::in6_addr &) @0x7ff318aff010
[cling]$ inet_pton (AF_INET6, "1234:1234:1234:1234:1234::1234:8.8.8.8", &dst)
(int) 0
[cling]$ inet_pton (AF_INET6, "1234:1234:1234:1234:1234:1234:8.8.8.8", &dst)
(int) 1
</code></pre>

Here '::' in this 'full' address is (correctly) rejected.

However, although I haven't verified this on FreeBSD (perhaps someone can?), there's a comment in the libc source suggesting that this will be accepted there:

<a href="https://github.com/freebsd/freebsd/blob/1d6e4247415d264485ee94b59fdbc12e0c566fd0/lib/libc/inet/inet_pton.c#L127" rel="nofollow">https://github.com/freebsd/freebsd/blob/1d6e4247415d264485ee...</a>

Parsing sucks.
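
One way to sidestep libc differences is a parser that does not call the platform's inet_pton() at all. Python's ipaddress module is written in Python and does its own parsing, so the two inputs above should be treated the same way on every platform. A small sketch (the helper name `is_valid_ipv6` is invented here):

```python
import ipaddress


def is_valid_ipv6(text: str) -> bool:
    """Validate an IPv6 literal without going through the C library."""
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # '::' plus eight groups' worth of data leaves nothing for '::' to stand for.
    print(is_valid_ipv6("1234:1234:1234:1234:1234::1234:8.8.8.8"))
    # Six hex groups plus an embedded IPv4 address fill all 128 bits.
    print(is_valid_ipv6("1234:1234:1234:1234:1234:1234:8.8.8.8"))
```

This does not say which libc is right, only that validating in one place gives the same answer everywhere the program runs.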

throwaway12iii over 6 years ago

Obviously the point of the article is to not manually parse IP addresses.

Well actually, Python can do that just as well as all the other languages.

<pre><code>>>> import ipaddress
>>> ipaddress.ip_address('2001:db8::')
IPv6Address('2001:db8::')
</code></pre>

pdkl95 over 6 years ago

The fundamental problem in the example:

<pre><code>leader_host = bigdata.example.org:10443
</code></pre>

is that ":10443" is not part of the hostname. The field is called "leader_host"; if a port is needed, it should use its own field instead of trying to overload the host field.

<pre><code>leader_host = bigdata.example.org
leader_port = 10443
</code></pre>

(And as others have already mentioned, don't write your own parser when they already exist in your stdlib/etc.)
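
With separate fields, reading the config needs no address splitting at all, even when the host is an IPv6 literal. A rough Python sketch using configparser; the `[cluster]` section name and the `load_leader` helper are assumptions made up for the example, not part of the original config:

```python
import configparser

# Hypothetical config text; only the two key names come from the comment above.
CONFIG_TEXT = """
[cluster]
leader_host = 2001:db8:f00f::553:1211:88
leader_port = 10443
"""


def load_leader(text: str):
    """Return (host, port) from the two separate config fields."""
    # Restrict the key/value delimiter to '=' so colons in values stay untouched.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(text)
    section = parser["cluster"]
    return section["leader_host"], section.getint("leader_port")


if __name__ == "__main__":
    host, port = load_leader(CONFIG_TEXT)
    print(host, port)
```

No brackets and no guessing about which colon introduces the port are needed, because the port never shares a string with the address.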

ta57746uhhj over 6 years ago

IPv6 was over-engineered. We need IPv5 which just fixes the address space, then have a rethink about v7 ... without all the nonsense.

mrunkel over 6 years ago

Why not split the parameters? Address and port as separate values makes more sense to me.

johnklos over 6 years ago

Canonicalize everything, then compare the canonicalized results. We've got this ;)
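
As a concrete version of "canonicalize, then compare", here is a short Python sketch. It assumes the ipaddress module's parsed objects are an acceptable canonical form; `same_address` is a hypothetical helper name:

```python
import ipaddress


def same_address(a: str, b: str) -> bool:
    """Compare two address strings by their parsed value, not their spelling."""
    return ipaddress.ip_address(a) == ipaddress.ip_address(b)


if __name__ == "__main__":
    left = "2001:0db8:f00f::0553:1211:0088"
    right = "2001:DB8:F00F:0:0:553:1211:88"
    print(left == right)              # plain string comparison
    print(same_address(left, right))  # comparison after parsing
    print(ipaddress.ip_address(left)) # canonical text form, usable as a dict key
```

Storing the canonical text (or the parsed object) instead of the user's spelling means later lookups and deduplication do not need to repeat the normalization.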

peanut-walrus over 6 years ago

':' is also a perfectly valid character in a domain name, so the naive splitting already does not work in the IPv4 ecosystem.

ggm over 6 years ago

I've taken to using python split(':')[:-1] and like constructs
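
For comparison, a sketch that leans on the standard library instead of slicing on ':'. It assumes inputs look like the ones discussed in this thread (a bare address, `host:port`, or `[v6]:port`); `split_host_port` and `DEFAULT_PORT` are names invented for the example, and urlsplit is used here only as a convenient bracket-aware splitter:

```python
import ipaddress
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_PORT = 10443  # default port borrowed from the thread's examples


def split_host_port(value: str, default_port: int = DEFAULT_PORT) -> Tuple[str, int]:
    """Split 'host', 'host:port' or '[v6]:port' into (host, port)."""
    # A bare IP address (v4 or v6) carries no port at all.
    try:
        return str(ipaddress.ip_address(value)), default_port
    except ValueError:
        pass
    # Prefixing '//' makes urlsplit treat the value as a network location.
    parts = urlsplit("//" + value)
    if parts.hostname is None:
        raise ValueError(f"no host in {value!r}")
    port = parts.port if parts.port is not None else default_port
    try:
        host = str(ipaddress.ip_address(parts.hostname))  # canonicalize IP literals
    except ValueError:
        host = parts.hostname  # a hostname, left as-is
    return host, port


if __name__ == "__main__":
    for text in (
        "[2001:0db8:f00f::0553:1211:0088]:10444",
        "2001:0db8:f00f::0553:1211:0088",
        "bigdata.example.org:10443",
    ):
        print(split_host_port(text))
```

Malformed input such as an unbracketed IPv6 address followed by a port ends up as a ValueError rather than a silently wrong host.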

emilfihlman over 6 years ago

IPv6 is a clusterfuck of "hey let's add x!" and "let's forget reasonable previous experience!".

thisacctforreal over 6 years ago

Call me a luddite but I'm not looking forward to IPv6.

How about IPv5; add another byte.

We can keep the colons 192:168:1:1:1.