> <i>GET / HTTP/1.0</i><p>> <i>Host: www.youtube.com</i><p>> <i>We send it in 2 parts: first comes GET / HTTP/1.0 \n Host: www.you and second sends as tube.com \n .... In this example, ISP cannot find blocked word YouTube in packets and you can bypass it!</i><p>If you told anyone from China that <i>this</i> is how you bypass (HTTP) "deep packet inspection", it would sound incredibly naive. I'm not criticizing here (thanks for developing an anti-censorship tool), but my point is: any DPI that can be bypassed this way is simply too outdated; it's far from the state-of-the-art threats we are facing worldwide.<p>What China does today is what your ISP/government is going to do tomorrow, when they upgrade their equipment. A history lesson from China can give developers in other countries insight into where this cat-and-mouse game is heading...<p>> paulryanrogers: So basically it just does two things: carefully chunking HTTP header packets and encrypted DNS? Not sure this will work for very long.<p>Of course it will not. I'll explain why.<p>---<p>Literally the same technique was used in China during the early days of the Great Firewall, around 2007. At that time, the "censorship stack" was simple. Basically, it had...<p>* A brute-force IP blocking list<p>This was a constantly updated list of IP addresses of "unwanted" web servers, such as YouTube or Facebook, distributed via BGP just like normal routing information. Once your server entered this blacklist, nothing could be done. Not all unwanted websites entered the list, due to its computational/storage costs.<p>* A DNS poisoning infrastructure<p>A list of "unwanted" domain names was maintained. These domain names were entered into the national DNS root servers as records with bogus IP addresses.
It was used more widely than the IP blocklist, since it cost almost nothing to operate, but it could only block websites on the list, and it took time for the censors to become aware of a target's existence.<p>* A naive keyword filtering system<p>All outgoing international traffic was mirrored for inspection. A keyword inspection system attempted to match the URLs in HTTP requests against a blacklist of unwanted keywords. Rumor had it that the string matching was performed in hardware (ASIC/FPGA), allowing enormous throughput.<p>* A TCP reset attack system<p>Once an unwanted TCP connection was identified by the keyword inspection system, the TCP reset attack system fired a bogus RST packet at your computer. Fooled by the packet, your operating system would voluntarily work against you and terminate the connection, saving the censors' CPU time. The keyword filtering system paired with the reset attack was the preferred way to carry out censorship.<p>That's all. The principle of operation was simple and easy to understand. So what were the options for bypassing it? There were a lot. To begin with, the blocked IP addresses were simply unreachable; you could do nothing about them directly. But in the earliest days, accessing them was as simple as finding a random HTTP proxy server. Later, the inspection system was upgraded to match HTTP proxy requests as well. Then you could simply play some magic tricks with your HTTP requests, like the example at the beginning, so that your request wouldn't trigger a match. Around the same time, in-browser web proxy tools became popular: PHP scripts running on a web server that fetched pages on your behalf. However, they became useless when the keyword matching system was upgraded to match the content of the entire page, not just the requests (remember, few sites had HTTPS).
At this point, all plaintext proxy techniques and HTTP request "massaging" techniques were officially dead.<p>Some naive rot13-like obfuscation was later added to some web proxies, and HTTPS web proxies were also a thing, but they saw limited use.<p>* New: A complete keyword filtering system - Inspect all HTML pages (Was: A naive keyword filtering system)<p>Another target to attack was the DNS poisoning system. Sometimes all you needed was a correct IP address, since not all IPs were included in the blocklist, due to the costs. Initially, all one needed to do was change one's nameserver to 8.8.8.8. However, countermeasures were quickly deployed. A simple one was rerouting 8.8.8.8 to the ISP's nameserver, which kept feeding you the same bogus records. Nevertheless, there were always alternative resolvers to use. So the system was upgraded into a real-time DNS spoofing infrastructure: the instant an outgoing DNS packet was detected, the spoofing system would immediately answer with a bogus packet. The real answer would arrive a hundred milliseconds later, but it would be too late; your OS had already accepted the bogus result.<p>And ironically, even if DNSSEC had been widely supported (it was not), it couldn't have done anything but return a SERVFAIL: DNSSEC can only check whether a result is authentic; dropping the bogus packet and accepting the true one was outside the capabilities of a standard DNSSEC implementation.<p>* New: A Real-time DNS Spoofing System<p>Better tools were developed later that acted as a transparent resolver between the upstream resolver and your computer, identifying the bogus results and dropping them, but their use was limited. Also, at this point, the IP blocklist had been greatly expanded. Even if a correct IP could be obtained, it was still inaccessible.
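Those transparent-resolver tools exploited a timing asymmetry: the injected answer arrives almost instantly because the injector sits on-path near you, while the genuine answer has to make the full round trip to the real resolver. A minimal sketch of that filtering idea (the function name and the 50 ms threshold are my own illustration, not any real tool's API):

```python
def pick_genuine_answer(answers, min_rtt_ms=50):
    """Drop DNS answers that arrive suspiciously fast.

    answers: list of (arrival_ms_after_query, payload) tuples.
    An on-path injector's bogus reply tends to beat the real resolver's
    reply by a wide margin, so anything faster than min_rtt_ms is
    treated as spoofed and discarded. Returns the first surviving
    payload, or None if every answer was discarded. The threshold is
    hypothetical and would need tuning per network.
    """
    for arrival_ms, payload in sorted(answers):
        if arrival_ms >= min_rtt_ms:
            return payload
    return None

# Spoofed reply races in at ~5 ms; the real one arrives at ~120 ms.
print(pick_genuine_answer([(5, "203.0.113.1 (bogus)"),
                           (120, "208.80.154.224 (real)")]))
# → 208.80.154.224 (real)
```

In practice such tools combined the timing heuristic with sanity checks on the answer itself (e.g. a list of known-bogus IPs), since a fixed threshold alone misfires against genuinely fast resolvers.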
Around 2008 or so, a special open source project was launched by developers in China: an /etc/hosts list. Whenever someone found a Facebook IP address that was not in the blocklist yet, they sent a patch to the project. There were also shell scripts to keep your list up to date.<p>However, the usefulness of an /etc/hosts list was limited. First, it was only a matter of time before a new IP address was blocked. Also, a working IP address was still subject to the same keyword filtering system.<p>* New: Expanded IP Blocklist<p>Some people also realized that the firewall could only terminate a connection by fooling the operating system. Soon, iptables rules for dropping RST packets appeared on technical blogs. By ignoring all RST packets, one essentially gained immunity, at the expense of network stability, since legitimate RSTs were also ignored. Soon the censors responded by upgrading the reset attack system so that RST packets were sent in both directions: even if you ignored the RST, the server on the other side would still terminate the connection. Also, the reset was now "latched on" for a limited time: once the first RST was triggered, the target remained inaccessible for several minutes.<p>* New: Bidirectional TCP Reset Attack<p>* New: "Latched-On" Reset Attack<p>When HTTPS was enabled, it was impossible to perform keyword inspection on the HTML pages. At this time, the censors sometimes still wished to allow partial access, triggering a block only when a match was detected. That strategy could not be applied to HTTPS, since the content was all encrypted. Some people realized that some popular websites, such as Wikipedia, supported HTTPS but did not enable it by default.
The Great Firewall responded by implementing an HTTPS certificate matching subsystem in the keyword matching system: when a particular certificate was matched, you were greeted by a TCP RST packet (this subsystem was removed later, when HTTPS saw widespread use).<p>* New: Certificate-Based HTTPS Blocking System<p>At this point, around 2010, the only reliable way to browse the web was a fully encrypted proxy, such as SSH dynamic port forwarding or a VPN, which required purchasing a VPS from a hosting provider. SSH was more popular due to its ease of use: all one needed was an SSH server and "ssh -D 1337", and port 1337 would become a SOCKS5 proxy provided by OpenSSH. OpenVPN was reserved for heavy users, since it was more difficult to set up, but it had better performance.<p>From then through the early 2010s, anyone using a VPN or SSH could enjoy reliable web browsing (disturbed only from time to time by overloaded international links). However, the good days came to an end when the Great Firewall implemented a real-time traffic classifier, first applied to SSH. It observed SSH packets in real time and attempted to identify whether overlay proxy traffic was being carried on top. The blocking mechanism was enhanced as well: it could now dynamically insert null-route entries when it decided that communication with a server was unwanted. The IP blocking system was also improved; it could now collect unwanted IP addresses at a faster rate with the help of the traffic classifier. If you used SSH as a proxy, after a while the connection would be identified and all packets dropped; repeated offenses would earn you a permanent IP block. For VPNs, the firewall implemented a real-time classifier to detect OpenVPN's TLS handshakes. When a handshake was detected, an RST packet was sent (or, for UDP, all packets were dropped).
Repeated offenses would earn you a permanent IP block as well.<p>* New: Real-Time Traffic Classifier<p>* New: Real-Time IP Blocking<p>* New: Actively Updated IP Blocklist Using Classifiers as Feedback<p>Traffic classifiers were later expanded to cover HTTPS-in-HTTPS as well, so a naive HTTPS proxy wouldn't work either; they possibly had other features too, but it's a mystery.<p>BTW, after Google exited China, the HTTPS version was immediately blocked, and for HTTP, a ridiculous keyword blocklist was enforced that generated a huge number of false-positive RSTs for harmless words, apparently a deliberate decision preferring false positives over false negatives. Eventually, all Google services were permanently blocked. The IP blocklist became extensive: major websites were completely blocked, and the unblocked sites were the exceptions. For most people, the arrival of widely deployed HTTPS came too late to help, since the IPs were already blocked. And as mentioned, SSH and VPNs were classified and blocked as well.<p>This was when a new generation of proxy tools started to gain popularity,