<i>"... we applied EasyList to both the Alexa 5k, a curated list of the 5,000 most popular sites on the web, and a random sampling of 5,000 sites from the Alexa 1,000,000 (ensuring no duplicate sites). Our measurement was in several steps:</i><p><i>1. Use Selenium and the DevTools Protocol to record every URL requested when rendering and executing a website.</i><p><i>2. Add additional automation to randomly select three distinct same-domain URLs from anchor tags on a page.</i><p><i>3. Used the above automation to visit the homepage of each site, and a maximum of three child pages, and recorded all URLs requested for images, script files, and other web resources.</i><p><i>4. Determine which of those URLs would be blocked by the version of EasyList fetched on that day, using Brave's optimized ad-block implementation.</i><p>...<p><i>We found that the vast majority of EasyList rules are not used when browsing popular websites; 3,268 of 39,198 (~8%) of network and exception rules were used during our crawls (these measurements exclude element rules)."</i><p>That doesn't mean that EasyList is not useful for browsing the rest of the internet.
Since this article came out, much work has been done to remove stale filters; see:<p><a href="https://twitter.com/fanboynz/status/1344796683612299265" rel="nofollow">https://twitter.com/fanboynz/status/1344796683612299265</a>
For anyone interested in this blog post, the full conference paper version is here: <a href="https://www.peteresnyder.com/static/papers/easylist-sigmetrics-2020.pdf" rel="nofollow">https://www.peteresnyder.com/static/papers/easylist-sigmetrics-2020.pdf</a>
Can someone explain the step-like shapes in the curve in the "time to filter a request" plot? I was under the impression that ad blockers used hash tables or a similar structure, giving O(1) lookups regardless of the address being checked. Are these some kind of cache misses?
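Not from the article, but for context on the data-structure guess: engines in this family (uBlock Origin, Brave's adblock-rust) are usually described as using a token index rather than a hash of the whole URL, so the cost per request depends on how many candidate rules the URL's tokens pull in; whether that explains the steps in that particular plot is a separate question. A rough Python sketch with hypothetical names:

    # Token-indexed matching, heavily simplified: each rule is indexed under
    # one token, and a lookup still has to pattern-check every candidate in
    # the token's bucket -- so it is not a single O(1) hash probe per URL.
    import re
    from collections import defaultdict

    class TokenIndex:
        def __init__(self, patterns):
            self.buckets = defaultdict(list)  # token -> candidate patterns
            for p in patterns:
                for tok in re.findall(r"[a-z0-9]+", p.lower()):
                    self.buckets[tok].append(p)
                    break  # index each rule under its first token only

        def matches(self, url):
            url_l = url.lower()
            for tok in re.findall(r"[a-z0-9]+", url_l):
                for pattern in self.buckets.get(tok, ()):  # O(bucket size)
                    if pattern.lower() in url_l:  # simplified substring match
                        return True
            return False

    index = TokenIndex(["/adserver/", "doubleclick.net/gampad"])
    print(index.matches("https://ads.example.com/adserver/banner.js"))  # True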