What's fascinating here is that AdFlush is a classical feature engineering approach: define a bunch of features on the data manually, and then use ML to figure out the most useful / impactful ones. This is not the "throw terabytes of data and see what happens" approach we see with LLMs. It's a bit funny to even point this out because I don't recall the last time a feature-engineered ML project made it to the HN front page.<p>Features can be brittle, but they are understandable. The paper's appendix [1] lists the 27 features used to decide whether a request/resource is likely "ad-related". These include interesting ones like JS AST depth, average JS identifier length, the "bracket to dot notations ratio in JS", and a number of graph measures for the graph of scripts.<p>And contrary to what comments in this thread are saying, they do compare against a blocklist-based adblocker: uBlock Origin. That's in section 5.5. They say they outperform uBlock Origin. But even they say they don't reduce overall page load time because their algorithm is expensive.<p>[1]: <a href="https://dl.acm.org/doi/pdf/10.1145/3589334.3645698" rel="nofollow">https://dl.acm.org/doi/pdf/10.1145/3589334.3645698</a>
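To give a feel for how simple some of these features are, here is a rough sketch of two of them (average identifier length and the bracket-to-dot ratio). This is only an approximation under loud assumptions: the paper presumably computes them from a real JS AST, while this uses regexes, and the function names and keyword list are made up for illustration.

```python
import re

# Regex-based approximations of two features named in the comment above.
# Assumption: a real implementation would walk a JS AST; this sketch does not,
# so text inside string literals or comments can be miscounted.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Small, incomplete keyword list (hypothetical, for illustration only).
JS_KEYWORDS = {"var", "let", "const", "function", "return",
               "if", "else", "for", "while", "new", "this"}


def avg_identifier_length(js_source: str) -> float:
    """Mean length of identifier-like tokens, skipping a few keywords."""
    names = [t for t in IDENTIFIER.findall(js_source) if t not in JS_KEYWORDS]
    return sum(map(len, names)) / len(names) if names else 0.0


def bracket_to_dot_ratio(js_source: str) -> float:
    """Number of obj[x]-style accesses divided by obj.x-style accesses."""
    brackets = len(re.findall(r"[A-Za-z0-9_$\])]\s*\[", js_source))
    dots = len(re.findall(r"[A-Za-z0-9_$\])]\s*\.\s*[A-Za-z_$]", js_source))
    # With no dot accesses at all, fall back to the raw bracket count.
    return brackets / dots if dots else float(brackets)


sample = "foo.bar[0]; ab[1].cd"
print(avg_identifier_length(sample), bracket_to_dot_ratio(sample))
```

The intuition, as I read it, is that obfuscated or minified ad scripts tend to skew these numbers (short names, lots of bracket access), which is exactly why such features are easy to inspect but also potentially brittle.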
AdFlush (F1 Score: 0.98) seems to do better than some other adblockers: AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84), but it raises the question: why not compare to the most popular adblockers: uBlock Origin, Adblock Plus etc.?<p>I think the authors want to compare apples with apples, so they only compare their algorithm to other adblockers that use algorithms, as opposed to those which use crowdsourced lists. The paper somewhat acknowledges this:<p>> <i>However, manual maintenance of these filter lists requires significant human effort</i><p>Seems like one of those tasks where crowdsourcing scales so nicely (only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others) that it makes an algorithmic approach unnecessary.
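For anyone unsure what the F1 numbers above mean: F1 is the harmonic mean of precision and recall. A minimal sketch, with made-up confusion counts that are not taken from the paper:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of blocked requests that were ads
    recall = tp / (tp + fn)     # share of ads that were actually blocked
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
print(round(f1_score(tp=90, fp=10, fn=10), 2))
```

Note that F1 ignores true negatives, so it says nothing directly about how many non-ad requests were correctly left alone.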
Without a comparison to the accuracy of crowdsourced blocklists, it's not that valuable.
Maybe there is a group of hopelessly overworked blocklist maintainers/contributors that I'm not aware of. If so, their cries for help don't seem to make the HN front page.
From a user perspective, blocking banner ads feels like a basically solved problem. I think the real pain point here is that for large chunks of the web, there is no distinction between ads and content.
> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98<p>Neat results, I wonder how it compares to uBO or the different blacklists. I assume it self-updates with newer techniques and can detect certain patterns?
The future is here.<p>If I recall, in Permutation City there's some part where somebody deals with spam with AI. The user tries to use a simulation to listen to potential spam to filter it, while the spam tries to figure out whether a real person is listening to it and only tries to spam when a real person is there.<p>Or something along those lines, it's been a long time since I read it.
Blocking image ads seems like a relatively well-solved problem. I mean, speaking as someone who can't stand ads, I don't see very many of them anymore when I'm on desktop.<p>The harder, more pernicious type of ads are the modals that pop up when your cursor moves toward the back button, or when you scroll down a certain distance on the page. "Wait! Before you go, take a moment to give us your email address!"<p>Those can be blocked, but by the time you've seen them, they've already done all the damage they can do—which is to say, they've annoyed you.<p>I wish somebody could come up with a way to detect and stop them. I spent an afternoon trying to come up with reusable techniques to detect these popups, but there are just too many possibilities.
Always a joy to see efforts in the ongoing battle against advertisements.<p>There are few things I feel radical about, and ads are one of them. I believe they are a drain in several ways:<p>They waste computational resources and electricity on both ends. They compromise the visual design and layout of webpages. They distract and take mental energy away from the user. They make the internet (and anywhere ads exist) more "ugly" and less aesthetically pleasing - which negatively impacts mental health. They often sell low-quality services/products or outright scams, which harms the least educated and poorest individuals.<p>Death to advertisement! On billboards! On television! On the internet!<p>Ads are a parasite on the human mind that needs to go away, forever.
I use a combination of UBO, PiHole and AdGuard on my mobile devices. Can't say I've seen an ad in the last year. Is this trying to solve an existing problem or speculating on where things could go in future?
Looks like this is the associated repo on GitHub: <a href="https://github.com/SKKU-SecLab/AdFlush">https://github.com/SKKU-SecLab/AdFlush</a>
So AdFlush beats uBlock Origin with a marginal detection rate advantage of 0.86 vs 0.84, at the cost of significant performance overhead: median 2.7s load time (no ad block); 2.2s (uBO); 6.6s (AdFlush clean); 3.4s (AdFlush cached).<p>I'd like to see a tandem uBO+AdFlush extension that just enables uBO by default, with a "I still see ADs!" button in the extension UI that refreshes with AdFlush enabled and auto-submits any missed ads to a new FlushList filter list.
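To put those load times in relative terms, here is a quick back-of-the-envelope calculation. The numbers are the medians quoted in the comment above; the helper name is just something I made up.

```python
# Median page load times in seconds, as quoted in the comment above.
load_times = {
    "no ad block": 2.7,
    "uBO": 2.2,
    "AdFlush (clean)": 6.6,
    "AdFlush (cached)": 3.4,
}


def slowdown_vs(baseline: str, times: dict[str, float]) -> dict[str, float]:
    """Ratio of each configuration's load time to the chosen baseline."""
    base = times[baseline]
    return {name: round(t / base, 2) for name, t in times.items()}


# Compare everything against uBO as the baseline.
print(slowdown_vs("uBO", load_times))
```

By these figures, a clean AdFlush load takes about three times as long as uBO, and even the cached case is roughly one and a half times slower.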
So, this raises the question of when we'll see ML put in place to avoid AdBlocker detection. Or ads as we know them just disappear from the web and are replaced with other kinds of ML-enabled ads. I imagine deep-fake models used for interchangeable product placement in videos or pictures or so.
Nice! I’d love to know if AI-based ad / tracking / telemetry / etc. blocking could be improved for MITM network-layer filtering, not just in the browser.
Oh boy, that didn't take long. Just last year I made Butter <a href="https://butter.sonnet.io" rel="nofollow">https://butter.sonnet.io</a> as an excuse to talk about this:<p>> This project is a half-serious, half-assed attempt to demonstrate that in the next few years the process of blocking this type of content could be almost entirely automated. Yes, it would be wasteful from a computational and human potential perspective, and otherwise completely unnecessary, but hey, more money would change hands!
> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84).<p>... Has anyone even heard of these ad blockers before?