AdFlush

276 points by grac3 12 months ago

19 comments

pradn 12 months ago

What's fascinating here is that AdFlush is a classical feature engineering approach: define a bunch of features on the data manually, and then use ML to figure out the most useful / impactful ones. This is not the "throw terabytes of data and see what happens" approach we see with LLMs. It's a bit funny to even point this out because I don't recall the last time a feature-engineered ML project made it to the HN front page.

Features can be brittle, but they are understandable. The paper's appendix [1] lists the 27 features that will likely make a request/resource "ad-related". These include interesting ones like JS AST depth, average JS identifier length, the "bracket to dot notations ratio in JS", and a number of graph measures for the graph of scripts.

And contrary to what comments in this thread are saying, they do compare against a blocklist-based adblocker: uBlock Origin. That's in section 5.5. They say they outperform uBlock Origin. But even they say they don't reduce overall page load time, because their algorithm is expensive.

[1]: https://dl.acm.org/doi/pdf/10.1145/3589334.3645698
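
For a feel of what hand-defined features like these might look like, here is a rough sketch of two of them (average identifier length and bracket-to-dot ratio). It is an approximation, not the paper's code: the function names are made up for illustration, and it uses regexes where a real implementation would presumably walk a parsed JS AST.

```python
import re

# Hypothetical, simplified stand-ins for two features named in the comment above.
# Assumption: regexes over raw source are "good enough" for illustration only;
# they also count keywords and names that appear inside string literals.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
DOT_ACCESS = re.compile(r"\.\s*[A-Za-z_$]")            # e.g. obj.prop
BRACKET_ACCESS = re.compile(r"[A-Za-z0-9_$\])]\s*\[")  # e.g. obj["prop"], a[0][1]


def avg_identifier_length(js_source: str) -> float:
    """Mean length of identifier-like tokens; 0.0 if there are none."""
    names = IDENTIFIER.findall(js_source)
    return sum(len(n) for n in names) / len(names) if names else 0.0


def bracket_to_dot_ratio(js_source: str) -> float:
    """Bracket accesses divided by dot accesses.

    Illustrative convention (an assumption): if there are no dot accesses,
    return inf when brackets exist and 0.0 otherwise.
    """
    brackets = len(BRACKET_ACCESS.findall(js_source))
    dots = len(DOT_ACCESS.findall(js_source))
    if dots == 0:
        return float("inf") if brackets else 0.0
    return brackets / dots
```

The intuition is that obfuscated or machine-generated scripts tend to skew such numbers (short identifiers, lots of `obj["x"]` access), which gives a classifier something to latch onto.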

nomilk 12 months ago

AdFlush (F1 score: 0.98) seems to do better than some other adblockers: AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84), but it raises the question: why not compare to the most popular adblockers, uBlock Origin, Adblock Plus, etc.?

I think the authors want to compare apples with apples, so they only compare their algorithm to other adblockers that use algorithms, as opposed to those which use crowdsourced lists. The paper somewhat acknowledges this:

> However, manual maintenance of these filter lists requires significant human effort

Seems like one of those tasks where crowdsourcing scales so nicely (only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others) that it makes an algorithmic approach unnecessary.
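
For anyone unsure what the F1 numbers mean: F1 is the harmonic mean of precision and recall. A minimal sketch follows; the counts in the usage lines are made up for illustration and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of blocked items that were ads
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of ads that were blocked
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 98 ads blocked, 2 non-ads blocked, 2 ads missed.
score = f1_score(tp=98, fp=2, fn=2)
print(round(score, 2))
```

So a high F1 requires both few missed ads and few wrongly blocked resources; a blocker that blocks everything gets perfect recall but poor precision.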

YmiYugy 12 months ago

Without a comparison to the accuracy of crowdsourced blocklists, it's not that valuable. Maybe there is a group of hopelessly overworked blocklist maintainers/contributors that I'm not aware of. If so, their cries for help don't seem to make the HN front page. From a user perspective, blocking banner ads feels like a basically solved problem. I think the real pain point here is that for large chunks of the web, there is no distinction between ads and content.

3abiton 12 months ago

> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98

Neat results. I wonder how it compares to uBO or the different blacklists. I assume it self-updates with newer techniques and can detect certain patterns?

dale_glass 12 months ago

The future is here.

If I recall, in Permutation City there's some part where somebody deals with spam with AI. The user tries to use a simulation to listen to potential spam to filter it, while the spam tries to figure out whether a real person is listening to it and only tries to spam when a real person is there.

Or something along those lines, it's been a long time since I read it.

karaterobot 12 months ago

Blocking image ads seems like a relatively well-solved problem. I mean, speaking as someone who can't stand ads, I don't see very many of them anymore when I'm on desktop.

The harder, more pernicious type of ads are the modals that pop up when your cursor moves toward the back button, or when you scroll down a certain distance on the page. "Wait! Before you go, take a moment to give us your email address!"

Those can be blocked, but by the time you've seen them, they've already done all the damage they can do, which is to say, they've annoyed you.

I wish somebody could come up with a way to detect and stop them. I spent an afternoon trying to come up with reusable techniques to detect these popups, but there are just too many possibilities.

Night_Thastus 12 months ago

Always a joy to see efforts in the ongoing battle against advertisements.

There are few things I feel radical about, and ads are one of them. I believe they are a drain in several ways:

They waste computational resources and electricity on both ends.
They compromise the visual design and layout of webpages.
They distract and take mental energy away from the user.
They make the internet (and anywhere ads exist) more "ugly" and less aesthetically pleasing, which negatively impacts mental health.
They often sell low-quality services/products or outright scams, which harms the least educated and poorest individuals.

Death to advertisement! On billboards! On television! On the internet!

Ads are a parasite on the human mind that need to go away, forever.

tjpnz 12 months ago

I use a combination of UBO, PiHole and AdGuard on my mobile devices. Can't say I've seen an ad in the last year. Is this trying to solve an existing problem or speculating on where things could go in future?

alexcason 12 months ago

Looks like this is the associated repo on GitHub: https://github.com/SKKU-SecLab/AdFlush

infogulch 12 months ago

So AdFlush beats uBlock Origin with a marginal detection rate advantage of 0.86 vs 0.84, at the cost of significant performance overhead: median 2.7s load time (no ad block); 2.2s (uBO); 6.6s (AdFlush clean); 3.4s (AdFlush cached).

I'd like to see a tandem uBO+AdFlush extension that just enables uBO by default, with an "I still see ads!" button in the extension UI that refreshes with AdFlush enabled and auto-submits any missed ads to a new FlushList filter list.

jarbus 12 months ago

I didn't realize this was an active area of research, love this.

cimnine 12 months ago

So, this raises the question of when we'll see ML put in place to avoid adblocker detection. Or ads as we know them just disappear from the web and are replaced with other kinds of ML-enabled ads. I imagine deep-fake models used for interchangeable product placement in videos or pictures or so.

h4kor 12 months ago

How does this compare to list-based solutions? An overblocking/underblocking comparison would be great.

gastonmorixe 12 months ago

Nice! I’d love to know if AI-based ad / tracking / telemetry / etc. blocking could be improved for MITM network-layer filtering, not just in the browser.

rpastuszak 12 months ago

Oh boy, that didn't take long. Just last year I made Butter (https://butter.sonnet.io) as an excuse to talk about this:

> This project is a half-serious, half-assed attempt to demonstrate that in the next few years the process of blocking this type of content could be almost entirely automated. Yes, it would be wasteful from a computational and human potential perspective, and otherwise completely unnecessary, but hey, more money would change hands!

mannycalavera42 12 months ago

https://chromewebstore.google.com/search/adflush

https://imgflip.com/i/8s3nur

Havoc 12 months ago

How realtime is this? Or at least fast enough to not be noticeable while browsing?

flakiness 12 months ago

This can be a Copilot+PC's killer feature :-)

seized 12 months ago