The rules seem pretty clear that consent is required from any person appearing in any external dataset that is used. The winners scraped data from YouTube videos, so I am not sure what the issue is.<p>The more worrying takeaway is that the winners scraped videos from people who clearly had no intention of their videos being used for a deepfake detection algorithm, yet they did not think through the ethical considerations of using that data (did everyone in the video even have a say in the video being uploaded?). I think Kaggle disqualifying the team is the right move (even if it's a painful one for the winners).
IMO the real issue is that Facebook wanted a commercially usable product and assumed Kaggle had all the safeguards for that, but no:<p>Because of GDPR and friends, Facebook can't store those photos, even though their licenses are permissive and respect Kaggle's rules.<p>This clearly shows what Kaggle is: a way to get very cheap, high-quality data-science work. It's not for hiring people, not for truly helping the research community, not for helping people learn. Nope, just cheap workers.<p>It really feels like Facebook has its whole deepfake detection strategy riding here! They put something like $2M on the table to solve an issue that will(?) plague their whole multi-billion-dollar platform.
I think the issue here is that Kaggle's statement that the top teams broke the rules is just very opaque. They stated the teams broke the rules on external data. The article then goes on to talk about what data the teams used, what licenses it carries, and what documentation the teams were asked to provide. But it is almost impossible to know what FB/Kaggle's concerns were without them stating those concerns specifically. Clearly, whatever the issue was, it didn't affect every team, so it may be that details of the licenses the disqualified teams relied on weren't good enough. As I say, though, it's very difficult to tell, and it's hard to think of a reason Facebook would arbitrarily disqualify teams for no good reason. It's perfectly possible FB was concerned about image rights or something else, but people seem perfectly happy just assuming some grand conspiracy.
For those who aren't aware, many Kaggle competitions allow external data (this one did) but require disclosure, and often there is some back-and-forth to clarify the exact details of what is used.<p>In this case the disqualified participants are well respected and haven't previously been involved in any dubious behavior. They properly disclosed what they were doing, and while other clarifications were issued, none stated that personal releases would be required for CC-BY data.<p>Obviously this is a ridiculous requirement. There's no way for that team to obtain such releases, and they did take proper care to use data that Facebook could reasonably use. It's unreasonable for FB/Kaggle to expect participants in a data science competition to somehow know what Facebook's data ethics department is demanding this week beyond what is legally required.
Why would written consent be needed from people appearing in pictures with a CC-BY license? Was this just an overreaction, or is there an actual legal risk for Facebook in using those pictures without additional consent?
As a machine learning researcher, where exactly am I supposed to get a dataset that complies with Facebook's/Kaggle's rules in this case?<p>No one is disputing that the team was disqualified fair and square. But this rule – where you must get consent from every single person appearing in your training data – seems neither standard nor sensible.<p>Firstly, as someone else pointed out, copyright doesn't apply here at all. You can use whatever training data you want as long as your model is sufficiently transformative. OpenAI used terabytes of copyrighted music to train Jukebox; they certainly didn't get a license from every musician.<p>Beyond that – big companies don't play by this rule! If a BigCo wants to train on some data, you can bet they'll use it. When's the last time Google sent you an email asking, "Are you OK with us using your Flickr photos to help improve Google Image Search?"<p>So my question is simple: in the context of this competition, where should I go to get a decent dataset? The winners were disqualified for doing exactly what I would have done. What's the alternative?<p>Also, yes, ethics are a concern. If you're concerned about ethics, <i>aim that concern at big companies</i>, not at small fries like us who are merely trying to win some cash. Again, no one disputes that they were disqualified for valid reasons. But it has nothing to do with ethics and everything to do with the artificial constraints imposed by this competition.
As someone who previously competed on Kaggle, this seems like a reasonable decision. In previous contests it was pretty clear that if you wanted to use third-party data, you should get pre-clearance for it from Kaggle or the contest organizers.<p>The disqualified competitors here seem to have assumed that CC-BY means you can do whatever you want with the data, when actually that's far from true. CC-BY covers only copyright and doesn't address other rights (e.g., model releases, GDPR, etc.).
> and each individual participant further waives all rights to have damages multiplied or increased.<p>What about divided? By a fraction? :trollface: Does that fall under "increased"?
This competition should not be about scraping and tagging skills (impressive as those may be).<p>So maybe they'll get to win on the lack of clarity in the specifications, but that would be unfortunate.
It's unfortunate that the title leads with the "backlash" to the thing that happened rather than the thing itself ("Kaggle disqualifies participants over usage of external data"). This framing suggests that the decision about the case has already been made by a plurality, and with fervor. In reality, this article is the first time many HN readers are learning about this at all.<p>While I'm sure it's accidental in this case, I see this all over the news and suspect attempts to steer public opinion by condemning people or institutions in the headline before the news is actually reported. A forum of independent thinkers should insist on not having the news presented to them in a potentially manipulative manner.
One possible reason for the "no external datasets" rule might be that the data Facebook uses to judge the competition is also drawn from the same publicly available sources. If that's the case, then anyone training on those same datasets would effectively have trained on the test set, so to speak, which obviously would not lead to good outcomes when the model is run against future data.
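If that kind of train/test leakage is the concern, it is at least mechanically checkable. Here is a minimal Python sketch (the directory paths are hypothetical, not anything from the competition) of how an organizer might flag exact-duplicate overlap between a team's disclosed external dataset and a private hold-out set; a real pipeline would also need perceptual hashing to catch re-encoded or cropped copies of the same video.<p>

    import hashlib
    from pathlib import Path

    def file_hashes(directory):
        """Return the SHA-256 hash of every file under `directory`."""
        return {
            hashlib.sha256(p.read_bytes()).hexdigest()
            for p in Path(directory).rglob("*")
            if p.is_file()
        }

    # Hypothetical paths: the external data a team disclosed,
    # and the organizer's private hold-out set.
    external = file_hashes("external_dataset/")
    holdout = file_hashes("private_test_set/")

    # Any non-empty intersection means the team has (perhaps
    # inadvertently) trained on files used for evaluation.
    overlap = external & holdout
    print(f"{len(overlap)} files appear in both sets")

<p>Exact hashing only catches byte-identical files, which is why leakage of this sort is hard to rule out from the organizer's side and why a blanket "no external datasets" rule is the simpler safeguard.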