I think a lot of Neil's suggestions, while good, run into the problem that many existing datasets explicitly ignore these rules anyway because of their "research" focus. I'm curious how this will hold up in a courtroom, and I know it's already unfolding in various legal cases. Sometimes I wish we'd slow down and think about the ethical issues of what we're really doing, but so far I haven't seen much of that. The OpenAI founder Sam Altman recently got in trouble over this exact issue with his somewhat questionable cryptocurrency in Kenya. I doubt a robots.txt will be enough to stop this new wave.<p><a href="https://techcrunch.com/2023/08/15/worldcoin-in-kenya/" rel="nofollow noreferrer">https://techcrunch.com/2023/08/15/worldcoin-in-kenya/</a>