> Simply observe the event in which a user does a query q in Brave and then, within one hour, does the same query on a different search engine. What we do is to move the script that detects bad-queries to the browser, run it against the queries that the user does in real-time and then, when all conditions are met, send the following data back to our servers.<p>Wait. The Brave browser sends data about your browsing back to the Brave Search engine? Not only your usage of other search engines, but it also uses your computer to crawl pages to help build their search index?<p>Ref: <a href="https://github.com/brave/web-discovery-project/blob/main/modules/web-discovery-project/sources/README.md">https://github.com/brave/web-discovery-project/blob/main/mod...</a>
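The quoted detection rule (a query q made in Brave, then repeated on a different search engine within one hour) can be sketched as a time-window check over a log of searches. This is a minimal sketch under stated assumptions: the `QueryEvent` type, the engine names, the `normalize` rule, and `detect_bad_queries` are all hypothetical and not taken from Brave's web-discovery-project code, whose actual conditions may differ.

```python
from dataclasses import dataclass

ONE_HOUR = 3600  # window from the quoted description, in seconds


@dataclass(frozen=True)
class QueryEvent:
    """Hypothetical record of one search made in the browser."""
    engine: str       # illustrative names such as "brave" or "google"
    query: str
    timestamp: float  # seconds since the epoch


def normalize(query: str) -> str:
    # Assumption: queries count as "the same" after lowercasing and collapsing whitespace.
    return " ".join(query.lower().split())


def detect_bad_queries(events, home_engine="brave", window=ONE_HOUR):
    """Return queries made on home_engine that were repeated on a
    different engine within `window` seconds afterwards."""
    flagged = []
    ordered = sorted(events, key=lambda e: e.timestamp)
    for i, first in enumerate(ordered):
        if first.engine != home_engine:
            continue
        q = normalize(first.query)
        for later in ordered[i + 1:]:
            if later.timestamp - first.timestamp > window:
                break  # sorted by time, so no later event can fall inside the window
            if later.engine != home_engine and normalize(later.query) == q:
                flagged.append(first.query)
                break
    return flagged


if __name__ == "__main__":
    log = [
        QueryEvent("brave", "rust borrow checker", 0),
        QueryEvent("google", "Rust borrow checker", 1200),  # repeated 20 min later
        QueryEvent("brave", "weather", 0),
        QueryEvent("bing", "weather", 7200),  # repeated 2 h later, outside the window
    ]
    print(detect_bad_queries(log))
```

Sorting by timestamp lets the inner loop stop at the first event outside the window. What data is then sent back to the server, and how it is anonymized, is outside the scope of this sketch.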
> Fair use is a doctrine in the law of the United States that allows limited use of copyrighted material without requiring permission from the rights holders. It provides for the legal, non-licensed citation or incorporation of copyrighted material in another author's work under a four-factor balancing test:<p>> 1) The purpose and character of the use, including <i>whether such use is of a commercial nature or is for nonprofit educational purposes</i><p>> 2) The nature of the copyrighted work<p>> 3) The amount and substantiality of the portion used in relation to the copyrighted work as a whole<p>> 4) The effect of the use upon the potential market for or value of the copyrighted work<p>[emphasis from TFA]<p>HN always talks about derivative work and transformativeness, but never about these. The fourth one especially seems clear in its implications for models.<p>Regardless, it makes fair use seem much less clear-cut than people here often say.
From article:<p>> without any worry for copyright infringement because Brave acts as a middleman.<p>This isn’t how law works. Unless Brave is explicitly indemnifying all their customers (which their lawyers would have to be <i>insane</i> to let them do), any trouble you could get into is going to be 100% your problem. Pointing the finger at Brave could theoretically get them in trouble too, but it would in no way let you off the hook.
I firmly believe that corps like these don't deserve the benefit of the doubt.
Google, Brave, and really anyone big enough to do bad things and get away with it must be held to a standard where they proactively show that their products have no malicious intent.
The websites a Brave user browses are anonymously relayed to their servers for indexing/training. So, they crawl the web without a crawler and the website operators can't do anything about it.<p>That's genius!
I think this title is overstated. It seems like Brave is trying to do the right thing here vs other companies that don't even make the attempt. (Also, crawling as a service has been a thing for a while.)
Brave continues to be shady. They claim to respect robots.txt but don't identify their crawler if you want to block it.<p>> They don't mention their crawler anywhere in their docs, either. So, if you wanted to block Brave from crawling and indexing and ultimately selling your content to third parties, your only option for the time being would be to block all crawlers, which is how Brave would be able to "respect robots.txt".
Unpopular opinion: the next iteration of privacy laws needs to factor in AI. If AI is allowed to slurp up PII or derivative works, and the people defending it do so with the zeal of cryptobros, then we're in for a decade of real pain in terms of copyright law, PII, and IP exposure.
Why use Brave if my info is already being leaked by third parties, e.g. Experian? Is it worth the inconvenience and their repeated tricky attempts at monetizing their security-conscious niche? Not being facetious, just a real question from a non-security-conscious person.
My entirely biased opinion is <a href="https://www.mojeek.com/" rel="nofollow noreferrer">https://www.mojeek.com/</a> - a traditional search engine crawler (as in, it follows links on the web) that identifies its user agent. Dead simple. It's the open web, and you can search it.
How long until IP protection works its way into AI training data or AIs themselves? I.e., for some specific instance the training data is intentionally wrong, so that it can be checked to prove there has been a breach of IP.
These discussions of fair use are always quite Anglocentric.<p>Articles 3 and 4 of the EU 'Copyright in the Digital Single Market' Directive give data miners quite extensive rights.<p>Move operations to the EU, train a foundation model, then train a constitutional model based on that.<p>As much as I hate the upcoming AI regulation, the CDSM is solid.<p><a href="https://academic.oup.com/grurint/article/71/8/685/6650009" rel="nofollow noreferrer">https://academic.oup.com/grurint/article/71/8/685/6650009</a><p><a href="https://eur-lex.europa.eu/eli/dir/2019/790/oj" rel="nofollow noreferrer">https://eur-lex.europa.eu/eli/dir/2019/790/oj</a><p>Update: fixed wrong link
It always surprises me when I hear of people using the Brave browser... As far as I remember, it's made by a company that initially tried to replace the ads it blocked with its own "safe and non-intrusive" ads, until it backpedaled because of the outrage.<p>It's also a for-profit company, and you're not the customer, as you're not paying them money.<p>I'd be way more worried about how they're using the data they collect on you than about Google or MS.