This just proves all the "suspicions" privacy-conscious users have had about large corporations fingerprinting users, often in very obvious ways. There's often no better place to find ideas for surveillance than the people conscious about being surveilled.
Seems like a lot of it came from them inadvertently posting some internal API to GitHub:
<a href="https://github.com/googleapis/elixir-google-api/commit/078b497fceb1011ee26e094029ce67e6b6778220">https://github.com/googleapis/elixir-google-api/commit/078b4...</a>
I believe these are the leaked docs: <a href="https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-reference.html" rel="nofollow">https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...</a>
> My anonymous source claimed that way back in 2005, Google wanted the full clickstream of billions of Internet users, and with Chrome, they’ve now got it. The API documents suggest Google calculates several types of metrics that can be called using Chrome views related to both individual pages and entire domains.<p>What answer do the engineers at google working on this have for this violation of privacy?
Sometimes I wonder how much better the internet would be if hits on Google weren't directly tied to revenue for Google itself through its ad program. I am certain Google has made the internet, and the world, a worse place to live.
I work in search and didn't find anything surprising in here. But that's mostly because I've just assumed Google has been lying for years about many things, such as not using click data or Chrome data.<p>I've directly seen people who have successfully manipulated search rankings by having logged-in chrome users search for a term, and then click on a given page. Works like a charm (though may not stick once the manipulation is done, unless organic users also prefer it).
If anyone is surprised about chrome sending urls to Google, you can turn the “feature” off by unchecking “Make searches and browsing better” in the sync section of Google chrome settings.<p>Creepy.
> Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on Github by an automated bot called yoshi-code-bot<p>Does anyone know more about yoshi-code-bot and how were these documents suddenly published?<p>Was it a script misconfiguration? A manual push? Something else?
From the article:<p>Boosting "organic traffic":<p>- Brand matters more than anything else<p>- Experience, expertise, authoritativeness, and trustworthiness (“E-E-A-T”) might not matter as directly as some SEOs think.<p>- Content and links are secondary when user intention around navigation (and the patterns that intent creates) are present.<p>- Classic ranking factors: PageRank, anchors (topical PageRank based on the anchor text of the link), and text-matching have been waning in importance for years. But Page Titles are still quite important.<p>- For most small and medium businesses and newer creators/publishers, SEO is likely to show poor returns until you’ve established credibility, navigational demand, and a strong reputation among a sizable audience.<p>TL;DR: Clickbait + bot farms are the way to go. No wonder the internet is going to shit.
FYI, it's much easier to read the linked GitHub code via the published docs at <a href="https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-reference.html" rel="nofollow">https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...</a>
Most of the factors in ranking a page are no surprise. But I was surprised that having product reviews on your site is apparently a demotion. Surely many people are searching to find exactly that?
I would usually call this a dupe but this article and the other one from SparkToro are completely different even if they are on the same topic.<p>Haven’t had a chance to look at the API myself but the first impressions are that a lot of this was suspected by SEOs, but Google kept rejecting the ideas. Looks like clicks increase ranking for sure, which means click farms definitely have a legitimate business solution to offer.
I found it interesting that the docs mention "site2vec" scores. This implies, I think, a variant of word2vec or document2vec, but for the full site; so probably a vector sum of the doc2vec scores of all individual pages?
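If that guess is right, a site-level embedding could be as simple as pooling the page vectors. This is pure speculation on my part (the docs don't say how site2vec is actually computed), but a mean-pooled version would look like:

```python
import numpy as np

def site_vector(page_vectors: list[np.ndarray]) -> np.ndarray:
    """Hypothetical site2vec: pool the doc2vec embeddings of a
    site's pages into one site-level vector (here, element-wise mean)."""
    return np.mean(np.stack(page_vectors), axis=0)

# Three toy 4-dimensional "doc2vec" page embeddings for one site
pages = [
    np.array([1.0, 0.0, 0.0, 2.0]),
    np.array([0.0, 1.0, 0.0, 2.0]),
    np.array([0.0, 0.0, 1.0, 2.0]),
]
print(site_vector(pages))
```

A plain sum would weight large sites higher, so a mean (or a length-normalized sum) seems the more plausible variant.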
> Successful clicks matter.<p>I wonder about this. If I click a link and read it and I find that it's garbage (e.g. got ranked based on SEO rather than useful content) does it count as a successful click? Worse yet, some of these sites have blatant errors that are only discovered after examination.<p>This is relative to technical subject matter. Other searches, such as shopping may not suffer this kind of problem (or I have not noticed it.)<p>I also wonder how Google knows a click is successful. If I open a link in another tab, does the browser tell Google how long I lingered on the site? Perhaps Chrome does but I use Firefox.
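The SEO folklore around "long clicks" vs. "short clicks" is basically a dwell-time heuristic. A naive version (entirely hypothetical — I have no idea what thresholds or labels Google actually uses) might look like:

```python
def classify_click(dwell_seconds: float, returned_to_serp: bool) -> str:
    """Hypothetical 'long click' heuristic from SEO folklore:
    a click counts as successful if the user lingered on the page
    and didn't bounce straight back to the results page."""
    if returned_to_serp and dwell_seconds < 10:
        return "bad click"    # quick bounce back to the results page
    if dwell_seconds >= 60:
        return "good click"   # user stayed a while: likely satisfied
    return "ambiguous click"  # middle ground, hard to interpret

print(classify_click(3, True))     # → bad click
print(classify_click(120, False))  # → good click
```

Which is exactly why it's hard to measure without browser telemetry: the search page only sees you come back, not how long you stayed in another tab.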
Something like this, I guess:<p>var words = query.split(" ");<p>var results = executeQuery("SELECT al.url, al.desc FROM AdWords aw INNER JOIN AdLinks al ON aw.id = al.id WHERE aw.word IN (:words)");<p>if (results.size() < 30) {
    // TODO: call the actual search engine
}<p>return results;
It doesn't look like a leak so much as a misdeployment.<p>Same service wrappers from two years ago:
<a href="https://github.com/googleapis/google-api-php-client-services/blob/670c3854fffc2f642efa86b083e2664fd55435e1/src/Contentwarehouse/QualityNavboostCrapsCrapsClickSignals.php">https://github.com/googleapis/google-api-php-client-services...</a>
> Prior to the email and call, I had neither met nor heard of the person who emailed me about this leak. They asked that their identity remain veiled<p>And yet the journalist included a screenshot with one of the weakest blurs I've ever seen... Why would you not excise the person's video portion completely? What good does it serve to have it included in the story? Even if that portion is faked, why would you offer potential signals like skin complexion, hair color, background picture, etc.? Why...
Hopefully this doesn’t surprise anyone. If Google actually told us correct information about how the search algorithm works, it would be abused immediately.
What I find most interesting about this is that a lot of the supposedly "smart" algorithms of Big Tech are in fact a patchwork of "dumb" rules and human-picked winners. This would explain why the quality of search results is failing to keep up with developments in LLMs.<p>This also explains why it's impossible for newcomers to unseat the winners in many search categories -- <i>because they've literally been picked as the winners by humans at Google.</i><p>Looking at my Twitter/X feed, I also see an oddly similar dynamic. Certain accounts appear to have been manually boosted, showing up all the time -- whereas others posting even the same exact content will never appear.<p>Silicon Valley will loudly tell you all about how wonderful they are at "democratizing," but if you look under the surface it appears they're just hand-picking the winners.
Maybe this is an unpopular opinion, but if a search algorithm is truly designed to showcase the best content, then making it transparent shouldn't lead to manipulation.
> A sample of statements from Google representatives (Matt Cutts, Gary Ilyes, and John Mueller) denying the use of click-based user signals in rankings over the years.
If there really are 14,000 attributes, most of them will have weights near 0 and are therefore effectively irrelevant. If all of them were heavily weighted, the ranking would be rendered meaningless by the sheer number of attributes.
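A toy illustration (made-up weights, nothing to do with Google's actual ones): when thousands of attributes carry near-zero weight, the final score is effectively determined by the handful of heavy ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 14,000-attribute linear ranking score:
# a few attributes with real weight, the rest near zero.
weights = np.concatenate([
    np.array([5.0, 3.0, 2.0]),           # the few attributes that matter
    rng.normal(0, 0.001, size=13_997),   # thousands of near-zero weights
])
features = rng.uniform(0, 1, size=14_000)  # one page's attribute values

score = weights @ features            # full 14,000-term dot product
heavy_part = weights[:3] @ features[:3]  # contribution of just 3 attributes

print(f"total score: {score:.3f}, from the 3 heavy attributes: {heavy_part:.3f}")
```

Even with 13,997 extra terms, the two numbers come out nearly identical, which is the point: a long attribute list doesn't imply a long list of things that actually move rankings.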