This just proves all the "suspicions" privacy-conscious users have had about large corporations fingerprinting users, often in very obvious ways. There's often no better place to find ideas for surveillance than the people conscious about being surveilled.
Seems like a lot of it came from them inadvertently posting some internal API to GitHub:
<a href="https://github.com/googleapis/elixir-google-api/commit/078b497fceb1011ee26e094029ce67e6b6778220">https://github.com/googleapis/elixir-google-api/commit/078b4...</a>
I believe these are the leaked docs: <a href="https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-reference.html" rel="nofollow">https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...</a>
> My anonymous source claimed that way back in 2005, Google wanted the full clickstream of billions of Internet users, and with Chrome, they’ve now got it. The API documents suggest Google calculates several types of metrics that can be called using Chrome views related to both individual pages and entire domains.<p>What answer do the engineers at google working on this have for this violation of privacy?
Sometimes I wonder how much better the internet would be if hits on Google weren't directly tied to revenue for Google itself through its ad program. I am certain Google has made the internet, and the world, a worse place to live.
I work in search and didn't find anything surprising in here. But that's mostly because I've just assumed Google has been lying for years about many things, such as not using click data or Chrome data.<p>I've directly seen people who have successfully manipulated search rankings by having logged-in chrome users search for a term, and then click on a given page. Works like a charm (though may not stick once the manipulation is done, unless organic users also prefer it).
If anyone is surprised about chrome sending urls to Google, you can turn the “feature” off by unchecking “Make searches and browsing better” in the sync section of Google chrome settings.<p>Creepy.
> Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on Github by an automated bot called yoshi-code-bot<p>Does anyone know more about yoshi-code-bot and how were these documents suddenly published?<p>Was it a script misconfiguration? A manual push? Something else?
From the article:<p>Boosting "organic traffic":<p>- Brand matters more than anything else<p>- Experience, expertise, authoritativeness, and trustworthiness (“E-E-A-T”) might not matter as directly as some SEOs think.<p>- Content and links are secondary when user intention around navigation (and the patterns that intent creates) are present.<p>- Classic ranking factors: PageRank, anchors (topical PageRank based on the anchor text of the link), and text-matching have been waning in importance for years. But Page Titles are still quite important.<p>- For most small and medium businesses and newer creators/publishers, SEO is likely to show poor returns until you’ve established credibility, navigational demand, and a strong reputation among a sizable audience.<p>TL;DR: Clickbait + bot farms are the way to go. No wonder the internet is going to shit.
FYI, it's much easier to read the linked GitHub code via the published docs at <a href="https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-reference.html" rel="nofollow">https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-re...</a>
Most of the factors in ranking a page are no surprise. But I was surprised that having product reviews on your site is apparently a demotion. Surely many people are searching to find exactly that?
I would usually call this a dupe but this article and the other one from SparkToro are completely different even if they are on the same topic.<p>Haven’t had a chance to look at the API myself but the first impressions are that a lot of this was suspected by SEOs, but Google kept rejecting the ideas. Looks like clicks increase ranking for sure, which means click farms definitely have a legitimate business solution to offer.
I found it interesting that the docs mention "site2vec" scores. This implies, I think, a variant of word2vec or document2vec, but for the full site; so probably a vector sum of the doc2vec scores of all individual pages?
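If that guess is right, a site-level embedding could be as simple as pooling the page vectors. This is pure speculation on my part (the docs don't say how site2vec is actually computed), but a mean-pooled version would look like:

```python
import numpy as np

def site_vector(page_vectors: list[np.ndarray]) -> np.ndarray:
    """Hypothetical site2vec: pool the doc2vec embeddings of a
    site's pages into one site-level vector (here, element-wise mean)."""
    return np.mean(np.stack(page_vectors), axis=0)

# Three toy 4-dimensional "doc2vec" page embeddings for one site
pages = [
    np.array([1.0, 0.0, 0.0, 2.0]),
    np.array([0.0, 1.0, 0.0, 2.0]),
    np.array([0.0, 0.0, 1.0, 2.0]),
]
print(site_vector(pages))
```

A plain sum would weight large sites higher, so a mean (or a length-normalized sum) seems the more plausible variant.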
> Successful clicks matter.<p>I wonder about this. If I click a link and read it and I find that it's garbage (e.g. got ranked based on SEO rather than useful content) does it count as a successful click? Worse yet, some of these sites have blatant errors that are only discovered after examination.<p>This is relative to technical subject matter. Other searches, such as shopping may not suffer this kind of problem (or I have not noticed it.)<p>I also wonder how Google knows a click is successful. If I open a link in another tab, does the browser tell Google how long I lingered on the site? Perhaps Chrome does but I use Firefox.
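The SEO folklore around "long clicks" vs. "short clicks" is basically a dwell-time heuristic. A naive version (entirely hypothetical — I have no idea what thresholds or labels Google actually uses) might look like:

```python
def classify_click(dwell_seconds: float, returned_to_serp: bool) -> str:
    """Hypothetical 'long click' heuristic from SEO folklore:
    a click counts as successful if the user lingered on the page
    and didn't bounce straight back to the results page."""
    if returned_to_serp and dwell_seconds < 10:
        return "bad click"    # quick bounce back to the results page
    if dwell_seconds >= 60:
        return "good click"   # user stayed a while: likely satisfied
    return "ambiguous click"  # middle ground, hard to interpret

print(classify_click(3, True))     # → bad click
print(classify_click(120, False))  # → good click
```

Which is exactly why it's hard to measure without browser telemetry: the search page only sees you come back, not how long you stayed in another tab.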
Something like this, I guess:<p>var words = query.split(" ");<p>var results = executeQuery("SELECT al.url, al.desc FROM AdWords aw INNER JOIN AdLinks al ON aw.id = al.id WHERE aw.word IN (:words)");<p>if (results.size() < 30) {
    // TODO: call the actual search engine
}<p>return results;
It doesn't look like a leak so much as a misdeployment.<p>Same service wrappers from two years ago:
<a href="https://github.com/googleapis/google-api-php-client-services/blob/670c3854fffc2f642efa86b083e2664fd55435e1/src/Contentwarehouse/QualityNavboostCrapsCrapsClickSignals.php">https://github.com/googleapis/google-api-php-client-services...</a>
> Prior to the email and call, I had neither met nor heard of the person who emailed me about this leak. They asked that their identity remain veiled<p>And yet the journalist included a screenshot with one of the weakest blurs I've ever seen... Why would you not excise the person's video portion completely? What good does it serve to have it included in the story? Even if that portion is faked, why would you offer potential signals like skin complexion, hair color, background picture, etc.? Why...
Hopefully this doesn’t surprise anyone. If Google actually told us correct information about how the search algorithm works, it would be abused immediately.
What I find most interesting about this is that a lot of the supposedly "smart" algorithms of Big Tech are in fact a patchwork of "dumb" rules and human-picked winners. This would explain why the quality of search results is failing to keep up with developments in LLMs.<p>This also explains why it's impossible for newcomers to unseat the winners in many search categories -- <i>because they've literally been picked as the winners by humans at Google.</i><p>Looking at my Twitter/X feed, I also see an oddly similar dynamic. Certain accounts appear to have been manually boosted, showing up all the time -- whereas others posting even the same exact content will never appear.<p>Silicon Valley will loudly tell you all about how wonderful they are at "democratizing," but if you look under the surface it appears they're just hand-picking the winners.
Maybe this is an unpopular opinion, but if a search algorithm is truly designed to showcase the best content, then making it transparent shouldn't lead to manipulation.
> A sample of statements from Google representatives (Matt Cutts, Gary Ilyes, and John Mueller) denying the use of click-based user signals in rankings over the years.
If there really are 14,000 attributes, most of them will have weights near 0 and are therefore effectively irrelevant. If all of them were heavily weighted, the ranking would be rendered meaningless by the sheer number of attributes.
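A toy illustration (made-up weights, nothing to do with Google's actual ones): when thousands of attributes carry near-zero weight, the final score is effectively determined by the handful of heavy ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 14,000-attribute linear ranking score:
# a few attributes with real weight, the rest near zero.
weights = np.concatenate([
    np.array([5.0, 3.0, 2.0]),           # the few attributes that matter
    rng.normal(0, 0.001, size=13_997),   # thousands of near-zero weights
])
features = rng.uniform(0, 1, size=14_000)  # one page's attribute values

score = weights @ features            # full 14,000-term dot product
heavy_part = weights[:3] @ features[:3]  # contribution of just 3 attributes

print(f"total score: {score:.3f}, from the 3 heavy attributes: {heavy_part:.3f}")
```

Even with 13,997 extra terms, the two numbers come out nearly identical, which is the point: a long attribute list doesn't imply a long list of things that actually move rankings.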