Why does Google deeply index those useless telephone directory sites? Try searching for the impossible U.S. phone number "307-139-2345" and you'll see a bunch of "who called me?" or "reverse phone number lookup" sites. Virtually all of those sites are complete garbage. They make no attempt to collect numbers from telephone directories or from the web. They won't identify a number as being the main phone number for Disneyland, for example.<p>It's odd that so many of those sites exist, that Google indexes them so deeply, and that they show up in searches so prominently. It's obvious that they are spam, scams, or worthless, yet those same sites have been appearing prominently for years.<p>I agree with the author. My experience has also been that Google <i>heavily</i> prioritizes very large and frequently-updated sites over small, static, information-rich personal sites. I think it's a big flaw that Google needs to fix, or that leaves room for someone else to do better.
It really angers me that, even when a page may be exactly what I'm looking for, if it was published long ago, Google may refuse to find it.<p>Something like a news search engine would definitely be better off prioritising new results, but for something more general-purpose, it's an absolutely horrible choice.<p>I know this may be a bit of an edge case, but I frequently search for service information or manuals for products that predate the invention of the Internet by several decades. It saddens me that the results are clogged with sites selling what may well be public-domain content, and now I'm even more angered by the fact that what I'm looking for is probably out there and could've been found years ago, but is just "hidden" now.<p>Of course, if you try harder, you'll get the infamous and dehumanising(!) "you are a robot" CAPTCHA-hellban. I once triggered that at work while searching for solutions to an error message, and was so infuriated that I made an obscene gesture at the screen and shouted "fuck you Google!", accidentally disturbing my coworkers (who sympathised once I explained.)
<a href="https://slashdot.org/comments.pl?sid=7132077&cid=49308245" rel="nofollow">https://slashdot.org/comments.pl?sid=7132077&cid=49308245</a><p>From my short dystopian story, The Time Rift of 2100: How We lost the Future<p>"IN A SAD IRONY as to the supposed superiority of digital over analog --- that this whole profession of digitally-stored 'source' documentation began to fade and was finally lost. It had became dusty, and the unlooked-for documents of previous eras were first flagged and moved to lukewarm storage. It was a circular process, where the world's centralized search indices would be culled to remove pointers to things that were seldom accessed. Then a separate clean-up where the fact that something was not in the index alone determined that it was purgeable. The process was completely automated of course, so no human was on hand to mourn the passing of material that had been the proud product of entire careers. It simply faded."<p>"THEN SOMETHING TOOK THE INTERNET BY STORM, it was some silly but popular Game with a perversely intricate (and ultimately useless) information store. Within the space of six months index culling and auto-purge had assigned more than a third of all storage to the Game. Only as the Game itself faded did people begin to notice that things they had seen and used, even recently, were simply no longer there. Or anywhere. It was as if the collective mind had suffered a stroke. Were the machines at fault, or were we? Does it even matter? Life went on. We no longer knew much about these things from which our world was constructed, but they continued to work."
> Other things were weirder, like this old post being soft recognized as a 404 Not Found response. My web server is properly configured and quite capable of sending correct HTTP response codes, so ignoring standards in that regard is just craziness on Google's part.<p>I've noticed Google does this when you don't seem to have a lot of content on the page. I think it "guesses" that short pages are poorly-marked 404s.
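Google hasn't published how its soft-404 detection actually works, but a toy sketch of the "guess that thin pages are mislabeled 404s" heuristic described above might look something like this (the phrase list and word-count threshold are invented for illustration):

```python
# Hypothetical soft-404 heuristic: treat a 200 response as a
# "soft 404" when its visible text is very short, or when it
# contains typical not-found phrases. Thresholds are made up.
NOT_FOUND_PHRASES = ("not found", "page does not exist", "no longer available")

def looks_like_soft_404(status_code: int, body_text: str,
                        min_words: int = 40) -> bool:
    if status_code == 404:        # a real 404; nothing "soft" about it
        return False
    words = body_text.split()
    if len(words) < min_words:    # thin page: guess it's a mislabeled 404
        return True
    lowered = body_text.lower()
    return any(phrase in lowered for phrase in NOT_FOUND_PHRASES)

print(looks_like_soft_404(200, "Oops, page not found."))  # True
print(looks_like_soft_404(200, "word " * 100))            # False
```

Under a rule like this, any short-but-legitimate page (an old post with a couple of sentences, say) gets misclassified exactly the way the author describes.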
Brings to mind Tim Bray's article "Google Memory Loss":
<a href="https://www.tbray.org/ongoing/When/201x/2018/01/15/Google-is-losing-its-memory" rel="nofollow">https://www.tbray.org/ongoing/When/201x/2018/01/15/Google-is...</a><p>Discussion at the beginning of the year: <a href="https://news.ycombinator.com/item?id=16153840" rel="nofollow">https://news.ycombinator.com/item?id=16153840</a>
Google will also happily surface a Stack Overflow question from 2010 about how to solve a JS problem... frustrating that the top 3 answers will all use jQuery, when that's not the approach anyone would take in the last 5 years.<p>Definitely frustrating, but it also shows there is some need to retire specific pieces of the past from the top recommendations.
With all the talk about Google results no longer being satisfying to a growing number of users, I'm surprised we haven't seen more sites pop up that let users display the results of multiple search engines of their choosing, either by interleaving them (e.g. all 1st results, then all 2nd, etc.) or by showing them side by side, while stripping ads and cards and the like.
You used to be able to google a simple question and get results that answered it right on the search page, without having to click through. But since no one clicked on those results, they stopped appearing after a few years; the only results left were the ones where the data was hidden and you had to click through.
I have a couple of websites generated from databases. Each has around half a million pages of unique content. The first one was indexed in like a week at 100K/day, almost instant tsunami of traffic. The second one is being indexed at 100-1000 pages per day, it's been years.<p>Google works in mysterious ways.
You'll see this effect from every search engine. They have no choice: there are plenty of sites with an effectively infinite number of pages. So instead, the number of pages they store per site depends on how important your site is, and they try to store your top N pages by relative importance.
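Real crawl-budget logic isn't public, but the idea above can be sketched in a few lines: give each site an index quota proportional to its importance, then keep only its highest-scoring pages. All names and numbers here are illustrative:

```python
# Hypothetical per-site index budget: each site gets a quota
# proportional to its importance score, and only its top-N
# pages (by per-page score) survive in the index.
def allocate_index(sites: dict, total_budget: int) -> dict:
    total_importance = sum(s["importance"] for s in sites.values())
    kept = {}
    for name, site in sites.items():
        quota = max(1, int(total_budget * site["importance"] / total_importance))
        top_pages = sorted(site["pages"], key=site["pages"].get, reverse=True)
        kept[name] = top_pages[:quota]
    return kept

sites = {
    "big.example":   {"importance": 9, "pages": {f"/p{i}": i for i in range(50)}},
    "small.example": {"importance": 1, "pages": {"/a": 5, "/b": 3, "/c": 1}},
}
kept = allocate_index(sites, total_budget=10)
print(len(kept["big.example"]), len(kept["small.example"]))  # 9 1
```

Note how the small site loses most of its pages even though they're all unique content — which matches the half-million-page experience in the comment above.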
Googling phrases from the soft-404 page and some of the author's 2003 posts did show the pages.<p>I did notice that all of the author's content is duplicated in index pages, so maybe Google just doesn't consider the article page the canonical link.
According to Google Inside Search, only 1 in 3000 pages gets indexed. As content on the Internet grows, the whole idea of downloading every single page to create an index of the entire Internet in one place becomes unworkable. So we should see this ratio continue to degrade until this fundamental architecture is improved.
To play devil's advocate for a second, remember how much noise Google has to sift through. Every possible search term exists in every possible combination, often written into lovingly crafted content-farm articles by actual humans.<p>If Google offered you all of those, it might be 1000 pages of empty nonsense before your actual desired content.
Maybe they have some algorithm that purges pages which haven't shown up (or haven't been clicked) in a long time? It would make sense to assume that something which hasn't been clicked on for five years will likely not yield (m)any clicks in the future, so it might be good to discard it.<p>Concerning the auto-generated sites, e.g. for phone numbers or IPs, it might be that people actually click on them quite often, hence Google keeps them in the index?
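This is pure speculation about the mechanism, of course, but a purge rule like the one described could be as simple as the sketch below. The five-year cutoff is just the figure from the guess above, not anything Google has documented:

```python
# Hypothetical purge rule: drop pages whose last click is older
# than some cutoff (here ~5 years, per the speculation above).
from datetime import datetime, timedelta

def purge_stale(index: dict, now: datetime,
                max_age: timedelta = timedelta(days=5 * 365)) -> dict:
    """Keep only pages whose last click falls within max_age of now."""
    return {url: last_click for url, last_click in index.items()
            if now - last_click <= max_age}

now = datetime(2018, 12, 1)
index = {
    "example.com/fresh":   datetime(2018, 6, 1),  # clicked recently: kept
    "example.com/ancient": datetime(2010, 1, 1),  # no clicks in years: purged
}
print(sorted(purge_stale(index, now)))  # ['example.com/fresh']
```

It would also explain the phone-number sites: as long as people keep clicking them, their `last_click` stays fresh and they never age out.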
Google Search users prefer fresh content, so Google Index prioritises fresh content too (and is more likely to drop old content that users are not interested in).
Folks, please use startpage.com; just give it a chance. It has worked out very well for me in terms of privacy, with search results equal to the big G.