For<p>> Ask HN: Is there a search engine which
excludes the world's biggest websites?<p>> Discovering unknown paths of the web
seems almost impossible with google et
al.<p>> Are there any search engines which
exclude or at least penalize results from,
say, top 500 websites?<p>Let's back up a little and then try for an
answer:<p>Some points:<p>(1) For some <i>qualitative exclamation</i>,
there is a LOT of <i>content</i> on the
Internet.<p>(2) There are, in principle and no doubt
already to a significant extent in practice, a LOT of
searches people want to do. The search in
the OP is an example.<p>(3) Much like in an old library card
catalog subject index, the most popular
search engines are based heavily on key
words and then whatever else, e.g., <i>page
rank</i>, date, etc.<p>So: (1) -- (3) represent some challenges
so far not very well met: In particular,
we can't expect that the key words, etc.
of (3) will do very well on all or nearly
all the searches in (2) for much of the
content in (1).<p>And the search in the OP is an example of
a challenge so far not well met.<p>Moreover, the search in the OP is no doubt
just one of many searches with challenges
so far not well met.<p>Long ago, Dad had a friend who worked at
Battelle, and IIRC they did a review of
<i>information retrieval</i> that concluded
that keyword search covers only a
fraction, maybe ballpark only 1/3rd, of
the need for effective searching. And the
search in the OP is an example of what is
not covered because the <i>library card
catalog</i> did not index the size of the book or
Web site! :-)<p>Seeing this situation, my rough, ballpark
estimate has been that the currently
popular Internet search engines do well on
only about 1/3rd of the content on the
Internet, searches people want to do, and
results they want to find.<p>So, I decided to see what could be done
for the other 2/3rds.<p>I started with some not very well known or
appreciated advanced pure math; it looks
like useless, generalized abstract
nonsense, but if you calm down, stare at it,
think about it, ..., you can see a path to a
solution. Although I never thought about
the search in the OP until now, in
principle the solution should work also
for that search. That is, the math is a bit
<i>abstract</i> and <i>general</i>, which can
translate in practice to doing well on
something as varied as the 2/3rds.<p>Then for the computing, I did some
original applied math research.<p>Using TeX, I wrote it all up with theorems
and proofs.<p>So, the project is to be a Web site.
While I've been programming for decades in
my career, this was my first Web site. I selected
Windows and .NET, and typed in 100,000
lines of text with 24,000 statements in
Visual Basic .NET (apparently equivalent
in semantics to C# but with
<i>syntactic sugar</i> I prefer).<p>The software appears to run as intended
and well enough for significant
production.<p>I was slowed down by one interruption
after another, none related to the work.<p>But, roughly, ballpark, the Web site
should be good, or by a lot the best so
far, for the 2/3rds and in particular for
the search in the OP.<p>So, for<p>> Ask HN: Is there a search engine which
excludes the world's biggest websites?<p>there's one coded and running and on the
way to going live!<p>I intend to announce an alpha test here at
HN.