Show HN: Didyougogo – An Altavista slayer

269 points by misterman0 almost 7 years ago

46 comments

SyneRyder almost 7 years ago

The index is super tiny. A search for "the" got 112 results. Seems like a quick way to explore the entire index. Also it indexes pages twice if you submit them twice, so that needs to be fixed.

But for some crazy reason, I kinda like this. It feels like the 90s internet. The links included so far have that same random mix of lots of nerdy links, homepages & personal blogs, a few religious sites, and the occasional big news website. Because there's no crawler yet, it's limited to the specific pages people thought were noteworthy. And because the index is so limited, I'm stumbling on interesting things.

It's so weird looking at this and thinking "Y'know, maybe this could also work if the links were curated into yet another hierarchical officious oracle", or "if this site let me pay to show a small text ad on the side when someone searched for a relevant keyword, I might spend a few dollars here".

Someone submitted the "Strawberry Pop-Tart Blow-Torches" page, which is one of my earliest internet memories. Whoever submitted that, thank you for the nostalgia!

spchampion2 almost 7 years ago

I searched for "Cnn" and got 0 results. I searched for "Amazon" and got five random results, including the IMDB page for "Rambo, Part 2."

If this were really like AltaVista, I'd get 3 trillion results and have to use advanced Boolean logic to cut that down to the most useful 7,000 - so I guess having no results is sort of easier...

oh-moses almost 7 years ago

Awesome, but man, that name really needs some work. It sounds like I'm asking a two-year-old whether he's been on the potty.

tcmb almost 7 years ago

Kudos for your courage to make your great ambitions public from the start.

1. Does the site do any crawling on its own, or is the public index only fed from submissions?

2. It appears Umlaut/Unicode handling needs some work: When I search for "Käse" (German for 'cheese'), I get the response "0 results for 'K&#228;se' in 'www' (0 ms)".

At this point I'm not sure if there are actually 0 results or if it was actually searching for the escaped string.
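To illustrate the ambiguity described above: if the query reaches the index still in its escaped form, it can never match text stored as "Käse". A minimal Python sketch, assuming (this is a guess, not something the thread confirms) that the query arrives HTML-escaped. The function names `normalize_query` and `render_echo` are hypothetical and have nothing to do with the site's actual code.

```python
import html


def normalize_query(raw: str) -> str:
    """Decode HTML character references before searching.

    Assumption: the raw query may arrive as 'K&#228;se' instead of 'Käse'.
    """
    return html.unescape(raw).strip()


def render_echo(query: str) -> str:
    """Escape only the markup-significant characters when echoing the query
    back into a page; non-ASCII letters are left untouched."""
    return html.escape(query, quote=True)


if __name__ == "__main__":
    term = normalize_query("K&#228;se")
    print(term)
    print(render_echo("<b>" + term + "</b>"))
```

The point of the split is that decoding happens once on input and escaping happens once on output, so the index itself only ever sees plain Unicode text.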

pmorici almost 7 years ago

Is this supposed to be a joke? I can't tell. The index is certainly extremely limited.

asaibx almost 7 years ago

As others have commented, love the ambitiousness of this! However, Unicode searches do not seem to work at all -- not just "中文" but also even "français" gives an error. Unicode support is something you definitely want to build in from the very beginning in order to avoid headaches (for you and users) in the future. Even if there is no content in the index, the presence of non-ASCII characters in the search term should not lead to a server error. Suggest you make Unicode the default encoding for everything even if you are not planning on supporting non-English search results for the moment, just to avoid unexpected errors when people search for things like "café" for example.

apo almost 7 years ago

> I'm Marcus, founder of Didyougogo and author of the software behind it. For the past ten years I've been trying to improve my programming and math skills to get to a level where I could write a proper web search engine for the written word using absolute cutting-edge IR methods. The final result is something I have not seen or read about: a language represented as a 65K wide vector-space, serialized into a binary tree that is balanced according to node's cosine angle between them and their closest neighbours. Querying is very fast, even for long phrases. Fuzzy, prefix, suffix and wildcard type queries comes for free with the vector-space model. The system uses relatively little resources and can run on as little as 1 CPU and 1GB RAM.

Is there any further technical documentation than this (besides the source code)?

I tried searching some of the terms in this description on Google, but found little specific information. One search turned up k-d trees. Is this related?

https://en.wikipedia.org/wiki/K-d_tree
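For readers unfamiliar with the vector-space idea in the quoted description, here is a rough Python sketch of cosine similarity between terms. It is not the author's method: it assumes (purely as a guess) that "65K wide" means one dimension per 16-bit character code, and it leaves out the tree entirely. `DIMENSIONS`, `embed` and `cosine` are hypothetical names.

```python
import math
from collections import Counter

# Assumption: 65,536 dimensions, one per 16-bit character code.
DIMENSIONS = 65_536


def embed(term: str) -> Counter:
    """Represent a term as a sparse bag-of-characters vector."""
    return Counter(ord(ch) % DIMENSIONS for ch in term.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(weight * b[dim] for dim, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = embed("search")
    for candidate in ("search", "serch", "apple"):
        print(candidate, round(cosine(query, embed(candidate)), 3))
```

A representation like this would explain why fuzzy matching "comes for free": a misspelling such as "serch" stays at a small angle from "search". It also ignores character order, so anagrams are indistinguishable, which a real system would have to address somehow.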

azinman2 almost 7 years ago

Got zero relevant results. Not even sure how the results came back, as the words weren’t in there. Tried “Taoist tai chi,” then “Taoism”.

Love the ambition, but a long way to go go.

NeedMoreTea almost 7 years ago

Interesting idea. Isn't it a little late to slay Alta Vista though? :)

I searched for apple. Top result was the archive.org macos that showed up here on HN recently, 2nd and 3rd were apple.com indexed 10s apart.

Then some odd results - though they do include the word apple on page just once. The imdb page for 12 Monkeys appears 3 times.

I guess you're not trimming duplicates? Seems like you need some way to weight rankings too.

I wish you every success - search definitely needs some competition.
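One simple way to trim duplicates like the ones described above is to key each result on a canonical form of its URL. A minimal Python sketch, assuming results are plain URL strings; `canonical_url` and `trim_duplicates` are hypothetical helpers, not part of the site's code.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {("http", 80), ("https", 443)}


def canonical_url(url: str) -> str:
    """Lower-case the host, drop default ports, fragments and trailing slashes."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if parts.port and (scheme, parts.port) not in DEFAULT_PORTS:
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))


def trim_duplicates(urls):
    """Keep the first occurrence of each canonical URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


if __name__ == "__main__":
    results = [
        "https://example.com/movie/",
        "https://EXAMPLE.com/movie",
        "https://example.com/movie#cast",
        "https://example.com/other",
    ]
    print(trim_duplicates(results))
```

Applying the same key at submission time would also stop the same page from being indexed twice in the first place. It does not catch the same content served under different URLs, which needs content hashing or similar.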

golangnews almost 7 years ago

I really like this idea, and the very simple implementation - big things start small. We need more search engines, including ones which are not supported by advertising.

Thanks for submitting.

mkl almost 7 years ago

pebers almost 7 years ago

Definitely some ambitious goals. There's nothing bad about that, but this has an awfully long way to go - e.g. searching for "hacker news" works fine, searching for almost anything else didn't find anything relevant. So while it's nice to say it can run in 1CPU / 1GB, I'm not sure it's very useful at that size (but I don't know how big it'd have to get to "break even" there).

Anyway, noted that it's a very early version, so good luck with it!

_ix almost 7 years ago

"If you are willing and able to offer sponsorship, reach out to me at marcuslager at the biggest email provider in the world * dot com."

Is that still yahoo.com?

EamonnMR almost 7 years ago

Reminds me of http://wiby.me

dewey almost 7 years ago

A search engine without https, I think I'll stick with Google for now.

z3phyr almost 7 years ago

Going on another vertical, this reminds me how useful early Usenet was. Reddit is too general, too mainstream, and way less nerdy to be a worthy Usenet replacement. Wishlist: a Usenet killer.

andai almost 7 years ago

> has a ranking model that encourages a good ratio between content and markup (less markup/script is better)

Well, I'm sold!

pferdone almost 7 years ago

Searched for „warez“... didn't return anything... I want to live in the old days again :'(

waterhouse almost 7 years ago

I'm getting 502 right now. Google cache: http://webcache.googleusercontent.com/search?q=cache:O9c79dJYOcoJ:didyougogo.com/blog/didyougogo.html+&cd=1&hl=en&ct=clnk&gl=us&client=firefox-b-1

Or archive.org: http://web.archive.org/web/20180813020050/http://didyougogo.com/blog/didyougogo.html

keketi almost 7 years ago

The minimalistic layout is a pleasure to use compared to AltaVista's bloated UI.

humantiy almost 7 years ago

I like that we are now seeing this market of pro-privacy, less-tracking services like DuckDuckGo and this. Odd throwback to say AltaVista slayer. Now we need an Ask Jeeves slayer and we've covered most bases.

nkozyra almost 7 years ago

Interesting project. Run this blog entry through a spellchecker, btw.

usermac almost 7 years ago

What just happened? I searched for a park I visited just yesterday. "186" hits(?) and two of those were two top page HN sites I just visited!? I'm spooked.

ohiovr almost 7 years ago

I tried my favorite test search "android studio missing symbol r" and was pretty disappointed by the randomness of the results, but that is a tough one. Tried "newest iphone" but didn't come up with anything relevant until about 6 results down, which found apple.com. [edit: didn't realize how small the index was]

ohiovr almost 7 years ago

I think what could be cool is applying this as a personal search engine and marrying it somehow to a personal dns server or squid/proxy server so that you can have a way of harvesting your own browsing data. By using the squid or dnsmasq logs you could spider out urls from it, and build your index automatically.
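The log-harvesting step could look roughly like this in Python. This is a sketch under a stated assumption: the log uses Squid's native access.log layout, where the request method is the sixth whitespace-separated field and the URL is the seventh. `urls_from_squid_log` is a hypothetical helper; the resulting URLs would still need to be fetched and submitted to the index separately.

```python
def urls_from_squid_log(lines):
    """Yield each distinct http(s) URL requested with GET, in first-seen order.

    Assumption: native Squid access.log layout, i.e.
    time elapsed client code/status bytes method URL user hierarchy type
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        method, url = fields[5], fields[6]
        if method != "GET" or not url.startswith(("http://", "https://")):
            continue  # e.g. CONNECT tunnels only expose host:port
        if url not in seen:
            seen.add(url)
            yield url


if __name__ == "__main__":
    sample = [
        "1534125600.123    180 192.168.0.2 TCP_MISS/200 1494 GET "
        "http://example.com/park - HIERARCHY_DIRECT/93.184.216.34 text/html",
        "1534125601.456     12 192.168.0.2 TCP_TUNNEL/200 3210 CONNECT "
        "example.org:443 - HIERARCHY_DIRECT/93.184.216.34 -",
    ]
    for url in urls_from_squid_log(sample):
        print(url)
```

One caveat with this approach: HTTPS traffic passing through a plain proxy shows up only as CONNECT entries with a host and port, so full page URLs are visible only for unencrypted requests unless the proxy intercepts TLS.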

sleepychu almost 7 years ago

This is neat & impressive!

Why would I use this over duckduckgo? (Assuming that we're some time on and the index is comparable?)

medecau almost 7 years ago

I thought of something similar as a holiday project. A small search engine using SQLite FTS5 for a small set of websites crawled with Scrapy.

I made it public yesterday on https://fts.fail/

Good luck slaying that dragon though.

projektir almost 7 years ago

Hmm. I tried to add a page for "duck", but it doesn't seem to work, and every time I search for "duck", I still see a bunch of anime websites. Why are those anime websites even on there?

Also, plans to add HTTPS?

This looks cool, though, good luck!

reitanqild almost 7 years ago

This is really cool. I love the feel of it and the ideas of running both on prem as well as public instances, letting them cooperate and teaming up with companies.

I know (almost) nothing about search engines but I hope something like this succeeds.

mcjiggerlog almost 7 years ago

I don't understand what it's referring to when you say submit a URL AND a search term. They're two separate forms. I submitted some URLs and they never show up with relevant searches.

Jeema101 almost 7 years ago

Who are you using for hosting? Amazon offers a free tier that could probably host this to start out with if you're currently using a computer in your bedroom or something. ;)

nasredin almost 7 years ago

Name makes it sound like it's related to DDG.

Definitely needs a better one.

mfincham almost 7 years ago

The "submit a URL" seems to need the URL scheme added (e.g. https://) or it silently fails.
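A submission form can avoid that silent failure by supplying a default scheme and rejecting anything that still does not look like a web URL. A minimal Python sketch; `with_default_scheme` is a hypothetical helper, and defaulting to https is an assumption rather than anything the site is known to do.

```python
from urllib.parse import urlsplit


def with_default_scheme(raw: str, default: str = "https") -> str:
    """Return a submittable URL, adding a scheme when the user left it out.

    Raises ValueError instead of failing silently.
    """
    candidate = raw.strip()
    if not candidate:
        raise ValueError("empty URL")
    if "://" not in candidate:
        candidate = f"{default}://{candidate}"
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a web URL: {raw!r}")
    return candidate


if __name__ == "__main__":
    for text in ("news.ycombinator.com", "http://wiby.me", "ftp://example.com"):
        try:
            print(with_default_scheme(text))
        except ValueError as err:
            print("rejected:", err)
```

Raising an error gives the form something concrete to show the user, which is the part the comment above says is missing.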

gitgud almost 7 years ago

    91 results for 'hello world' in 'www' (32615 ms)

Not sure it can "slay" Google, but interesting project!

viraptor almost 7 years ago

Most of the goals can already be achieved using the YaCy project. Also, it already has an existing, massive, distributed index.

gunkaaa almost 7 years ago

I love it - well done.

As always, the question is how it scales.

notananthem almost 7 years ago

I get no results

josephv almost 7 years ago

It's fast! I like the technical detail - index too limited.

Searched Red Dead Redemption 2 - no game info

Searched "bobs" - no bobs

cygned almost 7 years ago

One of my colleagues argues that search has become infrastructure and thus there should be an offering from the state, which is also responsible for other infrastructure.

There was a (failed) attempt by the EU I know about. And I don’t see that happening in the near future.

jl2718 almost 7 years ago

I tried emailing you at hotmail, but you are over the 1MB limit.

nerdb0t almost 7 years ago

when i submit something to the search engine, it produces a result that doesn't have anything to do with the search term.

it's unclear to me how i am supposed to help improve this.

chrxr almost 7 years ago

I like that didyougogo isn't in the index! Added!

sergiotapia almost 7 years ago

> marcuslager at the biggest email provider in the world * dot com.

?

jhabdas almost 7 years ago

Sounds too good to be true. What's the catch?

brunosutic almost 7 years ago

Is there a way to be notified of product updates?

kiechu almost 7 years ago

Direct spike to Google's heart.
