TechEcho

Show HN: Didyougogo – An Altavista slayer

269 points by misterman0 almost 7 years ago

46 comments

SyneRyder almost 7 years ago
The index is super tiny. A search for "the" got 112 results. Seems like a quick way to explore the entire index. Also it indexes pages twice if you submit them twice, so that needs to be fixed.

But for some crazy reason, I kinda like this. It feels like the 90s internet. The links included so far have that same random mix of lots of nerdy links, homepages & personal blogs, a few religious sites, and the occasional big news website. Because there's no crawler yet, it's limited to the *specific* pages people thought were noteworthy. And because the index is so limited, I'm stumbling on interesting things.

It's so weird looking at this and thinking "Y'know, maybe this could also work if the links were curated into yet another hierarchical officious oracle", or "if this site let me pay to show a small text ad on the side when someone searched for a relevant keyword, I might spend a few dollars here".

Someone submitted the "Strawberry Pop-Tart Blow-Torches" page, which is one of my earliest internet memories. Whoever submitted that, thank you for the nostalgia!

spchampion2 almost 7 years ago
I searched for "Cnn" and got 0 results. I searched for "Amazon" and got five random results, including the IMDB page for "Rambo, Part 2."

If this were really like AltaVista, I'd get 3 trillion results and have to use advanced Boolean logic to cut that down to the most useful 7,000 - so I guess having no results is sort of easier...

oh-moses almost 7 years ago
Awesome, but man, that name really needs some work. It sounds like I'm asking a two-year-old whether he's been on the potty.

tcmb almost 7 years ago
Kudos for your courage to make your great ambitions public from the start.

1. Does the site do any crawling on its own, or is the public index only fed from submissions?

2. It appears Umlaut/Unicode handling needs some work: When I search for "Käse" (German for 'cheese'), I get the response "0 results for 'K&#228;se' in 'www' (0 ms)".

At this point I'm not sure if there's actually 0 results or if it was actually searching for the escaped string.
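One plausible explanation for the escaped echo, sketched in Python: if the query reaches the lookup still HTML-escaped, it can never match the indexed term. The toy index, the example.org URL and the `search` function are hypothetical and say nothing about how Didyougogo actually handles queries.

```python
import html
from typing import List

# Hypothetical toy index: term -> pages containing it.
INDEX = {"käse": ["https://example.org/cheese"]}

def search(raw_query: str, decode_entities: bool) -> List[str]:
    """Look up a query, optionally decoding HTML character references first."""
    query = html.unescape(raw_query) if decode_entities else raw_query
    # casefold() gives a case-insensitive match that also works for non-ASCII text.
    return INDEX.get(query.casefold(), [])

escaped = "K&#228;se"  # the string echoed back in the response quoted above
print(search(escaped, decode_entities=False))  # looks up the literal escaped string
print(search(escaped, decode_entities=True))   # looks up the decoded word
```

Decoding once at the edge and keeping everything as Unicode text internally would make the two cases indistinguishable to the user, which is roughly what the comment is asking for.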

pmorici almost 7 years ago
Is this supposed to be a joke? I can't tell. The index is certainly extremely limited.

asaibx almost 7 years ago
As others have commented, love the ambitiousness of this! However, Unicode searches do not seem to work at all -- not just "中文" but also even "français" gives an error. Unicode support is something you definitely want to build in from the very beginning in order to avoid headaches (for you and users) in the future. Even if there is no content in the index, the presence of non-ASCII characters in the search term should not lead to a server error. Suggest you make Unicode the default encoding for everything even if you are not planning on supporting non-English search results for the moment, just to avoid unexpected errors when people search for things like "café" for example.

apo almost 7 years ago
> I'm Marcus, founder of Didyougogo and author of the software behind it. For the past ten years I've been trying to improve my programming and math skills to get to a level where I could write a proper web search engine for the written word using absolute cutting-edge IR methods. The final result is something I have not seen or read about: a language represented as a 65K wide vector-space, serialized into a binary tree that is balanced according to node's cosine angle between them and their closest neighbours. Querying is very fast, even for long phrases. Fuzzy, prefix, suffix and wildcard type queries comes for free with the vector-space model. The system uses relatively little resources and can run on as little as 1 CPU and 1GB RAM.

Is there any further technical documentation than this (besides the source code)?

I tried searching some of the terms in this description on Google, but found little specific information. One search turned up k-d trees. Is this related?

https://en.wikipedia.org/wiki/K-d_tree
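For readers unfamiliar with the vector-space idea in the quoted description, here is a minimal Python sketch of cosine similarity between terms. It assumes, purely for illustration, one dimension per character with the occurrence count as the value; the quoted text does not say what the 65K dimensions are, and the tree serialization is not attempted here. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

def to_vector(text: str) -> Counter:
    """Sparse vector: one dimension per character, value = occurrence count."""
    return Counter(text.casefold())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(count * b[ch] for ch, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def closest(query: str, terms: Iterable[str]) -> str:
    """Return the indexed term whose vector is nearest to the query's."""
    query_vec = to_vector(query)
    return max(terms, key=lambda term: cosine(query_vec, to_vector(term)))

# A misspelled query still lands on the nearest indexed term.
print(closest("serch", ["search", "index", "vector"]))
```

This shows why fuzzy matching can fall out of such a model: a typo only nudges the vector slightly. It also shows a weakness of this particular toy encoding, since anagrams get identical vectors.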

azinman2 almost 7 years ago
Got zero relevant results. Not even sure how the results came back, as the words weren’t in there. Tried “Taoist tai chi,” then “Taoism”.

Love the ambition, but a long way to go go.

NeedMoreTea almost 7 years ago
Interesting idea. Isn't it a little late to slay Alta Vista though? :)

I searched for apple. Top result was the archive.org macos that showed up here on HN recently, 2nd and 3rd were apple.com indexed 10s apart.

Then some odd results - though they do include the word apple on page just once. The imdb page for 12 Monkeys appears 3 times.

I guess you're not trimming duplicates? Seems like you need some way to weight rankings too.

I wish you every success - search definitely needs some competition.
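Trimming duplicates of the kind described here could look something like the following Python sketch, which collapses result URLs that differ only in host case, a trailing slash or a fragment. The normalization rules are an assumption chosen for illustration, and the example.org URLs are made up.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Reduce a URL to a comparison key: lowercase host, no fragment, no trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def dedupe(urls: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each normalized URL, preserving result order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

results = [
    "https://example.org/",
    "https://example.org",
    "https://EXAMPLE.org/#top",
    "https://example.org/12-monkeys",
]
print(dedupe(results))
```

Applying the same key at submission time would also stop a page from being indexed twice in the first place.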

golangnews almost 7 years ago
I really like this idea, and the very simple implementation - big things start small. We need more search engines, including ones which are not supported by advertising.

Thanks for submitting.

mkl almost 7 years ago
Please put a license on the source code. Right now, by default, it's "all rights reserved" so no one can use it or do anything with it.

pebers almost 7 years ago
Definitely some ambitious goals. There's nothing bad about that, but this has an awfully long way to go - e.g. searching for "hacker news" works fine, searching for almost anything else didn't find anything relevant. So while it's nice to say it can run in 1CPU / 1GB, I'm not sure it's very useful at that size (but I don't know how big it'd have to get to "break even" there).

Anyway, noted that it's a very early version, so good luck with it!

_ix almost 7 years ago
"If you are willing and able to offer sponsorship, reach out to me at marcuslager at the biggest email provider in the world * dot com."

Is that *still* yahoo.com?

EamonnMR almost 7 years ago
Reminds me of http://wiby.me

dewey almost 7 years ago
A search engine without https, I think I'll stick with Google for now.

z3phyr almost 7 years ago
Going on another vertical, this reminds me how useful early usenet was. Reddit is too general, way less nerdy, and too mainstream to be a worthy usenet replacement. Wishlist: a usenet killer

andai almost 7 years ago
> has a ranking model that encourages a good ratio between content and markup (less markup/script is better)

Well, I'm sold!
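The quoted line does not say how the ratio is computed. As a rough Python sketch of one way a content-to-markup signal could be measured, the snippet below divides visible text length by total page length and ignores anything inside `script` or `style`. The class and function names are hypothetical and this is not the project's ranking model.

```python
from html.parser import HTMLParser

class TextCounter(HTMLParser):
    """Counts characters of visible text, skipping script and style contents."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())

def content_ratio(page: str) -> float:
    """Fraction of the page's characters that are visible text (0.0 to 1.0)."""
    if not page:
        return 0.0
    counter = TextCounter()
    counter.feed(page)
    counter.close()
    return counter.text_chars / len(page)

lean = "<p>Hello world</p>"
heavy = "<div><script>var x = 1;</script><p>Hello world</p></div>"
print(content_ratio(lean) > content_ratio(heavy))
```

A ranker could then use the ratio as one factor among several, so that script-heavy pages are nudged down rather than excluded.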

pferdone almost 7 years ago
Searched for „warez“... didn't return anything... I want to live in the old days again :'(

waterhouse almost 7 years ago
I'm getting 502 right now. Google cache: http://webcache.googleusercontent.com/search?q=cache:O9c79dJYOcoJ:didyougogo.com/blog/didyougogo.html+&cd=1&hl=en&ct=clnk&gl=us&client=firefox-b-1

Or archive.org: http://web.archive.org/web/20180813020050/http://didyougogo.com/blog/didyougogo.html

keketi almost 7 years ago
The minimalistic layout is a pleasure to use compared to AltaVista's bloated UI.

humantiy almost 7 years ago
I like that we are now seeing this market of pro-privacy, less-tracking services like DuckDuckGo and this. Odd throwback to say AltaVista slayer. Now we need an Ask Jeeves slayer and we've covered most bases.

nkozyra almost 7 years ago
Interesting project. Run this blog entry through a spellchecker, btw.

usermac almost 7 years ago
What just happened? I searched for a park I visited just yesterday. "186" hits(?) and two of those were top-page HN sites I just visited!? I'm spooked.

ohiovr almost 7 years ago
I tried my favorite test search "android studio missing symbol r" and was pretty disappointed by the randomness of the results, but that is a tough one. Tried "newest iphone" but it didn't come up with anything relevant until about 6 results down, where it found apple.com [edit: didn't realize how small the index was]

ohiovr almost 7 years ago
I think what could be cool is applying this as a personal search engine and marrying it somehow to a personal dns server or squid/proxy server so that you can have a way of harvesting your own browsing data. By using the squid or dnsmasq logs you could spider out urls from it, and build your index automatically.
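The log-harvesting idea could start with something like this Python sketch, which pulls distinct URLs out of Squid's native `access.log` format (fields: time, elapsed, client, code/status, bytes, method, URL, ...). The sample log lines and the function name are invented for illustration; a real setup would stream the file and hand each URL to a crawler.

```python
from typing import Iterable, Iterator

def urls_from_squid_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct http(s) URL fetched with GET, in first-seen order."""
    seen = set()
    for line in lines:
        fields = line.split()
        # Native format: time elapsed client code/status bytes method URL ...
        if len(fields) < 7:
            continue
        method, url = fields[5], fields[6]
        # CONNECT entries carry only host:port, so they are skipped here.
        if method != "GET" or not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            yield url

# Made-up sample entries, not real browsing data.
SAMPLE_LOG = [
    "1534125650.123     87 192.168.0.2 TCP_MISS/200 5120 GET http://example.org/park - HIER_DIRECT/93.184.216.34 text/html",
    "1534125651.456    210 192.168.0.2 TCP_TUNNEL/200 3921 CONNECT example.com:443 - HIER_DIRECT/93.184.216.34 -",
    "1534125652.789     45 192.168.0.2 TCP_HIT/200 5120 GET http://example.org/park - HIER_NONE/- text/html",
    "1534125653.012     60 192.168.0.2 TCP_MISS/200 2048 GET http://example.org/trails - HIER_DIRECT/93.184.216.34 text/html",
]

for url in urls_from_squid_log(SAMPLE_LOG):
    print(url)
```

Note that HTTPS traffic through a plain forward proxy shows up only as CONNECT lines, so this approach sees full URLs only for unencrypted requests.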

sleepychu almost 7 years ago
This is neat & impressive!

Why would I use this over duckduckgo? (Assuming that we're some time on and the index is comparable?)

medecau almost 7 years ago
I thought of something similar as a holiday project. A small search engine using SQLite FTS5 for a small set of websites crawled with Scrapy.

I made it public yesterday on https://fts.fail/

Good luck slaying that dragon though.
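For anyone curious what the SQLite FTS5 approach looks like, here is a minimal Python sketch using the standard `sqlite3` module. It assumes the bundled SQLite was compiled with FTS5, which is common but not guaranteed. The table layout, function names and sample pages are hypothetical and not taken from fts.fail, and the crawling step is left out.

```python
import sqlite3
from typing import Iterable, List, Tuple

def build_index(pages: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table and load (url, body) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # url is stored but not tokenized; body is the searchable text.
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(url UNINDEXED, body)")
    conn.executemany("INSERT INTO pages (url, body) VALUES (?, ?)", pages)
    return conn

def search(conn: sqlite3.Connection, query: str) -> List[str]:
    """Return matching URLs, best match first according to FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE pages MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]

conn = build_index([
    ("https://example.org/a", "slaying the search dragon"),
    ("https://example.org/b", "notes on sqlite full text search"),
])
print(search(conn, "dragon"))
```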

projektir almost 7 years ago
Hmm. I tried to add a page for "duck", but it doesn't seem to work, and every time I search for "duck", I still see a bunch of anime websites. Why are those anime websites even on there?

Also, plans to add HTTPS?

This looks cool, though, good luck!

reitanqild almost 7 years ago
This is really cool. I love the feel of it and the ideas of running both on prem as well as public instances, letting them cooperate and teaming up with companies.

I know (almost) nothing about search engines but I hope something like this succeeds.

mcjiggerlog almost 7 years ago
I don't understand what it's referring to when you say submit a URL AND a search term. They're two separate forms. I submitted some URLs and they never show up with relevant searches.

Jeema101 almost 7 years ago
Who are you using for hosting? Amazon offers a free tier that could probably host this to start out with if you're currently using a computer in your bedroom or something. ;)

nasredin almost 7 years ago
Name makes it sound like it's related to DDG.

Definitely needs a better one.

mfincham almost 7 years ago
The "submit a URL" seems to need the URL scheme added (e.g. https://) or it silently fails.
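A submission form could avoid the silent failure by adding a default scheme and rejecting anything it still cannot parse. The Python sketch below is one way to do that; the function name, the `https` default and the error messages are assumptions for illustration only.

```python
from urllib.parse import urlsplit

def with_scheme(url: str, default: str = "https") -> str:
    """Return the URL with a scheme, or raise ValueError instead of failing silently."""
    url = url.strip()
    if not url:
        raise ValueError("empty URL")
    if "://" not in url:
        url = f"{default}://{url}"
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a web URL: {url!r}")
    return url

print(with_scheme("example.com/page"))
print(with_scheme("http://example.com"))
```

Raising an error that the form can display is the important part; whether to guess `https` or `http` for bare hostnames is a separate design choice.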

gitgud almost 7 years ago
    91 results for 'hello world' in 'www' (32615 ms)

Not sure it can "slay" Google, but interesting project!

viraptor almost 7 years ago
Most of the goals can be already achieved using the YaCy project. Also it's already got an existing, massive, distributed index.

gunkaaa almost 7 years ago
I love it - well done.

As always, the question is how it scales.

notananthem almost 7 years ago
I get no results

josephv almost 7 years ago
It's fast! I like the technical detail - index too limited.

Searched Red Dead Redemption 2 - no game info

Searched "bobs" - no bobs

cygned almost 7 years ago
One of my colleagues argues that search has become infrastructure and thus there should be an offering from the state, which is also responsible for other infrastructure.

There was a (failed) attempt by the EU I know about. And I don’t see that happening in the near future.

jl2718 almost 7 years ago
I tried emailing you at hotmail, but you are over the 1MB limit.

nerdb0t almost 7 years ago
when i submit something to the search engine, it produces a result that doesn't have anything to do with the search term.

it's unclear to me how i am supposed to help improve this.

chrxr almost 7 years ago
I like that didyougogo isn't in the index! Added!

sergiotapia almost 7 years ago
> marcuslager at the biggest email provider in the world * dot com.

?

jhabdas almost 7 years ago
Sounds too good to be true. What's the catch?

brunosutic almost 7 years ago
Is there a way to be notified of product updates?

kiechu almost 7 years ago
Direct spike on Google's heart.