The index is super tiny. A search for "the" got 112 results. Seems like a quick way to explore the entire index. Also it indexes pages twice if you submit them twice, so that needs to be fixed.<p>But for some crazy reason, I kinda like this. It feels like the 90s internet. The links included so far have that same random mix of lots of nerdy links, homepages & personal blogs, a few religious sites, and the occasional big news website. Because there's no crawler yet, it's limited to the <i>specific</i> pages people thought were noteworthy. And because the index is so limited, I'm stumbling on interesting things.<p>It's so weird looking at this and thinking "Y'know, maybe this could also work if the links were curated into yet another hierarchical officious oracle", or "if this site let me pay to show a small text ad on the side when someone searched for a relevant keyword, I might spend a few dollars here".<p>Someone submitted the "Strawberry Pop-Tart Blow-Torches" page, which is one of my earliest internet memories. Whoever submitted that, thank you for the nostalgia!
I searched for "Cnn" and got 0 results. I searched for "Amazon" and got five random results, including the IMDB page for "Rambo, Part 2."<p>If this were really like AltaVista, I'd get 3 trillion results and have to use advanced Boolean logic to cut that down to the most useful 7,000 - so I guess having no results is sort of easier...
Kudos for your courage to make your great ambitions public from the start.<p>1. Does the site do any crawling on its own, or is the public index only fed from submissions?<p>2. It appears Umlaut/Unicode handling needs some work: When I search for "Käse" (German for 'cheese'), I get the response "0 results for 'K&#228;se' in 'www' (0 ms)".<p>At this point I'm not sure if there's actually 0 results or if it was actually searching for the escaped string.
As others have commented, love the ambitiousness of this! However, Unicode searches do not seem to work at all -- not just "中文" but also even "français" gives an error. Unicode support is something you definitely want to build in from the very beginning in order to avoid headaches (for you and users) in the future. Even if there is no content in the index, the presence of non-ASCII characters in the search term should not lead to a server error. Suggest you make Unicode the default encoding for everything even if you are not planning on supporting non-English search results for the moment, just to avoid unexpected errors when people search for things like "café" for example.
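A minimal sketch of the query normalization the Unicode comments above are asking for. It assumes (this is a guess, not something the site's source confirms) that the query reaches the search code with HTML entities still escaped, as the "K&#228;se" response suggests. The function name `normalize_query` is hypothetical.

```python
import html
import unicodedata


def normalize_query(raw: str) -> str:
    """Return a query string that is safe to compare against indexed terms.

    Assumption: the raw query may still contain HTML entities such as
    "&#228;" and may use decomposed Unicode (base letter + combining mark).
    """
    # Turn "K&#228;se" back into "Käse" before any matching happens.
    text = html.unescape(raw)
    # NFC composes "e" + U+0301 into the single code point "é", so that
    # visually identical queries compare equal.
    text = unicodedata.normalize("NFC", text)
    # casefold() is a Unicode-aware lowercase for caseless matching.
    return text.casefold().strip()
```

Non-ASCII input such as "中文" passes through unchanged, so the worst case is an empty result set rather than a server error.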
<i>I'm Marcus, founder of Didyougogo and author of the software behind it. For the past ten years I've been trying to improve my programming and math skills to get to a level where I could write a proper web search engine for the written word using absolute cutting-edge IR methods. The final result is something I have not seen or read about: a language represented as a 65K wide vector-space, serialized into a binary tree that is balanced according to node's cosine angle between them and their closest neighbours. Querying is very fast, even for long phrases. Fuzzy, prefix, suffix and wildcard type queries comes for free with the vector-space model. The system uses relatively little resources and can run on as little as 1 CPU and 1GB RAM.</i><p>Is there any further technical documentation than this (besides the source code)?<p>I tried searching some of the terms in this description on Google, but found little specific information. One search turned up k-d trees. Is this related?<p><a href="https://en.wikipedia.org/wiki/K-d_tree" rel="nofollow">https://en.wikipedia.org/wiki/K-d_tree</a>
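For readers unfamiliar with the vector-space idea in the quoted description, here is a rough sketch of cosine similarity between two words treated as character-count vectors. This is not the author's implementation: the mapping of one dimension per character is an assumption made for illustration, and the helper names `to_vector` and `cosine` are hypothetical. It says nothing about how the actual tree is balanced.

```python
import math
from collections import Counter


def to_vector(word: str) -> Counter:
    """Sparse vector: one dimension per character, value = occurrence count."""
    return Counter(word)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    # Only dimensions present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this simplified encoding, near-misses like "apple" and "aple" score close to 1.0, which hints at why fuzzy matching can fall out of such a model. It also shows a limitation: anagrams such as "apple" and "appel" are indistinguishable, because character order is discarded.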
Got zero relevant results. Not even sure how the results came back, as the words weren’t in there. Tried “Taoist tai chi,” then “Taoism”.<p>Love the ambition, but a long way to go.
Interesting idea. Isn't it a little late to slay Alta Vista though? :)<p>I searched for apple. Top result was the archive.org macos that showed up here on HN recently, 2nd and 3rd were apple.com indexed 10s apart.<p>Then some odd results - though they do include the word apple on page just once. The imdb page for 12 Monkeys appears 3 times.<p>I guess you're not trimming duplicates? Seems like you need some way to weight rankings too.<p>I wish you every success - search definitely needs some competition.
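One possible way to trim the duplicates mentioned above is to collapse results whose URLs differ only in case, trailing slash, or fragment. This is a sketch under that assumption, not how the site works; `canonical_url` and `dedupe` are hypothetical names, and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Treat "http://example.com" and "http://example.com/" as the same page.
    path = parts.path or "/"
    # Drop the fragment: it never reaches the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def dedupe(urls):
    """Keep the first occurrence of each canonical URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Applying the same key at submission time, rather than at query time, would also stop a page from being indexed twice when it is submitted twice.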
I really like this idea, and the very simple implementation - big things start small. We need more search engines, including ones which are not supported by advertising.<p>Thanks for submitting.
Definitely some ambitious goals. There's nothing bad about that, but this has an awfully long way to go - e.g. searching for "hacker news" works fine, searching for almost anything else didn't find anything relevant. So while it's nice to say it can run in 1CPU / 1GB, I'm not sure it's very useful at that size (but I don't know how big it'd have to get to "break even" there).<p>Anyway, noted that it's a very early version, so good luck with it!
"If you are willing and able to offer sponsorship, reach out to me at marcuslager at the biggest email provider in the world * dot com."<p>Is that <i>still</i> yahoo.com?
Going on another vertical, this reminds me how useful early usenet was. Reddit is too general, way less nerdy, and too mainstream to be a worthy usenet replacement. Wishlist: a usenet killer
I like that we are now seeing a market for pro-privacy, low-tracking services like DuckDuckGo and this. It's an odd throwback to say "AltaVista slayer", though. Now we need an Ask Jeeves slayer and we've covered most bases.
What just happened? I searched for a park I visited just yesterday. "186" hits(?) and two of those were two top page HN sites I just visited!? I'm spooked.
I tried my favorite test search "android studio missing symbol r" and was pretty disappointed by the randomness of the results, but that is a tough one. Tried "newest iphone" but didn't come up with anything relevant until about 6 results down, where it found apple.com [edit: didn't realize how small the index was]
I think what could be cool is applying this as a personal search engine and marrying it somehow to a personal dns server or squid/proxy server so that you can have a way of harvesting your own browsing data. By using the squid or dnsmasq logs you could spider out urls from it, and build your index automatically.
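A rough sketch of the log-harvesting idea above, assuming Squid's default native access.log layout, where the request method is the sixth whitespace-separated field and the URL is the seventh. A custom `logformat` would need different indices, and the function name `urls_from_squid_log` is hypothetical.

```python
def urls_from_squid_log(lines):
    """Yield each distinct http(s) URL fetched with GET, in first-seen order.

    Assumes the default native format:
    time elapsed client code/status bytes method URL ident hierarchy type
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        method, url = fields[5], fields[6]
        # CONNECT entries (HTTPS tunnels) only carry host:port, not a page URL.
        if method != "GET" or not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            yield url
```

Usage would be something like `urls_from_squid_log(open("access.log"))`, feeding the yielded URLs to whatever indexer you run. Note that without TLS interception a proxy only sees the host for HTTPS traffic, so this mostly captures plain HTTP pages.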
I thought of something similar as a holiday project.
A small search engine using SQLite FTS5 for a small set of websites crawled with Scrapy.<p>I made it public yesterday on <a href="https://fts.fail/" rel="nofollow">https://fts.fail/</a><p>Good luck slaying that dragon though.
Hmm. I tried to add a page for "duck", but it doesn't seem to work, and every time I search for "duck", I still see a bunch of anime websites. Why are those anime websites even on there?<p>Also, plans to add HTTPS?<p>This looks cool, though, good luck!
This is really cool. I love the feel of it and the ideas of running both on-prem and public instances, letting them cooperate and teaming up with companies.<p>I know (almost) nothing about search engines but I hope something like this succeeds.
I don't understand what it's referring to when you say submit a URL AND a search term. They're two separate forms. I submitted some URLs and they never show up with relevant searches.
Who are you using for hosting? Amazon offers a free tier that could probably host this to start out with if you're currently using a computer in your bedroom or something. ;)
The "submit a URL" seems to need the URL scheme added (e.g. <a href="https://" rel="nofollow">https://</a>) or it silently fails.
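A small sketch of how a submit form could avoid the silent failure described above: prepend a default scheme when none is given, and reject anything that still is not a usable http(s) URL with an explicit error. The function name `ensure_scheme` and the choice of `https` as the default are assumptions for illustration, not the site's behavior.

```python
from urllib.parse import urlsplit


def ensure_scheme(url: str, default: str = "https") -> str:
    """Return the URL with a scheme, or raise ValueError with a clear message."""
    url = url.strip()
    if not url:
        raise ValueError("empty URL")
    # "example.com/page" has no scheme separator, so add the default one.
    if "://" not in url:
        url = f"{default}://{url}"
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a crawlable http(s) URL: {url!r}")
    return url
```

Raising instead of returning `None` means the form handler has to surface a message to the user, which is the part that was missing.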
<p><pre><code> 91 results for 'hello world' in 'www' (32615 ms)
</code></pre>
Not sure it can "slay" Google, but interesting project!
One of my colleagues argues that search has become infrastructure and thus there should be an offering from the state, which is also responsible for other infrastructure.<p>There was a (failed) attempt by the EU that I know about. And I don’t see that happening in the near future.
when i submit something to the search engine, it produces a result that doesn't have anything to do with the search term.<p>it's unclear to me how i am supposed to help improve this.