Why Writing Your Own Search Engine Is Hard

49 points by helwr over 14 years ago

8 comments

iamelgringo over 14 years ago
Search is a gold mine, and I don't understand why more people aren't diving into building niche search engines. Sure, you can't really compete with Google on size, but there are a lot of nooks and crannies online where you can pick up valuable search traffic around the edges.

At least, that's why I'm working on a search engine for financial news at http://Newsley.com/search. (We're focused on building the crawlers and the index right now. Search is _very_ alpha.)

After reading this article, I feel validated in a bunch of the decisions I've been making. I've been running on EC2, but its disk I/O is slow as molasses, so I'm starting to build servers and put them in my garage. I'll be migrating to the garage servers over the next few months. Pretty much everyone I talk to thinks running servers in your garage is a terrible idea, but I can't think of any other way to do this more cheaply and still have control over my hardware. It's nice to read that I'm not crazy for thinking this.

It was also great to read that, at early search engines, the bulk of the work was done by small teams. Being the only dev, at times I think I'm a bit crazy for trying to bootstrap a search startup. Again, it was nice to read that it's not all that crazy to try to do it on my own.
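For readers wondering what "the index" in a search engine usually means: at its core it is an inverted index, a map from each term to the set of documents containing it. The sketch below is a minimal illustration under simplifying assumptions (a naive lowercase/alphanumeric tokenizer, in-memory storage, AND-only queries, no ranking); the class and method names are hypothetical and it is not how Newsley or any particular engine is built.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Deliberately naive tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory inverted index: term -> set of document ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        # Store the raw text and record doc_id in the posting set of every term.
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        # AND semantics: intersect the posting sets of all query terms.
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


idx = InvertedIndex()
idx.add(1, "Fed raises interest rates")
idx.add(2, "Markets rally as rates fall")
idx.add(3, "Tech earnings beat estimates")
print(idx.search("rates"))           # [1, 2]
print(idx.search("interest rates"))  # [1]
```

A real engine adds term positions, on-disk posting lists, compression, and ranking on top of this; the disk I/O concerns in the comment above come largely from reading and merging those posting lists at scale.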
bradleyland over 14 years ago
This is from 2004. A lot of the paper still applies in principle, but I'd argue that there are far fewer people chomping at the bit to get into the search business these days. Now it's all "social" or "game" related.
rwmj over 14 years ago
I wonder what happened to the Internet Archive search tool she wrote (recall.archive.org)?
iwwr over 14 years ago
In other words: avoid spending money and refine your algorithms first. Faster machines may be tempting, but relying on them makes scaling horribly expensive down the road.
korussian over 14 years ago
I think for most people writing a search engine is overkill when there are existing options out there.

If you want to search a subset of sites, then Google CSE (Custom Search Engine) is really all you need, plus whatever bells & whistles you'd like to add around it. I've done that here: http://searchESLCafe.com, adding "recent searches", search via wildcard subdomain (e.g. foo.searchESLCafe.com, bar.searchESLCafe.com, or foo_bar.searchESLCafe.com), and customizing the heck out of Google CSE's options.

Is there demand for a search engine that parses the results into something informative at a glance? I'm not so sure that's the user's first priority. Or, to put it another way, there's plenty of hard-to-reach info out there that you can hand users via a customized Google CSE, and they don't mind doing the legwork of clicking on the query results and finding their own answers.

It's a lot more important to have an accurate search algorithm than drill-down bells & whistles.

Google does a great job of returning solid results for any subset of sites, so why not let Google handle that and concentrate on the other stuff?
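The wildcard-subdomain search mentioned above can be done with a DNS wildcard record pointing every subdomain at one app, which then derives the query from the request's Host header. Below is a minimal sketch of that last step, assuming (not confirmed by the comment) that the whole leftmost part of the hostname is the query, that underscores stand for spaces, and that the bare domain and `www` mean "no query". The function name is hypothetical.

```python
from typing import Optional


def query_from_host(host: str, base_domain: str) -> Optional[str]:
    """Hypothetical helper: map 'foo_bar.example.com' to the query 'foo bar'.

    Returns None for the bare domain, 'www', or hosts outside base_domain.
    """
    # Host headers may include a port and are case-insensitive;
    # also drop a trailing dot from fully qualified names.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + base_domain.lower()
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    if not label or label == "www":
        return None
    # Assumption: underscores in the subdomain encode spaces in the query.
    return label.replace("_", " ")


print(query_from_host("foo_bar.searchESLCafe.com:80", "searchESLCafe.com"))  # foo bar
print(query_from_host("searchESLCafe.com", "searchESLCafe.com"))             # None
```

The resulting string would then be passed to whatever search backend the site uses (in the comment's case, a customized Google CSE), the same way a query typed into the search box would be.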
joshbaptiste over 14 years ago
Heh.. wonder what yegg of DuckDuckGo thinks of this article.
known over 14 years ago
We can roll out our own Google search engine with http://aspseek.org
mixmax over 14 years ago
"Application server is busy. Either there are too many concurrent requests or the server still is starting up"

Apparently scaling is hard too.