Ask HN: How difficult is a search engine MVP that just works?

9 points by bluu00, over 4 years ago

What is it, I mean, what's stopping you from making the next Google *search*?

TF-IDF? No, just kidding. Information retrieval, text mining, ML/DL? What is going on with this field? Every other resource seems outdated. What is the state of the art?

Reading some of these posts:

https://boyter.org/2010/08/build-vector-space-search-engine-python/

https://www.dr-josiah.com/2010/07/building-search-engine-using-redis-and.html

https://stevenloria.com/tf-idf/

https://stories.algolia.com/a-search-engine-in-css-b5ec4e902e97

5 comments

a3camero, over 4 years ago

I've done this over the last year on a tiny scale for my own needs: gorillafind.com. It's built from scratch and covers only government sites, which sidesteps some of the challenges (so far it has only 50 sites). The cost per site is around $1/mo for crawling, indexing, converting file formats and then serving up results. It's difficult but not impossible, and very educational. If you'd like to hear more about doing it yourself and some of the challenges, feel free to email me using the contact info on the site. My system isn't open source, but I'm more than happy to chat about the research I've done and how you can make one.

I'd start off by not doing state of the art, because it's overkill for an "MVP". And if you don't need proper browser rendering of pages, there are open source crawlers out there, like Nutch, that might work. If you're making one yourself, the outdated academic papers and presentations by search companies are a good resource, as the basic ideas of crawling and indexing haven't changed too much (even if ranking and other components have changed a lot). A search engine is really a set of related components, and there are many examples out there to use as inspiration for your MVP.
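To make the "set of related components" idea concrete, here is a minimal sketch of just the indexing and ranking pieces, using the TF-IDF weighting mentioned in the original post. It is not how gorillafind.com or any real engine works; the `TinyIndex` class, its method names, and the exact weighting formula are assumptions chosen for illustration, and crawling and file-format conversion are left out entirely.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy in-memory inverted index with TF-IDF scoring (hypothetical, illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term count}
        self.doc_lengths = {}              # doc_id -> number of tokens

    def add(self, doc_id, text):
        """Index one document under the given id."""
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            self.postings[term][doc_id] = count

    def search(self, query, limit=10):
        """Return doc ids ordered by summed TF-IDF score, best first."""
        n_docs = len(self.doc_lengths)
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rarer terms get a higher weight; this smoothing is one common variant.
            idf = math.log(1 + n_docs / len(docs))
            for doc_id, count in docs.items():
                tf = count / self.doc_lengths[doc_id]
                scores[doc_id] += tf * idf
        return sorted(scores, key=scores.get, reverse=True)[:limit]
```

Usage would look like `index = TinyIndex()`, then `index.add("a", "tax forms and tax deadlines")` for each crawled page, then `index.search("tax")`. A real MVP would also need persistence, stemming, and a better ranking function, but the inverted index structure stays the same.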
ploika, over 4 years ago

Have you read this post?

https://danluu.com/sounds-easy/

It describes some of the difficulties in building the next Google Search much better than I could.
bluu00, over 4 years ago

And to add, I am not looking for a "how to learn".

Nor is it a site-wide search question.

And not a business model either.

Not a "privacy first but no results" search.

Something that works.
ioli, over 4 years ago

I don't know how feasible it is, but couldn't you use Elasticsearch?
techdragon, over 4 years ago

I built a "search engine", but unfortunately my approach lacked any implicit ranking mechanism.

TL;DR: my regex search engine needs result ranking to be more useful before I consider showing it to other humans.
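One very simple way to add ranking to a regex search is to order documents by how many times the pattern matches. The sketch below is only an illustration of that idea, not techdragon's system; the function name `rank_regex_matches` and the choice of raw match count as the score are assumptions.

```python
import re


def rank_regex_matches(pattern, documents, limit=10):
    """Return (doc_id, match_count) pairs for matching documents, most matches first.

    `documents` is assumed to be a dict mapping doc ids to text.
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    hits = []
    for doc_id, text in documents.items():
        # Count non-overlapping matches of the pattern in this document.
        count = sum(1 for _ in compiled.finditer(text))
        if count:
            hits.append((doc_id, count))
    # Sort by descending count, then by doc id so ties come out in a stable order.
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits[:limit]
```

Raw match count favors long documents, so dividing by document length or combining it with a link-based or TF-IDF style score would likely be the next step.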