Tech Echo
© 2025 Tech Echo. All rights reserved.

Show HN: I'm building a personal web search engine

8 points by a5huynh, almost 3 years ago

2 comments

a5huynh, almost 3 years ago
Hey HN, I'm building an open-source search platform that lives on your device, indexes what you want, and exposes it to you through a super simple, super fast interface.

I took the idea of adding "site:reddit.com" to your Google searches and expanded on it with "lenses", which add context to your search query and tell the crawler what to crawl and index. All queries run locally; your search is never relayed to any third-party search engine. Think of it as your personal bookcase at home vs. the Library of Congress.

It's still at a super early stage, but I would love for people to start using it, provide feedback, and show what sort of lenses they want to build and search through!

Some details about the stack, for the interested:

* All Rust, with some HTML/CSS for the client.
* The client is built with yew + tauri.
* The backend uses tantivy to index the web pages and sqlite3 to hold metadata and the crawl queue.

Thanks in advance!
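The post describes lenses only informally. As a rough illustration, here is a minimal Rust sketch that assumes a lens is nothing more than a named list of domains, applied `site:`-style both to decide what the crawler may queue and to filter search results. `Lens`, `host_of`, and `CrawlQueue` are hypothetical names, not the project's code, and the in-memory queue stands in for the sqlite3-backed crawl queue the post mentions.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical "lens": a named list of domains (given in lowercase) that
/// scopes both what the crawler fetches and which results a search returns.
struct Lens {
    name: String,
    domains: Vec<String>,
}

/// Extracts the host from an absolute URL such as "https://host:port/path".
/// Deliberately simplified: no userinfo, IPv6 literals, or validation.
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let end = rest
        .find(|c: char| c == '/' || c == ':' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let host = &rest[..end];
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

impl Lens {
    /// Works like a `site:` filter: the host must equal a lens domain or be
    /// a subdomain of it (so "notreddit.com" does not match "reddit.com").
    fn allows(&self, url: &str) -> bool {
        match host_of(url) {
            None => false,
            Some(host) => {
                let host = host.to_ascii_lowercase();
                self.domains
                    .iter()
                    .any(|d| host == *d || host.ends_with(&format!(".{}", d)))
            }
        }
    }
}

/// In-memory stand-in for a crawl queue: admits only URLs the lens allows
/// and never enqueues the same URL twice.
struct CrawlQueue {
    pending: VecDeque<String>,
    seen: HashSet<String>,
}

impl CrawlQueue {
    fn new() -> Self {
        CrawlQueue { pending: VecDeque::new(), seen: HashSet::new() }
    }

    /// Returns true if the URL was newly queued.
    fn enqueue(&mut self, lens: &Lens, url: &str) -> bool {
        if !lens.allows(url) || !self.seen.insert(url.to_string()) {
            return false;
        }
        self.pending.push_back(url.to_string());
        true
    }

    /// Pops the next URL to crawl in FIFO order.
    fn next_url(&mut self) -> Option<String> {
        self.pending.pop_front()
    }
}

fn main() {
    let lens = Lens {
        name: "reddit".to_string(),
        domains: vec!["reddit.com".to_string()],
    };
    let mut queue = CrawlQueue::new();
    for url in [
        "https://www.reddit.com/r/rust/",
        "https://old.reddit.com/r/rust/",
        "https://www.reddit.com/r/rust/", // duplicate: rejected
        "https://example.com/",           // outside the lens: rejected
        "https://notreddit.com/",         // not a subdomain: rejected
    ] {
        println!("[{}] enqueue {} -> {}", lens.name, url, queue.enqueue(&lens, url));
    }
    while let Some(url) = queue.next_url() {
        println!("crawl {}", url);
    }
}
```

The same `allows` check could be applied to search hits before display, which is one way a lens could "add context" to a query without sending it anywhere.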
marginalia_nu, almost 3 years ago
Cool.

But a warning, based on doing quite a lot of crawling from home for my own search engine: it's very easy to have your IP or IP block end up on annoying graylists, where basically every other website you visit will throw a CAPTCHA in your face. I'm aware of this risk and use a VPN for most of my private web surfing anyway, so it's not much of a bother for me, but it's a bit sketchy to expose other people to that risk through something like this.

It would probably be wise to use canned crawls for major websites, maybe by trading WARCs <https://en.wikipedia.org/wiki/Web_ARChive> over BitTorrent or similar. Most of these websites don't change *that* often in the places that matter.
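To make the WARC-trading idea concrete, here is a minimal Rust sketch of writing and reading uncompressed WARC/1.0 records, so a local indexer could ingest a shared crawl instead of fetching the pages itself. The function and type names are hypothetical, field names are matched case-sensitively here (the format treats them as case-insensitive), and real-world WARC files are usually gzip-compressed record by record, which this sketch does not handle.

```rust
use std::collections::HashMap;

/// One WARC record: its header fields plus the raw content block.
struct WarcRecord {
    headers: HashMap<String, String>,
    block: Vec<u8>,
}

/// Appends one record in WARC/1.0 layout: version line, header fields,
/// Content-Length, a blank line, the content block, then CRLF CRLF.
fn write_record(out: &mut Vec<u8>, fields: &[(&str, &str)], block: &[u8]) {
    out.extend_from_slice(b"WARC/1.0\r\n");
    for (name, value) in fields {
        out.extend_from_slice(format!("{}: {}\r\n", name, value).as_bytes());
    }
    out.extend_from_slice(format!("Content-Length: {}\r\n\r\n", block.len()).as_bytes());
    out.extend_from_slice(block);
    out.extend_from_slice(b"\r\n\r\n");
}

/// Byte offset of the first `needle` in `haystack` at or after `from`.
fn find(haystack: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    haystack[from..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| i + from)
}

/// Parses every record in an uncompressed WARC buffer.
fn parse_records(data: &[u8]) -> Result<Vec<WarcRecord>, String> {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        // The header block ends at the first blank line (CRLF CRLF).
        let head_end = find(data, b"\r\n\r\n", pos).ok_or("unterminated header")?;
        let head = std::str::from_utf8(&data[pos..head_end]).map_err(|e| e.to_string())?;
        let mut lines = head.split("\r\n");
        let version = lines.next().unwrap_or("");
        if !version.starts_with("WARC/") {
            return Err(format!("bad version line: {:?}", version));
        }
        let mut headers: HashMap<String, String> = HashMap::new();
        for line in lines {
            let (name, value) = line.split_once(':').ok_or("malformed header line")?;
            headers.insert(name.trim().to_string(), value.trim().to_string());
        }
        // Content-Length gives the number of block bytes after the blank line,
        // so a block that itself contains CRLF CRLF is still read correctly.
        let len = headers
            .get("Content-Length")
            .ok_or("missing Content-Length")?
            .parse::<usize>()
            .map_err(|e| e.to_string())?;
        let start = head_end + 4;
        let end = start + len;
        if end + 4 > data.len() {
            return Err("truncated record".to_string());
        }
        records.push(WarcRecord { headers, block: data[start..end].to_vec() });
        pos = end + 4; // skip the CRLF CRLF that terminates each record
    }
    Ok(records)
}

fn main() {
    // A response record as a crawler might archive it (values are made up).
    let page: &[u8] = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hello</p>";
    let mut warc = Vec::new();
    write_record(
        &mut warc,
        &[
            ("WARC-Type", "response"),
            ("WARC-Target-URI", "https://example.com/"),
            ("WARC-Date", "2022-05-27T00:00:00Z"),
            ("WARC-Record-ID", "<urn:uuid:00000000-0000-0000-0000-000000000001>"),
            ("Content-Type", "application/http; msgtype=response"),
        ],
        page,
    );
    // A peer receiving this buffer could index it without contacting the site.
    for rec in parse_records(&warc).expect("well-formed WARC") {
        println!(
            "{:?} {:?}: {} bytes",
            rec.headers.get("WARC-Type"),
            rec.headers.get("WARC-Target-URI"),
            rec.block.len()
        );
    }
}
```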