Show HN: I'm building a personal web search engine

8 points by a5huynh almost 3 years ago

2 comments

a5huynh almost 3 years ago
Hey HN, I'm building an open-source search platform that lives on your device, indexes what you want, and exposes it to you through a super simple and super fast interface.

I took the idea of adding "site:reddit.com" to your Google searches and expanded on it with "lenses", which add context to your search query and tell the crawler what to crawl and index. Because the index lives on your device, all queries run locally; your searches are never relayed to any third-party search engine. Think of it as your personal bookcase at home vs. the Library of Congress.

It's still in a very early state, but I'd love for people to start using it, give feedback, and show me what sort of lenses they want to build and search through!

Some details about the stack for the interested:

* All Rust, with some HTML/CSS for the client.
* The client is built with yew + tauri.
* The backend uses tantivy to index the web pages and sqlite3 to hold metadata and the crawl queue.

Thanks in advance!
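To make the "lens" idea concrete, here is a minimal std-only Rust sketch that assumes a lens is simply a named allowlist of domains (like a saved `site:` filter) that decides which discovered URLs the crawler enqueues. `Lens`, `host_of`, `CrawlQueue`, and `push_if_allowed` are hypothetical names, not the project's API, and the post keeps its real crawl queue in sqlite3, whereas this sketch uses an in-memory FIFO with a seen-set for deduplication.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical "lens": a named set of domains that scopes both search
/// results and crawling, like a saved `site:` filter. Domains are assumed
/// to be stored lowercase.
struct Lens {
    name: String,
    domains: Vec<String>,
}

/// Extract the lowercase host from an http(s) URL without external crates.
/// Returns None for other schemes. (IPv6 literals are not handled.)
fn host_of(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop any "user@" prefix and ":port" suffix.
    let host = authority.rsplit('@').next()?.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

impl Lens {
    /// True if the URL's host is a lens domain or one of its subdomains.
    fn allows(&self, url: &str) -> bool {
        match host_of(url) {
            Some(host) => self
                .domains
                .iter()
                .any(|d| host == d.as_str() || host.ends_with(&format!(".{d}"))),
            None => false,
        }
    }
}

/// In-memory stand-in for the crawl queue (the post keeps the real one in
/// sqlite3): FIFO order plus a seen-set so each URL is enqueued at most once.
struct CrawlQueue {
    queue: VecDeque<String>,
    seen: HashSet<String>,
}

impl CrawlQueue {
    fn new() -> Self {
        Self { queue: VecDeque::new(), seen: HashSet::new() }
    }

    /// Enqueue `url` only if at least one lens allows it and it is new.
    /// Returns whether the URL was added.
    fn push_if_allowed(&mut self, url: &str, lenses: &[Lens]) -> bool {
        if lenses.iter().any(|l| l.allows(url)) && self.seen.insert(url.to_string()) {
            self.queue.push_back(url.to_string());
            true
        } else {
            false
        }
    }

    /// Next URL to fetch, oldest first.
    fn pop(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

fn main() {
    let lenses = vec![Lens {
        name: "reddit".to_string(),
        domains: vec!["reddit.com".to_string()],
    }];
    let mut queue = CrawlQueue::new();
    for url in [
        "https://www.reddit.com/r/rust/",
        "https://example.com/",           // outside every lens: skipped
        "https://www.reddit.com/r/rust/", // duplicate: skipped
        "https://notreddit.com/",         // not a subdomain: skipped
    ] {
        queue.push_if_allowed(url, &lenses);
    }
    while let Some(url) = queue.pop() {
        println!("[{}] crawl {url}", lenses[0].name);
    }
}
```

Matching on the host suffix `.reddit.com` rather than a plain `ends_with("reddit.com")` is what keeps a lookalike such as `notreddit.com` out of the lens.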
marginalia_nu almost 3 years ago
Cool.

But a warning, based on doing quite a lot of crawling from home for my own search engine: it's very easy to have your IP or IP block end up on annoying graylists, where basically every other website you visit will throw a CAPTCHA in your face. I'm aware of this risk and use a VPN for most of my private web surfing anyway, so it's not that much of a bother for me, but it's a bit sketchy to expose other people to that risk through something like this.

It would probably be wise to use canned crawls for major websites, maybe by trading WARCs <https://en.wikipedia.org/wiki/Web_ARChive> over BitTorrent or whatever. Most of these types of websites don't change *that* often in the places that matter.
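The canned-crawl suggestion implies the client would ingest WARC files instead of fetching pages live. As a rough illustration, here is a minimal std-only Rust reader for uncompressed WARC records, assuming the basic WARC/1.0 layout: a version line, `Name: value` header fields, a blank line, `Content-Length` bytes of content, then two CRLFs. `WarcRecord` and `parse_warc` are hypothetical names; the sketch does not handle gzip-compressed `.warc.gz` files or folded (multi-line) header fields.

```rust
use std::collections::HashMap;

/// One parsed WARC record: header fields (names lowercased) plus the raw block.
#[derive(Debug)]
struct WarcRecord {
    version: String,
    headers: HashMap<String, String>,
    block: Vec<u8>,
}

/// Position of the first occurrence of `needle` in `haystack`.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Parse consecutive uncompressed WARC records from `data`.
/// Assumption: no header continuation lines and no per-record gzip.
fn parse_warc(mut data: &[u8]) -> Result<Vec<WarcRecord>, String> {
    let mut records = Vec::new();
    while !data.is_empty() {
        // Header block ends at the first blank line (CRLF CRLF).
        let header_end = find(data, b"\r\n\r\n").ok_or("unterminated header")?;
        let header_text = std::str::from_utf8(&data[..header_end]).map_err(|e| e.to_string())?;
        let mut lines = header_text.split("\r\n");
        let version = lines.next().unwrap_or("").to_string();
        if !version.starts_with("WARC/") {
            return Err(format!("bad version line: {version:?}"));
        }
        let mut headers = HashMap::new();
        for line in lines {
            let (name, value) = line.split_once(':').ok_or("malformed header line")?;
            // Field names are case-insensitive, so normalise them for lookup.
            headers.insert(name.trim().to_ascii_lowercase(), value.trim().to_string());
        }
        let len = headers
            .get("content-length")
            .ok_or("missing Content-Length")?
            .parse::<usize>()
            .map_err(|_| "bad Content-Length")?;
        let block_start = header_end + 4;
        let block_end = block_start.checked_add(len).ok_or("Content-Length overflow")?;
        // The block must be followed by the CRLF CRLF record terminator.
        let tail = data.get(block_end..).ok_or("truncated record")?;
        if !tail.starts_with(b"\r\n\r\n") {
            return Err("missing record terminator".to_string());
        }
        records.push(WarcRecord { version, headers, block: data[block_start..block_end].to_vec() });
        data = &tail[4..];
    }
    Ok(records)
}

fn main() {
    // Build one hypothetical record in memory instead of reading a real .warc file.
    let body = b"<html>hi</html>";
    let mut warc: Vec<u8> = Vec::new();
    warc.extend_from_slice(b"WARC/1.0\r\nWARC-Type: resource\r\n");
    warc.extend_from_slice(b"WARC-Target-URI: https://example.com/\r\n");
    warc.extend_from_slice(format!("Content-Length: {}\r\n\r\n", body.len()).as_bytes());
    warc.extend_from_slice(body);
    warc.extend_from_slice(b"\r\n\r\n");

    match parse_warc(&warc) {
        Ok(records) => {
            for r in &records {
                // In a real client the block would be handed to the indexer here.
                println!("{} {} ({} bytes)", r.version, r.headers["warc-target-uri"], r.block.len());
            }
        }
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```

Reading the block length from `Content-Length` instead of scanning for the next blank line matters because the block itself (an HTTP response, for instance) usually contains blank lines of its own.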