If you haven't heard of Brave Goggles (<a href="https://github.com/brave/goggles-quickstart" rel="nofollow">https://github.com/brave/goggles-quickstart</a>) I highly recommend checking it out. Just being able to create the search index is a massive task, so being able to apply rules server-side to their "expanded recall set" will give you what most people building search engines want, which is to control the algorithm. We weren't able to do that until now since applying rules client-side doesn't work well on a small search result set.<p>Related: I created a tool to create Goggles using subreddits as a signal source for domains: <a href="https://github.com/forcesunseen/narwhalizer" rel="nofollow">https://github.com/forcesunseen/narwhalizer</a>
Shameless self-plug, I've been building something similar that you can run locally as an app: <a href="https://github.com/a5huynh/spyglass" rel="nofollow">https://github.com/a5huynh/spyglass</a><p>You can define some basic rules & it'll go out and crawl those particular sites. Or use one that someone else has built. It can also sync with your Chrome/Firefox bookmarks. Would love feedback from folks who get a chance to use it!
It's interesting that this uses a distributed P2P index. That's a very good idea and one of the things that has held me back from even thinking about trying to build my own tech-focused search engine.<p>One thing I was hoping to see in the FAQ was how they prevent rogue nodes from inserting spam or other kinds of mischief into the public index.
I use this as a personal knowledge base. Indexed my blog. Indexed a bookmarks export. Indexed a knowledge base. Works well. It also convinced me of the value of a power-user UI.
I love the idea of this, but I tried to spin up my own instance and was immediately overwhelmed by the million little knobs and settings for it.<p>It seems like a lot of fun if you understand all the tuning, but I feel like the current state alienates most users who want to use it in simple scenarios.
Recently installed YaCy on my Synology via the Docker image they provide. Already saved about 10 GB of content that interests me. Now I have a personal search engine. Awesome.
I would like to use this. However, in the past when I've tried it I didn't like the results. It would be nice to hear about more competition in the P2P information retrieval (search engine) tech space. YaCy seems to be the only one I've consistently heard about over the years.
Has anyone tried LinkAce? I'd love to hear someone's thoughts on YaCy vs LinkAce.<p>This is great timing. After looking at YaCy for my Synology NAS a few weeks ago, I looked at some alternatives. I like the look of LinkAce, though it seems to be less popular and I haven't found much on how a setup on a Synology NAS works.<p>I'd love some advice: I have a massive number of bookmarks across dozens of folders. Something like this is exactly what I'm looking for.
Copernic used to be a great way to do this. Register every search engine you like in the local software, apply rules, search all the web search engines at once. Until they went 100% corporate, it was awesome.
I have about 100,000 PDFs that I want indexed and searchable. They're on a website and I want people to be able to visit the website and search through the PDFs.<p>Should I use YaCy or Apache Solr?<p>All opinions and rants welcome.