On the topic of search engines, I really liked David Evans's classes. The task there was also to build a simple search engine from scratch. It's aimed at beginners, with the emphasis on coding in general, and I found it very approachable.<p><a href="https://www.cs.virginia.edu/~evans/courses/" rel="nofollow">https://www.cs.virginia.edu/~evans/courses/</a>
I always wonder if the days of search engines for specific topics could return. With LLMs providing less-than-accurate results in some areas, and Google, Bing, etc. being taken over by adverts or well-organised SEO, it feels like there is a place for accurate, specialised search.
Nice idea, but this approach does not handle out-of-vocabulary words well, and handling those is one major motivation for using vector-based search. It might not perform significantly better than lexical matching like tf-idf or BM25, while being slower because of its linear complexity. But cool regardless.
The author has a nice series on compiling a Lisp [0], but unfortunately his search engine fails to find it when queried with "lisp" or "Lisp".<p>[0] <a href="https://bernsteinbear.com/blog/compiling-a-lisp-0/" rel="nofollow">https://bernsteinbear.com/blog/compiling-a-lisp-0/</a>
The SVG equation is very difficult to read if you're using a dark OS theme because the blog uses the OS preference for dark/light theme (and doesn't seem to give an option to change it manually, either.)
> The idea behind the search engine is to embed each of my posts into this domain by adding up the embeddings for the words in the post.<p>Ah, OK! I never really grokked how to use word-level embeddings. Makes more sense now.
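For anyone else who wants to see the quoted idea spelled out, here is a rough sketch of how I understand it, not the author's actual code. The tiny 3-dimensional `EMBEDDINGS` table, the `POSTS` dict, and the function names are all made up for illustration; a real setup would load pretrained word vectors. I'm also assuming cosine similarity for ranking and that unknown words are simply skipped.

```python
import math

# Hypothetical toy word vectors; real ones would come from a pretrained model.
EMBEDDINGS = {
    "lisp": [0.9, 0.1, 0.0],
    "compiler": [0.8, 0.3, 0.1],
    "search": [0.1, 0.9, 0.2],
    "engine": [0.2, 0.8, 0.3],
}
DIM = 3

# Hypothetical posts, reduced to a couple of words each.
POSTS = {
    "compiling-a-lisp": "lisp compiler",
    "blog-search": "search engine",
}


def embed(text):
    """Add up the vectors of every known word in the text."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word_vec = EMBEDDINGS.get(word)
        if word_vec is None:
            continue  # assumption: out-of-vocabulary words contribute nothing
        vec = [a + b for a, b in zip(vec, word_vec)]
    return vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query):
    """Score every post against the query (a linear scan) and rank them."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(body)), title) for title, body in POSTS.items()]
    return [title for _score, title in sorted(scored, reverse=True)]
```

With this toy data, `search("lisp")` ranks `compiling-a-lisp` first, while a query made only of unknown words embeds to the zero vector and scores 0.0 against every post.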
This was a really nice read. Now I have no excuse not to upgrade my blog search. I do feel that I'll have a ton of long tail words like 'prank'.