Show HN: Instantly search 28M books from OpenLibrary

424 points by jabo over 4 years ago

22 comments

jabo over 4 years ago

Quick context: I built this from the Open Library (Internet Archive) books dataset (https://openlibrary.org/), as a follow-up to this comment on the GoodReads post earlier today: https://news.ycombinator.com/item?id=25408186

Thank you @mekarpeles for helping me get access to the data quickly and giving me pointers about the schema.

I should add - I built this in about 12 hours as a weekend project, so there might be some lurking issues.

> Details about the Tech Stack:

The dataset has ~28.6 million books and is indexed on Typesense [1], an open source alternative to Algolia/ElasticSearch that a friend and I are working on.

The UI was built using the Typesense adapter for InstantSearch.js [2] and is a static site bundled using ParcelJS.

The app is hosted on S3, with CloudFront for a CDN.

The search backend is powered by a geo-distributed 3-node Typesense cluster running on Typesense Cloud [3], with nodes in Oregon, Frankfurt and Mumbai.

Here's the source code: https://github.com/typesense/showcase-books-search

[1] https://github.com/typesense/typesense
[2] https://github.com/typesense/typesense-instantsearch-adapter
[3] https://cloud.typesense.org

impalallama over 4 years ago

Straight up impossible for me to search for Isaac Asimov's "I, Robot". Some kind of input cleaning strips out the I from "I robot" and just searches "Robot". "iRobot" does not get the results. And the search does not accept commas. Just some of the fun that comes with searching, I suppose.

BenElgar over 4 years ago

Cool! Small bit of feedback: typing each character adds an entry to the browser's history which seems a bit excessive. Might consider using replaceState over pushState.
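
To make the suggestion concrete, here is a minimal TypeScript sketch, not the site's actual code. `HistoryLike`, `searchUrl`, `recordQuery` and the `q` URL parameter are hypothetical names chosen for illustration; only `pushState` and `replaceState` come from the browser History API. While the user is typing, the current history entry is overwritten; a new entry is pushed only when a query is explicitly committed.

```typescript
// Minimal slice of the browser History API, declared here so the sketch
// also type-checks outside a browser. window.history satisfies it.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

// Hypothetical URL scheme: the query lives in a `q` parameter.
function searchUrl(query: string): string {
  return query === "" ? "/" : `/?q=${encodeURIComponent(query)}`;
}

// Overwrite the current entry on every keystroke; add a new entry only
// when the caller says the query is committed (e.g. Enter was pressed).
function recordQuery(history: HistoryLike, query: string, commit = false): void {
  const url = searchUrl(query);
  if (commit) {
    history.pushState({ query }, "", url);
  } else {
    history.replaceState({ query }, "", url);
  }
}

// In a browser this could be wired up roughly as:
//   input.addEventListener("input", () => recordQuery(window.history, input.value));
```

With this split, the Back button leaves the search page after one click per committed query rather than one click per character typed.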

cstuder over 4 years ago

One piece of usability feedback: The small OpenLibrary icon in the results looks a little bit like a trashcan. So after searching I was unsure what to do: I didn't want to go to Amazon, nor would I want to trash a result.

Suggestion: Link the red book title to the OpenLibrary page.

markdown over 4 years ago

A big problem with the OpenLibrary website is that there is no way to filter search results for books that are actually in the library, which I find quite odd TBH.

This "Instant Search" doesn't improve on that.

Also, some of the Amazon links are broken for me. They look like https://www.amazon.com/s?k=9798654289605

mod over 4 years ago

I saw in the comment thread where you promised to do it, and bam, a few hours later, here it is. Kudos, and thanks!

rayrag over 4 years ago

I searched for 'Lovecraft' and then opened the first 15 results: no book was in the library. Then I went to openlibrary.org and again searched for 'Lovecraft', and among the first 20 results I could only borrow one book for an hour. What is the point of showing books that aren't in the library, or am I missing something?

tompazourek over 4 years ago

When I read the title, I thought it was going to search inside the 28M books, but it's just the title, subject, and author. Still a cool project though.

jtbayly over 4 years ago

Sadly it isn’t finding the first two books I looked for that are in the dataset.

https://openlibrary.org/works/OL7116092W/The_church_of_Christ

And https://openlibrary.org/works/OL2525391W/Holiness?edition=

_tom_ over 4 years ago

You might want to clarify that this does not search 28M books, it searches metadata for 28M books. Very different.

I notice many titles did not have authors. I spot checked this one: https://openlibrary.org/books/OL25434821M/Still_Star-Crossed

This title has no author in the search results, but does have one on the linked page. Perhaps an import issue.

Edit: Forgot to say "This is awesome. Particularly for 12 hours work!"

thejosh over 4 years ago

There needs to be a debounce. Currently seeing 2-10s+ for simple typing, and the results jitter around relative to what I typed.

Searching from Perth, Australia.
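
For reference, a debounce could look roughly like the TypeScript sketch below. It is not the site's code; `debounce` and the 200 ms wait in the usage comment are illustrative assumptions. Only the last call in a burst of keystrokes reaches the wrapped function, which should at least cut the number of requests sent from a high-latency location. InstantSearch.js's `searchBox` widget has a `queryHook` option where a wrapper like this could be plugged in.

```typescript
// Hypothetical helper, not part of InstantSearch.js or Typesense.
// Returns a wrapper that delays calling `fn` until `waitMs` milliseconds
// have passed without another call; earlier pending calls are dropped.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    if (timer !== undefined) {
      clearTimeout(timer); // cancel the call scheduled by the previous keystroke
    }
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Possible usage (the search function and wait time are assumptions):
//   const debouncedSearch = debounce((q: string) => runSearch(q), 200);
//   input.addEventListener("input", () => debouncedSearch(input.value));
```

The trade-off is that results no longer update on every keystroke, so the wait should stay short enough that the search still feels instant.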

maz1b over 4 years ago

I'd suggest adding some kind of explanation as to how many records each of your cloud offerings can support (1 GB = N records, and so forth).

slowhand09 over 4 years ago

Zero results for ISBN-13: 978-1888118049, ISBN-10: 1888118040, Unintended Consequences by John Ross

https://openlibrary.org/works/OL2964952W/Unintended_consequences?edition=isbn_9781888118049

yboris over 4 years ago

I would love, if possible, to do an exact multi-word search, e.g. "Bill Gates". Currently if the phrase is not found, you get results for one of the words. Putting the search in quotes does not work (like it does when performing a search on Google).

wolfgarbe over 4 years ago

Great project. It seems, though, that Books search does not support phrase search and that stop words are ignored: "king of england" becomes "king england". Is this by design? Does this affect only Books search or Typesense generally?

porpoise over 4 years ago

Not sure if this is unintended behavior, but after clicking from this thread to the site and doing a few searches (all while remaining on the same page), I wanted to go back to this thread, but it took me 7-8 clicks of the Back button to do so.

smusamashah over 4 years ago

It's super uncomfortable when any site uses OR based search. Is there a way to perform AND based search so that when I write more words, the results reduce instead of increasing exponentially?
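
A toy TypeScript sketch of the difference, for illustration only: it matches whole words in an in-memory list and says nothing about how Typesense actually matches or ranks. `tokenize`, `search` and the sample titles are hypothetical. With OR semantics each extra word can only add matches; with AND semantics each extra word can only remove them.

```typescript
// Lowercase the text and split it into alphanumeric words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// "or": keep titles containing at least one query word.
// "and": keep titles containing every query word.
function search(titles: string[], query: string, mode: "and" | "or"): string[] {
  const terms = tokenize(query);
  if (terms.length === 0) {
    return [];
  }
  return titles.filter((title) => {
    const words = tokenize(title);
    const hasWord = (term: string): boolean => words.indexOf(term) !== -1;
    return mode === "and" ? terms.every(hasWord) : terms.some(hasWord);
  });
}

const titles = ["I, Robot", "Robot Dreams", "The Road Ahead"];
console.log(search(titles, "robot dreams", "or"));  // [ 'I, Robot', 'Robot Dreams' ]
console.log(search(titles, "robot dreams", "and")); // [ 'Robot Dreams' ]
```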

mpalmer over 4 years ago

Can't fault the performance! Nice work. My one UX suggestion would be to present search results in a more table-like layout, as reading both across and down to see every result is not ideal.

tommoor over 4 years ago

Seems like this exposes a flaw in the dataset. E.g., searching for "cixin liu" shows many different variations of the same book, with lots and lots of duplication.

interactivecode over 4 years ago

@jabo is it possible to connect Typesense (or any other instant search provider) to a PostgreSQL instance and have it “just work”?

known over 4 years ago

I think relevancy can be improved, e.g. for https://books-search.typesense.org/?b%5Bquery%5D=linus%20torvalds

germanka over 4 years ago

In any language?