Launch HN: Exa (YC S21) – The web as a database

399 points by willbryk 2 days ago

Hey HN! We’re Will and Jeff from Exa (<a href="https://exa.ai">https://exa.ai</a>). We recently launched Exa Websets, an embeddings-powered search engine designed to return exactly what you’re asking for. You can get precise results for complex queries like “all startups working on open-source developer tools based in SF, founded 2021-2025”. Demo here - <a href="https://youtu.be/Unt8hJmCxd4" rel="nofollow">https://youtu.be/Unt8hJmCxd4</a>

We started working on Exa because we were frustrated that while LLM state-of-the-art is advancing every week, Google has gotten worse over time. The Internet used to feel like a magical information portal, but it doesn’t feel that way anymore when you’re constantly being pushed towards SEO-optimized clickbait.

Websets is a step in the opposite direction. For every search, we perform dozens of embedding searches over Exa’s vector database of the web to find good search candidates, then we run agentic workflows on each result to verify that it matches exactly what you asked for.

Websets results are good for two reasons. First, we train custom embedding models for our main search algorithm, instead of using typical keyword-matching search algorithms. Our embedding models are trained specifically to return exactly the type of entity you ask for. In practice, that means if you search “startups working in nanotech”, keyword-based search engines return listicles about nanotech startups, because these listicles match the keywords in the query. In contrast, our embedding models return actual startup homepages, because these startup homepages match the meaning of the query.

The second is that LLMs provide the last-mile intelligence needed to verify every result. Each result and piece of data is backed with supporting references that we used to validate that the result is actually a match for your search criteria. That’s why Websets can take minutes or even hours to run, depending on your query and how many results you ask for. For valuable search queries, we think this is worth it.

Also notably, Websets are tables, not lists. You can add “enrichment” columns to find more information about each result, like “# of employees” or “does author have blog?”, and the cells load in asynchronously. This table format hopefully makes the web feel more like a database.

A few examples of searches that work with Websets:

- “Math blogs created by teachers from outside the US”: <a href="https://websets.exa.ai/cma1oz9xf007sis0ipzxgbamn">https://websets.exa.ai/cma1oz9xf007sis0ipzxgbamn</a>
- “research paper about ways to avoid the O(n^2) attention problem in transformers, where one of the first author's first name starts with "A", "B", "S", or "T", and it was written between 2018 and 2022”: <a href="https://websets.exa.ai/cm7dpml8c001ylnymum4sp11h">https://websets.exa.ai/cm7dpml8c001ylnymum4sp11h</a>
- “US based healthcare companies, with over 100 employees and a technical founder”: <a href="https://websets.exa.ai/cm6lc0dlk004ilecmzej76qx2">https://websets.exa.ai/cm6lc0dlk004ilecmzej76qx2</a>
- “all software engineers in the Bay Area, with experience in startups, who know Rust and have published technical content before”: <a href="https://youtu.be/knjrlm1aibQ" rel="nofollow">https://youtu.be/knjrlm1aibQ</a>

You can try it at <a href="https://websets.exa.ai/">https://websets.exa.ai/</a> and API docs are at <a href="https://docs.exa.ai/websets">https://docs.exa.ai/websets</a>. We’d love to hear your feedback!
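The two-stage idea described in the post (embedding retrieval to find candidates, then per-result verification against the stated criteria) can be sketched roughly as follows. This is only a toy illustration under loud assumptions: the two-dimensional vectors, the `retrieve`, `verify`, and `webset` names, and the example documents are all hypothetical, and plain Python predicates stand in for the LLM-based verification the post mentions. It is not Exa's implementation or API.

```python
import math
from typing import Callable

Vector = list[float]
Doc = dict


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, index: list[Doc], k: int) -> list[Doc]:
    """Stage 1: rank documents by embedding similarity and keep the top k."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


def verify(doc: Doc, criteria: list[Callable[[Doc], bool]]) -> bool:
    """Stage 2: keep a candidate only if every criterion holds.
    Hypothetical stand-in for a slower, LLM-backed check."""
    return all(check(doc) for check in criteria)


def webset(query_vec: Vector, index: list[Doc],
           criteria: list[Callable[[Doc], bool]], k: int = 10) -> list[Doc]:
    return [d for d in retrieve(query_vec, index, k) if verify(d, criteria)]


# Toy index: made-up pages with made-up embeddings and metadata.
index = [
    {"url": "startup-a.example", "vec": [0.9, 0.1], "city": "SF", "founded": 2022},
    {"url": "listicle.example", "vec": [0.2, 0.9], "city": "SF", "founded": 2015},
    {"url": "startup-b.example", "vec": [0.8, 0.3], "city": "NYC", "founded": 2023},
]
criteria = [
    lambda d: d["city"] == "SF",
    lambda d: 2021 <= d["founded"] <= 2025,
]

rows = webset([1.0, 0.0], index, criteria, k=2)
print([d["url"] for d in rows])
```

In this toy run the listicle is dropped at retrieval time because its vector is far from the query, and the second startup is dropped at verification time because it fails the location criterion.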

57 comments

WuxiFingerHold 1 day ago

If you require an account to try the web search (which you have every right to do, it's your service), tell us before we enter the service and type in our search. This comes across as sneaky. You should be clear upfront.

AznHisoka 2 days ago

I searched for 'data providers that start with the letter R that sell job postings data', and it's been 15 minutes and it has barely verified the first row.

But if it filtered to "start with the letter R" first, it would only have to look at perhaps 5% of the results it's trying to verify!

So it's doing needless verification of results that will be thrown out by another filter that should've been applied first!
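The ordering point in this comment can be sketched as follows: apply cheap, deterministic filters before the slow verification step, so the expensive check only runs on survivors. This is a minimal illustration, not a description of how Websets schedules its work. The provider names are made up, and `expensive_check` is a hypothetical stand-in for a slow verification call.

```python
from typing import Callable

expensive_calls = 0  # counts how often the slow check runs


def expensive_check(name: str) -> bool:
    """Hypothetical stand-in for a slow, costly verification step."""
    global expensive_calls
    expensive_calls += 1
    sells_job_postings_data = {"Acme Jobs", "Ridge Data"}  # made-up ground truth
    return name in sells_job_postings_data


def verify_in_cost_order(rows: list[str],
                         cheap_filters: list[Callable[[str], bool]],
                         slow_check: Callable[[str], bool]) -> list[str]:
    # Cheap predicates first: rows that fail never reach the slow check.
    survivors = [r for r in rows if all(f(r) for f in cheap_filters)]
    return [r for r in survivors if slow_check(r)]


candidates = ["Acme Jobs", "Ridge Data", "Beta Postings", "Rowan Feeds"]
starts_with_r = lambda name: name.upper().startswith("R")

result = verify_in_cost_order(candidates, [starts_with_r], expensive_check)
print(result, expensive_calls)
```

With four candidates and two names starting with "R", the slow check runs twice instead of four times, and the final result is the same as it would be in the other order.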

hubraumhugo 2 days ago

I think you guys nailed "selling shovels during a gold rush", as the biggest issue with LLMs currently is their reliability/hallucinations, not their capabilities. If I can use websets to back up LLM responses through your API, that's super useful.

Since you were part of YC 21, could you share a bit about the pivots/product iterations you went through over the last 4 years?

xp84 2 days ago

This is super cool! It took a while, but it did a great job of evaluating the results, and the Airtable-like results UI feels great.

Congrats on your launch. With the natural way this lends itself to comparison shopping, this is an amazing tool for people trying to find "the best X for me", whether that's a TV, a school, etc. So much of the content that you find on Google when trying to answer that type of query is designed to trick, bamboozle, and hide the facts that you might use to answer this question (but most of all to get you to click affiliate links).

vetleen 1 day ago

You did get me to click the 'upgrade' button, but the pricing is too high for me.

I did one search with 4 criteria, then added the two free columns, and at this point I had spent 750 of my 1000 free credits. The next tier is $49 with only 8000 credits, which means only about 10 searches a month.

The search I did was super useful, and I would love to use the product and recommend it to my coworkers. But the pricing is what stops me.

Best of luck. I'll probably use it once a month if I can remember :)
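The arithmetic behind this estimate can be worked through with the figures quoted in the comment. The only assumption added here is that every search costs about as much as the commenter's single search (4 criteria plus 2 enrichment columns), which may not hold for other queries.

```python
# Figures quoted in the comment.
CREDITS_PER_SEARCH = 750   # one search with 4 criteria plus 2 enrichment columns
TIER_PRICE_USD = 49
TIER_CREDITS = 8000

# Assumption: every search costs roughly the same number of credits.
searches_per_month = TIER_CREDITS // CREDITS_PER_SEARCH
cost_per_search_usd = TIER_PRICE_USD * CREDITS_PER_SEARCH / TIER_CREDITS

print(searches_per_month)             # 10 whole searches fit in 8000 credits
print(round(cost_per_search_usd, 2))  # 4.59 dollars per comparable search
```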

joshstrange 2 days ago

I think it might be a good idea to give some kind of indication that work is being done in the background (or perhaps mine stalled out?).

The initial search/experience is good, but then I got dumped here [0] and it's not clear to me if things are still happening or if it broke (it's been at least 5 min with no UI updates).

I can't see the full results yet, but this is very interesting and a task I ask OpenAI's Deep Research to attempt periodically. It makes a good show of doing the work, but the results are not great IMHO (for asking it to generate lists/tables of data like this). I can see this tool being incredibly useful for lead generation (how I am testing it out).

[0] <a href="https://cs.joshstrange.com/dySqK1mb" rel="nofollow">https://cs.joshstrange.com/dySqK1mb</a>

byearthithatius 2 days ago

I was so excited for this, but sadly it doesn't work at all, not even UI feedback for the error :(

The UI showed literally no change. So I checked and the console shows:

```
Try: 14 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 15 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 16 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 17 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 18 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 19 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 20 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Gave up after 10 seconds. 681-7df1b139fa2dc9f0.js:14:3379
filteredSuggestions Array(3) [ {…}, {…}, {…} ] 681-7df1b139fa2dc9f0.js:14:3379
```

Also your table doesn't fit in the viewport so I can't see the results.

Firefox Ubuntu.

mbeavitt 2 days ago

This is super cool. You provide examples of “searches that work” - can you give an idea of the limitations here? What kind of searches won’t work?

theamk 2 days ago

Did my favorite search query, and the results were pretty bad, as expected:

"robotics servo motors with two-directional control for under $100"

1. <a href="https://mjbots.com/" rel="nofollow">https://mjbots.com/</a> - their motors are $1369. FAIL.
2. <a href="https://www.pololu.com/" rel="nofollow">https://www.pololu.com/</a> - this is a huge store, but they do have some motors like that. Pass, but I wish it linked to a specific page and not the top-level one.
3. dh-robotics.com - no prices, but some products on the open market are a few K$. Likely a fail as well.
4. <a href="https://www.robotarticulation.com/" rel="nofollow">https://www.robotarticulation.com/</a> - The product is not for sale (early beta), and it looks like it will likely cost much more than $1K. FAIL.
5. <a href="https://www.lynxmotion.com/" rel="nofollow">https://www.lynxmotion.com/</a> - another huge store; most two-directional motors are expensive, but there are some under $100... Pass, but I wish it linked to a specific page and not the top-level one.

raymondgh about 11 hours ago

I searched for Series B startups hiring for data analyst roles in the Bay Area for a friend. It returned lots of bigger companies that I didn’t want included. I also had no luck trying to figure out how to copy-paste or share the results (using iPhone). Overall though I think this is a great idea and wish you the best!

wdrw about 21 hours ago

I was trying to submit some feedback using your "Feedback" button on the top right, but got an error when trying to submit it :(

Anyway, the model used doesn't seem to be very good; it did not understand a basic "OR" criterion. I asked for a list of companies with an office in Toronto that are involved in hardware development such as custom silicon, robotics, satellites or drones. It completely misunderstood the "or" part (and the "such as" part). E.g. I see many robotics companies marked as a "Miss" because they only do robotics but not any of the other things on my list.

Overall though I love the idea. I would pay for your service (on a pay-as-you-go per-query basis) if the underlying model was smart enough for me to actually rely on the results.
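The difference the commenter describes comes down to evaluating the listed areas with "any" rather than "all". The sketch below shows both readings on a made-up company record. The field names and the `matches_all` / `matches_any` helpers are hypothetical and say nothing about how Websets represents criteria internally.

```python
from typing import Iterable


def matches_all(tags: set[str], required: Iterable[str]) -> bool:
    """Strict reading: the company must cover every listed area."""
    return all(area in tags for area in required)


def matches_any(tags: set[str], options: Iterable[str]) -> bool:
    """Intended reading of "such as A, B, C or D": one listed area is enough."""
    return any(area in tags for area in options)


# Made-up record for illustration.
company = {"name": "Example Robotics", "city": "Toronto", "tags": {"robotics"}}
hardware_areas = ["custom silicon", "robotics", "satellites", "drones"]

in_toronto = company["city"] == "Toronto"
strict_match = in_toronto and matches_all(company["tags"], hardware_areas)
intended_match = in_toronto and matches_any(company["tags"], hardware_areas)

print(strict_match, intended_match)
```

A robotics-only company is a miss under the strict reading and a hit under the intended one, which matches the behaviour reported in the comment.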

esafak 2 days ago

I suggest caching and enabling the sharing of results. I am not signed in, so I don't know if that is a feature I am missing.

I searched for "alternatives to jq with a functional API" and one of the criteria it came up with was "Provides technical details or comparisons relevant to the alternatives", but the table only listed the repo's URL and description. And the description was truncated with ellipses, with no way for me to resize the columns. Also, it missed the opportunity to tell me that some shells can replicate jq's functionality. Finally, it would have to be faster to be a daily driver. At this speed, it is something I would reserve for backup, for when the workhorse fails. Which means I would not want to pay $49/month.

Hope that helps. Interesting idea.

dbuxton 2 days ago

Hey! Congrats on the launch. I just signed up for a trial account and I’m pretty impressed with the search API (haven’t used websets yet but looks cool).

Our experimental use case is enabling quick and dirty integration of web-based docs into an employee service agentic chatbot - lots of the questions are around “how do I max out my 401k”, which connects to internal information, but some are more like “how do I link a calendar to calendly”.

The one thing I’d love to have in the search product is a cruft cleaner for the results of web queries. Where you have cached the data, presumably this wouldn’t add much overhead. It reduces what you have to feed to the LLM downstream and might improve the embeddings performance.

frankramos 2 days ago

The Exa LinkedIn webset is something very innovative. Many current providers make it difficult if not against "Terms of Service" to build a product using their data. The irony is that they simply scraped LinkedIn.

drob518 1 day ago

Some feedback for you.

1. I love the idea.
2. The UI needs to work on smaller screens (e.g., tablets). The current layout is VERY cramped.
3. Its ability to search for businesses in a given geography is poor. I asked it to search for businesses in a city and it was giving me results that were obviously incorrect from halfway across the country.
4. For a homepage URL for a business, it once gave me a parked domain name at GoDaddy's "domain for sale" page. That seemed like a blunder. Is that because it's pulling in WHOIS information and it connected some addresses?
5. Performance is quite poor. Perhaps that's because you're getting "Hackernews'd" with a surge of people consuming all your capacity.

willbryk 2 days ago

Thanks for the support - we're getting the hug of death though, so please bear with us while we scale up!

srameshc 2 days ago

So the crawlers are feeding a database, something is also classifying and organizing the data stream, and everything is open as a very large dataset. This is an interesting concept.

jackienotchan 2 days ago

AI crawlers have led to a big surge in scraping/crawling activity on the web, and many don't use proper user agents and don't stick to any scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN (and experienced myself).

Do you have any built-in features that address these issues?

whoisjuan 2 days ago

Did you guys change the pricing of Exa?

When I checked this a year or so ago, I might have gotten the impression that it was cheaper. Now, it costs the same as what Perplexity charges for search-grounded queries, which is the same as Google charges for Gemini queries with search.

So basically, one player sets a price, and everyone is anchored on that as the pricing for the entire category? I'm just genuinely interested in why every offering in this space is priced like this.

It seems a bit misaligned with how pure LLM queries are priced.

I have a product that would benefit from search grounding, but this pricing wouldn't work with my volume of queries.

upcoming-sesame 2 days ago

This is a nice alternative for my Gemini Deep Research use case.

Most of the time I want to find some vendors / companies, and Deep Research does that but also responds with a wall of unnecessary text when I just want the table.

mfrye0 2 days ago

Congrats on the launch!

How do you dedupe entities, like companies and people? I've noticed ChatGPT tends to provide "great" results when asking about different entities, but in reality it just groups similar sounding entities together in its answer.

For example, I asked ChatGPT about a well known startup. It gave me a confident answer about how much they raised, their current status, etc. When looking at the 3 sources they cited though, it was actually 3 different companies that all had similar sounding names that it just grouped together to form its answer.

Basically, how do I trust the output of your system?
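One common way to avoid the failure mode described in this comment is to key entities on a stable identifier, such as a homepage domain, rather than on how similar their names sound. The sketch below illustrates that idea only. Nothing in this thread says how Exa dedupes entities, and the records and the `canonical_domain` / `group_by_entity` helpers are hypothetical (the code assumes Python 3.9+ for `str.removeprefix`).

```python
from collections import defaultdict
from urllib.parse import urlparse


def canonical_domain(url: str) -> str:
    """Lowercased hostname without a leading "www." prefix."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def group_by_entity(sources: list[dict]) -> dict[str, list[dict]]:
    """Group source records by homepage domain instead of by name."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for source in sources:
        groups[canonical_domain(source["homepage"])].append(source)
    return dict(groups)


# Made-up sources: similar-sounding names, but two distinct companies.
sources = [
    {"name": "Acme AI", "homepage": "https://acme.example/about"},
    {"name": "Acme.ai", "homepage": "https://www.acme.example/"},
    {"name": "ACME Labs", "homepage": "https://acmelabs.example/"},
]

entities = group_by_entity(sources)
print(sorted(entities))
```

Here the first two records collapse into one entity because they share a domain, while the third stays separate even though its name sounds similar.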

foobahhhhh 2 days ago

Very nice! Like a Databricks for Google, or perhaps think of it as Google backend as a service (at least their AI-like backend, not the main search).

It disrupts anyone who merely does one thing this does. E.g. a contact-building app can be done with this. I imagine many "wrapper" apps can be built on this.

I am serious though. It felt a little bit like using Databricks, obviously without all the functionality, but that will come.

I'm bullish! Modulo competition. Someone who does this makes their billion.

bhl 1 day ago

> The second is that LLMs provide the last-mile intelligence needed to verify every result. Each result and piece of data is backed with supporting references that we used to validate that the result is actually a match for your search criteria.

Evals on this would be great to benchmark the gap between using websets versus a generic web search tool. Otherwise, to a developer, it's just marketing.

waterproof 1 day ago

I love the enrichments feature. Have you considered making it available separately from the initial web search?

I often have projects where the enrichments feature alone would be super useful: I would provide, say, a list of company names, and then use enrichments to qualify them based on location, age, founder experience, etc.

lgiordano_notte 2 days ago

Really cool direction. The embedding-first + agentic verification pipeline resonates, similar pattern worked well for us in the web interaction space.

ByteAtATime 2 days ago

This is really cool! Just a small nitpick: on a low-powered device, the hero globe is really laggy (it's fine if I scroll past it, though).

ixxie 1 day ago

Seems awesome, but let me know when your entry level plan is under $10. I'd love to be able to prepay for credits rather than have a subscription!

androng 2 days ago

the homepage has an error in my Google Chrome and my Google Chrome incognito but not in my Safari <a href="https://drive.google.com/file/d/1ayWyf6ni_kofWrw9lowXjAX_AiIAWJBh/view?usp=sharing" rel="nofollow">https://drive.google.com/file/d/1ayWyf6ni_kofWrw9lowXjAX_AiI...</a>

euvin 1 day ago

I found the hallucination detector demo: <a href="https://demo.exa.ai/hallucination-detector">https://demo.exa.ai/hallucination-detector</a>

The search engine was impressive enough, but I think this implementation was a nice cherry on top.

forthwall 2 days ago

Really novel idea, but I think there's a bug in the first example: when I land on the websets page, it searches "Engineers with startup experience based in california" but what's returned is a bunch of tennis websites.

wormius 1 day ago

WHY DO YOU PEOPLE DO THIS? STOP WITH THE NAME COLLISIONS ALREADY. <a href="https://github.com/ogham/exa">https://github.com/ogham/exa</a>

tibbar 2 days ago

Wow, this is such an exciting product to me, a great application of modern tools. I'm using it to search for people who have very specific backgrounds that I would be interested to talk to. Thank you for building this.

thm 2 days ago

Now that you've got some money in the bank, you should get a license for the serif on your website (font-family: RecklessTrial-Regular;).

Gamester 1 day ago

Congratulations on the launch.

Very helpful for candidate searches, but still a bit slow for everyday use, like “what are the events happening today in my city”.

But I believe you guys will crack it soon and make it better.

mkrishnan 2 days ago

Congratulations! Great idea. Some issues I noticed: I searched "lucid air touring models available for sale Under 20,000 miles" and tried to add a "sale price" column, but didn't get the price details. Same for other cars as well.

saadatq 2 days ago

This looks really great.

And it's also how “internal” business intelligence/operations tools should work: search first to find relevant artifacts (“top 10 customers in AMEA”), followed by agentic verification and enrichment.

Congrats on the launch!

mh- 2 days ago

Congrats on the launch!

Can it perform searches that rely on the rendered (JS-executed) state of the website? If so, does it have access to the DOM?

Example use case: "The 10 most trafficked e-commerce sites that load Adobe Analytics tag(s)."

moralestapia 2 days ago

I wish you all the best, exa is pretty much Perplexity done right. So nice!

tcbtcb 2 days ago

This is so cool! What are the top use cases you’re seeing rn? The semantic heavy search is something most sourcing platforms fail consistently on, especially around people search

BiraIgnacio 1 day ago

Love the idea, keep up the work. I think this can really be something between a "standard web search engine" and WolframAlpha.

twostorytower 2 days ago

Congrats on the launch! Given you were in YC S21, when AI was much more under the radar, did you recently pivot? I'm guessing it wasn't a 4 year road to launch.

Mockapapella 2 days ago

Honestly I thought you guys had launched already (and didn't know you were a part of YC); I've been aware of you guys for years now, it seems. Congrats on the launch! Hope the twitter issues aren't causing you guys too many problems.

Normally I'd send this as a DM or email, but I think it could be useful for others to learn about how to use your service/the limitations of it. A couple weeks ago I made a search for:

<pre><code>In early 2023, Andrej Karpathy said something like "large training runs are a good test of the overall health of the network." Something something resilience as well I think. I need you to find it.</code></pre>

Unfortunately it wasn't able to find it, but it was either in a tweet or a really long presentation, neither of which are good targets for search. It was around the same time that this (<a href="https://www.youtube.com/watch?v=c3b-JASoPi0" rel="nofollow">https://www.youtube.com/watch?v=c3b-JASoPi0</a>) video was posted, like within a couple weeks before or after. How could I have improved my query? Does exa work over videos?

orliesaurus 2 days ago

Nice! This feels like Clay(.com) interface (sales people love it) but for every piece of data that needs adjacent information.

jppope 2 days ago

I really love the concept here. Lots of utility. Going to play around with it tonight and see if it can work for some usecases.

ing33k 1 day ago

Quick question : How does it compare to what Diffbot offers?

smolder 1 day ago

IMO, we should stop abusing personal data for profit. What does this bring to the table that doesn't advance the surveillance state? Does it help individuals without hurting them?

oofbaroomf 2 days ago

How big do you think your index is compared to Google?

justanotheratom 2 days ago

can websets enrich a column with images?

alecdewitz 2 days ago

Congrats guys!

xena 2 days ago

Do you respect robots.txt? How can I block your crawlers?
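For readers unfamiliar with the mechanism being asked about, the sketch below shows how a well-behaved crawler would consult a site's robots.txt rules using Python's standard `urllib.robotparser`. `ExampleBot` is a placeholder user-agent token, not the name of Exa's crawler, which this thread does not state. The rules are parsed from an inline list rather than fetched from a real site.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks one specific crawler and allows everyone else.
# "ExampleBot" is a placeholder token chosen for illustration.
robots_txt = [
    "User-agent: ExampleBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(robots_txt)  # parse in-memory lines instead of fetching a URL

blocked = parser.can_fetch("ExampleBot", "https://example.com/page")
allowed = parser.can_fetch("OtherBot", "https://example.com/page")
print(blocked, allowed)
```

A site owner would publish rules like these at `/robots.txt`. Whether a given crawler honors them is up to that crawler's operator.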

mschrage 1 day ago

Congrats on the launch!

adi_lancey 1 day ago

looks great, nice work

benatkin 1 day ago

This sounds promising.

I tried "Full-stack web frameworks started 2023 or later" and the first result was FastHTML, which is a very good answer. I was hoping for Dioxus, but that I think is actually a little bit older. Of course Google's results, including Gemini, were useless. MeteorJS was not started in 2023 or later. LOL.

artembugara 2 days ago

Will, Jeff, I am a BIG Exa fan. Congrats on finally doing your HN Launch.

I think NewsCatcher (my YC startup) and Exa aren’t direct competitors, but we definitely share the same insight: SERP is not the right way to let an LLM interact with the web, because it’s literally optimized for humans who can open 10 pages at most.

What we found is that LLMs can sift through 10k+ web pages if you pre-extract all the signals out of them.

But we took a bit of a different angle. Even though we have over 1.5 billion news stories in our index alone, we don’t have a solution to sift through them as your Websets do (saw your impressive GPU cluster :))

So what we do instead is build bespoke pipelines for our customers (who are mostly large enterprise/F1000). We fine-tune LLMs on specific information extraction with very high accuracy.

Our insight: for many enterprises the solution should be either a perfect fit or nothing. And that’s where they’re ok to pay 10-100x for the last mile effort.

P.S. Will, loved your comment on a podcast where you said Exa can be used to find a dating partner.

orhmeh09 2 days ago

Not to be confused with exa: <a href="https://github.com/ogham/exa">https://github.com/ogham/exa</a>

rushingcreek 2 days ago