
Weaviate is an open-source search engine powered by ML, vectors, and GraphQL

5 points by thirdtrigger about 4 years ago

1 comment

peter_d_sherman about 4 years ago
>"For van Luijt, this was an "Aha" moment. Like everyone else working in technology,

*he had to deal with lots of unstructured data.*

In his words, relating data is a problem. Data integration is hard to do, even for structured data. When you have unstructured data from different sources, it becomes extremely challenging.

Van Luijt read up on RankBrain and figured it uses word vectorization to infer relations in the queries and then try to present results. Vectors are how machine learning models understand the world. Where people see images, for example, machine learning models see image representations, in the form of vectors.

A vector is a very long list of numbers, which can be thought of as coordinates in a geometrical space. Three-dimensional vectors -- i.e. vectors of the form (X, Y, Z) -- correspond to a space humans are familiar with. But multi-dimensional vectors also exist, and this complicates things:

"There are many dimensions, but to paint a mental picture, you can say there's just three dimensions. The problem now is, it's great that you can use a vector to recognize a pattern in a photo and then say, yes, it's a cat, or no, it's not a cat. But then, what if you want to do that for one hundred thousand photos or for a million photos? Then you need a different solution, you need to have a way to look into the space and find similar things."

This is what Google did with RankBrain for text. Van Luijt was intrigued. He started experimenting with Natural Language Processing (NLP) models. He even got to ask Google's people directly: Were they going to build a B2B search engine solution? Since their reply was "no," he set out to do that with Weaviate.

Searching the document space with vectors

NLP machine learning models output vectors: They place individual words in a vector space. The idea behind Weaviate was:

*What if we take a document -- an email, a product, a post, whatever -- look at all the individual words that describe it and calculate a vector for those words.*

This will be where the document sits in the vector space. And then, if you ask, for example: What publications are most related to fashion? The search engine should look into the vector space, and find publications like Vogue, as being close to "fashion" in this space.

This is at the core of what Weaviate does. In addition, data in Weaviate are stored in a graph format. When nodes in the graph are located, users can traverse further and find other nodes in the graph.

It's not that it isn't possible to store vectors in traditional databases. It is, and people do that. But after a certain point, it becomes impractical. Besides performance, complexity is also a barrier. For example, van Luijt mentioned, in most cases, people are not privy to the details of how vectorization happens.

Weaviate comes with a number of built-in vectorizers. Some are general-purpose, some are tailored to specific domains such as cybersecurity or healthcare.

*A modular structure enables people to plugin their own vectorizers, too.*"

PDS: In the above context,

*"Vectorizer" = "Domain Specific Search Engine"*
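
For anyone who wants to see the quoted idea in miniature, here is a rough Python sketch: give each document a vector by averaging the vectors of its words, then answer a query by finding the document closest to the query's vector. This is not Weaviate's API or its actual vectorizer. The three-dimensional word vectors are invented for illustration, the publication names other than Vogue are hypothetical, and averaging plus cosine similarity is just one simple way to do this.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real vectorizer would produce vectors with hundreds of dimensions.
WORD_VECTORS = {
    "fashion": [0.90, 0.10, 0.00],
    "style": [0.80, 0.20, 0.10],
    "clothing": [0.85, 0.15, 0.05],
    "car": [0.10, 0.95, 0.10],
    "engine": [0.05, 0.90, 0.20],
    "recipe": [0.00, 0.10, 0.90],
    "cooking": [0.05, 0.05, 0.95],
}


def document_vector(text):
    """Place a text in the vector space by averaging its known word vectors."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("text contains no known words")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, documents, k=1):
    """Return the names of the k documents most similar to the query."""
    query_vec = document_vector(query)
    scored = [
        (cosine_similarity(query_vec, document_vector(text)), name)
        for name, text in documents.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical publications, each described by a few words.
PUBLICATIONS = {
    "Vogue": "style clothing",
    "Car Weekly": "car engine",
    "Home Kitchen": "recipe cooking",
}

print(nearest("fashion", PUBLICATIONS))
```

Note that "fashion" never appears in the description of Vogue; the match comes only from the vectors being close. The linear scan over every document is what becomes impractical at a million items, which is presumably where a dedicated vector search engine earns its keep.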