TechEcho

4 comments

The document discusses Monolith, a real-time recommendation system with a collisionless embedding table. The system is designed to provide personalized content for each individual user in real time. The data for recommendation mostly contains sparse categorical features, some of which appear with low frequency.

A collisionless embedding table is a data structure that stores each feature's embedding so that two different features never end up sharing the same entry. Embedding tables commonly use a hashing algorithm to map features to specific locations in the table; a collisionless design additionally ensures that two features are not stored in the same location.

Embeddings are a way of representing data in a lower-dimensional space. In this case, the embedding is used to represent the data from the user's interactions. This can be used to make predictions about relationships between vectors.

The advantage of storing the dot products of the embeddings would be that it would allow for a more efficient calculation of the similarity between two vectors. Dot products are a measure of how similar two vectors are, so by storing the dot products of the embeddings, it would be possible to quickly look up the similarity between any two vectors. This would be especially useful in a recommendation system, where it is often necessary to calculate the similarity between a user's vector and a large number of other vectors in order to find the most similar items.
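
To make the collision point concrete, here is a minimal Python sketch. It is not Monolith's implementation: `HashedTable` and `CollisionlessTable` are hypothetical names, the modulo stands in for a real hash function, and a plain dict stands in for whatever hash map the real system uses. The hashed variant folds IDs into a fixed number of rows, so two different IDs can share one vector; the dict-backed variant gives every ID its own row.

```python
import random


class HashedTable:
    """Fixed-size table: IDs are folded into num_rows slots, so IDs can collide."""

    def __init__(self, num_rows, dim, seed=0):
        rng = random.Random(seed)
        self.num_rows = num_rows
        self.rows = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(num_rows)]

    def lookup(self, feature_id):
        # Modulo stands in for a hash function; distinct IDs may share a row.
        return self.rows[feature_id % self.num_rows]


class CollisionlessTable:
    """Dict keyed on the raw ID: every ID gets its own row, created on first use."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.rows = {}

    def lookup(self, feature_id):
        if feature_id not in self.rows:
            self.rows[feature_id] = [self.rng.uniform(-0.1, 0.1)
                                     for _ in range(self.dim)]
        return self.rows[feature_id]


hashed = HashedTable(num_rows=8, dim=4)
exact = CollisionlessTable(dim=4)

# IDs 3 and 11 land on the same row in the hashed table (3 % 8 == 11 % 8) ...
hashed_shared = hashed.lookup(3) is hashed.lookup(11)
# ... but get separate rows in the collisionless table.
exact_shared = exact.lookup(3) is exact.lookup(11)
print(hashed_shared, exact_shared)  # prints: True False
```

In the hashed table, a gradient update for ID 3 would also move the vector served for ID 11, which is the failure mode a collisionless table avoids. The cost is that the dict grows with the number of distinct IDs instead of staying at a fixed size.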

eggie5 over 2 years ago

Interesting work on Online Learning:

There are many empirical studies which show that for feature hashing, a few collisions don't have a sig impact on perf (https://youtu.be/ARjNMdCzN-Q?t=599).

However, for some archs, the impact is catastrophic. E.g. matrix factorization: any collision leads to an incorrect item. Zero Collision Hashing addresses the problem of mapping collisions. One technique is to introduce state into the hashing fn using the current id assignments.

Real-time publishing protocol:

* minute-level weight syncing
* delta pushes only
* ignore machine failures and rely on (possibly stale) full snapshot loading to bootstrap the new hosts
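
A rough Python sketch of two of the ideas above, under stated assumptions: `IdAssigner` illustrates a stateful mapping that consults the current ID assignments, and `TrainingTable.take_delta` illustrates pushing only the rows that changed since the last sync. Both class names, the `serving` dict, and the item IDs are hypothetical; this is an illustration of the general techniques, not the protocol as the paper implements it, and the minute-level timer that would trigger each sync is left out.

```python
class IdAssigner:
    """Stateful mapping: a raw ID gets the next unused row the first time it is seen."""

    def __init__(self):
        self.row_of = {}

    def row(self, raw_id):
        # The current assignments are the state: a known ID keeps its row,
        # a new ID takes a fresh one, so two IDs never share a row.
        return self.row_of.setdefault(raw_id, len(self.row_of))


class TrainingTable:
    """Embedding rows plus a record of which rows changed since the last sync."""

    def __init__(self):
        self.rows = {}
        self.touched = set()

    def update(self, row, vector):
        self.rows[row] = list(vector)
        self.touched.add(row)

    def take_delta(self):
        # Only rows written since the previous call are shipped.
        delta = {row: self.rows[row] for row in self.touched}
        self.touched.clear()
        return delta


assigner = IdAssigner()
trainer = TrainingTable()
serving = {}  # stand-in for the serving replica's copy of the table

trainer.update(assigner.row("item_a"), [0.1, 0.2])
trainer.update(assigner.row("item_b"), [0.3, 0.4])
serving.update(trainer.take_delta())   # first sync pushes both rows

trainer.update(assigner.row("item_a"), [0.5, 0.6])
second_delta = trainer.take_delta()    # second sync pushes only item_a's row
serving.update(second_delta)
print(sorted(second_delta), serving)
```

A new serving host in this sketch would start from a full copy of `trainer.rows` (the snapshot, possibly stale) and then catch up by applying subsequent deltas.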

alsuren over 2 years ago

Does anyone have an archive of their code for this?

https://github.com/bytedance/monolith exists, but is an empty repo. A web search for "github bytedance monolith" finds a bunch of files in that repo that are 404s when you click on them.

jamesblonde over 2 years ago

It is an interesting architecture. Flink for real-time feature computation and Kafka to share training samples with clients for low end-to-end latency.

Nothing on use of embeddings for similarity search, though. I assume they are using FAISS?
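
For reference, the simplest form of embedding similarity search is an exact scan that scores every item by dot product and keeps the top k. The Python sketch below shows that baseline only; the item names and vectors are made up, and nothing here is a claim about how Monolith retrieves candidates. A library such as FAISS would typically replace the linear scan once the catalogue gets large.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(user_vec, item_vecs, k):
    """Return the k item IDs with the largest dot product against user_vec."""
    return heapq.nlargest(
        k, item_vecs, key=lambda item: dot(user_vec, item_vecs[item])
    )


# Hypothetical two-dimensional embeddings for three items.
items = {
    "video_1": [1.0, 0.0],
    "video_2": [0.6, 0.8],
    "video_3": [-1.0, 0.0],
}
user = [0.9, 0.1]

ranked = top_k(user, items, k=2)
print(ranked)  # prints: ['video_1', 'video_2']
```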

Monolith: Real time recommendation system with collisionless embedding table
