
Two-Tower Embedding Model

77 points, by jamesblonde, over 1 year ago

6 comments

tysam_and, over 1 year ago
I do not want to be too much of a downer, but is there something keeping us from just using the traditional verbiage we've used in the community for years and calling it "projecting into a shared latent space" (or something bland but descriptive that a researcher like me could quickly latch onto, like 'separate key and query encoders')?

I understand that proprietary names are necessary to sell ideas, architectures, etc., but projecting things into the same latent space is an old concept that, like the old Ecclesiastes verse, comes up in new and unique ways/applications. Though there is nothing new under the sun, indeed.

Please forgive this stodgy young person. Frippery and grumpiness are deep skills of mine, and I apply them to my own DL research as well. I do not want to discourage the author from writing more pieces; explaining concepts is, I believe, a great trend to have in a community.

Thank you, and curious for anyone's thoughts. <3 :) :')))) <3
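For readers new to the terminology, here is a minimal PyTorch sketch of that "shared latent space" idea: two separate encoders (the two "towers") map different kinds of input into one latent space, where a dot product scores their similarity. The layer sizes and input dimensions below are invented purely for illustration and are not from the linked article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """One encoder ("tower") projecting its input into the shared latent space."""
    def __init__(self, in_dim: int, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
        )

    def forward(self, x):
        # L2-normalize so a dot product equals cosine similarity
        return F.normalize(self.net(x), dim=-1)

query_tower = Tower(in_dim=300)   # hypothetical query feature size
item_tower = Tower(in_dim=500)    # hypothetical item feature size

q = query_tower(torch.randn(8, 300))
v = item_tower(torch.randn(8, 500))
scores = q @ v.T                  # (8, 8) similarity matrix in the shared space
```

Because both outputs are L2-normalized, the dot product here is exactly the cosine similarity between a query embedding and an item embedding.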
joewferrara, over 1 year ago
This is a very cool concept. The example of ByteDance combining the text embedder ALBERT with a transformer image embedder, to make an embedder that handles image and text at the same time and produces an interaction score, is fascinating. I had not heard of combining unrelated embedders before, and I want to see more examples. A quick Google search found this in-depth blog article on how they're using two-tower embeddings at Uber, written recently (this past July):

https://www.uber.com/blog/innovative-recommendation-applications-using-two-tower-embeddings/
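To make the text-plus-image setup concrete, here is a hypothetical sketch in the same spirit: two unrelated backbones, each followed by a small projection head into a common latent dimension, with the interaction score taken as the similarity of the two projections. The stand-in linear "backbones" and all dimensions below are invented so the example runs without downloading weights; in a real system they would be something like ALBERT and a vision transformer. This is not ByteDance's or Uber's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 128

class ProjectionHead(nn.Module):
    """Maps a backbone's output into the shared latent dimension."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, LATENT_DIM)

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

# Stand-ins with plausible output widths; in practice these would be
# pretrained text and image encoders (e.g. 768-d and 1024-d features).
text_encoder = nn.Linear(30000, 768)    # stand-in for a text backbone
image_encoder = nn.Linear(2048, 1024)   # stand-in for an image backbone
text_head, image_head = ProjectionHead(768), ProjectionHead(1024)

text_feat = text_head(text_encoder(torch.randn(4, 30000)))
image_feat = image_head(image_encoder(torch.randn(4, 2048)))
interaction_score = (text_feat * image_feat).sum(dim=-1)  # one score per text-image pair
```

Typically the projection heads (and sometimes the top layers of the backbones) are what get trained on paired interaction data, while the pretrained backbones stay frozen or are fine-tuned lightly.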
rdedev, over 1 year ago
So I have played around with this architecture, specifically for entity linking: two BERT encoders, one for the query text and another for the candidates. Initially the two encoders were separate but trained together. When I tried using the same BERT model for both, the accuracy jumped by 4 percentage points. I was pretty surprised by this, and I guess it got me thinking that maybe a simple cosine similarity loss is not enough signal for the model to learn a shared latent space; maybe we also need some weights to be the same between the encoders. Granted, in my use case above both inputs are the same modality, but if we are building a model with image and text encoders it might be helpful to try tying the weights in the last layers of those two encoders.
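For concreteness, here is a small sketch of the weight-sharing variant described in that comment: a single encoder (a stand-in for the shared BERT) embeds both queries and candidates, trained with a cosine-similarity objective. The stand-in encoder, dimensions, and random inputs are all hypothetical, not the commenter's actual setup.

```python
import torch
import torch.nn as nn

# One encoder used for both towers, so both inputs are mapped by the same weights.
encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh(), nn.Linear(768, 256))

query_vecs = torch.randn(16, 768)            # stand-ins for pooled BERT outputs
candidate_vecs = torch.randn(16, 768)
labels = torch.randint(0, 2, (16,)) * 2 - 1  # +1 = matching pair, -1 = non-matching

# Cosine embedding loss pulls matching pairs together and pushes
# non-matching pairs below the margin in the shared space.
loss_fn = nn.CosineEmbeddingLoss(margin=0.2)
loss = loss_fn(encoder(query_vecs), encoder(candidate_vecs), labels)
loss.backward()
```

Tying only the last layers of two otherwise separate encoders, as suggested for the mixed-modality case, would replace the single `encoder` with two backbones that feed into one shared projection module.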
johnsutor, over 1 year ago
I much prefer the n-tower approach: https://arxiv.org/pdf/2307.10802.pdf
choeger, over 1 year ago
So, if and when this "AI" is applied to suggestions, will it stop suggesting washing machines to me after I just bought one the other day in the same shop? Or the same article I just read this morning? As long as this very simple case is not covered, I consider these algorithms cow manure.
taylorius, over 1 year ago
Please don't take this the wrong way, but I'm somewhat knowledgeable about AI and machine learning, and when I heard the phrase "two tower embedding" the first thing I thought of was the tragedy on 9/11. So I'm not sure how good a slogan it is, if that was your aim at all.