I made an STSB alternative, but with dialog/assistant samples.<p>I couldn't find anything similar online (!), so I built it.<p>I did it because I needed a very small model that would work well with my React component, and none of the existing 17M-parameter models performed adequately.<p>The one I fine-tuned on this dataset does.<p>Embedding models, like other types of models, can be task-specific, and there was no officially recognized task that matched my needs.<p>The closest is the "sentence similarity" task, but one of the most recognized benchmarks for it is STSB, and I find STSB quite strange.<p>Here is an example from STSB scored 5 out of 5: "A person cuts an onion." and "A person is cutting an onion."<p>Here is an example from STSB scored 1 out of 5: "A man is playing the flute" and "A man is playing the guitar".<p>STSB doesn't measure what I need for my "real world" task. What I need is a way to find the paragraphs that best answer the question the user asks. That is why I made the dataset and why I fine-tuned an embedding model on it. It was a fun experience and the model works really well! :)
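<p>To make the task concrete, here is a minimal sketch of the retrieval step: rank paragraphs by cosine similarity between their embeddings and the question's embedding. This is not my actual code. The vectors are made-up toy values standing in for the output of an embedding model, and `cosine_similarity` and `rank_paragraphs` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_paragraphs(question_vec, paragraph_vecs, top_k=3):
    """Return (paragraph_index, score) pairs, best match first."""
    scored = [
        (index, cosine_similarity(question_vec, vec))
        for index, vec in enumerate(paragraph_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (assumption: a real model would produce
# vectors with hundreds of dimensions).
question = [0.9, 0.1, 0.0]
paragraphs = [
    [0.0, 1.0, 0.0],  # paragraph 0: mostly unrelated
    [0.8, 0.2, 0.1],  # paragraph 1: points the same way as the question
    [0.0, 0.0, 1.0],  # paragraph 2: orthogonal to the question
]

ranking = rank_paragraphs(question, paragraphs, top_k=2)
for index, score in ranking:
    print(f"paragraph {index}: {score:.3f}")
```

With a model fine-tuned for question/answer pairs, the paragraph that answers the question should land at the top of this ranking, which is a different goal from scoring how close two sentences are to being paraphrases.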