TechEcho

I made an STSB alternative, but with dialog/assistant samples. I couldn't find anything similar online (!), so I built it. I needed a very small model that would work well with my React component, and none of the existing 17M-parameter models performed adequately. The one I fine-tuned on this dataset does.

Embedding models, like other types of models, can be task-specific, and no established task matched my needs. The closest is "sentence similarity", but one of its best-known benchmarks is STSB, and I find STSB rather strange. Here is an example STSB scores 5 out of 5: "A person cuts an onion." and "A person is cutting an onion." And here is one it scores 1 out of 5: "A man is playing the flute" and "A man is playing the guitar".

STSB isn't what I need for my real-world task. What I need is a way to find the paragraphs that best answer the question the user asks. That's why I made this dataset and fine-tuned an embedding model on it. It was a fun experience, and the model works really well! :)
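The retrieval task described above comes down to embedding the question and each candidate paragraph, then ranking the paragraphs by how close their vectors are to the question's. Below is a minimal sketch of that ranking step, assuming the embeddings have already been computed by some model and using cosine similarity, a common choice for embedding search. The vectors are hypothetical toy values, not outputs of the author's model, and `rank_paragraphs` is an illustrative helper name.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_paragraphs(question_vec: list[float], paragraph_vecs: list[list[float]]) -> list[int]:
    """Return paragraph indices sorted from most to least similar to the question."""
    scores = [cosine(question_vec, p) for p in paragraph_vecs]
    return sorted(range(len(paragraph_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-dimensional vectors standing in for real model outputs.
question = [0.9, 0.1, 0.0]
paragraphs = [
    [0.0, 1.0, 0.0],  # unrelated paragraph
    [0.8, 0.2, 0.1],  # paragraph that answers the question
    [0.5, 0.5, 0.5],  # loosely related paragraph
]

print(rank_paragraphs(question, paragraphs))  # prints [1, 2, 0]
```

A fine-tuned model only changes where the vectors come from: the goal of training on question/answer pairs is to push answering paragraphs closer to their questions than merely similar-sounding ones, so they land at the top of this ranking.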

Show HN: I made a dataset for finetuning embedding models