For example, I have a long text that says, "I have to change the screw number 1234," and a short one that says, "screw number 1234 changed."<p>Both were entered by different people and refer to the same thing.<p>I thought of using an LLM (GPT-4), but my dataset is too large (millions of entries) and it would be expensive.<p>Is there a better, or at least good enough, alternative?<p>Thank you.
Try <a href="https://www.sbert.net/" rel="nofollow">https://www.sbert.net/</a><p>These models can be self-hosted and are cheap to run. They are much smaller than GPT-3 or GPT-4 but are trained specifically for this purpose: comparing sentences by meaning.
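To make that a bit more concrete, here is a rough sketch of the matching step: embed every text, then flag pairs whose cosine similarity is above a cutoff. The function names, the `encode` parameter, and the 0.7 threshold are my own placeholders, not anything from the library. The model call is left as a comment so the snippet runs without extra dependencies.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_pairs(
    texts: List[str],
    encode: Callable[[List[str]], Sequence[Vector]],
    threshold: float = 0.7,  # assumed cutoff; tune it on your own data
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for every pair of texts scoring >= threshold.

    `encode` is any function mapping a list of strings to one vector each.
    This compares all pairs, so it is only practical for small batches.
    """
    vectors = encode(texts)
    pairs = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            score = cosine_similarity(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Possible usage with sentence-transformers (needs `pip install sentence-transformers`;
# the model name is just one commonly used small model, pick whatever fits):
#
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   texts = ["I have to change the screw number 1234", "screw number 1234 changed"]
#   print(find_similar_pairs(texts, model.encode))
```

The all-pairs loop is quadratic, so for millions of entries you would likely want to embed once and then query an approximate nearest-neighbor index (FAISS is one option) instead of comparing everything with everything.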