Hi HN,<p>With the following function, you can compare the semantics of two strings in Google Sheets. It is essentially a wrapper around the Universal Sentence Encoder [1].<p>It encodes the two sentences and returns the cosine similarity between the two embeddings. It works with English and 15 other languages, and it can compare two sentences even if they are in different languages.<p>For example: SIMILARITY("Hacker News is a social news website focusing on computer science and entrepreneurship", "Hacker News est un site d'actualités sociales axé sur l'informatique et l'entrepreneuriat") returns 0.88,
while SIMILARITY("Hacker News is a social news website focusing on computer science and entrepreneurship", "The site was created by Paul Graham in February 2007") returns 0.17.<p>I initially built it to align instructions in two different languages, and figured it could be useful for other people as well.<p>To load it into Google Sheets, open a Sheet, click on "Tools", then "Script Editor", and copy and paste the function. It should then be available within the Sheet as a custom function, e.g. =SIMILARITY(A1, B1).<p>Let me know what you do with it :-)<p>Maxime.<p><pre><code> /**
* Multilingual semantic similarity between two strings, based on Google's Universal Sentence Encoder and cosine similarity.
*
* @param {string} str_1 The first text.
* @param {string} str_2 The second text.
* @return {number} Cosine similarity between -1 and 1, where a value close to 1 means the two texts are semantically similar.
* @customfunction
*/
function SIMILARITY(str_1, str_2) {
var data = {'str_1': str_1, 'str_2': str_2};
var options = {
'method' : 'post',
'contentType': 'application/json',
'payload' : JSON.stringify(data)
};
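// Send both strings to the similarity service and parse its JSON response.
// Declared with var so they do not leak as implicit globals.
var response = UrlFetchApp.fetch('https://neuralyst.io/string_similarity', options);
var sim = JSON.parse(response.getContentText());
return sim['similarity'];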
}
</code></pre>
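<p>For readers unfamiliar with the metric, here is a minimal sketch of cosine similarity in plain JavaScript (it should also run in Apps Script). It is an illustration only, not the code behind the endpoint: the name cosineSimilarity, the zero-vector convention, and the toy vectors are assumptions, and the real embeddings are produced server-side by the Universal Sentence Encoder.

```javascript
// Hypothetical helper: cosine similarity between two equal-length numeric vectors.
// Illustration only; the actual embeddings are computed by the remote service.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  var dot = 0;
  var normA = 0;
  var normB = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];     // accumulate the dot product
    normA += a[i] * a[i];   // accumulate squared length of a
    normB += b[i] * b[i];   // accumulate squared length of b
  }
  // Convention chosen for this sketch: a zero vector has similarity 0.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

With toy vectors, cosineSimilarity([1, 2, 3], [2, 4, 6]) is 1 up to floating-point rounding because the vectors point in the same direction, and cosineSimilarity([1, 0], [0, 1]) is 0 because they are orthogonal.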
[1]: https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3