科技回声

zawerf, nearly 6 years ago

I was confused about the intended use case, but there's more information in the docs folder: https://github.com/microsoft/bistring/blob/master/docs/Introduction.rst

Apparently it's for machine learning, where you want to pick out a span/substring in the original text but your model can only accept normalized text (I am guessing for things like transforming out-of-vocabulary words into UNK/unknown tokens). This solves that problem by keeping track of the index mapping between the original text and the transformed text.

(Picking out spans is a very common task in NLP; for example, see the SQuAD dataset: https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Normans.html)
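To make the index-mapping idea concrete, here is a minimal sketch in plain Python. It does not use bistring and is not its API; the function names `normalize_with_offsets` and `to_original_span` are made up for illustration, and the normalization (lowercasing plus collapsing whitespace) is just an assumed stand-in for whatever a real model pipeline would do.

```python
def normalize_with_offsets(text):
    """Lowercase `text` and collapse whitespace runs into single spaces.

    Returns (normalized, origin), where origin[i] is the index in `text`
    of the character that produced normalized[i].
    (Hypothetical helper, not part of bistring.)
    """
    out = []
    origin = []
    prev_space = True  # treat the start as whitespace so leading spaces are dropped
    for i, ch in enumerate(text):
        if ch.isspace():
            if not prev_space:
                out.append(" ")
                origin.append(i)
            prev_space = True
        else:
            # lower() can yield more than one character for some code points,
            # so every output character points back to the same source index.
            for low in ch.lower():
                out.append(low)
                origin.append(i)
            prev_space = False
    # Drop a single trailing space left over from trailing whitespace.
    if out and out[-1] == " ":
        out.pop()
        origin.pop()
    return "".join(out), origin


def to_original_span(origin, start, end):
    """Map a non-empty half-open span [start, end) in the normalized text
    back to a half-open span in the original text."""
    return origin[start], origin[end - 1] + 1


text = "The   NORMANS settled"
norm, origin = normalize_with_offsets(text)
start = norm.find("normans")  # span as a model working on normalized text might report it
s, e = to_original_span(origin, start, start + len("normans"))
print(text[s:e])  # NORMANS
```

A flat list of source indices is enough for this toy case because every output character comes from exactly one input character. Transformations that insert or replace whole tokens (such as swapping a word for UNK) would need a richer alignment structure than this.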


andrewflnr, nearly 6 years ago

Somewhat related: Boomerang (https://www.seas.upenn.edu/~harmony/). Discussed here at least once: https://news.ycombinator.com/item?id=565874

The title made me think of Boomerang, but this looks like it has rather different use cases in mind.

blt, nearly 6 years ago

This is interesting, but the readme doesn't say much about use cases. What is a big application that could benefit from this?


Bistring – Bidirectionally Transformed Strings

3 comments
