I was confused about the intended use case, but there's more information in the docs folder: <a href="https://github.com/microsoft/bistring/blob/master/docs/Introduction.rst" rel="nofollow">https://github.com/microsoft/bistring/blob/master/docs/Intro...</a><p>Apparently it's for machine learning, where you want to pick out a span/substring in the original text but your model can only accept normalized text (I am guessing for things like replacing out-of-vocabulary words with UNK/unknown tokens). It solves that problem by keeping track of the index mapping between the original text and the transformed text.<p>(Picking out spans is a very common task in NLP; for example, see the SQuAD dataset: <a href="https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Normans.html" rel="nofollow">https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/...</a>)
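To make the index-mapping idea concrete, here is a minimal plain-Python sketch. It does not use bistring's actual API. The names `normalize_with_alignment` and `original_span` are hypothetical, and the normalization (lowercasing plus collapsing whitespace) is just an assumed stand-in for whatever preprocessing a model needs. The point is only that each normalized character remembers which original character it came from, so a span found in the normalized text can be mapped back.

```python
def normalize_with_alignment(text):
    """Lowercase and collapse whitespace, recording source offsets.

    Returns (normalized, offsets), where offsets[i] is the index in
    `text` of the character that produced normalized[i].
    Hypothetical helper for illustration; not part of bistring.
    """
    out = []
    offsets = []
    prev_space = True  # start as "after a space" so leading whitespace is dropped
    for i, ch in enumerate(text):
        if ch.isspace():
            if not prev_space:
                out.append(" ")
                offsets.append(i)
            prev_space = True
        else:
            # lower() can yield more than one character for some code points,
            # so record the same source offset for each output character.
            for c in ch.lower():
                out.append(c)
                offsets.append(i)
            prev_space = False
    # Drop a single trailing space left over from trailing whitespace.
    if out and out[-1] == " ":
        out.pop()
        offsets.pop()
    return "".join(out), offsets


def original_span(offsets, start, end):
    """Map a non-empty [start, end) span in the normalized text back to
    a [lo, hi) span in the original text."""
    if not 0 <= start < end <= len(offsets):
        raise ValueError("span must be non-empty and within the normalized text")
    return offsets[start], offsets[end - 1] + 1


text = "The  Normans   settled in  NORMANDY."
normalized, offsets = normalize_with_alignment(text)

# Pretend a model picked out this answer span in the normalized text.
start = normalized.find("normandy")
lo, hi = original_span(offsets, start, start + len("normandy"))
answer = text[lo:hi]  # the same span, but taken from the original text
```

A per-character offset list is the simplest thing that works for this toy case; a real library would presumably need something richer to handle insertions, deletions, and chained transformations.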
Somewhat related: Boomerang <a href="https://www.seas.upenn.edu/~harmony/" rel="nofollow">https://www.seas.upenn.edu/~harmony/</a>, discussed here at least once: <a href="https://news.ycombinator.com/item?id=565874" rel="nofollow">https://news.ycombinator.com/item?id=565874</a><p>The title made me think of Boomerang, but this looks like it has rather different use cases in mind.