Very interesting tool, although storing the data as typos does seem a bit visible and prone to mistaken 'correction'. Other approaches to consider might be:<p>* Swapping punctuation for visually identical but distinct characters. This would not survive printing, however.<p>* Encoding only 'believable' typos, e.g. it's vs. its. You could encode a binary stream across all instances of it(')s, or other such substitutions.<p>* Encoding the stream in whitespace, e.g. two spaces vs. one after a full stop. Printed documents would be lossy (full stops at line endings would be ambiguous), but error detection/correction codes can help with that.
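To make the whitespace idea concrete, here is a minimal sketch, not taken from the tool under discussion. It assumes one space after a full stop means 0 and two spaces means 1, and that the receiver knows how many bits to read. The names `embed_bits` and `extract_bits` are made up for illustration.

```python
import re

# A "slot" is a full stop followed by spaces and then more text on the same line.
# Full stops at line endings are skipped, since their spacing is ambiguous.
_BOUNDARY = re.compile(r"\. +(?=\S)")


def embed_bits(text, bits):
    """Encode bits as one space (0) or two spaces (1) after each full stop."""
    bits = list(bits)
    capacity = len(_BOUNDARY.findall(text))
    if len(bits) > capacity:
        raise ValueError(f"need {len(bits)} slots, cover text has {capacity}")
    remaining = iter(bits)
    # Slots left over after the payload are normalised to a single space (0).
    return _BOUNDARY.sub(lambda m: ".  " if next(remaining, 0) else ". ", text)


def extract_bits(text):
    """Read one bit per slot: two or more spaces -> 1, one space -> 0."""
    return [1 if len(m.group(0)) > 2 else 0 for m in _BOUNDARY.finditer(text)]


cover = "One. Two. Three. Four. Five."
stego = embed_bits(cover, [1, 0, 1, 1])
print(repr(stego))
print(extract_bits(stego))
```

Abbreviations such as "e.g." also count as slots here, which is harmless as long as the encoder and decoder use the same rule. Any error-correcting code would be applied to the bit list before embedding.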
I worked on something very similar; my version also mutated punctuation, swapped common phrases/words for synonyms, and re-ordered sentences. Instead of steganography, the purpose was to create identifiable mutations in the text that act as a canary, tying disclosures back to specific recipients. Each party receiving a confidential document got slight mutations unique to their copy, so a copy/paste of even a fairly small fragment (or fragments) could be used to identify the owner of that version.
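A rough sketch of how such a canary scheme could work follows. It is an assumption-laden illustration, not the commenter's actual implementation. Each synonym pair carries one bit of a numeric recipient ID, and a leaked fragment narrows the candidates to the IDs consistent with the word choices it contains. The `SYNONYMS` table and the function names are hypothetical, and matching is lowercase-only for brevity.

```python
import re

# Hypothetical synonym table: slot i carries bit i of the recipient ID.
SYNONYMS = [("big", "large"), ("quick", "fast"), ("start", "begin"), ("help", "assist")]


def fingerprint(text, recipient_id):
    """Return a copy of text whose synonym choices spell out recipient_id."""
    for slot, (zero_word, one_word) in enumerate(SYNONYMS):
        bit = (recipient_id >> slot) & 1
        either = re.compile(r"\b(?:%s|%s)\b" % (re.escape(zero_word), re.escape(one_word)))
        text = either.sub(one_word if bit else zero_word, text)
    return text


def visible_bits(fragment):
    """Map slot -> bit for every synonym slot that appears in the fragment."""
    seen = {}
    for slot, (zero_word, one_word) in enumerate(SYNONYMS):
        if re.search(r"\b%s\b" % re.escape(zero_word), fragment):
            seen[slot] = 0
        elif re.search(r"\b%s\b" % re.escape(one_word), fragment):
            seen[slot] = 1
    return seen


def candidates(fragment, recipient_ids):
    """Recipients whose ID agrees with every bit visible in the fragment."""
    seen = visible_bits(fragment)
    return [r for r in recipient_ids if all((r >> s) & 1 == b for s, b in seen.items())]


master = "We need a big push and a quick start, so please help."
copy_for_10 = fingerprint(master, 10)
print(copy_for_10)
print(candidates("a fast start, so please assist", range(16)))
```

A short fragment only exposes some of the slots, so it usually yields a shortlist rather than a single recipient. More slots, or several fragments, shrink the list.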
I did one of these many years ago, basically just abusing lex/flex: <a href="https://github.com/countrygeek/stegparty/blob/master/stegparty.txt" rel="nofollow">https://github.com/countrygeek/stegparty/blob/master/stegpar...</a>
This is similar to steganos (<a href="https://github.com/fastforwardlabs/steganos" rel="nofollow">https://github.com/fastforwardlabs/steganos</a>), which tries to limit itself to changes that do not change the meaning of the text.