I created a Python library and CLI that automatically identifies and removes personal information from text documents using Natural Language Processing. It has been used to de-identify internal employee surveys and patient satisfaction surveys.

What my project does:

* Identifies and replaces person names using spaCy's transformer model
* Converts gender-specific pronouns to neutral alternatives
* Handles possessives and hyphenated names
* Offers HTML output with color-coded replacements

Here's a quick example:

    Input:  John Smith's report was excellent. He clearly understands the topic.
    Output: [PERSON]'s report was excellent. HE/SHE clearly understands the topic.

This was a fun project to work on, especially solving the challenge of maintaining correct character positions during replacements. Processing the matches backwards, from the end of the text toward the start, was a neat solution: it avoids recalculating positions after each replacement.

* blog post: https://gitgist.com/posts/introducing-deidentification-python-module/
* github: https://github.com/jftuga/deidentification
* PyPI: https://pypi.org/project/text-deidentification
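
To illustrate the backwards-processing idea, here is a minimal sketch in plain Python. It is not the library's actual code: the `apply_replacements` helper and the `(start, end, replacement)` span shape are hypothetical, and the spans are hard-coded here rather than produced by spaCy. The point is only that applying edits from the highest offset down leaves the earlier offsets valid.

```python
from typing import List, Tuple

# Hypothetical span shape for this sketch: (start offset, end offset, replacement).
Span = Tuple[int, int, str]


def apply_replacements(text: str, spans: List[Span]) -> str:
    """Apply non-overlapping replacements from the end of the text backwards."""
    # Sorting by start offset in descending order means every edit happens
    # to the right of the spans still waiting, so their offsets never shift.
    for start, end, replacement in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


text = "John Smith's report was excellent. He clearly understands the topic."

# Offsets are looked up here for the demo; a real pipeline would take them
# from an NER model's entity spans (an assumption, not shown).
name_start = text.index("John Smith")
pronoun_start = text.index("He ")
spans = [
    (name_start, name_start + len("John Smith"), "[PERSON]"),
    (pronoun_start, pronoun_start + len("He"), "HE/SHE"),
]

result = apply_replacements(text, spans)
print(result)
```

Going front to back instead would require adding the length difference of each replacement to every later offset, which is the bookkeeping the backwards pass avoids.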

Show HN: Deidentification, Python tool for removing personal info using NLP

no comments