Show HN: Deidentification, Python tool for removing personal info using NLP

1 point by jftuga, 4 months ago
I created a Python library and CLI to automatically identify and remove personal information from text documents using Natural Language Processing. It has been used to de-identify internal employee surveys and patient satisfaction surveys.

What my project does:

* Identifies and replaces person names using spaCy's transformer model
* Converts gender-specific pronouns to neutral alternatives
* Handles possessives and hyphenated names
* Offers HTML output with color-coded replacements

___

Here's a quick example:

    Input:  John Smith's report was excellent. He clearly understands the topic.
    Output: [PERSON]'s report was excellent. HE/SHE clearly understands the topic.

___

This was a fun project to work on, especially solving the challenge of maintaining correct character positions during replacements. The backwards processing approach was a neat solution to avoid recalculating positions after each replacement.

* blog post: https://gitgist.com/posts/introducing-deidentification-python-module/
* github: https://github.com/jftuga/deidentification
* PyPI: https://pypi.org/project/text-deidentification
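
For readers curious about the backwards processing idea mentioned above, here is a minimal sketch of the general technique. It is not the library's actual code: `replace_spans` and `find_spans` are hypothetical names, and the regex-based `find_spans` is only a stand-in for the NLP step so that the example runs without spaCy. The point is that applying edits from the last span to the first keeps every earlier offset valid, even when a replacement changes the length of the text.

```python
import re


def replace_spans(text, spans):
    """Apply (start, end, replacement) edits from the end of the text backwards.

    Editing the rightmost span first means the offsets of all spans to its
    left are still correct, so nothing has to be recalculated.
    """
    for start, end, replacement in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


def find_spans(text):
    """Hypothetical stand-in for entity and pronoun detection (assumption:
    a real tool would get these offsets from an NLP model, not regexes)."""
    spans = []
    for m in re.finditer(r"John Smith", text):
        spans.append((m.start(), m.end(), "[PERSON]"))
    for m in re.finditer(r"\b(He|She)\b", text):
        spans.append((m.start(), m.end(), "HE/SHE"))
    return spans


sample = "John Smith's report was excellent. He clearly understands the topic."
result = replace_spans(sample, find_spans(sample))
print(result)
```

If the same spans were applied front to back, replacing the 10-character name with the 8-character `[PERSON]` would shift every later offset by two, and the pronoun span would land on the wrong characters unless each position were adjusted after every edit.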

no comments
