
Show HN: Deidentification, Python tool for removing personal info using NLP

1 point by jftuga 4 months ago
I created a Python library and CLI to automatically identify and remove personal information from text documents using Natural Language Processing. It has been used to de-identify internal employee surveys and patient satisfaction surveys.

What my project does:

* Identifies and replaces person names using spaCy's transformer model
* Converts gender-specific pronouns to neutral alternatives
* Handles possessives and hyphenated names
* Offers HTML output with color-coded replacements

Here's a quick example:

    Input:  John Smith's report was excellent. He clearly understands the topic.
    Output: [PERSON]'s report was excellent. HE/SHE clearly understands the topic.

This was a fun project to work on - especially solving the challenge of maintaining correct character positions during replacements. The backwards processing approach was a neat solution to avoid recalculating positions after each replacement.

* blog post: https://gitgist.com/posts/introducing-deidentification-python-module/
* github: https://github.com/jftuga/deidentification
* PyPI: https://pypi.org/project/text-deidentification
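For readers curious about the backwards processing idea mentioned above, here is a minimal sketch of the general technique. It is not the library's actual code: the function name `replace_spans_backwards` and the hand-written spans are assumptions for illustration, standing in for the character offsets an NLP model would report.

```python
from typing import List, Tuple

# (start, end, replacement) character spans. Hypothetical type for this sketch;
# in a real pipeline the offsets would come from the NLP model's entity output.
Span = Tuple[int, int, str]


def replace_spans_backwards(text: str, spans: List[Span]) -> str:
    """Apply replacements from the end of the text toward the start."""
    # Processing spans in descending start order means each edit only changes
    # text to the right of the spans still waiting to be applied, so their
    # offsets remain valid and never need to be recalculated.
    for start, end, replacement in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


text = "John Smith's report was excellent. He clearly understands the topic."

# Offsets are located with str.index here only to keep the sketch self-contained.
name_start = text.index("John Smith")
pronoun_start = text.index("He clearly")

spans = [
    (name_start, name_start + len("John Smith"), "[PERSON]"),
    (pronoun_start, pronoun_start + len("He"), "HE/SHE"),
]

result = replace_spans_backwards(text, spans)
print(result)
```

Applying the same spans front to back would shift every later offset by the length difference of each earlier replacement, which is the bookkeeping the reverse order avoids.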

No comments yet
