Show HN: Fast HTML Sanitization in Python

1 point by ColdHeat over 4 years ago

1 comment

ColdHeat over 4 years ago
Author here.

Recently I was looking for a way to sanitize user-generated HTML and strip out malicious content such as JavaScript.

Solutions like bleach, html_sanitizer, and lxml's Cleaner all work, but I found their performance on complicated HTML snippets lacking because they need to rely on html5lib to parse HTML5. Without html5lib, completely normal content would get mangled.

I ended up writing these Python bindings to the bluemonday library[1]. They seem to perform much better than existing Python solutions for the same problem[2]. I suspect this is because more of the work can be done in native code instead of having to pass an XML tree around.

I'm hoping this is useful to someone else, and I'm also looking for feedback, especially on how the bindings were written.

[1] https://github.com/microcosm-cc/bluemonday
[2] https://github.com/ColdHeat/pybluemonday#performance
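For readers unfamiliar with the problem, here is a rough sketch of what allowlist-based HTML sanitization means. It uses only Python's standard-library `html.parser` and is not how bluemonday or the bindings work internally. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SAFE_SCHEMES`, `AllowlistSanitizer`, and `sanitize` are all made up for illustration, and the policy is far too small for production use.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately tiny policy (an assumption for this sketch).
ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}  # drop these tags and everything inside them


class AllowlistSanitizer(HTMLParser):
    """Re-emits only allowlisted tags and attributes; escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops event handlers such as onclick
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript: and other unlisted URL schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <script>alert(1)</script><a href="javascript:alert(1)">link</a></p>'
print(sanitize(dirty))
```

A pure-Python pass like this walks every tag through interpreter-level callbacks, which is the kind of overhead the post suggests moving into native code. For the bindings' actual API and policies, see the README linked in [2].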