TechEcho

Ask HN: Regex on a File or Stream

1 point by buzzdenver about 1 year ago
I just ran into a seemingly simple problem: matching a multi-line regex against a 3 GB text file. What is the right tool for this? Both grep and perl failed after running into PCRE limits.

4 comments

jepler about 1 year ago
Maybe some other PCRE-compatible implementation offers streaming. For instance, https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-hyperscan.html says it has this feature, but of course given who it's from it may be tied to a single brand of CPU.

GitHub seems to be https://github.com/intel/hyperscan
zaktoo2 about 1 year ago
Could you paste the regex portion of it please? Possibly some efficiencies to be gained there. You could also split the file into smaller chunks and then check the boundaries of the chunks.
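To make the chunk-and-check-boundaries idea concrete, here is a minimal Python sketch. It is not from the thread: the function name `scan_in_chunks` and its parameters are hypothetical, and it assumes that no single match is longer than `overlap` bytes. It holds back the tail of each window so a match that straddles a chunk boundary is found on the next pass instead of being cut in half. Anchors and lookbehinds evaluated at the start of a window see no earlier context, so patterns that rely on them may need extra care.

```python
import io
import re
from typing import BinaryIO, Iterator, Tuple


def scan_in_chunks(
    stream: BinaryIO,
    pattern: bytes,
    chunk_size: int = 64 * 1024 * 1024,
    overlap: int = 1024 * 1024,
) -> Iterator[Tuple[int, bytes]]:
    """Yield (absolute_offset, matched_bytes) for each match in the stream.

    Assumption (not from the thread): no match is longer than `overlap` bytes.
    """
    regex = re.compile(pattern)
    buf = b""
    base = 0  # absolute offset of buf[0] within the stream
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buf += chunk
        # Matches starting in the last `overlap` bytes are deferred to the
        # next pass, because the data needed to complete them may be missing.
        cutoff = len(buf) if at_eof else max(len(buf) - overlap, 0)
        keep_from = cutoff
        for m in regex.finditer(buf):
            if m.start() >= cutoff:
                break
            yield base + m.start(), m.group(0)
            # Never rescan bytes that an already reported match consumed.
            keep_from = max(keep_from, m.end())
        if at_eof:
            return
        base += keep_from
        buf = buf[keep_from:]


if __name__ == "__main__":
    sample = b"x" * 10 + b"BEGIN\nmid\nEND" + b"y" * 10 + b"BEGIN\nEND"
    # Tiny sizes only to force matches across chunk boundaries in the demo.
    for offset, text in scan_in_chunks(
        io.BytesIO(sample), rb"(?s)BEGIN.*?END", chunk_size=8, overlap=16
    ):
        print(offset, text)
```

Memory use is bounded by roughly `chunk_size + overlap` as long as the length assumption holds; an unbounded pattern such as a greedy `.*` under dotall mode would break that assumption.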
cvalka about 1 year ago
https://github.com/VirusTotal/yara
burntsushi about 1 year ago
ripgrep should be able to handle it with the -U/--multiline flag.
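For anyone scripting this, a small Python wrapper around that invocation might look like the sketch below. The helper names `ripgrep_multiline_argv` and `run_ripgrep` are hypothetical, and it assumes an `rg` binary is on `PATH`. The optional `--multiline-dotall` flag lets `.` match newlines. As far as I understand ripgrep's documentation, multiline mode needs the whole file contiguous in memory (read onto the heap or memory-mapped), so whether it copes with a 3 GB file depends on the machine.

```python
import shutil
import subprocess
from typing import List


def ripgrep_multiline_argv(pattern: str, path: str, dotall: bool = False) -> List[str]:
    """Build an rg command line for a multi-line search (hypothetical helper)."""
    argv = ["rg", "--multiline"]  # long form of -U
    if dotall:
        argv.append("--multiline-dotall")  # let "." match line terminators
    # -e keeps patterns starting with "-" from being parsed as flags;
    # "--" does the same for unusual file names.
    argv += ["-e", pattern, "--", path]
    return argv


def run_ripgrep(pattern: str, path: str, dotall: bool = False) -> str:
    """Run rg and return its stdout; an empty string means no match."""
    if shutil.which("rg") is None:
        raise FileNotFoundError("ripgrep (rg) was not found on PATH")
    result = subprocess.run(
        ripgrep_multiline_argv(pattern, path, dotall),
        capture_output=True,
        text=True,
    )
    # rg exits with 0 on a match, 1 when nothing matched, 2 on error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```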