How to process 100gb tsv and XML files?

2 points by anindha 9 months ago
I am trying to parse a music data file that is close to 100 GB. What app or programming language is best for handling a file like this?
Thanks!

6 comments

FlyingAvatar 9 months ago
It really depends on what you need to do with the data, but in most cases Python can handle this fairly easily with csv.reader (using a \t delimiter for TSV) or xml.etree.ElementTree.iterparse (for XML), streaming the file so that you're not loading it all into memory at once.
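A minimal sketch of the streaming approach described in this comment, using only the standard library. The function names, the `tag` parameter, and the choice of `QUOTE_NONE` (treating quote characters as ordinary data) are assumptions for illustration, not details from the original post; the real file's layout and encoding may call for different settings.

```python
import csv
import xml.etree.ElementTree as ET


def stream_tsv(path):
    """Yield one row (a list of strings) at a time from a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as f:
        # Assumption: fields are not quoted, so quote characters are plain data.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


def stream_xml(path, tag):
    """Yield a dict of child-tag -> text for every <tag> element in the file.

    `tag` is a hypothetical record element name; adjust it to the real schema.
    """
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            # Detach already-processed elements from the root so memory
            # use stays roughly constant instead of growing with the file.
            root.clear()
```

Both functions are generators, so a caller can loop over a very large file (for example `for row in stream_tsv("data.tsv"): ...`) while holding only a small part of it in memory at a time.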
pradeepchhetri 9 months ago
You can leverage ClickHouse to process your music data. ClickHouse supports both TSV[0] and XML[1] data formats.
[0] https://clickhouse.com/docs/en/interfaces/formats#tabseparated
[1] https://clickhouse.com/docs/en/interfaces/formats#xml
mobilio 9 months ago
A great solution is DuckDB: https://duckdb.org/docs/data/csv/overview.html
datadrivenangel 9 months ago
What kind of single music data file is 100 GB?
Also, how is it structured? If it's actually a tab-separated values file, consider using something like polars or DuckDB.
anindha 9 months ago
I found this: https://klogg.filimonov.dev
abdusco 9 months ago
For TSV, you might want to consider importing it into a SQLite database, then querying it however you please.
https://stackoverflow.com/a/35454070/5298150
You can also use datasette and sqlite-utils for it:
https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-csv-or-tsv-data
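As a rough illustration of the SQLite route suggested in this comment, here is a sketch that streams a TSV file into a table in batches using Python's standard `sqlite3` and `csv` modules. The function name, the default table name `tracks`, the batch size, and the assumptions that the first row is a header and that every column can be stored as TEXT are all hypothetical; the linked tools handle far more edge cases.

```python
import csv
import sqlite3


def load_tsv_into_sqlite(tsv_path, db_path, table="tracks", batch_size=50_000):
    """Stream a TSV file (with a header row) into a SQLite table in batches."""
    conn = sqlite3.connect(db_path)
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # Assumption: fields are not quoted, so quote characters are plain data.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        # Assumption: column names contain no double quotes; all columns are TEXT.
        cols = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
        batch = []
        for row in reader:
            if len(row) != len(header):
                continue  # assumption: malformed rows can simply be skipped
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(insert_sql, batch)
                conn.commit()
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany(insert_sql, batch)
            conn.commit()
    return conn
```

Once loaded, the data can be queried with ordinary SQL, for example `conn.execute('SELECT COUNT(*) FROM tracks').fetchone()`. Inserting in batches keeps memory use bounded regardless of file size; adding indexes after the load, rather than before, is usually faster.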