DataChain: A Pythonic data-frame library for artificial intelligence

2 points by mayop100 9 months ago

1 comment

dmpetrov 9 months ago
Hey! I'm one of the creators of DataChain.

DataChain works on your local machine and manages files in storage (like images and PDFs in S3 or GCP). Users can slice and dice their files using metadata. Example:

- Download only files labeled "Cats" instead of the whole dataset. Use JSON/Parquet to get labels.

- Use LLMs to generate metadata. E.g., "Are there more than 3 people in the image?".

- Add custom metadata to create a rich "DataFrame" of your files.

The API of the data-frame is based on Python (Pydantic), but queries on Python objects are transpiled to database queries (SQLite). Or you can just convert all metadata into Pandas if you prefer.

WDYT? I’d love to hear your thoughts!
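
To make the idea of filtering files by metadata in SQLite concrete, here is a minimal sketch using only the Python standard library. It is not DataChain's API: `FileRecord`, `build_index`, `select_paths`, the `files` table, and the `s3://example-bucket/...` paths are all hypothetical names made up for illustration, and a plain dataclass stands in for a Pydantic model. It only shows how a label filter over per-file metadata can run as a SQL query, so that only the matching paths need to be downloaded.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class FileRecord:
    """Hypothetical per-file metadata row (not a DataChain type)."""
    path: str
    label: str
    num_people: int


def build_index(records):
    """Load file metadata into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT, label TEXT, num_people INTEGER)"
    )
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [astuple(r) for r in records],
    )
    return conn


def select_paths(conn, label, min_people=0):
    """Return paths whose label matches and that have at least min_people people."""
    rows = conn.execute(
        "SELECT path FROM files "
        "WHERE label = ? AND num_people >= ? ORDER BY path",
        (label, min_people),
    )
    return [path for (path,) in rows]


# Made-up example metadata; in practice labels might come from JSON/Parquet
# files or from an LLM answering a question about each image.
records = [
    FileRecord("s3://example-bucket/img/001.jpg", "Cats", 0),
    FileRecord("s3://example-bucket/img/002.jpg", "Dogs", 4),
    FileRecord("s3://example-bucket/img/003.jpg", "Cats", 5),
]
conn = build_index(records)
cat_paths = select_paths(conn, "Cats")
print(cat_paths)
```

The list returned by `select_paths` is what a downloader would consume, so the full dataset never has to be fetched. Calling `select_paths(conn, "Cats", min_people=4)` narrows the result further, roughly matching the "more than 3 people" question from the comment.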