科技回声

DataChain: A Pythonic data-frame library for artificial intelligence

2 points by mayop100, 9 months ago

1 comment

dmpetrov, 9 months ago

Hey! I'm one of the creators of DataChain.

DataChain works on your local machine and manages files in storage (like images and PDFs in S3 or GCP). Users can slice and dice their files using metadata. Example:

- Download only files labeled "Cats" instead of the whole dataset. Use JSON/Parquet to get labels.
- Use LLMs to generate metadata. E.g., "Are there more than 3 people in the image?".
- Add custom metadata to create a rich "DataFrame" of your files.

The API of the data-frame is based on Python (Pydantic), but queries on Python objects are transpiled to the database (SQLite). Or you can just convert all metadata into Pandas if you prefer.

WDYT? I'd love to hear your thoughts!
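
To make the "filter by metadata, then fetch only the matching files" idea concrete, here is a minimal sketch that uses only Python's standard `sqlite3` module. It is not DataChain's API: the `FileRecord` class, the `files` table, the `select_paths` helper, and the `s3://example-bucket/...` paths are all hypothetical names chosen for illustration. It only shows how a Python-level filter could be turned into a SQLite query over file metadata.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class FileRecord:
    """Hypothetical per-file metadata row (not a DataChain type)."""
    path: str
    label: str
    num_people: int


def build_index(records):
    """Load file metadata into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT, label TEXT, num_people INTEGER)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [astuple(r) for r in records],
    )
    return conn


def select_paths(conn, **equals):
    """Turn keyword filters such as label="Cats" into a SQL WHERE clause."""
    allowed = {"path", "label", "num_people"}
    unknown = set(equals) - allowed
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Column names come from the allow-list; values are bound as parameters.
    where = " AND ".join(f"{col} = ?" for col in equals) or "1"
    sql = f"SELECT path FROM files WHERE {where} ORDER BY path"
    return [row[0] for row in conn.execute(sql, tuple(equals.values()))]


# Illustrative metadata; in practice the labels might come from JSON/Parquet.
records = [
    FileRecord("s3://example-bucket/img/001.jpg", "Cats", 0),
    FileRecord("s3://example-bucket/img/002.jpg", "Dogs", 2),
    FileRecord("s3://example-bucket/img/003.jpg", "Cats", 4),
]
conn = build_index(records)

# Only these paths would need to be downloaded, not the whole dataset.
cat_paths = select_paths(conn, label="Cats")
print(cat_paths)
```

The filtering runs inside the database, so only the selected paths ever need to be downloaded from storage.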