TechEcho (科技回声)
Harvard Is Releasing a Free AI Training Dataset

75 points by ilamont 5 months ago

3 comments

nadis 5 months ago
"Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta’s Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to “level the playing field” by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly-refined and curated content repositories that normally only established tech giants have the resources to assemble."

^ This is pretty cool and interesting. The collaboration they're doing with Boston Public Library to make articles similarly accessible also sounds pretty exciting.
morgango 5 months ago
https://archive.is/xhJvc
asimpleusecase 5 months ago
https://hls.harvard.edu/today/harvards-library-innovation-lab-launches-initiative-to-use-public-domain-data-to-train-artificial-intelligence/

More color from Harvard.