3T Token Open Corpus for Language Model Pretraining

35 points by thatcherthorn over 1 year ago

4 comments

version_five over 1 year ago
This has some kind of stupid custom license with it that, from what I can tell, only lets you use models you train on it for "internal use" (or tries to; it's fair use, so whatever). It's getting really shitty to see everyone trying to control how people can use their "contributions". If it were for commercial reasons I'd understand, but it's all this silly "AI harms" garbage. Treat collaborators like adults and let them decide how they want to use ostensibly public domain stuff.
zwaps over 1 year ago
According to the license, AllenAI can just take over (all) the rights and ownership for any derivative works by revoking your usage license, which they can also do at will.

Reasonably speaking, nobody can use this dataset for anything of value. I really wonder who comes up with these "open-source" products with such licenses and why they even bother. I guess Marketing?
sunshadow over 1 year ago
Unfortunately the license makes this somewhat useless. Hope they realize that and change it.
ttt3ts over 1 year ago
Dumb license. Useless.