Torching the Modern-Day Library of Alexandria: The Tragedy of Google Books

310 points by jsomers, about 8 years ago

15 comments

gridit, about 8 years ago
I have been going down the rabbit hole of copyright, fair use, and the Google Books Settlement recently. This article is a great summary including a lot of the peripheral issues, but the "2003 law review article" linked in TFA is nigh unreadable to me, compared to the actual legal opinions and briefs[0].
They are a couple of fascinating documents. The Authors Guild seems gobsmacked by the final ruling, and so am I. Perhaps the SCOTUS was correct to turn down hearing the case, if only to let the issue settle a little more, but it really feels like it's likely to be overturned in the near future.
There are some interesting tidbits in the opinions: 1) In the definitive ruling, the judge decides that the harm done to the market for the books is negligible, or overcome by the transformative "purpose" of the usage ("purpose" is significant because most examples of fair use include some type of new creative "expression"). This is surprising to me. 2) Google Books is ruled fair use in part because the book descriptions (and snippets?) are metadata *describing* the books, information that should not be controlled by the authors.
[0] http://www.scotusblog.com/case-files/cases/authors-guild-v-google-inc/
ghaff, about 8 years ago
One of the interesting tidbits in the article is the discussion about the length of copyright terms. The common wisdom is that the current (too long IMO) terms are the result of lobbying by Disney and other media companies.
The article goes into how, in fact, this really came out of Europe and a fundamentally different perspective on the purpose of copyright than that of the US Constitution. Wikipedia also has what seems to be a pretty good discussion.[1]
So when people say that current copyright law goes way beyond "promote the progress of science and useful arts" they're absolutely right. But copyright law in continental Europe was much more focused on protecting the rights of authors.
[1] https://en.wikipedia.org/wiki/History_of_copyright_law
jkn, about 8 years ago
It's too bad the vision depicted at the beginning of the article (full texts potentially available in all libraries) didn't come true. But I feel that the public did get the most important benefit from the project: the ability to search these books. I've been researching a history of science subject recently and it's amazing how much information I could get from Google Books and nowhere else online. And where the snippets are not enough, I have the book title and author name, so I know where to look for the information in print.
hackuser, about 8 years ago
Imagine an intellectually curious but poor high schooler: They can't afford to buy journal articles and books; they have almost no option to access serious, quality information. How much potential is lost to this travesty?
We've fallen far, far short of the potential and dream of the Internet and the democratization of knowledge, and the state of things has become a norm; few even notice it or realize what they are missing.
The truly valuable knowledge, to a great extent, still is inaccessible to the vast majority of the world. It is in books and academic journals. As a simple example beyond Google Books, I was thinking the other day that Safari Books by itself contains much more valuable knowledge (and far less misinformation) on many technical issues than the rest of the Internet; I learn more about some topics in a few hours on Safari Books than in a year on the Internet.
Technically, books and journals easily could be made universally accessible, creating an explosion of knowledge and all the things knowledge enables and motivates - the Enlightenment, science, technology, democracy, liberty, prosperity, most of modern civilization, etc. Instead of being well-informed, most of humanity is left with the dregs, and instead of the Internet providing an explosion of knowledge it has created a plague of misinformation and propaganda. IMHO the lack of high quality knowledge also robs the public of the ability to discriminate between good and bad information: Most lack a model of what quality knowledge is, or even of the questions to ask (something encountered frequently in serious scholarship). Few even realize the vast gulf between the quality of generally available information and what is in the books and journals. (I'll add that the demise of bookstores means few even see or are aware that the books exist.) And even if they know, it's inaccessible.
Instead of embracing a technological revolution in the distribution of information - a turning point in the history of humanity - we have brought forward the model used for the old technology, with distribution as controlled and limited as the old medium of paper. For the most part, it seems like the same few people have the quality information, the professional scholars. Let's not forget and give up; it's too important.
userbinator, about 8 years ago
*In August 2010, Google put out a blog post announcing that there were 129,864,880 books in the world.*
That number actually sounds surprisingly low. In contrast, I wonder how many books the underground "bookz" scene has scanned so far. It's hard to find exact numbers, but from what I could find, LibGen contains approximately 3M books, so if Google is accurate, that's ~2.3% of all books ever published. No doubt there are other sites I'm unaware of, probably in other languages, which have also accumulated massive collections of ebooks; but the fact that there exist people who have, for free and on their own time and at risk of being sued for copyright infringement, voluntarily scanned and shared over 2.3% of all the world's books is somewhat amazing.
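The ~2.3% figure in the comment above can be checked with a quick calculation. This is a minimal sketch that uses only the two numbers quoted in the comment; the constant and function names are illustrative, and the LibGen count is the commenter's own rough estimate rather than a verified total.

```python
# Figures quoted in the comment; both are approximate.
GOOGLE_BOOK_COUNT = 129_864_880  # Google's August 2010 estimate of books in the world
LIBGEN_BOOK_COUNT = 3_000_000    # commenter's rough figure for LibGen ("approximately 3M")


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


libgen_share = share_percent(LIBGEN_BOOK_COUNT, GOOGLE_BOOK_COUNT)
print(f"LibGen holds about {libgen_share:.2f}% of Google's estimated total")
# prints: LibGen holds about 2.31% of Google's estimated total
```

The result rounds to 2.3%, which matches the comment as long as both inputs are taken at face value.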
zmmmmm, about 8 years ago
> Many of the objectors indeed thought that there would be some other way to get to the same outcome
I really feel like Google is a victim of their own engineering brilliance sometimes: the objectors really thought that because Google made this look easy, it *was* easy. They figured that if one company could just casually decide to do this, they could reliably expect that someone else, or maybe government or another legal avenue, would come along. The reality, of course, is that Google is special; nobody will do it now and even Google is losing its "specialness".
And further, because Google appeared to be doing it so easily, they all thought that Google profiting from it in some way was unfair. They didn't see it as reasonable that Google should be rewarded for the genuine investment of labor and intellectual property involved in pulling this off, precisely because Google didn't give the appearance that it was hard. If Google had given more of an appearance of struggling to achieve it, I'd bet the authors would have suddenly appreciated what Google was doing more and probably accepted the idea that it was fair for Google to profit from it in some way.
pmoriarty, about 8 years ago
*"...here we’ve done the work to make it real and we were about to give it to the world and now, instead, it’s 50 or 60 petabytes on disk, and the only people who can see it are half a dozen engineers on the project who happen to have access because they’re the ones responsible for locking it up.*
*"I asked someone who used to have that job, what would it take to make the books viewable in full to everybody? I wanted to know how hard it would have been to unlock them. What’s standing between us and a digital public library of 25 million volumes?*
*"You’d get in a lot of trouble, they said, but all you’d have to do, more or less, is write a single database query. You’d flip some access control bits from off to on. It might take a few minutes for the command to propagate."*
Now this would be an interesting leak to Wikileaks.
timonovici, about 8 years ago
Couldn't they have proposed a neutral party that would store and manage all the books? Just like the Book Rights Registry was going to handle most of the money. I suppose that Google didn't expect all that backlash. And, now that I think about it, that was out of the scope of the lawsuit as well... In the words of an American president: "Sad!"
tim333, about 8 years ago
Maybe Google could get around some of the issues by spinning the thing off as a nonprofit? It could always owe Google a few million for what they'd spent so far.
cryptarch, about 8 years ago
So it's between 50 and 60 petabytes of data?
I've been wondering how it would be possible for a disparate group of tech-oriented people to make a collection like that. It would only take 1,000 people with 6 terabytes of storage, which doesn't sound impossible to me.
The main issues I see are:
a) How to share access to the data without exposing yourself?
b) How to make the data discoverable and searchable?
c) How do you ensure survival of the data?
and optionally: d) How to deal with the freeloader problem?
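The storage estimate in the comment above is worth working through, since 1,000 people with 6 terabytes each pool only 6 petabytes, about a tenth of the 50 to 60 petabytes quoted in the article. The following is a rough sketch, assuming decimal units (1 PB = 1,000 TB); the function name and the `copies` parameter (a hypothetical replication factor related to question c) are illustrative and do not appear in the thread.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1,000 TB


def volunteers_needed(archive_pb: float, tb_per_volunteer: float, copies: int = 1) -> int:
    """Number of volunteers needed to hold `copies` full copies of the archive."""
    total_tb = archive_pb * TB_PER_PB * copies
    return math.ceil(total_tb / tb_per_volunteer)


# Capacity of the group proposed in the comment: 1,000 people x 6 TB.
pooled_pb = 1000 * 6 / TB_PER_PB      # 6.0 PB

# Volunteers needed at 6 TB each for the article's 50-60 PB range.
low = volunteers_needed(50, 6)        # 8334
high = volunteers_needed(60, 6)       # 10000

print(f"Pooled capacity: {pooled_pb} PB; volunteers needed: {low} to {high}")
```

Under these assumptions a single copy needs roughly 8,300 to 10,000 participants at 6 TB each, or about 50 to 60 TB per person with 1,000 participants. Keeping several copies for survival multiplies the count: `volunteers_needed(60, 6, copies=3)` gives 30,000.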
laughfactory, about 8 years ago
Somehow all 25 million books need to be freed. It seems like it would be a great thing for society if this somehow just ended up online.
It's the orphan works that need to be freed the most. Many good books have been orphaned and will never be reprinted or digitized because the initial publisher is gone, the author is hard to track down, etc.
idiot74, about 8 years ago
Well, Google has an amazing resource on its hands. Data mining, machine learning, etc.
sgt101, about 8 years ago
Did Google offer to scan and release as Creative Commons at any point? Seems like the least evil option to me.
rolleicord, about 8 years ago
That's a heck of a lot of training data :O
kartan, about 8 years ago
Copyright is broken.
* https://www.youtube.com/watch?v=tk862BbjWx4