The Tragedy of Google Books (2017)

503 points by lispybanana, 7 months ago

30 comments

pvg, 7 months ago
https://archive.is/rQ7Zb
philipkglass, 7 months ago
These Google scans are also available in the HathiTrust [1], an organization built from the big academic libraries that participated in early book digitization efforts. The HathiTrust is better about letting the public read books that have actually fallen into the public domain. I have found many books that are "snippet view" only on Google Books but freely visible on HathiTrust.

If you are a student or researcher at one of the participating HathiTrust institutions, you can also get access to scans of books that are still in copyright.

The one advantage Google Books still has is that its search tools are much faster and sometimes better, so it can be useful to search for phrases or topics on Google Books and then jump over to HathiTrust to read specific books surfaced by the search.

[1] https://www.hathitrust.org/
yonran, 7 months ago
> Dan Clancy, the Google engineering lead on the project who helped design the settlement, thinks that it was a particular brand of objector—not Google’s competitors but “sympathetic entities” you’d think would be in favor of it, like library enthusiasts, academic authors, and so on—that ultimately flipped the DOJ.

I was at Google in 2009 on a team adjacent to Dan Clancy when he was most excited about the Authors’ Guild negotiations to publish orphan works and create a portal to pay copyright holders who signed up, and I recall that one opponent he was frustrated with was Brewster Kahle of the Internet Archive, who filed a jealous amicus brief (https://docs.justia.com/cases/federal/district-courts/new-york/nysdce/1:2005cv08136/273913/291) complaining that the Authors’ Guild settlement would not grant him access to publishing orphan works too. In my opinion Kahle was wrong; the existence of one orphan works clearinghouse would have encouraged Congress to grant more libraries access instead of doing nothing, which is what actually happened in the 15 years since then. Instead of one company selling out-of-print but in-copyright books, or multiple organizations, no one is allowed to sell them today.

Since then, of course, Brewster Kahle launched an e-library of copyrighted books without legal authorization anyway, which will probably be the death of the current organization that runs the Internet Archive. Tragic all around.
caseysoftware, 7 months ago
I worked at the Library of Congress on their Digital Preservation Project, circa 2001-2003. The stated goal was to "digitize all of the Library's collections" and while most people think of books, I was in the Motion Picture Broadcast and Recorded Sound Division.

In our collection were Thomas Edison's first motion pictures, wire spool recordings from reporters at D-Day, and LPs of some of the greatest musicians of all time. And that was just our Division. Others - like American Heritage - had photos from the US Civil War and more.

Anyway, while the Rights information is one big, ugly tangled web, the other side is the hardware to read the formats. Much of the media is fragile and/or dangerous to use, so you have to be exceptionally careful. Then you have to document all the settings you used, because imagine that three months from now you learn some filter you used was wrong or the hardware was misconfigured. You need to go back and understand what was affected and how.

Cool space. I wish I'd worked there longer.
ErikAugust, 7 months ago
“Page had always wanted to digitize books. Way back in 1996, the student project that eventually became Google—a “crawler” that would ingest documents and rank them for relevance against a user’s query—was actually conceived as part of an effort “to develop the enabling technologies for a single, integrated and universal digital library.” The idea was that in the future, once all books were digitized, you’d be able to map the citations among them, see which books got cited the most, and use that data to give better search results to library patrons. But books still lived mostly on paper. Page and his research partner, Sergey Brin, developed their popularity-contest-by-citation idea using pages from the World Wide Web.”

Larry Page had some cool ideas… can’t imagine Books will ever be resurrected, unfortunately.
Zigurd, 7 months ago
O'Reilly, for whom I've been a lead author and co-author, did this: https://www.oreilly.com/pub/pr/1042

They call it Founder's Copyright. They also use Creative Commons. The goal is to make out-of-print books available at no cost.
svilen_dobrev, 7 months ago
This seems to be the fate of knowledge/content that stays in institutions which were built with the idea of collecting it and growing it, but have turned into walled gardens/crypts of sorts. Rot/rust and be forgotten.

A very cynical and dark view is that the new things/people need that oblivion in order to feel great, by not having to compare themselves with older, greater ones. Rewriting history as the current powers-that-be see fit is easier this way.

Or maybe it's just collective stupidity? Or societal immaturity?

(I am coming from a completely different killed project on a different continent, but the idea is the same.)
submeta, 7 months ago
With Library Genesis, who needs Google Books anymore? I buy books physically to support the author/s and download an epub version from said site to my Kindle. The physical books I hardly read; they are for my shelf. Although I love the feeling of printed books, I read in bed, and it’s easier to hold an ebook. I also read when I commute. It’s lighter to have my Kindle Oasis with me with tons of books on it.
thayne, 7 months ago
IMO if a work is out of print (or equivalent depending on the medium) for more than a few years, it should be released into the public domain. Or maybe something like the public domain, but requires attribution.
xipho, 7 months ago
A huge proportion of this corpus is found in the Hathi Trust (see https://www.hathitrust.org/the-collection/). We have had a grant to crawl and derive an index on it via their supercomputing resources. I'm sure they are looking to LLM proposals, though they are exceedingly careful about the copyright issues.

https://www.hathitrust.org/
boramalper, 7 months ago
Of course someone needs to scan/digitise those books, but for those which already are, there is Anna’s Archive.

https://en.wikipedia.org/wiki/Anna%27s_Archive
theendisney4, 7 months ago
Programmers, not lawmakers, really control what goes and doesn't go online.

BitTorrent, IPFS, etc. are nice, but things would be better if there was a large static archive with desktop clients exchanging chunks in a complex modular way.

Say: I have pages 1-15 of file 123456, you have page 16 but are looking for page 1 of doc 2345. If I can obtain that page, a fast exchange is possible. If not, a different module can issue an IOU that means either I owe something, you are owed something, or both. Other modules could create groups that aim to store part of the archive without duplication among members. Spam-driven modules could also be interesting.

The archive can be organized by how dubious the copyright is, so that one can limit participation to 50 or 100+ year old publications and/or living or dead authors.

It's not unlike living on a faraway island with the British Empire seeking to control every aspect of your life without sufficient means of force.
carlosjobim, 7 months ago
For Kagi users, I recommend putting books.google.com as a pinned domain. This way, you'll often be presented with some of the best sources for any search query. Then it's a matter of finding the ePub file of that book. To read on macOS, FBReader is a high-quality app.
Animats, 7 months ago
We need a Copyright Term Reduction Act.

It's time. 50 years, renewal is possible but expensive.
rekabis, 7 months ago
Let’s rewrite copyright law:

1. The author gets to say, “I produced this”, and to control if it gets published.

2. Exclusive copyright for 15 year terms.

3. Renewal possible if author still alive. Non-human rights holders (corporations, etc.) limited to 30 years total (one renewal) from date of first publication, regardless of item ownership. Failure to renew automatically opens up the product.

4. Existing copyright can be overridden if demand isn’t being adequately serviced (sliding scale, challenger must capture minimum % of existing market demand to prove). Pricing of overriding attempts must be reasonable, only cost of production can be directly paid for, everything else goes into an escrow account until the attempt is concluded. This is where anti-abuse rules for both sides are most extensive.

Information and knowledge must be free. Our civilization depends vitally upon that freedom.
senkora, 7 months ago
I’m sure the lawyers will eventually figure out a way to train an LLM on them.
einpoklum, 7 months ago
Written from a capitalist perspective, extolling "market forces" and legitimizing corporate and government limitations on copying.

"between 1923 and 1963 ... copyrights back then had to be renewed, and often the rightsholder wouldn’t bother filing the paperwork" - oh no, how terrible. How lucky we are that in these modern times one doesn't even have to file paperwork in order to prevent you from copying information.

And they go on to suck up to Google and decry how they didn't get to legitimize their control over a large swath of human knowledge and cultural heritage.

"It certainly seems unlikely that someone is going to spend political capital—especially today—trying to change the licensing regime for books, let alone old ones." <- copyright regime, licensing regime - all of this stuff is illegitimate a priori. Poetry, literature, music, software, papers and books - we cannot and must not tolerate restrictions on their dissemination.

Whatever arrangements the commercial and governmental entities come to, our "arrangement" should be that everything gets disseminated widely and without restriction, so that curtailment, censorship, commercial control etc. just fail.
shadytrees, 7 months ago
James Somers writes beautifully; https://www.newyorker.com/contributors/james-somers has some of his other writing.
mcepl, 7 months ago
> Copyright terms have been radically extended in this country largely to keep pace with Europe, where the standard has long been that copyrights last for the life of the author plus 50 years. But the European idea, “It’s based on natural law as opposed to positive law,” Lateef Mtima, a copyright scholar at Howard University Law School, said. “Their whole thought process is coming out of France and Hugo and those guys that like, you know, ‘My work is my enfant,’” he said, “and the state has absolutely no right to do anything with it—kind of a Lockean point of view.” As the world has flattened, copyright laws have converged, lest one country be at a disadvantage by freeing its intellectual products for exploitation by the others. And so the American idea of using copyright primarily as a vehicle, per the constitution, “to promote the Progress of Science and useful Arts,” not to protect authors, has eroded to the point where today we’ve locked up nearly every book published after 1923.

This is disingenuous: the article doesn’t mention that the biggest proponents of prolonging copyright terms were Americans (e.g., Walt Disney Corp and Jack Valenti, see “Mickey Mouse Protection Act” for more), not Europeans.
2OEH8eoCRo0, 7 months ago
The tragedy is that Google is tasked with this at all. It would be cool if public libraries could work together on a massive public digital library. This shouldn't be Google's responsibility.
mparnisari, 7 months ago
Would it not be a viable solution to let Google scan and sell books, but force them to give the profit from the sales to the government?
DrNosferatu, 7 months ago
I have never seen an explicit mention of whether or not the Google Books corpus was used for training LLMs…

Does anyone know more about it?
kbbgl87, 7 months ago
> “Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.”

Greeted with a paywall on the source. Hypocrisy...
tempfile, 7 months ago
> what happened with piano rolls, with records, with radio, and with cable—isn’t that copyright holders squash the new technology. Instead, they cut a deal and start making money from it.

> “History has shown that time and market forces often provide equilibrium in balancing interests,” Wu writes.

It is completely braindead to argue that market forces had anything to do with compulsory licensing. It is a matter determined by courts in the public interest.
anoncow, 7 months ago
Sad and criminal.
afh1, 7 months ago
Ironically behind a paywall (and below a political ad)
geniium, 7 months ago
TL;DR: bye bye Google
datadrivenangel, 7 months ago
Thanks Paul!
andrewstuart, 7 months ago
Google must be tempted to put them in an LLM.
renewiltord, 7 months ago
Good. It’s important that free access not be permitted. We don’t know what personal data might be contained within. We should only allow those works after a human (appropriately certified) has verified that no personal data exists within.

If it exists within, the book must be destroyed in its entirety. Too many works of so-called scholarship have relied on the personal letters of dead people.

We should not reward grave robbing. The most important thing is the personal data. We must protect the personal data.