In the USA, public access to federal court records is largely behind a paywall known as PACER. In 2008, Aaron Swartz downloaded somewhere between 1% and 25% of those records (~750GB) [1][2].<p>Where are those documents?<p>[1] https://en.wikipedia.org/wiki/Aaron_Swartz#PACER
[2] https://www.aaronswartzday.org/ny-times-pacer-project/
I don't remember whether Aaron's downloads were added to this database (so this isn't technically an answer to your question), but you can access a lot of PACER documents through the RECAP archive:<p><a href="https://www.courtlistener.com/recap/" rel="nofollow">https://www.courtlistener.com/recap/</a><p><a href="https://free.law/recap" rel="nofollow">https://free.law/recap</a>
I don't have a source for this, but I think I remember reading a long time ago that Swartz also downloaded and released a significant number of Westlaw legal documents.<p>A few years ago I was looking for state law databases and came across a blog post by a professor (at Stanford, I believe) discussing S3 buckets with state law databases. I couldn't figure out how to access them, so I emailed the professor and he gave me a link to a site called Public Resource Law or Free Public Law or something like that (the domain was public.law.resource or resource.law or similar). Anyway, it had huge zip files (< 100 GB) for all federal law and all law for all 50 states (opinions, statutes, rules, regulations, jury instructions, etc.). I played around with the data and found Westlaw editors' notes in several of the documents, which made me think this was part of Swartz's Westlaw data dump. The data dump was taken down roughly a year after I found it.<p>I think this data dump is also what courtlistener.com is built on, because courtlistener.com popped up soon after the data dump disappeared from that site.