Perhaps HN readers would appreciate a detailed account of what the NPD torrents contain.<p>The torrents deliver two files:<p><pre><code> NPD202401.7z 33,456,912,010 bytes (32GB)
NPD202402.7z 20,548,499,322 bytes (20GB)
</code></pre>
Uncompressing NPD202401.7z results in:<p><pre><code> ssn.txt 176,806,109,779 bytes (165GB)
wc -l ssn.txt ==>> 1,698,302,005 lines
</code></pre>
Uncompressing NPD202402.7z results in:<p><pre><code> ssn2.txt 120,722,361,611 bytes (113GB)
wc -l ssn2.txt ==>> 997,379,508 lines
</code></pre>
This is a total of 1698302005+997379508 = 2,695,681,513 lines.<p>Each line is a comma-separated record with these fields:<p>ID,firstname,lastname,middlename,name_suff,dob,address,city,county_name,st,zip,phone1,aka1fullname,aka2fullname,aka3fullname,StartDat,alt1DOB,alt2DOB,alt3DOB,ssn<p>Generally records have ID, firstname, lastname, middlename, address, city, county_name, st, zip, and ssn. Most records leave these fields empty: name_suff (name suffix), phone1, aka1fullname, aka2fullname, aka3fullname, StartDat, alt1DOB, alt2DOB, and alt3DOB.<p>There are no emails at all. There is no "@" anywhere in the files. Phone numbers are very rare.<p>I don't know what the ID number at the head of each line represents. I presume it is an internal index used by the organization that compiled the data. The SSN is at the end of each line.<p>The files have U.S. addresses only as far as I can tell. Nothing from Mexico, Canada, or other foreign countries.<p>Many of the lines (records) concern the same person at various addresses. I checked on 7 random people I personally know, and all had entries. There were between 3 and 20 lines (records) for each of these 7 persons, averaging about 10. The records for a given person usually differed only in the address field. Going by an estimate of 10 records per person, the 2.7 billion lines represent about 2695681513/10 = 269,568,151 distinct persons in the U.S.<p>The U.S. population is about 337M, of which about 78% is over 18 years of age. In other words, 337000000*0.78 = 262,860,000 Americans are adults. This is pretty close to my estimate of 269,568,151 distinct individuals in the NPD data files.<p>Of the 7 persons I checked on, the names were spelled correctly, although the middle name was sometimes just an initial. 
I searched for each person by multiple methods (address, last name, birth date) so I believe I would have detected names that were spelled slightly wrong.<p>The addresses appeared correct but there was no way to tell which was the current address or the order in which the person lived at each one. There is a StartDat field but it was almost never filled in. The latest entry was not always the most current address. In a couple cases, the current address, where the person has been living for several years, was absent.<p>The birth dates were correct in a couple cases, were abbreviated in three cases (that is, instead of showing 19800704, meaning July 4 1980, it showed 19800700, meaning July 1980 without an exact day), and were wrong for one person by a wide margin.<p>All 7 persons I checked had SSNs. I could confirm the SSN was correct for 1 person but I don't know about the other 6. The SSNs were consistent for each of the 7 persons I checked on. By this I mean that no person had more than 1 SSN, at least among these 7.
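<p>For anyone who wants to redo the back-of-the-envelope estimate above, here is a minimal Python sketch. It uses only the line counts and population figures quoted in this comment. The 10 records per person comes from a sample of just 7 people, so the result is very rough; the values 8 and 12 are hypothetical alternatives I picked only to show how sensitive the estimate is, and the function name is my own.

```python
# Line counts reported by `wc -l` for the two uncompressed files.
LINES_SSN_TXT = 1_698_302_005
LINES_SSN2_TXT = 997_379_508

# Population figures quoted in the comment (approximate).
US_POPULATION = 337_000_000
ADULT_PERCENT = 78


def estimate_persons(total_lines: int, records_per_person: int) -> int:
    """Rough count of distinct persons, assuming a fixed records-per-person average."""
    return total_lines // records_per_person


total_lines = LINES_SSN_TXT + LINES_SSN2_TXT
adults = US_POPULATION * ADULT_PERCENT // 100

print(f"total lines:          {total_lines:,}")
print(f"estimated U.S. adults: {adults:,}")

# 10 is the average observed in the 7-person sample; 8 and 12 are
# hypothetical values used only to show the spread of the estimate.
for rpp in (8, 10, 12):
    persons = estimate_persons(total_lines, rpp)
    print(f"{rpp:>2} records/person -> {persons:,} persons "
          f"({persons / adults:.2f}x the adult population)")
```

A small change in the assumed average moves the result by tens of millions, so the close match with the adult population should be read as a plausibility check rather than a measurement.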