Is it me or did Reddit just win?
I don’t think a full dataset of all posts and comments exists other than the 2018 one, and now that a large chunk of Reddit has gone dark, no one can scrape a new one together. How is this not all playing right into Reddit’s hands?
The whole of Reddit (posts and comments separately) from 2005-06 through 2022-12 is available via this torrent link [1], and it's very easy to download, extract, and use the data [2]. I'm writing my thesis on the connection between a Reddit post's type and its comment structure, and I've been working with this data for a few months. It's amazing.<p>[1] <a href="https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee" rel="nofollow noreferrer">https://academictorrents.com/details/7c0645c94321311bb05bd87...</a><p>[2] <a href="https://github.com/Watchful1/PushshiftDumps">https://github.com/Watchful1/PushshiftDumps</a>
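For anyone wondering what "using the data" might look like, here is a minimal sketch rather than the method from the linked repo. It assumes the comment dumps decode to newline-delimited JSON with one comment object per line, and that each comment carries `link_id` (the post it belongs to) and `parent_id` (equal to `link_id` for a top-level comment); check both assumptions against your own files. The function names and sample records are made up for illustration. As far as I know the files are zstd-compressed, so decompression needs a third-party package and is deliberately left out; the code only expects an iterable of already-decoded text lines.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per NDJSON line, skipping blank or unparseable lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate the occasional damaged line
        if isinstance(record, dict):
            yield record


def summarize_threads(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count total and top-level comments per post.

    Assumed fields (not verified here): "link_id" and "parent_id".
    A comment is treated as top-level when parent_id == link_id.
    """
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"total": 0, "top_level": 0}
    )
    for record in iter_records(lines):
        link_id = record.get("link_id")
        if not link_id:
            continue  # not a comment record, or the field is missing
        summary[link_id]["total"] += 1
        if record.get("parent_id") == link_id:
            summary[link_id]["top_level"] += 1
    return dict(summary)


# Hypothetical sample records, only to show the expected input shape.
SAMPLE_LINES = [
    '{"id": "c1", "link_id": "t3_a", "parent_id": "t3_a", "body": "first"}',
    '{"id": "c2", "link_id": "t3_a", "parent_id": "t1_c1", "body": "reply"}',
    '{"id": "c3", "link_id": "t3_b", "parent_id": "t3_b", "body": "other"}',
    "",
    "not json",
]

if __name__ == "__main__":
    for post_id, counts in summarize_threads(SAMPLE_LINES).items():
        print(post_id, counts)
```

Because the function takes any iterable of lines, a decompressed file opened in text mode can be passed in directly and is read one line at a time, which matters given how large the monthly dumps are.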
It is so interesting to consider that Reddit, the unprofitable startup that everyone loves to mock, managed to end up with the most valuable dataset in all of social media. I am certain that despite Facebook being 10 times its size, its comments are no match for the information contained in Reddit's comments. Why did this happen? Greed, I guess. Popular doesn't mean valuable.<p>In any case, I am rooting for Reddit to win big, but I don't see them having a plan. Their website is stuck in '00s norms while the world has moved forward. Now a clique of moderators has taken over the site, and Reddit doesn't seem to be doing anything about it. So many lost opportunities.
You're right. Call me a pessimistic tinfoil hat, but there is a good chance that Reddit Inc. is playing both sides here. This is a tactic many authoritarians have employed in recent times, one I'd like to call "giving your enemy that extra rope to hang itself".<p>It's no wonder that public sympathy is shifting strongly towards spez and Reddit Inc. after all the major subs suddenly went dark. The concept of an "indefinite blackout" was problematic to begin with. Reddit blackouts had happened before, when net neutrality was in danger or freedom-curbing laws were being passed, but they used to last just a day or two to garner attention.<p>The impression netizens are getting right now is that these "rogue mods" have hijacked the subreddits and disappeared, bringing the whole conversation and ecosystem to a standstill. How exactly is this perception not working in favor of Reddit and spez? As I said, <i>giving your enemy the extra rope to hang itself!</i>
IIRC, the Archive Team asked for help on /r/DataHoarder and had saved 10 billion posts to upload to the Internet Archive later, so that might yield something. I personally had never heard of them before, but that doesn't mean anything.<p><a href="https://wiki.archiveteam.org/index.php/Reddit#Project_details" rel="nofollow noreferrer">https://wiki.archiveteam.org/index.php/Reddit#Project_detail...</a>