TechEcho
Bluesky Social Dataset (235M posts from 4M users)

91 points by 7d7n 6 months ago

10 comments

infotainment 6 months ago
I’m glad to see a new platform that isn’t completely locked down, allowing analysis like this.

The trend toward everything being a walled garden is unfortunate.
ks2048 6 months ago
Associated paper: https://arxiv.org/abs/2404.18984
gusfoo 6 months ago
Meanwhile, over on Bluesky (bsky.app) a few days ago, users were incensed about a 1M-post dataset and hounded its creator into withdrawing it.

https://bsky.app/profile/danielvanstrien.bsky.social/post/3lbu6l4fxdc2e
skybrian 6 months ago
The paper is from the end of April and they say the data was collected in February, March and April. I guess we can talk about it now, though.

Due to high growth since then, this is from before most current users joined.
abahlo 6 months ago
If you just want to play around with the data, check out the bsky dataset on Axiom: https://play.axiom.co/axiom-play-qf1k/stream/bsky (700M+ events and counting)
zft 6 months ago
If you are interested in real-time visualization, there are plenty of projects, for example http://www.graphtracks.com. (I'm the author)
raidicy 6 months ago
I came back to this website to try to get the files, and they have now been put under restricted access for some reason.
7d7n 6 months ago
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.

The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions and time of bookmarking.

This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and performing content virality and diffusion analysis.
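A back-of-the-envelope reading of the figures quoted in the abstract above, as a minimal sketch: the three constants are the rounded numbers from the text ("over 4M" is a lower bound), and the function names are made up for illustration. The derived values are estimates, not numbers reported by the authors.

```python
# Rounded figures quoted in the abstract. "Over 4M" is a lower bound, so
# everything derived from these constants is approximate.
USERS_IN_DATASET = 4_000_000   # "over 4M users"
COVERAGE = 0.81                # "81% of all registered accounts"
TOTAL_POSTS = 235_000_000      # "235M posts"


def implied_registered_accounts(users: int, coverage: float) -> float:
    """Estimate total registered accounts from the covered share."""
    return users / coverage


def mean_posts_per_user(posts: int, users: int) -> float:
    """Average number of posts per user included in the dataset."""
    return posts / users


if __name__ == "__main__":
    accounts = implied_registered_accounts(USERS_IN_DATASET, COVERAGE)
    average = mean_posts_per_user(TOTAL_POSTS, USERS_IN_DATASET)
    print(f"Implied registered accounts: {accounts:,.0f}")
    print(f"Mean posts per covered user: {average:.2f}")
```

Under these assumptions the numbers work out to roughly 4.9M registered accounts at collection time and about 59 posts per covered user on average.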
aubanel 6 months ago
Please upload it on the Hugging Face Hub!
aussieguy1234 6 months ago
Sounds like this could be used to train an open source LLM.