Bluesky Social Dataset (235M posts from 4M users)

91 points by 7d7n, 6 months ago

10 comments

infotainment, 6 months ago
I'm glad to see a new platform that isn't completely locked down, allowing analysis like this.

The trend toward everything being a walled garden is unfortunate.
ks2048, 6 months ago
Associated paper: https://arxiv.org/abs/2404.18984
gusfoo, 6 months ago
Meanwhile, over on Bluesky a few days ago, users were incensed about a 1M-post dataset and hounded its creator into withdrawing it.

https://bsky.app/profile/danielvanstrien.bsky.social/post/3lbu6l4fxdc2e
skybrian, 6 months ago
The paper is from the end of April, and they say the data was collected in February, March, and April. I guess we can talk about it now, though.

Because of the platform's rapid growth since then, this data predates the arrival of most current users.
abahlo, 6 months ago
If you just want to play around with the data, check out the bsky dataset on Axiom: https://play.axiom.co/axiom-play-qf1k/stream/bsky (700M+ events and counting)
zft, 6 months ago
If you are interested in real-time visualization, there are plenty of projects, for example http://www.graphtracks.com. (I'm the author.)
raidicy, 6 months ago
I came back to this website to try to get the files, and they have now been put under restricted access for some reason.
7d7n, 6 months ago
Pollution of online social spaces caused by rampaging dis/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.

The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped "like" interactions and time of bookmarking.

This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and for performing content virality and diffusion analysis.
aubanel, 6 months ago
Please upload it on the Hugging Face Hub!
aussieguy1234, 6 months ago
Sounds like this could be used to train an open-source LLM.