Botrnot: an R package for detecting Twitter bots

57 points by tysonzni about 7 years ago

19 comments

dexen about 7 years ago
I was surprised by the number of false positives reported here, went ahead and tested on several Twitter accounts of my friends, both professional and personal. 9 out of 25 tested were classified as 'bot' with probability > 0.6. Only 11 were classified as 'humans' with probability > 0.7. And that's on 25 accounts of people I know personally.

Given the preposterous error rate, I deem there is no actual classification logic in R, and instead it uses the (very fallible) humans to do the actual classification via a Mechanical Turk-style API.

Wait, do I need to add "/s" to this post, or is it obvious enough?
nl about 7 years ago
I've done some work in this area - it's disappointing how terribly the Twitter product has failed to evolve to take account of bot usage.

There are (of course) some useful bots, but lots of incredibly harmful bots, and they should be treated differently to actual humans.

But Twitter can't ship product, so it's not really worth suggesting what they should do.

In the meantime, my colleagues and I got a nice WWW18 conference paper about a new unsupervised (!) way of detecting some type of bots on Twitter. Like most things it's completely obvious in retrospect...
dsacco about 7 years ago
I looked at the GitHub README for the project, which says

> Uses machine learning to classify Twitter accounts as bots or not bots. The default model is 93.53% accurate when classifying bots and 95.32% accurate when classifying non-bots. The fast model is 91.78% accurate when classifying bots and 92.61% accurate when classifying non-bots.
>
> Overall, the default model is correct 93.8% of the time.
>
> Overall, the fast model is correct 91.9% of the time.

How is this accuracy determined? There is no information available explaining how this determination is quantified, nor what the caveats are.
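
One way to probe the quoted README figures is to read the two per-class numbers as accuracy on bots and accuracy on non-bots measured on a single evaluation set. Overall accuracy is then their weighted average, with the weight given by the share of bots in that set. The sketch below (in Python rather than R) backs out the share of bots that the quoted numbers would imply, then shows how the chance that a flagged account really is a bot changes with the base rate. That reading of the figures is an assumption, as are all function names and the example base rates; the quoted text does not say how the numbers were computed.

```python
def implied_bot_share(acc_bots, acc_humans, acc_overall):
    """Solve acc_overall = p * acc_bots + (1 - p) * acc_humans for p."""
    if acc_bots == acc_humans:
        raise ValueError("per-class accuracies are equal; share is not identifiable")
    return (acc_humans - acc_overall) / (acc_humans - acc_bots)


def precision(acc_bots, acc_humans, bot_share):
    """Probability that an account flagged as a bot really is one (Bayes' rule)."""
    true_pos = bot_share * acc_bots
    false_pos = (1.0 - bot_share) * (1.0 - acc_humans)
    return true_pos / (true_pos + false_pos)


# Default-model figures as quoted from the README in the comment above.
DEFAULT = dict(acc_bots=0.9353, acc_humans=0.9532)

if __name__ == "__main__":
    share = implied_bot_share(acc_overall=0.938, **DEFAULT)
    print(f"implied share of bots in the evaluation set: {share:.2f}")
    for base_rate in (0.5, 0.1, 0.01):  # hypothetical base rates, not from the source
        p = precision(bot_share=base_rate, **DEFAULT)
        print(f"base rate {base_rate:.2f} -> precision {p:.2f}")
```

Under these assumptions, the quoted overall figure is only consistent with an evaluation set made up mostly of bots (roughly 85%, give or take rounding in the published numbers), and precision falls quickly once bots are rare among the accounts being checked.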
minimaxir about 7 years ago
Per the README to the corresponding repo:

> The default [gradient boosted] model uses both users-level (bio, location, number of followers and friends, etc.) and tweets-level (number of hashtags, mentions, capital letters, etc. in a user's most recent 100 tweets) data to estimate the probability that users are bots.

Not an exact science, but shows what you can do and deploy quickly with R/Shiny.

The author’s rtweet package is very good for making quick Twitter data visualizations.
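
To make the quoted description concrete, here is a minimal sketch of what user-level and tweet-level features of that kind might look like when flattened into one row for a classifier. It is written in Python rather than R, and every name in it (`tweet_features`, `user_features`, `feature_row`, the feature keys) is hypothetical. The quote only lists the categories of data, so the exact features and how the package computes them are assumptions, and the gradient boosted model itself is not shown.

```python
import re
from statistics import mean

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


def tweet_features(tweets):
    """Average per-tweet counts over a user's most recent tweets (at most 100)."""
    recent = tweets[:100]
    if not recent:
        raise ValueError("need at least one tweet")
    return {
        "hashtags": mean(len(HASHTAG.findall(t)) for t in recent),
        "mentions": mean(len(MENTION.findall(t)) for t in recent),
        "capitals": mean(sum(ch.isupper() for ch in t) for t in recent),
    }


def user_features(bio, location, followers, friends):
    """Simple numeric summaries of profile fields (illustrative choices only)."""
    return {
        "bio_length": len(bio),
        "has_location": int(bool(location.strip())),
        "followers": followers,
        "friends": friends,
    }


def feature_row(user, tweets):
    """Flatten both feature groups into a single row for a classifier."""
    return {**user_features(**user), **tweet_features(tweets)}


if __name__ == "__main__":
    user = {"bio": "I tweet deals", "location": " ", "followers": 10, "friends": 2000}
    tweets = ["BUY NOW #deal #sale @shop", "hello world"]
    print(feature_row(user, tweets))
```

A row like this would be one training or prediction input. Anything driven by surface features such as retweet-heavy or hashtag-heavy timelines can plausibly flag active human accounts too, which fits the false positives reported in this thread.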
andrew-lucker about 7 years ago
I tried putting in verified users and they were all "probably bots". By definition is that not the only type of user publicly acknowledged as "not a bot"?
peatmoss about 7 years ago
While the quality of the model can be debated (I noted lots of false positives too), I do note that it’s kind of cool that we’re all sitting around and poking at an app written in an R web framework.

If you haven’t:

1. Downloaded RStudio IDE
2. Built a hello world Shiny App (better still, for a flavor of the thing, a hello world app using the shiny dashboard package)
3. Deployed your app to shinyapps.io

I highly encourage you to do so, if for no other reason than to see how streamlined RStudio has managed to make web app deployment for people who often don’t have much of a programming background.

I’m continually impressed with the work RStudio does, even if I’m a curmudgeon and still write all my code in Emacs instead of their IDE. If RStudio expanded to support Python similarly well, I imagine they could really be the place most data scientists work.
prateek_mir about 7 years ago
It is classifying me as a bot with 94.6% probability. Does it put too much emphasis on retweets?
derrasterpunkt about 7 years ago
I recently watched a talk from 34c3 (the Chaos Computer Club conference), which was held at the end of last year, about Twitter bots, their existence and their detection. The speaker couldn't find many of the bots that were cited in studies and found that the studies' methodology was somewhat arbitrary.

Definitely worth a watch: https://media.ccc.de/v/34c3-9268-social_bots_fake_news_und_filterblasen (The video is in German but there should be a translated version of it on the site)
ehudla about 7 years ago
The classification is based on an R package for Generalized Boosted Regression Models[1]. Can anyone knowledgeable opine about this choice?

[1] https://cran.r-project.org/web/packages/gbm/
betolink about 7 years ago
I got .6 probability, that's pretty high and last time I checked I could fool the Turing test.
mcintyre1994 about 7 years ago
I tried a few things, it seemed to be working well but now I keep getting "An error has occurred. Check your logs or contact the app author for clarification."
plcancel about 7 years ago
@jack, 0.693?
tysonzni about 7 years ago
Developer: Michael W. Kearney

Link to GitHub: https://github.com/mkearney/botrnot
amelius about 7 years ago
What use is this really if bot creators can incorporate this tool, and adjust their tweets until they pass the test?
glangdale about 7 years ago
Welp, apparently I'm probably a bot. Time to go into the bathroom and cut my arm to verify...
_susanoo about 7 years ago
It seems even @potus has a probability of .929 of being a bot. Is this fake news?
benliong78 about 7 years ago
Kinda wish you named it: 'Robot or not' https://www.theincomparable.com/robot/
drefanzor about 7 years ago
I think you need to fix your bot algorithm.
opsroller about 7 years ago
This is the name of a browser plugin myself and a few others have been working on. Glad you jacked the name.