I was surprised by the number of false positives reported here, so I went ahead and tested it on the Twitter accounts of several friends, both professional and personal. 9 of the 25 accounts tested were classified as 'bot' with probability > 0.6.
Only 11 were classified as 'humans' with probability > 0.7.
And that's on 25 accounts of people I know personally.<p>Given the preposterous error rate, I deem there is no actual classification logic in R, and instead it uses the (very fallible) humans to do the actual classification via a Mechanical Turk-style API.<p>Wait, do I need to add "/s" to this post, or is it obvious enough?
I've done some work in this area - it's disappointing how terribly the Twitter product has failed to evolve to take account of bot usage.<p>There are (of course) some useful bots, but also lots of incredibly harmful bots, and they should be treated differently from actual humans.<p>But Twitter can't ship product, so it's not really worth suggesting what they should do.<p>In the meantime, my colleagues and I got a nice WWW18 conference paper about a new <i>unsupervised</i> (!) way of detecting some types of bots on Twitter. Like most things, it's completely obvious in retrospect...
I looked at the GitHub README for the project, which says<p><i>> Uses machine learning to classify Twitter accounts as bots or not bots. The default model is 93.53% accurate when classifying bots and 95.32% accurate when classifying non-bots. The fast model is 91.78% accurate when classifying bots and 92.61% accurate when classifying non-bots.<p>Overall, the default model is correct 93.8% of the time.<p>Overall, the fast model is correct 91.9% of the time.</i><p>How is this accuracy determined? There is no information explaining how these figures were measured, nor what the caveats are.
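Typically per-class numbers like those come from a held-out test set and a confusion matrix. Something like the sketch below is the usual way such figures are computed; this is purely illustrative, since the repo doesn't document its evaluation split or labels:

    ## illustrative only: per-class accuracy from held-out labels and predictions
    acc_by_class <- function(truth, predicted) {
      c(bot     = mean(predicted[truth == "bot"]     == "bot"),
        non_bot = mean(predicted[truth == "non_bot"] == "non_bot"),
        overall = mean(predicted == truth))
    }

Without knowing how the labelled accounts were sampled, the headline percentages are hard to interpret.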
Per the README to the corresponding repo:<p>> The default [gradient boosted] model uses both users-level (bio, location, number of followers and friends, etc.) and tweets-level (number of hashtags, mentions, capital letters, etc. in a user's most recent 100 tweets) data to estimate the probability that users are bots.<p>Not an exact science, but shows what you can do and deploy quickly with R/Shiny.<p>The author’s rtweet package is very good for making quick Twitter data visualizations.
I tried putting in verified users and they were all "probably bots". Aren't verified accounts, by definition, the only type of user publicly acknowledged as "not a bot"?
While the quality of the model can be debated (I noted lots of false positives too), I do note that it’s kind of cool that we’re all sitting around and poking at an app written in an R web framework.<p>If you haven’t:<p>1. Downloaded the RStudio IDE<p>2. Built a hello world Shiny app (better still, for a fuller flavor of the thing, a hello world app using the shinydashboard package)<p>3. Deployed your app to shinyapps.io<p>I highly encourage you to do so, if for no other reason than to see how streamlined RStudio has managed to make web app deployment for people who often don’t have much of a programming background. A minimal example is sketched below.<p>I’m continually impressed with the work RStudio does, even if I’m a curmudgeon and still write all my code in Emacs instead of their IDE. If RStudio expanded to support Python similarly well, I imagine they could really become the place where most data scientists work.
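For step 2, a hello world Shiny app is only a few lines (the app name in the deploy call is a placeholder):

    library(shiny)

    ui <- fluidPage(
      textInput("name", "Your name"),
      textOutput("greeting")
    )

    server <- function(input, output) {
      output$greeting <- renderText(paste("Hello,", input$name))
    }

    shinyApp(ui, server)

    ## step 3: deploy to shinyapps.io (app name is a placeholder)
    # rsconnect::deployApp(appName = "hello-world")

Save it as app.R, run it locally, and the deploy is one function call once your shinyapps.io account is configured.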
I recently watched a talk from 34c3 (the Chaos Computer Club conference, held at the end of last year) about Twitter bots, their existence, and their detection.
The speaker couldn't find many of the bots that were cited in studies, and found that the studies' methodology was somewhat arbitrary.<p>Definitely worth a watch:
<a href="https://media.ccc.de/v/34c3-9268-social_bots_fake_news_und_filterblasen" rel="nofollow">https://media.ccc.de/v/34c3-9268-social_bots_fake_news_und_f...</a>
(The talk is in German, but there should be a translated version of it on the site)
The classification is based on an R package for Generalized Boosted Regression Models[1]. Can anyone knowledgeable opine about this choice?<p>[1] <a href="https://cran.r-project.org/web/packages/gbm/" rel="nofollow">https://cran.r-project.org/web/packages/gbm/</a>
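For context, a minimal gbm classification fit looks something like this; the data, features, and tuning parameters here are toy values, not what the app actually uses:

    library(gbm)

    ## toy data: two numeric features and a binary bot label
    set.seed(1)
    d <- data.frame(
      followers = rpois(500, 200),
      hashtags  = rpois(500, 3),
      is_bot    = rbinom(500, 1, 0.3)
    )

    fit <- gbm(is_bot ~ followers + hashtags,
               data = d,
               distribution = "bernoulli",
               n.trees = 500,
               interaction.depth = 3,
               shrinkage = 0.05)

    ## predicted probability of "bot"
    p <- predict(fit, d, n.trees = 500, type = "response")

Gradient boosted trees are a fairly standard choice for tabular features like follower counts and hashtag rates, so the package itself seems less of a concern than how the training labels were gathered.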
I tried a few things and it seemed to be working well, but now I keep getting "An error has occurred. Check your logs or contact the app author for clarification."
Developer: Michael W. Kearney<p>Link to github: <a href="https://github.com/mkearney/botrnot" rel="nofollow">https://github.com/mkearney/botrnot</a>
Kinda wish you'd named it 'Robot or not': <a href="https://www.theincomparable.com/robot/" rel="nofollow">https://www.theincomparable.com/robot/</a>