I was surprised by the number of false positives reported here, so I went ahead and tested it on the Twitter accounts of several friends, both professional and personal. 9 of the 25 accounts tested were classified as 'bot' with probability > 0.6.
Only 11 were classified as 'humans' with probability > 0.7.
And that's on 25 accounts of people I know personally.<p>Given the preposterous error rate, I deem there is no actual classification logic in R, and instead it uses the (very fallible) humans to do the actual classification via a Mechanical Turk-style API.<p>Wait, do I need to add "/s" to this post, or is it obvious enough?
I've done some work in this area - it's disappointing how terribly the Twitter product has failed to evolve to take account of bot usage.<p>There are (of course) some useful bots, but also lots of incredibly harmful bots, and they should be treated differently from actual humans.<p>But Twitter can't ship product, so it's not really worth suggesting what they should do.<p>In the meantime, my colleagues and I got a nice WWW18 conference paper about a new <i>unsupervised</i> (!) way of detecting some types of bots on Twitter. Like most things, it's completely obvious in retrospect...
I looked at the GitHub README for the project, which says<p><i>> Uses machine learning to classify Twitter accounts as bots or not bots. The default model is 93.53% accurate when classifying bots and 95.32% accurate when classifying non-bots. The fast model is 91.78% accurate when classifying bots and 92.61% accurate when classifying non-bots.<p>Overall, the default model is correct 93.8% of the time.<p>Overall, the fast model is correct 91.9% of the time.</i><p>How is this accuracy determined? There is no information explaining how these figures were measured, nor what the caveats are.
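Typically per-class numbers like those come from a held-out test set and a confusion matrix. Something like the sketch below is the usual way such figures are computed; this is purely illustrative, since the repo doesn't document its evaluation split or labels:

    ## illustrative only: per-class accuracy from held-out labels and predictions
    acc_by_class <- function(truth, predicted) {
      c(bot     = mean(predicted[truth == "bot"]     == "bot"),
        non_bot = mean(predicted[truth == "non_bot"] == "non_bot"),
        overall = mean(predicted == truth))
    }

Without knowing how the labelled accounts were sampled, the headline percentages are hard to interpret.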
Per the README to the corresponding repo:<p>> The default [gradient boosted] model uses both users-level (bio, location, number of followers and friends, etc.) and tweets-level (number of hashtags, mentions, capital letters, etc. in a user's most recent 100 tweets) data to estimate the probability that users are bots.<p>Not an exact science, but shows what you can do and deploy quickly with R/Shiny.<p>The author’s rtweet package is very good for making quick Twitter data visualizations.
I tried putting in verified users and they were all "probably bots". Aren't verified accounts, by definition, the only type of user publicly acknowledged as "not a bot"?
While the quality of the model can be debated (I noted lots of false positives too), I do note that it’s kind of cool that we’re all sitting around and poking at an app written in an R web framework.<p>If you haven’t:<p>1. Downloaded the RStudio IDE<p>2. Built a hello world Shiny app (better still, for a fuller flavor of the thing, a hello world app using the shinydashboard package)<p>3. Deployed your app to shinyapps.io<p>I highly encourage you to do so, if for no other reason than to see how streamlined RStudio has managed to make web app deployment for people who often don’t have much of a programming background. A minimal example is sketched below.<p>I’m continually impressed with the work RStudio does, even if I’m a curmudgeon and still write all my code in Emacs instead of their IDE. If RStudio expanded to support Python similarly well, I imagine they could really become the place where most data scientists work.
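For step 2, a hello world Shiny app is only a few lines (the app name in the deploy call is a placeholder):

    library(shiny)

    ui <- fluidPage(
      textInput("name", "Your name"),
      textOutput("greeting")
    )

    server <- function(input, output) {
      output$greeting <- renderText(paste("Hello,", input$name))
    }

    shinyApp(ui, server)

    ## step 3: deploy to shinyapps.io (app name is a placeholder)
    # rsconnect::deployApp(appName = "hello-world")

Save it as app.R, run it locally, and the deploy is one function call once your shinyapps.io account is configured.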
I recently watched a talk from 34c3 (the Chaos Computer Club conference, held at the end of last year) about Twitter bots, their existence, and their detection.
The speaker couldn't find many of the bots that were cited in studies, and found that the studies' methodology was somewhat arbitrary.<p>Definitely worth a watch:
<a href="https://media.ccc.de/v/34c3-9268-social_bots_fake_news_und_filterblasen" rel="nofollow">https://media.ccc.de/v/34c3-9268-social_bots_fake_news_und_f...</a>
(The talk is in German, but there should be a translated version of it on the site)
The classification is based on an R package for Generalized Boosted Regression Models[1]. Can anyone knowledgeable opine about this choice?<p>[1] <a href="https://cran.r-project.org/web/packages/gbm/" rel="nofollow">https://cran.r-project.org/web/packages/gbm/</a>
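For context, a minimal gbm classification fit looks something like this; the data, features, and tuning parameters here are toy values, not what the app actually uses:

    library(gbm)

    ## toy data: two numeric features and a binary bot label
    set.seed(1)
    d <- data.frame(
      followers = rpois(500, 200),
      hashtags  = rpois(500, 3),
      is_bot    = rbinom(500, 1, 0.3)
    )

    fit <- gbm(is_bot ~ followers + hashtags,
               data = d,
               distribution = "bernoulli",
               n.trees = 500,
               interaction.depth = 3,
               shrinkage = 0.05)

    ## predicted probability of "bot"
    p <- predict(fit, d, n.trees = 500, type = "response")

Gradient boosted trees are a fairly standard choice for tabular features like follower counts and hashtag rates, so the package itself seems less of a concern than how the training labels were gathered.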
I tried a few things and it seemed to be working well, but now I keep getting "An error has occurred. Check your logs or contact the app author for clarification."
Developer: Michael W. Kearney<p>Link to github: <a href="https://github.com/mkearney/botrnot" rel="nofollow">https://github.com/mkearney/botrnot</a>
Kinda wish you'd named it 'Robot or not': <a href="https://www.theincomparable.com/robot/" rel="nofollow">https://www.theincomparable.com/robot/</a>