The great debate over what constitutes a suspicious Twitter account and how to tell if you are following any of them.
About a month ago I wrote a blog post describing a pet project of mine: mapping networks of automated Twitter accounts. I called them bots in that post, but that word created some understandable confusion, so for the follow-up I’m going to be more specific.
The assertion that popular online movements may attract bad actors with a variety of motives (from tacky self-promotion to active disinformation), and that otherwise innocent-sounding activities like autofollowing might pull these suspicious accounts into a user’s orbit, turned out to be much more controversial than I anticipated. For me it was obvious: anything popular will have at least a few people looking to hijack it. Hopefully unsuccessfully.
But for others it was tantamount to calling their whole movement illegitimate. “Release the ids of the bot accounts,” they said. “So that we can get them removed.”
I was uncomfortable with doing that for a couple of different reasons. First, the whole point of the original post was that trying to hunt suspicious accounts had taught me how difficult it is to define “suspicious” with any degree of accuracy. There were lots of gray areas. There were plenty of accounts that used the level of automation I was looking for but, once looked at by a human eye, didn’t seem problematic (just overenthusiastic). I didn’t want to release a list of the accounts I thought were suspicious to people whose intentions were not known. The risk of an innocent person getting swept up in a wave of internet abuse seemed too great.
The second reason was that at least some of the people requesting this information only wanted to use it to draw me into unwinnable arguments about my methodology, even though I had configured the bot hunter with standards developed by full-time researchers and was only using the hunter to find accounts worth studying (not to report them to Twitter or try to get them banned). Truth be told, I did not trust a group of people who insisted that every single person on their high-profile trending hashtag was 100% legitimate, and I was not interested in continuing the dialogue with them.
But… on the other hand, I could also understand the anxiety around the suggestion that some members of a beloved community might be hijacking the cause and that you might be giving them legitimacy by following them.
So I thought I could take a piece of the bot hunter and release it as an open source script that people could use to see if their account or the accounts of their followers would look suspicious to a bot hunter.
Then I thought, “Oh hey, why don’t I let people define what the script should consider suspicious?” That way there would be no complaints about my criteria, because the user could configure the hunter to weight certain characteristics in a way that made sense to them. And eventually the script became a command line tool because I’ve been writing a lot of Go code lately and wanted to try that out….
Profiles of Suspicious Accounts
Part of the confusion around suspicious accounts is that there are two completely contradictory models of what makes an account suspicious.
The first is based on the concept of fake followers. These accounts hardly post, have a tiny number of friends (if any) and may not even bother to change the default profile picture. Fake followers are typically a commodity. They are bought and sold to pump up influence.
The second model is hashtag hijackers. These accounts are suspiciously automated: they post a hundred times a day, and there’s very little original content in their timelines. Most of their posts are retweets. (A point of clarification from my first blog post: retweets with comments are not treated by Twitter’s API as retweets, so if you’re adding commentary to your retweets the bot hunter will not count them as retweets.) The hashtag hijackers attempt to infiltrate existing communities and shift their opinions. The agenda isn’t always so glamorous: hashtag hijackers try to sell self-published books just as they try to radicalize political groups.
Most of the bot hunters online look for the first type, but I was way more interested in the second. I wanted whatever I open sourced to allow the user to decide which criteria to use.
Why does it matter if you follow them (or they follow you)?
A funny thing happened after my last post on this topic. My Twitter timeline was flooded with posts from the community I had outraged for months. These were not tweets “@”-ing me, not replies, not even retweets or likes from my friends. These were posts being recommended to me by Twitter’s algorithm solely because they were tweeted by people my friends followed.
Up until that point I had not realized that Twitter did that. When something by a person unknown to me appeared on my timeline, I assumed it was because a friend had shared it or liked it. Content added via this friends discovery looks virtually the same; only the tiny gray text on top is different.
The implications of this are pretty significant. It means content can spread via hashtag hijacking even if the account attempting to hijack would be unlikely to fool anyone who looked at it directly. Hashtag hijackers just need to build a network of legitimate followers in order to get the benefits of Twitter’s recommendation algorithms. (For future research I’d love to see how big and tightly connected the network has to be before we see it picked up in this way.)
Today I’m opening up a tool I’m calling Netback to alpha testing. Netback looks at accounts connected to yours on Twitter — either followers or friends — and compares them to a profile you’ve defined outlining what you think a suspicious account looks like. It also lets you weigh each part of the profile so that certain characteristics have greater influence than others. You control everything about what Netback thinks is a potential bad account: what it’s looking for, how much emphasis it puts on what it finds, and the minimum score an account needs to be deemed potentially “suspicious.”
The criteria you can build a profile around are:
- Posts per Day: calculated as total posts over days the account has existed.
- Number of Followers: with direction, meaning you can tell Netback that anything over 100,000 followers is suspicious or anything under 5 followers is suspicious.
- Percentage of Unoriginal Posts: Retweets for now because it’s easier, although a more accurate measure would also consider tweets with content identical to other users’ tweets posted at virtually the same time. That’s a feature for the next version: easy to implement, but it spends the Twitter rate limit fast.
- Profile Pic: Do they have one?
- Low Posts: Are their total number of posts too low?
Taken separately, any one of these characteristics can be found on legitimate accounts. That’s why an important feature of the Netback profile is the weighting. Each characteristic in a profile is weighted from 0 to 3. If you give something a weight of 0, Netback will not consider it at all. If you give something a weight of 3, Netback will treat that characteristic as more important in determining whether the account is suspicious.
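To make the weighting idea concrete, here is one way a weighted score could work. This is a sketch under my own assumptions, not a description of what Netback does internally: the Profile type, the characteristic names, and the choice to normalize the score to a 0-1 range are all invented for the example.

```go
package main

import "fmt"

// Profile is a hypothetical stand-in for a user-defined profile.
// Weights maps a characteristic name to a weight from 0 (ignored) to 3.
// Threshold is the minimum score (0-1) for an account to be flagged.
type Profile struct {
	Weights   map[string]int
	Threshold float64
}

// Score returns the weight of the matched characteristics divided by the
// total weight of all characteristics the profile considers.
func Score(p Profile, matched map[string]bool) float64 {
	total, hit := 0, 0
	for name, w := range p.Weights {
		if w <= 0 {
			continue // weight 0: not considered at all
		}
		if w > 3 {
			w = 3 // clamp to the documented 0-3 range
		}
		total += w
		if matched[name] {
			hit += w
		}
	}
	if total == 0 {
		return 0
	}
	return float64(hit) / float64(total)
}

// Suspicious reports whether the score reaches the profile's threshold.
func Suspicious(p Profile, matched map[string]bool) bool {
	return Score(p, matched) >= p.Threshold
}

func main() {
	p := Profile{
		Weights: map[string]int{
			"posts_per_day":      3,
			"percent_unoriginal": 2,
			"no_profile_pic":     0, // ignored by this profile
			"low_posts":          1,
		},
		Threshold: 0.5,
	}
	// Which characteristics this (made-up) account trips.
	matched := map[string]bool{"posts_per_day": true, "no_profile_pic": true}
	fmt.Printf("score %.2f, suspicious: %v\n", Score(p, matched), Suspicious(p, matched))
}
```

In this example the account trips the posts-per-day check (weight 3) out of a total considered weight of 6, so it scores 0.50. The missing profile picture contributes nothing because that characteristic has a weight of 0.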
There were a few criteria I experimented with using but ultimately decided against (for right now anyway). The most significant one was the screen name. Having a screen name substantially different from your display name or a screen name including a random substring of letters and numbers is a strong indicator that the account isn’t a real person’s. But I wasn’t really satisfied with the results I was getting from doing that. Something for the next release I suppose.
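For the curious, a crude version of the “random substring” check might look like the sketch below. It only looks for long runs of digits, which is my own simplification for illustration and not what I actually experimented with; it says nothing about random letters or about comparing the screen name to the display name. The function names and the cutoff of 6 digits are arbitrary.

```go
package main

import "fmt"

// LongestDigitRun returns the length of the longest run of consecutive
// ASCII digits in s.
func LongestDigitRun(s string) int {
	longest, current := 0, 0
	for i := 0; i < len(s); i++ {
		if s[i] >= '0' && s[i] <= '9' {
			current++
			if current > longest {
				longest = current
			}
		} else {
			current = 0
		}
	}
	return longest
}

// LooksGenerated is a hypothetical heuristic: it flags a screen name that
// contains a run of at least minDigits digits.
func LooksGenerated(screenName string, minDigits int) bool {
	return LongestDigitRun(screenName) >= minDigits
}

func main() {
	for _, name := range []string{"jane_doe", "maria84736251", "bob1985"} {
		fmt.Printf("%-15s generated-looking: %v\n", name, LooksGenerated(name, 6))
	}
}
```

Even this toy version shows why the results are hard to trust: lower the cutoff to 4 and everyone who put their birth year in their handle gets flagged.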
The configuration file that controls the profile is just YAML. Netback also has a wizard to walk you through both configuring Twitter’s API to access your account (it will not be able to post or edit your info, just access publicly accessible information about your account like your friends list) and setting up a profile. But that’s a pretty long list of possible options, so I’ve also included some shortcuts based on the two general types of suspicious accounts. Using flags such as --high-activity with the netback profile command will default the weight of some of those characteristics to 0.
Grooming Your Web Presence
Netback only looks at accounts that are connected to your own. It’s intended to help you find accounts that you follow, or that follow you, that might have malicious intentions, and by doing so empower you to remove them from your orbit. But it’s also an interesting experiment in how you would identify suspicious behavior versus the reality of who you consume content from on the internet.
And of course, it’s open source so please send PRs. This is my first command line program in Golang so I look forward to feedback :D