The Twit Cleaner

(notes on behavioral categorization of Twitter accounts)

I don’t follow a lot of people on Twitter, but I still sometimes have trouble deciding whether the accounts I’m following are worth it. Folks with much longer follow lists presumably have even harder going.

Enter The Twit Cleaner, a (sadly, as of late 2013, defunct) service that scans your follow list and automatically categorizes the behavior of everyone on it. They have some straightforward heuristics for deciding whether someone is worth following, mostly documented in their FAQ:

Q. How are the (potential) bad guys broken down?

A. The possible categories are:
Dodgy - spam phrases, @ spamming, duplicate links etc
Absent - No updates in a month, or fewer than 10 tweets.
Repetitive - High numbers of duplicate tweets or links
Flooding - So high volume you can’t see anyone else
Non-Responsive - No interaction & those that follow back < 10%
Little New Content - Retweeting lots or just posting quotes

This is generally a good scheme, but its focus on conversational use of Twitter means that it misidentifies a few types of legitimate account as unsavory. I think a few special-case categories would go a long way toward making the service’s advice more useful.

Announcement channels

These are the Twitter equivalent of a news ticker: they broadcast announcements related to something, but as a general rule they don’t converse with people. The Cleaner dings them for “dodgy behavior: tweeting the same links all the time” and/or for being “not interactional: hardly follow anyone.” Examples include @NBCOlympics, @CDCemergency, @asym, @Astro_Soichi, and (ironically) @TwitCleaner itself (the problem here appears to be public “@somebody, your report is ready at” directed tweets, sent when direct messages fail).

These can probably be machine-identified as extreme outliers in follower-to-followed ratio. @asym and @Astro_Soichi don’t follow anyone; @NBCOlympics and @CDCemergency each follow fewer than 0.1% as many accounts as follow them. @TwitCleaner likes to follow users of the service, though; maybe they should just whitelist themselves? Also, if Twitter-verified users are not already whitelisted (I wasn’t able to tell from my own report), perhaps they should be.
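
As a rough illustration of the ratio test, here is a minimal Python sketch. The Account record, the function name, and the min_followers floor are all made up for this example; only the 0.1% cutoff comes from the accounts mentioned above. I have no idea how the Cleaner computes its own categories, so treat this as one possible reading and nothing more.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical record; the field names are assumptions, not a Twitter API."""
    handle: str
    followers: int  # how many accounts follow this one
    following: int  # how many accounts this one follows


def looks_like_announcement_channel(account, max_ratio=0.001, min_followers=1000):
    """Return True if the account follows almost nobody relative to its audience.

    max_ratio=0.001 mirrors the "fewer than 0.1%" observation in the text.
    min_followers is an assumed floor so that tiny accounts, whose ratios are
    mostly noise, are never flagged.
    """
    if account.followers < min_followers:
        return False
    return account.following / account.followers < max_ratio
```

With invented numbers, Account("broadcaster", 500000, 12) would be flagged, while Account("chatty", 5000, 400) would not. An account flagged this way is a candidate for a separate category, not necessarily one to whitelist.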

Lurkers

Lurkers are the opposite of announcement channels: they just read Twitter and never post anything. Lurking is a time-honored tradition on the Internet and people shouldn’t be penalized for it. I have several lurkers on my follow list just on the off chance that they might start posting in the future.

Accounts that have never posted at all should be distinguished from accounts that post rarely. (The latter are often spammers. Lately Twitter itself has gotten a lot better about finding and banning spammers, but they still turn up now and then.)
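
The distinction I have in mind could be as simple as the following Python sketch. The function name and the category labels are hypothetical; the threshold of 10 is borrowed from the FAQ's "fewer than 10 tweets" rule, and the Cleaner's real logic may differ.

```python
def posting_category(tweet_count, rare_threshold=10):
    """Split the low-volume accounts into never-posted and rarely-posted.

    rare_threshold=10 is taken from the FAQ's "fewer than 10 tweets";
    the returned labels are invented for this example.
    """
    if tweet_count < 0:
        raise ValueError("tweet_count cannot be negative")
    if tweet_count == 0:
        return "lurker"        # has never posted at all
    if tweet_count < rare_threshold:
        return "rarely posts"  # the group that deserves a closer look
    return "active"
```

Note that this only looks at the visible tweet count, so an account that posted and later deleted everything would be indistinguishable from a true lurker.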

Fictional character accounts

There are any number of fictional characters who regularly use Twitter. That is, their authors write and post tweets under their names, usually to provide a bonus story line, or to implement the “fourth wall mail slot.” Examples include @Othar of Girl Genius and the entire cast of Questionable Content (caution: mildly NSFW; @pintsize0101 consistently links to egregiously NSFW images of the “where’s my brain bleach” variety). Fictional characters may absent themselves for long periods because the bonus story line is on hold (Othar recently didn’t post anything for four months but is now back) and might not follow anyone but other characters from the same fictional world (the QC cast does this); both things get them unfairly dinged by the Cleaner.

It probably isn’t possible to identify fictional accounts in a mechanical way. However, you could pick out cliques in the follow graph, sets of accounts that are followed by many but that follow no one but each other, as deserving human attention. If Twitter implemented some sort of account-labeling scheme that would let the people behind the curtain mark accounts as fictional characters, that would be awesome.
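
To make the clique idea concrete, here is a small Python sketch under some loud assumptions. The follow graph is a plain dict mapping each handle to the set of handles it follows. Accounts missing from the dict are treated as following nobody. The function names and both thresholds are invented for the example. It finds small groups whose members follow only each other and counts how many outside accounts follow into the group.

```python
def closed_group(start, follows, max_size):
    """Collect every account reachable from `start` along follow edges.

    Returns a frozenset, or None once the group grows past max_size
    (an assumed cap: a large closure is not a tight little clique).
    """
    group = {start}
    pending = [start]
    while pending:
        current = pending.pop()
        for followed in follows.get(current, set()):
            if followed not in group:
                group.add(followed)
                if len(group) > max_size:
                    return None
                pending.append(followed)
    return frozenset(group)


def find_closed_groups(follows, max_size=10, min_outside_followers=3):
    """Map each closed group (size >= 2) to its number of outside followers.

    By construction nobody inside a returned group follows anyone outside it.
    Groups with fewer than min_outside_followers outside fans are dropped.
    """
    candidates = {}
    for account in follows:
        group = closed_group(account, follows, max_size)
        if group is None or len(group) < 2:
            continue
        outside_fans = {
            fan for fan, followed in follows.items()
            if fan not in group and followed & group
        }
        if len(outside_fans) >= min_outside_followers:
            candidates[group] = len(outside_fans)
    return candidates
```

This is quadratic in the number of accounts, so it is only suitable for a single user's follow list, not for all of Twitter. Anything it returns still deserves a human look, since a closed group could just as easily be a ring of spam accounts as a cast of fictional characters.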

Responses to “The Twit Cleaner”

  1. Si Dawson

    Some very interesting points you raise here.

    First - yes, us appearing on the reports is just a result of a slight hiccup with our recent name change from @TheTwitCleaner to @TwitCleaner. It’s fixed now, so won’t happen any more.

    You’ve identified a core limitation of the service - namely, that it’s aimed at Twitter as a conversational medium. Thus, any account not fitting those aims (eg the announcement channels you list) won’t fit into that behaviour pattern & will be flagged.

    The real difficulty is - there’s no way to reliably identify “Accounts that don’t interact, except these accounts that I like.” As a simple example - you like @NBCOlympics, even though they’re using Twitter solely as a broadcast medium (like a newspaper). Personally, if I want a push medium, I’ll use RSS, read a blog, or yes, a newspaper. I use Twitter to interact with people. So, who’s right? Well, neither of us. We just use Twitter in different ways. The only solution is to flag the behaviour types, & give each user the choice.

    There’s also a lot of subtlety that comes back & bites you once you start fine-tuning these algorithms - things that aren’t obvious when looking at a small handful of accounts, but only when looking at millions. Eg - the suggestion to ignore those accounts with extreme following/follower ratios. The catch is - they’re appearing in a category specifically FOR accounts with extreme ratios. If you remove them, there’s no category left. It would also be very easy for a spammer to game the system by simply unfollowing everyone: voila, their ratio goes up & they’re instantly whitelisted.

    Ditto those accounts that only post links, or the same link, etc.

    It is useful information - particularly when people use auto-follow scripts, they can end up following all sorts of random accounts, many of which will not be useful to their aims - thus it should be on the report. Of course, it’s up to each individual user to say (as you have) “You know what? I don’t care about people that only post links/post dupes/etc.”

    At that point, the best solution is simply to select-to-save the entire category.

    There’s also another option - which is simply to have a report that (for you) doesn’t have that option displayed at all. For people that follow a lot of news sources, & rss feeds etc, this would make the most amount of sense.

    People appearing on the report is orthogonal to whether they’re verified. Verification just says that the person is who they say they are. That has nothing to do with whether they’re spouting rubbish or not. Ideally, of course, they’re not, but if they flag any categories, they should definitely appear.

    I do have a small exception in there - that if someone is listed as a celeb (low following/follower ratio), but they follow YOU, they won’t appear on your report. Why? Coz who cares if they’re snobbish to everyone else if they’re friendly with you, right? :)

    Lurkers are interesting. I don’t think there’s any value in distinguishing between accounts that have NEVER posted vs those that post rarely - simply because the number of accounts that have zero tweets is vanishingly small. There’s also the intriguing false positive - which is much more common, that people HAVE tweeted, but then delete all their tweets. Why do they do this? Who knows, but it happens relatively often.

    Following people that never post is a personal choice of course. People I know personally I do, people I don’t I don’t. But again, it shows on the report for informational purposes, it’s up to you who you want to keep or not.

    I’m not sure if you have multiple accounts or not - but a lot of the things I’m describing make nearly no sense if you’re only following a small handful of people (up to a few hundred). At that point, you can easily remember & distinguish between them all. There’s no surprises in your list. Above that though, it’s often very, VERY helpful to have the subtleties of those you’re following pointed out to you.

    Fictional character accounts I have no easy answer for. Getting into graph analysis is a big, big subject, & something we’re definitely not set up for at present. I suspect if we were going to be getting into that field, there would be a TON of useful analysis & interpretation we could do - such that fictional characters would fall out nicely as simply a subset of a much larger set of personally-useful-accounts.

    All good food for thought though, & I definitely appreciate you raising these points. Always good to cover old ground & rethink all the basic assumptions.

  2. Zack Weinberg

    I’m sorry about the bad formatting. This site now uses WordPress, so comment behavior should be much more normal. I’ve taken the liberty of deleting one copy of your comment.

    It’s true that I don’t follow more than a handful of people by Twitter standards, and I know exactly who they all are and why they’re there. I don’t mean to say that you should stop listing any of the people who come up in my report! Rather, what I was thinking is that you could have more categories in the report. “Follows nobody” and “Never posts at all” seem like especially useful behaviors to distinguish from their parent categories.