'Copybot' and forum security


This is another thread to talk about things on the forum itself, particularly spam. Hopefully the moderators and web project manager can join here and allay any fears about technical risks. The last thread, called 'Chat Bot' was partly about how you tell the difference between a genuine user and abuse, and has reached 167 replies, so it was suggested that we start a new thread for each subject. There's also a thread from the last few months called 'Mods Please Make the Spam Stop', which has covered some of this and also covered the times when obvious spam is left on this forum. I don't personally think it's a massive problem, especially compared to some other forums, but it may make people uneasy unless it's dealt with in a clear way.

As I understand it, and the moderators or @WebPM can correct me, every interactive site on the web is subject to some abuse, and the forum software the NAS uses (Telligent Community) has some automated ways to detect and moderate this. However, occasionally some advertising for irrelevant products isn't so obvious, and gets through. There are also some other 'borderline' things, where we're not sure if the user is genuine, and interact with them very cautiously. The way this is supposed to work is that we, the forum users, readers and contributors, help detect the probable spam and click on 'Report as abusive', which pops up when you click the 'More' button on any post or comment. The moderators then consider this, and take action such as locking or deleting the thread. There's also a 'report as abusive' button on each user's profile for when it looks like the only purpose of the account is spamming or trolling.

Sorry I'm being so verbose.

The story so far

In the past week or two (May 2018), besides a small spam outbreak advertising pills and stuff, we've noticed what we're calling 'Copybot', which starts new threads by copying something someone real asked several months or years ago. This causes some confusion as people might start responding to these forgeries, not realising the question is very old and has probably been answered. There have been requests, mostly on the other two threads mentioned above, that the NAS checks its site security, and suggestions about how the site could better prevent Copybot.

I've actually only counted six Copybot threads so far. I think three of these have been deleted and three locked by the moderators, although some stuck around for several days (I may update these figures later).

What is Copybot?

Copybot is the name we (I) gave to whatever was behind the occasion when three threads showed up, from two users, that looked a bit suspicious, partly because the two posts from the same account seemed to be from different people: one a parent, the other an autistic young person. Since then we've had a few more, mostly appearing overnight. The threads look like they come from a new user with no avatar image and the standard "NAS37xxx" name (although the registration may go back months). The posts are usually well-written and relevant to autistic individuals and families - which is hardly surprising, because it's copying most of the text from another post. The title is usually transformed a little, so 'How to find a girlfriend' became 'I can't find girlfriend', and other ones include 'please everybody help me' to get extra attention. Whether this transformation is automated or human is hard to say. Sometimes the fake title is taken from the first sentence of the post instead (so a title that just repeats the opening of the post is suspicious). Sometimes people respond to the bot posting as it sounds genuine, but unsurprisingly I've not seen the bot reply. This is stealing people's real concerns and questions, which we find a bit creepy. Sometimes the text that is copied is truncated, either omitting the sign-off, or stopping at a punctuation mark.
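For anyone who likes to see this sort of thing written down, the two textual signs above (a title that just repeats the opening of the post, and a post that is an older post cut short) could be checked roughly like this. It's only a sketch in Python: the function names are mine, it has nothing to do with how the forum software actually works, and a genuine poster can easily trip the first check.

```python
import re

def normalise(text: str) -> str:
    """Lower-case the text and squash punctuation and spacing into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def title_repeats_opening(title: str, body: str) -> bool:
    """True if the title is simply the first words of the post body."""
    t, b = normalise(title), normalise(body)
    return bool(t) and (b == t or b.startswith(t + " "))

def is_truncated_copy(new_body: str, old_body: str) -> bool:
    """True if the new post is the old post with the end chopped off."""
    n, o = normalise(new_body), normalise(old_body)
    return bool(n) and len(n) < len(o) and o.startswith(n + " ")

# Made-up example texts, not real forum posts.
old_post = "I am 24 and lonely. Any advice? Thanks, Sam"
new_post = "I am 24 and lonely."
print(title_repeats_opening("I am 24 and lonely", new_post))
print(is_truncated_copy(new_post, old_post))
```

Neither check proves anything on its own; they just say 'look more closely at this one'.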

Several theories have been suggested as to Copybot's motives, such as that Copybot will eventually post malware links or impersonate a genuine user so well that personal information is compromised. However, I think it is simply a side-effect of trying to defeat anti-spam systems. If a bot registers and starts posting spam immediately, it's likely to get picked up by the automated anti-spam. If it registers, waits a bit, posts something apparently sensible, which people reply to and nobody complains is abusive, then it gains 'reputation', and when it does post spam, it's 'cleanlisted' and the spam appears on the site without moderation. Also, if the copied post is automatically detected or treated as spam, then the anti-spam text-detection software gets a bit confused (technically this is sometimes called 'poisoning' a Bayesian classifier) and can't detect adverts for pills and so on so accurately.
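To show what I mean by 'poisoning', here is a toy Bayesian filter in Python. Everything in it is an assumption for illustration: the class, the training sentences and the scoring are mine, and I have no idea what classifier (if any) the forum software really uses. The point is only that if copied genuine text ends up labelled as spam, ordinary words start to look spammy and the filter separates the two kinds of post less well.

```python
import math
from collections import Counter

class TinyBayes:
    """A toy naive Bayes spam filter; real filters are far more elaborate."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        spam_total = sum(self.words["spam"].values())
        ham_total = sum(self.words["ham"].values())
        # Start from the (smoothed) ratio of spam to genuine training posts.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not give a zero probability.
            p_spam = (self.words["spam"][word] + 1) / (spam_total + vocab)
            p_ham = (self.words["ham"][word] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))

# Made-up test messages.
GENUINE = "my son struggles with sleep"
ADVERT = "buy cheap pills"

def build(poison):
    f = TinyBayes()
    f.train("my son struggles with sleep and school", "ham")
    f.train("how do i ask for a diagnosis", "ham")
    f.train("cheap pills buy now", "spam")
    f.train("buy cheap pills online", "spam")
    if poison:
        # A copied genuine post gets reported and learned as spam.
        f.train("my son struggles with sleep and school", "spam")
    return f

clean, poisoned = build(False), build(True)
for name, f in (("clean", clean), ("poisoned", poisoned)):
    print(name, round(f.spam_probability(GENUINE), 2), round(f.spam_probability(ADVERT), 2))
```

With this toy data the poisoned filter gives the genuine post a noticeably higher spam score and the pill advert a slightly lower one, so the gap between them narrows.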

How to check, and what to do if there is a Copybot sighting

I've recently been on the forum a lot, and when I see any new post by someone I don't recognise, I check it. First I look at the post, and think about whether the title is written in a matching style to the text; then I look at the first few words and see if they also appear in the 'Related' bar to the right below one of the titles, and if they do, I look at that other post. I also might hover over the user name or avatar of the NASxxxxx poster to get a pop-up showing how many 'points' they have, or follow the link on that user name to see their profile. So far, for Copybot, there's been nothing written on the profile, and there is usually '7 points', which is what you get for a single post, or sometimes 14 (or 21?) - you can also check the 'Activity' tab of the profile to see if the posts are consistent and genuine.
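Written as a checklist in Python, the profile signs above look something like this. The Profile class and its fields are hypothetical (I'm not reading anything from the real site), and the 7 points per post figure is just what I've observed. Bear in mind that every genuine newcomer matches most of these too, so it is a prompt to look at the text, not a verdict.

```python
import re
from dataclasses import dataclass

POINTS_PER_POST = 7  # what I've seen for a single post; an assumption, not a documented value

@dataclass
class Profile:
    """Hypothetical summary of what is visible on a user's profile page."""
    username: str
    points: int
    has_avatar: bool
    bio: str = ""

def warning_signs(p: Profile) -> list:
    """List the Copybot-like signs a profile shows (an empty list means none)."""
    signs = []
    if re.fullmatch(r"NAS\d+", p.username):
        signs.append("default NASxxxxx username")
    if not p.has_avatar:
        signs.append("no avatar")
    if not p.bio.strip():
        signs.append("nothing written on the profile")
    if p.points <= 3 * POINTS_PER_POST:
        signs.append("only one to three posts' worth of points")
    return signs

print(warning_signs(Profile("NAS37990", 7, has_avatar=False)))
```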

If still suspicious, I also see if there are distinctive words or phrases and search to see if those have happened before. For example if the phrase 'depersonalization symptoms' appears, that's pretty rare with an unusual spelling, so I can put that into the search bar at the top and press 'Return' - if it shows a previous thread I check that. You can also check using a standard web search engine, by taking half of a well-written sentence (maybe six to ten words or so), putting double quotation marks (") around it and searching - if it only comes up with the latest NAS page, I'd assume it's not Copybot and we have a welcome post from a new user. If it comes up with other, older hits (I've not seen any from outside the NAS site yet, but it's possible), then I compare the two passages to see if they are more or less identical, and if the new post really is a copy.
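If you wanted to automate the 'half a sentence in double quotes' step, a rough Python sketch might be the following. The function name and the six-to-ten word limits are just my rule of thumb from above; you would still paste the result into the forum search bar or a search engine yourself.

```python
def quoted_phrase(sentence, min_words=6, max_words=10):
    """Return roughly the first half of a sentence wrapped in double quotes.

    Returns None if the sentence is too short to make a distinctive phrase.
    """
    # Drop any double quotes already in the text so they don't break the search.
    words = sentence.replace('"', "").split()
    if len(words) < min_words:
        return None
    n = max(min_words, min(max_words, len(words) // 2))
    return '"' + " ".join(words[:n]) + '"'

# Made-up sentence, not from a real post.
print(quoted_phrase("I have been feeling very anxious about starting my new job next month honestly"))
```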

If it looks genuine to me, I may like it, or try to add a quick response (is it possible to say 'you're not a bot' politely, just for the benefit of other regulars, and ignore what the poster has said?)

If I find it's a copied post, what I do is:

  1. Reply to the post to warn people that 'this is a copy of a thread from...(however long ago)' and use the word 'Copybot' - this helps find current spam threads without linking to them.
  2. Copy in a full link to the original, genuine thread, (a) so the moderators can verify the copying issue; (b) so people interested in the issue can see other people's responses and contribute their own somewhere that is not likely to be deleted.
  3. Ask the moderators to delete or lock the post.
  4. Try not to link to the post from other threads, as that may improve the search engine ranking of the page or bot.
  5. Click 'report as abusive' on the post
  6. Click 'report as abusive' on the user

(Moderators - is it the case that a post or user that has more abuse reports from different people goes to the top of the moderation queue?)

Then it's up to the moderators to lock or delete the post as appropriate. If someone has added a valuable additional reply, I don't see any problem in locking the thread so that reply, and the link to the original thread, is still available. However, they may want to reassign the post to 'Deleted user' or something to prevent the spammy user from posting again (which could be more copies or spam).

If no obvious action is taken, then I suppose we can communicate with the moderators in this thread, via Direct Message, or the communitymanager@nas.org.uk address? Forum rules are here by the way: community.autism.org.uk/.../rules

Technical countermeasures

If this becomes a bigger problem, something more may need to be done until Copybot gives up. DongFeng5 suggested using a 'hash' of the text of a post to check for duplicates in an automated way, or using the type of software that claims to score plagiarism by students. I think this is something NAS would have to suggest to the software suppliers as a feature request. I know a bit about this subject (I've written hundreds of anti-spam regexes for a job), and a 'fuzzy hash' (like iXhash) should be possible and cope with minor text changes. However, Copybot may also copy anything about autism from other sites so as not to be detected - someone said copied text from an article about baseball had also been used - or possibly use a Markov-chain text from multiple sources to generate random, but vaguely realistic, text.
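As a rough idea of what duplicate detection could look like, here is a Python sketch. The first function is a plain hash of the tidied-up text, which only catches exact copies. The second is NOT iXhash; it is a simpler stand-in technique (overlapping runs of four words, often called 'shingles') that measures how much of a new post also appears in an old one, so it still works when the copy has been truncated. All the names are mine and the example texts are made up.

```python
import hashlib
import re

def normalise(text):
    """Lower-case the text and squash punctuation and spacing into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def exact_fingerprint(text):
    """SHA-256 of the normalised text: identical only for (near) exact copies."""
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()

def shingles(text, k=4):
    """Set of every run of k consecutive words in the text."""
    words = normalise(text).split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def containment(new_post, old_post, k=4):
    """Fraction of the new post's word runs that also occur in the old post."""
    new_set, old_set = shingles(new_post, k), shingles(old_post, k)
    if not new_set:
        return 0.0
    return len(new_set & old_set) / len(new_set)

old = "I have been feeling very anxious about starting my new job next month. Any advice would be welcome. Thanks, Sam"
new = "I have been feeling very anxious about starting my new job next month."

print(exact_fingerprint(new) == exact_fingerprint(old))
print(containment(new, old))
```

The exact hash misses the truncated copy, while the containment score stays high. A real system would need a threshold chosen from real data, and none of this helps against Markov-chain text stitched together from many sources.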

Making the site HTTPS, partly to protect anyone from having their site password compromised if using unencrypted wireless, has also been suggested.

Oh, blimey. I do go on.


We can also use this thread to report any new instances of Copybot, although I think adding a comment identifying it as Copybot and reporting it as abuse, as described above, is better.  Perhaps mentioning the NAS number without linking would show a useful pattern in the spam signups.

The weather forecast for today, Thursday 7th June 2018 is: no Copybot sightings. Nothing on Friday either, so we're doing well. In fact I haven't noticed a peep out of it until:

Saturday 16 June.

  • NAS37990, approx 4am - thread locked around 9pm, user still exists, but presumably moderated.
  • NAS37991, approx 10am - locked by Monday afternoon, user still exists, but probably moderated
  • may be worth checking IP addresses for NAS37988 NAS37989 NAS37994 NAS37995 to see if part of pattern

Tuesday 19 June:

  • NAS38026, approx 4pm - (one reply) thread locked on Wednesday, user in moderation (check ...27 and ...28?)

Thursday 21 June:

  • NAS38049, approx 10am. Two threads both titled 'NEED HELP?', copying parts of different threads, 5 minutes apart. Not locked as of 10:40, reported and deleted some time that day.
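On the idea of spotting a pattern in the NAS numbers: if the numbers are handed out in signup order (which is my assumption, not something I know), then accounts created in a batch will have numbers close together. A small Python sketch that pulls the numbers out of a log like the one above and groups near-neighbours; the max_gap of 3 is an arbitrary choice of mine.

```python
import re

def nas_numbers(text):
    """Extract the distinct numeric parts of NASxxxxx user names, sorted."""
    return sorted({int(n) for n in re.findall(r"\bNAS(\d+)\b", text)})

def clusters(numbers, max_gap=3):
    """Group numbers so that neighbours within a group differ by at most max_gap."""
    groups = []
    for n in sorted(numbers):
        if groups and n - groups[-1][-1] <= max_gap:
            groups[-1].append(n)
        else:
            groups.append([n])
    return groups

log = """
NAS37990 NAS37991 NAS37988 NAS37989 NAS37994 NAS37995
NAS38026
NAS38049
"""
for group in clusters(nas_numbers(log)):
    print(group)
```

A group with several numbers in it is only a hint that the accounts may have been registered together; the moderators would need to check IP addresses or signup times to confirm it.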