This is another thread to talk about things on the forum itself, particularly spam. Hopefully the moderators and web project manager can join here and allay any fears about technical risks. The last thread, called 'Chat Bot', was partly about how you tell the difference between a genuine user and abuse, and has reached 167 replies, so it was suggested that we start a new thread for each subject. There's also a thread from the last few months called 'Mods Please Make the Spam Stop', which has covered some of this and also covered the times when obvious spam is left on this forum. I don't personally think it's a massive problem, especially compared to some other forums, but it may make people uneasy unless it's dealt with in a clear way.
As I understand it, and Ross-Mod or Kerri-Mod or @WebPM can correct me, every interactive site on the web is subject to some abuse, and the forum software the NAS uses (Telligent Community) has some automated ways to detect and moderate this. However, occasionally some advertising for irrelevant products isn't so obvious, and gets through. There are also some other 'borderline' things, where we're not sure if the user is genuine, and we interact with them very cautiously. The way this is supposed to work is that we, the forum users, readers and contributors, help detect the probable spam and click on 'Report as abusive', which pops up when you click the 'More' button below any post or comment. The moderators then consider this, and take action such as locking or deleting the thread. There's also a 'report as abusive' button on each user's profile for occasions when it looks like the only purpose of the account is spamming or trolling.
[Sorry I'm being so verbose.]
In the past week or two (May 2018), besides a small spam outbreak advertising pills and stuff, we've noticed what we're calling 'Copybot', which starts new threads by copying something someone real asked several months or years ago. This causes some confusion as people might start responding to these forgeries, not realising the question is very old and has probably been answered. There have been requests, mostly on the other two threads mentioned above, that the NAS checks its site security, and suggestions about how the site could better prevent Copybot.
I've actually only counted six Copybot threads so far, as of 7 June 2018. I think three of these have been deleted and three locked by the moderators, although some stuck around for several days. (Edit: since then there have been quiet periods and times of ten copied threads per week, which I've been listing at the bottom of this thread.)
Copybot is the name we (I) gave to whatever was behind the occasion when three threads showed up, from two users, that looked a bit suspicious partly because the two posts from the same account seemed to be from different people: one a parent, the other an autistic young person. Since then we've had a few more, mostly appearing overnight. The threads look like they come from a new user with no avatar image and the standard "NAS3nnnn" name. The posts are usually well-written and relevant to autistic individuals and families - which is hardly surprising, because it's copying most of the text from another post. The title is usually transformed a little so 'How to find a girlfriend' became 'I can't find girlfriend', and other ones include 'please everybody help me' to get extra attention - this transformation is apparently automated, in a way that recognises some English phrases, and chooses a random variation on it. Occasionally the fake title can be taken from the first sentence of the post instead. Sometimes people respond to the bot posting as it sounds genuine, but unsurprisingly I've not seen the bot reply. This is stealing people's real concerns and questions, which we find a bit creepy. Sometimes the text that is copied is truncated, either omitting the sign-off, or stopping at a punctuation mark.
[Here is probably a good place to stop reading. It may be too much information already.]
Several theories have been suggested as to Copybot's motives, such as that Copybot will eventually post malware links or impersonate a genuine user so well that personal information is compromised. However, I think it is simply a side-effect of trying to defeat anti-spam systems. If a bot registers and starts posting spam immediately, it's likely to get picked up by the automated anti-spam. If it registers, waits a bit, and posts something apparently sensible, which people reply to and nobody reports as abusive, then it gains 'reputation'. When it does post spam, it's 'cleanlisted', so the spam appears on the site without moderation and can go unnoticed. That would explain why it seems to wait over two months before replacing the copied text with spam. Also, if the copied post is automatically detected or treated as spam, then the anti-spam text-detection software may get a bit confused (technically this is sometimes called 'poisoning' a Bayesian classifier) and so won't be able to detect adverts for pills and so on as accurately.
A web search for "hi guys, i have a question about" and "i have a question, need help" shows that around June 2018 Copybots also started posting to other forums that use other types of forum software, including phpBB, myBB, vBulletin, Vanilla, Invision Community and Discourse. (Only in a technical Plone Discourse forum did I see someone notice that people were responding to bots, although moderators delete some threads.) The earliest Copybot thread I've found on the web is called "a quick question about business or public courses" on the thoroughly infested "Singapore Expats Forum" dated 26 April 2018, where the content was obviously different originally and then replaced with Vietnamese spam (I'm not linking to it for obvious reasons).
I've recently been on the forum a lot, and when I see any new post by someone I don't recognise, I check it. First I look at the post, and think about whether the title is written in a matching style to the text; then I look at the first few words and see if they also appear in the 'Related' bar to the right below one of the titles, and if they do, I look at that other post. I also might hover over the user name or avatar of the NASxxxxx poster to get a pop-up that shows how many 'points' they have; or follow the link on that user name to see their profile. So far, for Copybot, there's been nothing written on the profile, and there are 7 or 14 'points'. (An account gets 7 points for each thread started, and 5 for a reply, so 21 might also be suspicious, but we haven't seen a single account as active as that yet.) You can also check the 'Activity' tab of the profile to see if the posts are consistent and genuine.
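For anyone who likes to see the points arithmetic written down, here is a minimal Python sketch. It assumes the only sources of points are the 7 per thread and 5 per reply mentioned above (the real Telligent scoring may well include other things), and the function names and the cut-off of three threads are my own invention for illustration.

```python
def thread_reply_combinations(points, per_thread=7, per_reply=5):
    """Return every (threads, replies) pair that adds up to `points`."""
    combos = []
    for threads in range(points // per_thread + 1):
        remainder = points - threads * per_thread
        if remainder % per_reply == 0:
            combos.append((threads, remainder // per_reply))
    return combos


def looks_like_copybot_score(points, max_threads=3):
    """True if the score can only come from a few new threads and no replies.

    max_threads=3 is an assumed cut-off (7, 14 or 21 points), not a forum rule.
    """
    combos = thread_reply_combinations(points)
    return bool(combos) and all(
        replies == 0 and 0 < threads <= max_threads
        for threads, replies in combos
    )


for score in (7, 12, 14, 21, 35):
    print(score, thread_reply_combinations(score), looks_like_copybot_score(score))
```

A score of 12 can only be one thread plus one reply, so it is not flagged, whereas 7, 14 and 21 can only be threads with no replies. This is only a hint to look closer; a genuine new user who has asked one question also has 7 points.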
If still suspicious, I also see if there are distinctive words or phrases and search to see if those have happened before. For example if the phrase 'depersonalization symptoms' appears, that's pretty rare with an unusual spelling, so I can put that into the search bar at the top and press 'Return' - if it shows a previous thread I check that. You can also check using a standard web search engine, by taking half of a well-written sentence (maybe six to ten words or so), putting double quotation marks (") around it and searching - if it only comes up with the latest NAS page, I'd assume it's not Copybot and we have a welcome post from a new user. If it comes up with other, older hits (I've not seen any from outside the NAS site yet, but it's possible), then I compare the two passages to see if they are more or less identical, and if the new post really is a copy.
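The 'compare the two passages' step can be roughed out in Python with the standard-library difflib module. This is only a sketch of what I do by eye: the function names and the example texts are made up, and it measures what fraction of the words in the new post also appear, in the same order, in the old one, so a copy that has had its sign-off chopped off still scores highly.

```python
from difflib import SequenceMatcher


def words(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def copied_fraction(new_post, old_post):
    """Fraction of the new post's words found, in order, in the old post."""
    new_words, old_words = words(new_post), words(old_post)
    if not new_words:
        return 0.0
    matcher = SequenceMatcher(None, new_words, old_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(new_words)


# Made-up example texts, not real forum posts.
old = ("I have had depersonalization symptoms for years and nobody "
       "seems to understand. Any advice? Thanks, Sam")
suspect = ("I have had depersonalization symptoms for years and nobody "
           "seems to understand. Any advice?")
genuine = "Where can I buy a good bicycle helmet in Leeds?"

print(copied_fraction(suspect, old))
print(copied_fraction(genuine, old))
```

Anything close to 1.0 is almost certainly a copy; where to draw the line below that is a judgment call.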
If it looks genuine to me, I may like the post, or try to add a quick response, hoping other regulars know I check for Copybots. (It probably isn't appropriate to just say 'you're not a bot' politely, and ignore what the real human poster has said.)
If I find it's a copied post, what I do is:
[OK, it really does get dull and technical after this.]
Then it's up to the moderators to lock or delete the post as appropriate. Maybe more abuse reports from different people catch the moderators' attention more. If someone has added a valuable additional reply, I don't see any problem in locking the thread so that reply, and the link to the original thread, are still available. They may want to reassign the post to 'Deleted user' to prevent the spammy user from posting more copies or spam, but
If no obvious action is taken, then I suppose we can communicate with the moderators by mentioning them in this thread, via Direct Message if we've already had a message from them, or the firstname.lastname@example.org address. Forum rules are here by the way: community.autism.org.uk/.../rules
If this becomes a bigger problem, something more may need to be done until Copybot gives up. DongFeng5 suggested using a 'hash' of the text of a post to check for duplicates in an automated way, or using the type of software that claims to score plagiarism by students. I think this is something NAS would have to suggest to the software suppliers as a feature request. I know a bit about this subject (I've written hundreds of anti-spam regexes for a job), and a 'fuzzy hash' should be possible and should cope with minor text changes. However, Copybot may also copy anything about autism from other sites so as not to be detected (someone said copied text from an article about baseball had also been used), or possibly use a Markov chain fed from multiple sources to generate random, but vaguely realistic, text. (We have also seen a short post by NAS38283, keyed to the forum title, which may be from the same bot or a different one.)
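To show roughly what I mean by a 'fuzzy hash', here is a minimal Python sketch. It is not how Telligent or any real anti-spam product works; it is one simple technique (hashing overlapping three-word 'shingles' and comparing the sets with a Jaccard score), and the function names, shingle width and example texts are all my own assumptions.

```python
import hashlib
import re


def shingle_hashes(text, width=3):
    """Return a set of short hashes, one per run of `width` consecutive words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return set()
    count = max(len(tokens) - width + 1, 1)
    shingles = (" ".join(tokens[i:i + width]) for i in range(count))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def similarity(text_a, text_b):
    """Jaccard similarity of the two shingle sets: 0.0 (unrelated) to 1.0."""
    a, b = shingle_hashes(text_a), shingle_hashes(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up example texts, not real forum posts.
original = ("I find it very hard to talk to new people at work and I do not "
            "know how to start. Any tips would be welcome. Thanks, Alex")
truncated_copy = ("I find it very hard to talk to new people at work and I do "
                  "not know how to start. Any tips would be welcome.")
unrelated = "The baseball season opened with a surprising win for the home team"

print(similarity(original, truncated_copy))
print(similarity(original, unrelated))
```

An ordinary hash of the whole post would change completely when the sign-off is dropped, which is why the comparison is done on small overlapping pieces instead. It would not help against text copied from other sites or generated text.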
Copybot seems unable at the moment to set an alias, avatar photo or profile text on Telligent. Therefore requiring a non-default alias in order to post may stop Copybot until its full features are implemented. It has been suggested that requiring some kind of name would at least overcome the problem of not being able to tell the difference between 'NASnnnnn' users. If it is possible to require this in the current forum software settings it would seem worth doing. The accessibility problems with screening signups with reCAPTCHA are probably prohibitive given the many people with communication difficulties, and a maths Captcha probably wouldn't work. The software does have an option for custom fields to be mandatory. On some other forums, a bot sometimes posted spam in Vietnamese about cosmetics and pills and called itself 'amelinda' or 'philomena', so requiring a non-default alias to post may or may not stop Copybot.
StopForumSpam.com seems to be tracking a lot of related spammers, and there should be a free plugin for SFS for Telligent, although it's not listed on the SFS site. See also Project Honeypot, another free anti-spam service which is basically an IP address blocklist. A simple addition would be to use GeoIP to check for forum submissions from particular Asian countries or, if that's not possible, to explicitly ban or firewall the main Vietnamese ranges.
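As a very rough illustration of the 'ban particular ranges' idea, here is a Python sketch using only the standard-library ipaddress module. The ranges below are reserved documentation addresses, not real Vietnamese or spammer ranges, and the function name is hypothetical; a real setup would take its list from a GeoIP database or a blocklist service, which I haven't tried to show.

```python
import ipaddress

# Placeholder ranges reserved for documentation (RFC 5737 and RFC 3849).
# These are NOT real spammer or country ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(address, networks=BLOCKED_NETWORKS):
    """True if the address falls inside any of the blocked networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


print(is_blocked("203.0.113.57"))
print(is_blocked("198.51.100.4"))
print(is_blocked("2001:db8::1"))
```

Blocking by range is blunt: it would also shut out any genuine users in those ranges, so it is probably better used to hold signups for moderation than to refuse them outright.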
Making the site HTTPS, partly to protect anyone from having their site password compromised if using unencrypted wireless, has also been suggested. This was done in June. It had no effect on Copybot. A related consequent suggestion was permitting non-alphanumeric characters in passwords.
[Oh, blimey. I do go on.]
We can also use this thread to report any new instances of Copybot, although I think adding a comment identifying it as Copybot and reporting it as abuse, as described above, is better. Perhaps mentioning the NAS number without linking would show a useful pattern in the spam signups.
The weather forecast for today, Thursday 7th June 2018 is: no Copybot sightings. Nothing on Friday either, so we're doing well. In fact I haven't noticed a peep out of it until:
Saturday 16 June.
Tuesday 19 June:
Thursday 21 June:
Friday 29 June:
Thursday 5 July:
Friday 6 July:
Mon 9 July:
Weds 11 July:
Thurs 12 July
Friday 13, copybotageddon
Saturday 14 July
Sunday 15 July 8am.... coast clear so far.
What was wrong with the forum over the weekend?
It did not work.
For ease of reference, the response from WebPM about times when the web server is unavailable is here:
A couple more copybot threads this morning. I'm listing them at the bottom of the head post here, and have marked them for the moderators to deal with. (I don't want to keep bumping this thread to the top of more important conversations.)