Fixing a ‘This site may harm your computer’ warning, part 3: Clearing a spammed forum

Earlier I wrote about the steps you should take if your site has been hacked and is being slapped with a “This site may harm your computer” label. In that post we covered some of the sneaky ways scammers will insert text into your posts on WordPress and other blog software.

But what if it’s even worse? Let’s say you installed a forum like phpBB to play around with but haven’t been keeping up with security updates. Or, worse still, your FTP account has been compromised and spammers have installed their own bulletin board or other content in a subfolder or subdomain. You don’t want Google and Yahoo thinking you’re a spammer, so what do you do?

In that worst-case scenario, you’ll first need to change your passwords and make sure you have control of any and all FTP accounts, Telnet accounts, and so on. You may need to work with your host to make sure everything is locked down. Web server security is a big topic in its own right, so from here on out we’ll assume you’ve already got that covered.

Step 1 – Delete the spam!

The first thing to do is delete the spammy bulletin board. Go ahead and delete all the contents of the directory, but don’t delete the directory itself quite yet. This does two things: it stops the spammers from getting any benefit from wayward visitors to your site, and it causes your web server to start serving 404s (Not Found) to search engine spiders.
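If you have shell access, this is quick to do from the command line. The path below is just an assumption for illustration; point it at wherever the spam actually lives on your server:

# A sketch only -- substitute your actual document root and directory name.
cd /var/www/example.com/forum || exit
rm -rf -- * .[!.]*   # clears visible and hidden files, keeps the directory itself

The guard on the cd line is there so the rm never runs in the wrong place if the path is mistyped.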

You can go one step further and explicitly tell browsers and spiders that this stuff is gone forever by serving a 410 (Gone). You can do this with any server-side language; my example is in PHP. Create a new index.php file in your formerly-spammed directory that looks like this:

<?php
// Tell browsers and search engine spiders this content is gone for good.
header("HTTP/1.1 410 Gone");
header("Status: 410 Gone"); // variant some CGI/FastCGI setups need

That covers the directory’s index page; you can then use mod_rewrite to route all the deleted pages to your 410 file, as sketched below.
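Here’s a minimal sketch of those rewrite rules, assuming Apache with mod_rewrite enabled and an .htaccess file placed inside the cleaned-out directory itself:

RewriteEngine On
# Any request for a now-deleted file gets routed to the 410-serving
# index.php; index.php itself still exists, so the -f test skips it.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* index.php [L]

You can spot-check the result by running curl -I against any of the deleted URLs – the first line of the response should read HTTP/1.1 410 Gone.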

Step 2 – Update your robots.txt

At this point search engine spiders will be able to figure out that the pages should be removed from their indexes, but only one page at a time as they re-crawl your site. You want it out of there ASAP, so create a robots.txt entry to tell spiders to stay away from the whole directory. It should look something like this:

User-agent: *
Disallow: /forum/

If the spam was in a subdomain, you’ll need to make sure you have a robots.txt file in the root directory of the subdomain that disallows the whole thing:

User-agent: *
Disallow: /

Step 3 – Tell Google about the spam

Log in to Google Webmaster Tools and look under Tools -> Remove URLs. Create a new removal request for the subdirectory or subdomain you’ve cleaned. This might seem a little redundant, since you’ve already done two steps that will let search engines know you’re no longer serving up spam. But it’s worth being as explicit as you can to get your site’s reputation cleared as quickly as possible.

Bonus tip:  Subdomains and Google Webmaster Tools

If your spammed forum was in a subdomain, let’s say http://forum.example.com, you’ll need to add the subdomain as a new site in Google Webmaster Tools. You’ll need to go through the site verification process for the subdomain, too – it won’t verify automatically the way it would if you had added a subdirectory as a new site.
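For reference, the meta-tag route to verification just means dropping a tag Google generates for you into the <head> of the subdomain’s home page. The tag below is a placeholder, not the real thing – copy the exact name and token that Webmaster Tools shows you:

<!-- Placeholder only: use the exact tag from Webmaster Tools' verification page -->
<meta name="google-site-verification" content="your-unique-token-here" />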

By the way, if you’d like some more tips about keeping your site clean and tidy, check out this great post on the Google Webmaster Central Blog.

Any questions? Comments?  Tips that I’ve missed?  Please post in the comments section below.

5 thoughts on “Fixing a ‘This site may harm your computer’ warning, part 3: Clearing a spammed forum”

  1. Wow, really thorough and useful post. I haven’t taken the initiative to make a forum, basically because I know it wouldn’t get much use and it could be vulnerable to spamming. The good news is that I don’t see the “This site may harm your computer” warning anymore! Yay!

  2. That’s a pretty good summary, Jason! However, I would recommend one change: do not disallow the folder in your robots.txt. By disallowing it, you’re restricting search engines from recognizing that the bad content is no longer on your site (the “disallow” is a crawl directive, not an indexing directive). You need to allow them to recrawl those URLs and to see either a page-not-found (404) or your gone (410) result code.

    Another blog post to look at is http://googlewebmastercentral.blogspot.com/2007/09/quick-security-checklist-for-webmasters.html .

Comments are closed.