Posts Tagged ‘blogging’

Blog Comment Spam is Not Solved

Tuesday, July 14th, 2009

With all the comment spam, trackback spam, and pingback spam out there, developers have created some pretty powerful anti-spam tools. So why did I create a small, not-so-powerful anti-spam WordPress plugin like O RLY?

Here’s a screenshot of my pending comments a little while back. Notice the second comment, which slipped past Akismet:

o-rly-spam-comments1

Apparently some dude named Casey Fronczek wanted to let my readers know about his fishing trips. I clicked on the O RLY button, and here’s what Google had to show me:

o-rly-spam-comments2

This spam comment showed up about 17,000 times!

This is an interesting case because it shows that spammers aren’t always looking to place links or pass PageRank. They are always looking for some kind of payoff though, and you can see the roundabout technique here. The hope, presumably, is that anyone interested in fishing trips in southern Florida will Google this guy’s relatively unique name, and that the search will end in a sale. You may also see phone numbers, ICQ or other IM accounts, and similar contact information in some comment spam.

This kind of spam is a little tougher to delete automatically: a spammy link is a really good signal for an automated filter, and this comment doesn’t have one. Hopefully if people have enough little tools, we bloggers can improve the state of the web as a whole. Get the plugin from WordPress.org, and please let me know of other good anti-spam plugins in the comments.
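To make the "no link, but contact details" pattern concrete, here is a minimal Python sketch of a heuristic that reports which contact-style signals a comment contains. The function name, the regular expressions, and the sample text are all assumptions made up for illustration. This is not how Akismet or O RLY works, and patterns this loose would misfire on plenty of legitimate comments.

```python
import re

# Hypothetical patterns, for illustration only; real filters use far more signals.
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
ICQ_RE = re.compile(r"\bicq\b\D{0,10}\d{5,10}", re.IGNORECASE)


def contact_signals(comment: str) -> list:
    """Return the names of the contact-style signals found in a comment."""
    signals = []
    if URL_RE.search(comment):
        signals.append("link")
    if PHONE_RE.search(comment):
        signals.append("phone")
    if ICQ_RE.search(comment):
        signals.append("icq")
    return signals


# A comment with a phone number but no link still gets flagged.
print(contact_signals("Great fishing trips, call 555-123-4567 today"))
```

A comment that trips "phone" or "icq" without "link" is the roundabout case described above, so at most it deserves a second look in the moderation queue, not automatic deletion.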

Recommendations for an easy, automatic blogging system?

Monday, June 15th, 2009

I’m looking for some help and suggestions, but first a little background on my latest project.

I’m a bit of a map geek – I’m fascinated by maps and how data can be illustrated with maps. I periodically post things on this blog but I actually run across a lot more cool map apps than I can share in mid- to long-form blog posts here.

I use a number of different social bookmarking and social news sites – it’s a research interest of mine, so I probably have accounts on far too many of them. When I come across a blog post on a cool old map or some interesting new real estate geodata site I’ll save/share it in a number of places, including StumbleUpon, Delicious, Reddit, and sometimes others. I also share things via Google Reader.

This is far too diffuse, so I thought I might make a separate mini-blog just for map geekery. But I already spend more than enough time with the blogs and services I’m using now – I’m only able to support another blog if I can automate some part of this giant messy workflow.

This would be pretty similar to how I manage my microblogging / status updates now. I have my Google Reader items posted to FriendFeed, which updates Twitter, which updates Facebook via the Facebook Twitter app. Convoluted, but now that it’s set up I can post something once and have it seen by friends on different services.

I’ve played around with a few different services:

Tumblr – Tumblr makes it very easy to import feeds, which is great for what I’m looking for. The only drawbacks are that so far I can’t narrow down some feeds to really target map bookmarks and I don’t see any easy way to add geodata.

Vox – I’ve only played around with it a bit, but I’m not sure what sets Vox apart from other blog hosts.

WordPress.com – Actually, I thought this would be perfect given the right plugins, but wordpress.com doesn’t have plugins. Setting up and managing yet another WordPress instance doesn’t sound too appealing.

Blogger – Blogger is great, and I should probably use it a bit more considering it’s a Google product. Unfortunately everything I saw in a quick search about posting to Blogger from RSS showed up on somewhat questionable SEO blogs, so I’m wary.

So I’m still looking. Any recommendations on what would be the easiest tiny-blog system to use?

Sick of compliment spam on your blog?

Sunday, May 31st, 2009

One of the great things about having a blog is getting comments on your posts. It’s particularly gratifying when someone takes the time to tell you that your post was helpful, entertaining, or well-written.

Spammers know this and exploit it by generating compliment spam. They’ll put together a few lines of general praise and slather them across the web, hoping that bloggers will fall for the trick and post their spammy links.

Abusive social engineering like this really annoys me, so when in doubt I always do a Google exact phrase search to see if the compliment is really for me and not from a bot. This is tedious, so I created a simple WordPress plugin: O RLY Comment Spam Search.

You can get the plugin directly from WordPress.org, where you can also give it a rating to tell other webmasters how great (or non-great) it is. By the way, the plugin browser/installer added in WordPress 2.7 is very cool, and makes it much easier to try out plugins.

Judging by the thousands of blogs my O RLY searches have found, this sort of spam works. But why do spammers do it? Since WordPress (and most major blog systems) nofollow links in comments by default, the spammers can’t expect to gain any PageRank from these links. My guess is most of this spam is either intended to get traffic via clickthroughs or is generated by naive site owners, SEOs and marketers who don’t really understand how things work.

Take a look and let me know if it’s useful in the comments below. Also, let me know if it’s breaking on certain comments or otherwise buggy.

I Love Hospitals With WiFi, or Twittering Childbirth

Tuesday, December 2nd, 2008

When we were looking for hospitals and doctors’ offices for little Athena, wifi wasn’t really on the list so much as reputation, compatibility with our insurance, and other concerns.  In retrospect, though, thank goodness Stanford Hospital and Palo Alto Medical Foundation have wifi.

We live more than 2,000 miles from most of our family.  Not all of them could make the flight to California for the birth.  We also have too many friends around the country to possibly make all the phone calls we’d have liked to have made that night.  In addition, we had several thousand people all over the world wondering which name we would pick for our baby.

Because of internet connectivity, I was able to do a fair job of including all of them in the process:

1) With my iPhone, I was able to take and post photos during labor and delivery.  Photos of my mom’s new granddaughter were available for her, on Flickr, within minutes of birth:

Wrapped and swaddled

I’m not sure I can properly express here how much it meant to her and the rest of our family to be able to see Athena so quickly.

2)  Using the Twitterrific app on my iPhone, I was able to post updates to Twitter throughout the whole labor.  This is a perfect example of what Twitter is good for.  Liveblogging while my wife endures the pains of childbirth would be ridiculously insensitive, but there were always minutes of downtime here and there to tap out a few words describing what’s going on.

live-twittering

3)  Using the Twitter App for Facebook, my updates showed up on my Facebook status as well.  This was a big help, since so many more friends and family use Facebook than Twitter.

A fourth option, which we didn’t use but might have had the labor been longer, was videoconferencing with Skype.  We’ve been using Skype to keep in touch with family for some time.  It is currently my grandmother’s favorite thing to do.  Since we’ve been back home Athena has become the star of many family video sessions.

One final thing I have to mention is YouTube – we certainly weren’t going to share the gooey miracle of life with the world in streaming video, but my wife followed the videos of several other women during pregnancy up to and including labor.  We don’t know a lot of other couples having kids right now, so that gave Ann a personal connection with their stories and helped her through some of the tougher times during the last 9 months.  She could see that other people were going through the same things she was and that was an important comfort.

The common theme here, which I think goes a long way toward explaining the growth of the internet as a whole, is communication.  Because of almost universal connectivity, we were able to turn a deep personal experience into a social experience as well.

Okay, we should dump the Electoral College – but no need to spam my blog!

Tuesday, November 4th, 2008

One of the things I mentioned in my last post was how the Electoral College distorts the vote in favor of those in small-population states.

I got a comment from NationalPopularVote.com along these lines…

The major shortcoming of the current system of electing the President is that presidential candidates concentrate their attention on a handful of closely divided “battleground” states. In 2004 two-thirds of the visits and money were focused in just six states; 88% on 9 states, and 99% of the money went to just 16 states. Two-thirds of the states and people were merely spectators to the presidential election. Candidates have no reason to poll, visit, advertise, organize, campaign, or worry about the voter concerns in states where they are safely ahead or hopelessly behind. The reason for this is the winner-take-all rule under which all of a state’s electoral votes are awarded to the candidate who gets the most votes in each separate state.

They make a good argument, and I agree with them, but I find it pretty reprehensible that they are spamming blogs to make their point.  If the comment is on-topic, why do I call it spamming?  Grab a snippet of text and do an exact-phrase Google search by wrapping quotes around it, like this:

http://www.google.com/search?q=%22of+the+states+and+people+were+merely+spectators+to+the+presidential%22

As of this writing there are 233 occurrences of the exact same comment slathered all over the web.  It looks like they’re using an automated program to watch Technorati or Google Blog Search for posts about the Electoral College and autopost the same comment.  I’m going to email them and ask that they stop.

The phrase search is a good technique for discovering compliment spam as well.

In any event, we already looked at the fact that some people’s votes count four times as much because they live in a state with a small population.  Do the NationalPopularVote.com folks have a point about swing states?

One way to look at the power of your vote is to figure out the likelihood that yours will decide the election.  By this definition, it helps a little to be in a small-population state but the most important factor is how close the election is in your state – your best bet is to live in a swing state.  Andrew Gelman has a great article explaining why, but it’s pretty intuitive.  If you live in a safe Democrat state, for example, it doesn’t matter if you live in California or Rhode Island – your vote is much less likely to be the one to flip the state to one side or the other.  The same is true for safe Republican states, from Texas to Wyoming.

Comment Spam Article on the Google Webmaster Central Blog

Friday, October 3rd, 2008

I hate comment spam. I think it’s safe to say we all do. So how do you keep it off your blog or forum? Check out this article I wrote on the Google Webmaster Central Blog with some ways to prevent comment spam.

It’s interesting that one of the commenters brings up compliment spam – I just wrote about it on this blog a little while ago.

This was pretty cool for me, because I can’t really share much about my work at Google. It’s also fun to see my text translated into German.

Next up I’ll post an update on the baby name poll with more fun charts and graphs.

Quick Tip: Keeping Comment Compliment Spam off your Blog

Sunday, September 7th, 2008

Blogs are great because they give you a creative outlet and let your readers comment on your posts, making it a much more social experience.  But spammers take advantage of comment forms, using scripts and bots to fill the web with links back to their site.

What can you do about it?  Even with captchas, systems like Akismet, and other automatic techniques (you can read more about these here), some spam will slip through.  Specifically, compliment spam.

What is compliment spam? Spammers know you and I like to be told what great writers we are, how helpful our posts are, and that we are brilliant geniuses.  So they set their bots to spam you with complimentary comments that just so happen to link back to their crappy blog, online casino, or fake viagra store.  Here’s an example:

Typolight
http://www.typolight-blog.de | info@typolight-blog.de | 82.146.49.61

Thanks, you nice post that helped me alot.

From Keep your WordPress site from being hacked with automatic upgrades, 2008/09/06 at 9:27 AM

So, at first glance this looks like a legit comment.  The post in question was a “how-to”, so it would be nice to hear that someone found my instructions helpful.  But, do a Google search with the comment in quotes (an exact phrase search) and you’ll see the problem:

http://www.google.com/search?q=%22Thanks%2C+you+nice+post+that+helped+me+alot.%22

At the time of this writing, we see 168 instances of this exact comment.  By this same Typolight person.
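For anyone who would rather not hand-build these search links, here is a minimal Python sketch of the same exact-phrase trick. The function name is an assumption made up for this example; the URL format is simply the one shown above, and the sketch makes no attempt to handle comments that already contain quotation marks.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(snippet: str) -> str:
    """Build a Google search URL for an exact-phrase (quoted) query."""
    # Collapse runs of whitespace so line breaks in a pasted comment
    # do not end up inside the query.
    phrase = " ".join(snippet.split())
    return "http://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


print(exact_phrase_search_url("Thanks, you nice post that helped me alot."))
# prints http://www.google.com/search?q=%22Thanks%2C+you+nice+post+that+helped+me+alot.%22
```

For a long comment, passing a distinctive sentence from the middle tends to work better than the whole text, since spammers often vary the opening and closing lines.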

So that’s my tip – if a comment seems a bit too randomly complimentary, throw it in quotes and do a Google search. Then, if it’s spam, make sure to spam it – systems like Akismet only work because we’re all reporting spam.

If you really want to go after the spam poster, you can also give their site a bad rating on Web of Trust, StumbleUpon, and other reporting systems.

Maybe if I get some time I’ll throw together a WordPress plugin to make this easy to do.  If you’d like a plugin like this (or have other tips), drop me a comment and it will help motivate me.

Doing my small part to preserve digital history

Monday, August 18th, 2008

Years ago, in an undergrad course, one of the school’s librarians gave a talk about the big risk of the move to digital publishing – historical preservation.  We know what the ancient Greeks thought in part because their words were carved into stone – would we be so lucky if they had used floppy disks?

I wasn’t completely convinced that the situation was so dire then, and I’m still not really worried.  The production and storage of information continues to grow exponentially, and I think the real problem for future archeologists will be dealing with information overload rather than some hypothetical gap in the written record.  But I have been thinking a lot about my own digital history lately so I spent part of this weekend looking at old papers from college and publishing them on my site.

I don’t think my meager efforts will be much help to future historians (much less reverse the entropy of the universe), but I did find some interesting stuff that I probably should have posted for the world to see a long time ago.

For example:

The more I dig up and paste into my WordPress archives the more I realize a few things.  First, a distinct lack of content between undergrad and grad school – I’m doing a much better job of writing without assignments now than I did then.  Second, a hard drive crash in 2003 resulted in a gap in my saved emails – this hurts more now that I’m looking back through things.  Finally, I need to make a point, for the rest of my life, to just put things out there. It seems like such a shame that I put work into these docs just to have them rot on my hard drive.

I know some of my co-workers, Reid and Wysz, have gone through the process of resurrecting old content to their current website.  Anyone else thinking about doing something similar?  What prompted you to do so?  Or, what prevented you?

Update to Altocumulus WordPress Tagging Plugin – version 0.2

Wednesday, August 6th, 2008

Screenshot of my tag cloud WordPress plugin in action

Everyone has tag clouds all over the web, but are they really useful?  Altocumulus is an attempt to use tag clouds as a real navigational system in WordPress blogs.

Install the plugin and it will automatically put a cloud of related tags at the top of all your Category and Tag pages.  Hopefully this will serve two purposes:

  1. Users who end up on a general category page can click through to a more specific (or more relevant) tag page, and
  2. It should give users a general idea of the topic of the posts on that archive page, increasing the information scent.
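The post doesn’t spell out how the related tags are chosen, so the following is only a sketch of one plausible approach: rank tags by how often they show up on the same posts as the current tag. It is written in Python rather than the plugin’s PHP, and the function name, the data layout (one set of tags per post), and the sample posts are all assumptions for illustration, not Altocumulus code.

```python
from collections import Counter


def related_tags(posts: list, current_tag: str, limit: int = 10) -> list:
    """Rank tags by how often they appear on posts that carry current_tag.

    posts is a list of tag sets, one set per post.
    """
    counts = Counter()
    for tags in posts:
        if current_tag in tags:
            counts.update(t for t in tags if t != current_tag)
    # most_common sorts by count, highest first.
    return counts.most_common(limit)


# Hypothetical sample data: three posts and their tags.
sample_posts = [
    {"spam", "wordpress", "akismet"},
    {"spam", "wordpress"},
    {"maps", "wordpress"},
]
print(related_tags(sample_posts, "spam"))
# prints [('wordpress', 2), ('akismet', 1)]
```

The counts could then drive the font size of each tag in the cloud, and the limit argument corresponds to the kind of "number of tags" option an options screen might expose.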

Next version I’ll add an options screen where you can change the number of tags, placement, etc.

Please drop me a note if you run into any bugs or are using it on your blog.  Let me know if you have any ideas you’d like to see implemented, too – I am all about implementing and studying folksonomies.  The more folks who are interested, the more likely I am to add features.  Thanks.

Download the Plugin Here

How to keep spam off your blog, bulletin board, or forum

Thursday, July 17th, 2008

Spam, it’s not just for breakfast and email anymore.  Webspam is a huge problem – if you run a blog or a forum, you’re probably familiar with the gobs and gobs of gibberish being posted all over the web by spammers.

This humble blog, which only gets a few hundred visitors per day, has had over 17,000 spam comments since I moved over to WordPress last year.  Having your site inundated with comment spam can be just as big a headache as getting hacked.  No one wants to spend hours every day sorting the good posts from the bad.  I’ve already written about how to totally clear out a spammed forum and erase all traces of its reputation-marring existence, but the best solution is prevention.

Here are some steps you can take to help prevent spam on your blog or forum.

Keeping Spam off Your Blog

This section assumes you’re hosting your own blog and can add plugins and make configuration changes, and my examples will be WordPress-heavy because I’m more familiar with WordPress.

Option 1:  Close or restrict comments. Most blogs give you some options to restrict who can comment on articles.  In WordPress, you can require that users create accounts to comment under Settings -> General.  This might not help too much since I’ve seen hundreds of automated user accounts created right alongside the spam.

You can also require that comments are approved before they appear – in WordPress look under Settings -> Discussion.  This will stop your blog from being graffitied without your knowledge but also requires manual effort.  You can also disallow trackbacks and pingbacks, which are really cool in theory but a major avenue for automated spam.

You can also shut down comments completely, or disable comments on old posts.  At that point you may be throwing the baby out with the bathwater, but it’s certainly effective.

Option 2:  Make sure commenters are real people with a captcha. Even if you’re not familiar with the term, you’re familiar with captchas.  They’re the little widgets at the end of a form where you have to decipher some scrambled text from an image.  Many blogs have captcha options built in, but if you’re looking for a captcha plugin be sure to balance usability with security.

I’ve used the Did You Pass Math plugin with some success.  Jeff Atwood has used an extremely simple captcha for years on his high-traffic blog.  Recaptcha is a really cool project that helps fight automatic posting and digitize old books at the same time.
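To show how little it takes to build the simple end of the captcha spectrum, here is a Python sketch of an arithmetic challenge. It is not the code of the Did You Pass Math plugin or of any other plugin mentioned here; the function names and the choice of single-digit addition are assumptions for illustration.

```python
import random


def make_challenge(rng: random.Random) -> tuple:
    """Return (question_text, expected_answer) for a small addition problem."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_answer(submitted: str, expected: int) -> bool:
    """Accept the comment only if the visitor typed the expected number."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input (including an empty field) fails the check.
        return False


question, answer = make_challenge(random.Random())
print(question)
```

In a real comment form the expected answer would need to stay on the server (in a session, for example) and not travel in a hidden form field, or a bot could simply read it back.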

Option 3:  Use an automatic filtering system. If you’re using WordPress, I have three words for you:  Akismet, Akismet, Akismet! Seriously, Akismet is so good at automatically marking spammy comments and trackbacks that it’s almost scary.  If you’re not using WordPress, you may still be able to find an Akismet plugin for your blogging platform.  There are other systems worth trying as well such as Spam Karma but I have less experience with those.

Keeping Spam off Your Forum

Again, I’m assuming you are hosting the forum yourself or can otherwise make config changes.  I’ll use phpBB (version 3) as an example because I’ve used it in the past.

Option 1:  Restrict user accounts. This can be a tough call, because when you start a forum you want to make it as easy as possible for people to join in the discussion.  Unfortunately, allowing anyone to register and begin posting without any admin approval also opens the door for spammers.

In phpBB this setting can be found in the Administration Control Panel under Board Configuration -> User Registration Settings.

Option 2:  Again with the captchas. Captchas aren’t 100 percent guaranteed to remove spam but they do help.  If your forum software doesn’t have a captcha or a captcha plugin, I would seriously consider upgrading to a version that does or switching forums completely.  I know it’s a huge pain but waking up one morning to find 10,000 spam posts is even worse.

In phpBB3 look under Board Configuration -> User Registration Settings for a setting called “Enable visual confirmation for registrations” and make sure it’s turned on.  You can change the details under Board Configuration -> Visual confirmation settings.

Option 3:  Try to find an automatic filtering system. This is harder than for blogs.  There was an Akismet phpBB mod but it’s apparently not being maintained.  There’s a workaround involving the Spam Words mod that you can read about here.  The Spam Words mod might be worth trying on its own too.  Here’s a thread with more options for phpBB2; search around and find what’s available for your forum software.

Even without automated filtering, you can try to slow down the spammers by setting a time limit between posts (most human beings don’t type as quickly as spambots do).  Other options, such as disallowing links and BBCode, are pretty drastic but might make your forum less enticing to spammers.
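The time limit between posts is easy to picture in code. Below is a minimal Python sketch of the idea, independent of any particular forum software: remember when each user last posted and refuse anything that arrives too soon afterwards. The class name, the 30-second default, and the in-memory dictionary are all assumptions for illustration; a real forum would keep this in its database and use its own settings.

```python
import time


class FloodControl:
    """Reject a post if the same user posted less than min_interval seconds ago."""

    def __init__(self, min_interval: float = 30.0):
        self.min_interval = min_interval
        self._last_post = {}  # user id -> timestamp of last accepted post

    def allow(self, user_id: str, now: float = None) -> bool:
        # The caller may pass a timestamp (handy for testing); otherwise
        # use a monotonic clock so system clock changes do not matter.
        now = time.monotonic() if now is None else now
        last = self._last_post.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_post[user_id] = now
        return True


limiter = FloodControl(min_interval=30)
print(limiter.allow("new_user", now=0.0))   # first post is accepted
print(limiter.allow("new_user", now=5.0))   # five seconds later is too soon
```

Rejected attempts deliberately do not reset the timer here, so a human who clicks too early only has to wait out the remainder of the interval.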

Just for fun:

Spam, spam, bacon, and Spam