Tag Archives: webspam


How to get Google search results for academic research

A few years ago, before I was a Googler, I was a grad student doing research on information retrieval. I wanted to compare the results of Google and other search engines with folksonomies from social bookmarking sites. It sounds pretty simple – Google does lots of internal search quality studies, so it's not too surprising that outside researchers would want to execute lots of queries and use the results in their data.

The way I did it was… not optimal, to say the least. I wrote a bunch of PHP code, spaced out participant sessions, etc. to make sure I could get results back. Google tries to make sure that spammers aren’t scraping search results to generate webspam, so any kind of scraping with cURL, Beautiful Soup, etc. can result in a big pile of failure.

The way I did it wasn’t the right way or the easy way, so when I got the job I made a mental note to ask around for the best way to get search results. Then I forgot all about it until an email exchange with Gary Warner of CyberCrime & Doing Time fame.

It turns out Google has a great University Research Program and API. You have to apply for registration and let us know who you are, what school you're affiliated with, and what you plan to study. Assuming everything checks out, you'll get access to a pretty nice API. There's some example Python code, but you could just as easily use PHP, Java, or whatever to consume the XML responses.
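The post doesn't spell out what those XML responses look like, so here is a minimal sketch using only Python's standard library, assuming a hypothetical layout: a `<results>` root holding ranked `<result>` elements, each with a `<url>` and a `<title>`. The element names, `SAMPLE_RESPONSE`, and `parse_results` are all illustrative, not the program's actual schema or sample code.

```python
import xml.etree.ElementTree as ET

# Hypothetical response layout; the real research API's schema may differ.
SAMPLE_RESPONSE = """<results query="folksonomy">
  <result rank="1">
    <url>http://example.com/a</url>
    <title>Example A</title>
  </result>
  <result rank="2">
    <url>http://example.org/b</url>
    <title>Example B</title>
  </result>
</results>"""


def parse_results(xml_text):
    """Return (rank, url, title) tuples from a hypothetical XML results page, sorted by rank."""
    root = ET.fromstring(xml_text)
    hits = []
    for result in root.findall("result"):
        rank = int(result.get("rank"))
        url = result.findtext("url", default="").strip()
        title = result.findtext("title", default="").strip()
        hits.append((rank, url, title))
    return sorted(hits)


if __name__ == "__main__":
    for rank, url, title in parse_results(SAMPLE_RESPONSE):
        print(rank, url, title)
```

Once results are in plain tuples like these, comparing them against other data (say, URLs from a social bookmarking site) is ordinary list and set work, with no scraping involved.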

And that research I was doing? I recently noticed that my paper has been cited 7 or 8 times, according to Google Scholar. I used to joke that I had written the least influential paper in the history of academic publishing, but I guess I can’t claim the title anymore. Scopus only shows 4 citations so I will remain humble anyway.

Is This A Scam? Find out with a Google Custom Search Engine

In my Google Blog article about avoiding get-rich-quick scams, I recommended doing a web search to see what other people are saying about any site you're unsure about. The internet is a big place – chances are if it's a scam, someone else has already fallen for it and they're already complaining on their blog or in a forum somewhere.

The only problem with doing a general web search is that not every site on the web is guaranteed to have good information. Some forums are more useful than others, and in the worst cases scammers and spammers spend lots of time trying to get their stuff in the index too.

So, I’ve created something to make it a little easier: a Google Custom Search Engine called Is This A Scam?

Wondering about a home business proposition? Drop a query here. Does your uncle keep falling for pyramid schemes? Send him this link and make him promise to search before he writes the next check.

Custom Search Engines are very useful and are incredibly easy to create. You can create one for your site, or one covering many sites under a certain topic, and you can even make money via AdSense For Search.

This particular search engine works well because I combed the web looking for high-quality sources of information about scams, fraud, snake oil, and consumer protection. The list includes well over 100 sites, including forums, blogs, news media, government agencies, and non-profit organizations. I'll post the list here when I get a chance.

If you'd like to volunteer to help out with this effort, contact me. By the way, this isn't an official Google product or service, just me in my free time using Google's great CSE system, so the standard disclaimer applies.

Got bad results? No results? Have you seen a page in the results that has no business being there? Let me know in the comments below.

Watch out for Google Money Scams

I have a post up on the Official Google Blog: How to steer clear of money scams.

These get-rich-quick schemes are all over the place. They take advantage of the Google brand and the large number of people who are out of work now and looking for new opportunities. Read the article for more info, but in general, if it looks too good to be true, it probably is.

The opening paragraph is a true story – so thanks, Mom, for asking about this and prompting me to look into it further.