{"id":686,"date":"2010-01-12T03:21:42","date_gmt":"2010-01-12T08:21:42","guid":{"rendered":"http:\/\/www.jasonmorrison.net\/content\/?p=686"},"modified":"2010-01-12T03:21:42","modified_gmt":"2010-01-12T08:21:42","slug":"how-to-get-google-search-results-for-academic-research","status":"publish","type":"post","link":"http:\/\/www.jasonmorrison.net\/content\/2010\/how-to-get-google-search-results-for-academic-research\/","title":{"rendered":"How to get Google search results for academic research"},"content":{"rendered":"<p>A few years ago, before I was a Googler, I was a grad student doing research on information retrieval.  I wanted to compare the results of Google and other search engines with folksonomies from social bookmarking sites.  It sounds pretty simple &#8211; <a href=\"http:\/\/googleblog.blogspot.com\/2008\/09\/search-evaluation-at-google.html\">Google does lots of internal search quality studies<\/a>, so it&#8217;s not too surprising that outside researchers would want to execute lots of queries and use the results in their data.<\/p>\n<p>The way I did it was&#8230; not optimal, to say the least.  I wrote a bunch of PHP code, spaced out participant sessions, etc. to make sure I could get results back.  Google tries to make sure that spammers aren&#8217;t scraping search results to generate webspam, so any kind of scraping with <a href=\"http:\/\/curl.haxx.se\/\">cURL<\/a>, <a href=\"http:\/\/www.crummy.com\/software\/BeautifulSoup\/\">Beautiful Soup<\/a>, etc. can result in a big pile of failure.<\/p>\n<p>The way I did it wasn&#8217;t the right way or the easy way, so when I got the job I made a mental note to ask around for the best way to get search results.  
Then I forgot all about it until an email exchange with <a href=\"http:\/\/garwarner.blogspot.com\/\">Gary Warner of CyberCrime &#038; Doing Time<\/a> fame.<\/p>\n<p>It turns out <a href=\"http:\/\/research.google.com\/university\/search\/\">Google has a great University research program and API<\/a>.  You have to apply for registration and let us know who you are, what school you&#8217;re affiliated with, and what you plan to study.  Assuming everything checks out, you&#8217;ll get access to <a href=\"http:\/\/research.google.com\/university\/search\/docs.html\">a pretty nice API<\/a>.  There&#8217;s some <a href=\"http:\/\/research.google.com\/university\/search\/example.html\">example Python code<\/a>, but you could just as easily use PHP, Java, or whatever to consume the XML responses.<\/p>\n<p>And that research I was doing?  I recently noticed that my paper <a href=\"http:\/\/scholar.google.com\/scholar?cites=15012918091681539839&#038;hl=en&#038;as_sdt=2000\">has been cited 7 or 8 times<\/a>, according to Google Scholar.  I used to joke that I had written the least influential paper in the history of academic publishing, but I guess I can&#8217;t claim the title anymore.  <a href=\"http:\/\/www.scopus.com\/results\/citedbyresults.url?sort=plf-f&#038;cite=2-s2.0-44449104928&#038;src=s&#038;imp=t&#038;sid=j3OCT1b6XEciqq8fiJqYAkb%3a30&#038;sot=cite&#038;sdt=a&#038;sl=0&#038;origin=inward&#038;txGid=j3OCT1b6XEciqq8fiJqYAkb%3a2\">Scopus only shows 4 citations<\/a> so I will remain humble anyway.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few years ago, before I was a Googler, I was a grad student doing research on information retrieval. I wanted to compare the results of Google and other search engines with folksonomies from social bookmarking sites. 
It sounds pretty simple &#8211; Google does lots of internal search quality studies, so it&#8217;s not too surprising [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[19,1],"tags":[349,685,683,102,80,684,104,167,686,417,340,310],"class_list":["post-686","post","type-post","status-publish","format-standard","hentry","category-blog","category-uncategorized","tag-academic-research","tag-beautiful-soup","tag-curl","tag-folksonomies","tag-google","tag-google-scholar","tag-information-retrieval","tag-research","tag-screen-scraping","tag-search","tag-web-search","tag-webspam"],"aioseo_notices":[],"_links":{"self":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/686","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/comments?post=686"}],"version-history":[{"count":6,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/686\/revisions"}],"predecessor-version":[{"id":692,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/posts\/686\/revisions\/692"}],"wp:attachment":[{"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/media?parent=686"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.jasonmorrison.net\/content\/wp-json\/wp\/v2\/categories?post=686"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.jason
morrison.net\/content\/wp-json\/wp\/v2\/tags?post=686"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}