Posts Tagged ‘webspam’

How to get Google search results for academic research

Tuesday, January 12th, 2010

A few years ago, before I was a Googler, I was a grad student doing research on information retrieval. I wanted to compare the results of Google and other search engines with folksonomies from social bookmarking sites. It sounds pretty simple – Google does lots of internal search quality studies, so it’s not too surprising that outside researchers would want to execute lots of queries and use the results in their data.

The way I did it was… not optimal, to say the least. I wrote a bunch of PHP code, spaced out participant sessions, etc. to make sure I could get results back. Google tries to make sure that spammers aren’t scraping search results to generate webspam, so any kind of scraping with cURL, Beautiful Soup, etc. can result in a big pile of failure.

The way I did it wasn’t the right way or the easy way, so when I got the job I made a mental note to ask around for the best way to get search results. Then I forgot all about it until an email exchange with Gary Warner of CyberCrime & Doing Time fame.

It turns out Google has a great University research program and API. You have to apply for registration and let us know who you are, what school you’re affiliated with, and what you plan to study. Assuming everything checks out, you’ll get access to a pretty nice API. There’s some example Python code, but you could just as easily use PHP, Java, or whatever to consume the XML responses.
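To give a rough idea of what consuming XML responses looks like, here is a minimal Python sketch using only the standard library. The sample document and its element names (results, result, url, title) are made up for illustration and are not the real API schema, so treat this as a pattern rather than working client code.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body: the element and attribute names are
# invented for this example and do not reflect the actual API.
SAMPLE_XML = """<results query="social bookmarking">
  <result rank="1"><url>http://example.com/a</url><title>First</title></result>
  <result rank="2"><url>http://example.org/b</url><title>Second</title></result>
</results>"""


def parse_results(xml_text):
    """Return a list of (rank, url, title) tuples from a response body."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("result"):
        rank = int(node.get("rank"))
        rows.append((rank, node.findtext("url"), node.findtext("title")))
    return rows


results = parse_results(SAMPLE_XML)
for rank, url, title in results:
    print(rank, url, title)
```

Once the results are in plain tuples like this, comparing them against another data set (bookmarking tags, for instance) is ordinary list processing.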

And that research I was doing? I recently noticed that my paper has been cited 7 or 8 times, according to Google Scholar. I used to joke that I had written the least influential paper in the history of academic publishing, but I guess I can’t claim the title anymore. Scopus only shows 4 citations so I will remain humble anyway.

Is This A Scam? Find out with a Google Custom Search Engine

Monday, July 20th, 2009

In my Google Blog article about avoiding get-rich-quick scams, I recommended doing a web search to see what other people are saying about any site you’re unsure about. The internet is a big place – chances are if it’s a scam, someone else has already fallen for it and they’re already complaining on their blog or in a forum somewhere.

The only problem with doing a general web search is that not every site on the web is guaranteed to have good information. Some forums are more useful than others, and in the worst cases scammers and spammers spend lots of time trying to get their stuff in the index too.

So, I’ve created something to make it a little easier: a Google Custom Search Engine called Is This A Scam?

Wondering about a home business proposition? Drop a query here. Does your uncle keep falling for pyramid schemes? Send him this link and make him promise to search before he writes the next check.

Custom Search Engines are very useful and are incredibly easy to create. You can create one for your site, or one covering many sites under a certain topic, and you can even make money via AdSense For Search.

This particular search engine works well because I combed the web looking for high-quality sources of information about scams, fraud, snake oil, and consumer protection. The list covers well over 100 sites, including forums, blogs, news media, government agencies, and non-profit organizations. I’ll post the list here when I get a chance.

If you’d like to volunteer to help out with this effort contact me. By the way, this isn’t an official Google product or service, just me in my free time using Google’s great CSE system, so the standard disclaimer applies.

Got bad results? No results? Have you seen a page in the results that has no business being there? Let me know in the comments below.

Watch out for Google Money Scams

Friday, July 10th, 2009

I have a post up on the Official Google Blog: How to steer clear of money scams.

These get-rich-quick schemes are all over the place. They take advantage of the Google brand and the large number of people who are out of work now and looking for new opportunities. Read the article for more info but in general, if it looks too good to be true, it probably is.

The opening paragraph is a true story – so thanks, Mom, for asking about this and prompting me to look into it further.

Getting the word out about spam profiles and other social network abuse

Sunday, June 28th, 2009

Just a quick post to point out an article I wrote on the Google Webmaster Central Blog, Spam2.0: Fake user accounts and spam profiles. This is a large and growing problem but a lot of folks I’ve talked to didn’t realize they had fake user accounts on their own sites. Excerpt:

Spammers create fake profiles for a number of nefarious purposes. Sometimes they’re just a way to reach users internally on a social networking site. This is somewhat similar to the way email spam works – the point is to send your users messages or friend invites and trick them into following a link, making a purchase, or downloading malware by sending a fake or low-quality proposition.

Spammers are also using spam profiles as yet another avenue to generate webspam on otherwise good domains. They scour the web for opportunities to get their links, redirects, and malware to users. They use your site because it’s no cost to them and they hope to piggyback off your good reputation.

The article got a write-up in Information Week, which is pretty cool. Anything that lets more people know about the issue helps.

Seeing more spammers on Twitter lately?

Tuesday, May 12th, 2009

It was inevitable. As Twitter has grown and started pushing into the mainstream, spammers have started ramping up abuse. At first glance, Twitter isn’t the most obvious target – you actually have to follow someone to get content from them, users don’t generally search it for high-CPC stuff like meds and lawyers, and how much spam can you really get into 140-character messages?

But I’m seeing more invites from users like the one below:

Seeing a lot more spammers on Twitter lately...

First: What is Twitterspam? How do I know this is a spammer?

When it comes to spam, most people “know it when they see it,” but it’s helpful to look at the specific signals that this user might not be worth talking to. First off, they have 180 followers and yet haven’t posted a single update. The photo is a dead giveaway. The bio is actually pretty well-done, it’s in English and it’s not outlandish, but the homepage link (http://my-pictures.no.tp/tlow/) – she’s in Portuguese Timor?
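The signals above can be written down as a few simple checks. This is only a sketch: the Profile fields, the 100-follower threshold, and the list of unusual host suffixes are my own illustrative assumptions, and nothing here describes how Twitter actually detects spam.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Illustrative assumption: host suffixes treated as odd for a personal homepage.
ODD_HOST_SUFFIXES = (".no.tp",)

# Illustrative assumption: what counts as "a lot" of followers.
MANY_FOLLOWERS = 100


@dataclass
class Profile:
    followers: int
    updates: int
    homepage: str


def spam_signals(profile):
    """Return a list of human-readable warning signs for a profile."""
    signals = []
    if profile.followers >= MANY_FOLLOWERS and profile.updates == 0:
        signals.append("many followers but no updates")
    host = urlparse(profile.homepage).hostname or ""
    if host.endswith(ODD_HOST_SUFFIXES):
        signals.append("homepage on an unusual host")
    return signals


suspect = Profile(followers=180, updates=0,
                  homepage="http://my-pictures.no.tp/tlow/")
found = spam_signals(suspect)
print(found)
```

No single check proves anything; it is the combination that makes a profile worth ignoring.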

Second: Why spam Twitter?

Spammers have two reasons to abuse Twitter: monetary payoff, and because it works.

How can they make money by tweeting a bunch of random people? Well in this case they aren’t, at least not yet. The payoff has to be through the homepage link, which I’m not following and you shouldn’t either. You get a friend invite on a system that, so far, has been a medium of immediate, short, personal communication. Your trust barriers thus weakened, you at least want to see who it is. They don’t have any updates yet, so you click the homepage link and… Virus. Or a maze of PPC affiliate pages and redirections.

Above I said spammers are hitting Twitter because it’s working. How do I know? Look at the number of followers, and the ratio of people followed to followers. About 22 percent of the people spammed so far have responded. I don’t know how many click through to the home page link, but if half the people bother to go that far they’ve got an amazing success rate for spam.
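For the curious, the arithmetic behind that percentage is just followers divided by accounts followed. The number of accounts followed is not given above, so the sketch below works backwards from the 180 followers and the roughly 22 percent figure; the result is an implied estimate, not a measured value.

```python
def follow_back_rate(followers, following):
    """Fraction of followed accounts that followed back."""
    if following <= 0:
        raise ValueError("following must be positive")
    return followers / following


followers = 180          # from the profile described above
reported_rate = 0.22     # "about 22 percent"

# Implied number of accounts the spammer followed (an estimate).
implied_following = round(followers / reported_rate)

print(implied_following)
print(round(follow_back_rate(followers, implied_following), 3))
```

In other words, the account would have had to follow somewhere around 800 people to end up with 180 followers at that rate.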

I wish Twitter luck. I know a few people over there, they’ve got their work cut out for them. This sort of thing isn’t easy to fight, it’s an ongoing process. They’ve already taken some visible steps, like using rel="nofollow" on the Bio link, which at least keeps away blackhat SEOs looking for sources of PageRank. They’ll probably have to do more, most of it on the backend where you and I will never be the wiser. Happy spamfighting!

How spam and malware botnets work – two papers

Tuesday, May 5th, 2009

I read two reports today about large-scale botnets that really pointed out that security is still an open problem on the web. Recently, researchers got access to a nasty botnet, Torpig (original paper: Your Botnet is My Botnet: Analysis of a Botnet Takeover). A few months earlier, researchers hijacked the Storm Worm and looked at its profitability (original paper: Spamalytics: An Empirical Analysis of Spam Marketing Conversion). Both papers are fascinating but terrifying reads.

Some findings:

  • In 10 days, a botnet running on 160,000 machines stole credentials for over 8,000 bank accounts.
  • About 1 in 10 people who open a spam email click through to get infected by the malware.
  • 350 million spam emails resulted in only 28 sales, but the average purchase was $100.
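To put the last finding in perspective, here is the back-of-the-envelope arithmetic in Python. It uses only the three numbers from the bullet above; the variable names are mine, and the real study reports far more detail than this.

```python
emails_sent = 350_000_000   # spam emails in the campaign
sales = 28                  # resulting purchases
average_purchase = 100.0    # dollars per sale

conversion_rate = sales / emails_sent      # fraction of emails that led to a sale
emails_per_sale = emails_sent / sales      # emails needed for one sale
revenue = sales * average_purchase         # total dollars brought in

print(f"conversion rate: {conversion_rate:.8%}")
print(f"emails per sale: {emails_per_sale:,.0f}")
print(f"revenue: ${revenue:,.2f}")
```

That works out to one sale for every 12.5 million emails and $2,800 in revenue. The conversion rate is tiny, but when sending is nearly free on hijacked machines, even that can pay.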

How do these botnets get control of machines? How do they make money? Whether it’s a spammer who needs to get someone to make a purchase on a website or a scammer stealing credit card numbers, passwords, and other information, ultimately you need to get someone to a bad website. Think about all the paths you might take to different sites during the day:

  • Via a web search
  • Clicking on a link in an email
  • Going directly to a favorite site
  • Clicking through an ad

Spammers and scammers try to take advantage of all of those methods, and given the huge volumes of machines at their disposal, it’s a wonder search engines, spam filters, and advertising systems protect users as well as they do now. Between the first and third bullet point above, there’s a huge motivation to hack otherwise good sites to inject drive-by download malware – it can happen to anyone.

So what can we do about it? I think it ultimately comes down to a combination of smarter automated methods, better ways to establish trustworthiness, and removing the economic incentives for spamming, identity theft, and hacking. I have a few posts in mind about some current tools that help with the trust issue and how we might be able to build a social web of trust.

This isn’t a new discussion; Tim Berners-Lee has been writing about the web of trust since the 1990s. But all the work done since then has yet to really solve these problems. And really, so long as a few people are willing to click on a malware link or buy drugs via a spam email, it will never stop.

Thoughts on Blog Usability

Wednesday, April 29th, 2009

I’ve been kicking around the idea of redesigning my homepage and blog, though I’m not sure I really have the free time to do it. To start, I thought I would put down a few thoughts about applying usability principles when designing blogs.

When you start thinking about usability, it’s tempting to jump right into lists of principles and rules of thumb. But it’s a little silly applying Fitts’s Law when you haven’t even established what you want your site to accomplish in the first place. So what, generally, do you want your blog to do?

Personal Goals

  • Share thoughts and work with others
  • Collect a body of work to represent myself (like a portfolio)
  • Collect information for later discovery (by myself and others)
  • Provide an outlet to continue practice writing
  • Allow others to communicate with me and comment

If you’re creating or redesigning a blog for a company, the goal set may be very different. Below are some examples that don’t actually apply in my case.

Business goals

  • Communicate with customers
  • Build long term relationships with customers
  • Produce quality content to drive search traffic
  • Generate revenue through advertising
  • Etc.

Many projects don’t even get this far before the graphic designers and web developers are already making mock-ups, but we still have one more important step to do. We know why you’re building a blog, but why are users coming to it?

(more…)

Stuffing online polls with amazing results

Wednesday, April 22nd, 2009

Having run a big online poll and seen some abuse, I had to share this story posted on the Music Machinery blog. Every year, Time collects their list of 100 most influential people and conducts an online poll. Most years it’s a healthy ballot-stuffing competition between Stephen Colbert fans and fans of the Korean singer Rain.

You can see the list of the top 100 this year here. Does anything look strange to you?

Time.com's most influential poll

Through a combination of seeding forums with misdirected vote links and clever vote bots, the fans of 4chan not only got moot to the number 1 position but spelled out a message with the first letters of the following positions. That’s a truly amazing hack, and a surprisingly mild response from Time’s developers.

This is also an interesting look into the kind of tactic used by web spammers. Funny in this case, but this is the kind of thing we’re up against.

Open Redirects Under Attack by Spammers

Tuesday, February 3rd, 2009

I wrote a post last Friday on the Google Webmaster Central Blog about the widespread abuse of open redirects around the web. If you have some code on your site that will redirect users to an arbitrary destination based on URL parameters, watch out.

“But Jason,” you say, “why would I have code that would redirect users to an arbitrary destination based on URL parameters?” You might be surprised. Code that tracks clicks for ads or analytics, search results pages, and even some login pages are vulnerable.

There are actually lots of legitimate reasons to redirect users, but unfortunately spammers can use them too if you’re not careful. Read the post to find out more and learn ways to make your site less attractive to attackers.
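As a rough illustration of one common defense, the Python sketch below only follows a redirect parameter when it points at a path on the same site or at a host on an allowlist. The function name, the ALLOWED_HOSTS set, and the fallback of "/" are assumptions made up for this example; it is not the code from the Webmaster Central post and is not an exhaustive filter.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts this site is willing to redirect to.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def safe_redirect_target(target, default="/"):
    """Return target if it looks safe to redirect to, else default."""
    # Backslashes and control characters are treated inconsistently by
    # browsers, so refuse them outright.
    if any(ch in target for ch in "\\\t\r\n"):
        return default

    parsed = urlparse(target)

    # No scheme and no host: accept only a plain site-relative path.
    if not parsed.scheme and not parsed.netloc:
        if target.startswith("/") and not target.startswith("//"):
            return target
        return default

    # Absolute URL: require http(s) and an allowlisted host.
    if parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS:
        return target

    return default


print(safe_redirect_target("/account"))
print(safe_redirect_target("http://evil.example/"))
```

The design choice here is to fail closed: anything the function does not recognize sends the user to the default page instead of wherever the URL parameter says.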

Comment Spam Article on the Google Webmaster Central Blog

Friday, October 3rd, 2008

I hate comment spam. I think it’s safe to say we all do. So how do you keep it off your blog or forum? Check out this article I wrote on the Google Webmaster Central Blog with some ways to prevent comment spam.

It’s interesting that one of the commenters brings up compliment spam – I just wrote about it on this blog a little while ago.

This was pretty cool for me, because I can’t really share much about my work at Google. It’s also fun to see my text translated into German.

Next up I’ll post an update on the baby name poll with more fun charts and graphs.