Recently I’ve become aware of a scraper site that is scraping content from a lot of blogs. I reported them to their host with little success, and reported them to Google, but they are still indexed. I emailed the owner and 5 of my articles were removed, but many are still there. So … I took matters a little step further. I have a hellish little blackhat tool and instead of using it to my benefit, I used it to his detriment. Don’t bother going there. I just tried it again and apparently I did something right, because the whole site is redirecting to Yahoo now. You can still type site:myfreelcd.com into Google and click on “cache” to view the site and its articles. You might even find your own articles there … ROFLMAO

Definition of Scraper Site from Wikipedia

A scraper site is a website that copies all of its content from other websites using web scraping.[1] No part of a scraper site is original. A search engine is not a scraper site: sites such as Yahoo and Google gather content from other websites and index it so that the index can be searched with keywords. Search engines then display snippets of the original site content which they have scraped in response to your search.

In the last few years, and due to the advent of the Google Adsense web advertising program, scraper sites have proliferated at an amazing rate for spamming search engines.[1] Open content sites such as Wikipedia are a common source of material for scraper sites.

The scraper site I’m talking about has hundreds of pages of scraped content indexed in Google, including most of the pages in this blog. No credit is given to the source of the articles and there is no link back unless your link is in the article itself. They even have an article about scraping, and how to make scraped sites look more natural, in the recent articles list: ironic, since the article was probably scraped.

I have used Google’s spam reporting link to report this scraper site so we’ll see if that has any effect on his search results.

I found a very helpful list of the 20 Best Anti-Plagiarism Tools here as well as some very good articles and remedies on content theft. There are a lot more articles on content theft here and here is a site that tells you specifically what to do in case of copyright infringement.

I recently installed the Digital Fingerprint WordPress plugin. It inserts a Digital Fingerprint (a unique term that you make up) in each post to make it easy for you to find your content in the search engines. I found the HotCPA Scraper site through the incoming links in my WordPress management console. A Google search for site:myfreelcd.com returns hundreds of pages of scraped … stolen content. I emailed the contact listed in the Whois search I did on the domain and listed about 5 of my articles that I had found. He removed them without replying to my email, but on further investigation, I found many more of my pages buried in his content.
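For anyone curious how the fingerprint idea works, here is a rough sketch in Python. This is not the plugin's actual code, just my illustration of the concept: the function names, the example fingerprint, and example.com are all made up for the demo.

```python
from typing import Optional


def add_fingerprint(post_body: str, fingerprint: str) -> str:
    """Append the made-up fingerprint term to a post unless it is already there."""
    if fingerprint in post_body:
        return post_body
    return f"{post_body.rstrip()}\n\n{fingerprint}"


def is_copy(page_text: str, fingerprint: str) -> bool:
    """Return True if a page's text contains the fingerprint (case-insensitive)."""
    return fingerprint.casefold() in page_text.casefold()


def search_query(fingerprint: str, domain: Optional[str] = None) -> str:
    """Build an exact-phrase search string, optionally limited with site:domain."""
    query = f'"{fingerprint}"'
    if domain:
        query += f" site:{domain}"
    return query


if __name__ == "__main__":
    # Hypothetical fingerprint: any term unlikely to appear anywhere else.
    fp = "zx9-example-fingerprint-42"
    post = add_fingerprint("My article on Google Trends.", fp)
    print(post)
    # Paste the resulting string into a search engine by hand.
    print(search_query(fp, "example.com"))
```

The whole trick is that the term has to be unusual enough that it only ever shows up in your own posts, so any other site returned for that exact phrase is almost certainly a copy.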

He’s also very good at getting it indexed very quickly. I wrote the article on Google Trends yesterday and within hours his copy of it was indexed. That means that when Google crawls my site and finds the same article, I’m the one who could get slapped with a duplicate content penalty or not get listed at all.

I also contacted his host, HostGator, and told them about the problem. At first they said that if you have an RSS feed published, it is fair game. I responded that I didn’t consider copyright infringement, whether via RSS feed or any other method, to be fair game. They then sent an email stating that I could pursue it with them if I sent a ton of documentation: PDFs of my content, PDFs of his content, proof that the content is mine, etc. So in reality, I think pursuing the matter through the Report Spam to Google form might achieve better results.

Here’s another scraper site that is scraping my content on a regular basis. The difference here is that they only publish an excerpt and then link back to me. There are two or three of these, all with exactly the same design and the same advertising widget … probably all one owner running a bunch of scrapers/splogs to get views for their widget/offer and AdSense. This method of scraping content could actually benefit me, since I get a backlink every time they scrape an excerpt from my site, so scrape away, scrapers.

UPDATE: Received an apology from the site owner for scraping the content. I’m a reasonable person and have accepted the apology. I am hoping to see the site deindexed from Google and am still pursuing that so that the original authors don’t suffer duplicate content penalties. The site owner has removed all of the content (it is no longer redirecting to Yahoo, but is returning 404 Not Found errors), so it’s probably just a matter of time before it is deindexed. Thank you for doing that.