
Sunday, February 12, 2006

Duplicate Content

I am currently looking into altering the content of the PLR articles supplied to me via my NPM monthly subscription.

Why? Well, from what I have read, duplicate content is a bad thing as far as SEO is concerned. The rumour is that once your site is deemed to have duplicate content, it is likely to be marked down badly by the search engines. One theory is that if two sites have the same content, the one with the higher PR is treated as the original creator and the one with the lower PR is penalised. Obviously the search engines don't want "duplicate sites" being created, so they have to look out for duplicate content, and the ultimate threat they hold over webmasters is barring a site from the rankings altogether.

The problem I have is working out what exactly the search engines class as duplicate content. Is it simply the visible text of a web page, or is it the HTML that produces the page - the template, if a CMS or site generator is used? Changing a template is no great shakes - but is it necessary?

If it is just visible text they are looking at, then how do they compare the text from different pages and sites? Would changing a few words convince them that the PLR articles are different, or would they actually compare paragraph lengths, word counts or some other kind of pattern to complicate the issue further?

Just how far do we need to go to create unique articles?

A lot of questions, I know. My thoughts so far (and it's purely speculation):

The site template. I can't see how they can penalise this too much. The number of people using stock versions of popular CMS, blogging and e-commerce software - people who genuinely are creating unique content, just with the same tools - would suffer for no reason. My guess on templates is that the search engines keep a list of what they consider "black hat" site generators and look out for any website template created by a known site-scraping or site-generating piece of software - so if you are using one, it would be wise to change the template quite dramatically.

Visible text - this one I have thought long and hard about for the last few days. My conspiratorial side tells me that they should be able to look for patterns in the text, so changing a few words wouldn't work. They may be looking at the number and length of paragraphs, the number of identical words on a page, identical sentences, and so on. However, what about sites that are genuinely quoting from other sites? It seems a little harsh for the search engines to do anything too extreme if you have chosen to reproduce my ramblings (with my permission) on a page of your website where they are entirely relevant to your context. There must be an allowance for this, surely.
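To make the speculation concrete, here is a minimal sketch of the kind of pattern-matching I am imagining - word "shingling", where overlapping runs of words from two texts are compared. I have no idea whether the search engines do anything like this; it is purely a guess at how swapping the odd word could fail to fool them:

```python
# Purely illustrative: compare two texts by their overlapping
# four-word "shingles" and score the overlap (Jaccard similarity).

def shingles(text, w=4):
    """Return the set of overlapping w-word runs in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def similarity(a, b, w=4):
    """Jaccard similarity of the two shingle sets: 0.0 to 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa | sb)

original = "The quick brown fox jumps over the lazy dog near the river bank"
tweaked  = "The quick brown fox leaps over the lazy dog near the river bank"

# One changed word only disturbs the shingles that contain it,
# so the score stays well clear of zero (about 0.43 here).
print(similarity(original, tweaked))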

I am beginning to wonder if I am being a little too paranoid. I certainly haven't found any hard and fast evidence from anyone as to how it all works.

So, I throw the floor open to you more experienced site creators. What do you reckon? What do you aim for when "copying" a little content? What lengths do you go to, and how much effort do you put in, to try and make the content your own?

Finally, while we are on the subject - what are the consequences? I presume that if you are found guilty as charged of ripping content (even if it is a PLR article), your site will be banished from the search engines for ever and ever, amen. But what, if any, are the further-reaching problems? I read recently that someone's IP was banned - I can't see how this can work if you are using shared hosting without a dedicated IP. Does that mean all the sites on the shared host get dumped into oblivion? That's pretty harsh.

OK, so many questions and so few answers. I am really just trying to get some thoughts in order, but if you can enlighten me at all it would be much appreciated, and if, during the course of this year's AIS attempts, I actually find any hard and fast answers, I will be sure to report back.

Cheers,
Rich

4 Comments:

Anonymous said...

There is no way to be sure, but I believe a lot of my sites have been penalized because they all share the same IP address on a reseller account. The site that convinced me of this is a wiki with completely original content. At one point Google had a few hundred of its pages in the index, but now has none. The site wasn't built as an adsenser, and I can't really see any reason for it being dropped other than the fact that it shares its IP address with some pure adsensers.

2:03 AM  
richandzhaoyan said...

Dan,

Do you reckon this means that if you take out a basic web hosting account and are on a shared server without a dedicated IP address, you could wind up being heavily penalised by the search engines due to the actions of someone else on the same package? Or is there some way for the search engines to know who owns the accounts? Surely, if you were using your reseller account for reselling, all your clients would have been dumped by the search engines as well... Makes you think about where your sites are hosted, doesn't it?
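As an aside, anyone can see whether two sites share an IP with a plain DNS lookup, which is presumably about all the information the search engines have to go on too. A quick sketch - the domain names here are made up, so substitute real ones:

```python
# Resolve two (hypothetical) domains and compare their IPs.
import socket

ip_a = socket.gethostbyname("my-site.example.com")
ip_b = socket.gethostbyname("some-other-site.example.com")

print(ip_a, ip_b)
if ip_a == ip_b:
    print("Same IP - these sites share a host")
```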

Gary,

Yes, I have been following the progress of your new tools closely. In fact, I was just about to purchase a copy of your synonymiser when I realised the price had nearly doubled! More fool me for not getting in quickly. Judging by what I am reading, both tools are doing very well for you financially - good luck with them.

I actually reckon I have found a similar method of changing the articles using a batch find-and-replace tool - however, the biggest problem I have is creating a useful synonym list. Mine needs a little more work: right now I can create complete gobbledygook across a thousand articles at the press of a button!
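For anyone curious, the guts of my approach amount to something like the sketch below - a crude, whole-word find and replace against a synonym list. The entries shown are just placeholders; as I say, building a sensible list is the hard part, and naive word-for-word swapping like this is precisely how the gobbledygook happens:

```python
# Batch synonym replacement - a deliberately naive sketch.
import re

SYNONYMS = {  # placeholder entries; a real list needs far more care
    "purchase": "buy",
    "large": "big",
    "assist": "help",
}

def respin(text):
    """Swap whole words found in SYNONYMS, leaving everything else alone."""
    def swap(match):
        word = match.group(0)
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

print(respin("We can assist you with any large purchase."))
# -> "We can help you with any big buy." (already clumsy, and it
#    only gets worse as the list grows - context is everything)
```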

Btw, I was very pleased to see you posting a little on the NPM forum. It's nice to have someone who knows what they are talking about writing knowledgeable, straight-talking posts. If things don't pick up soon, I reckon I may just have to cut my losses and run. It has been good to see the packages they provide, and I have no doubt I can make good use of the resources, but the forum has been a big disappointment. (It has to be said that I have been a total lurker and haven't actually contributed anything myself yet - so maybe, if there are others lurking, it could pick up in the future.)

Anyhow, sorry to hear you have been poorly - I was wondering why there was no weekend blogging.

Get well soon,
Rich

12:25 AM  
Anonymous said...

Addressing some of the points mentioned:

Duplicate content - this is not necessarily a penalty but a filter, and a filter doesn't push your site down in the SERPs; it just filters out the duplicate pages and ranks them lower than the page the search engines believe to be the original. Think of a site that has a printable version of certain pages - Google can't penalise a site for creating something like that. When many sites happen to use the same articles, there is simply a lot more competition to rank well on those articles. When you optimise the article, you give yourself more of a chance.

Spiders do read your code as well as your text. I believe you're right in saying that spiders know what to look for as the standard footprint of, say, an NPB site, and if you don't alter the template or the generated code you will start to see drops in the SERPs, if you're ranking at all. This is why a lot of people are turning to software such as WordPress: it's so widely used that the spiders can't just ignore everyone who uses it because it's being abused by a few. I think it's a combination of well-thought-out coding and optimised articles. I've used articles from Wikipedia on one site and there's been no issue with Google. I doubt I'll rank very high with them, but I put them there for the site's quality, not for SEO.

As for banning an IP, the search engines would be daft to do something like that. But sites using the same templates and similar content on one server will not all rank well. A company I contract for has one server running six websites: the main company site plus departmental sites, each with its own domain name. While Google has indexed all of them, only the main site pulls in visitors on phrases that should, in theory, bring up the departmental sites - all the sites are on the same IP, and Google can see this, along with the same template and colour scheme. Yet MSN and Yahoo rank all the domains equally.

That's an understandable example of a shared IP causing issues, but plenty of people host on a shared network, and no, your site cannot be held responsible for another site. Google (I would at least hope) should only be looking to penalise the domain name, not the IP.

Going back to duplicate content, take the SEO test over at Mike Davidson's blog. He set up a test on a word and created several pages of duplicate content with minor coding changes. OK, the test was to see how various coding could affect your rankings and what would rank higher, but it also demonstrates that duplicate content is just a filter, as all the pages have been indexed for over a month now, I believe. You can see the Google results for the search here.

Articles with optimised page titles and semantic use of headers, along with (if possible) well-worded anchor text from an external source, will help. Of course, any further changes, whether made manually or with Gary's software (from what I've read), will improve your chances even more.

But I wouldn't worry about being banned - it's a rare occurrence, and pages 'falling' out of the Google index are often down to other reasons, such as Googlebot failing to reach your site when trying to spider it (e.g. because the server was offline) and therefore thinking it has gone, or even the spiders slightly malfunctioning. Also, Google runs off several servers; perhaps you're hitting one that's checking a less up-to-date database. If you really want to be thorough, check all the IPs of Google.
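If you do want to check, the set of addresses www.google.com resolves to can be pulled with a simple lookup - a rough sketch, and the results will vary with your resolver:

```python
# List the IP addresses that www.google.com currently resolves to.
import socket

name, aliases, ips = socket.gethostbyname_ex("www.google.com")
for ip in ips:
    print(ip)
```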

10:25 AM  
richandzhaoyan said...

Sarah,

Many thanks for taking the time to post such an in-depth comment.

I think you have a pretty good grasp of the situation, and your points all seem to make good sense.

There does seem to be a lot of paranoia concerning duplicate content - maybe it's just the AIS-type forums where it's most prevalent.

It has been bothering me, though, as my real concern is my online osC store - which is rapidly overtaking the shop in turnover, so it is now my main breadwinner. As with your point about printable pages, the online store features thousands of products whose descriptions are taken directly from the manufacturer, either from their website or their brochure. This "content" must also appear on the websites and online stores of my rivals - it may only be a small part of the generated product page, but it is still, essentially, duplicate content. Yet it is there to serve a purpose, so it should not really be hauled up by the search engines, and at the very least it should not condemn the rest of the site.

(It doesn't appear to have, as my site now ranks somewhere in the first page or two for the search term "Gift" - oh man, wouldn't I just love to try AdSense on it for a couple of days! :) )

One comment you made about the templates has highlighted an aspect I want to pursue when I finally get round to uploading any more adsensers: CMS systems may well be a lot more effective than a site generator. And if a site generator is the way to go, there are two options - firstly, heavily modify any templates used, or secondly, write (or have written) your own site-generating software, a la Gary or Dan.

I really want to try both methods, but it is going to take some time - and progress is pretty slow, unfortunately.

Thanks again for posting - it's given me plenty to think about.

Cheers,
Rich

12:59 AM  
