The Joomla! Community Magazine™

Duplicate Pages in Joomla: Causes, Most Common Errors, Solutions

Written by Alex Bulat | Saturday, 01 June 2013 00:00 | Published in 2013 June
Everyone who runs a Joomla website faces the problem of duplicate pages sooner or later. So let’s look at this problem from the inside: we’ll sort out what duplicate pages are, how search engines react to them, and how to get rid of these evil twins.

Duplicate pages are identical documents on your website that are available via different URLs. They are a weak spot of most CMSs, not only of Joomla. These pages are harmless until crawlers index them. The main thing is to keep an eye on indexing and promptly remove such pages from the search results, although it is much better to think the structure over in advance and avoid these situations altogether.

The main reason duplicates appear is a poorly planned website structure. First of all, you need to think over the hierarchy of categories and menu items. If you create multiple categories in advance, this can prevent duplicate pages from appearing in the future.

Let’s have a look at the most common mistakes. Suppose you have a "News" category with several subcategories ("Politics", "Economy", etc.), and you assign all materials directly to the child categories. If you create menu items only for subcategories, such as "Politics", the links to a material can look as follows:

https://website.com/politics/23-material

https://website.com/1-newsi/politics/23-material

https://website.com/1-newsi/2-politics/23-material

https://website.com/index.php?option=com_content&task=view&id=23&Itemid=1

And so on and so forth: there are plenty of possible ways to form a URL, and all of them will be just copies of a single page. That’s an example of how it’s not supposed to be done. What's more, search engines may index technical copies that are available via the "Print", "PDF", and "Send to a friend" links.
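To make the problem tangible, here is a minimal Python sketch that groups the URLs listed above by the article ID they carry. It assumes that every URL points to an article and that the ID is either the id query parameter or the number at the start of the last path segment. The function names article_id and group_duplicates are hypothetical helpers for this illustration, not part of Joomla.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def article_id(url):
    """Return the numeric article ID embedded in a Joomla-style URL, or None."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    # Non-SEF form: index.php?option=com_content&...&id=23
    if "id" in query:
        match = re.match(r"\d+", query["id"][0])
        return int(match.group()) if match else None
    # SEF form: the last path segment starts with "<id>-", e.g. /politics/23-material
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    match = re.match(r"(\d+)-", last_segment)
    return int(match.group(1)) if match else None


def group_duplicates(urls):
    """Map each article ID to its URLs, keeping only IDs reachable via several URLs."""
    groups = defaultdict(list)
    for url in urls:
        found_id = article_id(url)
        if found_id is not None:
            groups[found_id].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


urls = [
    "https://website.com/politics/23-material",
    "https://website.com/1-newsi/politics/23-material",
    "https://website.com/1-newsi/2-politics/23-material",
    "https://website.com/index.php?option=com_content&task=view&id=23&Itemid=1",
]
duplicates = group_duplicates(urls)
for found_id, found in duplicates.items():
    print(found_id, len(found))
```

All four addresses end up in one group, which is exactly what a search engine sees: one document with four URLs.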

When you are just starting a website, you need to think over its structure and identify the main categories and subcategories. As your resource grows, there is then no need to change the existing structure; you simply add the necessary sections. Restructuring your website at the peak of its development is a heavy blow that can set you back for months: some pages (or all of them) will change their addresses, which inevitably leads to lower positions in search results and less traffic.

So How Do Search Engines Find Duplicate Pages?

The most common cause is the extensions and components that developers install on a Joomla website. For example, a news module on the homepage can link to an article under a different address. Sometimes these duplicates can be seen even in the sitemap, for example if you have Xmap installed.

If your site is already indexed, it's not that hard to find duplicate content: just copy a couple of unique sentences from the page and paste them in quotation marks into Google search. If your site is not indexed, you can try XENU (as long as your website is not really big); it will find all possible paths to the pages of the site.
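If you already have the text of your pages at hand (for example, saved from a crawl), the same check can be sketched offline. The Python example below is only an illustration under that assumption: it fingerprints the text of each page and reports the URLs whose content is identical. The pages dictionary and its contents are made up for the example.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_identical_pages(pages):
    """pages maps URL -> page text; return lists of URLs that share the same text."""
    by_fingerprint = defaultdict(list)
    for url, text in pages.items():
        by_fingerprint[fingerprint(text)].append(url)
    return [sorted(urls) for urls in by_fingerprint.values() if len(urls) > 1]


# Hypothetical sample data: two addresses serving the same article.
pages = {
    "https://website.com/politics/23-material": "Election results are in.\nTurnout was high.",
    "https://website.com/1-newsi/politics/23-material": "Election results are in.  Turnout was high.",
    "https://website.com/economy/24-budget": "The new budget was approved.",
}
identical = find_identical_pages(pages)
print(identical)
```

This only catches exact copies after whitespace normalization; pages that differ in a sidebar or a print header would need a fuzzier comparison.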

Why Do We Need to Get Rid of Page Duplication?

Search engines "don't like" websites with a great number of identical pages and try not to index them, because the value of such pages for search is questionable. When you optimize for certain keywords, you select the most relevant pages and optimize their content. If those pages have duplicates, the search engine may replace the relevant pages with their doubles during an update, which is accompanied by a sharp fall in positions and a drop in traffic.

6 Ways to Get Rid of Doubles

Every website is unique, so let’s have a look at the most popular methods, which work perfectly fine in 99% of cases. Feel free to choose any of these options or to combine several of them. Generally, these solutions can be applied to any other CMS.

All of these examples are valid if you have turned on standard SEF and URL redirection in the general settings of Joomla.

1. StyleWare Content Canonical Plugin

The content canonical plugin resolves the issue of multiple URLs for a single page. So if you have component/content/article/32-something.html and something.html, both will be indexed under one URL (something.html). It's an awesome plugin that does what it’s supposed to.
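To check that a page really announces a canonical address, you can look for a rel="canonical" link element in its HTML. The Python sketch below assumes the plugin works by emitting such an element (which is how canonical URLs are normally declared); the sample markup and the example.com address are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, as it might be served under either of the two URLs.
sample = """<html><head>
<title>Something</title>
<link rel="canonical" href="http://example.com/something.html"/>
</head><body>...</body></html>"""
print(canonical_url(sample))
```

If both component/content/article/32-something.html and something.html return the same canonical address, the duplicates are being consolidated as intended.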

2. Robots.txt for Joomla

The file is included in the standard Joomla package, placed in the root directory, and available at yourwebsite.com/robots.txt. It instructs search engines on how to index the website. With its help you can close off some parts of your website. Additionally, you can add the following instruction to the default data in the file:

Disallow: /*? # every link that contains this sign will not be indexed *

Just one line saves you from having a great amount of junk indexed, such as:

  • print versions of pages;
  • links to the RSS feed;
  • the site's search results pages;
  • paginated pages;
  • other options, depending on the extensions used.

Whether to use this line* or not is up to you. Keep in mind that a robots file that is too large is considered to be fully permitting. Please note: the line must not block something important, such as the sitemap, so you can simply add the line Allow: /site_map. You can read more about sitemaps in Google Help.
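Before relying on a wildcard rule, it helps to see which addresses it actually covers. The Python sketch below imitates Google-style matching, where * stands for any run of characters, a trailing $ anchors the end, and the longest matching rule wins, with Allow winning ties. Other crawlers may resolve conflicts differently, and the helper names rule_matches and is_allowed are hypothetical.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt pattern matches the start of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs; the longest match decides."""
    best_directive, best_pattern = "allow", ""
    for directive, pattern in rules:
        if pattern and rule_matches(pattern, path):
            longer = len(pattern) > len(best_pattern)
            tie_for_allow = len(pattern) == len(best_pattern) and directive == "allow"
            if longer or tie_for_allow:
                best_directive, best_pattern = directive, pattern
    return best_directive == "allow"


# The two lines discussed above.
RULES = [("disallow", "/*?"), ("allow", "/site_map")]

for path in ["/politics/23-material",
             "/politics/23-material?print=1",
             "/index.php?option=com_search",
             "/site_map?page=2"]:
    print(path, is_allowed(RULES, path))
```

The sketch shows why the Allow line is needed: without it, any sitemap address that contains a question mark would be closed together with the junk.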

3. Redirect 301

A 301 redirect is appropriate if you have changed URLs but all the documents still exist. In this case, to consolidate the old and new addresses properly, you need to use a 301 redirect in the .htaccess file. Search engines will then know that the document has moved to a new address. This method allows you to preserve both your website's performance in search and its PR.

A 301 redirect can also be used for merging duplicate pages. For example, the widely known duplicates of the home page on a Joomla website are /index.php and the alias of the Home menu item, such as /home or /homepage. These can be merged pretty easily: open your .htaccess file and enter:

Redirect 301 /index.php http://site.com/

...or else you can issue the 301 redirect in the index file of your template:

<?php
// Send a permanent redirect when the home page is requested as /index.php.
if ($_SERVER['REQUEST_URI'] === '/index.php') {
    header('Location: /', true, 301);
    exit;
}
?>

Here is a classic example: redirecting the www version of a website URL to the version without www.

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

* Replace example.com with your site's domain name.
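The logic behind these redirects can be summed up in a few lines. The Python sketch below is only a model of the rules above, not something to deploy: it strips the www prefix and maps the home page aliases to /, and it returns the address a visitor should be sent to with a 301. The function name canonical_redirect and the default alias list are assumptions for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, home_aliases=("/index.php", "/home", "/homepage")):
    """Return the URL to 301-redirect to, or None if the URL is already canonical."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Same effect as the www rewrite rule: drop the "www." prefix.
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    # Merge the home page doubles, but leave addresses with a query string alone.
    if path in home_aliases and not parts.query:
        path = "/"
    target = urlunsplit((parts.scheme, host, path, parts.query, ""))
    return None if target == url else target


for url in ["http://www.example.com/index.php",
            "http://example.com/home",
            "http://www.example.com/politics/23-material",
            "http://example.com/"]:
    print(url, "->", canonical_redirect(url))
```

Every double of the home page ends up at a single address, while a URL that is already canonical needs no redirect at all.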

You can find more ways to use a 301 redirect in this blog post.

4. Meta Tag Robots

One more way to prevent page duplicates from being indexed is to use the robots meta tag:

<meta name="robots" content="noindex"/>

Currently, Google handles this option much better than blocking instructions in the robots.txt file, since a URL blocked in robots.txt can still show up in the results if other pages link to it. For example, to close the print version of a page (its address contains ?tmpl=component), open the component.php file in the root of your template and add the tag inside <head></head>.

To close the search results pages of the standard com_search component, simply add the following condition to the index.php of your template:

<?php if ($option == 'com_search') : ?>
<meta name="robots" content="noindex"/>
<?php endif; ?>

But first you need to define the variable:

$option = JRequest::getVar('option', null);
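The decision made by these template snippets can be modelled outside Joomla to see which addresses would receive the tag. The Python sketch below assumes that search pages are recognized by option=com_search and print versions by tmpl=component in the query string; robots_meta_tag is a hypothetical helper, and website.com is the placeholder domain used earlier.

```python
from urllib.parse import urlsplit, parse_qs


def robots_meta_tag(url):
    """Return the noindex meta tag for search and print views, else an empty string."""
    query = parse_qs(urlsplit(url).query)
    is_search = query.get("option", [""])[0] == "com_search"
    is_print_view = query.get("tmpl", [""])[0] == "component"
    if is_search or is_print_view:
        return '<meta name="robots" content="noindex"/>'
    return ""


for url in ["https://website.com/index.php?option=com_search&searchword=news",
            "https://website.com/politics/23-material?tmpl=component&print=1",
            "https://website.com/politics/23-material"]:
    print(url, "->", robots_meta_tag(url) or "(no tag)")
```

Only the search results and the print version are closed; the regular article address stays open for indexing.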

5. Deleting URLs from Webmaster's Panel

One more way to get rid of duplicates is to delete them manually in the webmaster's panel. For Google: https://www.google.com/webmasters/tools/home?hl=en

6. X-Robots-Tag Headers

Google recommends using the X-Robots-Tag HTTP header as an alternative to the fourth method in this roundup.

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
...
X-Robots-Tag: noindex
...
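To verify that a response really carries the header, you can inspect its head. The Python sketch below parses a raw response head like the one above and reports whether an X-Robots-Tag header contains noindex (or none, which implies it). It deliberately ignores the variant where the header names a specific crawler, and has_noindex is a hypothetical helper name.

```python
def has_noindex(raw_response_head):
    """Return True if an X-Robots-Tag header in the response head says noindex."""
    for line in raw_response_head.splitlines()[1:]:  # skip the status line
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "x-robots-tag":
            directives = [item.strip().lower() for item in value.split(",")]
            if "noindex" in directives or "none" in directives:
                return True
    return False


# The response head from the example above, elided lines included.
SAMPLE_HEAD = """HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
...
X-Robots-Tag: noindex
...
"""
print(has_noindex(SAMPLE_HEAD))
```

Unlike the meta tag, this header also works for files that have no HTML head, such as the PDF versions of articles.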

As you can see, there are lots of ways to remove duplicates, but you need to understand how each of them works in order to pick the most appropriate option for you.

Alex Bulat

Sometimes I feel like there are not enough furry covers for sun-beds on the upper deck of a yacht, but then suddenly I wake up and have to do some blogging. Luckily, I know where to get things you need... Psst, looking for a new Joomla theme for your site? Ask me where to get one, or just say hello on Google+.