Avoid duplicate URLs
Like the www or non-www issue, any kind of duplicate content on your site can hurt your search engine rankings. Of course you have to make sure that your content is unique: not copied from somewhere else, and not re-used in other parts of your site. But you also have to make sure that the same page cannot be accessed through multiple URLs.
A lot of open source CMSs have this issue, and Joomla is one of them. Even when you have SEF URLs turned on in your Joomla Global Configuration, the non-SEF URL still exists. That means at least 2 URLs with the same content, and often there are more. Duplicate URLs can exist for the following reasons:
- www or non-www issues, as discussed in the previous article.
- Pages ending with index.html, index.php, etc., which show the same information as the page without the index part.
- Parameters in the URL, like ..../page1?font-size=large
- Trailing slashes
- Sometimes even uppercase/lowercase differences
- In Joomla specifically: the same article reached from multiple menu items.
- Non-SEF URLs that are still reachable, despite SEF URLs being activated.
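To see which of these causes actually occur on your site, you could group a list of crawled URLs (for example, exported from a crawler) by a normalized form. Below is a minimal Python sketch. It assumes URLs should be compared case-insensitively and that www and non-www count as the same site. The `normalize` and `find_duplicates` names and the sample URLs are hypothetical. It deliberately leaves query strings alone, so it does not catch parameter duplicates or non-SEF URLs, and duplicates caused by multiple menu items cannot be detected from the URL alone.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the index files your server treats as the directory page.
INDEX_FILES = ("index.php", "index.html", "index.htm")


def normalize(url: str) -> str:
    """Collapse common duplicate variants of a URL onto one comparison key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]  # www and non-www count as the same site
    path = parts.path.lower()  # uppercase/lowercase variants
    for index in INDEX_FILES:
        if path.endswith("/" + index):
            path = path[: -len(index)]  # "/blog/index.php" -> "/blog/"
            break
    if len(path) > 1:
        path = path.rstrip("/")  # trailing slashes
    if not path:
        path = "/"
    # The query string is kept as-is: parameters and non-SEF URLs need other treatment.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def find_duplicates(urls):
    """Group URLs by normalized key; return only groups with more than one member."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "https://joomla-seo.net/Checklist/avoid-duplicate-url-s",
        "https://www.joomla-seo.net/Checklist/avoid-duplicate-url-s/",
        "https://joomla-seo.net/checklist/avoid-duplicate-url-s",
        "https://joomla-seo.net/index.php",
        "https://joomla-seo.net/",
    ]
    for key, variants in find_duplicates(crawled).items():
        print(key, "<-", variants)
```

Each group in the output is a set of URLs that probably serve the same page and need one of the fixes below.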
Pages that are reachable through multiple URLs can harm your rankings, so it's best to prevent this. There are several ways to do so. Some can be used on their own, but you can also combine techniques to get rid of your duplicates completely:
1: Set a canonical tag to the correct page
Set a canonical tag pointing to the correct page, so that the non-SEF URL is not indexed. There are several ways to achieve this, but doing it by hand is only for experienced users, and doing it wrong might have the opposite effect. The easiest way is probably to use an extension: most SEF extensions offer a solution for this.
If you set the tag correctly, every possible duplicate of a Joomla page has the same canonical tag in its head section. Take the page you are currently reading: it can be reached in 2 ways:
- https://joomla-seo.net/index.php?option=com_content&Itemid=125&catid=15&id=18&lang=en&view=article
- https://joomla-seo.net/Checklist/avoid-duplicate-url-s
The first URL is currently redirected, but if it weren't, a canonical URL would tell Google that it is the same page as the SEF URL:
<link href="/Checklist/avoid-duplicate-url-s" rel="canonical"/>
Using this technique, you can prevent duplicate URLs from being indexed by Google, even when they are still accessible.
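To verify your setup, you can fetch each variant of a page and check that all of them declare the same canonical URL. The minimal Python sketch below only does the checking part, on HTML you have already downloaded, using the standard `html.parser` module. The function names are hypothetical, and treating a page with more than one canonical tag as having none is a cautious choice made by this sketch, not a rule from Joomla.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values; valueless attributes give None
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attributes.get("href"):
            self.canonicals.append(attributes["href"])
    # Self-closing <link ... /> tags reach handle_starttag via the default
    # handle_startendtag implementation.


def canonical_of(html: str) -> Optional[str]:
    """Return the single canonical URL of a page, or None if there is not exactly one."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None


def same_canonical(pages: dict[str, str]) -> bool:
    """pages maps each URL variant to the HTML it returned."""
    found = {canonical_of(html) for html in pages.values()}
    return len(found) == 1 and None not in found
```

Passing the HTML of both URLs listed above to `same_canonical` should return `True` once the tag is set up correctly.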
The only option you can set in Joomla itself is in the settings of the System - SEF plugin, which lets you set a Site Domain. However, this is only useful if you make the same website available through multiple domains (parked domains).
Be aware that Joomla 3.2 had some issues with how canonical URLs were handled; these were fixed in 3.2.1. If you run into such problems, you may need an extension to set canonical URLs the way you want.
2: Use 301 redirects
Using 301 redirects means telling anyone who accesses such a URL: "This link has permanently moved, please go here." So if somebody goes to:
https://joomla-seo.net/index.php?option=com_content&Itemid=125&catid=15&id=18&lang=en&view=article
he is forwarded to:
https://joomla-seo.net/Checklist/avoid-duplicate-url-s
You can set up 301 redirects either in your .htaccess file or with an extension such as ReDJ, a very nice and simple extension for this.
More on 301 redirects can be found in the article about rerouting old URLs.
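Conceptually, a redirect table maps old URLs to new ones and answers matching requests with a 301 status and a `Location` header. Below is a minimal Python sketch of that lookup for Joomla's non-SEF article URLs. It assumes the `option`, `view` and `id` parameters are enough to identify the article. The `SEF_PATHS` table and the `redirect_for` function are hypothetical and do not show how ReDJ or Joomla work internally.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup table: (option, view, id) -> SEF path.
SEF_PATHS = {
    ("com_content", "article", "18"): "/Checklist/avoid-duplicate-url-s",
}


def redirect_for(url: str):
    """Return (status, location) for a request, or (200, None) if no redirect applies."""
    parts = urlsplit(url)
    if not parts.path.endswith("/index.php") or not parts.query:
        return 200, None
    params = parse_qs(parts.query)
    # Other parameters (Itemid, catid, lang, ...) are ignored in this sketch.
    key = tuple(params.get(name, [""])[0] for name in ("option", "view", "id"))
    sef_path = SEF_PATHS.get(key)
    if sef_path is None:
        return 200, None
    # 301 = moved permanently: browsers and search engines should use the new URL.
    return 301, f"{parts.scheme}://{parts.netloc}{sef_path}"
```

With the example URL from above, `redirect_for` returns a 301 pointing at the SEF URL, while the SEF URL itself is served normally.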
3: Set up rules in .htaccess
Using your Joomla .htaccess file you can solve quite a few of your duplicate URL issues (provided URL rewriting is enabled). We already discussed how to redirect between www and non-www URLs, but you can also use it to get rid of trailing slashes:
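# Remove trailing slashes (assumes an HTTPS site; use http:// if yours is not)
RewriteEngine On
# Leave requests for existing files and directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Permanently (301) redirect "some/path/" to "some/path"
RewriteRule ^(.+)/$ https://%{HTTP_HOST}/$1 [R=301,L]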
Again, test whether the trailing slash is indeed removed, and whether your site still works. Always be careful with .htaccess changes!
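To make the rule's behaviour easier to follow, here is the same logic in Python. It is only a sketch. It assumes an HTTPS site and per-directory (.htaccess) context, where Apache matches the path without its leading slash. The `existing` set is a hypothetical stand-in for the `-f` and `-d` file checks, so testing on your real server remains necessary.

```python
import re

# Same pattern as the RewriteRule: at least one character, then a final slash.
TRAILING_SLASH = re.compile(r"^(.+)/$")


def trailing_slash_redirect(host: str, path: str, existing: set[str]):
    """Return the 301 target for `path`, or None if the rule would not fire.

    `path` has no leading slash, as in .htaccess context; `existing` stands in
    for the -f / -d checks (paths that are real files or directories).
    """
    if path in existing:
        return None  # RewriteCond !-f / !-d: real files and directories are skipped
    match = TRAILING_SLASH.match(path)
    if match is None:
        return None  # no trailing slash (or the site root): nothing to do
    return f"https://{host}/{match.group(1)}"
```

For example, `trailing_slash_redirect("joomla-seo.net", "Checklist/avoid-duplicate-url-s/", set())` yields the URL without the slash, while an existing directory such as `images/` is left alone.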
Similar issues can arise because of parameters, such as one that sets the font size, leading Google to think that 2 different pages exist.
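One way to treat such URLs as a single page is to strip parameters that only change the presentation before comparing them. The result is also what the canonical URL of such a page should point to. Below is a minimal Python sketch. The `PRESENTATION_PARAMS` set is a hypothetical example, and the right list depends on which parameters your template and extensions actually use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that change how a page looks, not what it contains.
PRESENTATION_PARAMS = {"font-size", "print"}


def strip_presentation_params(url: str) -> str:
    """Remove presentation-only parameters, keeping all others in their original order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in PRESENTATION_PARAMS
    ]
    # An empty query string makes urlunsplit drop the "?" entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))
```

A URL like `https://example.com/page1?font-size=large` then maps to `https://example.com/page1`, while parameters that really change the content (a pagination offset, for instance) are kept.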
4: Set up robots.txt
You can set up your robots.txt file so that it disallows crawling of any URL with a query string, i.e. any URL containing a '?'; see the article about robots.txt for the code. This prevents duplicate URL issues caused both by non-SEF URLs and by real query strings.
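Assuming a rule like `Disallow: /*?` (check the robots.txt article for the exact code), the sketch below shows how `*` and `$` in a robots.txt path pattern are matched, as described in RFC 9309 and supported by Google. Python's own `urllib.robotparser` does not handle these wildcards, so this minimal sketch translates patterns into regular expressions instead. It ignores `Allow` rules and rule precedence, and the function names are hypothetical.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional '$' end anchor)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any sequence of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without '$', a pattern only has to match the start of the path (prefix match).
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """True if any non-empty Disallow rule matches the path (plus query string)."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules if rule)
```

With `["/*?"]` as the rules, a non-SEF path like `/index.php?option=com_content&id=18` is disallowed, while `/Checklist/avoid-duplicate-url-s` is not.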
5: Use an extension
For smaller sites, preventing issues can easily be done by configuring .htaccess and robots.txt, and possibly adding a small extension for 301 redirects. For larger sites, using a SEF extension is probably more efficient. It takes some time to learn how these extensions work, so start by trying one out on a site that is not that important. Used correctly, it can rid your site of all duplicate URL issues; used incorrectly, it could have the exact opposite effect.
Check the extensions section of this site for information about well-known SEF extensions and others.
6: Google Webmaster Tools
Google Webmaster Tools offers an alternative way of dealing with duplicate URLs. Preferably, you should use the techniques discussed above to prevent issues from showing up in Webmaster Tools, and even if they do, first go back and review your setup. However, sometimes you may not be able to prevent duplicates from showing up.
Please note: don't panic when you see issues like this as warnings in Webmaster Tools. Google often encounters them, especially with new sites, but it usually learns (particularly with parameters) that such a URL is not a separate page, and the warnings disappear after a few weeks. Deal with the remaining issues, but remember that this is an advanced topic. For more information, read our article on this subject.
Some other ways to deal with duplicate URLs in Joomla are discussed in this recent article in the Joomla Magazine.