
Search Engine Optimization & Performance tuning for your Joomla website


Robots.txt


The robots.txt file tells crawlers which parts of your site they are allowed to visit. It does not hide anything from regular visitors; it is there for the search engine bots that crawl websites to determine which pages should become part of their index.

Joomla ships with a standard robots.txt file that works fine for most sites, with one important exception: it blocks the /images folder. This prevents your site's images from being indexed, which you almost certainly do not want. Therefore, either comment out this line or remove it completely:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
# Disallow: /images/    <-------- Commented out using #
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/

In addition, you can use the file to keep specific pages out of the index, such as login or 404 pages.

You can check whether your robots.txt file works as intended in the Blocked URLs section of Google Webmaster Tools.
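For a quick local check before you upload the file, a minimal sketch along these lines may help. It uses Python's standard urllib.robotparser module on a shortened copy of the rules above; the is_crawlable helper and the two sample paths are made up for illustration and are not part of Joomla or of any Google tool.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above, with the /images/ line commented out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
# Disallow: /images/
Disallow: /tmp/
"""


def is_crawlable(robots_txt: str, path: str, user_agent: str = "*") -> bool:
    """Return True if the given rules let user_agent fetch path.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are assumptions; replace them with URLs from your own site.
    for path in ("/images/logo.png", "/administrator/index.php"):
        verdict = "allowed" if is_crawlable(ROBOTS_TXT, path) else "blocked"
        print(path, "->", verdict)
```

This only tells you how a simple prefix-matching parser reads the file. Search engines may interpret some rules differently, so the check in Google Webmaster Tools remains the authoritative one for Google.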

Advanced tweaking with robots.txt

Advanced users can use pattern matching in the robots.txt file to keep pages out of the index. You could, for example, block any URL containing a '?' to prevent duplicate content from non-SEF URLs:

User-agent: *
Disallow: /*?*

Needless to say, you need to be cautious with this: a rule like the one above blocks every URL that contains a query string. More examples can be found on searchengineland.com.
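To see what a wildcard rule such as /*?* would catch before you deploy it, you could test it against a handful of your own URLs. The sketch below is a rough approximation of the wildcard conventions Google documents ('*' matches any run of characters, a trailing '$' anchors the end of the URL); it is not how any search engine is actually implemented. The rule_to_regex and rule_matches names and the sample URLs are assumptions for illustration. Python's own urllib.robotparser is not used here because it matches rules by plain prefix and does not expand wildcards.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Hypothetical helper: '*' becomes '.*' and a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces so characters like '?' and '.' match themselves.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if the rule applies to url_path (matched from the start)."""
    return rule_to_regex(pattern).match(url_path) is not None


if __name__ == "__main__":
    # Sample URLs are assumptions; use paths from your own site instead.
    samples = [
        "/index.php?option=com_content&view=article&id=1",
        "/about-us.html",
    ]
    for url in samples:
        verdict = "blocked" if rule_matches("/*?*", url) else "allowed"
        print(url, "->", verdict)
```

Running a list of real URLs from your site through a check like this makes it easier to spot a rule that blocks more than you intended, for example search or pagination pages that legitimately use query strings.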
