enrii.blog

A passionate programmer’s findings in the world of internet.

Robots.txt for WordPress

April 26th, 2007

A simple search for "wordpress robots file" made me realize the importance of the robots.txt file. Lots of people seem to be looking for a perfect robots.txt file for WordPress blogs.

After looking at JohnTP's article, I created one that looks like this:

User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /trackback/
Disallow: /cgi-bin/

For those who have never heard of robots.txt, it's basically a file at the root of your website that a web crawler reads to learn which parts of the site it should not crawl.
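To see what these rules actually block, here is a minimal sketch using Python's standard `urllib.robotparser`. The host `example.com`, the agent name `ExampleBot`, and the sample paths are assumptions made up for illustration; only the rules themselves come from the file above.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt shown in this post.
RULES = """\
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /trackback/
Disallow: /cgi-bin/
"""


def build_parser(rules_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def is_allowed(parser, path, agent="ExampleBot"):
    """Return True if the (hypothetical) agent may crawl the given path."""
    # example.com is a placeholder host; only the path matters for matching.
    return parser.can_fetch(agent, "http://example.com" + path)


parser = build_parser(RULES)
for path in ["/", "/2007/04/some-post/", "/wp-admin/", "/wp-login.php",
             "/trackback/", "/some-post/trackback/"]:
    print(path, "allowed" if is_allowed(parser, path) else "blocked")
```

Each rule is matched as a plain prefix of the path, so `Disallow: /wp-` already covers the three `/wp-...` directories listed before it, and `Disallow: /trackback/` only blocks a top-level `/trackback/`, not a path like `/some-post/trackback/`.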

If my article helped you solve your problem, consider buying me a beer!

3 Responses


Edwin says:

Thanks, I added the robots.txt to my blog.


Wasabi says:

Try this:

# These rules apply to all user agents
User-agent: *

# Disallow these directories and everything inside them
Disallow: /cgi-bin/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp-
Disallow: /trackback/

# Googlebot is Google's main search bot
User-agent: Googlebot

# Disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /*.cgi$
Disallow: /*.xhtml$

# Disallow Google from parsing individual post feeds and trackbacks
Disallow: */feed/
Disallow: */trackback/

# Disallow all files with ? in url
Disallow: /*?*
Disallow: /*?

# Disallow all archived monthlies
Disallow: /2006/0*
Disallow: /2007/0*
Disallow: /2006/1*
Disallow: /2007/1*

# Googlebot-Image is Google's image bot
User-agent: Googlebot-Image

# Allow Everything
Allow: /*

# This is Google's ad bot
User-agent: Mediapartners-Google*

# Allow Everything
Allow: /*

# Sitemap for search engines
Sitemap: http://siteweb/sitemap.xml
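
The `*` and `$` in the Googlebot rules above are pattern extensions: `*` stands for any run of characters and a trailing `$` pins the match to the end of the URL. Simple prefix matchers, such as Python's `urllib.robotparser`, do not interpret them. The sketch below is only a rough approximation of that matching, not Google's actual implementation; the helper names and sample paths are hypothetical, and it ignores `Allow` precedence and URL encoding.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" wherever "*" appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# A few of the Googlebot patterns from the comment above.
GOOGLEBOT_DISALLOW = ["/*.php$", "/*.js$", "/*?", "/2007/0*"]

for path in ["/index.php", "/index.php?x=1", "/2007/04/some-post/", "/about/"]:
    print(path, "blocked" if is_blocked(path, GOOGLEBOT_DISALLOW) else "allowed")
```

Under these assumptions, `/index.php?x=1` is not caught by `/*.php$` (it does not end in `.php`) but is still blocked by the `/*?` rule.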


EngLee says:

Wow, that’s an advanced version. Will examine your robots.txt file. Thanks.