Monday, July 27, 2009

SEO Q&A: How to Prevent Google from Crawling AND Indexing a Page?

by Ann Smarty

So here's my detailed explanation of the problem as well as the solution:

To me, blocking pages via Robots.txt has always been more about saving the bot's time than about actually trying to hide anything. Search bots crawl on a budget, so the more "extra" pages you exclude from the very start, the more time the bot will spend finding content-rich pages and including (or updating) them in the index.

What the standard "Disallow" directive still cannot do is make Google drop the page from the index. So you may end up seeing those blocked pages in Google SERPs: Google won't know what they actually contain, so it will make judgements based on both internal and external references to those pages.
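To make the crawl-blocking half concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` host and the rules are assumptions for illustration only, and the code models how a well-behaved crawler reads "Disallow", not how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login.php
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


login_allowed = is_crawlable(ROBOTS_TXT, "Googlebot", "http://example.com/login.php")
home_allowed = is_crawlable(ROBOTS_TXT, "Googlebot", "http://example.com/index.html")
print(login_allowed, home_allowed)
```

Note that the check only answers "may I fetch this URL?". Nothing in it says anything about whether the URL may appear in an index, which is exactly the gap described above.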

So quite a natural question raised by the above-mentioned situation is: "How do I make Google ignore those "extra" pages completely, so that it neither wastes crawl time on them nor lists them in SERPs?"

The answer is not as simple as it may seem. The widely used "NoIndex" meta tag won't work because Google won't see it: the page is blocked by Robots.txt, so Google can't fetch it to read the Robots meta tag.
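The catch can be sketched as a toy crawler decision in Python. Everything here is an assumption made for illustration: the `crawl_decision` helper, the outcome strings, and the sample page are hypothetical, and the logic is a simplified model rather than Google's actual pipeline.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Record whether a page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots = attr.get("name", "").lower() == "robots"
        if is_robots and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def crawl_decision(robots_txt, page_html, url, user_agent="Googlebot"):
    """Hypothetical model of what a polite crawler learns about one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # The page is never fetched, so its meta tag is never read.
        return "blocked: meta tag unseen"
    meta = RobotsMetaParser()
    meta.feed(page_html)
    return "fetched: noindex honored" if meta.noindex else "fetched: indexable"


PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'
URL = "http://example.com/login.php"

blocked = crawl_decision("User-agent: *\nDisallow: /login.php\n", PAGE, URL)
unblocked = crawl_decision("User-agent: *\nDisallow: /private/\n", PAGE, URL)
print(blocked)
print(unblocked)
```

The same page carries the same meta tag in both calls. The tag only takes effect in the second one, where Robots.txt lets the crawler in.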

There are two other possible solutions though:

1. Use the Robots.txt Disallow directive and then use the URL removal tool within Google Webmaster Tools;

2. Use the Robots.txt Noindex directive - it is unofficially supported by Google and can be one of the ways to help sculpt PageRank. Used together with Disallow, it keeps the page from being crawled and indexed:

user-agent: googlebot
noindex: /login.php
disallow: /login.php
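Because "noindex" is not part of the standard Robots.txt vocabulary, a standards-only parser will skip that line. The sketch below shows this with Python's `urllib.robotparser`, which reads the "disallow" rule and ignores the "noindex" one, alongside a hypothetical `noindex_paths` helper that a crawler would need in order to notice the extra line. The helper is an illustrative assumption (it ignores user-agent grouping) and says nothing about how Google processes the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above.
ROBOTS_TXT = """\
user-agent: googlebot
noindex: /login.php
disallow: /login.php
"""


def noindex_paths(robots_txt):
    """Collect paths from non-standard 'noindex:' lines.

    Simplification: user-agent groups are ignored.
    """
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "noindex":
            paths.append(value.strip())
    return paths


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The standard parser still applies the disallow rule for googlebot.
googlebot_blocked = not parser.can_fetch("googlebot", "http://example.com/login.php")
# The group names googlebot only, so other agents are unaffected.
otherbot_blocked = not parser.can_fetch("otherbot", "http://example.com/login.php")

print(googlebot_blocked, otherbot_blocked, noindex_paths(ROBOTS_TXT))
```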

[searchenginejournal]
