Archived blog with a focus on DotNetNuke news, tips and tricks, DNN SEO, and insights and opinions about the DNN community at large.

Keep Search Engine Bots at Bay During Development 

The Internet brought a whole new level to the idea of remote collaboration, especially for Web design itself and broader software development. We commonly share custom DNN skins and modules at certain stages of the development cycle with clients and partners at demo.domain.com or similar.

However, if done carelessly, exposing work-in-progress sites to the Internet may cause SEO nightmares before you know it. The trouble starts when Google and company decide to pay you a visit and start indexing your staging site. Granted, this usually does not happen overnight, but given enough time, search engine bots will eventually come across a “misplaced link” or some other way to reach your development playground. Because they take longer to complete, larger projects are more at risk: think of moving an existing (static) site with a good number of pages to DNN or another CMS, where the staging site ultimately becomes the new “live” website, as opposed to setting up demo.domain.com for skin or module demonstration and testing only.

Adding more fuel to the fire, you most likely won’t notice the damage until you are actually trying to get your shiny new site indexed or the index updated. At that point you are backpedaling on SEO and wasting precious time dealing with duplicate content hell and a spammy index.

So how do you avoid such a rude awakening? You take advantage of the Robots Exclusion Protocol (REP) by placing a robots.txt file into the root of your development website right before it sees the light of the public Internet. At a minimum, your robots.txt will contain two lines:

User-agent: *
Disallow: /

The first line addresses all search engine bots, spiders, and crawlers across the board. Nothing forces them to obey the REP standard, but the majority of them do. The second line shuts the door in their faces by “disallowing” crawling, and with it indexing of the content, for the entire site. Now, and this is important: remember to update your robots.txt file once you are ready to release your live site, because from that moment on you certainly want it to be found, indexed, and ranked by search engines.
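
If you want to double-check how a well-behaved crawler would read these rules before the staging site goes public, you can test them locally. Here is a minimal Python sketch using the standard library's urllib.robotparser. Note that the LIVE_RULES variant, the is_allowed helper, and the Home.aspx URL are illustrative assumptions and not something prescribed above:

```python
from urllib.robotparser import RobotFileParser

# The two-line staging rules: every bot, nothing allowed.
STAGING_RULES = """\
User-agent: *
Disallow: /
"""

# Assumed go-live variant (hypothetical): an empty Disallow value
# places no restriction on crawling.
LIVE_RULES = """\
User-agent: *
Disallow:
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if a REP-compliant bot may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page on the staging host.
    url = "http://demo.domain.com/Home.aspx"
    print("staging:", is_allowed(STAGING_RULES, "Googlebot", url))  # False
    print("live:   ", is_allowed(LIVE_RULES, "Googlebot", url))     # True
```

Keep in mind that this only shows how a compliant parser interprets the rules. It does not tell you whether a particular search engine has already fetched your robots.txt, nor does it stop bots that ignore the REP.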

Ever had a spider wreak havoc in any of your development or test beds? Please share in the comments.




Comments

Mitchel Sellers says:

Great post and recommendation Tom!

I've experienced this a number of times with testing sites that I put out, or for the short time when students of my HTML class were working on assignments on a subdomain. It is a very important configuration for any test/development environment.

....now to triple check my test environments...

Tom Kraak says:

Thanks Mitch. You must have a number of dev sites "out there" at any given time.

Don says:

Some years ago we were hired to help a company improve their search engine rankings. Their site had been underperforming for many years. We discovered pretty quickly that their robots.txt file was telling all search engines to 'stay away'. We assume this was a remnant from their development site that never got updated when the site went live.

Tom Kraak says:

@ Don - yes, that's a valid concern, but on the other hand, creating / updating the robots.txt file should be part of a proper "going live" to-do list anyway.

Penny says:

It is very embarrassing to be 'caught' in this scenario when your client comes to you and wants to know why his new site is going to a sub portal on 'not his domain'. Can you tell I've been there? As usual, very insightful comments and advice. Thanks!

Tom Kraak says:

@ Penny - very unpleasant situation for sure. Thanks for chiming in.

froggertv says:

Getting pages indexed during the development stage is really bad, but I have found that once your site is fully loaded, the pages indexed during development no longer have any significance. The search engines then index your fully loaded site.

Tom Kraak says:

@ froggertv - true, the fact that your dev site has been indexed will not keep search engines from crawling your live site, but when it comes to ranking, you may actually compete against yourself for a while.

Duplicate content may also become an issue, depending on how far you took your dev site.

Cliff Nelson says:

Hi Tom,

I have placed a robots.txt file in the root of my DNN development site with only two lines:

User-agent: *
Disallow: /

Googlebot still shows the site as "Allowed" in Webmaster Tools.
Any idea why, and how to correct it?

Tom Kraak says:

@ Cliff - are you sure Google has "seen" your robots.txt already?

Comments are closed
