Keep Search Engine Bots at Bay During Development
The Internet took remote collaboration to a whole new level, especially for Web design and software development in general. We commonly share custom DNN skins and modules with clients and partners at certain stages of the development cycle, at demo.domain.com or similar.
However, if done carelessly, exposing work-in-progress sites to the Internet can cause SEO nightmares before you know it. The trouble starts when Google and company decide to pay you a visit and start indexing your staging site. Granted, this usually does not happen overnight, but given enough time, search engine bots will eventually come across a “misplaced link” or some other way to reach your development playground. Because they take longer to complete, larger projects are more exposed: for example, moving an existing (static) site with a good number of pages to DNN or another CMS that will ultimately become the new “live” website, as opposed to setting up demo.domain.com for skin or module demonstration and testing only.
To add fuel to the fire, you most likely won’t notice the damage until you are actually trying to get your shiny new site indexed or its index updated. At that point you are backpedaling on SEO and wasting precious time dealing with duplicate-content hell and a spammy index.
So how do you avoid such a rude awakening? You take advantage of the Robots Exclusion Protocol (REP) by placing a robots.txt file in the root of your development website right before it sees the light of the public Internet. At a minimum, your robots.txt will contain two lines:
User-agent: *
Disallow: /
The first line addresses all search engine bots, spiders, and crawlers across the board (nothing forces them to obey the REP standard, but the majority of them do). The second line shuts the door in their faces by “disallowing” crawling of the entire site in question, so its content stays out of their indexes. Now, and this is important, you do need to remember to update your robots.txt file once you are ready to release your live site, because from that moment on you certainly want it to be found, indexed, and ranked by search engines.
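If you want to double-check what a given robots.txt tells well-behaved crawlers, here is a minimal sketch using Python's standard urllib.robotparser module. The function name blocks_all_crawlers, the sample URL, and the LIVE_RULES text are assumptions made up for illustration; an empty Disallow line is one common way to allow everything, but your live file may well look different.

```python
from urllib.robotparser import RobotFileParser

# The two-line staging file from this post.
STAGING_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical live file: an empty Disallow value permits all crawling.
LIVE_RULES = """\
User-agent: *
Disallow:
"""


def blocks_all_crawlers(robots_txt, sample_url="http://demo.domain.com/"):
    """Return True if the given rules forbid every tested bot from sample_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Check the wildcard agent plus one named bot that falls back to it.
    agents = ["*", "Googlebot"]
    return all(not parser.can_fetch(agent, sample_url) for agent in agents)


if __name__ == "__main__":
    print("staging blocked:", blocks_all_crawlers(STAGING_RULES))
    print("live blocked:", blocks_all_crawlers(LIVE_RULES))
```

Running a check like this against your live site's robots.txt just before launch is one way to catch a forgotten “Disallow: /”. Keep in mind that it only tells you what compliant bots are asked to do, not what every bot will actually do.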
Ever had a spider wreak havoc in any of your development or test beds? Please share in the comments.