Be Careful With Your Robots.txt

March 6, 2009 at 12:39 pm

How One False Move Equals Search Engine Obscurity

Sometimes my grandson Alex (whom I forgot to introduce to you earlier) is so tied up at work that he brings it home with him. He is first and foremost a web designer and an SEO noob, so one of his colleagues introduced him to the wonderful world of search engine optimisation. One day, he was building a site for a local vending machine company. He rang me up at 2 in the morning asking what a robots.txt file was, and why he couldn’t find his client in Google, MSN, Yahoo!, Uncle Tom Cobley and all. In the end, his colleague Darren sent me this article, which is worth including.

Be Careful With Your Robots.txt: How One False Move Equals Search Engine Obscurity
I was recently asked to cast an eye over a commercial website for an acquaintance, and was directed to his website developer and host for FTP access. On their own website, the developers proudly boast that they ensure all their websites are search engine friendly. Great, I thought, that makes an SEO consultant’s job much easier.

However, there was one catch: the first thing I found was that none of their pages were cached in any search engine.

These days, I would expect a relatively new, or even brand new, website to be at least partly cached and indexed, and any poor search engine results to be down to optimisation. Going through the usual motions, I did a few checks. Is it a brand new domain or website? Is it hosted on its own domain and not forwarding to some free web space in the depths of the internet wilderness? All seemed OK, no clues there. Quick robots.txt check… here we go…

User-agent: *
Disallow: /
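
To see just how absolute those two lines are, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The crawler names and the example.com address are only illustrative assumptions, and real search engines run their own parsers, which may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# The two-line file found on the client's site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative crawler names; the wildcard rule turns every one of them away.
for bot in ("googlebot", "msnbot", "slurp"):
    verdict = is_allowed(BLOCK_ALL, bot, "http://example.com/index.html")
    print(bot, "allowed" if verdict else "blocked")
```

Every well-behaved crawler that honours the file is told to stay away from the whole site, which would explain the empty search results.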

So much for ensuring that all their websites are search engine friendly! When quizzed, the developers said they were being bombarded by a spam bot. Their action was a little drastic, but is there anything more effective and less destructive that can be done? Too much attention from unwanted guests can use up bandwidth, slow your server’s response time and increase page load time.

Once we know the spam bot’s name, can we just banish that particular spam bot? How about just allowing the ones we want to admit? “If your name’s not down, you’re not coming in.”
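
The first idea, banishing one named bot and leaving the door open for everyone else, might look like the sketch below. “BadBot” is a made-up name standing in for the real offender, and the check again relies on Python’s standard `urllib.robotparser`, so treat it as an approximation of how a polite crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# "BadBot" is a hypothetical name; substitute the user-agent of the real pest.
# An empty "Disallow:" means nothing is off limits for everyone else.
BLOCK_ONE = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(BLOCK_ONE.splitlines())

badbot_ok = parser.can_fetch("BadBot", "/")
googlebot_ok = parser.can_fetch("googlebot", "/")

print("BadBot allowed?", badbot_ok)
print("googlebot allowed?", googlebot_ok)
```

This only helps if the bot in question actually reads and obeys robots.txt.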

I was thinking of something like the following but, bearing in mind the title of this post, I want to be sure before I use it on a client’s website. I seem to remember a lecturer at university, or on a course I’ve done, talking about the importance of something called ‘testing’. I think I know what he was talking about now, hence:

User-agent: googlebot-image
Allow: /
User-agent: googlebot-mobile
Allow: /
User-agent: yahoo-mmcrawler
Allow: /
User-agent: psbot
Allow: /
User-agent: asterias
Allow: /
User-agent: yahoo-blogs/v3.9
Allow: /
User-agent: msnbot
Allow: /
User-agent: googlebot
Allow: /
User-agent: yahoo-slurp
Allow: /
User-agent: teoma
Allow: /
User-agent: twiceler
Allow: /
User-agent: gigabit
Allow: /
User-agent: scrubby
Allow: /
User-agent: robozilla
Allow: /
User-agent: nutch
Allow: /
User-agent: ia_archiver
Allow: /
User-agent: baiduspider
Allow: /
User-agent: naverbot
User-agent: yeti
Allow: /
User-agent: *
Disallow: /
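
In the spirit of ‘testing’, a draft like this can be checked before it goes anywhere near a live site. The sketch below feeds a shortened version of the list to Python’s standard `urllib.robotparser` and asks which bots get in. “SomeSpamBot” is an invented name, and Python’s matching rules are an assumption about how crawlers behave, not a guarantee, so each search engine’s own testing tool is still worth a look.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the allow list, for illustration only.
ALLOW_LIST = """\
User-agent: googlebot
Allow: /

User-agent: msnbot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_bots(robots_txt, bots, path="/"):
    """Map each bot name to True (allowed) or False (blocked) for path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


# "SomeSpamBot" is a hypothetical uninvited guest.
results = allowed_bots(ALLOW_LIST, ["googlebot", "msnbot", "SomeSpamBot"])
for bot, ok in results.items():
    print(f"{bot}: {'allowed' if ok else 'blocked'}")
```

The named bots are let through and anything not on the list falls to the final wildcard rule, which is the “if your name’s not down” behaviour the draft is aiming for.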

The following post has some advice: http://www.teckitech.com/tips/how-to-block-the-bad-robots-crawling-your-site. However, the feedback would indicate that some spam bots don’t pay much attention to robots.txt files.

On a broader note, the more spam you receive, the more exposure your website has. Greater exposure means greater visibility, whether the attention is wanted or not. If spam bots are finding you with increased regularity, then you probably have more inbound links and therefore more genuine visitors, provided you’ve not neglected content and optimisation.

It’s not all bad though; I’ve known some clients, albeit very rarely, to form good business relations with someone who first came to their attention by ‘spamming’ them. For a commercial venture relying on World Wide Web exposure, I wouldn’t be disallowing anyone from indexing a website.

Wise words indeed. I asked Alex if he understood the article. He found it difficult to comprehend at first, but I would assume that’s because he builds his sites with Adobe Photoshop CS4 and Dreamweaver rather than Notepad. The last I heard, Darren had to make the amendment himself. Since then, his client has started to appear in search engines.

Well, I’d better leave you now, as I can hear the microwave roaring. I’m doing a couple of pies for my dinner (some gorgeous meat pies which I bought from Oven King in Hyde yesterday) – a good excuse to avoid ‘Loose Women’ no doubt.

T.U., 06 March 2009.

Entry filed under: Internet, Robots.txt, Search Engine Optimisation, Search Engines, SEO.
