Be Careful With Your Robots.txt

How One False Move Equals Search Engine Obscurity

Sometimes my grandson (whom I forgot to introduce to you earlier as Alex) is so tied up at work that he brings it home with him. He is first and foremost a web designer and an SEO noob, so one of his colleagues introduced him to the wonderful world of search engine optimisation. One day, he was building a site for a local vending machine company. He rang me up at 2 in the morning asking me what a robots.txt file was, and why he couldn’t find his client in Google, MSN, Yahoo!, Uncle Tom Cobley and all. In the end, his colleague Darren sent me this article, which is worth including.

Be Careful With Your Robots.txt: How One False Move Equals Search Engine Obscurity
I was recently asked to cast an eye over a commercial website for an acquaintance, and was directed to his website developer and host for FTP access. On their website, the developers proudly boast that they ensure all their websites are search engine friendly. Great, I thought, that makes an SEO consultant’s job much easier.

However, there was one catch: I soon found that none of their pages were cached in any search engine.

These days, I would expect a relatively new, if not brand new, website to be at least partly cached and indexed, and that the main reason for poor search engine results would be down to optimisation. Going through the usual motions, I did a few checks. Is it a brand new domain or website? Is it hosted on its own domain and not forwarding to some free web space in the depths of the internet wilderness? All seemed OK, no clues there. Quick robots.txt check… here we go…

User-agent: *
Disallow: /

So much for ensuring that all their websites are search engine friendly! When quizzed, the developers said they were being bombarded by a spam bot. Their action was a little drastic, but is there anything more effective and less destructive that can be done? Too much attention from unwanted guests can eat up bandwidth, slow your server’s response and increase page load times.

Once we know the spam bot’s name, can we just banish that particular bot? Or how about only admitting the ones we want? “If your name’s not down, you’re not coming in.”
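Banishing a single named bot is the lighter touch. A sketch of the idea — ‘BadBot’ here is a stand-in for whatever user-agent string turns up in the server logs:

```
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
```

An empty Disallow line means nothing is off limits, so everyone else is still welcome.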

I was thinking of something like the following, but bearing in mind the title of this post, I want to be sure before I use it on a client’s website. I seem to remember a lecturer at university, or on some course I’ve done, talking about the importance of something called ‘testing’. I think I know what he was talking about now, hence:

User-agent: googlebot-image
Allow: /
User-agent: googlebot-mobile
Allow: /
User-agent: yahoo-mmcrawler
Allow: /
User-agent: psbot
Allow: /
User-agent: asterias
Allow: /
User-agent: yahoo-blogs/v3.9
Allow: /
User-agent: msnbot
Allow: /
User-agent: googlebot
Allow: /
User-agent: yahoo-slurp
Allow: /
User-agent: teoma
Allow: /
User-agent: twiceler
Allow: /
User-agent: gigabot
Allow: /
User-agent: scrubby
Allow: /
User-agent: robozilla
Allow: /
User-agent: nutch
Allow: /
User-agent: ia_archiver
Allow: /
User-agent: baiduspider
Allow: /
User-agent: naverbot
Allow: /
User-agent: yeti
Allow: /
User-agent: *
Disallow: /
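In the spirit of that lecturer’s advice, the rules can be tested before they go anywhere near a client’s site. Python’s built-in urllib.robotparser will chew through a robots.txt and report who gets in. A sketch using a trimmed-down pair of records and a made-up domain:

```python
import urllib.robotparser

# A trimmed-down whitelist: let Googlebot in, turn everyone else away.
rules = """\
User-agent: googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot's name is down, so it is coming in...
print(parser.can_fetch("Googlebot", "http://example.com/"))    # True
# ...while an unlisted crawler is shown the door.
print(parser.can_fetch("SomeSpamBot", "http://example.com/"))  # False
```

Paste the full whitelist in instead and each named crawler can be checked the same way before the file is ever uploaded.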

The following post has some advice: http://www.teckitech.com/tips/how-to-block-the-bad-robots-crawling-your-site. However, the feedback there would indicate that some spam bots don’t pay much attention to robots.txt files.
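For the bots that won’t take the hint, the bouncing has to happen at the server instead. On an Apache server, something along these lines in .htaccess would do it — a sketch, assuming mod_setenvif is available, with ‘BadBot’ again a stand-in for the real user-agent string:

```
# Flag any request whose user-agent matches (case-insensitively).
SetEnvIfNoCase User-Agent "BadBot" bad_bot
# Refuse requests carrying the flag.
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Unlike robots.txt, this isn’t a polite request; the bot gets a 403 whether it minds its manners or not.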

On a broader note, the more spam you receive, the more exposure your website has. Greater visibility brings greater attention, whether that attention is wanted or not. If spam bots are finding you with increasing regularity, then you probably have more inbound links and therefore more genuine visitors, provided you’ve not neglected content and optimisation.

It’s not all bad, though; I’ve known some clients, albeit very rarely, to form good business relations with someone who first came to their attention by ‘spamming’ them. For a commercial venture relying on World Wide Web exposure, I wouldn’t be disallowing anyone from indexing the website.

Wise words indeed. I asked Alex if he understood the article. At first he found it difficult to comprehend, but I assume that’s because he builds his sites with Adobe Photoshop CS4 and Dreamweaver rather than Notepad. The last I heard, Darren had to make the amendment himself. Since then, his client has started to be seen in the search engines.

Well, I’d better leave you now, as I can hear the microwave roaring. I’m doing a couple of pies for my dinner (some gorgeous meat pies which I bought from Oven King in Hyde yesterday) – a good excuse to avoid ‘Loose Women’ no doubt.

T.U., 06 March 2009.

March 6, 2009 at 12:39 pm

What a Mistake-a to Make-a

No, it is not about my ‘Allo ‘Allo obsession or how I forgot to tape ‘Dancing on Ice’ for my wife Judy. Eagle-eyed readers will have noticed this textual malaprop in my first post:

“Following my Epiphany in 2004, I have five years down the line discovered SEO and how to create my own websites with hidden text, spam and use of tables for governing layout.”

Oops!

It should have read:

“Following my Epiphany in 2004, I have five years down the line discovered SEO and how to create my own websites without hidden text, spam and use of tables for governing layout.”

I must be going senile.

T.U., 06 March 2009

March 6, 2009 at 11:36 am

Papa’s Got a Brand New Blog!

Well, it has taken me quite a long time.  Having spent the last five years at Heaton Chapel library doing computer courses, I have finally put keyboard to blog account and set up my own weblog.

My health condition and then my ultimate employers forced me to retire in 1991.  I used to work for the CEGB before the bedsteads decided to privatise it and split it into Powergen, the National Grid and National Power.  A sprawling retail park now stands on the site of my first power station.  At my last one, there’s a prison.  Progress or what?!?

Never one to be beaten, I discovered computers.  I first found my enthusiasm for new technology when my grandson went to Dixons for his new Amiga A500.  That was in 1989, when he plumped for the Batman pack over an Atari ST.  I thought he made a good move, because he’s now a web designer and has one of those trendy iMacs.  I stuck with the PC and am able to defend the indefensible by saying Windows Vista is a great operating system.  It looks good.

To keep the wolf from the door (or rather the DSS), I took up a computing course at Heaton Chapel branch library.  Five years on, the habit remains.  I have yet to get my own Facebook page, though I boast a Twitter account and swear by Google’s range of tools.

I now use my PC as a notebook, CD/DVD player, convenient typewriter and a games machine.  My youngest grandchild (7) has discovered (and drawn me into) ‘Little Big Planet’.   Following my Epiphany in 2004, I have five years down the line discovered SEO and how to create my own websites with hidden text, spam and use of tables for governing layout.

Well, I’d better love you and leave you, as ‘This Morning’ has just started.  After being so used to Richard and Judy presenting it, I first found Fern Britton and Phillip Schofield a poor substitute, but I have warmed to them now.  I’ll be lost without it in the summer if ITV gets the go-ahead with its latest cutbacks.

T.U., 06 March 2009

March 6, 2009 at 9:49 am
