Yahoo-Overture does not respect robots.txt

Today I received the following message in my mailbox:

an improper scan has caused a ban on your site

date: Tue Feb 24 18:30:20 2004
ip: 66.77.73.32
host: shop-gw.sac.overture.com
agent: Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler

I regularly receive this kind of message, usually triggered when bad robots or script kiddies access my spam trap. But Yahoo and Overture are well-respected companies, and I assumed they would respect my robots.txt file, in which I explicitly deny access to the /private folder:

User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
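To double-check that these rules really cover the pages in question, here is a minimal sketch using Python's standard `urllib.robotparser` module. It assumes the robots.txt file contains exactly the directives above, one per line; the helper names `make_parser` and `is_allowed` are only illustrative.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
"""

# User-Agent string the crawler sent, as it appears in the ban message.
CRAWLER_AGENT = ("Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot "
                 "overture dot com; http://www.alltheweb.com/help/webmaster/crawler")


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = CRAWLER_AGENT) -> bool:
    # Disallow values are path prefixes, so "/private" also covers
    # "/private/" and "/private/welcome.html".
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    for path in ("/private/", "/private/welcome.html", "/index.html"):
        print(path, "allowed" if is_allowed(parser, path) else "disallowed")
```

Because the `User-agent: *` record applies to every crawler and `Disallow` values are matched as prefixes, both `/private/` and `/private/welcome.html` come out as disallowed, while a hypothetical unrelated page such as `/index.html` stays crawlable.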

So I looked in my access log and found that they had indeed violated my robots.txt file!

66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
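A check like this can also be automated. The following is a rough sketch, not a production tool: it assumes Apache's combined log format (which the lines above appear to use), treats `Disallow` rules as plain path prefixes with no `Allow` lines or wildcards, and flags any request for a disallowed path from a host and user agent that had already fetched /robots.txt. The `violations` function and the `DISALLOWED` list are hypothetical names.

```python
import re

# Disallow prefixes copied from the robots.txt above (assumption: no Allow
# lines or wildcards, so plain prefix matching is enough).
DISALLOWED = ["/cgi-bin", "/dummy/dummy.html", "/errors", "/fimcap", "/js",
              "/mailtemplates", "/mt-static", "/private", "/spam"]

# Apache "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def violations(lines, disallowed=DISALLOWED):
    """Yield (host, time, path) for each request to a disallowed path made by
    a (host, agent) pair that had already fetched /robots.txt."""
    read_robots = set()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        client = (m["host"], m["agent"])
        path = m["path"]
        if path == "/robots.txt":
            read_robots.add(client)
        elif client in read_robots and any(path.startswith(p) for p in disallowed):
            yield m["host"], m["time"], path


AGENT = ("Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture "
         "dot com; http://www.alltheweb.com/help/webmaster/crawler")
SAMPLE_LOG = [
    f'66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "{AGENT}"',
    f'66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "{AGENT}"',
    f'66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "{AGENT}"',
]

if __name__ == "__main__":
    for host, when, path in violations(SAMPLE_LOG):
        print(host, when, path)
```

Run against the three log lines above, it reports the requests for /private/ and /private/welcome.html, both made after the crawler had downloaded robots.txt. Keying on the (host, agent) pair is a design choice: it only blames a client that demonstrably had the rules in hand.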

Note that the page referenced in the User-Agent string states that Yahoo-Overture does support the Robots Exclusion Protocol!
