Yahoo-Overture does not respect robots.txt


Today I received the following message in my mailbox:

an improper scan has caused a ban on your site

date: Tue Feb 24 18:30:20 2004
ip: 66.77.73.32
host: shop-gw.sac.overture.com
agent: Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler

I regularly receive messages like this, usually triggered when bad robots or script kiddies access my spam trap. But Yahoo and Overture are well-respected companies, and I assumed they would respect my robots.txt file, in which I explicitly deny access to the /private folder:

User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
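
As a quick sanity check that these rules really do exclude /private for this crawler, here is a minimal sketch using Python's standard urllib.robotparser module. The rules are copied from the file above; the helper name is_allowed and the choice of test paths are my own assumptions, not part of the original setup.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /dummy/dummy.html
Disallow: /errors
Disallow: /fimcap
Disallow: /js
Disallow: /mailtemplates
Disallow: /mt-static
Disallow: /private
Disallow: /spam
"""

# Product token taken from the crawler's User-Agent string.
AGENT = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"


def is_allowed(path, agent=AGENT):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/robots.txt", "/private/", "/private/welcome.html"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Under these rules a compliant robot may fetch /robots.txt but nothing below /private.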

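So I looked in my access log and found that they had indeed violated my robots.txt file! The crawler fetched robots.txt and then, about 37 minutes later, requested /private/ anyway: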

66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"
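
Spotting such requests by hand gets tedious, so here is a rough sketch of how the check could be automated. It assumes the log is in Apache combined format, as the lines above appear to be, and it treats each Disallow rule as a plain path prefix. The names LOG_LINE, DISALLOWED and find_violations are hypothetical, and this is not the script that sent me the ban message.

```python
import re

# Disallowed prefixes, copied from my robots.txt.
DISALLOWED = [
    "/cgi-bin", "/dummy/dummy.html", "/errors", "/fimcap", "/js",
    "/mailtemplates", "/mt-static", "/private", "/spam",
]

# Assumed Apache combined log format: ip, two ignored fields, [time],
# "METHOD path protocol", status, size (referrer and agent are ignored).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

AGENT = ('Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot '
         'overture dot com; http://www.alltheweb.com/help/webmaster/crawler')

# The three log lines from this entry.
SAMPLE = [
    f'66.77.73.32 - - [24/Feb/2004:17:20:24 -0500] "GET /robots.txt HTTP/1.0" 200 758 "-" "{AGENT}"',
    f'66.77.73.32 - - [24/Feb/2004:17:57:22 -0500] "GET /private/ HTTP/1.0" 200 4815 "-" "{AGENT}"',
    f'66.77.73.32 - - [24/Feb/2004:18:30:20 -0500] "GET /private/welcome.html HTTP/1.0" 200 351 "-" "{AGENT}"',
]


def find_violations(lines):
    """Return (time, path) for every request whose path starts with a disallowed prefix."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and any(match.group("path").startswith(p) for p in DISALLOWED):
            hits.append((match.group("time"), match.group("path")))
    return hits


if __name__ == "__main__":
    for when, path in find_violations(SAMPLE):
        print(when, path)
```

Run against the three lines above, this flags the two /private requests and skips the robots.txt fetch.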

Note that the page referenced in the User-Agent string states that Yahoo-Overture does support the Robots Exclusion Protocol!

This page contains a single entry by Jeroen Sangers published on February 25, 2004.
