As far as I know, my website isn't getting crawled at all, or only very infrequently. Does my robots.txt file have something to do with that? My current file looks like this:
User-agent: *
Disallow: /
Is this good or bad for crawls?
Dude,
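That file tells every robot not to crawl any page on your site. Disallow: / matches every path from the root down, and robots.txt only works from the root of the site (/robots.txt), so every crawler that obeys it skips your whole site. For getting crawled, that's bad.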
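One way to confirm what the file in the question does is Python's standard-library urllib.robotparser. This is a minimal sketch: the example.com URLs and the crawler names are placeholders, and real search engines use their own parsers, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Every combination is refused, because "/" is a prefix of every path
for url in ("https://example.com/", "https://example.com/about.html"):
    for agent in ("Googlebot", "Bingbot"):
        print(agent, url, can_crawl(ROBOTS_TXT, agent, url))
```

Every line it prints ends in False.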
If you only want to keep spiders out of your images directory, you can write the rule like this:
User-agent: *
Disallow: /images/
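The same parser can show that this rule only blocks the one directory. Again a sketch with placeholder URLs:

```python
from urllib.robotparser import RobotFileParser

# Only the /images/ directory is off limits
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/index.html",
            "https://example.com/images/logo.png",
            "https://example.com/imagesgallery.html"):
    print(url, parser.can_fetch("Googlebot", url))
```

This prints True, False, True. Rules match by path prefix, so /imagesgallery.html stays crawlable because the rule ends in a slash; Disallow: /images (no slash) would block it as well.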
This line:
User-agent: *
is for all robots or spiders. I would block an evil spider such as Slurp and allow the rest, so my file would look something like this:
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
The above stops the spider Slurp from crawling my site. You have to list every bot you want to block.
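Consecutive User-agent lines form a single group that shares the rules below them, so listing * and Slurp together above one Disallow: / shuts out every crawler, not just Slurp. A sketch comparing that layout with separate groups (urllib.robotparser again; the URL and agent names are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Two User-agent lines in a row form ONE group, so Disallow: / applies to both
GROUPED = """\
User-agent: *
User-agent: Slurp
Disallow: /
"""

# Separate groups: Slurp is blocked, everyone else gets an empty Disallow (allow all)
SEPARATE = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""

def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

for name, text in (("grouped", GROUPED), ("separate", SEPARATE)):
    print(name, {agent: allowed(text, agent) for agent in ("Slurp", "Googlebot")})
```

With the grouped layout both agents are refused; with separate groups only Slurp is, because an empty Disallow: line means nothing is disallowed.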
Just to add to what has already been said, crawlers don't have to follow a robots.txt file. Most do, and Google's and Yahoo's crawlers obey the rules, but nothing stops me from writing a crawler that crawls your site and completely ignores the rules you have set.
If you don't want the site crawled for privacy or similar reasons, you will need a more secure way of stopping crawlers, such as using .htaccess.
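To make the point concrete: checking robots.txt is something the crawler chooses to do. In this hypothetical sketch, should_fetch and its obey_robots switch are made up for illustration; a polite crawler consults urllib.robotparser, a rude one never looks.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def should_fetch(robots_txt: str, agent: str, url: str, obey_robots: bool) -> bool:
    """Hypothetical crawler gate: robots.txt only matters if the crawler checks it."""
    if not obey_robots:
        # A rude crawler simply skips the check and fetches anyway
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

url = "https://example.com/private/report.html"
print("polite:", should_fetch(ROBOTS_TXT, "PoliteBot", url, obey_robots=True))
print("rude:", should_fetch(ROBOTS_TXT, "RudeBot", url, obey_robots=False))
```

The polite crawler gets False and the rude one gets True; nothing on the server side enforced anything.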
Robots.txt:
If there is no robots.txt (requesting it just returns an error, such as an IOError or a 404), then no robots.txt = crawl as much as you want.
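Python's urllib.robotparser behaves the same way: with no rules parsed, can_fetch allows everything. In CPython, RobotFileParser.read() also sets allow_all when the robots.txt request returns a 4xx status other than 401/403 (e.g. 404), while 401 or 403 makes it refuse everything. A minimal sketch, with an empty rule list standing in for a missing file and a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

# An empty (or missing) robots.txt contains no rules at all
parser = RobotFileParser()
parser.parse([])

# No rules means no restrictions
print(parser.can_fetch("AnyBot", "https://example.com/anything.html"))
```

This prints True.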