====== Apache - Use .htaccess to hard-block spiders and crawlers ======

The .htaccess file is a hidden configuration file that can be placed in any directory served by Apache.

<color red>**WARNING**</color>: Make a backup copy of the .htaccess file first; a single character too much or too little can render your site inaccessible.
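For example, the backup can be made from the shell before editing (the scratch directory and file contents here are only for illustration):

```shell
# Use a scratch directory so this demo touches nothing real.
mkdir -p /tmp/htaccess-demo
cd /tmp/htaccess-demo

# Stand-in for a real site's .htaccess file.
printf 'RewriteEngine on\n' > .htaccess

# Keep a backup copy before making any changes.
cp .htaccess .htaccess.bak

# Both files now exist side by side.
ls -a .htaccess .htaccess.bak
```

If an edit breaks the site, restoring is just `cp .htaccess.bak .htaccess`.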
- 
One of the things you can do with **.htaccess** is redirect web requests coming from certain IP addresses or user agents.

The rules below block excessively active crawlers/bots by catching a substring in the USER_AGENT field and redirecting their requests to a dummy page before they reach any of your site's content.

Add the following lines to the website's .htaccess file:
- 
<file bash .htaccess>
# Redirect bad bots to one page.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mediawords [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
RewriteCond %{REQUEST_URI} !\/nocrawler.html
RewriteRule .* http://yoursite/nocrawler.html [L]
</file>
- 
This catches the server-hogging spiders, bots and crawlers by a substring of their user agent's name (case-insensitive). End each **RewriteCond** line with **[NC,OR]**, except the one for the last bot, which takes **[NC]** only.
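The same case-insensitive substring test that **[NC]** performs can be tried out with `grep -iE`, using the bot names from the rules above (the sample User-Agent strings are made up for illustration):

```shell
# Alternation of the blocked bot substrings, matched case-insensitively,
# just as the [NC] flag does for each RewriteCond.
BOTS='facebookexternalhit|Twitterbot|Baiduspider|MetaURI|mediawords|FlipboardProxy'

# A crawler UA containing "baiduspider" (note the lower case) is still caught:
echo 'Mozilla/5.0 (compatible; baiduspider/2.0)' | grep -qiE "$BOTS" && echo blocked

# An ordinary browser UA passes through:
echo 'Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0' | grep -qiE "$BOTS" || echo allowed
```

This prints `blocked` for the crawler string and `allowed` for the browser string, mirroring which requests the rewrite rules would and would not redirect.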
- 
This code redirects the unwanted crawlers to a dummy HTML file, http://yoursite/nocrawler.html, in your web root directory.
- 
An example could be:

<file html nocrawler.html>
<!DOCTYPE html>
<html>
<body>
<p>This crawler was blocked</p>
</body>
</html>
</file>
- 
<color red>**NOTE:**</color> The condition **RewriteCond %{REQUEST_URI} !\/nocrawler.html** is needed to avoid an infinite loop: without it, the redirected request for nocrawler.html would itself match the rule and be redirected again.
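If you would rather answer with a genuine "403 Forbidden" than serve a dummy page, the redirect can be replaced with Apache's **[F]** (forbidden) flag. A sketch using the same conditions as above; note that no loop-avoiding **RewriteCond** is needed here, since nothing is redirected:

<file bash .htaccess>
# Deny bad bots outright with a 403 response.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mediawords [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
RewriteRule .* - [F]
</file>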
  
apache/use_.htaccess_to_hard-block_spiders_and_crawlers.1476017954.txt.gz · Last modified: 2020/07/15 09:30 (external edit)
