====== Apache - Use .htaccess to hard-block spiders and crawlers ======

The .htaccess file is a (hidden) per-directory configuration file; it can be placed in any directory of a website.

<color red>**WARNING**</color>: Make a backup copy of the .htaccess file before editing it; a single misplaced character can render your site inaccessible.

One of the things you can do with **.htaccess** is block or redirect web requests coming from certain IP addresses or user agents.
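For example, blocking by IP address can be done with the Apache 2.4 ''Require'' directives. This is a minimal sketch, assuming Apache 2.4 and that ''AllowOverride'' permits these directives; the addresses below are placeholders and should be replaced with the offending IPs:

<file bash .htaccess>
# Block requests from specific IP addresses (Apache 2.4 "Require" syntax).
# The addresses below are placeholders - replace them with the IPs to block.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.42
    Require not ip 198.51.100.0/24
</RequireAll>
</file>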

The rules below block excessively active crawlers/bots by matching a string in the HTTP_USER_AGENT header and redirecting their requests to a single static page, before they ever reach the web application. (To reject them outright with a "403 Forbidden" instead, see the variant after the listing.)

Add the following lines to the website's .htaccess file:

<file bash .htaccess>
# Redirect bad bots to one page.
RewriteEngine on
# Match known bot strings in the User-Agent header ([NC] = case-insensitive, [OR] = any one matches).
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mediawords [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
# Do not rewrite requests for the target page itself, to avoid a redirect loop.
RewriteCond %{REQUEST_URI} !\/nocrawler.htm
RewriteRule .* http://yoursite/nocrawler.htm [L]
</file>
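To reject the matching bots with a genuine "403 Forbidden" response instead of redirecting them, the same kind of conditions can end in a rule carrying the ''[F]'' flag. This is a minimal sketch; adjust the user-agent list to match the one above or your own needs:

<file bash .htaccess>
# Return "403 Forbidden" to bad bots instead of redirecting them.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit|Twitterbot|Baiduspider|MetaURI|mediawords|FlipboardProxy) [NC]
# "-" means no substitution; [F] sends 403 Forbidden, [L] stops further rewriting.
RewriteRule .* - [F,L]
</file>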