====== Apache - Use .htaccess to hard-block spiders and crawlers ======

The **.htaccess** file is a (hidden) file which can be found in any directory served by Apache.

One of the things you can do with **.htaccess** is hard-block unwanted spiders and crawlers.

This blocks excessively active crawlers/spiders by their user agent, before they can hog the server.

Add the following lines to a website's **.htaccess** file:

<file bash .htaccess>
# Redirect bad bots to one page.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mediawords [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
# Do not match the dummy page itself, or the redirect would loop.
RewriteCond %{REQUEST_URI} !\/nocrawler\.html
# Replace the hostname below with your own domain.
RewriteRule .* http://yoursite.example/nocrawler.html [L]
</file>

This catches the server-hogging spiders, bots, and crawlers by a (case-insensitive) substring of their user agent's name.

This piece of code redirects the unwanted crawlers to a dummy HTML file, **nocrawler.html**, instead of the page they requested.

An example could be:

<file html nocrawler.html>
<html>
  <head>
    <title>No crawling!</title>
  </head>
  <body>
    <p>Crawlers are not welcome here.</p>
  </body>
</html>
</file>
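The case-insensitive substring matching done by the `[NC]` conditions above can be sanity-checked outside Apache. The following is a minimal sketch, assuming a POSIX shell; the `is_blocked` helper is hypothetical and only mimics the matching, it is not part of the .htaccess recipe:

```shell
# Mimic Apache's [NC] (case-insensitive) substring match against
# the blocked user agents from the .htaccess recipe above.
is_blocked() {
  ua=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ua" in
    *facebookexternalhit*|*twitterbot*|*baiduspider*|*metauri*|*mediawords*|*flipboardproxy*)
      return 0 ;;   # would be redirected
    *)
      return 1 ;;   # would be served normally
  esac
}

is_blocked "Mozilla/5.0 (compatible; Baiduspider/2.0)" && echo "blocked"
is_blocked "Mozilla/5.0 (Windows NT 10.0; rv:115.0) Firefox/115.0" || echo "allowed"
```

A live rule can also be exercised with ''curl -A "Baiduspider" -I'' against the site, which should answer with a 302 redirect to **nocrawler.html**.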
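If you would rather refuse the bots outright than serve them a dummy page, the same `RewriteCond` lines can feed a forbidden rule instead. This is a sketch of an alternative, not part of the recipe above:

<file bash .htaccess>
# Same user-agent conditions as before, but answer 403 Forbidden:
# "-" means no substitution, F sends 403, L stops rewriting.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
RewriteRule .* - [F,L]
</file>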