IIS – Block Bots with IIS 7.5 and 8.0

iis, iis-7.5

I would like to block a bot with IIS. With Apache you can add a directive to your .htaccess file, as outlined here. How would I accomplish this with IIS 7.5?
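For reference, the Apache approach mentioned above typically looks something like this in .htaccess (a sketch using mod_setenvif and Apache 2.2-style access control; the exact directives depend on the guide linked):

 # Tag requests whose User-Agent contains "YandexBot"...
 BrowserMatchNoCase "YandexBot" bad_bot
 # ...and deny any request carrying that tag
 Order Allow,Deny
 Allow from all
 Deny from env=bad_bot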

Update

In addition to the answer below, there are a total of three approaches I have discovered since posting this question:

  1. The URLScan option listed in the accepted answer.
  2. Define a Request Filtering rule (example below)
  3. Define a URL Rewriting rule (example below)

Request Filtering Rule

 <system.webServer>
   <security>
     <requestFiltering>
       <filteringRules>
         <!-- Scan only the User-Agent request header and deny any
              request whose value contains "YandexBot" -->
         <filteringRule name="BlockSearchEngines" scanUrl="false" scanQueryString="false">
           <scanHeaders>
             <clear />
             <add requestHeader="User-Agent" />
           </scanHeaders>
           <appliesTo>
             <clear />
           </appliesTo>
           <denyStrings>
             <clear />
             <add string="YandexBot" />
           </denyStrings>
         </filteringRule>
       </filteringRules>
     </requestFiltering>
   </security>
   [...]
 </system.webServer>
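This markup goes in the site's web.config. Note that the <filteringRules> collection requires IIS 7.5 or later, and that Request Filtering rejects matching requests with a 404 response by default (substatus 404.19, "Denied by filtering rule").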

URL Rewriting rule

<rule name="RequestBlockingRule1" patternSyntax="Wildcard" stopProcessing="true">
  <match url="*" />
  <conditions>
    <!-- Wildcard patterns must match the entire header value, so wrap the bot name in * -->
    <add input="{HTTP_USER_AGENT}" pattern="*YandexBot*" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="Get Lost." />
</rule>
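Note that this rule is a fragment: it requires the URL Rewrite module to be installed, and it belongs inside the <rewrite> section of web.config. The enclosing structure looks like this (a minimal sketch):

<system.webServer>
  <rewrite>
    <rules>
      <!-- RequestBlockingRule1 from above goes here -->
    </rules>
  </rewrite>
</system.webServer>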

For my last project I ended up going with option 2, since it is security focused and builds on the URLScan functionality that is integrated into IIS 7 as Request Filtering.

Best Answer

Normally you use robots.txt. It will work on all well-behaved bots.
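For example, a minimal robots.txt served from the site root (Yandex documents that all of its crawlers honor the "Yandex" token):

 User-agent: Yandex
 Disallow: /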

For bots that are not well behaved there is often little you can do. You can limit connection counts or bandwidth in your firewall or web server, but major bots will typically use multiple IP addresses. Limiting based on user-agent strings is usually not a good idea, as those are trivial for the bot to spoof, and bots that do not care about robots.txt have a tendency to spoof user-agent strings as well. It works in the specific case where the bot sends a correct user agent but does not obey robots.txt.
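You can see how trivial spoofing is, and verify the rules above at the same time, with curl, which will send any User-Agent string you give it (example.com stands in for your own site):

 # Pretend to be YandexBot; the rules above should reject this request
 curl -A "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" http://example.com/

 # Default curl User-Agent from the same machine; this one should get the page
 curl http://example.com/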

Edit: If you really want to block based on user agent instead of pushing it back to your firewall or similar, I think the easiest way is to use URLScan. You write a rule that looks something like this:

[Options]
RuleList=DenyYandex

[DenyYandex]
DenyDataSection=Agents
ScanHeaders=User-Agent

[Agents]
Yandex
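Note that URLScan is a separate download rather than part of IIS itself; these settings go into its UrlScan.ini, which by default lives under %windir%\system32\inetsrv\urlscan\.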