Though it seems like it should be pretty straightforward, I have been unable to configure Apache so that Googlebot's requests are not stored in the access log. I've tried the following lines:
SetEnvIfNoCase User-Agent googlebot dontlog
BrowserMatchNoCase googlebot dontlog
CustomLog "/foo/bar/access_log" combined env=!dontlog
and I restarted Apache after adding them, but the log is still recording all of Googlebot's requests. My understanding is that SetEnvIf User-Agent and BrowserMatch do the same thing. I tried each of them, but neither works.
Best Answer
Find a log entry that you suspect is the Googlebot and make a note of the IP address.
Next, do a reverse DNS lookup on that IP address (for example with the host or nslookup command), substituting the IP address you recorded earlier.
If the hostname in the result ends in googlebot.com, then you know it's the Googlebot.
Next, go to your Apache2 VirtualHost and add directives, adapted for your site, that exclude that IP address from the log.
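The answer's original directives are not preserved here. A minimal sketch of what they might look like, assuming the address you verified was 66.249.66.1 (a placeholder; substitute the one from your own log) and reusing the dontlog variable and log path from the question:

```apache
# Place these inside your existing <VirtualHost> block.

# Hypothetical placeholder address; replace it with the IP you verified.
# Requests from that address get the "dontlog" environment variable.
SetEnvIf Remote_Addr "^66\.249\.66\.1$" dontlog

# Write a log entry only when "dontlog" is NOT set for the request.
CustomLog "/foo/bar/access_log" combined env=!dontlog
```

Note that any other CustomLog directive that applies to the same virtual host and lacks the env=!dontlog condition will still record these requests, so it is worth checking that this is the only one in effect.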
You can repeat this process for the bingbot. Its lookup result should have a hostname that ends in search.msn.com. You would then add an additional line to the VirtualHost file after the Googlebot line, using the bingbot's IP address.
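As a sketch, assuming the bingbot address you verified was 157.55.33.18 (again only a placeholder), the extra line could look like this:

```apache
# Hypothetical placeholder address for a verified bingbot IP;
# replace it with the one from your own log.
SetEnvIf Remote_Addr "^157\.55\.33\.18$" dontlog
```

It sets the same dontlog variable, so the existing CustomLog line with env=!dontlog does not need to change.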
Usually each bot will keep using the same IP address to check your pages, but if not, you may need to add additional entries. You may just want to use "^66" out of convenience, though keep in mind that this pattern matches every address beginning with 66, not only Google's.
https://support.google.com/webmasters/answer/80553
https://blogs.bing.com/webmaster/2012/08/31/how-to-verify-that-bingbot-is-bingbot/