I have an nginx web server.
I have a content-rich site and I found that some malicious bots are trying to crawl my content. I blocked any curl or wget requests coming to my site like this:
if ($http_user_agent ~* (curl|wget)) { return 301 $scheme://www.google.com/; }
but I found that I can still access the content if I change the --user-agent in the curl request, like below:
curl --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)" http://example.com/mypage.php
OR
curl --user-agent "whatever" http://example.com/mypage.php
Any idea how I could block any request generated by curl or wget using nginx, regardless of the fake user-agent that is sent?
Best Answer
User-Agent
The User-Agent header can always be spoofed. There are other headers you can check, e.g. Accept, Accept-Language, Connection, and some others that are not always sent for every object type, but more clever bots spoof those as well.
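As a rough sketch of a secondary header check, the snippet below rejects requests that carry no Accept-Language header, which curl and wget omit by default. The variable name $missing_lang and the choice of Accept-Language are my own assumptions for illustration; some legitimate clients (search engine crawlers, for example) may also omit this header, so test before relying on it.

```nginx
# In the http context: flag requests with an empty or absent Accept-Language.
map $http_accept_language $missing_lang {
    default 0;
    ""      1;
}

server {
    listen 80;
    server_name example.com;

    # Reject flagged requests; a bot can defeat this by sending the header.
    if ($missing_lang) {
        return 403;
    }
}
```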
Cookies
Less sophisticated bots will not properly accept and send cookies, so you can protect some resources using cookies. This may have privacy implications that you should consider.
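One possible way to sketch a cookie gate in nginx is shown here. The cookie name "seen" is hypothetical, and a fixed value like this is trivial to forge, so treat it only as a filter for clients that ignore cookies entirely. A client that drops the cookie will just keep receiving the redirect.

```nginx
server {
    listen 80;
    server_name example.com;

    location / {
        # First visit: no cookie yet, so set one and ask the client to retry.
        if ($cookie_seen = "") {
            add_header Set-Cookie "seen=1; Path=/; HttpOnly";
            return 302 $request_uri;
        }
        # Requests that send the cookie back fall through to normal handling.
    }
}
```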
JavaScript
Some bots are unable to process JavaScript. You can have a hidden JavaScript "puzzle", so to speak, that requires the browser to compute the answer to a simple random math problem. This will break many RESTful API clients unless you find a clever way to exclude them.
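A heavily simplified sketch of the idea follows. It uses a fixed arithmetic check instead of a random one, because plain nginx configuration cannot generate and verify a random problem without a backend or scripting module. The /js-check path and the js_ok cookie name are assumptions of mine, and the script sends the visitor back to / and not to the page originally requested.

```nginx
server {
    listen 80;
    server_name example.com;

    # Challenge page: the script computes 6*7 and stores the result in a cookie.
    location = /js-check {
        default_type text/html;
        return 200 '<script>document.cookie="js_ok="+(6*7)+"; path=/";location.replace("/");</script>';
    }

    location / {
        # Clients that never ran the script have no matching cookie.
        if ($cookie_js_ok != "42") {
            return 302 /js-check;
        }
    }
}
```

Because the expected value is constant, a bot author who inspects the page once can simply send the cookie. A real deployment would need a per-visitor answer checked on the server side.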
Authentication
If you have resources you want to keep bots away from, then you will need to protect those resources with authentication.
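For example, HTTP basic authentication could be sketched like this. The /protected/ location and the password file path are placeholders I chose, and the file would have to be created separately (for instance with the htpasswd tool).

```nginx
server {
    listen 80;
    server_name example.com;

    location /protected/ {
        # Prompt for credentials; users are listed in the password file.
        auth_basic           "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }
}
```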
Keep-Alive
If you are certain that everyone hitting your site, including proxies, will support keep-alive, then you could block connections that do not support it. Some will find this option unorthodox.
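One way this might look, assuming that an explicit "Connection: close" request header is a good enough signal, is sketched below. The variable name $conn_close is made up for the example. HTTP/1.1 clients that say nothing are persistent by default and HTTP/2 does not use this header at all, so the check only catches clients that explicitly opt out.

```nginx
# In the http context: flag requests that explicitly ask to close the connection.
map $http_connection $conn_close {
    default    0;
    "~*close"  1;
}

server {
    listen 80;
    server_name example.com;

    # Reject clients that declare they will not keep the connection open.
    if ($conn_close) {
        return 403;
    }
}
```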
Obscure options that may also limit browsers and/or API clients
I have also found that some bots don't handle TLS 1.2 + SNI, as they are often using older libraries that do not support TLS 1.2 and, even less commonly, SNI. This will limit the ability of your users to hit APIs on your site, if that is relevant.
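A sketch of how that could be enforced is below, assuming the site is served over HTTPS only. The certificate paths are placeholders, ssl_reject_handshake needs nginx 1.19.4 or later, and TLSv1.3 needs a build against OpenSSL 1.1.1 or later.

```nginx
# Catch-all: refuse the TLS handshake when the client sends no SNI
# or an SNI name that matches no configured server_name.
server {
    listen 443 ssl default_server;
    ssl_reject_handshake on;
}

server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/ssl/example.com.crt;   # placeholder path
    ssl_certificate_key /etc/nginx/ssl/example.com.key;   # placeholder path

    # Clients limited to TLS 1.1 or older cannot connect.
    ssl_protocols TLSv1.2 TLSv1.3;
}
```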
I will leave it to you to research how you might test and implement each of those things and which of them may or may not be appropriate. One size does not fit all.