How to block this URL pattern in Varnish VCL

spamvarnishweb-crawler

My website is getting badly hit by spambots and scrappers. I've used Cloudflare but the problem still remains there. The problem is spambots accessing non-existing urls causing a lot of load to my drupal backend which goes all the way and bootstraps db just to serve a 404 error doc.

I cant simply dish out non-drupal 404's for all page not found errors, as I need to have drupal catch them. Since, varnish is in front it can check if the bot is acting nice and asking for valid url – if not it servers them a 404 or 403. These bots are causing errors using this pattern :

http://www.megaleecher.net/http:/www.megaleecher.net/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_Storage

Now, pls. suggest a regex varnbisg VCL directive which catches this URL pattern and serves a 404 error from varnish, preventing it from reaching apache/drupal ?

Best Answer

Have you tried looking for url paths where that begin with /http ?

if (req.url ~ "^/https?:") {
  error 404 "Not found" 
}
Related Topic