How are these ‘bad bots’ finding the closed webserver?

apache-2.2 web-crawler

I installed Apache a while ago, and a quick look at my access.log shows all sorts of unknown IPs connecting, mostly getting status codes 403, 404, 400, and 408. I have no idea how they're finding my IP, because I only use the server for personal use, and I added a robots.txt hoping it would keep search engines away. I've disabled directory indexes, and there's nothing really important on it.

How are these bots (or people) finding the server? Is it common for this to happen? Are these connections dangerous, and what can I do about them?

Also, lots of the IPs come from all sorts of countries, and don't resolve to a hostname.

Here's a bunch of examples of what comes through:

In one large sweep, this bot tried to find phpMyAdmin:

"GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 403 243 "-" "ZmEu"
"GET /3rdparty/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 235 "-" "ZmEu"
"GET /admin/mysql/scripts/setup.php HTTP/1.1" 404 227 "-" "ZmEu"
"GET /admin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 232 "-" "ZmEu"

I get plenty of these:

"HEAD / HTTP/1.0" 403 - "-" "-"

Lots of "proxyheader.php"; I get quite a few requests with http:// links in the GET:

"GET http://www.tosunmail.com/proxyheader.php HTTP/1.1" 404 213 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

"CONNECT"

"CONNECT 213.92.8.7:31204 HTTP/1.0" 403 - "-" "-"

"soapCaller.bs"

"GET /user/soapCaller.bs HTTP/1.1" 404 216 "-" "Morfeus Fucking Scanner"

And this really sketchy hex crap:

"\xad\r<\xc8\xda\\\x17Y\xc0@\xd7J\x8f\xf9\xb9\xc6x\ru#<\xea\x1ex\xdc\xb0\xfa\x0c7f(" 400 226 "-" "-"

Empty requests:

"-" 408 - "-" "-"

That's just the gist of it. I get all sorts of junk, even requests with Windows 95 user agents.

Thanks.

Best Answer

Welcome to the internet :)

  • How they found you: Most likely brute-force IP scanning, the same blind approach as the constant stream of vulnerability scans they run against your host once they've found it.
  • To prevent it in the future: While it's not totally avoidable, you can use security tools such as Fail2Ban with Apache, apply rate limits, ban offenders manually, or set up ACLs.
  • It's very common to see this on any externally accessible hardware that responds on common ports.
  • It's only dangerous if you have unpatched software on the host that may be vulnerable. These are merely blind attempts to see if you've got anything 'cool' for these script kiddies to tinker with. Think of it as someone walking around the parking lot trying car doors to see if they're unlocked: make sure yours is locked, and chances are he'll leave it alone.
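
To get a feel for how much of the log is just this kind of blind scanning, something like the following rough Python sketch could tally the entries by type. It assumes lines shaped like the snippets quoted in the question (request, status, size, referer, user agent). The category names, the `classify` and `summarize` helpers, and the matching rules are made up for illustration and are not part of Apache or Fail2Ban.

```python
import re
from collections import Counter

# Matches the quoted request, status, size, referer and user agent of a
# log entry. Uses search(), so a leading IP/timestamp prefix is tolerated.
ENTRY = re.compile(
    r'"(?P<request>.*)" ?(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def classify(request: str, agent: str) -> str:
    """Put one request into a rough, hypothetical category."""
    if request == "-":
        return "timeout (no request sent)"
    parts = request.split(" ")
    method = parts[0]
    target = parts[1] if len(parts) > 1 else ""
    # CONNECT or an absolute http:// URL means someone is testing
    # whether the server will relay traffic as an open proxy.
    if method == "CONNECT" or target.startswith("http://"):
        return "open-proxy probe"
    # Apache logs unprintable bytes as \xNN escapes.
    if "\\x" in request:
        return "non-HTTP binary data"
    # Assumption: the "ZmEu" agent and setup.php paths indicate the
    # phpMyAdmin sweep shown in the question.
    if (
        "phpmyadmin" in target.lower()
        or target.endswith("/setup.php")
        or agent == "ZmEu"
    ):
        return "phpMyAdmin scan"
    return "other probe"

def summarize(lines) -> Counter:
    """Count log lines per category; unmatched lines count as 'unparsed'."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line.strip())
        if match is None:
            counts["unparsed"] += 1
        else:
            counts[classify(match["request"], match["agent"])] += 1
    return counts

if __name__ == "__main__":
    sample = [
        '"GET /admin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 232 "-" "ZmEu"',
        '"CONNECT 213.92.8.7:31204 HTTP/1.0" 403 - "-" "-"',
        '"-" 408 - "-" "-"',
    ]
    for category, count in summarize(sample).most_common():
        print(f"{count:5d}  {category}")
```

To run it over a real log you could pass an open file instead of the sample list, for example `summarize(open("access.log", errors="replace"))`. The rules are deliberately crude; anything that lands in "other probe" is worth a manual look before deciding what to ban.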