UPDATE AT BOTTOM ->
I'm using a Red Hat Enterprise Linux Server release 7.4 (Maipo) VM in my OS class of about 20 students, who generally each open about two ssh connections to this machine with their own user IDs. This seems to work fine as students trickle into the classroom.
However, at the start of class, when most students try to log in at the same time, some of them are unable to log into the system and get an "ssh: connect to host xxx.xxx.xxx.xxx port 22: Connection refused" message. Waiting 20 minutes or so eventually lets some more people in. sshd is definitely running. The set of users being refused varies, and it sometimes includes me: I might have connected via ssh successfully a few minutes before, but then can't start a second session.
All of our outgoing traffic goes through a Many-to-1 NAT setup, so all of the incoming ssh connections on the server appear to come from the same IP address.
After reading the docs and doing some digging, I changed the following two parameters in the sshd_config file:
#MaxSessions 10
MaxSessions 500
and
#MaxStartups 10:30:100
MaxStartups 75:10:200
As I understand it, MaxSessions governs the number of active ssh connections to the server, even if they all come from one IP address, while MaxStartups relates to initial connection attempts (e.g., people trying to log in who haven't provided a password yet). So in this case I could accommodate 75 at startup, and then the rate would go up by 10% until it reached a limit of 200. (Should MaxSessions and this number be the same?)
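For comparison, sshd_config(5) describes MaxStartups start:rate:full like this: once there are start unauthenticated connections, sshd refuses new attempts with probability rate/100, and that probability rises linearly until every attempt is refused at full unauthenticated connections. MaxSessions, by contrast, limits the open shell, login or subsystem sessions per network connection (multiplexed sessions), not the total number of connections. The helper below is a hypothetical sketch (the function name is made up) that computes the drop percentage under that linear rule with integer arithmetic. Because sshd applies this check after it has accepted the TCP connection, a client dropped this way typically sees the connection being closed rather than "Connection refused".

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage chance that sshd drops a new connection
# for MaxStartups start:rate:full, given n current unauthenticated connections.
# Assumes the linear rule described in sshd_config(5), using integer math.
maxstartups_drop_pct() {
    local start=$1 rate=$2 full=$3 n=$4
    if (( n < start )); then
        echo 0      # below "start": never dropped
    elif (( n >= full )); then
        echo 100    # at or above "full": always dropped
    else
        # rises linearly from "rate" at n=start towards 100 at n=full
        echo $(( rate + (100 - rate) * (n - start) / (full - start) ))
    fi
}

# The question's setting 75:10:200 with 100 pending logins
maxstartups_drop_pct 75 10 200 100   # prints 28
```

With the default 10:30:100, for example, 55 pending logins would give a 65% drop chance.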
I'm using password authentication, and root login is disabled. We generally log in from Windows 10 machines using the Git Bash shell (I have also tried PuTTY to see if that would make a difference; it didn't).
In any case, am I on the right track here in dealing with the login issue? The problem is that I can't reproduce this reliably. It only seems to occur in class when there are a lot of connection attempts at the same time; I log in and out without any trouble at other times, and none of the students has reported this problem at other times.
What else can I try to diagnose and fix this problem? I know this seems to be an error that many people run into, and I've read a fair bit here, but I haven't found a fix that works yet.
UPDATE
So I tried to reproduce this problem with this small script (credit to @RobbieMckennie for giving me the idea):
# Open LIMIT ssh sessions one after another; exit each session to start the next
LIMIT=5
for i in $(seq "$LIMIT")
do
    echo
    echo "============================= ${i} ==================="
    ssh -vvv userid@xx.xx.xxx.xx
    echo
done
After 3 login attempts, the next one gives me this:
$ ssh -vvv userid@xx.xx.xxx.xx
OpenSSH_7.5p1, OpenSSL 1.0.2k 26 Jan 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug2: resolving "xx.xx.xxx.xx" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to xx.xx.xxx.xx [xx.xx.xxx.xx] port 22.
debug1: connect to address xx.xx.xxx.xx port 22: Connection refused
ssh: connect to host xx.xx.xxx.xx port 22: Connection refused
In fact, I am able to reproduce this "by hand": if I log in 3 times quickly one after the other, the 4th attempt results in this. The originating IP address is in my ignoreip list in fail2ban (jail.local), and that seems to work as far as I can tell:
2017-10-12 07:38:04,481 fail2ban.filter [52845]: WARNING Determined IP using DNS Lookup: c-yy-yy-yy-yyy.hsd1.il.comcast.net = ['yy.yy.yy.yyy']
2017-10-12 07:38:04,482 fail2ban.filter [52845]: INFO [sshd] Ignore yy.yy.yy.yyy by ip
I'm not sure whether the warning means anything, though.
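The "Connection refused" in the -vvv output above happens at the TCP connect step, before any ssh exchange takes place. So a variant of the reproduction loop that only opens the TCP connection, without logging in, could make the test quicker to repeat. The sketch below is hypothetical (the function name is made up): it relies on bash's /dev/tcp redirection and the coreutils timeout command, and it reports "failed" for any connect error, refused or timed out, without distinguishing them. Each probe is itself a new connection, so it counts against the same limits as a real login.

```shell
#!/usr/bin/env bash
# Hypothetical probe: open LIMIT plain TCP connections to HOST:PORT in a row,
# without running ssh, and report whether each connect() succeeded.
# Uses bash's /dev/tcp redirection; "failed" covers refused and timed-out.
probe_port() {
    local host=$1 port=$2 limit=${3:-5} i
    for i in $(seq "$limit"); do
        # ':' does nothing; the redirection alone opens (and then closes) the socket
        if timeout 5 bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
            echo "attempt ${i}: connected"
        else
            echo "attempt ${i}: failed"
        fi
    done
}

# Example, with the placeholder address from the question:
# probe_port xx.xx.xxx.xx 22 5
```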
So, two questions:
- What is causing this rejection? As far as I can tell, I don't even reach the system. Is there a configuration setting I need to tweak?
- More importantly, when my 22 students all try to log in from campus, all of these connections originate from the same IP address because of our Many-to-1 NAT. Would that explain this? It seems to me it might(?)
The only difference is that when this rejection happens on campus, it takes about 15 minutes or so before students can log in again, while in my experiment above I get back in within a few seconds. Could that be due to some sort of backlog?
In particular, I just discovered this entry in the iptables rules:
Chain INPUT_direct (1 references)
target prot opt source destination
tcp -- anywhere anywhere tcp dpt:ssh state NEW recent: SET name: DEFAULT side: source mask: 255.255.255.255
REJECT tcp -- anywhere anywhere tcp dpt:ssh state NEW recent: UPDATE seconds: 30 hit_count: 4 name: DEFAULT side: source mask: 255.255.255.255 reject-with tcp-reset
This would explain the limit of 3 logins, but again, I'm not sure it would explain the roughly 15-minute wait on campus before we can log back in when we run into this.
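To get a feel for how the two INPUT_direct rules above might behave with a whole class behind one NAT address, here is a rough, hypothetical simulation in bash. It assumes the usual xt_recent behaviour: the first rule records a timestamp for every NEW ssh packet from the source, the second rejects when at least 4 recorded timestamps fall within the last 30 seconds, and on such a match --update records one more timestamp. The attempt times and the single shared source address are made-up inputs; fail2ban is not modelled.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of the rules
#   recent --set
#   recent --update --seconds 30 --hitcount 4 -j REJECT
# for NEW ssh connections that all come from ONE source address.
# Input: attempt times in whole seconds; output: one ACCEPT/REJECT per attempt.
RECENT_SECONDS=30
RECENT_HITCOUNT=4

simulate_recent() {
    local -a stamps=()   # recorded timestamps (unbounded list, a simplification)
    local t s hits
    for t in "$@"; do
        stamps+=("$t")   # rule 1: --set records every NEW packet, rejected or not
        hits=0
        for s in "${stamps[@]}"; do
            # rule 2: count timestamps no older than RECENT_SECONDS
            if (( t - s <= RECENT_SECONDS )); then
                hits=$((hits + 1))
            fi
        done
        if (( hits >= RECENT_HITCOUNT )); then
            stamps+=("$t")   # assumed: --update adds a timestamp on a match
            echo "t=${t}s REJECT"
        else
            echo "t=${t}s ACCEPT"
        fi
    done
}

# Three quick logins, a 4th two seconds later, then a retry after a pause:
# prints ACCEPT for t=0s, 2s, 4s, REJECT for t=6s, ACCEPT for t=40s
simulate_recent 0 2 4 6 40
```

Under these assumptions, a steady stream of retries from the shared address, e.g. `simulate_recent $(seq 0 5 60)`, is rejected from the 4th attempt onwards, because rejected attempts are recorded too. Access would only resume once retries slow to fewer than 4 recorded hits per 30 seconds, which is one plausible, unconfirmed reason for the much longer lockout in class.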
Best Answer
I know I'm late ;-) but I guess you have fail2ban running, or something similar?
Fail2ban can help protect all kinds of daemons against brute-force attacks. For sshd, fail2ban temporarily blocks ports for IP addresses that repeatedly fail to log in. There are several approaches to solve this situation: stop fail2ban, whitelist the school's IP, ...