Cisco SSH Disconnect Issues – Troubleshooting Guide

ciscossh

I have a Cisco 867VAE with 15.5(3)M which is disconnecting from a particular legitimate SSH client with %SSH-4-SSH2 _UNEXPECTED_MSG: Unexpected message type

It's on a bench, not in service. I have full control over the Cisco R1 not much over the client.

R1 192.168.19.194/24
|
Client 192.168.19.1/24

I've excluded what appears to be the usual caulses:

  • Cached old keys on client (I regenerated on R1, it's not this)
  • People far away attempting break in (it's definitely me)

What else could this be? What changes can I make to the router to give more information or avoid the problem?

The only workaround I've found is enabling telnet on R1, which obviously I'm loathe to do.

Logs follow.

Thanks in advance

Jonathan

Cisco is 867VAE with C860VAE-ADVSECURITYK9-M), Version 15.5(3)M

R1#debug ip ssh 
Incoming SSH debugging is on

Oct 22 16:29:12.496: SSH0: starting SSH control process
Oct 22 16:29:12.496: SSH0: sent protocol version id SSH-1.99-Cisco-1.25
Oct 22 16:29:13.572: SSH0: protocol version id is - SSH-2.0-dropbear_2015.67
Oct 22 16:29:13.576: SSH2 0: Server certificate trustpoint not found. Skipping hostkey algo = x509v3-ssh-rsa
Oct 22 16:29:13.576: SSH2 0: kexinit sent: hostkey algo = ssh-rsa
Oct 22 16:29:13.576: SSH2 0: kexinit sent: encryption algo = aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc
Oct 22 16:29:13.576: SSH2 0: kexinit sent: mac algo = hmac-sha1,hmac-sha1-96
Oct 22 16:29:13.576: SSH2 0: send:packet of  length 368 (length also includes padlen of 5)
Oct 22 16:29:13.576: SSH2 0: SSH2_MSG_KEXINIT sent
Oct 22 16:29:13.576: SSH2 0: input: total packet length of 336 bytes
Oct 22 16:29:13.576: SSH2 0: partial packet length(block size)8 bytes,needed 328 bytes,
               maclen 0
Oct 22 16:29:13.576: SSH2 0: input: padlength 8 bytes
Oct 22 16:29:13.576: SSH2 0: SSH2_MSG_KEXINIT received
Oct 22 16:29:13.576: SSH2 0: kex: client->server enc:aes128-ctr mac:hmac-sha1 
Oct 22 16:29:13.576: SSH2 0: kex: server->client enc:aes128-ctr mac:hmac-sha1 
Oct 22 16:29:13.576: SSH2 0: Using kex_algo = diffie-hellman-group14-sha1
Oct 22 16:29:13.768: SSH2 0: expecting SSH2_MSG_KEXDH_INIT
Oct 22 16:29:13.768: SSH2 0: input: total packet length of 272 bytes
Oct 22 16:29:13.768: SSH2 0: partial packet length(block size)8 bytes,needed 264 bytes,
               maclen 0
Oct 22 16:29:13.768: SSH2 0: input: padlength 5 bytes
Oct 22 16:29:13.768: SSH2 0: SSH2_MSG_KEXDH_INIT received
Oct 22 16:29:14.408: SSH2 0: signature length 271
Oct 22 16:29:14.408: SSH2 0: send:packet of  length 832 (length also includes padlen of 8)
Oct 22 16:29:14.408: SSH2: kex_derive_keys complete
Oct 22 16:29:14.408: SSH2 0: send:packet of  length 16 (length also includes padlen of 10)
Oct 22 16:29:14.408: SSH2 0: newkeys: mode 1
Oct 22 16:29:14.408: SSH2 0: SSH2_MSG_NEWKEYS sent
Oct 22 16:29:14.408: SSH2 0: waiting for SSH2_MSG_NEWKEYS
Oct 22 16:29:14.408: SSH2 0: ssh_receive: 272 bytes received 
Oct 22 16:29:14.408: SSH2 0: input: total packet length of 272 bytes
Oct 22 16:29:14.408: SSH2 0: partial packet length(block size)8 bytes,needed 264 bytes,
               maclen 0
Oct 22 16:29:14.408: SSH2 0: input: padlength 5 bytes
Oct 22 16:29:14.408: SSH2 0: send:packet of  length 80 (length also includes padlen of 15)
Oct 22 16:29:14.408: SSH2 0: computed MAC for sequence no.#3 type 1
Oct 22 16:29:14.408: %SSH-4-SSH2_UNEXPECTED_MSG: Unexpected message type has arrived. Terminating the connection from 192.168.19.1
Oct 22 16:29:14.512: SSH0: Session disconnected - error 0x00

EDIT: config attached. It's a bench config in order to narrow down this problem for units in the field.

version 15.5
no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname R1
!
boot-start-marker
boot-end-marker
!
aaa new-model
!
aaa authentication login default local
aaa authentication enable default enable
aaa authorization console
aaa authorization exec default local 
!
aaa session-id common
wan mode ethernet
!
ip domain name example.com
ip cef
no ipv6 cef
!
username root privilege 15 password 0 SECRET
!
controller VDSL 0
!
interface ATM0
 no ip address
 shutdown
 no atm ilmi-keepalive
!
interface Ethernet0
 no ip address
 shutdown
!
interface FastEthernet0
 no ip address
!
interface FastEthernet1
 no ip address
!
interface FastEthernet2
 no ip address
!
interface FastEthernet3
 switchport access vlan 1000
 no ip address
!
interface GigabitEthernet0
 no ip address
!
interface GigabitEthernet1
 no ip address
 shutdown
 duplex auto
 speed auto
!
interface Vlan1
 no ip address
!
interface Vlan1000
 ip address 192.168.19.194 255.255.255.0
!
ip forward-protocol nd
no ip http server
no ip http secure-server
!
ip route 0.0.0.0 0.0.0.0 192.168.19.1
!
line con 0
 no modem enable
line aux 0
line vty 0 4
 privilege level 15
 transport input telnet ssh
!
scheduler allocate 60000 1000
!
end

Best Answer

[EDIT: updates at bottom]

It turns out to be to do with the encryption algorithm, and thus indirectly related to the version of Cisco IOS.

The router on the bench is just to help me debug the problem. The real issue is with a moderate amount of installed CPE routers in a global telemetry network. The configs of the tested routers are almost identical as regards SSH access, and vary somewhat as regards interfaces. [EDIT: of course none in the field accept telnet, only on the bench to help with debugging.]

[EDIT2] After further testing, the differentiator is the encryption negotiated between the two sides.

A survey found this client works against with AES128-CBC and fails with AES128-CTR.

A survey of some available routers shows that Cisco SSH server works with default configuration on 15.2(4)M3, 15.2(4)M6, 15.3(2)T2, 15.3(2)T, 15.3(3)M; and fails with 15.4(3)M1, 15.4(3)M3, 15.5(3)M.

We are using 867VAE-K8 + 15.5(3)M, and the following minimal pair config change makes it fail/work:

ip ssh server algorithm encryption aes128-ctr # FAILS
ip ssh server algorithm encryption aes128-cbc # WORKS

Clearly it's an interoperability problem, and there might also be a problem at the client end. The balance of resources strongly suggests it's a client bug rather than a server one, but stranger things have happened.

Available fixes (struck-out those which aren't any longer reasonable)

  • Use telnet
  • Downgrading to the highest one that works, ie 15.3(3)M
  • Configure Ciscos to use aes128-cbc (or best found to work)
  • Change/upgrade client (might not be possible)

The main thing outstanding is how to get a Cisco to give anything like a post-decryption packet dump (monitor capture appears only able to get pre-decryption info) from the SSH process. Without this, all we see is the Cisco complaint "other side sent wrong thing", which leads to a finger-pointing game between suppliers rather than resolution.

Thanks if anyone has a better suggestion.

[EDIT 2017-10-26] Case opened at Cisco TAC yesterday and two hours of phone calls are starting to get somewhere.

  • I edited the version list above as I had made an error in the cutover point.
  • It's clear the exchange is going wrong after SSH server sends MSG_NEWKEYS and is waiting for MSG_NEWKEYS
  • On the systems that work, the SSH server receives MSG_NEWKEYS
  • It is as-yet-impossible to see what the router thinks it has received when it gets the unexpected message
  • Thus impossible to see whether it was incorrectly encrypted or incorrectly decrypted or other problem

[EDIT2: 2017-10-29] Edits throughout to reflect encryption issue rather than anything else. We are in close contact with client authors to see what they can find.