Java – Intermittent SSL handshake error

active-directory java security ssl weblogic

We have an issue with SSL, and I am 99% sure this is not the usual certificate trust store merry-go-round.

We have a WebLogic server making SSL connections to Active Directory via LDAPS. The underlying SSL implementation is JSSE.

It works some of the time, usually for a few hours after restarting WebLogic.

After that we start getting SSL handshake errors. With SSL debug turned on we see:

[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', handling exception: java.net.SocketException: Connection reset
[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', SEND TLSv1 ALERT: fatal, description = unexpected_message
[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', WRITE: TLSv1 Alert, length = 32
[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)', Exception sending alert: java.net.SocketException: Broken pipe

So far I have tried the following to understand/replicate it:

  • Connecting via OpenSSL with the certs loaded – works OK every time
  • Connecting via secure ldapsearch with the certs loaded – works OK every time
  • Connecting via a custom test Java client – works OK every time
  • Decrypting the SSL handshake with Wireshark and the private key.
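
A standalone handshake test of the kind mentioned in the list above could look roughly like the sketch below. It is not the original test client: the class name `LdapsHandshakeProbe` and the host `ad.example.com` are placeholders, and it only performs the TLS handshake against the LDAPS port without sending any LDAP traffic. It uses the JVM's default `SSLSocketFactory`, so the trust store comes from the usual `javax.net.ssl.trustStore` system property.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical helper class, for illustration only.
class LdapsHandshakeProbe {

    /** Performs one TLS handshake and returns "protocol / cipher suite". */
    static String handshake(String host, int port, int timeoutMs) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            // Throws an IOException (usually an SSLException) if the handshake fails.
            socket.startHandshake();
            SSLSession session = socket.getSession();
            return session.getProtocol() + " / " + session.getCipherSuite();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder host name; 636 is the standard LDAPS port.
        String host = args.length > 0 ? args[0] : "ad.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 636;
        System.out.println(handshake(host, port, 5000));
    }
}
```

Running it with `-Djavax.net.debug=ssl,handshake` produces the same kind of JSSE debug trace as shown earlier. Note that a fresh JVM like this only loads the default security providers, so it does not necessarily reproduce the state of a long-running application server.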

What I noticed with Wireshark for the "bad" handshake is that after the client sends its Change Cipher Spec and Finished messages, AD does not reply in kind. Moreover, Wireshark cannot decrypt the SSL handshake, failing with:

ssl_decrypt_pre_master_secret wrong pre_master_secret length (109, expected 48)
dissect_ssl3_handshake can't decrypt pre master secret

Note that Wireshark decrypts the session without problems whenever the SSL handshake succeeds.

I can't see any significant differences between the good and bad SSL handshakes up to the point where the AD server stops responding.

At this point I'm stumped. I'm struggling to understand why this would fail some of the time and work the rest, so I am really just hoping for suggestions as to what might be going on.

Oh yes, almost forgot. There is an error in the Active Directory Event log:

Event ID: 36888 The following fatal alert was raised: 20. The state of the internal error is 960.

After a bit of research I found that alert 20 corresponds to the SSL "BAD_RECORD_MAC" error.

The only theory I have at this point is that for some reason the wrong public key is being used to encrypt the handshake. Otherwise I can't see why the server (and Wireshark) would fail to decrypt the Finished message.

Thanks!

Updates:

I've compared the bad and good cases, and the cipher suite is the same in both: TLS_RSA_WITH_AES_128_CBC_SHA. I have also compared the packets captured on both the client and server side. Barring the normal Ethernet and IP protocol differences, they are seemingly identical.

Best Answer

After a great deal of research, experimentation and soul searching, we eventually tracked this issue down to a third-party library we were using to connect to an external system. Upon initialization, it would add itself as a security provider ahead of the JSSE default provider. I don't know exactly why this then went on to break all subsequent SSL connections, but it did.
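
For anyone chasing something similar, one way to spot a provider that has been registered ahead of the defaults is to print the JVM's security providers in preference order, once at startup and again after the failures begin. The following is a minimal sketch, not the code from our investigation; the class name `ProviderOrderCheck` is made up for illustration. It relies only on the standard `java.security.Security` and `javax.net.ssl.SSLContext` APIs.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical diagnostic class, for illustration only.
class ProviderOrderCheck {

    /** Returns the installed provider names in preference order (highest priority first). */
    static List<String> providerNames() {
        List<String> names = new ArrayList<>();
        for (Provider p : Security.getProviders()) {
            names.add(p.getName());
        }
        return names;
    }

    /** Returns the name of the provider that currently serves SSLContext "TLS". */
    static String tlsProviderName() throws NoSuchAlgorithmException {
        return SSLContext.getInstance("TLS").getProvider().getName();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        List<String> names = providerNames();
        for (int i = 0; i < names.size(); i++) {
            System.out.println((i + 1) + ". " + names.get(i));
        }
        System.out.println("SSLContext TLS is served by: " + tlsProviderName());
    }
}
```

If the list changes between the two runs, or an unexpected name shows up at the top, whatever code called `Security.insertProviderAt` or `Security.addProvider` in between is worth a closer look. Inside an application server the same calls would have to run in the server's own JVM (for example from a small diagnostic servlet), since a separate test JVM has its own provider list.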

Thanks for your help.
