Linux – Kerberos service login only possible for 30 minutes after running ktpass.exe

Tags: active-directory, kerberos, linux

I'm trying to Kerberize an Apache server and allow the created service principal to sign on to Active Directory. I've followed one of the numerous tutorials available online, and it seems to work fine. I'm on the Linux side of the project; Corporate IT is on the Windows side.

IT has provided me with a service account and a service principal for it. In this example, I'll refer to the principal as HTTP/mysite.mycorp.com@MYCORP.COM. They have also provided me with a keytab file for that principal, which they generated by running a tool called ktpass.exe on the AD server.

I've verified that the KVNOs of the AD/KDC and the keytab file match. All is well.
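
If you want to check the keytab side of this KVNO comparison without relying on a particular client tool, the MIT keytab file format (version 0x0502, big-endian) is simple enough to read directly. The following is a minimal, read-only sketch written under that assumption; the function and class names are my own, it skips the key material, and it reports each entry's principal, KVNO and numeric encryption type. The AD side of the comparison is the account's msDS-KeyVersionNumber attribute. With MIT Kerberos installed, `klist -k -t -e http.keytab` shows the same information.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass


@dataclass
class KeytabEntry:
    principal: str  # e.g. "HTTP/mysite.mycorp.com@MYCORP.COM"
    kvno: int       # key version number stored with this key
    enctype: int    # numeric Kerberos encryption type (e.g. 18 = aes256-cts)


def _parse_entry(buf: bytes) -> KeytabEntry:
    """Decode one keytab entry (without its leading 32-bit size field)."""
    off = 0

    def read_counted() -> bytes:
        # A 16-bit big-endian length followed by that many bytes.
        nonlocal off
        (length,) = struct.unpack_from(">H", buf, off)
        off += 2
        value = buf[off:off + length]
        off += length
        return value

    (num_components,) = struct.unpack_from(">H", buf, off)
    off += 2
    realm = read_counted().decode()
    components = [read_counted().decode() for _ in range(num_components)]
    # name_type (32 bit), timestamp (32 bit), 8-bit KVNO, enctype (16 bit)
    _name_type, _timestamp, kvno, enctype = struct.unpack_from(">IIBH", buf, off)
    off += 11
    read_counted()  # key material: skipped, never printed
    # An optional trailing 32-bit KVNO, if present and non-zero, replaces
    # the 8-bit value (needed once the KVNO exceeds 255).
    if len(buf) - off >= 4:
        (kvno32,) = struct.unpack_from(">I", buf, off)
        if kvno32:
            kvno = kvno32
    principal = "/".join(components) + "@" + realm
    return KeytabEntry(principal, kvno, enctype)


def parse_keytab(data: bytes) -> list[KeytabEntry]:
    """Parse an in-memory MIT keytab (file format version 0x0502)."""
    if data[:2] != b"\x05\x02":
        raise ValueError("not a version 0x0502 keytab")
    entries = []
    pos = 2
    while pos + 4 <= len(data):
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size < 0:       # negative size: a deleted entry (hole) to skip
            pos += -size
            continue
        if size == 0:      # defensive: treat a zero size as end of data
            break
        entries.append(_parse_entry(data[pos:pos + size]))
        pos += size
    return entries
```

Usage: read the file with `open("http.keytab", "rb").read()`, pass the bytes to `parse_keytab`, and compare each entry's `kvno` with msDS-KeyVersionNumber for the account. After ktpass.exe is run again, the two will differ until Apache is given the new keytab.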

There is a proper DNS A record for the hostname and a matching PTR record for its IP address. The clocks on both servers are synchronized.

I'm able to request a ticket from the AD/KDC for the above-mentioned service principal using the issued keytab file, like this:

kinit -k -t http.keytab HTTP/mysite.mycorp.com@MYCORP.COM

This works: I obtain a ticket and can use it for things like querying the AD/LDAP directory. The keytab also works well for running a single sign-on Apache site, which is part of the goal of this exercise.

Half an hour passes.

Attempts to log on with the above kinit command now fail with this message:

Client not found in Kerberos database

I'm unable to authenticate as the service principal, just as if the principal had been deleted on the AD server.

Now it gets weird, at least for me:

On request, the AD administrator runs ktpass.exe again, building a fresh keytab file for my service. The KVNO (Key Version Number) is incremented on the server, which causes our Apache test server to stop validating Kerberos single sign-on, since its keytab still holds the old KVNO. This is expected with my present configuration. What surprised all of us was that the kinit command now worked again. We had bought ourselves another half hour, and then it stopped working again.

Our IT department is at a loss and speculates that this is a problem with the AD server itself. I suspect a configuration issue, but according to them, there are no half-hour limits anywhere in their setup.

I've followed http://www.grolmsnet.de/kerbtut/ (see section 7), but the method seems to be the same in all the documentation I've found, and I haven't found any reference to time limits on service principals.

EDIT: This seems to be a replication issue. Although no errors are reported in the replication process, the UPN (userPrincipalName) of the service account is changed (reverted?) from "HTTP/mysite.mycorp.com@MYCORP.COM" to "name-of-service-account@mycorp.com" after 30 minutes.
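
The value "HTTP/mysite.mycorp.com@MYCORP.COM" is in UPN form (name@REALM); Active Directory stores SPNs in the servicePrincipalName attribute without the realm (HTTP/mysite.mycorp.com). Querying both attributes shows which one actually changes. The following sketch builds an RFC 4515 LDAP filter that matches either attribute; the function names are hypothetical, and the resulting string is meant to be passed to whatever LDAP client you already use.

```python
def escape_filter_value(value: str) -> str:
    """Escape an assertion value as required by RFC 4515."""
    special = "*()\\\0"
    return "".join("\\%02x" % ord(ch) if ch in special else ch for ch in value)


def principal_lookup_filter(principal: str) -> str:
    """Build a filter matching the principal as either a UPN or an SPN.

    The UPN keeps the "@REALM" part; the SPN is the same name without it.
    """
    name, sep, _realm = principal.rpartition("@")
    if not sep:  # no realm given: use the whole string for both attributes
        name = principal
    return "(|(userPrincipalName={})(servicePrincipalName={}))".format(
        escape_filter_value(principal), escape_filter_value(name)
    )
```

For example, with OpenLDAP's client this could be `ldapsearch -Y GSSAPI -b "dc=mycorp,dc=com" "<filter>" userPrincipalName servicePrincipalName msDS-KeyVersionNumber`, where the base DN is an assumption derived from the example domain. Running it right after ktpass.exe and again 30 minutes later shows which attribute was rewritten.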

Best Answer

Thanks for all your input, guys. We got Microsoft on board, and they helped us debug the authentication process on the AD side. Everything worked as it was supposed to, but failed after thirty minutes.

During a remote debugging session, one of the participants noticed that the UPN/SPN of the service account was suddenly reset from HTTP/mysite.mycorp.com@MYCORP.COM to service-account@mycorp.com. After a LOT of digging around, including debugging AD replication, we found the culprit:

Someone had written a script that ran periodically (or, more likely, was triggered by an event, since the change happened exactly thirty minutes after ktpass.exe was run) and updated the UPN/SPN to "ensure cloud connectivity". I don't have any further information on the reasons for doing this.

The script was changed to leave existing UPN/SPN values ending in @mycorp.com untouched, which solved the problem.
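
The actual script was not shared, so the following is only a hypothetical sketch of the kind of guard described; CORPORATE_SUFFIX and upn_needs_rewrite are made-up names. One detail worth getting right in such a check: the realm in a ktpass-generated UPN is conventionally upper-case (@MYCORP.COM), so a case-sensitive comparison against "@mycorp.com" would still flag it for rewriting.

```python
from typing import Optional

# Hypothetical corporate UPN suffix, taken from the example domain.
CORPORATE_SUFFIX = "@mycorp.com"


def upn_needs_rewrite(upn: Optional[str]) -> bool:
    """Hypothetical guard: rewrite only missing or foreign-suffix UPNs.

    The comparison is case-insensitive so that a ktpass-style UPN such as
    "HTTP/mysite.mycorp.com@MYCORP.COM" is left untouched.
    """
    if not upn:
        return True
    return not upn.lower().endswith(CORPORATE_SUFFIX)
```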

Tips for debugging issues like this:

  • Ensure all participants in the authentication support the same encryption types. Avoid DES: it's outdated and insecure.
  • Make sure to enable AES-128 and AES-256 encryption on the service account
  • Be aware that enabling DES on the service account means "use ONLY DES for this account", even if you have also enabled AES-128 or AES-256. Search for UF_USE_DES_KEY_ONLY for details.
  • Make sure the UPN/SPN value is correct and matches the value in the issued keytab file (e.g. via an LDAP lookup)
  • Make sure the KVNO (Key Version Number) in the keytab file matches the one on the server
  • Inspect traffic between server and client (e.g. with tcpdump and/or Wireshark)
  • Enable debugging of authentication on the AD side and inspect the logs
  • Enable debugging of replication on the AD side and inspect the logs
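
The encryption-type tips can be checked from two attribute values on the service account: the userAccountControl bit UF_USE_DES_KEY_ONLY (0x200000) and the msDS-SupportedEncryptionTypes bit mask. The sketch below assumes you have already read both integers via LDAP; it only decodes the documented bits and flags the conditions listed above. The function names are hypothetical.

```python
from __future__ import annotations

# userAccountControl flag: "use only DES keys for this account"
UF_USE_DES_KEY_ONLY = 0x200000

# Bits of the msDS-SupportedEncryptionTypes attribute
ENCTYPE_BITS = {
    0x01: "DES-CBC-CRC",
    0x02: "DES-CBC-MD5",
    0x04: "RC4-HMAC",
    0x08: "AES128-CTS-HMAC-SHA1-96",
    0x10: "AES256-CTS-HMAC-SHA1-96",
}
AES_TYPES = {"AES128-CTS-HMAC-SHA1-96", "AES256-CTS-HMAC-SHA1-96"}


def decode_enctypes(supported: int) -> list[str]:
    """Names of the encryption types whose bits are set in the mask."""
    return [name for bit, name in ENCTYPE_BITS.items() if supported & bit]


def check_account(user_account_control: int, supported: int) -> list[str]:
    """Return warnings for a service account's encryption settings.

    A mask of 0 usually means the attribute is unset and the KDC falls back
    to its own defaults; this sketch does not model that and just reports
    that AES is not explicitly enabled.
    """
    warnings = []
    enabled = decode_enctypes(supported)
    if user_account_control & UF_USE_DES_KEY_ONLY:
        warnings.append("UF_USE_DES_KEY_ONLY is set: only DES keys will be used")
    if not AES_TYPES <= set(enabled):
        warnings.append("AES-128 and/or AES-256 is not enabled")
    if any(name.startswith("DES") for name in enabled):
        warnings.append("a DES encryption type is enabled")
    return warnings
```

For an account with only the two AES bits set (mask 0x18) and a normal userAccountControl value (0x200), `check_account` returns an empty list.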

Again, thank you for your input.