Docker hello-world container won’t start on Windows Server 2016 on top of VMware

Tags: containers, docker, mcafee, windows-server-2016

I've been trying to get Docker installed and running on a Windows VM to better understand the runtime for downstream work, and I'm running into issues starting the hello-world container.

Environment:

  • VMware virtual hardware:
    • 4 GB RAM
    • Intel Xeon CPU (2 cores)
  • Windows Server 2016 Standard (Version 1607)
  • Some antivirus and firewall considerations (I'm getting more info on those)

Output from docker version:

Client:
 Version:      17.06.2-ee-6
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   e75fdb8
 Built:        Mon Nov 27 22:46:09 2017
 OS/Arch:      windows/amd64

Server:
 Version:      17.06.2-ee-6
 API version:  1.30 (minimum version 1.24)
 Go version:   go1.8.3
 Git commit:   e75fdb8
 Built:        Mon Nov 27 22:55:16 2017
 OS/Arch:      windows/amd64
 Experimental: false

What's worked:

What hasn't:

Running any container. We've tried a few:

  • hello-world:nanoserver
  • hello-world:latest
  • microsoft/nanoserver:latest
  • microsoft/windowsservercore:latest

What I've already tried (with no success):

  • Relaxing our Group Policy Settings
  • Enabling the Hyper-V Windows Optional Component

What actually happens:

When I attempt to start a container using docker run {container-name-here}, PowerShell hangs for a couple of minutes and then prints the following message:

C:\Program Files\docker\docker.exe: Error response from daemon: container
    {container-id-here} encountered an error during Start: failure in a
    Windows system call: This operation returned because the timeout
    period expired. (0x5b4).
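
The hex value at the end of that message is a Win32 error code. As a minimal sketch (the helper name and the one-entry lookup table are mine, not part of Docker), it can be pulled out and converted to decimal to look it up; 0x5b4 is 1460, which winerror.h names ERROR_TIMEOUT, so the code only restates the timeout and doesn't point at a cause by itself:

```python
import re

# Small lookup table for illustration only; 1460 is ERROR_TIMEOUT in winerror.h.
KNOWN_WIN32_ERRORS = {1460: "ERROR_TIMEOUT"}


def extract_win32_code(message):
    """Return the hex code in a trailing '(0x...)' as an int, or None if absent."""
    match = re.search(r"\(0x([0-9a-fA-F]+)\)", message)
    return int(match.group(1), 16) if match else None


message = (
    "failure in a Windows system call: This operation returned because "
    "the timeout period expired. (0x5b4)."
)
code = extract_win32_code(message)
print(code, KNOWN_WIN32_ERRORS.get(code, "unknown"))
```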

In the docker events log, I get the following messages at the same time:

2018-04-18T09:36:27.881680400-04:00 container create {container-id-here} (image=hello-world:nanoserver, name=confident_ardinghelli)
2018-04-18T09:36:27.883680800-04:00 container attach {container-id-here} (image=hello-world:nanoserver, name=confident_ardinghelli)
2018-04-18T09:36:28.753726900-04:00 network connect {network-id-here} (container={container-id-here}, name=nat, type=nat)
2018-04-18T09:40:21.373395500-04:00 network disconnect {network-id-here}(container={container-id-here}, name=nat, type=nat)

We get the timeout message between the network connect and the network disconnect.
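
To put a number on that window so it can be lined up against other logs, here is a rough sketch that parses the event lines above and measures the time between the network connect and the network disconnect. The function names are hypothetical, and the parsing assumes the timestamp layout shown in the log; the nanosecond fraction is truncated to microseconds because Python's datetime doesn't go finer than that.

```python
import re
from datetime import datetime

# Timestamp, then the object type ("container"/"network") and the action.
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})\s+(\w+)\s+(\w+)"
)


def parse_event(line):
    """Return (timestamp, object_type, action), or None if the line doesn't match."""
    m = EVENT_RE.match(line)
    if not m:
        return None
    base, frac, offset, obj, action = m.groups()
    micros = frac[:6].ljust(6, "0")  # truncate nanoseconds to microseconds
    ts = datetime.strptime(f"{base}.{micros}{offset}", "%Y-%m-%dT%H:%M:%S.%f%z")
    return ts, obj, action


def network_gap(lines):
    """Time between the first 'network connect' and the next 'network disconnect'."""
    connected = None
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        ts, obj, action = event
        if obj == "network" and action == "connect" and connected is None:
            connected = ts
        elif obj == "network" and action == "disconnect" and connected is not None:
            return ts - connected
    return None


events = [
    "2018-04-18T09:36:28.753726900-04:00 network connect {network-id-here} (container={container-id-here}, name=nat, type=nat)",
    "2018-04-18T09:40:21.373395500-04:00 network disconnect {network-id-here}(container={container-id-here}, name=nat, type=nat)",
]
print(network_gap(events))
```

For the two lines above, the gap comes out to just under four minutes, which is the window to search in the antivirus and Windows logs.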

The references I've found in my searching indicate that this may be an antivirus issue, but I haven't found any documentation on how to confirm that, or on how to identify which antivirus component is responsible, short of disabling the antivirus and trying again. I'm arranging time with the people who have access to that part of the system so we can try that, and I'll update with the results.

So, what am I actually asking?

  • Has anyone else seen this or a similar issue before? What steps were you able to take to diagnose the root cause, and what ended up being the issue in your case?
  • Are there any other Docker or Windows logs I should be looking at to better diagnose the cause of the issue?
  • Any other "shots in the dark" we should try? We'll be out of ideas once we finish our security debugging.

Update (2018-4-20):

We spoke with the security team and went through enabling and disabling the various antivirus components. When we turned off McAfee Host IPS (HIPS), we were able to start any of our containers, as expected. When we turned it back on, the containers broke again! We found an alert in the HIPS log for a denied registry read whose timestamp matches our debug session, and we traced that registry access back to the docker.exe process using Process Monitor from Microsoft Sysinternals. Looks like we have our culprit!

I'll report back after we add a whitelist entry for the rule and confirm the fix.

Best Answer

The solution

In this case, McAfee Host Intrusion Prevention Service (HIPS) was preventing Docker from running. McAfee HIPS provides a number of intrusion monitoring rules, and the one that blocks unwarranted registry access was being triggered by docker.exe. We disabled that rule for docker.exe, and it's been smooth sailing since!

Steps to debug

We identified this by disabling individual security components until Docker was able to function, then re-enabling everything except HIPS to verify that nothing else was interfering. We then re-enabled HIPS, reproduced the issue, and checked the HIPS logs for an alert whose timestamp matched.

The Docker CLI was attempting to access the following registry keys and being denied access:

HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\docker.exe

We used Process Monitor from Microsoft Sysinternals to verify the registry access was associated with docker.exe.
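
If the Process Monitor trace is large, it can be saved as CSV and filtered offline. The following is only a sketch of that idea, not what we ran: the function name is hypothetical, the column names ("Process Name", "Operation", "Path", "Result") assume Process Monitor's default CSV columns, and the sample rows (including their Result values) are made up for illustration rather than taken from our capture.

```python
import csv
import io

# Key fragment taken from the registry paths listed above.
KEY_FRAGMENT = "image file execution options"


def registry_hits(csv_text, process="docker.exe", fragment=KEY_FRAGMENT):
    """Return rows where `process` touched a registry path containing `fragment`.

    Registry paths and process names are compared case-insensitively.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in rows
        if row.get("Process Name", "").lower() == process.lower()
        and row.get("Operation", "").startswith("Reg")
        and fragment in row.get("Path", "").lower()
    ]


# Illustrative rows only; a real export would be read from disk with
# open(path, newline="", encoding="utf-8-sig").read().
sample = (
    '"Time of Day","Process Name","PID","Operation","Path","Result","Detail"\n'
    '"9:36:28","docker.exe","1234","RegOpenKey",'
    '"HKLM\\Software\\Microsoft\\Windows NT\\CurrentVersion\\Image File Execution Options",'
    '"ACCESS DENIED",""\n'
    '"9:36:29","explorer.exe","4321","RegOpenKey","HKLM\\Software\\Example","SUCCESS",""\n'
)
for hit in registry_hits(sample):
    print(hit["Operation"], hit["Path"], hit["Result"])
```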

Finally, we disabled that HIPS rule for docker.exe, and now we can successfully run arbitrary containers.
