Google Cloud: Health Check is not removing failed instance from HTTP Load Balancer

google-cloud-platformgoogle-compute-engineload balancing

I have an instance group with 2 instances behind a HTTP Load Balancer. one instance is up and functioning normally (returning http 200), the other is crashed (HTTP requests timeout). I am not sure what I'm doing wrong, but according to the documentation the failed instance should automatically be removed from the load balancer.

Here is the related docs: https://cloud.google.com/compute/docs/load-balancing/health-checks with the related paragraph:

For a health check to be deemed successful, the backend must return a
valid HTTP response with code 200 and close the connection normally
within the timeoutSec period. If an instance fails its health check,
it is removed from the group or pool without any notification being
sent. If it later passes a heath check, it is returned to the group or
pool, again without any notification.

Here is what I currently see on my google cloud console page for the HTTP Load Balancer's Backend.

enter image description here

When visiting my site (http://stage2.phantomjscloud.com) Rougly half the time I get

Error: Server Error The server encountered a temporary error and could
not complete your request. Please try again in 30 seconds.

The HTTP Load Balancer (and health check) clearly detects the failed instance, but traffic is still being served to it regardless.

How can I resolve the issue?

Best Answer

health checks managed instance groups VS health checks load balancing

The health checks used by managed instance groups are the same health checks used by load balancing, with some differences in behavior. Health checks that you apply to load balancing services help a load balancer determine where to direct network traffic. These health checks do not cause Compute Engine to recreate instances. Health checks that you apply to managed instance groups will proactively signal to the managed instance group to delete and recreate instances if they become UNHEALTHY.

For the majority of scenarios, use separate health checks for load balancing and for monitoring managed instance groups. Health checking for load balancing can and should be more aggressive since these health checks determine whether an instance receives user traffic. Since customers might rely on your services, you want to catch non-responsive instances quickly so you can redirect traffic if necessary. In contrast, health checking for instance groups will cause Compute Engine proactively replace failing instances so you could create health checks that are more conservative than health checks for a load balancer.

https://cloud.google.com/compute/docs/instance-groups/creating-groups-of-managed-instances#monitoring_groups

Related Topic