Cannot connect to Cloud SQL Postgres from GKE via Private IP

google-cloud-platform google-cloud-sql google-kubernetes-engine vpc-peering

I am having trouble accessing a Cloud SQL instance running Postgres from a GKE cluster using the database's private IP. All the documentation I've found suggests using a VPC-enabled cluster to accomplish this, but I am still having trouble reaching the database.

Specifically, I can reach the database from the nodes in my cluster, but I cannot reach it from within a container on a node unless I run the docker container on the host's network. This leads me to believe that I misunderstand how the networking components of a GCP VPC and Kubernetes interact with each other.

VPC

My VPC has one subnet with two secondary ranges:

IP Range: 10.0.0.0/16
Secondary Range – pods: 10.1.0.0/16
Secondary Range – services: 10.2.0.0/16

This is created using the following Terraform configuration:

resource "google_compute_subnetwork" "cluster" {
  ip_cidr_range            = "10.0.0.0/16"
  name                     = "cluster"
  network                  = google_compute_network.vpc.self_link

  secondary_ip_range {
    ip_cidr_range = "10.1.0.0/16"
    range_name    = "pods"
  }

  secondary_ip_range {
    ip_cidr_range = "10.2.0.0/16"
    range_name    = "services"
  }
}
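The subnet above references google_compute_network.vpc, which is not shown in the question. A minimal sketch of what that resource might look like follows. The name "my-vpc" is taken from the cluster details listed in the question, and custom subnet mode (auto_create_subnetworks = false) is an assumption, not something the question confirms.

```hcl
# Assumed definition of the VPC referenced as google_compute_network.vpc.
# Not part of the original question: the name comes from the cluster details
# ("my-vpc"), and disabling auto-created subnets is an assumption so that
# only the "cluster" subnet defined above exists in the network.
resource "google_compute_network" "vpc" {
  name                    = "my-vpc"
  auto_create_subnetworks = false
}
```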

Database

My Cloud SQL database is running Postgres 11 and is configured to allow connections only via private IP. I have set up a peering connection with a set of global compute addresses to allow access to the Cloud SQL instance from my VPC. In this case I ended up with the following values:

Private Service Connection IP Range: 172.26.0.0/16
Database Private IP: 172.26.0.3

These resources are provisioned with the following Terraform configuration:

resource "google_compute_global_address" "db_private_ip" {
  provider = google-beta

  name          = "db-private-ip"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.vpc.self_link
}

resource "google_service_networking_connection" "db_vpc_connection" {
  network                 = google_compute_network.vpc.self_link
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.db_private_ip.name]
}


resource "google_sql_database_instance" "db" {
  depends_on = [google_service_networking_connection.db_vpc_connection]

  database_version = "POSTGRES_11"

  settings {
    availability_type = "ZONAL"
    tier              = "db-f1-micro"

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.vpc.self_link
    }
  }
}
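To read the allocated range and the database's private IP back from Terraform rather than from the console, outputs along these lines could be added. This is a sketch: the output names are hypothetical, and it assumes the resource names used in the configuration above.

```hcl
# Hypothetical outputs (names are illustrative, not from the question).

# Private IP assigned to the Cloud SQL instance (172.26.0.3 in this case).
output "db_private_ip" {
  value = google_sql_database_instance.db.private_ip_address
}

# Reserved peering range in CIDR form (172.26.0.0/16 in this case).
output "db_peering_range" {
  value = "${google_compute_global_address.db_private_ip.address}/${google_compute_global_address.db_private_ip.prefix_length}"
}
```

After an apply, `terraform output db_private_ip` should print the address that the connection attempts below target.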

Cluster

My GKE cluster is configured to be VPC-native and to use the secondary ranges from the cluster subnet of the VPC. Some of the relevant cluster information:

Master Version: 1.14.8-gke.17
Network: my-vpc
Subnet: cluster
VPC-native: Enabled
Pod address range: 10.1.0.0/16
Service address range: 10.2.0.0/16

The cluster is created using the following Terraform configuration:

resource "google_container_cluster" "primary" {
  location           = var.gcp_region
  min_master_version = data.google_container_engine_versions.latest_patch.latest_master_version
  name               = "my-cluster"
  network            = google_compute_network.vpc.self_link
  subnetwork         = google_compute_subnetwork.cluster.self_link

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  ip_allocation_policy {
    use_ip_aliases                = true
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

Connection Attempts

I tried connecting to the database from several different contexts to narrow down the problem.

Standalone Instance

I spun up a new Ubuntu compute VM in my VPC and was able to connect to the database using both nping and psql.

From a Container on a Node

Whether I use kubectl attach on a pod in my cluster or SSH into a node and run my own docker command, none of the packets sent to the database arrive.

# SSH-ing and running a docker container.
docker run -it ubuntu /bin/bash -c 'apt update && apt install -y nmap && nping --tcp -p 5432 172.26.0.3'

From a Container on a Node with Host Networking

If I repeat the command from above but use the host's network, I can connect to the database.

docker run -it --net host ubuntu /bin/bash -c 'apt update && apt install -y nmap && nping --tcp -p 5432 172.26.0.3'

Suggestions?

Since most questions about connecting to a Cloud SQL instance from GKE via private IP are resolved by making the cluster VPC-native, I assume my problem lies somewhere in my networking configuration. I would appreciate any suggestions, and I'm happy to provide any additional information. Thanks.

Related Questions

Issue Connecting to Cloud SQL Postgres using Private IP from GKE

Update 2019-12-05

After converting the commands from the related question linked above into Terraform (call this the MVP config), I am able to connect to the Postgres instance using a private IP, so I now believe the issue lies deeper in my configuration. I still haven't determined which exact piece of my infrastructure differs from the MVP config.

My next attempt will probably be to enhance the MVP config to use a separately configured node pool rather than the default node pool to see if that accounts for the behavior I am seeing.
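For reference, a separately managed node pool attached to the cluster above might look roughly like the following. This is only a sketch of the experiment described, not my actual node pool: the pool name, machine type, node count, and OAuth scopes are all assumptions.

```hcl
# Hypothetical separately managed node pool for google_container_cluster.primary.
# All values below are illustrative assumptions, not taken from the real config.
resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-nodes"                       # hypothetical pool name
  location   = var.gcp_region                        # same location as the cluster
  cluster    = google_container_cluster.primary.name
  node_count = 1                                     # nodes per zone (assumed)

  node_config {
    machine_type = "n1-standard-1"                   # assumed machine type

    # Assumed minimal scopes for logging and monitoring.
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
```

Swapping the default node pool in the MVP config for a pool like this, one setting at a time, should show whether the node pool configuration is what changes the behavior.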

Best Answer

Cloud SQL instances must meet specific network requirements when communicating over a private connection. One of them is that your Cloud SQL instance and your GKE cluster must be located in the same region and VPC network. [1]

Regarding "I cannot reach the database from within a container on the node", does this mean your database and container are located in different networks? If so, you cannot access a Cloud SQL instance on its private IP address from another network using a Cloud VPN tunnel, an instance-based VPN, or Cloud Interconnect.

[1] https://cloud.google.com/sql/docs/mysql/private-ip#network_requirements
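One way to make the same-region requirement explicit in the question's Terraform is to set region on the database instance instead of relying on the provider default. The sketch below copies the instance from the question and adds a single line. It assumes var.gcp_region holds a region name (the same value the cluster uses for location); changing the region of an existing instance would likely cause Terraform to replace it.

```hcl
# Sketch: the question's database instance with an explicit region.
# Only the `region` line is new; everything else is copied from the question.
resource "google_sql_database_instance" "db" {
  depends_on = [google_service_networking_connection.db_vpc_connection]

  database_version = "POSTGRES_11"

  # Assumption: var.gcp_region is a region (for example "us-central1"),
  # matching the `location` of google_container_cluster.primary.
  region = var.gcp_region

  settings {
    availability_type = "ZONAL"
    tier              = "db-f1-micro"

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.vpc.self_link
    }
  }
}
```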
