NGINX proxy to a GCS bucket that redirects all URLs to index.html returns blank 200 responses on nested routes

google-cloud-platform, google-cloud-storage, nginx

I am configuring an NGINX reverse proxy in front of a GCP Cloud Storage bucket that contains static HTML, JS, and image files. All non-matching URLs are rewritten to index.html, since the site is a single-page application.

Config:

user nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include           /etc/nginx/mime.types;
    default_type      application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" upstream: "$upstream_addr"';
    access_log        /var/log/nginx/access.log  main;

    server_tokens     off;

    sendfile        on;

    keepalive_timeout  65;

    gzip              on;
    gzip_disable      "msie6";
    gzip_comp_level   6;
    gzip_min_length   1100;
    gzip_buffers      16 8k;
    gzip_proxied      any;
    gzip_types
        text/plain
        text/css
        text/js
        text/xml
        text/javascript
        application/javascript
        application/x-javascript
        application/json
        application/xml
        application/xml+rss;

    resolver          8.8.8.8 valid=300s ipv6=off;
    resolver_timeout  10s;

    upstream gcs {
       server          storage.googleapis.com:443;
       keepalive       128;
    }


    proxy_cache_path      /var/cache/nginx keys_zone=google-cloud-storage:10m inactive=1h;
    proxy_cache           google-cloud-storage;
    proxy_cache_key       "$host/$proxy_host$uri";
    proxy_cache_valid     200 1m;


    server {
        listen          8080;

        recursive_error_pages on;

        if ( $request_method !~ "GET|HEAD" ) {
            return 405;
        }

        location = / {
            rewrite ^.*$ /index.html last;
        }

        location = /healthz/ {
            access_log off;
            return 200;
        }

        location / {
            proxy_set_header        Host storage.googleapis.com;
            proxy_set_header        Cookie "";
            proxy_set_header        Authorization "";
            proxy_set_header        Connection "";
            proxy_hide_header       x-goog-hash;
            proxy_hide_header       x-goog-generation;
            proxy_hide_header       x-goog-metageneration;
            proxy_hide_header       x-goog-stored-content-encoding;
            proxy_hide_header       x-goog-stored-content-length;
            proxy_hide_header       x-goog-storage-class;
            proxy_hide_header       x-guploader-uploadid;
            proxy_hide_header       x-xss-protection;
            proxy_hide_header       x-goog-meta-goog-reserved-file-mtime;
            proxy_hide_header       accept-ranges;
            proxy_hide_header       alternate-protocol;
            proxy_hide_header       Set-Cookie;
            proxy_hide_header       Expires;
            proxy_hide_header       Cache-Control;
            proxy_ignore_headers    Set-Cookie;
            proxy_http_version      1.1;
            proxy_intercept_errors  on;
            proxy_method            GET;
            proxy_pass_request_body off;

            proxy_ignore_headers    "Expires" "Cache-Control";


            add_header              X-Cache $upstream_cache_status;



            error_page              404 =200 /index.html;



            expires 1h;
            add_header Cache-Control "private";


            proxy_pass              https://gcs/my-bucket-name$uri;
        }
    }
}

So here's the issue:

  1. Without a proxy_cache, the first request to /nested/path returns 200 OK with the contents of index.html.
  2. A soft reload from the browser sends if-modified-since and/or if-none-match headers to the proxy, but gets a 200 OK response with an empty body. (Should it really be a 304?)
  3. A hard reload returns 200 with the correct index.html content.
  4. With a proxy_cache present, 304 is correctly returned.
  5. A request to the root path / behaves correctly without a proxy_cache.

How can I ensure correct behavior on a soft reload without a proxy_cache?
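
One observation that may help narrow down item 2: the nginx documentation for proxy_set_header notes that, when caching is enabled, the If-Modified-Since, If-None-Match and related request headers are not passed to the proxied server, which would be consistent with item 4. The fragment below is an untested sketch that drops the same two headers by hand when no cache is configured. Whether it removes the blank 200 in this setup is an assumption, not a confirmed fix, and the elided directives stand for the ones already shown in the config above.

```nginx
location / {
    # ... existing proxy_set_header / proxy_hide_header directives ...

    # Assumption: without these headers, GCS cannot answer the internal
    # /index.html fetch with a body-less 304 that error_page then
    # relabels as 200.
    proxy_set_header        If-None-Match "";
    proxy_set_header        If-Modified-Since "";

    proxy_intercept_errors  on;
    error_page              404 =200 /index.html;

    proxy_pass              https://gcs/my-bucket-name$uri;
}
```

The trade-off is that every soft reload would then fetch the full object from the bucket, since the upstream never sees the validators.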

Best Answer

I could not get this to work at all, although here is a supposedly working example: https://github.com/presslabs/gs-proxy

I ended up mounting the bucket and simply having nginx serve the files from the file system.

See https://github.com/maciekrb/gcs-fuse-sample
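
For illustration, serving a mounted bucket could look roughly like the server block below. This is a minimal sketch, not the configuration from the linked sample: the mount point /mnt/my-bucket-name is a hypothetical path, and it assumes the bucket has already been mounted there (for example with gcsfuse) and is readable by the nginx user. The block belongs inside the http section.

```nginx
server {
    listen 8080;

    # Hypothetical mount point of the bucket on the local file system.
    root  /mnt/my-bucket-name;
    index index.html;

    location = /healthz/ {
        access_log off;
        return 200;
    }

    location / {
        # Serve the requested file if it exists; otherwise fall back to
        # the single-page application's entry point.
        try_files $uri /index.html;
    }
}
```

Because nginx is reading ordinary files here, conditional requests are handled by its static file module rather than being forwarded to Cloud Storage.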