Varnish – How to Fix 502 Errors on Large Files

varnish

Whenever I am serving large files with Varnish, is there some way to make Varnish just pipe files that are larger than a certain size? Varnish is giving me some 502 errors on large files (For example, 845472505 bytes (6.3G)) which I assume is because I only have 256m of malloc cache.

Current configuration:

DAEMON_OPTS="-a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -p send_timeout=604800 -p http_resp_hdr_len=8192 -s malloc,256m"

Note that I cannot just 'remove' the file – I just dont want to cache files over a certain size.

EDIT:
Since I cant answer my own question until 8 hours have passed…

So it seems that my backend timeout was not set high enough to transfer the file. I increased it, and it seems to have stopped giving me 502 errors on large files. I also converted to using the file cache (this was on a SSD VPS) so that I could store the whole file.

Best Answer

If you wish to prevent Varnish from caching a certain file, you can just send a no-cache header from your backend server. For example:

Cache-Control: private, no-cache, no-store, max-age=0

Alternatively you could examine the Content-Length header in VCL and take action based on that.

sub vcl_fetch {

  if (beresp.http.Content-Length !~ "[0-9]{10,12}") {
    return(deliver);
  }
}

MySQL

Let's say you want to use half of your 400 MB, giving MySQL 200MB. Let me make it clear that I'm not familiar with Drupal's requirements, or whether it uses MyISAM or InnoDB. If you were configuring InnoDB you'd use the innodb_buffer_pool_size variable and just set that to 200M. You can expect MySQL to use more than this however for things like query cache (if used), open tables, connection handling, thread tracking, sort buffers, join buffers, and countless other configuration options. If you're using MyISAM it's even more complicated because there are a lot more variables involved, key_buffer and myisam_sort_buffer are just two of several. So, assuming InnoDB with a 200M innodb_buffer_pool_size and the query cache disabled let's say MySQL consumes 216 MB of RAM.

Apache

You now have 184 MB of RAM left for Apache to use. First, lets take a moment to clear up some of the really confusing things in your question.

I've learned that I should calculate my average Apache process size in Mb by MaxClients, and that this number shouldn't go above the memory available to the system.

No. You observe your average httpd process size when your site is in use. Using the average size per httpd process (assuming prefork MPM, the default) you calculate what MaxClients can be so that you don't exceed the memory allotted to httpd or the machine, causing it to swap.

Each process size is a little under 7% (this would be about 1.4Mb, right?) according to Top. 512/1.5 = 341... this seems awfully big to me. Am I misunderstanding something?

Yes, you are. First, stop using percentage to "calculate" the size of the httpd processes.

Edit

Wait! What? 7% of 512 is 35.84. I'm not sure where you got 1.4 Mb from. My answer still stands, and I won't be adjusting my answer to compensate for your 35M httpd processes.

End Edit

The size of the httpd processes is listed plainly in top under the RES column. For example:

top - 21:48:44 up 168 days,  4:46,  1 user,  load average: 0.02, 0.09, 0.08
Tasks:  66 total,   2 running,  64 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2072832k total,  1994784k used,    78048k free,   407976k buffers
Swap:   787176k total,      300k used,   786876k free,  1321988k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                       
 8725 mysql     20   0  729m  69m 3396 S  0.0  3.4  53:45.84 mysqld                                                                                                         
22217 apache    20   0  161m  17m 7960 S  0.0  0.9   0:08.42 apache2                                                                                                        
26193 apache    20   0  161m  16m 6944 S  0.0  0.8   0:00.26 apache2                                                                                                        
 4470 apache    20   0  161m  16m 6948 S  0.0  0.8   0:01.52 apache2                                                                                                        
 6193 apache    20   0  161m  15m 6608 S  0.0  0.8   0:01.35 apache2                                                                                                        
 4014 apache    20   0  161m  15m 6616 S  0.0  0.8   0:01.48 apache2                                                                                                        
 6939 apache    20   0  161m  15m 6608 S  0.0  0.8   0:01.48 apache2                                                                                                        
 6685 apache    20   0  161m  15m 6608 S  0.0  0.8   0:01.32 apache2                                                                                                        
26146 apache    20   0  161m  15m 6604 S  0.0  0.8   0:00.38 apache2                                                                                                        
 6443 apache    20   0  161m  14m 5712 S  0.0  0.7   0:01.38 apache2                                                                                                        
26450 apache    20   0  161m  14m 5704 S  0.0  0.7   0:00.19 apache2                                                                                                        
 2524 root      20   0  159m  13m 5524 S  0.0  0.6   0:03.34 apache2

Technically the memory used per process the difference of RES and SHR (shared). This is because SHR is memory shared by other processes. You can see on the example I showed you this is roughly 9 MB on average, for my unique use case. This is simply a machine running Cacti with virtually no traffic -- maybe 5-10 hits per day if I happen to look at it. I am skeptical that Drupal utilizes so little memory, but you'll be able to easily tell. It definitely uses much more than 1.4 MB.

Now, lets take a very unrealistic assumption that your httpd processes will utilize an even 10 MB of RAM every time. With 184 MB of RAM "allocated" for Apache, this leaves you with a MaxClients of 18 (10 MB * 18 = 180 MB). Much much much less than 341.

Varnish

First, lets evaluate the current state of your server. Assuming you properly configured MySQL and httpd to not swap under load, you're running with what is a pretty anemic MySQL configuration and an httpd configuration that will start to refuse requests if you ever get more than 18 concurrent requests. By any standards this machine is in no shape to handle traffic that will "grow very quickly".

Now you want to add a third application in and allocate 256MB of RAM to it?! That RAM will have to come from either MySQL or Apache, and maybe you could get away with stealing some from the OS itself. Either way you're further gimping one of the core services on your machine.

It's technically possible that you could find the sweet spot of configuration settings for Varnish, Apache, and MySQL on the same host that allowed all to operate at ideal efficiency with just the right amount of RAM, but I'm skeptical.

The Solution

Use what I've taught you about configuring MySQL and Apache correctly to do exactly that: configure them correctly. Your MaxClients should be nowhere near 300, very likely under 20, and quite possibly under 10. Another thing I haven't mentioned is that httpd processes can be a little reluctant to relinquish RAM when they've "peaked" much higher than the average. e.g. If a httpd worker hit 20MB for a single request, that worker will continue to use 20MB indefinitely (afaik) until it is reaped. You can address this by lowering your MaxRequestsPerChild setting. Lowering this means child processes are reaped more frequently. This will slow down your performance under load (forking new processes is relatively expensive), but it will help keep your memory usage manageable.

Configured properly your server should never swap. If you configure your server properly and you see issues such as refused connections under load, then I would suggest either expanding your VM, or look into adding Varnish on a separate dedicated VM.

You're off to the right start by reading the docs and seeking help online. If you get stuck or need in-depth help please feel free to ask in another question, but don't forget to search first! It's quite possible that you can find your answer in another.

Static file download from browser breaking in varnish but works fine in Apache

You are hitting the Varnish send_timeout limit. The default value for send_timeout was 600s, with Varnish 3.0 it was changed to 60s. This may intefere with downloads taking longer than 60s.

You can check the value of the send_timeout parameter with varnishadm:

varnishadm param.show send_timeout

This will output something like:

send_timeout           60 [seconds]
                       Default is 60
                       Send timeout for client connections. If the HTTP
                       response hasn't been transmitted in this many
                       seconds the session is closed. 
                       See setsockopt(2) under SO_SNDTIMEO for more
                       information.

                       NB: This parameter may take quite some time to
                       take (full) effect.

You can set it to 600s with:

varnishadm param.set send_timeout 600s

To make this setting persistent, you have to add "-p sendtimeout 600" to the startup parameters of Varnish. This depends on the distribution you are using. In case of Debian/Ubuntu yo may want to edit /etc/default/varnish.

Best Answer

Related Solutions

Configuring HTTPD Behind Varnish on a 512MB VPS

MySQL

Apache

Varnish

The Solution

Static file download from browser breaking in varnish but works fine in Apache

Related Topic