I was running into this with Apache, and the solution was a combination of the following (note that I'm using Apache/2.4.7 (Ubuntu) + varnish 3.0.5-2 on Ubuntu 14.04 LTS in AWS EC2):
Please keep in mind that this was made for an M3.Medium instance on Amazon EC2 (1x Intel Xeon E5-2670 core + 3.75GB RAM). Adjust as necessary for your hardware!
In /etc/default/varnish, edit your start-up options:
DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/default.vcl \
-S /etc/varnish/secret \
-p thread_pools=2 \
-p thread_pool_max=600 \
-p listen_depth=1024 \
-p lru_interval=900 \
-p connect_timeout=600 \
-p max_restarts=6 \
-s malloc,1G"
In /etc/varnish/default.vcl, or whatever your VCL file is, change the back-end timeouts (note that connect_timeout is also set in /etc/default/varnish):
backend default {
.host = "127.0.0.1";
.port = "8000";
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
}
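To double-check which run-time parameters the start-up options actually set, and which of them overlap with the VCL back-end block, you can parse the DAEMON_OPTS string. This is only a rough Python sketch: the function name runtime_params is made up for illustration, and the string is copied from the start-up options shown earlier.

```python
import shlex

# Copied from the DAEMON_OPTS value in /etc/default/varnish shown earlier.
DAEMON_OPTS = """-a :80 \\
-T localhost:6082 \\
-f /etc/varnish/default.vcl \\
-S /etc/varnish/secret \\
-p thread_pools=2 \\
-p thread_pool_max=600 \\
-p listen_depth=1024 \\
-p lru_interval=900 \\
-p connect_timeout=600 \\
-p max_restarts=6 \\
-s malloc,1G"""


def runtime_params(daemon_opts):
    """Return the '-p name=value' pairs from a varnishd option string as a dict."""
    # Backslash-newline is a shell line continuation, so turn it into a space first.
    tokens = shlex.split(daemon_opts.replace("\\\n", " "))
    params = {}
    for i, token in enumerate(tokens):
        if token == "-p" and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition("=")
            params[name] = value
    return params


if __name__ == "__main__":
    params = runtime_params(DAEMON_OPTS)
    # The three timeouts set in the VCL back-end block.
    for name in ("connect_timeout", "first_byte_timeout", "between_bytes_timeout"):
        print(name, params.get(name, "not set on the command line"))
```

With the options above, only connect_timeout appears in both places; first_byte_timeout and between_bytes_timeout are set in the VCL only.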
Disable KeepAlives. This page has more information (varies depending on back-end web server software): http://www.feedthebot.com/pagespeed/keep-alive.html
For Apache, all I had to do was change line 92 in /etc/apache2/apache2.conf to the following:
KeepAlive Off
What I think is going on here is that the KeepAlives, as implemented in the back-end web server software, are sending explicit connection resets, which Varnish doesn't work well with. There is probably more to this story, and I encourage you to dig into this and post your findings here for future generations to learn from.
Additional reading:
- https://www.varnish-cache.org/trac/wiki/Future_Feature#Keepalivetimeoutonbackendconnections
(There are a few more, but I can't post the links. Googling for "varnish keepalive backend timeout" should surface what you want.)
More debugging help:
If you're still stuck, try doing the following:
- Start varnishlog -w err.log on your Varnish server.
- On your client, get Siege (http://www.joedog.org/siege-home/) and load it up with some of the URLs you've seen return 503 (hint: put them in urls.txt and use -i -b -c500 -r10; that should be enough to trigger the 503s).
- Run varnishlog -r err.log -c -m 'TxStatus:503' > err-parsed.txt (reading the file written in the first step). This will grab all the Varnish log entries where Varnish returned a 503.

FWIW, here's the full text of one of my errors. TL;DR: the error Varnish was reporting was FetchError c http first read error: -1 0 (Success):
936 SessionOpen c 10.8.226.98 51895 :80
936 ReqStart c 10.8.226.98 51895 357447130
936 RxRequest c GET
936 RxURL c /ip/69.120.68.54
936 RxProtocol c HTTP/1.1
936 RxHeader c Host: 10.201.81.157
936 RxHeader c Accept: */*
936 RxHeader c Accept-Encoding: gzip
936 RxHeader c User-Agent: Mozilla/5.0 (apple-x86_64-darwin11.4.2) Siege/3.0.5
936 RxHeader c Connection: close
936 VCL_call c recv lookup
936 VCL_call c hash
936 Hash c /ip/69.120.68.54
936 Hash c 10.201.81.157
936 VCL_return c hash
936 HitPass c 357445183
936 VCL_call c pass pass
936 Backend c 103 default default
936 FetchError c http first read error: -1 0 (Success)
936 Backend c 269 default default
936 FetchError c http first read error: -1 0 (Success)
936 VCL_call c error deliver
936 VCL_call c deliver deliver
936 TxProtocol c HTTP/1.1
936 TxStatus c 503
936 TxResponse c Service Unavailable
936 TxHeader c Server: Varnish
936 TxHeader c Content-Type: text/html; charset=utf-8
936 TxHeader c Retry-After: 5
936 TxHeader c Content-Length: 418
936 TxHeader c Accept-Ranges: bytes
936 TxHeader c Date: Thu, 05 Jun 2014 23:05:48 GMT
936 TxHeader c X-Varnish: 357447130
936 TxHeader c Age: 0
936 TxHeader c Via: 1.1 varnish
936 TxHeader c Connection: close
936 Length c 418
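If err-parsed.txt ends up holding many transactions like this one, it can help to tally the distinct FetchError messages instead of reading them by eye. Here is a minimal Python sketch, assuming the plain-text varnishlog layout shown above (id, tag, c/b marker, then the data). The function name count_fetch_errors is hypothetical, and the sample lines are taken from the log above.

```python
from collections import Counter


def count_fetch_errors(lines):
    """Count how often each FetchError message appears in varnishlog text output."""
    counts = Counter()
    for line in lines:
        # Fields: id, tag, 'c' or 'b', then the rest of the line as data.
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[1] == "FetchError":
            counts[parts[3].strip()] += 1
    return counts


# A few lines copied from the transaction above.
SAMPLE = """\
936 Backend c 103 default default
936 FetchError c http first read error: -1 0 (Success)
936 Backend c 269 default default
936 FetchError c http first read error: -1 0 (Success)
936 TxStatus c 503
"""

if __name__ == "__main__":
    for message, n in count_fetch_errors(SAMPLE.splitlines()).most_common():
        print(n, message)
```

To run it against the real file, pass the open file instead of the sample, for example count_fetch_errors(open("err-parsed.txt")).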
Hope this helps!
Best Answer
No one else has offered anything up here. I thought I would let you know you're not alone trying to find the answer.
I have the same question and I'm starting to think varnishstat perhaps doesn't report metrics with entirely null values.
I have a test-bed server that I might deliberately starve of cache storage and see what happens. If I can confirm this behaviour, I will report back.
Perhaps someone might offer their expertise?
Update: OK, I can confirm that varnishstat appears to save terminal space by not reporting counters that are entirely 0 or null.
This is a side effect of the continuously updated display mode.
If you run varnishstat non-interactively (for example with varnishstat -1, which prints the counters once and exits), then the stat does indeed show.
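To list the counters that are currently zero, and so may be hidden in the interactive display, you can filter the one-shot output. Below is a small Python sketch, assuming each line of varnishstat -1 output starts with the counter name followed by its value. The function name zero_counters is hypothetical, and the numbers in the sample are invented for illustration.

```python
def zero_counters(stat_output):
    """Return the names of counters whose current value is 0."""
    names = []
    for line in stat_output.splitlines():
        # Assumed layout: name, value, rate, description.
        parts = line.split(None, 3)
        if len(parts) < 2:
            continue
        name, value = parts[0], parts[1]
        if value.isdigit() and int(value) == 0:
            names.append(name)
    return names


# Invented sample values; only the line layout matters here.
SAMPLE = """\
client_conn              20264         0.12 Client connections accepted
cache_hit                15012         0.09 Cache hits
n_lru_nuked                  0         0.00 N LRU nuked objects
"""

if __name__ == "__main__":
    print(zero_counters(SAMPLE))
```

On a live server the input could come from subprocess.run(["varnishstat", "-1"], capture_output=True, text=True).stdout rather than the sample string.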