Current Load is WARNING – why

cpu-usage postfix

On a VPS with 1 CPU core and 2 GB of RAM, I run MySQL and apache2 for a low-traffic website. Sometimes the machine slows down or stops serving requests through apache2 or MySQL.

That's why I set up nagios, which sends me alerts like "Service Alert: localhost/Current Load is WARNING" after 5-10 days of running. I can then log in through SSH and check RAM with "free": there is still enough, with 500MB+ available and only 60MB of swap in use.

Since the system slowed down again, I checked the syslog and found lots of these entries:

Jun 30 23:46:31 cl22 postfix/error[2190]: 46D8974323: to=, relay=none, delay=294806, delays=294803/3/0/0, dsn=4.4.3, status=deferred (delivery temporarily suspended: Host or domain name not found. Name service error for name=zombine.com type=MX: Host not found, try again)
Jun 30 23:46:31 cl22 postfix/error[2193]: 49CB374123: to=, relay=none, delay=154189, delays=154185/3.1/0/0, dsn=4.4.3, status=deferred (delivery temporarily suspended: Host or domain name not found. Name service error for name=zombine.com type=MX: Host not found, try again)
Jun 30 23:46:31 cl22 postfix/error[2153]: 4E2C874250: to=, relay=none, delay=433708, delays=433704/3.1/0/0, dsn=4.4.3, status=deferred (delivery temporarily suspended: Host or domain name not found. Name service error for name=zombine.com type=MX: Host not found, try again)
Jun 30 23:46:31 cl22 postfix/error[2176]: 480D874180: to=, relay=none, delay=174308, delays=174304/3.1/0/0, dsn=4.4.3, status=deferred (delivery temporarily suspended: Host or domain name not found. Name service error for name=zombine.com type=MX: Host not found, try again)

How can I find out which process is causing the load? It is far too high for a 1-core VPS: WARNING – load average: 3.06, 5.79, 3.42

MySQL is OK and apache2 seems to be OK. Is postfix the problem, or something else I have not identified yet?

Please let me know how to find the offending process and how to temporarily renice or deprioritize postfix etc. so that apache2 and MySQL remain healthy. These two processes are important to me, and so are the outgoing emails, because they carry messages to clients.

Best Answer

The error you're seeing is not related to the email address; it's a DNS problem: the server cannot look up the MX record for zombine.com. If this server sends email to that domain, make sure it can resolve the MX record:

dig in mx zombine.com

Postfix will continue attempting to send these emails over and over for days in case of a "recoverable" failure like this one.
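
To see which domains are behind the deferred deliveries, the log can be summarised per domain. This is a minimal sketch rather than anything Postfix ships: the function name deferred_by_domain is made up for illustration, and /var/log/syslog is an assumed log path (on some systems the mail log is /var/log/mail.log).

```shell
# Count deferred deliveries per unresolvable domain.
# Reads Postfix log lines on stdin (one entry per line) and prints
# "count domain", busiest domain first.
deferred_by_domain() {
    grep 'status=deferred' |
        sed -n 's/.*Name service error for name=\([^ ]*\) type=MX.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Usage (log path is an assumption):
#   deferred_by_domain < /var/log/syslog
```

If a single domain accounts for nearly all entries, the problem is most likely that domain's DNS rather than the resolver on this server.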

One other thing to check is whether you have disk I/O problems: look at the I/O wait CPU figure, "wa", in top (the "hi" figure next to it is hardware interrupt time). If that is the issue, you can install and run iotop to see which processes are generating the disk I/O.
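
On Linux, the load average counts processes that are running or runnable (state R) and processes in uninterruptible sleep (state D, usually waiting on disk I/O), so listing just those shows what is behind a high load figure. A small sketch, assuming the procps version of ps; the helper name load_contributors is hypothetical:

```shell
# Keep only processes that count towards the Linux load average:
# state R (running or runnable) and state D (uninterruptible sleep).
# Expects "ps -eo stat,pid,pcpu,comm" style input on stdin, header included.
load_contributors() {
    awk 'NR == 1 || $1 ~ /^[RD]/'
}

# Usage on the live system (the ps process itself will show up as R):
#   ps -eo stat,pid,pcpu,comm --sort=-pcpu | load_contributors
```

Many entries in state D point towards disk I/O, while many in state R point towards CPU contention.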

You can configure these parameters (the default unit is days) to adjust how long postfix keeps trying to deliver undeliverable mail:

maximal_queue_lifetime (normal messages)
bounce_queue_lifetime (bounce messages)
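
Both parameters default to 5d. The delay= field in the log is in seconds, so converting it to days shows how close a message is to that limit. The helper below is an illustrative sketch (delay_days is not a Postfix tool), and the 1d value in the commented commands is only an example, not a recommendation:

```shell
# Convert a Postfix "delay=" value (seconds) into days, one decimal place.
delay_days() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 86400 }'
}

# Example with the largest delay from the log above:
#   delay_days 433708    # prints 5.0

# To shorten the lifetime (example value), then reload postfix:
#   postconf -e 'maximal_queue_lifetime = 1d' 'bounce_queue_lifetime = 1d'
#   postfix reload
```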

Additionally, make sure the following settings are correct to ensure you are not operating an open relay (this can be a source of unwanted SMTP traffic as people use your server to send spam):

mynetworks (in this case probably 127.0.0.1/32)
smtpd_client_restrictions (probably permit_mynetworks, reject_unauth_destination)
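
The current values can be printed with postconf mynetworks smtpd_client_restrictions. As a rough sanity check, the sketch below tests whether a mynetworks value contains only loopback ranges. The function name is made up for this example, and a server that legitimately relays for other hosts would need a different check:

```shell
# Succeed only if every entry in a mynetworks value is a loopback range.
# Entries may be separated by commas and/or spaces.
mynetworks_is_loopback_only() {
    # One entry per line, then look for any entry that is not loopback.
    ! printf '%s\n' "$1" | tr ', ' '\n\n' | grep -v '^$' |
        grep -qvE '^(127\.|\[::1\]|\[::ffff:127\.)'
}

# Usage on the live system:
#   mynetworks_is_loopback_only "$(postconf -h mynetworks)" && echo "loopback only"
```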

Then, empty your mail queue:

postsuper -d ALL

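ALL must be written in upper case, as a safety measure. Note that this removes every queued message, including any legitimate mail still waiting for delivery. Afterwards, postqueue -p should show an empty queue.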
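
Since the outgoing mail to clients matters here, deleting only the messages addressed to the unresolvable domain may be preferable to emptying the whole queue. The sketch below is adapted from the example in the postsuper manual page; the function name queue_ids_for_domain is hypothetical, and it assumes the usual postqueue -p layout (queue ID first, sender in the seventh field, recipients after it).

```shell
# Print the queue IDs of messages that have at least one recipient in the
# given domain. Reads "postqueue -p" output on stdin.
queue_ids_for_domain() {
    tail -n +2 |                 # drop the column header
        grep -v '^ *(' |         # drop the "(reason ...)" lines
        awk -v dom="$1" '
            BEGIN { RS = "" }    # one queue entry per blank-line-separated record
            {
                # $1 = queue ID, $7 = sender, $8 onwards = recipients
                for (i = 8; i <= NF; i++) {
                    n = length($i) - length(dom)
                    if (n > 1 && substr($i, n) == "@" dom) { print $1; break }
                }
            }' |
        tr -d '*!'               # strip the active/hold markers from the ID
}

# Usage: review the IDs first, then feed them to postsuper on stdin.
#   postqueue -p | queue_ids_for_domain zombine.com
#   postqueue -p | queue_ids_for_domain zombine.com | postsuper -d -
```

Running the first usage line on its own before piping into postsuper -d - makes it possible to check the list of IDs against postqueue -p.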