We are running a load test against an application that uses a Postgres database.
During the test, the error rate suddenly increases.
After analysing the platform and application behaviour, we notice that:
- CPU utilisation on the Postgres RDS instance is at 100%
- Freeable memory drops on the same instance
And in the Postgres logs, we see:
2018-08-21 08:19:48 UTC::@:[XXXXX]:LOG: server process (PID XXXX) was terminated by signal 9: Killed
After investigating and reading the documentation, it appears that one possibility is that the Linux OOM killer ran and killed the process.
But since we're on RDS, we cannot access the system logs (/var/log/messages) to confirm.
So can somebody:
- confirm that the OOM killer really runs on AWS RDS for Postgres?
- give us a way to check this?
- give us a way to compute the maximum memory used by Postgres based on the number of connections?
I didn't find the answer here:
Best Answer
Even if the OOM killer did not act (it probably did), sustained 100% CPU and very low free memory are bad for performance.
Use a larger instance size and see if the problem goes away. Test a smaller size on a non-RDS Postgres you control and see if the OOM killer gets angry.
Number of connections is not necessarily the dominating factor in memory consumption: shared memory is used for other things, and not every query uses a large chunk of memory. See also this conversation: PostgreSql allocate memory for each connection.
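To make that point concrete, here is a rough back-of-the-envelope sketch in Python. It is not a formula that PostgreSQL or RDS defines. The function name, the `work_mem_ops_per_query` multiplier and the `per_backend_overhead_mb` figure are all assumptions chosen for illustration; the only grounded parts are that `shared_buffers` is allocated once and that `work_mem` applies per sort or hash operation rather than per connection, so a single query can use several multiples of it.

```python
def estimate_peak_memory_mb(
    shared_buffers_mb: float,
    work_mem_mb: float,
    max_connections: int,
    work_mem_ops_per_query: int = 1,        # assumption: sort/hash nodes active per query
    maintenance_work_mem_mb: float = 64.0,  # PostgreSQL default
    autovacuum_max_workers: int = 3,        # PostgreSQL default
    per_backend_overhead_mb: float = 10.0,  # assumption: illustrative only, measure your own
) -> float:
    """Rough worst-case estimate of memory use, in MB.

    Treat the result as an order-of-magnitude guide, not a guarantee:
    real usage depends on the queries actually being run.
    """
    # Shared memory is allocated once, regardless of the connection count.
    shared = shared_buffers_mb

    # Each backend may use work_mem once per sort/hash operation in its query.
    per_backend = work_mem_mb * work_mem_ops_per_query + per_backend_overhead_mb
    backends = max_connections * per_backend

    # Autovacuum workers can each use up to maintenance_work_mem
    # (when autovacuum_work_mem is left at its default of -1).
    maintenance = maintenance_work_mem_mb * autovacuum_max_workers

    return shared + backends + maintenance


if __name__ == "__main__":
    # Hypothetical settings: 4 GB shared_buffers, 4 MB work_mem, 500 connections,
    # two memory-hungry operations per query.
    print(estimate_peak_memory_mb(4096, 4, 500, work_mem_ops_per_query=2))
```

With these hypothetical numbers, the per-connection term is larger than `shared_buffers`, but halving `work_mem_ops_per_query` or `work_mem_mb` changes the total as much as removing a couple of hundred connections would. That is why the connection count alone is a poor predictor.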
Additional advice from Best Practices for Amazon RDS