Linux – Oracle 10g on RHEL – All memory gets eaten up nightly at 20.15

Tags: linux, memory, memory-leak, oracle, redhat

Okay, to start things off: I'm brand new to Oracle Database. I have a lot of experience with Microsoft products (boo) and Ubuntu Linux, but RHEL and Oracle are very new to me.

The environment: Oracle Database 10g Standard Edition 10.2.0.1.0 (64-bit) on Red Hat Enterprise Linux 5. The total database size is less than 8GB.

The issue: every night at 20.15 on the dot, the server's available memory drops to zero. The server previously had 4GB of RAM, so our memory statistics always showed only ~100MB available, and until recently we assumed the only problem was insufficient RAM. The other day we upgraded from 4GB to 12GB and finally saw some real memory available. Please see the attached graphs for details.

Daily Memory Usage: this graph covers 24 hours. From the start of the graph until 20.15 it shows normal daytime memory usage, while users are heavily pounding the server. At 20.15, when all users have been gone from the building for over two hours, the memory pretty much disappears and stays gone until the server is rebooted.

Weekly Memory Usage: this graph covers the period before we upgraded the memory and the three days since. Before the upgrade there is essentially no physical RAM available at all, though some memory is being used for cache. After the upgrade we have a huge chunk of available RAM and cache, until 20.15, when both disappear. Each jump in available RAM comes immediately after a reboot, and that available RAM lasts until, you guessed it, 20.15.

The vendor who built this server has absolutely no answer for us. In fact, what they tell us is absolutely ridiculous and makes them look grossly incompetent; they truly have no idea, and it's obvious. So there's no way we're going to get an answer from them. About a week ago, they assured us that the server did not need more RAM and had more than sufficient resources. It was also built with only two physical hard disks (2 × 146GB, 15K RPM), which I believe are set up as RAID 1.

I have checked (what I believe to be) all scheduler jobs, all cron jobs, and any other timed tasks I can find. I have also disconnected all idle database sessions, to no avail. The only evidence pointing to a culprit is an Oracle process that starts using ~50% of the CPU after 20.15. During the day there are a few dozen (~40+) Oracle processes that each show about 2.2GB of virtual memory usage; this is the same immediately after a reboot and after the 20.15 "event".
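One way to look more closely at those ~40 Oracle processes is to read each process's `/proc/<pid>/status` file, which reports both the virtual size (`VmSize`) and the resident size (`VmRSS`). The Python 3 sketch below is a hypothetical helper, not something from the original post. It assumes the Oracle processes have names starting with `ora` (background processes are typically named like `ora_pmon_<SID>` and dedicated server processes like `oracle<SID>`), and Python 3 may need to be installed separately on RHEL 5.

```python
import os


def parse_status(text):
    """Return (name, VmSize kB, VmRSS kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()

    def kb(key):
        # Values look like "2306048 kB"; kernel threads have no Vm* lines at all.
        parts = fields.get(key, "").split()
        return int(parts[0]) if parts else 0

    return fields.get("Name", "?"), kb("VmSize"), kb("VmRSS")


def matching_processes(prefix="ora", proc_root="/proc"):
    """Yield (pid, name, VmSize kB, VmRSS kB) for processes whose name starts with prefix.

    The "ora" default is an assumption about Oracle's process naming.
    """
    if not os.path.isdir(proc_root):
        return  # no Linux-style /proc here; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, vsz, rss = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith(prefix):
            yield int(entry), name, vsz, rss


if __name__ == "__main__":
    # Largest resident size first; print the top ten and a process count.
    procs = sorted(matching_processes(), key=lambda p: p[3], reverse=True)
    for pid, name, vsz, rss in procs[:10]:
        print(f"{pid:>7} {name:<16} VmSize {vsz // 1024:>6} MiB  VmRSS {rss // 1024:>6} MiB")
    print(f"{len(procs)} matching processes")
```

A caveat when reading the numbers: every Oracle process maps the shared SGA, and both `VmSize` and `VmRSS` count those shared pages in each process that maps them. A figure like 2.2GB of virtual memory per process therefore most likely reflects the shared SGA rather than 2.2GB of private memory each, which is why the sketch does not add the columns up.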

I'm beyond stumped. And our software/hardware vendor is worthless.

Any suggestions or help would be greatly appreciated! Thank you!
~Laz Peterson

Best Answer

First off, what the Oracle people told you isn't a "ridiculous answer". Oracle simply runs some housekeeping tasks at that time, probably through cron or the database's own job scheduler. If that time is inconvenient, you should ask how to reschedule them.
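Since the answer suspects cron, here is a hedged Python 3 sketch (a hypothetical helper, not part of the original answer) that scans the usual crontab locations on RHEL, assumed here to be `/etc/crontab`, `/etc/cron.d/` and `/var/spool/cron/`, and prints the entries whose minute and hour fields match 20:15. It only understands numeric fields, `*`, ranges, lists and `/step`, and it ignores the day fields. Jobs defined inside the database (for example through `DBMS_SCHEDULER`) will not show up here at all.

```python
import glob


def field_matches(field, value, low, high):
    """True if a crontab time field such as "15", "*/15", "10-20/5" or "20,22" matches value.

    Only numbers, "*", ranges, lists and "/step" are handled; names such as "mon" are not.
    """
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def entries_firing_at(lines, hour, minute):
    """Yield crontab lines whose minute and hour fields match hour:minute.

    The day-of-month, month and day-of-week fields are ignored, so a hit means
    "can fire at this time of day", not necessarily "fires every night".
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "@")):
            continue  # blank line, comment, or an @reboot/@daily style shortcut
        fields = line.split()
        if len(fields) < 6 or "=" in fields[0]:
            continue  # environment setting such as MAILTO=root, or not a job line
        try:
            hit = field_matches(fields[0], minute, 0, 59) and field_matches(fields[1], hour, 0, 23)
        except (ValueError, ZeroDivisionError):
            continue  # a field this simple parser does not understand
        if hit:
            yield line


if __name__ == "__main__":
    # Reading the per-user crontabs in /var/spool/cron normally requires root.
    paths = ["/etc/crontab"] + sorted(glob.glob("/etc/cron.d/*")) + sorted(glob.glob("/var/spool/cron/*"))
    for path in paths:
        try:
            with open(path) as f:
                for hit in entries_firing_at(f, hour=20, minute=15):
                    print(f"{path}: {hit}")
        except OSError:
            pass  # missing, unreadable, or a directory
```

Matching on the time fields rather than grepping for "20" avoids missing entries written as ranges or steps, such as `*/15 18-21 * * *`.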

Your memory usage graphs look fine. When Linux loads data into memory, it stays there until the space is needed for something else (that is what free(1) reports as buffers/cache). The logic behind this is that discarding the data would be extra work for no gain, while keeping it means that if the data is needed again, it is already there for free. There are 2GiB genuinely free there. Unless that changes (for example, because something is leaking memory), I would not worry about it for now.
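To see the distinction the answer draws, compare `MemFree` with `MemFree + Buffers + Cached` in `/proc/meminfo`; the latter approximates the "free" figure on the `-/+ buffers/cache` line printed by the `free` of that era. A minimal Python 3 sketch, with hypothetical helper names:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def really_free_kb(info):
    """Memory that is free or only holding cache, in kB: MemFree + Buffers + Cached.

    This is an approximation: not every cached page can be dropped. For example,
    shared memory segments are counted in Cached but cannot simply be discarded.
    """
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)


def main(path="/proc/meminfo"):
    try:
        with open(path) as f:
            info = parse_meminfo(f.read())
    except OSError:
        print(f"cannot read {path}; this sketch expects Linux")
        return
    print(f"MemTotal {info['MemTotal'] // 1024} MiB | "
          f"MemFree {info['MemFree'] // 1024} MiB | "
          f"free incl. buffers/cache {really_free_kb(info) // 1024} MiB")


if __name__ == "__main__":
    main()
```

Running it before and after 20.15 shows whether the memory that disappears was only cache (the last figure stays roughly constant) or is actually being consumed (the last figure drops too).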