How to minimise SpamAssassin (spamd) memory use

Tags: memory, optimization, spam, spamassassin

I'm using SpamAssassin on Debian (the default configuration, with Pyzor, AWL and Bayes disabled and sa-compile enabled). Each spamd child process consumes around 100 to 150MB of memory (around 50MB of it resident) on the 32-bit servers, and about double that (logically enough) on the 64-bit servers. There are generally two child processes, but at busy times there can be five (the maximum) running.
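For context, the relevant knobs in my SpamAssassin config look roughly like this (a sketch; the file path and layout on your install may differ):

# /etc/spamassassin/local.cf (sketch)
use_bayes            0   # Bayes disabled
use_auto_whitelist   0   # AWL disabled
use_pyzor            0   # Pyzor disabled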

It seems to me that 200 to 600MB is a lot of memory for this task. I'd like to keep SA as part of my filtering setup, but it's becoming difficult to justify so much memory.

Are there any ways to reduce the amount of memory each child process uses? (Or, alternatively, ways to make each child fast enough that I can cap the maximum number of children at something like two?) I'm willing to consider any option, including ones that may reduce accuracy.

I've already read the "Out of Memory Problems" page on the SA wiki; nothing there is of any use. Messages larger than 5 MB are not scanned with SA.
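For reference, on Debian the spamd child limits live on the OPTIONS line in /etc/default/spamassassin; the sort of change I have in mind looks like this (the values are guesses, not a tested recommendation):

# /etc/default/spamassassin (sketch -- flag values are assumptions)
# --max-children caps simultaneous scanners; --max-spare trims idle children.
OPTIONS="--create-prefs --helper-home-dir --max-children 2 --max-spare 1"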

Best Answer

I think you're misunderstanding the way Linux reports memory usage. When a process forks, the result is a second process that shares a lot of resources with the original, including memory. Linux handles this with a technique known as copy-on-write (COW): each forked child sees the same data in memory as the parent, but whenever either process writes to that data, the affected pages are copied first and only the writer's copy changes.

Until one of the processes writes to a page, they share the same copy. As a result, I could have a process that uses 100MB of RAM and fork it 10 times. Each of those forked processes would show 100MB of RAM in use, but if you looked at the overall memory usage on the box, it might show only 130MB in use: 100MB shared between the processes, a few MB of overhead, and a couple dozen MB for the rest of the system.
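You can check this yourself: summing per-process RSS counts the shared COW pages once per child, so the naive total comes out far higher than what the system is actually using. A quick sketch, assuming the children show up under the name spamd:

# Naive sum of per-process RSS (double-counts shared pages):
ps -o rss= -C spamd | awk '{ sum += $1 } END { printf "%d MB apparent\n", sum/1024 }'

# What the box is actually using (see the -/+ buffers/cache line):
free -m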

As a final example, I have a box right now with 30 apache processes running. Each process is showing a usage of 22MB of RAM. However, when I run free -m to show my overall RAM usage, I get:

topher@crucible:/tmp$ free -m
             total       used       free     shared    buffers     cached
Mem:           349        310         39          0         24         73
-/+ buffers/cache:        212        136
Swap:          511         51        460

As you can see, this box doesn't even have enough RAM to run 30 processes that were each using 22MB of "real" RAM. Unless you're literally running out of RAM, or your apps are swapping heavily, I wouldn't worry about it.

UPDATE: Also check out smem, a tool mentioned by jldugger in an answer to another question about Linux memory usage.
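smem reports USS (pages unique to a process) and PSS (unique pages plus a proportional share of the shared ones), which is a much fairer per-process figure than plain RSS. Something like this (the -P process filter is an assumption about the process name):

# USS/PSS per spamd child, plus a totals line:
smem -t -P spamd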
