Linux – Set default numa policy to “interleave” system wide

central-processing-unit linux memory numa redhat

I know it is possible to set the NUMA mode to "interleave" (see NB below) for a specific process using numactl --interleave, but I'd like to know whether this can be made the system-wide default (i.e. change the "system policy"). For example, is there a kernel boot flag to achieve this?

NB: here I'm talking about the kernel behavior which interleaves allocated pages across NUMA nodes – not the memory controller behavior setting at the BIOS level which interleaves cache lines across

Best Answer

If using RHEL/CentOS/Fedora, I'd suggest using the numad daemon. (Red Hat paywall link).

While I don't have much use for the numactl --interleave directive, it seems you've determined that your workload requires it. Can you explain why this is the case in order to provide some better context?

Edit:

It seems that most applications that recommend an explicit numactl policy either call the libnuma library directly or invoke numactl from a wrapper script.
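A minimal sketch of what such a wrapper script might look like, assuming numactl is installed and that interleaving across all nodes (--interleave=all) is what you want. The function name run_interleaved, the NUMA_DRY_RUN variable and the ./my_server binary are hypothetical names used only for illustration; they are not part of numactl.

```shell
#!/bin/sh
# Sketch of a wrapper that launches a command under an interleaved NUMA policy.
# run_interleaved and NUMA_DRY_RUN are hypothetical names, not part of numactl.
run_interleaved() {
    if [ "${NUMA_DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command line instead of executing it.
        echo "numactl --interleave=all $*"
    elif command -v numactl >/dev/null 2>&1; then
        # Interleave page allocations of the child process across all NUMA nodes.
        numactl --interleave=all "$@"
    else
        # Assumption: falling back to a plain launch is acceptable without numactl.
        echo "numactl not found; running without a NUMA policy" >&2
        "$@"
    fi
}

# Preview the command line for a hypothetical server binary.
NUMA_DRY_RUN=1
run_interleaved ./my_server --port 8080
```

Setting NUMA_DRY_RUN back to 0 makes the same call actually start the program under numactl. This only affects processes started through the wrapper, so it is a per-application workaround rather than a system-wide default.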

For the numad side, there's a configuration option that can be specified on the command line or in /etc/numad.conf...

-K <0|1>
   This option controls whether numad keeps interleaved memory spread across NUMA nodes, or
   attempts to merge interleaved memory to local NUMA nodes. The default is to merge interleaved
   memory. This is the appropriate setting to localize processes in a subset of the system’s
   NUMA nodes. If you are running a large, single-instance application that allocates
   interleaved memory because the workload will have continuous unpredictable memory access
   patterns (e.g. a large in-memory database), you might get better results by specifying -K 1
   to instruct numad to keep interleaved memory distributed.

Some people report that running numad with something like numad -K 1 -u X, where X is 100 x the core count, may help in this situation. I haven't verified it myself, but it is worth trying.
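As a rough sketch, the value of X can be derived from the CPU count instead of being hard-coded. This assumes that getconf _NPROCESSORS_ONLN (the number of online logical CPUs) is an acceptable stand-in for "core count"; on systems with hyper-threading it will be larger than the number of physical cores. The script only prints the command, since starting numad normally requires root.

```shell
#!/bin/sh
# Sketch: build the numad command line suggested above.
# Assumption: online logical CPUs are used as the "core count".
cpus=$(getconf _NPROCESSORS_ONLN)

# X = 100 x core count, as suggested in the answer.
target=$((cpus * 100))

# -K 1 keeps interleaved memory distributed; -u takes the computed X.
numad_cmd="numad -K 1 -u ${target}"

# Print rather than execute; run the printed command as root to apply it.
echo "Would run: ${numad_cmd}"
```

If you want physical cores rather than logical CPUs, replace the cpus assignment with whatever count is appropriate for your hardware.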

Also see HP's ProLiant Whitepaper on Linux and NUMA.