C Performance – How Efficient is malloc and How Do Implementations Differ?

cmallocperformance

If I use malloc, does malloc always use the same algorithm regardless of what it is allocating or does it look at the data and select an appriopriate algorithm?

Can we make malloc faster or smarter by choosing a more efficient algorithm? In my tests, the builtin official system malloc of Ubuntu is 10 times slower than a school project if my test results are correct. What is the catch? I'm surprised that malloc performed so bad in the tests because it should be optimized. Does it always use the same algorithm? Is there a reference implementation of malloc? If I want to look at the source for malloc , which should I look at? The tests I run are the following:

/* returns an array of arrays of char*, all of which NULL */
char ***alloc_matrix(unsigned rows, unsigned columns) {
    char ***matrix = malloc(rows * sizeof(char **));
    unsigned row = 0;
    unsigned column = 0;
    if (!matrix) abort();

    for (row = 0; row < rows; row++) {
        matrix[row] = calloc(columns, sizeof(char *));
        if (!matrix[row]) abort();
        for (column = 0; column < columns; column++) {
            matrix[row][column] = NULL;
        }
    }
    return matrix;
}

/* deallocates an array of arrays of char*, calling free() on each */
void free_matrix(char ***matrix, unsigned rows, unsigned columns) {
    unsigned row = 0;
    unsigned column = 0;
    for (row = 0; row < rows; row++) {
        for (column = 0; column < columns; column++) {
            /*    printf("column %d row %d\n", column, row);*/
            free(matrix[row][column]);
        }
        free(matrix[row]);
    }
    free(matrix);
}


int main(int agrc, char **argv) {

    int x = 10000;
    char *** matrix = alloc_matrix(x, x);
    free_matrix(matrix, x, x);
    return (0);
}

Is the test alright? I also use this test:

   for (i = 0; i < 1000000; i++) {
        void *p = malloc(1024 * 1024 * 1024);
        free(p);
   }
  • update

According to the comment, I should make variably sized chunks and free in different order than allocating, so I try:

int main(int agrc, char **argv) {
     int i;
    srand(time(NULL));
    int randomnumber;
    int size = 1024;
    void *p[size];
    for (i = 0; i < size; i++) {
        randomnumber = rand() % 10;
        p[i] = malloc(1024 * 1024 * randomnumber);
    }

    for (i = size-1; i >= 0; i--) {
        free(p[i]);
    }

    int x = 1024;
    char *** matrix = alloc_matrix(x, x);
    free_matrix(matrix, x, x);
    return (0);
}

Then my custom malloc is no longer faster:

$ time ./gb_quickfit 

real    0m0.154s
user    0m0.008s
sys 0m0.144s
dac@dac-Latitude-E7450:~/ClionProjects/omalloc/openmalloc/overhead$ time ./a.out 

real    0m0.014s
user    0m0.008s
sys 0m0.004s

The algorithm I used was:

void *malloc_quick(size_t nbytes) {

    Header *moreroce(unsigned);
    int index, i;
    index = qindex(nbytes);

    /* 
     * Use another strategy for too large allocations. We want the allocation
     * to be quick, so use malloc_first().
     */

    if (index >= NRQUICKLISTS) {
        return malloc_first(nbytes);
    }

    /* Initialize the quick fit lists if this is the first run. */
    if (first_run) {
        for (i = 0; i < NRQUICKLISTS; ++i) {
            quick_fit_lists[i] = NULL;
        }
        first_run = false;
    }


    /*
     * If the quick fit list pointer is NULL, then there are no free memory
     * blocks present, so we will have to create some before continuing.
     */

    if (quick_fit_lists[index] == NULL) {
        Header* new_quick_fit_list = init_quick_fit_list(index);
        if (new_quick_fit_list == NULL) {
            return NULL;
        } else {
            quick_fit_lists[index] = new_quick_fit_list;
        }
    }

    /*
     * Now that we know there is at least one free quick fit memory block,
     * let's use return that and also update the quick fit list pointer so that
     * it points to the next in the list.
     */

    void* pointer_to_return = (void *)(quick_fit_lists[index] + 1);
    quick_fit_lists[index] = quick_fit_lists[index]->s.ptr;
   /* printf("Time taken %d seconds %d milliseconds", msec/1000, msec%1000);*/
    return pointer_to_return;
}

Best Answer

There are multiple implementations of malloc and they can use very different algorithms. Two very widely used implementations are jemalloc and dlmalloc. In both cases the links have a lot of information about the thought process and trade-offs made in a general purpose allocator.

Bear in mind a malloc implementation has very little information to go on, just the size of the allocation requested. A free implementation only has the pointer and whatever data 'malloc' may have secretly attached to it. As such there ends up being a fair amount of heuristics i.e. inspired guesswork in deciding how to allocate and deallocate.

Some issues that an allocator needs to address are:

  1. alignment - some memory alignment are faster than others
  2. fragmentation - naive allocation and freeing can leave holes that cause bloat
  3. performance - going back to the OS for more memory can be expensive
  4. Common requests - in practice applications (especially C++) often do a lot of small allocations so making these efficient can help a lot

Taking all this into account, what you'll find is that the allocators tend to be complex pieces of software so that, in general usage, they perform well. However, in specific cases, it may well be better to manage memory outside the allocator if you happen to know a lot more about the structure of the data. Obviously the downside is that you need to do the work yourself.

Related Topic