High CPU caused by pthread_cond_wait or nanosleep

First of all, I've searched Stack Overflow and Googled, but found nothing that helped.

My question is: why does pthread_cond_wait consume so much CPU? This doesn't look normal to me.

My program's %CPU spiked intermittently and stayed high for more than ten seconds at a time. When %CPU was stably low, it was around 1; when it spiked, it ranged from 50 to 300.

When the process's %CPU went high, I used top -H -p to find the single thread consuming the most CPU, and then ran strace -T -r -c -p on that thread for more detail:

strace -T -r -c -p 1701

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.54    0.482646          43     11157      3020 futex
  9.85    0.053682           0    131052           read
  1.50    0.008192          38       213           nanosleep
  0.04    0.000214           1       239           write
  0.03    0.000154           1       213           open
  0.02    0.000111           1       213           munmap
  0.02    0.000085           0       239           stat
  0.01    0.000044           0       213           mmap
  0.00    0.000018           0       213           close
  0.00    0.000000           0       213           fstat
  0.00    0.000000           0       213           lseek
------ ----------- ----------- --------- --------- ----------------
100.00    0.545146                144178      3020 total

And the stack of that thread:

Thread 6 (Thread 0x7f1404f41700 (LWP 1701)):

#0  0x0000003d6f60b63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000406045 in foo(void*) ()
#2  0x0000003d6f607a51 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003d6eee893d in clone () from /lib64/libc.so.6

And the related code snippet (write_log, baz, extract_topic_from_path, produce_msgs_and_save_offset, poll_producer and the topics map are defined elsewhere):

#include <deque>
#include <string>
#include <pthread.h>
#include <unistd.h>
#include <poll.h>
#include <errno.h>
#include <stdint.h>
#include <syslog.h>    /* LOG_INFO, LOG_WARNING, LOG_ERR */

static std::deque<std::string> conveyor;
static pthread_mutex_t conveyor_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t conveyor_cond = PTHREAD_COND_INITIALIZER;

#define POP_NUM 4

/* Consumer thread: pops up to POP_NUM paths from the conveyor and
 * processes each one. */
static void *foo(void *arg)
{
    write_log(LOG_INFO, "thread foo created");

    int ret = pthread_detach(pthread_self());
    if (ret != 0) {
        write_log(LOG_ERR, "pthread_detach[foo] failed with errno[%d]", ret);
        write_log(LOG_ERR, "thread foo exiting");
        return (void *)-1;
    }

    std::string paths[POP_NUM], topic;
    int n;

    do {
        if ((ret = pthread_mutex_lock(&conveyor_mtx)) != 0) {
            write_log(LOG_WARNING, "pthread_mutex_lock[conveyor_mtx] failed"
                      " with errno[%d]", ret);
            sleep(1);
            continue;
        }
        /* pthread_cond_wait can wake spuriously, so the emptiness
         * predicate must be rechecked in a loop after every wakeup. */
        while (conveyor.empty()) {
            write_log(LOG_INFO, "conveyor empty");
            pthread_cond_wait(&conveyor_cond, &conveyor_mtx);
        }
        for (n = 0; n < POP_NUM; n++) {
            paths[n].assign(conveyor.front());
            conveyor.pop_front();
            if (conveyor.empty()) {
                n++;    /* count the element just popped so it is not lost below */
                break;
            }
        }
        if ((ret = pthread_mutex_unlock(&conveyor_mtx)) != 0) {
            write_log(LOG_WARNING, "pthread_mutex_unlock[conveyor_mtx] failed"
                      " with errno[%d]", ret);
        }
        /* Process the popped paths outside the lock. */
        for (int i = 0; i < n; i++) {
            if (!extract_topic_from_path(paths[i], topic)) continue;
            produce_msgs_and_save_offset(topics[topic],
                                         const_cast<char *>(paths[i].c_str()));
        }
    } while (true);

    write_log(LOG_ERR, "thread foo exiting");

    return (void *)0;
}
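
A blocked pthread_cond_wait should consume essentially no CPU, so one way to confirm whether the wait itself is spinning (rather than the thread merely being woken very often) is to measure per-thread CPU time around it. A minimal sketch, assuming Linux's RUSAGE_THREAD (available since kernel 2.6.26, and needing _GNU_SOURCE); the helper name cpu_time_us is made up for illustration:

#include <sys/resource.h>

/* Per-thread CPU time (user + system) in microseconds. */
static long long cpu_time_us(void)
{
    struct rusage ru;
    getrusage(RUSAGE_THREAD, &ru);
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}

/* Inside foo(), around the wait: */
long long before = cpu_time_us();
pthread_cond_wait(&conveyor_cond, &conveyor_mtx);
write_log(LOG_INFO, "cond_wait burned %lld us of CPU",
          cpu_time_us() - before);

If the logged value stays near zero even while %CPU is high, the cycles are going somewhere other than the wait.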

/* Watcher thread: polls inot_fd and, via baz(), refills the conveyor
 * before waking the consumer. */
static void *bar(void *arg)
{
    write_log(LOG_INFO, "thread bar created");

    int inot_fd = (int)(intptr_t)arg, n, ret;
    struct pollfd pfd = { inot_fd, POLLIN | POLLPRI, 0 };

    do {
        //n = poll(&pfd, 1, -1);
        //n = poll(&pfd, 1, 300000);
        n = poll(&pfd, 1, 120000);
        if (n == -1) {
            if (errno == EINTR) {
                write_log(LOG_WARNING, "poll interrupted by a signal");
                continue;
            }
            write_log(LOG_ERR, "poll failed with errno[%d]", errno);
            write_log(LOG_ERR, "thread bar exiting");
            return (void *)-1;
        } else if (n == 0) {
            write_log(LOG_WARNING, "poll timed out after 120 seconds");
            sleep(60);
        }

        int i;
        /* Retry the lock a few times before giving up. */
        for (i = 0; i < 3; i++) {
            if ((ret = pthread_mutex_lock(&conveyor_mtx)) != 0) {
                write_log(LOG_WARNING, "pthread_mutex_lock[conveyor_mtx] failed"
                          "[%d] with errno[%d]", i, ret);
                continue;
            } else {
                break;
            }
        }
        if (i == 3) {
            write_log(LOG_ERR, "thread bar exiting");
            return (void *)-1;
        }
        if ((n = baz(inot_fd)) > 0) {   /* baz() added items: wake the consumer */
            pthread_mutex_unlock(&conveyor_mtx);
            pthread_cond_broadcast(&conveyor_cond);
        } else if (n == 0) {
            pthread_mutex_unlock(&conveyor_mtx);
        } else {
            pthread_mutex_unlock(&conveyor_mtx);
            pthread_cond_broadcast(&conveyor_cond);
            write_log(LOG_ERR, "thread bar exiting");
            return (void *)-1;
        }

        if ((n = poll_producer(producer, 1000, 2)) > 0) {
            write_log(LOG_INFO, "rdkafka poll events[%d] of producer"
                      " for possible big outq size", n);
        }
    } while (true);

    write_log(LOG_ERR, "thread bar exiting");

    return (void *)0;
}
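
For what it's worth, broadcasting after unlocking, as bar() does above, is perfectly legal; the conventional alternative signals while still holding the mutex. A sketch of a producer in that conventional order ('path' here is a stand-in for real data):

    pthread_mutex_lock(&conveyor_mtx);
    conveyor.push_back(path);               /* mutate shared state under the lock */
    pthread_cond_broadcast(&conveyor_cond); /* wake waiters while holding it...   */
    pthread_mutex_unlock(&conveyor_mtx);    /* ...or after unlocking, as in bar() */

Either ordering wakes the consumer correctly, so this alone would not explain the futex churn.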

What's more, if I dropped pthread_cond_wait/pthread_cond_broadcast entirely and replaced the pthread_cond_wait in the snippet above with sleep, strace showed that the most costly syscall was nanosleep.
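
That variant would look something like the loop below (a sketch: the mutex has to be released around the sleep, or bar() could never push). Since glibc's sleep(3) is built on nanosleep(2), nanosleep dominating the trace in that version is expected:

    while (conveyor.empty()) {
        pthread_mutex_unlock(&conveyor_mtx);
        sleep(1);                  /* becomes a nanosleep(2) syscall */
        pthread_mutex_lock(&conveyor_mtx);
    }

Either way, a sleeping or waiting thread should cost almost no CPU, which is what makes the high %CPU so surprising.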

uname -a

Linux d144122 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Best Answer

I had this issue with a kernel that did not have support for the futex syscall.

The relevant kernel option is CONFIG_FUTEX. Make sure your kernel was built with that option (most are).
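
One quick way to check, assuming your distribution installs the kernel config under /boot:

grep CONFIG_FUTEX /boot/config-$(uname -r)

If that prints CONFIG_FUTEX=y, futex support is compiled in and the problem lies elsewhere.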

The fact that your strace output shows so many futex errors leads me to strongly suspect that this is the problem.
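
Note that futex errors are not automatically a red flag: FUTEX_WAIT fails with EAGAIN when the futex word has already changed by the time the thread tries to sleep, and timed waits fail with ETIMEDOUT; only ENOSYS indicates missing kernel support. Tracing futex alone, along these lines, shows which error names the failing calls return:

strace -f -e trace=futex -p 1701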

(I know that this question is quite old, but this was a frustrating problem and I wanted to document the solution for other poor lost souls)
