How does a single CPU handle Multi-threaded and multi-process applications

cpumultithreading

As I have read that for a multi-process application, a single CPU can handle only one task at a time, switching contexts between two processes. In a multi-threaded application, a single CPU can handle multiple threads. I do not understand this. Does the CPU handle one thread at a time if there is only one CPU? If yes, where is the advantage of having multi-threaded application vs multi-process application if CPU can handle one thing at a time.

Best Answer

TL;DR

Multithreading on a single core can speed up the application by using thread and instruction level parallelism.

If a single CPU has multiple cores it will run a process on each of the cores. If it does not, it will need to switch between processes on the single core.

Multithreading and multiprocessing can be combined for better results.

Full Explanation

Where multiprocessing systems include multiple complete processing units, multithreading aims to increase utilization of a single core by using thread-level as well as instruction-level parallelism. As the two techniques are complementary, they are sometimes combined in systems with multiple multithreading CPUs and in CPUs with multiple multithreading cores. Multithreading | WIkipedia

Example

A single CPU handles multi-threading in this way.

Let's say that we have two processes A and B which need to run a set of commands. After each command, the threads need the result. Here are the threads and the commands they need to run.

| # | Thread A | Thread B|
|---|----------|---------|
| 1 | 1        | 5       |
| 2 | 3        | 1       |
| 3 | Wait     | 3       |
| 4 | 4        | 2       |

Now lets look at how the CPU would execute those (theoretically)

CPU Starts with thread A

CPU Pipeline
x x x x x x (1)
1 x x x x x (2) # Thread A, command 1 (1)
5 1 x x x x (3) # Thread B, command 1 (5)
3 5 1 x x x (4) # Thread A, command 2 (3)
1 3 5 1 x x (5) # Thread B, command 2 (1)
3 1 3 5 1 x (6) # Thread B, command 3 (3)... A is waiting for result of command 2
2 3 1 3 5 1 (7) # Thread B, command 4 (2)
x 2 3 1 3 5 (8)
x x 2 3 1 3 (9)
x x x 2 3 1 (10)
4 x x x 2 3 (11) # Thread A, command 4... A now has the result and can continue.
x 4 x x x 2 (12)
x x 4 x x x (13)
x x x 4 x x (14)
x x x x 4 x (15)
x x x x x 4 (16)
x x x x x x (17)

This is how that would look without multi threading.

CPU Pipeline
x x x x x x (1)
1 x x x x x (2) # Thread A, command 1 (1)
3 1 x x x x (3) # Thread A, command 2 (3)... A is waiting for result of command 2
x 3 1 x x x (4)
x x 3 1 x x (5)
x x x 3 1 x (6)
x x x x 3 1 (7)
x x x x x 3 (8)
4 x x x x x (9)  # Thread A, command 4 (4)... A now has the result and can continue
x 4 x x x x (10)
x x 4 x x x (11)
x x x 4 x x (12)
x x x x 4 x (13)
x x x x x 4 (14)
5 x x x x x (15) # Thread B, command 1 (5)
1 5 x x x x (16) # Thread B, command 2 (1)
3 1 5 x x x (17) # Thread B, command 3 (3)
2 3 1 5 x x (18) # Thread B, command 4 (2)
x 2 3 1 5 x (19)
x x 2 3 1 5 (20)
x x x 2 3 1 (21)
x x x x 2 3 (22)
x x x x x 2 (23)
x x x x x x (24)

Thus with multithreading the threads would complete after 17 time steps, without it would take 24.

Questions?

Windows

Some of the above values are easily available from the appropriate Win32 API, I just list them here for completeness. Others, however, need to be obtained from the Performance Data Helper library (PDH), which is a bit "unintuitive" and takes a lot of painful trial and error to get to work. (At least it took me quite a while, perhaps I've been only a bit stupid...)

Note: for clarity all error checking has been omitted from the following code. Do check the return codes...!

Total Virtual Memory:
```
#include "windows.h"

MEMORYSTATUSEX memInfo;
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&memInfo);
DWORDLONG totalVirtualMem = memInfo.ullTotalPageFile;
```
Note: The name "TotalPageFile" is a bit misleading here. In reality this parameter gives the "Virtual Memory Size", which is size of swap file plus installed RAM.

Virtual Memory currently used:

Same code as in "Total Virtual Memory" and then

 DWORDLONG virtualMemUsed = memInfo.ullTotalPageFile - memInfo.ullAvailPageFile;

Virtual Memory currently used by current process:

#include "windows.h"
#include "psapi.h"

PROCESS_MEMORY_COUNTERS_EX pmc;
GetProcessMemoryInfo(GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc));
SIZE_T virtualMemUsedByMe = pmc.PrivateUsage;

Total Physical Memory (RAM):

Same code as in "Total Virtual Memory" and then
```
DWORDLONG totalPhysMem = memInfo.ullTotalPhys;
```
Physical Memory currently used:

Same code as in "Total Virtual Memory" and then
```
DWORDLONG physMemUsed = memInfo.ullTotalPhys - memInfo.ullAvailPhys;
```
Physical Memory currently used by current process:

Same code as in "Virtual Memory currently used by current process" and then
```
SIZE_T physMemUsedByMe = pmc.WorkingSetSize;
```

CPU currently used:

#include "TCHAR.h"
#include "pdh.h"

static PDH_HQUERY cpuQuery;
static PDH_HCOUNTER cpuTotal;

void init(){
    PdhOpenQuery(NULL, NULL, &cpuQuery);
    // You can also use L"\\Processor(*)\\% Processor Time" and get individual CPU values with PdhGetFormattedCounterArray()
    PdhAddEnglishCounter(cpuQuery, L"\\Processor(_Total)\\% Processor Time", NULL, &cpuTotal);
    PdhCollectQueryData(cpuQuery);
}

double getCurrentValue(){
    PDH_FMT_COUNTERVALUE counterVal;

    PdhCollectQueryData(cpuQuery);
    PdhGetFormattedCounterValue(cpuTotal, PDH_FMT_DOUBLE, NULL, &counterVal);
    return counterVal.doubleValue;
}

CPU currently used by current process:

#include "windows.h"

static ULARGE_INTEGER lastCPU, lastSysCPU, lastUserCPU;
static int numProcessors;
static HANDLE self;

void init(){
    SYSTEM_INFO sysInfo;
    FILETIME ftime, fsys, fuser;

    GetSystemInfo(&sysInfo);
    numProcessors = sysInfo.dwNumberOfProcessors;

    GetSystemTimeAsFileTime(&ftime);
    memcpy(&lastCPU, &ftime, sizeof(FILETIME));

    self = GetCurrentProcess();
    GetProcessTimes(self, &ftime, &ftime, &fsys, &fuser);
    memcpy(&lastSysCPU, &fsys, sizeof(FILETIME));
    memcpy(&lastUserCPU, &fuser, sizeof(FILETIME));
}

double getCurrentValue(){
    FILETIME ftime, fsys, fuser;
    ULARGE_INTEGER now, sys, user;
    double percent;

    GetSystemTimeAsFileTime(&ftime);
    memcpy(&now, &ftime, sizeof(FILETIME));

    GetProcessTimes(self, &ftime, &ftime, &fsys, &fuser);
    memcpy(&sys, &fsys, sizeof(FILETIME));
    memcpy(&user, &fuser, sizeof(FILETIME));
    percent = (sys.QuadPart - lastSysCPU.QuadPart) +
        (user.QuadPart - lastUserCPU.QuadPart);
    percent /= (now.QuadPart - lastCPU.QuadPart);
    percent /= numProcessors;
    lastCPU = now;
    lastUserCPU = user;
    lastSysCPU = sys;

    return percent * 100;
}

Linux

On Linux the choice that seemed obvious at first was to use the POSIX APIs like getrusage() etc. I spent some time trying to get this to work, but never got meaningful values. When I finally checked the kernel sources themselves, I found out that apparently these APIs are not yet completely implemented as of Linux kernel 2.6!?

In the end I got all values via a combination of reading the pseudo-filesystem /proc and kernel calls.

Total Virtual Memory:

#include "sys/types.h"
#include "sys/sysinfo.h"

struct sysinfo memInfo;

sysinfo (&memInfo);
long long totalVirtualMem = memInfo.totalram;
//Add other values in next statement to avoid int overflow on right hand side...
totalVirtualMem += memInfo.totalswap;
totalVirtualMem *= memInfo.mem_unit;

Virtual Memory currently used:

Same code as in "Total Virtual Memory" and then

long long virtualMemUsed = memInfo.totalram - memInfo.freeram;
//Add other values in next statement to avoid int overflow on right hand side...
virtualMemUsed += memInfo.totalswap - memInfo.freeswap;
virtualMemUsed *= memInfo.mem_unit;

Virtual Memory currently used by current process:

#include "stdlib.h"
#include "stdio.h"
#include "string.h"

int parseLine(char* line){
    // This assumes that a digit will be found and the line ends in " Kb".
    int i = strlen(line);
    const char* p = line;
    while (*p <'0' || *p > '9') p++;
    line[i-3] = '\0';
    i = atoi(p);
    return i;
}

int getValue(){ //Note: this value is in KB!
    FILE* file = fopen("/proc/self/status", "r");
    int result = -1;
    char line[128];

    while (fgets(line, 128, file) != NULL){
        if (strncmp(line, "VmSize:", 7) == 0){
            result = parseLine(line);
            break;
        }
    }
    fclose(file);
    return result;
}

Total Physical Memory (RAM):

Same code as in "Total Virtual Memory" and then

long long totalPhysMem = memInfo.totalram;
//Multiply in next statement to avoid int overflow on right hand side...
totalPhysMem *= memInfo.mem_unit;

Physical Memory currently used:

Same code as in "Total Virtual Memory" and then

long long physMemUsed = memInfo.totalram - memInfo.freeram;
//Multiply in next statement to avoid int overflow on right hand side...
physMemUsed *= memInfo.mem_unit;

Physical Memory currently used by current process:

Change getValue() in "Virtual Memory currently used by current process" as follows:

int getValue(){ //Note: this value is in KB!
    FILE* file = fopen("/proc/self/status", "r");
    int result = -1;
    char line[128];

    while (fgets(line, 128, file) != NULL){
        if (strncmp(line, "VmRSS:", 6) == 0){
            result = parseLine(line);
            break;
        }
    }
    fclose(file);
    return result;
}

CPU currently used:

#include "stdlib.h"
#include "stdio.h"
#include "string.h"

static unsigned long long lastTotalUser, lastTotalUserLow, lastTotalSys, lastTotalIdle;

void init(){
    FILE* file = fopen("/proc/stat", "r");
    fscanf(file, "cpu %llu %llu %llu %llu", &lastTotalUser, &lastTotalUserLow,
        &lastTotalSys, &lastTotalIdle);
    fclose(file);
}

double getCurrentValue(){
    double percent;
    FILE* file;
    unsigned long long totalUser, totalUserLow, totalSys, totalIdle, total;

    file = fopen("/proc/stat", "r");
    fscanf(file, "cpu %llu %llu %llu %llu", &totalUser, &totalUserLow,
        &totalSys, &totalIdle);
    fclose(file);

    if (totalUser < lastTotalUser || totalUserLow < lastTotalUserLow ||
        totalSys < lastTotalSys || totalIdle < lastTotalIdle){
        //Overflow detection. Just skip this value.
        percent = -1.0;
    }
    else{
        total = (totalUser - lastTotalUser) + (totalUserLow - lastTotalUserLow) +
            (totalSys - lastTotalSys);
        percent = total;
        total += (totalIdle - lastTotalIdle);
        percent /= total;
        percent *= 100;
    }

    lastTotalUser = totalUser;
    lastTotalUserLow = totalUserLow;
    lastTotalSys = totalSys;
    lastTotalIdle = totalIdle;

    return percent;
}

CPU currently used by current process:

#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "sys/times.h"
#include "sys/vtimes.h"

static clock_t lastCPU, lastSysCPU, lastUserCPU;
static int numProcessors;

void init(){
    FILE* file;
    struct tms timeSample;
    char line[128];

    lastCPU = times(&timeSample);
    lastSysCPU = timeSample.tms_stime;
    lastUserCPU = timeSample.tms_utime;

    file = fopen("/proc/cpuinfo", "r");
    numProcessors = 0;
    while(fgets(line, 128, file) != NULL){
        if (strncmp(line, "processor", 9) == 0) numProcessors++;
    }
    fclose(file);
}

double getCurrentValue(){
    struct tms timeSample;
    clock_t now;
    double percent;

    now = times(&timeSample);
    if (now <= lastCPU || timeSample.tms_stime < lastSysCPU ||
        timeSample.tms_utime < lastUserCPU){
        //Overflow detection. Just skip this value.
        percent = -1.0;
    }
    else{
        percent = (timeSample.tms_stime - lastSysCPU) +
            (timeSample.tms_utime - lastUserCPU);
        percent /= (now - lastCPU);
        percent /= numProcessors;
        percent *= 100;
    }
    lastCPU = now;
    lastSysCPU = timeSample.tms_stime;
    lastUserCPU = timeSample.tms_utime;

    return percent;
}

TODO: Other Platforms

I would assume, that some of the Linux code also works for the Unixes, except for the parts that read the /proc pseudo-filesystem. Perhaps on Unix these parts can be replaced by getrusage() and similar functions?

Php – How to one use multi threading in PHP applications

Multi-threading is possible in php

Yes you can do multi-threading in PHP with pthreads

From the PHP documentation:

pthreads is an object-orientated API that provides all of the tools needed for multi-threading in PHP. PHP applications can create, read, write, execute and synchronize with Threads, Workers and Threaded objects.

Warning: The pthreads extension cannot be used in a web server environment. Threading in PHP should therefore remain to CLI-based applications only.

Simple Test

#!/usr/bin/php
<?php
class AsyncOperation extends Thread {

    public function __construct($arg) {
        $this->arg = $arg;
    }

    public function run() {
        if ($this->arg) {
            $sleep = mt_rand(1, 10);
            printf('%s: %s  -start -sleeps %d' . "\n", date("g:i:sa"), $this->arg, $sleep);
            sleep($sleep);
            printf('%s: %s  -finish' . "\n", date("g:i:sa"), $this->arg);
        }
    }
}

// Create a array
$stack = array();

//Initiate Multiple Thread
foreach ( range("A", "D") as $i ) {
    $stack[] = new AsyncOperation($i);
}

// Start The Threads
foreach ( $stack as $t ) {
    $t->start();
}

?>

First Run

12:00:06pm:     A  -start -sleeps 5
12:00:06pm:     B  -start -sleeps 3
12:00:06pm:     C  -start -sleeps 10
12:00:06pm:     D  -start -sleeps 2
12:00:08pm:     D  -finish
12:00:09pm:     B  -finish
12:00:11pm:     A  -finish
12:00:16pm:     C  -finish

Second Run

12:01:36pm:     A  -start -sleeps 6
12:01:36pm:     B  -start -sleeps 1
12:01:36pm:     C  -start -sleeps 2
12:01:36pm:     D  -start -sleeps 1
12:01:37pm:     B  -finish
12:01:37pm:     D  -finish
12:01:38pm:     C  -finish
12:01:42pm:     A  -finish

Real World Example

error_reporting(E_ALL);
class AsyncWebRequest extends Thread {
    public $url;
    public $data;

    public function __construct($url) {
        $this->url = $url;
    }

    public function run() {
        if (($url = $this->url)) {
            /*
             * If a large amount of data is being requested, you might want to
             * fsockopen and read using usleep in between reads
             */
            $this->data = file_get_contents($url);
        } else
            printf("Thread #%lu was not provided a URL\n", $this->getThreadId());
    }
}

$t = microtime(true);
$g = new AsyncWebRequest(sprintf("http://www.google.com/?q=%s", rand() * 10));
/* starting synchronization */
if ($g->start()) {
    printf("Request took %f seconds to start ", microtime(true) - $t);
    while ( $g->isRunning() ) {
        echo ".";
        usleep(100);
    }
    if ($g->join()) {
        printf(" and %f seconds to finish receiving %d bytes\n", microtime(true) - $t, strlen($g->data));
    } else
        printf(" and %f seconds to finish, request failed\n", microtime(true) - $t);
}

Best Answer

Related Solutions

C++ – How to determine CPU and memory consumption from inside a process

Windows

Linux

TODO: Other Platforms

Php – How to one use multi threading in PHP applications

Multi-threading is possible in php

Related Topic