TBB is pretty much all there is that comes close. boost::thread is much too low level. You're looking in the wrong place with the TPL, by the way: for native code Microsoft ships a separate library called the PPL (Parallel Patterns Library). It, of course, only supports Windows.
However, if you find TBB complicated to use, I'd question whether the developers you're thinking of are actually capable of writing parallel applications. TBB has one of the friendliest interfaces around, with simple constructs like parallel_for_each. If you can't cope with a couple of lambdas, I find it hard to see how you'll cope with concurrency.
The I/O schemes you are describing are all in use in current computers.
why the CPU actually has to stay there, practically not doing anything else than just waiting for IO?
This is the simplest possible I/O method: programmed I/O. Many embedded systems and low-end microprocessors have only a single input instruction and a single output instruction, so the processor must execute an explicit sequence of instructions for every character read or written.
but it should be made possible for the cpu to wait or to check regularly, while actually performing lots of other tasks and only going back to the IO process when it's ready
Many personal computers use other I/O schemes. Instead of waiting in a tight loop for the device to become ready (busy waiting), the CPU starts the I/O device and asks it to generate an interrupt when it's done (interrupt-driven I/O).
Although interrupt-driven I/O is a step forward compared to programmed I/O, it still requires an interrupt for every character transmitted, and that's expensive...
Like for instance there could be some kind of mini cpu which would just wait for it and deliver the small part of data to the real cpu as soon as it gets back to the process and so the process would be repeated and we wouldn't have to practically dedicate a whole cpu core for the data copy process...
The solution to many problems lies in having someone else do the work! :-)
A DMA (Direct Memory Access) controller/chip carries out the transfer itself: it's still programmed I/O, but somebody else is doing it!
With DMA the CPU only has to initialize a few registers and it's free to do something else until transfer is finished (and an interrupt is raised).
Even DMA isn't totally free: a high-speed device can use many bus cycles for memory references and device references (cycle stealing), and during those cycles the CPU has to wait (the DMA chip always has higher bus priority).
I/O wait is 12.1%. This server has 8 cores (via cat /proc/cpuinfo). This is very close to (1/8 cores = 0.125)
I think this is from:
Understanding Disk I/O - when should you be worried?
Well, it isn't strange: the system (MySQL) must fetch all the rows before manipulating the data, and there are no other activities running.
There is no computer architecture / OS issue here; it's just how the example is set up.
At most it could be an RDBMS tuning problem or a SQL query problem (a missing index, a bad query plan, a bad query...).
Cloud computing deals with embarrassingly parallel problems by default, like serving up resources from a URL. There are several ways to achieve parallelism, regardless of the number of cores you have, and you should build your application knowing how you intend to take advantage of it. You can get cloud instances with multiple cores and lots of RAM, but they cost more.
Most web services run within an embedded web server (Spring Boot web services, for example). The parallelism you need is taken care of by the server, so as long as you don't add points of contention, your service remains embarrassingly parallel and you don't have to think about threads at all.
That said, one service can only handle so many clients at once. That's why cloud solutions typically bring another instance online and distribute traffic between the instances of your service. Many times it is much cheaper to have another instance for a short burst of traffic than it is to have one instance with multiple cores.
What you aren't seeing is that your service is usually hosted on a server with multiple cores, even though it looks like a single machine to you. And when you have multiple copies of your web service running, you are also using multiple cores.
The point is that the parallelism is already there; you just need to know how not to mess it up. For that, you need to understand how parallelism works.
You mentioned the Task Parallel Library, and that is a key feature in Microsoft's approach for web services, particularly when paired with async and await. Understanding how that works will really help your application handle more concurrent users. It is time well spent.