Rust vs C++ – How Does Rust Diverge from C++ Concurrency Facilities?

Tags: c, c++14, concurrency, rust

Questions

I am trying to understand whether Rust fundamentally and sufficiently improves upon the concurrency facilities of C++, so that I can decide whether I should spend the time to learn Rust.

Specifically, how does idiomatic Rust improve upon, or at any rate diverge from, the concurrency facilities of idiomatic C++?

Is the improvement (or divergence) mostly syntactical, or is it substantially an improvement (divergence) in paradigm? Or is it something else? Or is it not really an improvement (divergence) at all?


Rationale

I have recently been trying to teach myself C++14's concurrency facilities, and something feels not quite right. Something feels off. What feels off? Hard to say.

It feels almost as though the compiler were not really trying to help me to write correct programs when it comes to concurrency. It feels almost as though I were using an assembler rather than a compiler.

Admittedly, it is entirely possible that I still harbor some subtle misconception about concurrency. Maybe I do not yet grok Bartosz Milewski's tension between stateful programming and data races. Maybe I don't quite understand how much of sound concurrent methodology lives in the compiler and how much of it lives in the OS.

Best Answer

A better concurrency story is one of the main goals of the Rust project, so improvements should be expected, provided we trust the project to achieve its goals. Full disclaimer: I have a high opinion of Rust and am invested in it. As requested, I'll try to avoid value judgements and describe differences rather than (IMHO) improvements.

Safe and unsafe Rust

"Rust" is composed of two languages: One that tries very hard to isolate you from the dangers of systems programming, and a more powerful one without any such aspirations.

Unsafe Rust is a nasty, brutish language that feels a lot like C++. It allows you to do arbitrarily dangerous things, talk to the hardware, (mis-)manage memory manually, shoot yourself in the foot, etc. It is very much like C and C++ in that the correctness of the program is ultimately in your hands and the hands of all other programmers involved in it. You opt into this language with the keyword unsafe, and as in C and C++, a single mistake in a single location can bring the whole project crashing down.
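As a minimal illustration of that opt-in (not specific to concurrency): dereferencing a raw pointer is only permitted inside an unsafe block, where the compiler simply trusts you.

    fn main() {
        let x: u32 = 42;
        let p = &x as *const u32; // creating a raw pointer is safe...
        // ...but dereferencing it is not: the compiler cannot prove `p` is valid,
        // so it only accepts this inside `unsafe`, where correctness is on us.
        let y = unsafe { *p };
        println!("{}", y);
    }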

Safe Rust is the default: the vast majority of Rust code is safe, and if you never mention the keyword unsafe in your code, you never leave the safe language. The rest of this post will mostly concern itself with that language, because unsafe code can break any and all of the guarantees that safe Rust works so hard to give you. On the flip side, unsafe code is not evil and not treated as such by the community (it is, however, strongly discouraged when not necessary).

It's dangerous, yes, but also important, because it allows building the abstractions that safe code uses. Good unsafe code uses the type system to prevent others from misusing it, and therefore the presence of unsafe code in a Rust program need not disturb the safe code. All the following differences exist because Rust's type system has tools that C++'s doesn't have, and because the unsafe code that implements the concurrency abstractions uses these tools effectively.

Non-difference: Shared/mutable memory

Although Rust places more emphasis on message passing and very strictly controls shared memory, it does not rule out shared memory concurrency and explicitly supports the common abstractions (locks, atomic operations, condition variables, concurrent collections).

Moreover, like C++ and unlike functional languages, Rust really likes traditional imperative data structures. There's no persistent/immutable linked list in the standard library. There's std::collections::LinkedList but it's like std::list in C++ and discouraged for the same reasons as std::list (bad use of cache).

However, with reference to the title of this section ("shared/mutable memory"), Rust has one difference from C++: It strongly encourages that memory be "shared XOR mutable", i.e., that memory is never shared and mutable at the same time. Mutate memory as you like "in the privacy of your own thread", so to speak. Contrast this with C++, where shared mutable memory is the default option and widely used.

While the shared-xor-mutable paradigm is very important to the differences below, it is also a quite different programming paradigm that takes a while to get used to and that imposes significant restrictions. Occasionally one has to opt out of this paradigm, e.g., with atomic types (AtomicUsize is the essence of shared mutable memory). Note that locks also fit the shared-xor-mutable rule, because a lock rules out concurrent reads and writes (while one thread writes, no other thread can read or write).
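A minimal sketch of such an opt-out using the standard library's atomics (the counter and thread count here are arbitrary): the value is shared and mutable at the same time, but every access is an atomic operation.

    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Shared *and* mutable, but only accessible through atomic operations.
        let hits = Arc::new(AtomicUsize::new(0));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let hits = Arc::clone(&hits);
                thread::spawn(move || {
                    for _ in 0..1000 {
                        hits.fetch_add(1, Ordering::Relaxed);
                    }
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        println!("{}", hits.load(Ordering::Relaxed)); // 4000
    }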

Non-difference: Data races are undefined behavior (UB)

If you trigger a data race in Rust code, it's game over, just as in C++. All bets are off and the compiler can do whatever it pleases.

However, it is a hard guarantee that safe Rust code does not have data races (or any UB for that matter). This extends both to the core language and to the standard library. If you can write a Rust program that triggers UB without using unsafe anywhere (neither in your own code nor in third-party libraries; the standard library's internal unsafe is allowed), then that is considered a bug and will be fixed (this has already happened several times). This is of course in stark contrast to C++, where it's trivial to write programs with UB.

Difference: Strict locking discipline

Unlike C++, a lock in Rust (std::sync::Mutex, std::sync::RwLock, etc.) owns the data it's protecting. Instead of taking a lock and then manipulating some shared memory that is associated with the lock only in the documentation, the shared data is inaccessible while you don't hold the lock. A RAII guard keeps the lock held and simultaneously gives access to the locked data (this much could be implemented in C++, but isn't by the std:: locks). The lifetime system ensures that you can't keep accessing the data after you release the lock (drop the RAII guard).
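A minimal sketch of that discipline: the Vec lives inside the Mutex, and the only way to reach it is through the RAII guard returned by lock().

    use std::sync::Mutex;

    fn main() {
        // The Mutex owns the Vec; there is no way to reach the data
        // without going through the lock.
        let data = Mutex::new(vec![1, 2, 3]);

        {
            let mut guard = data.lock().unwrap(); // RAII guard, lock is now held
            guard.push(4);                        // access only through the guard
        } // guard dropped here, lock released

        // Keeping a reference obtained through the guard past this point
        // would be a compile-time error: it cannot outlive the guard.
        println!("{:?}", data.lock().unwrap());
    }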

You can of course have a lock that contains no useful data (Mutex<()>), and just share some memory without explicitly associating it with that lock. However, having potentially unsynchronized shared memory requires unsafe.

Difference: Prevention of accidental sharing

Although you can have shared memory, you only share when you explicitly ask for it. For example, when you use message passing (e.g. the channels from std::sync), the lifetime system ensures that you don't keep any references to the data after you have sent it to another thread. To share data behind a lock, you explicitly construct the lock and give it to another thread. To share unsynchronized memory, you, well, have to use unsafe.
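A minimal sketch with std::sync::mpsc: the Vec is moved into the channel when sent, so the sending thread cannot keep using it afterwards.

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel();

        // The receiving end is moved into the worker thread.
        let worker = thread::spawn(move || {
            let data: Vec<i32> = rx.recv().unwrap();
            println!("worker got {:?}", data);
        });

        let data = vec![1, 2, 3];
        tx.send(data).unwrap();
        // `data` has been moved into the channel; using it here
        // (e.g. `data.push(4)`) would be a compile-time error.

        worker.join().unwrap();
    }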

This ties into the next point:

Difference: Thread-safety tracking

Rust's type system tracks some notion of thread safety. Specifically, the Sync trait denotes types that can be shared by several threads without risk of data races, while Send marks those that can be moved from one thread to another. This is enforced by the compiler throughout the program, and thus library designers dare to make optimizations that would be stupidly dangerous without these static checks. For example, consider C++'s std::shared_ptr, which always uses atomic operations to manipulate its reference count, to avoid UB in case a shared_ptr happens to be used by several threads. Rust has Rc and Arc, which differ only in that Rc uses non-atomic refcount operations and isn't thread-safe (i.e. implements neither Sync nor Send), while Arc is very much like shared_ptr (and implements both traits).

Note that if a type doesn't use unsafe to implement synchronization manually, the presence or absence of these traits is inferred correctly by the compiler.
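A minimal sketch: the Arc version below compiles; swapping in std::rc::Rc makes the thread::spawn call a compile-time error, because Rc is not Send and therefore may not cross the thread boundary.

    use std::sync::Arc;
    use std::thread;

    fn main() {
        let shared = Arc::new(String::from("hello"));

        let handle = {
            let shared = Arc::clone(&shared);
            // With `std::rc::Rc` instead of `Arc`, this spawn would not compile:
            // `Rc<String>` is not `Send`, so it cannot move to another thread.
            thread::spawn(move || println!("{}", shared.len()))
        };

        handle.join().unwrap();
        println!("{}", shared);
    }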

Difference: Very strict rules

If the compiler cannot be absolutely sure that some code is free from data races and other UB, it will not compile, period. The aforementioned rules and other tools can get you quite far, but sooner or later you will want to do something that's correct, but for subtle reasons that escape the compiler's notice. It could be a tricky lock-free data structure, but it could also be something as mundane as "I write to random locations in a shared array but the indices are computed such that every location is written to by only one thread".

At that point you can either bite the bullet and add a bit of unnecessary synchronization, or you reword the code such that the compiler can see its correctness (often doable, sometimes quite hard, occasionally impossible), or you drop into unsafe code. Still, it's extra mental overhead, and Rust does not give you any guarantees for the correctness of the unsafe code.
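For the "disjoint indices" case above, the usual rewording is to hand each thread its own slice, e.g. via chunks_mut together with scoped threads (std::thread::scope, available since Rust 1.63), so the compiler can see that the writes cannot overlap.

    use std::thread;

    fn main() {
        let mut data = vec![0u32; 8];

        // Instead of letting every thread index into the whole array,
        // split it into disjoint chunks; each thread gets exclusive
        // (&mut) access to its own chunk, so no data race is possible.
        thread::scope(|s| {
            for (i, chunk) in data.chunks_mut(2).enumerate() {
                s.spawn(move || {
                    for slot in chunk.iter_mut() {
                        *slot = i as u32;
                    }
                });
            }
        });

        println!("{:?}", data); // [0, 0, 1, 1, 2, 2, 3, 3]
    }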

Difference: Fewer tools

Because of the aforementioned differences, in Rust it's much more rare that one writes code that may have a data race (or a use after free, or a double free, or ...). While this is nice, it has the unfortunate side effect that the ecosystem for tracking down such errors is even more underdeveloped than one would expect given the youth and small size of the community.

While tools like valgrind and LLVM's thread sanitizer could in principle be applied to Rust code, whether this actually works yet varies from tool to tool (and even those that work may be hard to set up, especially since you may not find any up-to-date resources on how to do it). It doesn't really help that Rust currently lacks a real specification and in particular a formal memory model.

In short, writing unsafe Rust code correctly is harder than writing C++ code correctly, despite both languages being roughly comparable in terms of capabilities and risks. Of course this must be weighed against the fact that a typical Rust program will contain only a relatively small fraction of unsafe code, whereas a C++ program is, well, fully C++.
