C++ – Parallel for vs omp simd: when to use each

copenmpperformancesimd

OpenMP 4.0 introduces a new construct called "omp simd". What is the benefit of using this construct over the old "parallel for"? When would each be a better choice over the other?

EDIT:
Here is an interesting paper related to the SIMD directive.

Best Answer

A simple answer:

OpenMP only used to exploit multiple threads for multiple cores. This new simd extention allows you to explicitly use SIMD instructions on modern CPUs, such as Intel's AVX/SSE and ARM's NEON.

(Note that a SIMD instruction is executed in a single thread and a single core, by design. However, the meaning of SIMD can be quite expanded for GPGPU. But, but I don't think you need to consider GPGPU for OpenMP 4.0.)

So, once you know SIMD instructions, you can use this new construct.

In a modern CPU, roughly there are three types of parallelism: (1) instruction-level parallelism (ILP), (2) thread-level parallelism (TLP), and (3) SIMD instructions (we could say this is vector-level or so).

ILP is done automatically by your out-of-order CPUs, or compilers. You can exploit TLP using OpenMP's parallel for and other threading libraries. So, what about SIMD? Intrinsics were a way to use them (as well as compilers' automatic vectorization). OpenMP's simd is a new way to use SIMD.

Take a very simple example:

for (int i = 0; i < N; ++i)
  A[i] = B[i] + C[i];

The above code computes a sum of two N-dimensional vectors. As you can easily see, there is no (loop-carried) data dependency on the array A[]. This loop is embarrassingly parallel.

There could be multiple ways to parallelize this loop. For example, until OpenMP 4.0, this can be parallelized using only parallel for construct. Each thread will perform N/#thread iterations on multiple cores.

However, you might think using multiple threads for such simple addition would be a overkill. That is why there is vectorization, which is mostly implemented by SIMD instructions.

Using a SIMD would be like this:

for (int i = 0; i < N/8; ++i)
  VECTOR_ADD(A + i, B + i, C + i);

This code assumes that (1) the SIMD instruction (VECTOR_ADD) is 256-bit or 8-way (8 * 32 bits); and (2) N is a multiple of 8.

An 8-way SIMD instruction means that 8 items in a vector can be executed in a single machine instruction. Note that Intel's latest AVX provides such 8-way (32-bit * 8 = 256 bits) vector instructions.

In SIMD, you still use a single core (again, this is only for conventional CPUs, not GPU). But, you can use a hidden parallelism in hardware. Modern CPUs dedicate hardware resources for SIMD instructions, where each SIMD lane can be executed in parallel.

You can use thread-level parallelism at the same time. The above example can be further parallelized by parallel for.

(However, I have a doubt how many loops can be really transformed to SIMDized loops. The OpenMP 4.0 specification seems a bit unclear on this. So, real performance and practical restrictions would be dependent on actual compilers' implementations.)

To summarize, simd construct allows you to use SIMD instructions, in turn, more parallelism can be exploited along with thread-level parallelism. However, I think actual implementations would matter.

Related Solutions

C++ – a smart pointer and when should I use one

UPDATE

This answer is rather old, and so describes what was 'good' at the time, which was smart pointers provided by the Boost library. Since C++11, the standard library has provided sufficient smart pointers types, and so you should favour the use of std::unique_ptr, std::shared_ptr and std::weak_ptr.

There was also std::auto_ptr. It was very much like a scoped pointer, except that it also had the "special" dangerous ability to be copied — which also unexpectedly transfers ownership.
It was deprecated in C++11 and removed in C++17, so you shouldn't use it.

std::auto_ptr<MyObject> p1 (new MyObject());
std::auto_ptr<MyObject> p2 = p1; // Copy and transfer ownership. 
                                 // p1 gets set to empty!
p2->DoSomething(); // Works.
p1->DoSomething(); // Oh oh. Hopefully raises some NULL pointer exception.

OLD ANSWER

A smart pointer is a class that wraps a 'raw' (or 'bare') C++ pointer, to manage the lifetime of the object being pointed to. There is no single smart pointer type, but all of them try to abstract a raw pointer in a practical way.

Smart pointers should be preferred over raw pointers. If you feel you need to use pointers (first consider if you really do), you would normally want to use a smart pointer as this can alleviate many of the problems with raw pointers, mainly forgetting to delete the object and leaking memory.

With raw pointers, the programmer has to explicitly destroy the object when it is no longer useful.

// Need to create the object to achieve some goal
MyObject* ptr = new MyObject(); 
ptr->DoSomething(); // Use the object in some way
delete ptr; // Destroy the object. Done with it.
// Wait, what if DoSomething() raises an exception...?

A smart pointer by comparison defines a policy as to when the object is destroyed. You still have to create the object, but you no longer have to worry about destroying it.

SomeSmartPtr<MyObject> ptr(new MyObject());
ptr->DoSomething(); // Use the object in some way.

// Destruction of the object happens, depending 
// on the policy the smart pointer class uses.

// Destruction would happen even if DoSomething() 
// raises an exception

The simplest policy in use involves the scope of the smart pointer wrapper object, such as implemented by boost::scoped_ptr or std::unique_ptr.

void f()
{
    {
       std::unique_ptr<MyObject> ptr(new MyObject());
       ptr->DoSomethingUseful();
    } // ptr goes out of scope -- 
      // the MyObject is automatically destroyed.

    // ptr->Oops(); // Compile error: "ptr" not defined
                    // since it is no longer in scope.
}

Note that std::unique_ptr instances cannot be copied. This prevents the pointer from being deleted multiple times (incorrectly). You can, however, pass references to it around to other functions you call.

std::unique_ptrs are useful when you want to tie the lifetime of the object to a particular block of code, or if you embedded it as member data inside another object, the lifetime of that other object. The object exists until the containing block of code is exited, or until the containing object is itself destroyed.

A more complex smart pointer policy involves reference counting the pointer. This does allow the pointer to be copied. When the last "reference" to the object is destroyed, the object is deleted. This policy is implemented by boost::shared_ptr and std::shared_ptr.

void f()
{
    typedef std::shared_ptr<MyObject> MyObjectPtr; // nice short alias
    MyObjectPtr p1; // Empty

    {
        MyObjectPtr p2(new MyObject());
        // There is now one "reference" to the created object
        p1 = p2; // Copy the pointer.
        // There are now two references to the object.
    } // p2 is destroyed, leaving one reference to the object.
} // p1 is destroyed, leaving a reference count of zero. 
  // The object is deleted.

Reference counted pointers are very useful when the lifetime of your object is much more complicated, and is not tied directly to a particular section of code or to another object.

There is one drawback to reference counted pointers — the possibility of creating a dangling reference:

// Create the smart pointer on the heap
MyObjectPtr* pp = new MyObjectPtr(new MyObject())
// Hmm, we forgot to destroy the smart pointer,
// because of that, the object is never destroyed!

Another possibility is creating circular references:

struct Owner {
   std::shared_ptr<Owner> other;
};

std::shared_ptr<Owner> p1 (new Owner());
std::shared_ptr<Owner> p2 (new Owner());
p1->other = p2; // p1 references p2
p2->other = p1; // p2 references p1

// Oops, the reference count of of p1 and p2 never goes to zero!
// The objects are never destroyed!

To work around this problem, both Boost and C++11 have defined a weak_ptr to define a weak (uncounted) reference to a shared_ptr.

Javascript – Which “href” value should I use for JavaScript links, “#” or “javascript:void(0)”

I use javascript:void(0).

Three reasons. Encouraging the use of # amongst a team of developers inevitably leads to some using the return value of the function called like this:

function doSomething() {
    //Some code
    return false;
}

But then they forget to use return doSomething() in the onclick and just use doSomething().

A second reason for avoiding # is that the final return false; will not execute if the called function throws an error. Hence the developers have to also remember to handle any error appropriately in the called function.

A third reason is that there are cases where the onclick event property is assigned dynamically. I prefer to be able to call a function or assign it dynamically without having to code the function specifically for one method of attachment or another. Hence my onclick (or on anything) in HTML markup look like this:

onclick="someFunc.call(this)"

onclick="someFunc.apply(this, arguments)"

Using javascript:void(0) avoids all of the above headaches, and I haven't found any examples of a downside.

So if you're a lone developer then you can clearly make your own choice, but if you work as a team you have to either state:

Use href="#", make sure onclick always contains return false; at the end, that any called function does not throw an error and if you attach a function dynamically to the onclick property make sure that as well as not throwing an error it returns false.

Use href="javascript:void(0)"

The second is clearly much easier to communicate.

Best Answer

Related Solutions

C++ – a smart pointer and when should I use one

Javascript – Which “href” value should I use for JavaScript links, “#” or “javascript:void(0)”

Related Topic