I want to test #pragma omp parallel for and #pragma omp simd on a simple matrix addition program. When I use each of them separately, I get no error and the result seems fine. But I want to test how much performance can be gained by using both of them. If I put #pragma omp parallel for before the outer loop and #pragma omp simd before the inner loop, I get no error either. The error occurs when I put both of them before the outer loop. It is a compile-time error, not a runtime error. ICC and GCC report an error but Clang doesn't. That might be because Clang rejects the parallelization: in my experiments, Clang does not parallelize and runs the program with only one thread.
The program is here:
#include <stdio.h>
//#include <x86intrin.h>
#define N 512
#define M N
int __attribute__(( aligned(32))) a[N][M],
    __attribute__(( aligned(32))) b[N][M],
    __attribute__(( aligned(32))) c_result[N][M];
int main()
{
    int i, j;
    #pragma omp parallel for
    #pragma omp simd
    for( i=0;i<N;i++){
        for(j=0;j<M;j++){
            c_result[i][j]= a[i][j] + b[i][j];
        }
    }
    return 0;
}
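For comparison, the variant described above as compiling cleanly (parallel for on the outer loop, simd on the inner loop) might look like the sketch below. It is not the asker's exact file: the names fill_inputs, add_nested and count_mismatches and the fill pattern are made up for illustration. The loop counters are declared inside the for statements, which keeps j private to each thread instead of shared.

```c
#define N 512
#define M N

int a[N][M], b[N][M], c_result[N][M];

/* Hypothetical helper: fill the inputs so that a[i][j] + b[i][j] == 2 * i. */
void fill_inputs(void)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }
    }
}

/* Rows are divided among threads; each row is a SIMD loop. */
void add_nested(void)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp simd
        for (int j = 0; j < M; j++) {
            c_result[i][j] = a[i][j] + b[i][j];
        }
    }
}

/* Hypothetical helper: count elements that differ from the expected sum. */
int count_mismatches(void)
{
    int bad = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            if (c_result[i][j] != 2 * i) {
                bad++;
            }
        }
    }
    return bad;
}
```

Calling fill_inputs(), then add_nested(), then count_mismatches() from main gives a quick correctness check before timing anything.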
The errors are:
ICC:
IMP1.c(20): error: omp directive is not followed by a parallelizable for loop
  #pragma omp parallel for
  ^
compilation aborted for IMP1.c (code 2)
GCC:
IMP1.c: In function ‘main’:
IMP1.c:21:10: error: for statement expected before ‘#pragma’
 #pragma omp simd
Because in my other tests #pragma omp simd on the outer loop gave better performance, I need to put it there (don't I?).
Platform: Intel Core i7 6700 HQ, Fedora 27
Tested compilers: ICC 18, GCC 7.2, Clang 5
Compiler command line:
icc -O3 -qopenmp -xHOST -no-vec
gcc -O3 -fopenmp -march=native -fno-tree-vectorize -fno-tree-slp-vectorize
clang -O3 -fopenmp=libgomp -march=native -fno-vectorize -fno-slp-vectorize
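To check the observation that a build runs with a single thread, a small probe along these lines can help. It is only a sketch: observed_threads is a made-up name, and it reports the team size of a plain parallel region, not whether a particular loop was actually parallelized. The _OPENMP guard is there so the file still compiles when the OpenMP flag is missing.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of threads in a parallel region, or 1 when the
   file was compiled without OpenMP support. */
int observed_threads(void)
{
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread records the team size; the others wait at the
           implicit barrier that ends the single construct. */
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Printing the return value from main for each of the three compilers shows whether the OpenMP runtime is really starting more than one thread.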
Best Answer
From OpenMP 4.5 Specification:
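The two directives cannot simply be stacked, because each one has to be followed directly by a for loop, which is what the GCC message says. OpenMP 4.0 and later provide a combined construct, parallel for simd, written as a single directive. A minimal sketch of the matrix addition using it, assuming a compiler with OpenMP 4.0 support or newer; the function names and the fill pattern are illustrative and not taken from the question:

```c
#define N 512
#define M N

int a[N][M], b[N][M], c_result[N][M];

/* Hypothetical helper: fill the inputs so that a[i][j] + b[i][j] == 2 * i. */
void fill_inputs(void)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }
    }
}

/* One combined directive on the outer loop: its iterations are divided
   among threads, and each thread's chunk may be executed with SIMD. */
void add_combined(void)
{
    #pragma omp parallel for simd
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            c_result[i][j] = a[i][j] + b[i][j];
        }
    }
}

/* Hypothetical helper: count elements that differ from the expected sum. */
int count_mismatches(void)
{
    int bad = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            if (c_result[i][j] != 2 * i) {
                bad++;
            }
        }
    }
    return bad;
}
```

Whether vectorizing across the outer loop beats the nested form is something to measure on the target machine; the sketch only shows a spelling that compilers accept.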
You can also write:
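One equivalent spelling, sketched under the assumption of OpenMP 4.0 or newer and with illustrative names, opens the parallel region explicitly and places the for simd worksharing construct inside it:

```c
#define N 512
#define M N

int a[N][M], b[N][M], c_result[N][M];

/* Hypothetical helper: fill the inputs so that a[i][j] + b[i][j] == 2 * i. */
void fill_inputs(void)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }
    }
}

/* Explicit parallel region with a "for simd" loop inside it. */
void add_split(void)
{
    #pragma omp parallel
    {
        #pragma omp for simd
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < M; j++) {
                c_result[i][j] = a[i][j] + b[i][j];
            }
        }
    }
}

/* Hypothetical helper: count elements that differ from the expected sum. */
int count_mismatches(void)
{
    int bad = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            if (c_result[i][j] != 2 * i) {
                bad++;
            }
        }
    }
    return bad;
}
```

The explicit region is handy when other per-thread work has to happen before or after the loop inside the same team of threads.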