Cuda gridDim and blockDim


I get what blockDim is, but I have a problem with gridDim. blockDim gives the size of a block, but what is gridDim? On the Internet it says gridDim.x gives the number of blocks in the x direction.

How can I know what blockDim.x * gridDim.x gives?

How can I know how many gridDim.x values there are in the x direction?

For example, consider the code below:

int tid = threadIdx.x + blockIdx.x * blockDim.x;
double temp = a[tid];
tid += blockDim.x * gridDim.x;

while (tid < count)
{
    if (a[tid] > temp)
    {
       temp = a[tid];
    }
    tid += blockDim.x * gridDim.x;
}

I know that tid starts at 0. The code then does tid += blockDim.x * gridDim.x. What is tid after this operation?

Best Answer

  • blockDim.x,y,z gives the number of threads in a block, in the particular direction
  • gridDim.x,y,z gives the number of blocks in a grid, in the particular direction
  • blockDim.x * gridDim.x gives the number of threads in a grid (in the x direction, in this case)

Block and grid variables can be 1-, 2-, or 3-dimensional. It's common practice when handling 1-D data to create only 1-D blocks and grids.

In the CUDA documentation, these variables are defined here

In particular, when the total number of threads in the x-dimension (gridDim.x * blockDim.x) is less than the size of the array I wish to process, it's common practice to create a loop and have the grid of threads move through the entire array. In this case, after processing one loop iteration, each thread must move to the next unprocessed location, which is given by tid += blockDim.x * gridDim.x. In effect, the entire grid of threads jumps through the 1-D array of data, a grid-width at a time. This topic, sometimes called a "grid-striding loop", is discussed further in this blog article.

You might want to consider taking a couple of the introductory CUDA webinars available on the NVIDIA webinar page. For example, these two:

  • GPU Computing using CUDA C – An Introduction (2010) An introduction to the basics of GPU computing using CUDA C. Concepts will be illustrated with walkthroughs of code samples. No prior GPU computing experience required.
  • GPU Computing using CUDA C – Advanced 1 (2010) First-level optimization techniques such as global memory optimization and processor utilization. Concepts will be illustrated using real code examples.

It would be two hours well spent if you want to understand these concepts better.

The general topic of grid-striding loops is covered in some detail here.
