CUDA inlines all functions by default (although Fermi and newer architectures also support a proper ABI with function pointers and real function calls). So your example code gets compiled to something like this:
__global__ void Kernel(int *ptr)
{
    if (threadIdx.x < 2)
        if (ptr[threadIdx.x] == threadIdx.x)
            ptr[threadIdx.x]++;
}
Execution happens in parallel, just like normal code. If you engineer a memory race into a function, there is no serialization mechanism that can save you.
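For instance, here is a minimal sketch of such a race (the bump helper and kernel names are hypothetical; atomicAdd is the standard fix):
__device__ void bump(int *counter)
{
    (*counter)++;          // read-modify-write is not atomic: updates from
                           // concurrent threads can be lost
}

__global__ void RaceKernel(int *counter)
{
    bump(counter);         // inlined; every thread races on *counter
                           // atomicAdd(counter, 1) would serialize correctly
}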
As Jared mentions in a comment, from the command line:
nvcc --version
(or /usr/local/cuda/bin/nvcc --version) gives the CUDA compiler version (which matches the toolkit version).
From application code, you can query the runtime API version with cudaRuntimeGetVersion() or the driver API version with cudaDriverGetVersion().
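For example, a minimal sketch querying both (the encoded value is 1000*major + 10*minor, so 9020 means CUDA 9.2):
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);   // version this app was built against
    cudaDriverGetVersion(&driverVersion);     // latest version the installed driver supports
    printf("Runtime: %d, Driver: %d\n", runtimeVersion, driverVersion);
    return 0;
}
Build it with nvcc and run it on the target machine; the driver version may be newer than the runtime version.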
As Daniel points out, deviceQuery is an SDK sample app that queries the above, along with device capabilities.
As others note, you can also check the contents of version.txt (e.g., on Mac or Linux):
cat /usr/local/cuda/version.txt
However, if another version of the CUDA toolkit is installed besides the one symlinked from /usr/local/cuda, this may report an inaccurate version if that other version is earlier in your PATH than the above, so use with caution.
Best Answer
Yes, just mark the function with
__device__
and it will be callable only from the GPU. Check the CUDA Programming Guide, section B.1. Here is the direct link.
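For illustration, a minimal sketch (hypothetical function names):
__device__ int square(int x)   // callable only from GPU code
{
    return x * x;
}

__global__ void SquareKernel(int *ptr)
{
    ptr[threadIdx.x] = square(ptr[threadIdx.x]);  // fine: device-to-device call
}

// Calling square() from host code is a compile-time error;
// mark it __host__ __device__ if you need it on both sides.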