I have this low-level for loop that I've written in C, and a friend suggested I write it in CUDA. I've set up my CUDA environment and have been looking at the docs, but I'm still struggling with the syntax after well over two weeks. Can anyone help me out? What would this look like in CUDA?
float* red = new float[N];
float* green = new float[N];
float* blue = new float[N];
for (int y = 0; y < h; y++)
{
    // Get row ptr from the color image
    const unsigned char* src = rowptr<unsigned char>(color, 0, y, w);
    // Get row ptrs for the destination channel features
    float* rptr = rowptr<float>(red, 0, y, w);
    float* gptr = rowptr<float>(green, 0, y, w);
    float* bptr = rowptr<float>(blue, 0, y, w);
    for (int x = 0; x < w; x++)
    {
        *rptr++ = (float)*src++;
        *gptr++ = (float)*src++;
        *bptr++ = (float)*src++;
    }
}
Best Answer
Here is some sample code. I don't know if it will really answer your question; you will probably need to learn more about CUDA. If you can spare the time, taking these two webinars from the NVIDIA webinar page would be two hours well spent. The CUDA C Programming Guide is also a good, readable reference.
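As a rough sketch of how the loop in the question could map onto CUDA: the two nested loops disappear and each GPU thread converts one pixel. This assumes `color` is tightly packed, interleaved 8-bit RGB (3 * w bytes per row, no padding) and that the three outputs are w * h floats each. The names `splitChannelsKernel` and `splitChannels` are made up for illustration, and the 16x16 block size is just a common starting point, not a tuned value.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per pixel. Assumes `color` holds tightly
// packed, interleaved 8-bit RGB (3 * w bytes per row, no padding).
__global__ void splitChannelsKernel(const unsigned char* color,
                                    float* red, float* green, float* blue,
                                    int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;   // guard threads that fall outside the image

    int dst = y * w + x;            // index into each planar output
    int src = 3 * dst;              // index of this pixel's R byte
    red[dst]   = (float)color[src];
    green[dst] = (float)color[src + 1];
    blue[dst]  = (float)color[src + 2];
}

// Hypothetical host wrapper: copies the image to the GPU, launches the
// kernel, and copies the three planes back into host arrays.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t splitChannels(const unsigned char* color,
                          float* red, float* green, float* blue,
                          int w, int h)
{
    size_t n = (size_t)w * h;
    size_t planeBytes = n * sizeof(float);
    unsigned char* dColor = nullptr;
    float* dPlanes = nullptr;       // one allocation holding the R, G, B planes

    cudaError_t err = cudaMalloc(&dColor, 3 * n);
    if (err == cudaSuccess) err = cudaMalloc(&dPlanes, 3 * planeBytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dColor, color, 3 * n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        // Round up so the grid covers the whole image.
        dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
        splitChannelsKernel<<<grid, block>>>(dColor, dPlanes, dPlanes + n,
                                             dPlanes + 2 * n, w, h);
        err = cudaGetLastError();   // catches launch configuration errors
    }
    // cudaMemcpy blocks until the kernel has finished.
    if (err == cudaSuccess)
        err = cudaMemcpy(red, dPlanes, planeBytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(green, dPlanes + n, planeBytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(blue, dPlanes + 2 * n, planeBytes, cudaMemcpyDeviceToHost);

    cudaFree(dColor);
    cudaFree(dPlanes);
    return err;
}
```

On the host you would call `splitChannels(color, red, green, blue, w, h)` with the same arrays as in the question and check that it returns `cudaSuccess`. For an operation this simple, the copies to and from the GPU will likely cost more than the kernel itself, so it mainly pays off if the data stays on the device for further processing.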
In response to a question posed in the comments, here is a modified kernel that shows how to use the rowptr<> template as defined in the comments. Just replace the kernel code above with this:
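A sketch of what such a kernel might look like follows. The `rowptr<>` definition from the comments is not reproduced here, so the helper below is a hypothetical stand-in: it assumes the last argument is the row stride in elements, which means the interleaved color image is addressed with a stride of 3 * w rather than w. The kernel name `splitChannelsRowptr` is also illustrative.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the rowptr<> helper; the real definition is not
// shown here. Returns a pointer to element x of row y, where `stride` is the
// number of T elements per row. Marked __host__ __device__ so the same
// helper can be called from CPU code and from kernels.
template <typename T>
__host__ __device__ inline T* rowptr(T* base, int x, int y, int stride)
{
    return base + (size_t)y * stride + x;
}

// One thread per pixel. Assumes tightly packed, interleaved 8-bit RGB input
// (3 * w bytes per row) and three planar float outputs of w * h elements.
__global__ void splitChannelsRowptr(const unsigned char* color,
                                    float* red, float* green, float* blue,
                                    int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;   // guard threads outside the image

    // Pointer to this pixel's R byte: 3 bytes per pixel, 3 * w per row.
    const unsigned char* src =
        rowptr<const unsigned char>(color, 3 * x, y, 3 * w);

    // Each destination plane has one float per pixel, w per row.
    *rowptr<float>(red,   x, y, w) = (float)src[0];
    *rowptr<float>(green, x, y, w) = (float)src[1];
    *rowptr<float>(blue,  x, y, w) = (float)src[2];
}
```

Unlike the CPU loop, there is no `ptr++` walking along the row: each thread computes its own address directly from x and y, so the threads do not depend on each other.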