Passing structs to CUDA kernels

Question:

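I'm new to CUDA C and am trying to pass a typedef'd struct to a kernel. My approach worked fine when I tried a struct containing only ints, but when I switched to floats I got meaningless numbers as results. I assumed this had to do with alignment, so I tried adding __align__ to my type declaration, but to no avail. Can someone show me an example of how this is done, or suggest an alternative approach? I'm trying to set it up so that I can easily add or remove fields without changing anything other than the struct and the kernel. My code: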

typedef struct __align__(8) 
{ 
    float a, b; 
} point; 

__global__ void testKernel(point *p) 
{ 
    int i = blockIdx.x * blockDim.x + threadIdx.x; 
    p[i].a = 1.1; 
    p[i].b = 2.2; 
} 

int main(void) 
{ 
     // set number of points 
    int numPoints = 16, 
     gpuBlockSize = 4, 
     pointSize = sizeof(point), 
     numBytes  = numPoints * pointSize, 
     gpuGridSize = numPoints/gpuBlockSize; 

     // allocate memory 
    point *cpuPointArray = new point[numPoints], 
      *gpuPointArray = new point[numPoints]; 
    cpuPointArray = (point*)malloc(numBytes); 
    cudaMalloc((void**)&gpuPointArray, numBytes); 

     // launch kernel 
    testKernel<<<gpuGridSize,gpuBlockSize>>>(gpuPointArray); 

     // retrieve the results 
    cudaMemcpy(cpuPointArray, gpuPointArray, numBytes, cudaMemcpyDeviceToHost); 
    printf("testKernel results:\n"); 
    for(int i = 0; i < numPoints; ++i) 
    { 
     printf("point.a: %d, point.b: %d\n",cpuPointArray[i].a,cpuPointArray[i].b); 
    } 

     // deallocate memory 
    free(cpuPointArray); 
    cudaFree(gpuPointArray); 

    return 0; 
}

point *gpuPointArray = new ... doesn't look right to me. You allocate on the host and then do a cudaMalloc on the device ... – Bart 2010-11-14 08:41:26

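Don't I need to allocate the memory before passing it as an argument to the kernel? Taking out the cudaMalloc line results in an "unspecified launch failure". I could also set gpuPointArray to NULL, but that doesn't seem to change my original results. – Paul 2010-11-14 08:56:32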

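Of course, you need the cudaMalloc. You don't need the "new", though. The same goes for cpuPointArray. Use malloc and free (you're programming in C) and don't use new here. (Never mix new/delete with malloc/free.) – Bart 2010-11-14 09:02:48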
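To make the allocation advice in these comments concrete, here is a minimal sketch of pairing malloc/free on the host with cudaMalloc/cudaFree on the device, with the return codes checked. The helper names allocPoints and freePoints are hypothetical and not part of the original code.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

typedef struct
{
    float a, b;
} point;

// Hypothetical helper: allocate matching host and device arrays of n points.
// On failure nothing stays allocated and both outputs are NULL.
cudaError_t allocPoints(point **host, point **dev, int n)
{
    size_t numBytes = (size_t)n * sizeof(point);
    *host = NULL;
    *dev = NULL;

    // host buffer: plain malloc, released later with free
    *host = (point*)malloc(numBytes);
    if (*host == NULL)
        return cudaErrorMemoryAllocation;

    // device buffer: cudaMalloc writes the device address into *dev,
    // so no host-side "new" is needed beforehand
    cudaError_t err = cudaMalloc((void**)dev, numBytes);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        free(*host);
        *host = NULL;
        *dev = NULL;
    }
    return err;
}

// Hypothetical helper: release both buffers with the matching deallocators.
void freePoints(point *host, point *dev)
{
    free(host);
    cudaFree(dev);
}
```

Checking the results of cudaGetLastError() and cudaDeviceSynchronize() after a kernel launch is the usual way to surface errors such as the "unspecified launch failure" mentioned above.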

Answer

Take a look at how it's done in the vector_types.h header in your CUDA include directory. That should already give you some pointers.

However, the main problem here is the %d in your printf calls. You're trying to print floats, not integers, so those should really use %f instead.
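A minimal host-side sketch of the format fix, assuming the same two-float struct as in the question. The helper formatPoint is hypothetical and only there to isolate the printf-style call.

```cuda
#include <stdio.h>

typedef struct
{
    float a, b;
} point;

// Hypothetical helper: format one point into buf with the correct conversion.
// float arguments passed to a variadic function are promoted to double,
// which is the type %f expects; %d would read them as int instead.
int formatPoint(char *buf, size_t size, const point *p)
{
    return snprintf(buf, size, "point.a: %f, point.b: %f", p->a, p->b);
}
```

With a point holding 1.1f and 2.2f, the buffer ends up as "point.a: 1.100000, point.b: 2.200000".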

OK, I looked at vector_types.h and tried doing what they do: typedef struct __align__(2*sizeof(float)) point { ..., but it still produces the same results. Is there something else in there I should be seeing? – Paul 2010-11-14 09:13:33

By the way, change your printf to use %f instead of %d ... does that change anything? You're trying to print floats, not ints ... – Bart 2010-11-14 09:32:47

Ha! That did it, thanks. Sometimes the obvious is the easiest thing to miss... – Paul 2010-11-14 09:40:51

Answer

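Since there doesn't seem to be any decent documentation on how to do this, I thought I'd post the final, revised code here. It turns out the __align__ part was also unnecessary; the actual problem was the use of %d in the printf when trying to print floats.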

#include <stdlib.h> 
#include <stdio.h> 

typedef struct 
{ 
    float a, b; 
} point; 

__global__ void testKernel(point *p) 
{ 
    int i = blockIdx.x * blockDim.x + threadIdx.x; 
    p[i].a = 1.1; 
    p[i].b = 2.2; 
} 

int main(void) 
{ 
     // set number of points 
    int numPoints = 16, 
     gpuBlockSize = 4, 
     pointSize = sizeof(point), 
     numBytes  = numPoints * pointSize, 
     gpuGridSize = numPoints/gpuBlockSize; 

     // allocate memory 
    point *cpuPointArray, 
      *gpuPointArray; 
    cpuPointArray = (point*)malloc(numBytes); 
    cudaMalloc((void**)&gpuPointArray, numBytes); 

     // launch kernel 
    testKernel<<<gpuGridSize,gpuBlockSize>>>(gpuPointArray); 

     // retrieve the results 
    cudaMemcpy(cpuPointArray, gpuPointArray, numBytes, cudaMemcpyDeviceToHost); 
    printf("testKernel results:\n"); 
    for(int i = 0; i < numPoints; ++i) 
    { 
     printf("point.a: %f, point.b: %f\n",cpuPointArray[i].a,cpuPointArray[i].b); 
    } 

     // deallocate memory 
    free(cpuPointArray); 
    cudaFree(gpuPointArray); 

    return 0; 
}
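On why __align__ was not needed for correctness: a struct of two floats has no padding, so host and device should agree on its layout. Below is a minimal sketch of a check for that, assuming a compiler mode with C++11 static_assert available. The helper pointLayoutIsPacked is hypothetical.

```cuda
#include <stddef.h>

typedef struct
{
    float a, b;
} point;

// compile-time checks: the struct is exactly two floats with no padding
static_assert(sizeof(point) == 2 * sizeof(float), "point has unexpected padding");
static_assert(offsetof(point, b) == sizeof(float), "b does not directly follow a");

// Hypothetical runtime equivalent, usable where static_assert is unavailable.
// Returns 1 when the layout is the packed two-float layout, 0 otherwise.
int pointLayoutIsPacked(void)
{
    return sizeof(point) == 2 * sizeof(float)
        && offsetof(point, b) == sizeof(float);
}
```

An 8-byte alignment, as far as I know the one used for float2 in vector_types.h, may still be worth considering if you want the compiler to use 8-byte loads and stores, but that is a performance matter rather than a correctness one.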
