CUDA Study Notes_1005
Topic: grid, block, and thread concepts explained
1. blockIdx and threadIdx are both of type uint3 (defined in the header vector_types.h):
struct __device_builtin__ uint3
{
    unsigned int x, y, z;
};
typedef __device_builtin__ struct uint3 uint3;
Hence blockIdx and threadIdx each have three components: x, y, and z.
2. gridDim and blockDim are both of type dim3, which is similar to uint3. Their components are:
gridDim.x, gridDim.y, gridDim.z and blockDim.x, blockDim.y, blockDim.z
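All of these built-in variables are visible only in device code, that is, inside kernel functions and the __device__ functions they call. Each block runs independently, so printed output may not appear in block-index order.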
For a multi-dimensional grid and block, the sizes can be defined as follows:
dim3 grid_size(Gx, Gy, Gz);
dim3 block_size(Bx, By, Bz);
When the z dimension is 1 it can be omitted, because unspecified dim3 components default to 1:
dim3 grid_size(Gx, Gy);
dim3 block_size(Bx, By);
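The defaulting behaviour can be checked from inside a kernel. The following is a minimal sketch, not part of the original notes: the names record_dims and query_dims are hypothetical helpers introduced for illustration, and the sizes (2, 3) and (4, 2) are arbitrary. A single thread copies gridDim and blockDim into a buffer that the host then prints.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Copies the launch dimensions as seen on the device into out[0..5].
__global__ void record_dims(unsigned int *out)
{
    // Only one thread of the whole grid writes, so there is no race on out.
    if (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0 &&
        threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
    {
        out[0] = gridDim.x;  out[1] = gridDim.y;  out[2] = gridDim.z;
        out[3] = blockDim.x; out[4] = blockDim.y; out[5] = blockDim.z;
    }
}

// Hypothetical helper: launches record_dims and copies the six values to host_out.
cudaError_t query_dims(dim3 grid_size, dim3 block_size, unsigned int host_out[6])
{
    unsigned int *dev_out = NULL;
    cudaError_t err = cudaMalloc((void **)&dev_out, 6 * sizeof(unsigned int));
    if (err != cudaSuccess) return err;

    record_dims<<<grid_size, block_size>>>(dev_out);

    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(host_out, dev_out, 6 * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err;
}

int main(void)
{
    const dim3 grid_size(2, 3);   // z omitted
    const dim3 block_size(4, 2);  // z omitted
    unsigned int d[6];
    if (query_dims(grid_size, block_size, d) != cudaSuccess)
    {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("gridDim  = (%u, %u, %u)\n", d[0], d[1], d[2]);
    printf("blockDim = (%u, %u, %u)\n", d[3], d[4], d[5]);
    return 0;
}
```

On a working device the third component on each line should be 1, matching the two-argument dim3 form.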
3. Example:
code1:
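#include <stdio.h>
#include <cuda_runtime.h>

// Every thread prints its block index and its x/y index within the block.
__global__ void hello_from_gpu()
{
    const int b = blockIdx.x;
    const int tx = threadIdx.x;
    const int ty = threadIdx.y;
    printf("Hello World from ___block %d ___ threadX %d ___threadY %d\n", b, tx, ty);
}

int main(void)
{
    const dim3 block_size(2, 3);          // 2 x 3 x 1 = 6 threads per block
    hello_from_gpu<<<1, block_size>>>();  // one block
    cudaDeviceSynchronize();              // wait for the kernel so its printf output is flushed
    getchar();                            // keep the console window open
    return 0;
}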
Corresponding output:
When block_size above is changed to (2, 3, 4), the output is:
Based on the output in both cases, a label that identifies each thread within its block can be written as:
threadIdx.z*blockDim.x*blockDim.y + threadIdx.y*blockDim.x + threadIdx.x (valid for both 2D and 3D block_size)
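One way to check that this label really is unique is to let each thread count itself at its own label. The sketch below is an illustration under stated assumptions, not code from the original notes: thread_label, mark_labels, and labels_are_unique are hypothetical names, and the host buffer assumes the 1024-threads-per-block limit.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Label of the calling thread inside its block, using the formula above.
__device__ unsigned int thread_label()
{
    return threadIdx.z * blockDim.x * blockDim.y
         + threadIdx.y * blockDim.x
         + threadIdx.x;
}

// Each thread increments the counter stored at its own label.
__global__ void mark_labels(unsigned int *hits)
{
    atomicAdd(&hits[thread_label()], 1u);
}

// Hypothetical helper: true if every label in [0, n) was produced exactly once
// by a single block of the given size.
bool labels_are_unique(dim3 block_size)
{
    const unsigned int n = block_size.x * block_size.y * block_size.z;
    if (n == 0 || n > 1024) return false;  // assumes at most 1024 threads per block

    unsigned int *dev_hits = NULL;
    if (cudaMalloc((void **)&dev_hits, n * sizeof(unsigned int)) != cudaSuccess)
        return false;
    cudaMemset(dev_hits, 0, n * sizeof(unsigned int));

    mark_labels<<<1, block_size>>>(dev_hits);

    unsigned int host_hits[1024];
    const cudaError_t err = cudaMemcpy(host_hits, dev_hits,
                                       n * sizeof(unsigned int),
                                       cudaMemcpyDeviceToHost);
    cudaFree(dev_hits);
    if (err != cudaSuccess) return false;

    // A unique, gap-free labelling hits every slot exactly once.
    for (unsigned int i = 0; i < n; ++i)
        if (host_hits[i] != 1) return false;
    return true;
}

int main(void)
{
    printf("block (2,3):   %s\n", labels_are_unique(dim3(2, 3)) ? "unique" : "not unique");
    printf("block (2,3,4): %s\n", labels_are_unique(dim3(2, 3, 4)) ? "unique" : "not unique");
    return 0;
}
```

The same expression covers the 2D case because threadIdx.z is 0 there, so the first term vanishes.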
4. Limits on grid and block sizes (for the Kepler architecture)
Grid:
gridDim.x <= 2^{31}-1
gridDim.y <= 2^{16}-1 = 65535
gridDim.z <= 2^{16}-1 = 65535
Block:
blockDim.x <= 1024
blockDim.y <= 1024
blockDim.z <= 64
In addition:
blockDim.x * blockDim.y * blockDim.z <= 1024
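Rather than hard-coding these numbers, the limits of the installed device can be read at run time with cudaGetDeviceProperties. The following is a rough sketch: block_size_is_valid is a hypothetical helper added for illustration, and device 0 is assumed to be the GPU of interest.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: checks a block size against the per-dimension limits
// and the total threads-per-block limit reported in prop.
bool block_size_is_valid(dim3 b, const cudaDeviceProp &prop)
{
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > (unsigned int)prop.maxThreadsDim[0]) return false;
    if (b.y > (unsigned int)prop.maxThreadsDim[1]) return false;
    if (b.z > (unsigned int)prop.maxThreadsDim[2]) return false;
    // Use 64-bit arithmetic so the product cannot overflow.
    const unsigned long long total = (unsigned long long)b.x * b.y * b.z;
    return total <= (unsigned long long)prop.maxThreadsPerBlock;
}

int main(void)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)  // device 0 is an assumption
    {
        printf("No usable CUDA device found\n");
        return 1;
    }
    printf("max grid size:         %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("max block dims:        %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);

    printf("block (2,3,4) valid:   %s\n",
           block_size_is_valid(dim3(2, 3, 4), prop) ? "yes" : "no");
    return 0;
}
```

A block such as (32, 32, 2) passes every per-dimension check under the limits listed above but still fails the product check, since 32 * 32 * 2 = 2048 > 1024.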