HLSL Summary

Here is the table of intrinsic functions first; I will explain a few of them in more detail afterwards.

Name	Syntax	Description
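abs	abs(x)	Returns the absolute value of x, computed per component.
acos	acos(x)	Returns the arccosine of x, computed per component.
all	all(x)	Tests whether all components of x are nonzero.
any	any(x)	Tests whether any component of x is nonzero.
asfloat	asfloat(x)	Reinterprets the bit pattern of x as a float.
asin	asin(x)	Returns the arcsine of x, computed per component.
asint	asint(x)	Reinterprets the bit pattern of x as an int.
asuint	asuint(x)	Reinterprets the bit pattern of x as a uint.
atan	atan(x)	Returns the arctangent of x.
atan2	atan2(y, x)	Returns the arctangent of y/x, using the signs of y and x to pick the quadrant.
ceil	ceil(x)	Returns the smallest integer greater than or equal to x.
clamp	clamp(x, min, max)	Clamps x to the range [min, max].
clip	clip(x)	Discards the current pixel if any component of x is less than 0.
cos	cos(x)	Returns the cosine of x.
cosh	cosh(x)	Returns the hyperbolic cosine of x.
cross	cross(x, y)	Returns the cross product of the 3D vectors x and y.
D3DCOLORtoUBYTE4	D3DCOLORtoUBYTE4(x)	Swizzles and scales components of the 4D vector x to compensate for the lack of UBYTE4 support in some hardware.
ddx	ddx(x)	Returns the partial derivative of x with respect to the screen-space x coordinate.
ddy	ddy(x)	Returns the partial derivative of x with respect to the screen-space y coordinate.
degrees	degrees(x)	Converts x from radians to degrees.
determinant	determinant(m)	Returns the determinant of the square matrix m.
distance	distance(x, y)	Returns the distance between the points x and y.
dot	dot(x, y)	Returns the dot product of x and y.
exp	exp(x)	Returns e raised to the power x.
exp2	exp2(x)	Returns 2 raised to the power x, computed per component.
faceforward	faceforward(n, i, ng)	Returns -n * sign(dot(i, ng)), that is, n flipped if necessary so that it faces against the incident vector i.
floor	floor(x)	Returns the largest integer less than or equal to x.
fmod	fmod(x, y)	Returns the floating-point remainder of x/y.
frac	frac(x)	Returns the fractional part of x.
frexp	frexp(x, exp)	Returns the mantissa of x and writes the exponent to exp.
fwidth	fwidth(x)	Returns abs(ddx(x)) + abs(ddy(x)).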
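GetRenderTargetSampleCount	GetRenderTargetSampleCount()	Returns the number of render-target samples.
GetRenderTargetSamplePosition	GetRenderTargetSamplePosition(x)	Returns a sample position (x, y) for a given sample index.
isfinite	isfinite(x)	Returns true if x is finite, otherwise false.
isinf	isinf(x)	Returns true if x is infinite, otherwise false.
isnan	isnan(x)	Returns true if x is NAN or QNAN, otherwise false.
ldexp	ldexp(x, exp)	The inverse of frexp; returns x * 2^exp.
length	length(v)	Returns the length of the vector v.
lerp	lerp(x, y, s)	Linearly interpolates between x and y; returns x + s(y - x).
lit	lit(n_dot_l, n_dot_h, m)	Returns the lighting vector (ambient, diffuse, specular, 1).
log	log(x)	Returns the base-e logarithm of x.
log10	log10(x)	Returns the base-10 logarithm of x.
log2	log2(x)	Returns the base-2 logarithm of x.
max	max(x, y)	Returns the larger of x and y.
min	min(x, y)	Returns the smaller of x and y.
modf	modf(x, out ip)	Splits x into integer and fractional parts; returns the fractional part and writes the integer part to ip.
mul	mul(x, y)	Returns the matrix product of x and y.
noise	noise(x)	Generates a random value using the Perlin-noise algorithm.
normalize	normalize(x)	Returns the normalized vector, defined as x / length(x).
pow	pow(x, y)	Returns x^y.
radians	radians(x)	Converts x from degrees to radians.
reflect	reflect(i, n)	Returns the reflection of the incident ray i about the surface normal n.
refract	refract(i, n, R)	Returns the refracted ray for incident ray i, surface normal n and refraction index R.
round	round(x)	Rounds x to the nearest integer.
rsqrt	rsqrt(x)	Returns the reciprocal of the square root of x: 1 / sqrt(x).
saturate	saturate(x)	Clamps x to the range [0, 1].
sign	sign(x)	Returns the sign of x.
sin	sin(x)	Returns the sine of x.
sincos	sincos(x, out s, out c)	Writes the sine of x to s and the cosine of x to c.
sinh	sinh(x)	Returns the hyperbolic sine of x.
smoothstep	smoothstep(min, max, x)	Returns a smooth Hermite interpolation between 0 and 1 when x is in [min, max]; returns 0 if x < min and 1 if x > max.
sqrt	sqrt(x)	Returns the square root of x, computed per component.
step	step(a, x)	Returns (x >= a) ? 1 : 0.
tan	tan(x)	Returns the tangent of x.
tanh	tanh(x)	Returns the hyperbolic tangent of x.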
tex1D	tex1D(s, t)	1D texture lookup: returns the color of texture s at position t.
tex1Dbias	tex1Dbias(s, t)	1D texture lookup with bias.
tex1Dgrad	tex1Dgrad(s, t, ddx, ddy)	1D texture lookup with a gradient.
tex1Dlod	tex1Dlod(s, t)	1D texture lookup with LOD.
tex1Dproj	tex1Dproj(s, t)	1D texture lookup with projective divide.
tex2D	tex2D(s, t)	2D texture lookup: returns the color of texture s at position t.
tex2Dbias	tex2Dbias(s, t)	2D texture lookup with bias.
tex2Dgrad	tex2Dgrad(s, t, ddx, ddy)	2D texture lookup with a gradient.
tex2Dlod	tex2Dlod(s, t)	2D texture lookup with LOD.
tex2Dproj	tex2Dproj(s, t)	2D texture lookup with projective divide.
tex3D	tex3D(s, t)	3D texture lookup.
tex3Dbias	tex3Dbias(s, t)	3D texture lookup with bias.
tex3Dgrad	tex3Dgrad(s, t, ddx, ddy)	3D texture lookup with a gradient.
tex3Dlod	tex3Dlod(s, t)	3D texture lookup with LOD.
tex3Dproj	tex3Dproj(s, t)	3D texture lookup with projective divide.
texCUBE	texCUBE(s, t)	Cube texture lookup.
texCUBEbias	texCUBEbias(s, t)	Cube texture lookup with bias.
texCUBEgrad	texCUBEgrad(s, t, ddx, ddy)	Cube texture lookup with a gradient.
texCUBElod	texCUBElod(s, t)	Cube texture lookup with LOD.
texCUBEproj	texCUBEproj(s, t)	Cube texture lookup with projective divide.
transpose	transpose(m)	Returns the transpose of the matrix m.
trunc	trunc(x)	Truncates each component of x from a floating-point value to its integer part.
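
To make a few of the one-line descriptions in the table concrete, here is a minimal sketch that rewrites saturate, lerp, step and smoothstep as plain scalar functions. The Ref* names are hypothetical helpers made up for illustration and are not part of HLSL; the bodies only follow the formulas in the table plus the usual cubic Hermite polynomial, so treat them as an approximation of what the built-ins do rather than their actual implementation.

```hlsl
// Hypothetical reference versions of four intrinsics (scalar only).
// The real intrinsics also accept vectors and work per component.

float RefSaturate(float x)
{
    // Same as clamp(x, 0, 1).
    return clamp(x, 0.0, 1.0);
}

float RefLerp(float x, float y, float s)
{
    // Linear interpolation: s = 0 gives x, s = 1 gives y.
    return x + s * (y - x);
}

float RefStep(float a, float x)
{
    // Hard threshold at a.
    return (x >= a) ? 1.0 : 0.0;
}

float RefSmoothstep(float lo, float hi, float x)
{
    // Normalize x into [0, 1], then apply the Hermite polynomial 3t^2 - 2t^3.
    float t = RefSaturate((x - lo) / (hi - lo));
    return t * t * (3.0 - 2.0 * t);
}
```

The parameters are called lo and hi rather than min and max so that they do not shadow the intrinsics of the same name.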

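[1] clamp does not discard values outside [min, max] (that is what clip does to pixels); it pins them to the nearest bound, so values below min become min and values above max become max.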
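
A small pixel shader can show the difference between clamp and clip side by side. This is only a sketch, assuming a Direct3D 9-style pixel shader; the entry point name PSMain and the way the test value is derived from uv are made up for the example.

```hlsl
// Hypothetical pixel shader contrasting clamp (keeps the pixel, limits the
// value) with clip (throws the pixel away).
float4 PSMain(float2 uv : TEXCOORD0) : COLOR0
{
    // v ranges over [-0.5, 1.5] as uv.x goes from 0 to 1.
    float v = uv.x * 2.0 - 0.5;

    // Out-of-range values are pinned to the nearest bound:
    // -0.5 becomes 0.0, 1.5 becomes 1.0, 0.3 stays 0.3.
    float c = clamp(v, 0.0, 1.0);

    // clip discards the pixel when its argument is negative,
    // so the half of the quad with uv.y < 0.5 is not drawn at all.
    clip(uv.y - 0.5);

    return float4(c, c, c, 1.0);
}
```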

[2] sin and cos take their argument in radians, not degrees.
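
If the angle is stored in degrees, it can be converted with radians before calling the trigonometric functions. The helper names below are hypothetical and only illustrate the conversion.

```hlsl
// Hypothetical helpers for angles given in degrees.
float SinDegrees(float angleDeg)
{
    // radians() converts degrees to radians, which is what sin expects.
    return sin(radians(angleDeg));
}

float2 SinCosDegrees(float angleDeg)
{
    float s;
    float c;
    // sincos writes both results through its out parameters.
    sincos(radians(angleDeg), s, c);
    return float2(s, c);
}
```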

[3] modf writes the integer part to ip; the return value is the fractional part.
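
As a sketch of how the two outputs of modf are used, the hypothetical helper below packs both parts into a float2. Both parts carry the sign of x, which is where modf differs from frac for negative inputs.

```hlsl
// Hypothetical helper: returns (integer part, fractional part) of x.
float2 SplitValue(float x)
{
    float ip;                 // receives the integer part
    float fp = modf(x, ip);   // return value is the fractional part
    return float2(ip, fp);
}

// SplitValue(3.75)  gives (3.0, 0.75)
// SplitValue(-3.75) gives (-3.0, -0.75), whereas frac(-3.75) is 0.25
```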

[4] ddx and ddy need a longer explanation. I went through two blog posts that I think explain them very clearly:

Partial derivative functions (ddx and ddy in HLSL, dFdx and dFdy in GLSL; in the rest of this article I will use both sets of names according to the code examples I provide) are fragment shader instructions which can be used to compute the rate of variation of any value with respect to the screen-space coordinates.

Derivatives computation

During triangle rasterization, GPUs run many instances of a fragment shader at a time, organizing them in blocks of 2×2 pixels. Derivatives are calculated by taking differences between the pixel values in a block: dFdx subtracts the values of the pixels on the left side of the block from the values on the right side, and dFdy subtracts the values of the bottom pixels from the top ones. The figure in the original post shows a grid representing the rendered screen pixels, with the dFdx and dFdy expressions for a generic value p evaluated by the fragment shader instance at screen coordinates (x, y) inside a 2×2 block highlighted in red.

Derivatives can be evaluated for every variable in a fragment shader. For vector and matrix types, derivatives are computed element-wise.

Derivative functions are fundamental to the implementation of texture mipmapping and are very useful in a range of algorithms and effects, in particular when there is some kind of dependence on screen-space coordinates (for example when rendering wireframe edges with uniform screen pixel thickness).
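
The uniform-thickness case can be sketched with fwidth, which the table defines as abs(ddx(x)) + abs(ddy(x)). The shader below draws grid lines on a surface and keeps them roughly one pixel wide regardless of distance. It is an illustrative sketch only: the entry point name, the grid density of 10 cells and the smoothstep thresholds are assumptions, not values from the quoted article.

```hlsl
// Hypothetical pixel shader: grid lines with near-constant on-screen width.
float4 PSMain(float2 uv : TEXCOORD0) : COLOR0
{
    // Scale uv so that there are 10 grid cells across the surface (assumed).
    float2 cell = uv * 10.0;

    // Distance to the nearest cell border, measured in cell units.
    float2 g = frac(cell);
    float2 d = min(g, 1.0 - g);

    // How much the cell coordinate changes from one pixel to the next.
    // Taken from the continuous value, not from frac(), which jumps at borders.
    float2 w = max(fwidth(cell), 1e-6);

    // Distance to the nearest border expressed in approximate pixels.
    float2 px = d / w;

    // 1 on the line, fading to 0 between 0.5 and 1.5 pixels away from it.
    float edge = 1.0 - smoothstep(0.5, 1.5, min(px.x, px.y));

    return float4(edge, edge, edge, 1.0);
}
```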

Derivatives and branches

Derivatives computation is based on the parallel execution on the GPU’s hardware of multiple instances of a shader. Scalar operations are executed with a SIMD (Single Instruction Multiple Data) architecture on registers containing a vector of 4 values for a block of 2×2 pixels. This means that at every step of execution, the shader instances belonging to each 2×2 block are synchronized making derivative computation fast and easy to implement in hardware, being a simple subtraction of values contained in the same register.

But what happens in the case of a conditional branch? In this case, if not all of the threads in a core take the same branch, there is a divergence in the code execution. The figure in the original post shows an example of divergence: a conditional branch executed in a GPU core with 8 shader instances. Three instances take the first branch (yellow). During the yellow branch execution the other 5 instances are inactive (an execution bitmask is used to activate and deactivate execution). After the yellow branch, the execution mask is inverted and the blue branch is executed by the remaining 5 instances.

In addition to the efficiency and performance loss caused by the branch, divergence breaks the synchronization between the pixels in a block, making derivative operations undefined. This is a problem for texture sampling, which needs derivatives for mipmap level selection, anisotropic filtering, etc. When facing such a problem, a shader compiler could flatten the branch (thus avoiding it) or try to rearrange the code, moving texture reads outside of the branch control flow. This problem can be avoided by using explicit derivatives or an explicit mipmap level when sampling a texture.
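
The last suggestion, sampling with explicit derivatives, might look like the following sketch. The derivatives of uv are computed before the branch, where all pixels of the 2×2 block are still in step, and are then passed to tex2Dgrad inside the branch. The sampler name, the mask input and the entry point name are hypothetical, and a Direct3D 9-style ps_3_0 shader is assumed.

```hlsl
// Hypothetical sampler bound by the application.
sampler2D DiffuseSampler;

float4 PSMain(float2 uv : TEXCOORD0, float mask : TEXCOORD1) : COLOR0
{
    // Take the derivatives in uniform control flow, outside any branch.
    float2 duvdx = ddx(uv);
    float2 duvdy = ddy(uv);

    float4 color = float4(0.0, 0.0, 0.0, 1.0);

    [branch]
    if (mask > 0.5)
    {
        // tex2Dgrad uses the supplied gradients for mip selection, so it
        // does not depend on neighbouring pixels having taken this branch.
        color = tex2Dgrad(DiffuseSampler, uv, duvdx, duvdy);
    }

    return color;
}
```

Sampling a fixed mip level with tex2Dlod is the other option the paragraph mentions; it avoids derivatives entirely at the cost of choosing the level yourself.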

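We know that during rasterization the GPU runs many fragment shader instances in parallel at the same time. It does not process pixels one by one, though: it groups them into blocks of 2x2 pixels and executes each block in parallel. The partial derivatives measure exactly the rate of change within such a block. As the figure in the original post shows, ddx is the value of the right pixel minus the value of the left pixel, and ddy is the value of the lower pixel minus the value of the upper pixel, where x and y are screen coordinates.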

Note: ddx and ddy can be applied to any variable in the fragment shader, including vectors and matrices.

ddx and ddy can be used to reconstruct normals. The reconstructed result, however, looks like this:

[Figure: normals reconstructed with ddx/ddy]

As you can see, the normals are not smooth. The reason is as follows:

Testing shows that ddx/ddy behave in subtly different ways in forward rendering and in deferred rendering.

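In forward rendering, the GPU automatically checks whether only part of a 2x2 pixel region falls inside the rasterized interpolation range covered by the triangle currently being drawn.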

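In deferred rendering (DR), when ddx/ddy are applied to a render target (that is, an NDC quad), this free "validity check" no longer works. Part of the 2x2 pixel region used to compute ddx/ddy may lie on triangle A of the model while another part lies on triangle B. In other words, the pixels that take part in the ddx/ddy computation may fall outside the interpolation range of a single triangle, which produces wrong ddx/ddy results and, in turn, artifacts on the model's edges. The problem is especially visible in DR when geometric normals are reconstructed from per-pixel world (or view) positions with normalize(cross(ddx(posW), ddy(posW))).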

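Summary: ddx/ddy are more compatible with forward rendering. When using ddx/ddy, always make sure that the 2x2 region lies within the rasterized range of a single triangle and does not straddle triangles. In deferred rendering the GPU does not guarantee this automatically, so unless some extra mechanism is introduced, it is better to avoid computing geometric normals with ddx/ddy.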
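
For reference, the normal reconstruction discussed here, normalize(cross(ddx(posW), ddy(posW))), can be sketched as a forward-rendered pixel shader like the one below. It assumes that the vertex shader passes the world-space position in TEXCOORD0; the entry point name and the color encoding are made up for the example, and the sign of the result depends on the handedness and winding conventions of the application.

```hlsl
// Hypothetical forward-rendering pixel shader: geometric normal from
// screen-space derivatives of the interpolated world position.
float4 PSMain(float3 posW : TEXCOORD0) : COLOR0
{
    // ddx/ddy give two vectors lying in the plane of the triangle.
    float3 dpdx = ddx(posW);
    float3 dpdy = ddy(posW);

    // Their cross product is perpendicular to that plane.
    // Swap the operands if the normal comes out pointing inwards.
    float3 n = normalize(cross(dpdx, dpdy));

    // Remap from [-1, 1] to [0, 1] so the normal can be shown as a color.
    return float4(n * 0.5 + 0.5, 1.0);
}
```

Because posW is interpolated linearly across a triangle, both derivatives are constant within it, so this yields one normal per face. That is consistent with the faceted, non-smooth look described above.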

Original blog links:

http://www.aclockworkberry.com/shader-derivative-functions/#footnote_3_1104

http://www.cnblogs.com/neoragex2002/p/4156702.html

Below are some periodic functions that come up often when writing shaders.

[Figure: periodic functions commonly used in shaders]
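
The original figure is not reproduced here, so the following is only a guess at the kind of functions meant: four common periodic waveforms built from the intrinsics in the table. The function names are hypothetical, and each wave has period 1 and output in [0, 1].

```hlsl
// Hypothetical periodic helpers, each with period 1 and range [0, 1].

float SawtoothWave(float x)
{
    // Ramps from 0 up to 1, then jumps back to 0.
    return frac(x);
}

float TriangleWave(float x)
{
    // Goes 1 -> 0 -> 1 linearly over one period.
    return abs(frac(x) * 2.0 - 1.0);
}

float SquareWave(float x)
{
    // 0 during the first half of the period, 1 during the second half.
    return step(0.5, frac(x));
}

float SineWave(float x)
{
    // Sine remapped from [-1, 1] to [0, 1]; 6.2831853 is 2 * pi.
    return sin(x * 6.2831853) * 0.5 + 0.5;
}
```

Feeding these a time value or a texture coordinate gives scrolling, pulsing or striped patterns without any texture lookup.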