循环矢量化给出了不同的答案

问题描述：

我正在构建一些单元测试，并发现我的代码在向量化时给出了略微不同的结果。在下面的示例中，数组a在一个维中求和并添加到初始值x。 a的大多数元素太小而无法更改x。代码是：循环矢量化给出了不同的答案

module datamod 
    use ISO_FORTRAN_ENV, only : dp => REAL64 
    implicit none 

    ! -- Array dimensions are large enough for gfortran to vectorize 
    integer, parameter :: N = 6 
    integer, parameter :: M = 10 
    real(dp) :: x(N), a(N,M) 

contains 
subroutine init_ax 
    ! -- Set a and x so the issue manifests 

    x = 0. 
    x(1) = 0.1e+03_dp 

    a = 0. 
    ! -- Each negative component is too small to individually change x(1) 
    ! -- But the positive component is just big enough 
    a( 1, 1) = -0.4e-14_dp 
    a( 1, 2) = -0.4e-14_dp 
    a( 1, 3) = -0.4e-14_dp 
    a( 1, 4) = 0.8e-14_dp 
    a( 1, 5) = -0.4e-14_dp 
end subroutine init_ax 
end module datamod 

program main 
    use datamod, only : a, x, N, M, init_ax 
    implicit none 
    integer :: i, j 

    call init_ax 

    ! -- The loop in question 
    do i=1,N 
     do j=1,M 
     x(i) = x(i) + a(i,j) 
     enddo 
    enddo 

    write(*,'(a,e26.18)') 'x(1) is: ', x(1) 
end program main

该代码在不带循环向量化的gfortran中给出以下结果。请注意0包含在-O3中，所以在使用-O3时也会出现问题。

mach5% gfortran -O2 main.f90 && ./a.out 
x(1) is: 0.100000000000000014E+03 
mach5% gfortran -O2 -ftree-vectorize main.f90 && ./a.out 
x(1) is: 0.999999999999999858E+02

我知道某些编译器选项可以更改答案，如-fassociative-math。但是，根据gcc optimization options页面，这些标准-O3优化包中没有包含这些标准。

在我看来，就好像向量化代码首先将a的所有组件相加，然后将其添加到x。但是，这是不正确的，因为编写的代码需要将a的每个组件添加到x。

这是怎么回事？可能循环矢量化在某些情况下改变了答案？ Gfortran版本4.7和5.3有同样的问题，但是Intel 16.0和PGI 15.10没有。

不太确定我在这里看到你的问题。绝大多数编译器优化是重新排序指令，以便更好地适应机器，同时保持程序在数学上（但不一定是数值上）等价。所以只是因为你按顺序编写它并不意味着计算机必须按照这个顺序执行它。语言允许这样做，你为什么期望它不是这样？ –

我不认为''gfortran'和'-O3'应该是这种情况。根据问题中包含的链接，必须明确启用所有控制浮点运算行为的编译器选项。如果我错了，所有的优化都可以做到这一点，那么我会对'gfortran'的选项感兴趣，就像英特尔的'-fp-model precise'一样，它要求优化不会改变最终结果。 – Ross

答

我复制了你提供的代码（到一个名为test.f90的文件），然后使用gfortran4.8.5版本编译和运行它。我发现-O2和-O2 -ftree-vectorize选项的结果不同，就像您的结果不同。但是，当我简单地使用-O3时，我发现结果匹配-O2。

$ gfortran --version 
GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11) 
Copyright (C) 2015 Free Software Foundation, Inc. 

GNU Fortran comes with NO WARRANTY, to the extent permitted by law. 
You may redistribute copies of GNU Fortran 
under the terms of the GNU General Public License. 
For more information about these matters, see the file named COPYING 

$ gfortran -O2 test.f90 && ./a.out 
x(1) is: 0.100000000000000014E+03 
$ gfortran -O2 -ftree-vectorize test.f90 && ./a.out 
x(1) is: 0.999999999999999858E+02 
$ gfortran -O3 test.f90 && ./a.out 
x(1) is: 0.100000000000000014E+03

循环矢量化给出了不同的答案

相关推荐