计算64×64 int产品的高64位C

问题描述：

我希望我的C函数有效地计算两个64位带符号整数的乘积的高64位。我知道如何在x86-64程序集中做到这一点，并使用imulq并将结果从％rdx中提取出来。但是我完全不知道如何用C编写它，更不用说哄编译器来高效地完成它了。计算64×64 int产品的高64位C

有没有人有任何建议在C写这个？这是对性能敏感的，所以“手动方法”（如俄罗斯农民或者bignum库）已经不在了。

这傻乎乎的内联汇编函数我写的作品和大致经过我的代码生成：

static long mull_hi(long inp1, long inp2) { 
    long output = -1; 
    __asm__("movq %[inp1], %%rax;" 
      "imulq %[inp2];" 
      "movq %%rdx, %[output];" 
      : [output] "=r" (output) 
      : [inp1] "r" (inp1), [inp2] "r" (inp2) 
      :"%rax", "%rdx"); 
    return output; 
}

答

一般的答案是x * y可以分解成(a + b) * (c + d)，其中a和c是高阶部分。

一是扩大到ac + ad + bc + bd

现在，你乘的条款存储为long long 32位数字（或更好，但uint64_t），你只要记住，当你乘更高的订单号码，您需要缩放32位。然后你做了补充，记住要检测进位。跟踪标志。当然，你需要做一些补充。

我喜欢用因子h。这给出（ha + b）*（hc + d）= hhac + had + hbc + bd。 'h'基本上是一种跟踪32位的方式。每个术语都需要64位（忽略h因子），给出32位格雷克斯，但是（2^n）-1 *（2^n）-1 =（2^2n）-2（2^n）+ 1，这是 Steve314 2009-10-09 02:38:50

Steve314：你以前做过！好点。我昨晚输入了一个实现，并将其作为新的答案发送。 – DigitalRoss 2009-10-09 22:00:11

答

等待，你有一个完美的优化组装解决方案，已经为此工作了，并且你想退出并尝试将它写入这个不支持128位数学的环境？我没有跟随。

正如你所知道的，这个操作是一个关于 x86-64的单指令。显然，你所做的任何事情都不会让它工作得更好。如果你真的想要便携式C，你需要做一些类似于 DigitalRoss的代码，并希望你的优化器能够找出你正在做什么。

如果您需要架构便携性，而且愿意自己限制到GCC平台，有__int128_t（和__uint128_t）类型的编译器内在会做你想要什么。

答

如果你在x86_64使用相对较新的GCC：

int64_t mulHi(int64_t x, int64_t y) { 
    return (int64_t)((__int128_t)x*y >> 64); 
}

在-O1及更高版本，编译到你想要的东西：

_mulHi: 
0000000000000000 movq %rsi,%rax 
0000000000000003 imulq %rdi 
0000000000000006 movq %rdx,%rax 
0000000000000009 ret

我相信铛和VC++也有__int128_t类型的支持，所以这也应该在这些平台上工作，通常有关于自己尝试的警告。

答

关于您的组装解决方案，请勿硬编码mov说明！让编译器为你做。这是你的代码的修改版本：

static long mull_hi(long inp1, long inp2) { 
    long output; 
    __asm__("imulq %2" 
      : "=d" (output) 
      : "a" (inp1), "r" (inp2)); 
    return output; 
}

有益的参考：Machine Constraints

答

既然你做了很好的工作解决了与机器代码你自己的问题，我想你应该得到一些帮助，便携式版本。如果在x86上的gnu中，我会在ifdef中只使用程序集。

无论如何，这里是一个实现...我敢肯定这是正确的，但没有保证，昨晚我刚刚敲出了这个......你可能应该摆脱静态positive_result []和result_negative，这些只是我的单元测试的文物......

#include <stdlib.h> 
#include <stdio.h> 

// stdarg.h doesn't help much here because we need to call llabs() 

typedef unsigned long long uint64_t; 
typedef signed long long int64_t; 

#define B32 0xffffffffUL 

static uint64_t positive_result[2]; // used for testing 
static int result_negative;   // used for testing 

static void mixed(uint64_t *result, uint64_t innerTerm) 
{ 
    // the high part of innerTerm is actually the easy part 

    result[1] += innerTerm >> 32; 

    // the low order a*d might carry out of the low order result 

    uint64_t was = result[0]; 

    result[0] += (innerTerm & B32) << 32; 

    if (result[0] < was) // carry! 
     ++result[1]; 
} 


static uint64_t negate(uint64_t *result) 
{ 
    uint64_t t = result[0] = ~result[0]; 
    result[1] = ~result[1]; 
    if (++result[0] < t) 
    ++result[1]; 
    return result[1]; 
} 

uint64_t higherMul(int64_t sx, int64_t sy) 
{ 
    uint64_t x, y, result[2] = { 0 }, a, b, c, d; 

    x = (uint64_t)llabs(sx); 
    y = (uint64_t)llabs(sy); 

    a = x >> 32; 
    b = x & B32; 
    c = y >> 32; 
    d = y & B32; 

    // the highest and lowest order terms are easy 

    result[1] = a * c; 
    result[0] = b * d; 

    // now have the mixed terms ad + bc to worry about 

    mixed(result, a * d); 
    mixed(result, b * c); 

    // now deal with the sign 

    positive_result[0] = result[0]; 
    positive_result[1] = result[1]; 
    result_negative = sx < 0^sy < 0; 
    return result_negative ? negate(result) : result[1]; 
}

计算64×64 int产品的高64位C

相关推荐