How to optimize a nested for loop in R

Question:

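I want to optimize this nested for loop, which takes the minimum of 2 numbers and then adds the result to a data frame. I was able to cut the run time considerably through vectorization and pre-allocation, but I'm not sure how to apply that logic to a nested for loop. Is there a quick way to make this run faster? It is currently sitting at over 5 hours of run time.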

"simulation" has 100K values, and "limits" has 5427 values.


output <- data.frame(matrix(nrow = nrow(simulation),ncol = nrow(limits))) 
res <- character(nrow(simulation)) 

for(i in 1:nrow(limits)){ 
    for(j in 1:nrow(simulation)){ 
     res[j] <- min(limits[i,1],simulation[j,1]) 
    } 
    output[,i] <- res 
}

Edit:

dput(head(simulation)) 
    structure(list(simulation = c(124786.7479,269057.2118,80432.47896,119513.0161,660840.5843,190983.7893)), .Names = "simulation", row.names = c(NA,6L), class = "data.frame") 

dput(head(limits)) 
    structure(list(limits = c(5000L,10000L,15000L,20000L,25000L,30000L)), .Names = "limits", row.names = c(NA, 6L), class = "data.frame")

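Have a look at the 'apply' family; I think 'lapply' would work in your case. It can effectively replace 'for' and run faster (or so I have found, and read that others have found). Also, could we get a dput(head(simulation)) and a dput(head(limits)), so we can see the structure of the data? If your code is fully vectorized, 'sapply' might do the job (though I'm not great with it). – Badger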
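As a rough illustration of the vectorization idea raised in this comment (a sketch, not code posted by anyone in the thread), the inner loop over `simulation` can be replaced by a single `pmin` call, leaving `sapply` to iterate over the limits only. The data reuses the `dput` samples from this page. Whether this is fast enough at full size is an open question, since the full result would still hold about 100K * 5427 doubles (roughly 4.3 GB).

```r
# Sample data taken from the dput() output on this page
simulation <- data.frame(simulation = c(124786.7479, 269057.2118, 80432.47896,
                                        119513.0161, 660840.5843, 190983.7893))
limits <- data.frame(limits = c(5000L, 10000L, 15000L, 20000L, 25000L, 30000L))

# pmin() is vectorized, so one call handles every simulation value for a limit.
# sapply() binds the per-limit vectors into a numeric matrix:
# rows = simulation values, columns = limits.
output <- sapply(limits[[1]], function(lim) pmin(lim, simulation[[1]]))

dim(output)
```

Unlike the original loop, which stores results in a `character` vector, this keeps the result numeric.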

You are doing 542 million calculations. What exactly are you going to do with the resulting output matrix? – thelatemail

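@thelatemail Calculating limited variance/standard deviation. We are developing a complex distribution, and there is no good formula for computing the theoretical values, so we are using simulation. –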

Answer

If you have > 15GB of RAM (~100K * 5500 * 8 bytes per number * 3 (result + outer x vals + outer y vals)), you can try:

outer(simulation[[1]], limits[[1]], pmin)

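In reality, though, you will probably need more than 15GB, because I think pmin makes additional copies. If you don't have the memory, you will have to break up the problem (e.g. rely on code that writes one column, or a few columns, at a time).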
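One way to break up the problem, sketched here under assumptions: if only a per-limit summary is needed (the comments mention a limited standard deviation), each block of limits can be reduced as soon as it is computed, so the full matrix never exists in memory. The function name `capped_sd_by_block` and the `block_size` default are hypothetical choices for illustration, not part of the answer.

```r
# Hypothetical helper: standard deviation of pmin(sim, limit) for each limit,
# computed one block of limits at a time to bound peak memory.
capped_sd_by_block <- function(sim, lim, block_size = 100L) {
  if (length(lim) == 0L) return(numeric(0))
  out <- numeric(length(lim))
  starts <- seq(1L, length(lim), by = block_size)
  for (s in starts) {
    idx <- s:min(s + block_size - 1L, length(lim))
    # length(sim) x length(idx) matrix of capped values for this block only
    block <- outer(sim, lim[idx], pmin)
    # Reduce each column to a single number, then let the block be discarded
    out[idx] <- apply(block, 2, sd)
  }
  out
}

# Small usage example with made-up data (not the asker's data)
set.seed(1)
sim <- runif(1000, min = 0, max = 1e5)
lim <- seq(5000, 30000, by = 5000)
res <- capped_sd_by_block(sim, lim, block_size = 4L)
```

With 100K simulation values, a block of 100 limits is about 80 MB per matrix-sized object, and `outer` holds a few such objects at once. The block size trades memory against the number of loop iterations.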

Answer

Basically, when you have a double loop, using Rcpp is often useful.

In addition, I would use the package bigstatsr to save you some RAM. It lets you create and access matrices that are stored on disk.

So, you could do this:

simulation <- structure(list(simulation = c(124786.7479,269057.2118,80432.47896,119513.0161,660840.5843,190983.7893)), .Names = "simulation", row.names = c(NA,6L), class = "data.frame") 
limits <- structure(list(limits = c(5000L,10000L,15000L, 20000L,25000L,30000L)), .Names = "limits", row.names = c(NA, 6L), class = "data.frame") 

library(bigstatsr) 
# Create the filebacked matrix on disk (in `/tmp/` by default) 
mat <- FBM(nrow(simulation), nrow(limits)) 
# Fill this matrix in Rcpp 
Rcpp::sourceCpp('fill-FBM.cpp') 
fillMat(mat, limits[[1]], simulation[[1]]) 
# Access the whole matrix in RAM to verify 
# or you could access only block of columns 
mat[] 
mat[, 1:3]

where 'fill-FBM.cpp' is

// [[Rcpp::depends(bigstatsr, BH)]] 
#include <bigstatsr/BMAcc.h> 
#include <Rcpp.h> 
using namespace Rcpp; 


// [[Rcpp::export]] 
void fillMat(Environment BM, 
      const NumericVector& limits, 
      const NumericVector& simulation) { 

    XPtr<FBM> xpBM = BM["address"]; 
    BMAcc<double> macc(xpBM); 

    int n = macc.nrow(); 
    int m = macc.ncol(); 

    for (int i = 0; i < m; i++) 
    for (int j = 0; j < n; j++) 
     macc(j, i) = std::min(limits[i], simulation[j]); 
}
