R - How to repeat data frame processing 100x with new random numbers and plot removal
Question:
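I am a new R user and am trying to create multiple subsamples of a data frame. My data are divided into 4 strata (STRATUM = 1, 2, 3, 4), and I want to keep only a specified number of rows in each stratum. To do this, I import my data, sort it by stratum, and then assign a random number to each row. I want to keep my original random number assignments because I need to use them again in future analyses, so I save a .csv with these values. Next, I subset the data by stratum and specify the number of records I want to keep in each stratum. Finally, I rejoin the data and save it as a new .csv. The code works; however, I would like to repeat this process 100 times. In each iteration I want to save the .csv with the random numbers, as well as the final .csv of the randomly selected plots. I'm not sure how to make this code repeat 100 times, or how to give each iteration a unique file name. Any help would be greatly appreciated.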
DataFiles <- "//Documents/flownData_JR.csv"
PlotsFlown <- read.table (file = DataFiles, header = TRUE, sep = ",")
#Sort the data by the stratification
FlownStratSort <- PlotsFlown[order(PlotsFlown$STRATUM),]
#Create a new column with a random number (no duplicates)
FlownStratSort$RAND_NUM <- sample(nrow(FlownStratSort))  # a permutation of 1..nrow, so no duplicates
#Sort by the stratum, then random number
FLOWNRAND <- FlownStratSort[order(FlownStratSort$STRATUM,FlownStratSort$RAND_NUM),]
#Save a csv file with the random numbers
write.table(FLOWNRAND, file = "//Documents/RANDNUM1_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE)
#Subset the data by stratum
FLOWNRAND1 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='1'),]
FLOWNRAND2 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='2'),]
FLOWNRAND3 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='3'),]
FLOWNRAND4 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='4'),]
#Remove data from each stratum, specifying the number of records we want to retain
FLOWNRAND1 <- FLOWNRAND1[1:34, ]
FLOWNRAND2 <- FLOWNRAND2[1:21, ]
FLOWNRAND3 <- FLOWNRAND3[1:7, ]
FLOWNRAND4 <- FLOWNRAND4[1:7, ]
#Rejoin the data
FLOWNRAND_uneven <- rbind(FLOWNRAND1, FLOWNRAND2, FLOWNRAND3, FLOWNRAND4)
#Save the table with plots removed from each stratum flown in 2017
write.table(FLOWNRAND_uneven, file = "//Documents/Flown_RAND_uneven_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE)
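A minimal base-R sketch of the loop asked about above, repeating the question's steps once per iteration. It assumes the real data would be read from flownData_JR.csv as in the question; here a hypothetical 137-row toy data frame stands in for it, `subsample_once` and `keep_n` are hypothetical names, and files go to `tempdir()` only so the sketch runs anywhere. `sprintf("%03d", i)` zero-pads the iteration number (001 to 100), which gives each iteration's two CSV files a unique name.

```r
# Hypothetical toy data standing in for flownData_JR.csv (assumption):
# 137 plots across STRATUM 1-4, each stratum at least as large as what we keep
set.seed(42)
PlotsFlown <- data.frame(
  PLOT_ID = 1:137,
  STRATUM = rep(1:4, times = c(50, 40, 27, 20))
)

# Rows to keep per stratum, named by stratum value (hypothetical name)
keep_n <- c("1" = 34, "2" = 21, "3" = 7, "4" = 7)

# Output folder; replace tempdir() with your own path
out_dir <- tempdir()

# One iteration of the question's procedure (subsample_once is hypothetical)
subsample_once <- function(dat, keep_n, i, out_dir) {
  # Fresh random numbers 1..nrow with no duplicates
  dat$RAND_NUM <- sample(nrow(dat))
  # Sort by stratum, then by random number
  rand <- dat[order(dat$STRATUM, dat$RAND_NUM), ]
  # Save this iteration's random numbers, e.g. RANDNUM001_JR.csv
  write.csv(rand,
            file.path(out_dir, sprintf("RANDNUM%03d_JR.csv", i)),
            row.names = FALSE)
  # Keep the first keep_n[s] rows of each stratum s
  parts <- lapply(names(keep_n), function(s) {
    head(rand[rand$STRATUM == as.numeric(s), ], keep_n[[s]])
  })
  uneven <- do.call(rbind, parts)
  # Save this iteration's subsample, e.g. Flown_RAND_uneven001_JR.csv
  write.csv(uneven,
            file.path(out_dir, sprintf("Flown_RAND_uneven%03d_JR.csv", i)),
            row.names = FALSE)
  invisible(uneven)
}

# Repeat 100 times, each with new random numbers and new file names
for (i in 1:100) {
  subsample_once(PlotsFlown, keep_n, i, out_dir)
}
```

Because the random numbers are regenerated inside the function, every iteration draws a different subset, while the saved RANDNUM files preserve each assignment for later analyses.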
Answer:
Here is a data.table solution, if you only need to know which rows end up in each group.
library(data.table)
df <- data.table(dat = runif(100),
                 stratum = sample(1:4, 100, replace = TRUE))
# Gets the specified number of rows randomly from each stratum
get_strata <- function(df, n, i){
  # Randomly pick n row indices (.I) within each stratum, then subset;
  # replace stratum with your grouping column name
  # (sample() errors if a stratum has fewer than n rows)
  f <- df[df[, .I[sample(.N, n)], by = stratum]$V1]
  # Save as CSV with a numbered file name; replace path
  write.csv(f, file = paste0("path/df_", i, ".csv"),
            row.names = FALSE)
}
for (i in 1:100){
# replace 10 with number needed
get_strata(df, 10, i)
}
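As the end result I want a single .csv with all of my original data columns, but with only 34 rows from stratum 1, 21 rows from stratum 2, 7 rows from stratum 3, and 7 rows from stratum 4. I want the rows in each stratum to be randomly selected, so that each repetition gives me a different subset of rows within the strata. I want to repeat this process 100 times, generating 100 .csv files.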
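The answer's `get_strata()` uses one `n` for every stratum, whereas the comment asks for 34, 21, 7 and 7 rows. One way to adapt the same data.table idiom is to look up each stratum's count from a named vector inside `j` via `.BY`. This is a sketch under assumptions: `get_strata_n`, `keep_n` and the toy data (sized so every stratum has at least as many rows as are kept) are hypothetical, and `fwrite()` is swapped in for `write.csv()`.

```r
library(data.table)

# Hypothetical toy data; real data could be read with fread() (assumption)
set.seed(1)
df <- data.table(dat = runif(137),
                 stratum = rep(1:4, times = c(50, 40, 27, 20)))

# Rows to keep per stratum, named by stratum value (hypothetical name)
keep_n <- c("1" = 34, "2" = 21, "3" = 7, "4" = 7)
out_dir <- tempdir()  # replace with your own path

# Hypothetical variant of get_strata() with a per-stratum sample size
get_strata_n <- function(df, keep_n, i, out_dir) {
  # .BY$stratum is the current group's value; use it to look up its count
  idx <- df[, .I[sample(.N, keep_n[[as.character(.BY$stratum)]])],
            by = stratum]$V1
  f <- df[idx]
  # Numbered output file, e.g. df_001.csv
  fwrite(f, file.path(out_dir, sprintf("df_%03d.csv", i)))
  invisible(f)
}

# 100 repetitions, each a fresh random draw within every stratum
for (i in 1:100) {
  get_strata_n(df, keep_n, i, out_dir)
}
```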