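Extracting values from rows in R
Question:
I have a data frame with a large number of columns. Every row contains a bunch of -1 values, and I only want to keep the values in each row that are not -1. For example, if my data is: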
A1 A2 A3 A4 A5
-1 -1 2 -1 6
2 -1 -1 -1 -1
4 -1 -1 -1 3
6 5 -1 2 2
The output I want takes all values other than -1 from each row and puts them into new variables, like this:
V1 V2 V3 V4
2 6
2
4 3
6 5 2 2
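Rows 1 and 3 each have two values that are not -1, so those two values move into V1 and V2, and V3 and V4 are left empty. Row 2 has one such value, which goes into V1, so V2, V3 and V4 are empty for that row. Row 4 has four values that are not -1, so they fill the new variables V1 through V4.
Answer
It looks like we can do this with apply: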
Filter(function(x) !all(is.na(x)), as.data.frame(t(apply(df1, 1,
function(x) c(x[x!= -1], rep(NA, sum(x == -1)))))))
# V1 V2 V3 V4
#1 2 6 NA NA
#2 2 NA NA NA
#3 4 3 NA NA
#4 6 5 2 2
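The one-liner above assumes a data frame named df1 already exists. A minimal self-contained sketch of the same idea, split into steps, might look like the following; the helper name shift_left and the intermediate names shifted and result are made up here for illustration and are not part of the original answer.

```r
# Example data from the question, stored under the name df1 used above
df1 <- read.table(text = "A1 A2 A3 A4 A5
-1 -1 2 -1 6
2 -1 -1 -1 -1
4 -1 -1 -1 3
6 5 -1 2 2", header = TRUE)

# Hypothetical helper: keep the values that are not -1 (in their original
# order), then pad with one NA per dropped -1 so every row keeps length 5
shift_left <- function(x) c(x[x != -1], rep(NA, sum(x == -1)))

# apply() over rows returns one column per input row, so transpose back
shifted <- as.data.frame(t(apply(df1, 1, shift_left)))

# Drop the columns that are NA in every row
result <- Filter(function(col) !all(is.na(col)), shifted)
print(result)
```

Because every row is padded back to the same length, apply() returns a matrix here, and Filter() is only needed to trim the trailing all-NA columns.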
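Answer
dt2 is the final output. Note that sort() reorders the values within each row, so rows 3 and 4 come out as 3 4 and 2 2 5 6 rather than in their original column order.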
# Create example data frame
dt <- read.table(text = "A1 A2 A3 A4 A5
-1 -1 2 -1 6
2 -1 -1 -1 -1
4 -1 -1 -1 3
6 5 -1 2 2",
header = TRUE)
# Replace -1 with NA
dt[dt == -1] <- NA
# Sort each row (sort() drops NA values); rows then differ in length, so apply() returns a list
dt_list <- apply(dt, 1, sort)
# Find the largest number of non-NA values in any row
max_len <- max(sapply(dt_list, length))
# Add NA based on the length of each row
dt_list2 <- lapply(dt_list, function(x){
if (length(x) < max_len){
x <- c(x, rep(NA, max_len - length(x)))
}
return(x)
})
# Combine all rows, create a new data frame
dt2 <- as.data.frame(do.call(rbind, dt_list2))
# Change the column name
colnames(dt2) <- paste0("V", 1:ncol(dt2))
dt2
# V1 V2 V3 V4
# 1 2 6 NA NA
# 2 2 NA NA NA
# 3 3 4 NA NA
# 4 2 2 5 6
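Rows 3 and 4 of this output are in ascending order (3 4 and 2 2 5 6) rather than the column order the question asked for (4 3 and 6 5 2 2), because sort() reorders the values. If the original order matters, one possible variant is to drop the NA values without sorting. The sketch below is an adaptation, not the answerer's code; the names kept and padded are made up for illustration.

```r
# Same example data as in the answer above
dt <- read.table(text = "A1 A2 A3 A4 A5
-1 -1 2 -1 6
2 -1 -1 -1 -1
4 -1 -1 -1 3
6 5 -1 2 2", header = TRUE)

# Replace -1 with NA
dt[dt == -1] <- NA

# Keep the non-NA values of each row in their original column order.
# lapply() over row indices always returns a list, even if every row
# happens to keep the same number of values.
kept <- lapply(seq_len(nrow(dt)), function(i) {
  row <- unlist(dt[i, ], use.names = FALSE)
  row[!is.na(row)]
})

# Pad every row with NA up to the longest row
max_len <- max(lengths(kept))
padded <- lapply(kept, function(x) {
  length(x) <- max_len  # extending a vector's length fills with NA
  x
})

# Combine the rows into a new data frame and name the columns
dt2 <- as.data.frame(do.call(rbind, padded))
colnames(dt2) <- paste0("V", seq_len(ncol(dt2)))
print(dt2)
```

Using lapply() instead of apply() also sidesteps a corner case: when all rows keep the same number of values, apply(dt, 1, sort) returns a matrix rather than a list, and the later sapply(dt_list, length) step would then no longer measure row lengths.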
Answer
con <- textConnection("
A1 A2 A3 A4 A5
-1 -1 2 -1 6
2 -1 -1 -1 -1
4 -1 -1 -1 3
6 5 -1 2 2")
df <- read.delim(con, sep = " ")
df2 <- df
df2[,] <- ""
m <- 0
for(i in 1:nrow(df)) {
x <- df[i,][df[i,] != -1]
df2[i,1:length(x)] <- x
m <- max(m, length(x))
}
df2 <- df2[, 1:m]
colnames(df2) <- paste0("V", 1:m)
df2
# V1 V2 V3 V4
# 1 2 6
# 2 2
# 3 4 3
# 4 6 5 2 2