How to classify each instance of a data frame of normalized vectors into quantiles (e.g. 0, 0.25, 0.5, 0.75, 1)?
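Problem description:
I have a data frame with 20 variables and 400k instances. All variables are normalized to mean 0 and standard deviation 1. I want to write a function that classifies each instance of each variable into a quantile.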
Let's say we have a normalized vector
a <- c(0.2132821, -1.5136988, 0.6450274, 1.5085178, 0.2132821, 1.5085178, 0.6450274)
And the quantiles for this vector are
quant.a <- c(-1.5136988, -1.0819535, 0.2132821, 1.0767726, 1.5085178)
where -1.5136988 is 0%
-1.0819535 is 25%
0.2132821 is 50%
1.0767726 is 75%
1.5085178 is 100% (all are elements in vector 'quant.a')
Now, I want to classify each element of vector 'a' as follows
new.a <- c(0.5, 0, 0.75, 1, 0.5, 1, 0.75)
You can use the following code to work through the example, as I cannot share the actual data.
# Generate random data
set.seed(99)
# All variables are meant to be on a 1-9 scale (floor() of runif(min = 1, max = 9) yields integers 1 to 8)
a <- floor(runif(500, min = 1, max = 9))
b <- floor(runif(500, min = 1, max = 9))
c <- floor(runif(500, min = 1, max = 9))
# store variables as dataframe
x <- data.frame(cbind(a,b,c))
# Scale variables
scaled.dat <- data.frame(scale(x))
# check that we get mean of 0 and sd of 1
colMeans(scaled.dat)
apply(scaled.dat, 2, sd)
# generate quantiles for each variable
quantiles <- data.frame(apply(scaled.dat,2,quantile))
Thanks in advance.
Answer
library(dplyr)
# ntile(., 4) assigns each value to bucket 1..4; dividing by 4 gives 0.25..1
yourdataframe %>%
  mutate_all(funs(ntile(., 4) / 4))
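For readers on a recent dplyr: `funs()` has since been deprecated and `mutate_all()` superseded. Here is a minimal sketch of the same idea with `across()`, assuming dplyr 1.0 or later and a cut-down version of the simulated data from the question (the name `binned` is only for illustration). Note that `ntile(., 4) / 4` yields 0.25, 0.5, 0.75 and 1, never 0.

```r
library(dplyr)

# Simulated data along the lines of the question (two columns for brevity)
set.seed(99)
scaled.dat <- data.frame(scale(data.frame(
  a = floor(runif(500, min = 1, max = 9)),
  b = floor(runif(500, min = 1, max = 9))
)))

# ntile(.x, 4) assigns bucket numbers 1..4, so dividing by 4 gives 0.25..1
binned <- scaled.dat %>%
  mutate(across(everything(), ~ ntile(.x, 4) / 4))

# Count how many rows landed in each bucket for column a
table(binned$a)
```

Because `ntile()` splits by rank into roughly equal-sized buckets, tied values can end up in different buckets, which matters for discrete data like this.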
Answer
a <- c(0.2132821, -1.5136988, 0.6450274, 1.5085178, 0.2132821, 1.5085178, 0.6450274)
quant.a = quantile(a)
# index of the quantile interval each value falls into
aux_matrix = findInterval(a, quant.a)
# map interval indices to quantile labels
new.a = ifelse(aux_matrix == 1 | aux_matrix == 0, 0,
        ifelse(aux_matrix == 2, 0.5,
        ifelse(aux_matrix == 3, 0.75,
               1)))
print(new.a)
[1] 0.50 0.00 0.75 1.00 0.50 1.00 0.75
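The nested `ifelse` above hard-codes the labels and never returns 0.25. Here is a minimal base-R sketch of a more general variant. It assumes the goal is to label each value with the smallest quantile level whose quantile the value does not exceed, which is the rule the question's example seems to imply. `to_quantile_level` and `quant.levels` are hypothetical names, and the data is the simulated data from the question.

```r
# Hypothetical helper (not from the answers in this thread): label each value
# with the smallest level in `probs` whose quantile is >= the value.
# Assumes `probs` is sorted in increasing order.
to_quantile_level <- function(x, probs = c(0, 0.25, 0.5, 0.75, 1)) {
  q <- quantile(x, probs = probs, names = FALSE, na.rm = TRUE)
  # left.open = TRUE counts the quantiles strictly below each value, so the
  # minimum maps to probs[1] and the index never exceeds length(probs).
  probs[findInterval(x, q, left.open = TRUE) + 1]
}

# Simulated data, as in the question
set.seed(99)
x <- data.frame(a = floor(runif(500, min = 1, max = 9)),
                b = floor(runif(500, min = 1, max = 9)),
                c = floor(runif(500, min = 1, max = 9)))
scaled.dat <- data.frame(scale(x))

# Apply column by column; the result has the same shape as scaled.dat
quant.levels <- as.data.frame(lapply(scaled.dat, to_quantile_level))
head(quant.levels)
```

With the quantiles exactly as listed in the question, `c(0, 0.25, 0.5, 0.75, 1)[findInterval(a, quant.a, left.open = TRUE) + 1]` reproduces the requested `new.a`. Note that `quantile(a)` computed by R on that seven-element vector gives different quartiles than the ones listed, so the helper applied directly to `a` will label it differently.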
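Hey, Brian! That's a good way to find the i-th quantile (e.g. quantile 1, quantile 2, and so on). But I was looking for output in the form (0, .25, .5, .75, 1). Thanks anyway. Always good to learn something new. – Nikhil
@Nikhil, you just need to include '/ 4' to get that representation. If you want more explicit labels, you could use 'percent_rank' instead, and then call 'mutate_all(funs(cut(., 0:4/4)))' again. – Brian
Thank you very much! It works well! – Nikhil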
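Brian's `percent_rank` plus `cut` suggestion from the comments could look roughly like the sketch below. It assumes dplyr 1.0 or later (hence `across()` rather than `mutate_all(funs(...))`), and the name `labelled` is only for illustration. The `include.lowest = TRUE` argument is an addition not mentioned in the comment: without it, the minimum of each column has percent rank 0, falls outside the first interval `(0, 0.25]`, and comes back as `NA`.

```r
library(dplyr)

# Simulated data along the lines of the question (two columns for brevity)
set.seed(99)
scaled.dat <- data.frame(scale(data.frame(
  a = floor(runif(500, min = 1, max = 9)),
  b = floor(runif(500, min = 1, max = 9))
)))

# percent_rank() rescales ranks to [0, 1]; cut() then bins them into quarters
# and returns a factor with interval labels.
labelled <- scaled.dat %>%
  mutate(across(everything(),
                ~ cut(percent_rank(.x), breaks = 0:4 / 4, include.lowest = TRUE)))

# Inspect the interval labels produced for column a
levels(labelled$a)
```

The columns of `labelled` are factors, which is convenient for tabulating or plotting but needs converting if numeric levels such as 0.25 are required.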