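1. The summary() function returns descriptive statistics

For numeric variables it reports the minimum, maximum, quartiles, and mean; for factors and logical vectors it reports frequency counts.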
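A minimal sketch of summary() on a small, made-up data frame (the column names height, group and smoker are hypothetical). It mixes a numeric, a factor and a logical column so all three kinds of output appear:

```r
# Hypothetical toy data: one numeric, one factor and one logical column
df <- data.frame(
  height = c(1.62, 1.75, 1.80, 1.68),
  group  = factor(c("a", "b", "a", "b")),
  smoker = c(TRUE, FALSE, FALSE, TRUE)
)

# Numeric column: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
# Factor column: count per level; logical column: counts of FALSE/TRUE
summary(df)

# summary() also works on a single vector
s <- summary(df$height)
s["Mean"]
```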

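2. The describe() function in the Hmisc package

Returns the number of variables and observations, the number of missing and unique values, the mean, quantiles, and the five smallest and five largest values.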
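A sketch of calling it on the built-in mtcars data. It assumes the Hmisc package is installed and is wrapped in requireNamespace() so it is skipped cleanly otherwise:

```r
# Requires the Hmisc package: install.packages("Hmisc")
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # The Hmisc:: prefix avoids a clash with psych::describe(): both packages
  # export describe(), and whichever is attached last masks the other
  d <- Hmisc::describe(mtcars[, c("mpg", "cyl")])
  print(d)  # per-variable counts, mean, quantiles and extreme values
}
```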

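3. The describe() function in the psych package

The psych package also has a function named describe(). It computes the number of non-missing values, mean, standard deviation, median, trimmed mean, median absolute deviation, minimum, maximum, range, skewness, kurtosis, and the standard error of the mean.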
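A sketch on the built-in mtcars data, assuming psych is installed; the guard skips it otherwise. The trim argument shown is psych's default proportion for the trimmed mean:

```r
# Requires the psych package: install.packages("psych")
if (requireNamespace("psych", quietly = TRUE)) {
  # trim = 0.1 is the default proportion cut from each end for the trimmed mean
  d <- psych::describe(mtcars[, c("mpg", "hp")], trim = 0.1)
  print(d)  # one row per variable: n, mean, sd, median, trimmed, mad, ...
}
```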

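4. The stat.desc() function in the pastecs package

Computes a wide variety of descriptive statistics. Usage: stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)
Here x is a data frame or time series. If basic=TRUE (the default), it computes the number of values, null (zero) values and missing values, plus the minimum, maximum, range and sum. If desc=TRUE (also the default), it computes the median, mean, standard error of the mean, the confidence interval for the mean at confidence level p (95% by default), variance, standard deviation and coefficient of variation. Finally, if norm=TRUE (not the default), it also returns normal-distribution statistics: skewness and kurtosis (with their statistical significance) and the Shapiro–Wilk normality test.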
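A sketch with all three groups switched on, using the built-in mtcars data. It assumes pastecs is installed and is skipped otherwise:

```r
# Requires the pastecs package: install.packages("pastecs")
if (requireNamespace("pastecs", quietly = TRUE)) {
  res <- pastecs::stat.desc(mtcars[, c("mpg", "hp")],
                            basic = TRUE, desc = TRUE,
                            norm = TRUE,  # adds skewness, kurtosis, Shapiro-Wilk
                            p = 0.95)     # confidence level for the mean's CI
  print(round(res, 3))  # one column per variable, one row per statistic
}
```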

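5. The str() function

Compactly displays an object's structure and contents; for a data frame it shows each variable's type along with its first few values.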
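A short sketch of str() on a hypothetical nested list (the element names id, group and data are made up) and on part of the built-in mtcars data frame:

```r
# Hypothetical nested object: a vector, a factor and a data frame in one list
x <- list(
  id    = 1:3,
  group = factor(c("a", "b", "a")),
  data  = data.frame(v = c(0.5, 1.5, 2.5))
)
str(x)              # one line per element: type, length or levels, first values
str(mtcars[, 1:3])  # for a data frame: one line per column
```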

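6. The attributes() function

Extracts all of an object's attributes other than its length and mode (for example names, dim, dimnames, class and levels).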
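A sketch showing what attributes() returns for a matrix, a factor and a plain vector; the object names m and f are arbitrary:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
attributes(m)    # $dim and $dimnames

f <- factor(c("lo", "hi", "lo"))
attributes(f)    # $levels and $class

attributes(1:3)  # NULL: a plain vector carries no attributes

# Length and mode are intrinsic properties, not attributes
length(m)
mode(m)
```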

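7. The aggregate() function

Applies a single function per call, so it is best suited to single-value summaries such as the mean or standard deviation; unlike describe() or stat.desc(), it cannot compute a set of different statistics in one call. A function that returns several values, such as summary in the example later in this post, is stored as a matrix column in the result.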
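Because each aggregate() call applies one function, one way to get several statistics per group is to make one call per statistic and merge the results. A sketch using the built-in mtcars data; the suffixes are an arbitrary choice:

```r
# One call per statistic: group means and standard deviations by cylinder count
means <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
sds   <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = sd)

# Join on the grouping column; the suffixes tell the two statistics apart
both <- merge(means, sds, by = "cyl", suffixes = c(".mean", ".sd"))
both
```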

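8. The by() function

Usage: by(data, INDICES, FUN), where data is a data frame or matrix, INDICES is a factor or a list of factors that defines the groups, and FUN is any function. Because FUN receives each group's whole subset, it can return several statistics at once.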
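A sketch of by() returning several statistics per group on the built-in mtcars data. The helper col_stats is hypothetical, written only for this illustration:

```r
# Hypothetical helper: mean and sd for every column of the subset it receives
col_stats <- function(df) sapply(df, function(v) c(mean = mean(v), sd = sd(v)))

# INDICES = mtcars$am splits the rows into automatic (0) and manual (1) groups
res <- by(mtcars[, c("mpg", "hp")], mtcars$am, col_stats)
res
```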

9. The summaryBy() function in the doBy package
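A sketch of summaryBy() with a formula interface on the built-in mtcars data. It assumes doBy is installed, and mystats is a hypothetical helper that returns a named vector:

```r
# Hypothetical helper: summaryBy() combines the variable name with these names
# when naming the output columns
mystats <- function(x) c(mean = mean(x), sd = sd(x))

# Requires the doBy package: install.packages("doBy")
if (requireNamespace("doBy", quietly = TRUE)) {
  # Left of ~: variables to summarise; right of ~: grouping variable
  res <- doBy::summaryBy(mpg + wt ~ am, data = mtcars, FUN = mystats)
  print(res)
}
```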

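10. The describe.by() function in the psych package (current versions of psych name it describeBy(); describe.by() is deprecated)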
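A sketch using the current name, describeBy(), on the built-in mtcars data; it assumes psych is installed and is skipped otherwise:

```r
# Requires the psych package: install.packages("psych")
if (requireNamespace("psych", quietly = TRUE)) {
  # One describe() table per level of the grouping variable (am = 0 and am = 1)
  res <- psych::describeBy(mtcars[, c("mpg", "wt")], group = mtcars$am)
  print(res)
}
```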

1. Viewing the data

> dim(iris)
[1] 150   5
> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
[5] "Species"     
> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> attributes(iris)
$names
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
[5] "Species"     

$row.names
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 [16]  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
 [31]  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
 [46]  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
 [61]  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
 [76]  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
[106] 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
[121] 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
[136] 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150

$class
[1] "data.frame"

> iris[1:5,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica

> iris[1:10,"Sepal.Length"]
 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9
> iris$Sepal.Length[1:10]
 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9

2. Exploring a single variable

> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50

> quantile(iris$Sepal.Length)
  0%  25%  50%  75% 100% 
 4.3  5.1  5.8  6.4  7.9 
> quantile(iris$Sepal.Length,c(.1,.3,.65))
 10%  30%  65% 
4.80 5.27 6.20

> var(iris$Sepal.Length)
[1] 0.6856935
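var() returns the sample variance: the sum of squared deviations from the mean divided by n - 1. A quick base-R check of that formula, with sd() as its square root:

```r
x <- iris$Sepal.Length
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

all.equal(manual_var, var(x))        # TRUE
all.equal(sqrt(manual_var), sd(x))   # TRUE
```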
> hist(iris$Sepal.Length)

> plot(density(iris$Sepal.Length))

> table(iris$Species)   # frequency count of each factor level

    setosa versicolor  virginica 
        50         50         50
> pie(table(iris$Species))

> barplot(table(iris$Species))

3. Exploring multiple variables

> cov(iris$Sepal.Length,iris$Petal.Length)  # covariance
[1] 1.274315
> cov(iris[,1:4])
             Sepal.Length Sepal.Width
Sepal.Length    0.6856935  -0.0424340
Sepal.Width    -0.0424340   0.1899794
Petal.Length    1.2743154  -0.3296564
Petal.Width     0.5162707  -0.1216394
             Petal.Length Petal.Width
Sepal.Length    1.2743154   0.5162707
Sepal.Width    -0.3296564  -0.1216394
Petal.Length    3.1162779   1.2956094
Petal.Width     1.2956094   0.5810063

> cor(iris$Sepal.Length,iris$Petal.Length)  # correlation coefficient
[1] 0.8717538
> cor(iris[,1:4])
             Sepal.Length Sepal.Width
Sepal.Length    1.0000000  -0.1175698
Sepal.Width    -0.1175698   1.0000000
Petal.Length    0.8717538  -0.4284401
Petal.Width     0.8179411  -0.3661259
             Petal.Length Petal.Width
Sepal.Length    0.8717538   0.8179411
Sepal.Width    -0.4284401  -0.3661259
Petal.Length    1.0000000   0.9628654
Petal.Width     0.9628654   1.0000000
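The Pearson correlation that cor() reports by default is the covariance rescaled by the two standard deviations. A short base-R check against the cov() results, using stats::cov2cor() for the whole matrix:

```r
x <- iris$Sepal.Length
y <- iris$Petal.Length

# Pearson r = cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))
all.equal(r_manual, cor(x, y))   # TRUE

# cov2cor() applies the same rescaling to an entire covariance matrix
all.equal(cov2cor(cov(iris[, 1:4])), cor(iris[, 1:4]))   # TRUE
```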

> aggregate(Sepal.Length~Species,summary,data=iris)
     Species Sepal.Length.Min.
1     setosa             4.300
2 versicolor             4.900
3  virginica             4.900
  Sepal.Length.1st Qu. Sepal.Length.Median
1                4.800               5.000
2                5.600               5.900
3                6.225               6.500
  Sepal.Length.Mean Sepal.Length.3rd Qu.
1             5.006                5.200
2             5.936                6.300
3             6.588                6.900
  Sepal.Length.Max.
1             5.800
2             7.000
3             7.900
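Because summary returns six values per group, aggregate() stores them in a single matrix column named Sepal.Length, which is why the printout wraps as Sepal.Length.Min., Sepal.Length.1st Qu. and so on. A sketch of flattening that into ordinary columns with do.call(data.frame, ...):

```r
res <- aggregate(Sepal.Length ~ Species, data = iris, FUN = summary)
is.matrix(res$Sepal.Length)   # TRUE: one matrix column holding all six statistics

# Passing the columns to data.frame() splits the matrix into separate columns
flat <- do.call(data.frame, res)
names(flat)
```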
> boxplot(Sepal.Length~Species,data = iris)

> with(iris,plot(Sepal.Length,Sepal.Width,col = Species, pch = as.numeric(Species)))

> pairs(iris)
