A Brief Introduction to NumPy

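Note: most of this post is notes I took while reading the book Python for Data Analysis, plus some explanations of my own, kept for later review.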

The most important object in NumPy is the ndarray, a multidimensional array object.

Creating an ndarray

Functions for creating an ndarray

Function             Description
array                Convert input data (list, tuple, array, or other sequence type) to an ndarray, either by
                     inferring a dtype or by explicitly specifying one. Copies the input data by default.
asarray              Convert input to an ndarray, but do not copy if the input is already an ndarray.
arange               Like the built-in range, but returns an ndarray instead of a list.
ones, ones_like      Produce an array of all 1s with the given shape and dtype. ones_like takes another array
                     and produces an array of 1s with the same shape and dtype.
zeros, zeros_like    Like ones and ones_like, but producing arrays of 0s instead.
empty, empty_like    Create new arrays by allocating new memory, but do not populate them with any values,
                     unlike ones and zeros.
eye, identity        Create a square N x N identity matrix (1s on the diagonal and 0s elsewhere).
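Every array has:
        shape -- a tuple indicating the size of each dimension
        dtype -- an object describing the data type of the array
np.ones() and np.zeros() can both be replaced by np.full(), for example: a = np.full((3, 3), 0). Note that np.full takes its dtype from the fill value, so this gives an integer array, while np.zeros defaults to float64.
np.random.random()  # returns a random float in [0, 1)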
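
To make the table above concrete, here is a small sketch of the creation functions. The variable names are my own and not from the book, and the exact integer dtype depends on the platform.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # dtype is inferred from the data
b = np.asarray(a)                      # no copy: b is the same object as a
c = np.arange(5)                       # like range(5), but an ndarray
z = np.zeros((2, 3))                   # float64 by default
f = np.full((2, 3), 0)                 # same shape as z, but an integer dtype
o = np.ones_like(a)                    # shape and dtype taken from a
i = np.eye(3)                          # 3 x 3 identity matrix

print(a.shape, a.dtype)
print(b is a)                          # True
print(z.dtype, f.dtype)
print(o.shape)                         # (2, 3)
```

Because asarray does not copy an existing ndarray, writing to b would also change a; use np.array(a) when an independent copy is wanted.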

NumPy data types

Type                                 Type code      Description
int8, uint8                          i1, u1         Signed and unsigned 8-bit (1 byte) integer types
int16, uint16                        i2, u2         Signed and unsigned 16-bit integer types
int32, uint32                        i4, u4         Signed and unsigned 32-bit integer types
int64, uint64                        i8, u8         Signed and unsigned 64-bit integer types
float16                              f2             Half-precision floating point
float32                              f4 or f        Standard single-precision floating point. Compatible with C float
float64                              f8 or d        Standard double-precision floating point. Compatible with C double
                                                    and the Python float object
float128                             f16 or g       Extended-precision floating point
complex64, complex128, complex256    c8, c16, c32   Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively
bool                                 ?              Boolean type storing True and False values
object                               O              Python object type
string_                              S              Fixed-length string type (1 byte per character). For example, to create
                                                    a string dtype with length 10, use 'S10'. Named np.bytes_ in NumPy 2.0 and later.
unicode_                             U              Fixed-length unicode type (4 bytes per character). Same specification
                                                    semantics as string_ (e.g. 'U10'). Named np.str_ in NumPy 2.0 and later.
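
The type codes in the table can be checked directly with np.dtype. This is just a quick sketch; the list of codes is my own selection.

```python
import numpy as np

# Print the full name and the size in bytes for a few type codes.
for code in ['i1', 'u2', 'i8', 'f4', 'f8', 'c16', '?']:
    dt = np.dtype(code)
    print(code, dt.name, dt.itemsize)

print(np.dtype('i8') == np.int64)   # True
print(np.dtype('S10').itemsize)     # 10: 1 byte per character
print(np.dtype('U10').itemsize)     # 40: 4 bytes per character
```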
     

An ndarray can be converted from one dtype to another with the astype method:

       In [12]: arr = np.array([1, 2, 3, 4, 5])
       In [13]: arr.dtype
       Out[13]: dtype('int64')
       In [14]: float_arr = arr.astype(np.float64)
       In [15]: float_arr.dtype
       Out[15]: dtype('float64')

Strings that represent numbers can be converted to a numeric dtype:

       In [16]: numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ before NumPy 2.0
       In [17]: numeric_strings.astype(float)
       Out[17]: array([ 1.25, -9.6 , 42. ])

You can also convert using another array's dtype, or specify the type with a type code:

    In [18]: int_array = np.arange(10)
    In [19]: calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
    In [20]: int_array.astype(calibers.dtype)
    Out[20]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

    In [21]: empty_uint32 = np.empty(8, dtype='u4')   # 'u4' is the type code for uint32
By default astype always creates a copy of the data, even when the new dtype is the same as the old one.
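
A short sketch of two astype behaviours worth remembering: converting floats to integers truncates the fractional part, and the result never shares memory with the source by default. The sample values are made up.

```python
import numpy as np

arr = np.array([1.7, -2.9, 3.2])
ints = arr.astype(np.int32)          # the fractional part is truncated toward zero
same = arr.astype(arr.dtype)         # same dtype, but still a new array

same[0] = 100.0                      # does not touch arr
print(ints.tolist())                 # [1, -2, 3]
print(arr[0])                        # 1.7
print(np.shares_memory(arr, same))   # False
```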

The shape attribute and the reshape method of ndarray

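a = np.array([1, 2, 3])
a.shape  # (3,)
# reshape sets the size of each dimension explicitly; it takes the new shape as a tuple
a = a.reshape((1, -1))  # 1 row, 3 columns
a = a.reshape((3, -1))  # 3 rows, 1 column
# -1 is a placeholder: NumPy infers that dimension from the total number of elements
a = np.arange(16).reshape((2, 2, 4))  # a 3D array: the outer array holds 2 elements, each of which is an array
                                      # of 2 elements, each of which is in turn an array of 4 elements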
In [51]: a = np.arange(16).reshape((2, 2, 4))
In [52]: a
Out[52]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],
       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
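
A small sketch of how the -1 in reshape is resolved: NumPy divides the total number of elements by the product of the other dimensions, and raises an error when that does not divide evenly. The shapes here are only illustrations.

```python
import numpy as np

a = np.arange(12)
b = a.reshape((3, -1))      # 12 / 3 = 4, so the shape becomes (3, 4)
c = a.reshape((2, 3, -1))   # 12 / (2 * 3) = 2, so the shape becomes (2, 3, 2)
print(b.shape, c.shape)     # (3, 4) (2, 3, 2)

try:
    a.reshape((5, -1))      # 12 is not divisible by 5
except ValueError as exc:
    print("reshape failed:", exc)
```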

Indexing

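A slice of a NumPy array is a view on the original data, not a copy. Slicing a Python list, by contrast, produces a copy.
To get a copy of the original data you have to ask for it explicitly, for example arr[:].copy()
One-dimensional arrays are simple; slicing them looks much like slicing a Python list.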
In [21]: arr = np.arange(10)
In [22]: arr
Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [23]: arr[5]
Out[23]: 5
In [24]: arr[5:8]
Out[24]: array([5, 6, 7])
In [25]: arr[5:8] = 12    # broadcasting: the scalar is assigned to every element of the slice; a Python list does not allow this
In [26]: arr
Out[26]: array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
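
The view-versus-copy difference is easy to check by writing through a slice. A minimal sketch, with variable names of my own:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # a view: shares memory with arr
view[0] = 99
print(arr[5])            # 99, the original changed

copy = arr[5:8].copy()   # an explicit, independent copy
copy[0] = -1
print(arr[5])            # still 99

lst = list(range(10))
part = lst[5:8]          # slicing a list copies the elements
part[0] = 99
print(lst[5])            # 5, the original list is untouched
```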
Higher-dimensional arrays offer more indexing options.

[Figure: indexing the elements of a 2D array]

In [28]: arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [29]: arr2d
Out[29]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [30]: arr2d[2]
Out[30]: array([7, 8, 9])
In [31]: arr2d[0][2]
Out[31]: 3
In [32]: arr2d[0,2]
Out[32]: 3
In [34]: a = arr2d[0,2]
In [35]: a.shape
Out[35]: ()     # 0-dimensional

Indexing with slices



In [38]: a = arr2d[0,2:3]
In [39]: a
Out[39]: array([3])
In [40]: a.shape
Out[40]: (1,)
In [41]: a = arr2d[0:3,2:3]
In [42]: a
Out[42]:
array([[3],
       [6],
       [9]])
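When indexing, every dimension that is indexed with an integer (rather than a slice) is dropped, so the result has one fewer dimension:
>>> a = np.arange(12).reshape((3, 4))
>>> a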
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b = a[2,1:3]
>>> b.shape
(2,)
>>> b = a[2:3,1:3]
>>> b.shape
(1, 2)
Boolean indexing

In [83]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [84]: data = np.random.randn(7, 4)
In [87]: names == 'Bob'  # produces a boolean array
Out[87]: array([ True, False, False, True, False, False, False], dtype=bool)
In [88]: data[names == 'Bob']  # selects the rows where the mask is True; the boolean array must have the
                               # same length as the axis it indexes. To invert the selection,
                               # use data[~(names == 'Bob')]
Out[88]:
array([[-0.048 , 0.5433, -0.2349, 1.2792],
       [ 2.1452, 0.8799, -0.0523, 0.0672]])
In [89]: data[names == 'Bob', 2:]  # a slice or an integer can select part of the columns
Out[89]:
array([[-0.2349, 1.2792],
       [-0.0523, 0.0672]])
In [90]: data[names == 'Bob', 3]   # an integer column index
Out[90]: array([ 1.2792, 0.0672])
In [93]: mask = (names == 'Bob') | (names == 'Will')  # | means or, & means and; the Python keywords and/or do not work on boolean arrays
In [94]: mask
Out[94]: array([True, False, True, True, True, False, False], dtype=bool)
In [95]: data[mask]
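
Here is a reproducible sketch of the same boolean-indexing steps. It swaps the random data for np.arange so that the results are predictable; that substitution is mine, not the book's.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape((7, 4))   # stand-in for the random data

is_bob = names == 'Bob'
print(data[is_bob].tolist())           # [[0, 1, 2, 3], [12, 13, 14, 15]]
print(data[~is_bob].shape)             # (5, 4): every row that is not Bob
print(data[names != 'Bob'].shape)      # (5, 4): the same selection

data[is_bob | (names == 'Will')] = 0   # assign to the selected rows in place
print(data[:, 0].tolist())             # [0, 4, 0, 0, 0, 20, 24]
```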
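Fancy indexing: indexing with integer arrays
In [100]: arr = np.empty((8, 4))       # create an uninitialized 8 x 4 array
In [101]: for i in range(8):           # fill row i with the value i
       ..... :     arr[i] = i
In [103]: arr[[4, 3, 0, 6]]            # select rows 4, 3, 0 and 6, in that order
In [104]: arr[[-3, -5, -7]]            # negative indices work too; they count from the end, starting at -1
In [105]: arr = np.arange(32).reshape((8, 4))
In [107]: arr[[1, 5, 7, 2], [0, 3, 1, 2]]    # selects the elements (1, 0), (5, 3), (7, 1) and (2, 2), which may not be what you expected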
Out[107]: array([ 4, 23, 29, 10])
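In [108]: arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]  # selects a rectangular block: rows 1, 5, 7, 2 with the columns in the order 0, 3, 1, 2
Out[108]:
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
In [109]: arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]  # np.ix_() gives the same result as arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
Unlike slicing, fancy indexing always copies the data into a new array.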
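
The three fancy-indexing forms can be compared side by side. A minimal sketch; rows and cols are names I introduced for readability.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

pairs = arr[rows, cols]              # pairs up the indices element by element
block = arr[rows][:, cols]           # rows first, then reorder the columns
block_ix = arr[np.ix_(rows, cols)]   # the same block in one step

print(pairs.tolist())                      # [4, 23, 29, 10]
print(np.array_equal(block, block_ix))     # True

picked = arr[[0, 1]]                 # fancy indexing copies the data
picked[0, 0] = -1
print(arr[0, 0])                     # 0, arr is unchanged
```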

Mathematical operations and common functions

[Figures: tables of common NumPy functions]

(None of the functions above are matrix operations; they are applied element by element.)
+  -->  np.add(a,b)
-  -->  np.subtract(a,b)
*  -->  np.multiply(a,b)
/  -->  np.divide(a,b)
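Add 10 to the second column of a   # a has 3 rows (its shape is (3, 4) in the output below); the three statements are equivalent ways to index column 1, and each one adds 10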
>>> a[np.arange(3),1] +=10
>>> a[np.arange(3),[1,1,1]] +=10
>>> a[[0,1,2],[1,1,1]] +=10
Select the values in a that are greater than 10
>>> re = a>10
>>> re
array([[False,  True, False, False],
       [False,  True, False, False],
       [False,  True, False,  True]])
>>> a[re]
array([31, 35, 39, 11])
>>> a[a>10]
array([31, 35, 39, 11])       
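
A small check that the operators and the named functions agree, followed by the column update and the mask selection from above. The arrays are my own toy data, and the update is applied only once here.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
b = np.ones((3, 4), dtype=a.dtype)

print(np.array_equal(a + b, np.add(a, b)))        # True
print(np.array_equal(a - b, np.subtract(a, b)))   # True
print(np.array_equal(a * b, np.multiply(a, b)))   # True

a[:, 1] += 10                # a slice reaches the same column as a[np.arange(3), 1]
print(a[:, 1].tolist())      # [11, 15, 19]
print(a[a > 10].tolist())    # [11, 15, 19, 11]
```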

Commonly used functions

np.sum()
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.sum()
10
>>> np.sum(a)
10
>>> np.sum(a,axis=0) # sum of each column
array([4, 6])
>>> np.sum(a,axis=1) # sum of each row
array([3, 7])
>>> a.sum(axis=0)
array([4, 6])
>>> a.sum(axis=1)
array([3, 7])
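np.random.uniform()        # draws random numbers from a uniform distribution, [0, 1) by default
np.tile(array, reps)       # repeats the array the given number of times along each axis
np.argsort()               # sorts, but returns the indices rather than the values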
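
A quick sketch of the three functions just listed, with made-up inputs:

```python
import numpy as np

u = np.random.uniform(0.0, 1.0, size=3)    # three floats in [0, 1)
print(u.shape)                              # (3,)

tiled = np.tile(np.array([1, 2]), (2, 2))   # repeat twice down and twice across
print(tiled.tolist())                       # [[1, 2, 1, 2], [1, 2, 1, 2]]

scores = np.array([30, 10, 20])
order = np.argsort(scores)                  # indices that would sort the array
print(order.tolist())                       # [1, 2, 0]
print(scores[order].tolist())               # [10, 20, 30]
```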

T, transpose, swapaxes

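1) a.T                        # attribute
2) np.transpose(a)            # NumPy function
3) a.transpose()              # array method; with no arguments it is the same as a.T
4) a.swapaxes(axis1, axis2)   # swaps two axes
a = np.arange(16).reshape((2, 2, 4))   # a.shape is (2, 2, 4)
a.T                                    # the result has shape (4, 2, 2)
a.transpose()                          # the result has shape (4, 2, 2)
a.transpose((1, 0, 2))   # the result has shape (2, 2, 4)
                       # The axes of a are numbered (0, 1, 2); (1, 0, 2) swaps the first two.
                       # The element at index [i, j, k] in a ends up at index [j, i, k] in the result.
a.swapaxes(1, 2)        # the result has shape (2, 4, 2); the arguments are again axis numbers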
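
The shapes and the index mapping described above can be verified directly. A minimal sketch:

```python
import numpy as np

a = np.arange(16).reshape((2, 2, 4))
print(a.T.shape)                       # (4, 2, 2)
print(a.transpose((1, 0, 2)).shape)    # (2, 2, 4)
print(a.swapaxes(1, 2).shape)          # (2, 4, 2)

t = a.transpose((1, 0, 2))
# Element [i, j, k] of t is element [j, i, k] of a.
print(t[0, 1, 3] == a[1, 0, 3])        # True
print(np.shares_memory(a, t))          # True: transposing returns a view
```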

The conditional function np.where

In [147]: arr = np.random.randn(4, 4)
In [149]: np.where(arr > 0, 2, -2)  # 2 where the condition holds, -2 elsewhere
In [150]: np.where(arr > 0, 2, arr) # set only positive values to 2
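
With a fixed input the effect of np.where is easier to see than with random data. A small sketch using values I picked by hand:

```python
import numpy as np

arr = np.array([[1.5, -0.5], [-2.0, 3.0]])

signs = np.where(arr > 0, 2, -2)       # choose between two scalars
clipped = np.where(arr > 0, 2, arr)    # keep the original value where the condition is False

print(signs.tolist())                  # [[2, -2], [-2, 2]]
print(clipped.tolist())                # [[2.0, -0.5], [-2.0, 2.0]]
```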

The boolean functions any and all

In [162]: bools = np.array([False, False, True, False])
In [163]: bools.any()
Out[163]: True
In [164]: bools.all()
Out[164]: False
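
any and all are most often applied to the result of a comparison, and they accept an axis argument just like sum. A short sketch with a toy array:

```python
import numpy as np

arr = np.array([[1, -2], [3, 4]])
positive = arr > 0

print(positive.any())                  # True: at least one element is positive
print(positive.all())                  # False: -2 is not positive
print(positive.all(axis=1).tolist())   # [False, True]: checked row by row
print(positive.sum())                  # 3: True counts as 1
```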

The sort function

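By default sort works along the last axis, which for a 2D array means the values within each row are sorted. You can also pass the axis explicitly: 1 sorts within each row, 0 sorts within each column.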

In [80]: arr = np.random.randn(5, 3)
In [82]: arr.sort()  # sorts within each row by default, in place
In [83]: arr
Out[83]:
array([[-1.2629921 , -0.75419353,  0.24817741],
       [ 0.5467019 ,  1.46272747,  1.50331672],
       [-1.19504888,  0.61300717,  0.83061943],
       [-1.22133562,  0.49668954,  1.73834466],
       [-2.25860226, -0.90163896, -0.53758088]])
In [84]: arr.sort(0)   # sort within each column (axis 0)
In [85]: arr
Out[85]:
array([[-2.25860226, -0.90163896, -0.53758088],
       [-1.2629921 , -0.75419353,  0.24817741],
       [-1.22133562,  0.49668954,  0.83061943],
       [-1.19504888,  0.61300717,  1.50331672],
       [ 0.5467019 ,  1.46272747,  1.73834466]])
In [86]: arr.sort(axis=1)  # sort within each row (axis 1)
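
One detail that is easy to miss: the arr.sort() method sorts in place and returns None, while the function np.sort returns a sorted copy. A sketch with small integers so the result can be checked by eye:

```python
import numpy as np

arr = np.array([[9, 1, 8], [3, 7, 2]])

by_row = np.sort(arr, axis=1)   # sorted copy; arr itself is unchanged
by_col = np.sort(arr, axis=0)
print(by_row.tolist())          # [[1, 8, 9], [2, 3, 7]]
print(by_col.tolist())          # [[3, 1, 2], [9, 7, 8]]

arr.sort()                      # in place, along the last axis
print(arr.tolist())             # [[1, 8, 9], [2, 3, 7]]
```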

unique and other set functions

Method               Description

unique(x)            Compute the sorted, unique elements in x
intersect1d(x, y)    Compute the sorted, common elements in x and y
union1d(x, y)        Compute the sorted union of elements
in1d(x, y)           Compute a boolean array indicating whether each element of x is contained in y
setdiff1d(x, y)      Set difference, elements in x that are not in y
setxor1d(x, y)       Set symmetric differences; elements that are in either of the arrays, but not both
In [89]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [90]: np.unique(names)
Out[90]: array(['Bob', 'Joe', 'Will'], dtype='<U4')
In [91]: ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
In [92]: np.unique(ints)
Out[92]: array([1, 2, 3, 4])
In [93]: sorted(set(names))
Out[93]: ['Bob', 'Joe', 'Will']
In [94]: values = np.array([6, 0, 0, 3, 2, 5, 6])
In [95]: np.in1d(values, [2, 3, 6])
Out[95]: array([ True, False, False,  True,  True, False,  True])
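
The remaining set functions from the table, run on two small arrays of my own. For the membership test this sketch uses np.isin, which does the same element-wise check as in1d.

```python
import numpy as np

x = np.array([1, 2, 2, 3, 5])
y = np.array([2, 3, 4])

print(np.unique(x).tolist())           # [1, 2, 3, 5]
print(np.intersect1d(x, y).tolist())   # [2, 3]
print(np.union1d(x, y).tolist())       # [1, 2, 3, 4, 5]
print(np.setdiff1d(x, y).tolist())     # [1, 5]
print(np.setxor1d(x, y).tolist())      # [1, 4, 5]
print(np.isin(x, y).tolist())          # [False, True, True, True, False]
```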

File operations

In [96]: arr = np.arange(10)
In [98]: np.save('D:/save_arr', arr)   # the .npy extension is appended automatically
In [101]: np.load('D:/save_arr.npy')
Out[101]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.loadtxt()  # loads a text file
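
A self-contained round trip through both formats. To avoid depending on a D: drive, this sketch writes into a temporary directory; the file names are hypothetical.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)
with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, 'save_arr.npy')    # hypothetical file name
    np.save(npy_path, arr)                          # binary .npy format
    loaded = np.load(npy_path)

    txt_path = os.path.join(tmp, 'save_arr.txt')    # hypothetical file name
    np.savetxt(txt_path, arr.reshape((2, 5)), fmt='%d', delimiter=',')
    from_txt = np.loadtxt(txt_path, delimiter=',')  # parsed as floats by default

print(np.array_equal(arr, loaded))      # True
print(from_txt.shape, from_txt.dtype)   # (2, 5) float64
```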

Commonly used numpy.linalg functions

[Figure: table of commonly used numpy.linalg functions]

Matrix multiplication
1) a.dot(b)
2) np.dot(a, b)
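
To show the difference between the element-wise * and a real matrix product, here is a minimal sketch with two 2 x 2 matrices of my own. It also uses the @ operator, which for 2D arrays gives the same result as dot, and np.linalg.inv as one example of a linalg function.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

print((a * b).tolist())                        # element-wise: [[0, 2], [3, 0]]
print(a.dot(b).tolist())                       # matrix product: [[2, 1], [4, 3]]
print(np.array_equal(a @ b, np.dot(a, b)))     # True

b_inv = np.linalg.inv(b)                       # matrix inverse
print(np.allclose(b_inv @ b, np.eye(2)))       # True
```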

Functions in numpy.random

Function       Description
seed           Seed the random number generator
permutation    Return a random permutation of a sequence, or return a permuted range
shuffle        Randomly permute a sequence in place
rand           Draw samples from a uniform distribution
randint        Draw random integers from a given low-to-high range
randn          Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)
binomial       Draw samples from a binomial distribution
normal         Draw samples from a normal (Gaussian) distribution
beta           Draw samples from a beta distribution
chisquare      Draw samples from a chi-square distribution
gamma          Draw samples from a gamma distribution
uniform        Draw samples from a uniform [0, 1) distribution
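If you run into a function you don't understand, a quick search will usually clear it up. Here is one commonly used function: normal
Draw random samples from a normal (Gaussian) distribution.
numpy.random.normal(loc=0.0, scale=1.0, size=None)
np.random.randn(d0, d1, ...) draws from the standard normal distribution, that is, loc = 0 and scale = 1. Note that it takes the dimensions as separate arguments rather than a size tuple.
From the official documentation:
loc : float or array_like of floats
        Mean ("centre") of the distribution.  This is the mean.
scale : float or array_like of floats
        Standard deviation (spread or "width") of the distribution.  This is the standard deviation.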
size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  If size is ``None`` (default),
        a single value is returned if ``loc`` and ``scale`` are both scalars.
        Otherwise, ``np.broadcast(loc, scale).size`` samples are drawn.
         # Check the mean and the standard deviation
In [112]: mu, sigma = 0, 0.1
In [113]: s = np.random.normal(mu, sigma, 1000)
In [114]: abs(mu - np.mean(s)) < 0.01
Out[114]: True
In [115]: abs(sigma - np.std(s, ddof=1)) < 0.01   # ddof: delta degrees of freedom; with ddof=1 the divisor
                                                  # is N - 1, which gives the unbiased variance estimate
Out[115]: True
          # Plot the histogram and the density curve with matplotlib
In [116]: import matplotlib.pyplot as plt
In [118]: count, bins, ignored = plt.hist(s, 30, density=True)
In [119]: plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
     ...:     np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
     ...:     linewidth=2, color='r')
Out[119]: [<matplotlib.lines.Line2D at 0x212dc9f5a90>]
In [120]: plt.show()
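Probability density function of the Gaussian distribution:

$$ p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$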


Fitting result:

[Figure: histogram of the 1000 samples with the Gaussian density curve drawn on top in red]
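
For a reproducible version of the mean and standard deviation check, here is a sketch that uses a seeded generator from np.random.default_rng instead of the global np.random functions used above. The seed and the sample size are arbitrary choices of mine. It also shows that scaling standard normal draws by sigma and shifting by mu gives the same distribution as normal(mu, sigma).

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded generator, so the draws are repeatable
mu, sigma = 0.0, 0.1

s = rng.normal(loc=mu, scale=sigma, size=10_000)
z = rng.standard_normal(10_000)       # loc = 0, scale = 1
scaled = mu + sigma * z               # distributed like normal(mu, sigma)

print(abs(s.mean() - mu) < 0.01)
print(abs(s.std(ddof=1) - sigma) < 0.01)
print(abs(scaled.std(ddof=1) - sigma) < 0.01)
```

With 10,000 samples the standard error of the mean is about 0.001, so all three checks should comfortably print True.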