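A Brief Introduction to NumPy

Note: most of this post is notes taken while reading the book Python for Data Analysis, with some explanations of my own added, kept here for later review.

The most important object in NumPy is the ndarray, an N-dimensional array object.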

Creating an ndarray

Functions for creating an ndarray
Function	Description
array	Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default.
asarray	Convert input to ndarray, but do not copy if the input is already an ndarray
arange	Like the built-in range but returns an ndarray instead of a list.
ones, ones_like	Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype.
zeros, zeros_like	Like ones and ones_like but producing arrays of 0’s instead
empty, empty_like	Create new arrays by allocating new memory, but do not populate with any values like ones and zeros
eye, identity	Create a square N x N identity matrix (1’s on the diagonal and 0’s elsewhere)

Every array has:
        shape -- a tuple indicating the size of each dimension
        dtype -- an object describing the data type of the array

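np.ones() and np.zeros() can both be replaced by np.full(), for example a = np.full((3, 3), 0). Unless dtype is given, np.full takes its dtype from the fill value, so this example gives an integer array, whereas np.zeros defaults to float64.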
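
As a rough sketch of the creation functions above, the snippet below builds a few small arrays and checks their shape and dtype. The variable names are made up for illustration only.

```python
import numpy as np

zeros = np.zeros((3, 3))            # float64 by default
full_int = np.full((3, 3), 0)       # dtype inferred from the fill value 0: an integer type
full_float = np.full((3, 3), 0.0)   # fill value 0.0: float64

print(zeros.shape, zeros.dtype)              # (3, 3) float64
print(full_int.dtype.kind)                   # i  (integer; the exact width depends on the platform)
print(np.array_equal(zeros, full_float))     # True

ones_like_int = np.ones_like(full_int)       # same shape and dtype as full_int
identity = np.eye(3)                         # 3 x 3 identity matrix
```

If the float default matters, passing dtype explicitly (for example np.full((3, 3), 0, dtype=np.float64)) removes the ambiguity.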

np.random.random()  # returns a random float in [0, 1)

NumPy data types

Type	Type code	Description

int8, uint8	i1, u1	Signed and unsigned 8-bit (1 byte) integer types
int16, uint16	i2, u2	Signed and unsigned 16-bit integer types
int32, uint32	i4, u4	Signed and unsigned 32-bit integer types
int64, uint64	i8, u8	Signed and unsigned 64-bit integer types
float16	f2	Half-precision floating point
float32	f4 or f	Standard single-precision floating point. Compatible with C float
float64	f8 or d	Standard double-precision floating point. Compatible with C double and Python float object
float128	f16 or g	Extended-precision floating point
complex64, complex128, complex256	c8, c16, c32	Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively
bool	?	Boolean type storing True and False values
object	O	Python object type
bytes_ (string_ before NumPy 2.0)	S	Fixed-length string type (1 byte per character). For example, to create a string dtype with length 10, use 'S10'.
str_ (unicode_ before NumPy 2.0)	U	Fixed-length unicode type (number of bytes platform specific). Same specification semantics as the S type (e.g. 'U10')

The dtype of an ndarray can be converted with the astype method:

       In [12]: arr = np.array([1, 2, 3, 4, 5])
       In [13]: arr.dtype
       Out[13]: dtype('int64')
       In [14]: float_arr = arr.astype(np.float64)
       In [15]: float_arr.dtype
       Out[15]: dtype('float64')

An array of strings that represent numbers can be converted to a numeric type:

       In [16]: numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ before NumPy 2.0
       In [17]: numeric_strings.astype(float)
       Out[17]: array([ 1.25, -9.6 , 42. ])

You can also convert using the dtype of another array, or name the target type with a dtype code:

    In [18]: int_array = np.arange(10)
    In [19]: calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
    In [20]: int_array.astype(calibers.dtype)
    Out[20]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

    In [21]: empty_uint32 = np.empty(8, dtype='u4')

By default astype always creates a copy of the data, even when the new dtype is the same as the old one.
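
A small check of the copy behaviour, written as a sketch. It also shows the copy=False keyword of astype, which returns the input array itself when no conversion is needed, and the truncation that happens when floats are cast to integers.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

same = arr.astype(arr.dtype)                # same dtype, still a new array
print(same is arr)                          # False
print(np.shares_memory(same, arr))          # False

same[0] = 99                                # changing the copy leaves arr alone
print(arr[0])                               # 1

no_copy = arr.astype(arr.dtype, copy=False) # no conversion needed, so no copy is made
print(no_copy is arr)                       # True

floats = np.array([3.7, -1.2, 0.5])
ints = floats.astype(np.int32)              # fractional part is dropped: 3, -1, 0
```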

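The shape attribute and the reshape method of ndarray

a = np.array([1,2,3])
a.shape  # (3,)
# reshape sets the size of each dimension explicitly; it takes a tuple
a = a.reshape((1,-1)) # 1 row, 3 columns
a = a.reshape((3,-1)) # 3 rows, 1 column
# -1 is a placeholder: NumPy works out that dimension from the total number of elements
a = np.arange(16).reshape((2, 2, 4))  # a 3D array: the outer array holds 2 elements, each of which is an
                                      # array of 2 elements, and each of those is an array of 4 elements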
In [51]: a = np.arange(16).reshape((2, 2, 4))
In [52]: a
Out[52]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],
       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
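
To make the role of -1 in reshape concrete, here is a minimal sketch. The names r1, r2 and failed are illustrative only.

```python
import numpy as np

a = np.arange(12)

r1 = a.reshape((3, -1))       # 12 / 3 = 4 columns
r2 = a.reshape((2, -1, 3))    # 12 / (2 * 3) = 2 in the middle
print(r1.shape, r2.shape)     # (3, 4) (2, 2, 3)

# Only one dimension may be -1, and the known sizes must divide the total evenly
try:
    a.reshape((5, -1))
    failed = False
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```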

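Indexing

A slice of a NumPy array is a view on the original data, not a copy. Slicing a Python list, by contrast, gives a new list.

To get a copy of the data you have to ask for one explicitly, for example arr[:].copy()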
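
The difference between a view and a copy can be seen in a short sketch like this one (variable names are illustrative):

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]            # a view: shares memory with arr
view[:] = 12
print(arr)                 # [ 0  1  2  3  4 12 12 12  8  9]

copied = arr[5:8].copy()   # an explicit copy
copied[:] = -1
print(arr[5:8])            # [12 12 12], unchanged

lst = list(range(10))
part = lst[5:8]            # slicing a list builds a new list
part[0] = 100
print(lst[5])              # 5
```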

One-dimensional arrays are simple: slicing works much like it does for Python lists.

In [21]: arr = np.arange(10)
In [22]: arr
Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [23]: arr[5]
Out[23]: 5
In [24]: arr[5:8]
Out[24]: array([5, 6, 7])
In [25]: arr[5:8] = 12    # broadcasting: the scalar is assigned to every element of the slice; Python lists do not allow this
In [26]: arr
Out[26]: array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])

Higher-dimensional arrays offer more indexing options.

[Figure: indexing elements of a 2D array]

The figure above shows how a 2D array is indexed.

In [28]: arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [29]: arr2d
Out[29]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [30]: arr2d[2]
Out[30]: array([7, 8, 9])
In [31]: arr2d[0][2]
Out[31]: 3
In [32]: arr2d[0,2]
Out[32]: 3
In [34]: a = arr2d[0,2]
In [35]: a.shape
Out[35]: ()     # 0-dimensional

Indexing with slices

[Figure: slicing a 2D array]

In [38]: a = arr2d[0,2:3]
In [39]: a
Out[39]: array([3])
In [40]: a.shape
Out[40]: (1,)
In [41]: a = arr2d[0:3,2:3]
In [42]: a
Out[42]:
array([[3],
       [6],
       [9]])

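When indexing, every dimension that is indexed with an integer (rather than a slice) is dropped, so the result has one dimension fewer: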
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b = a[2,1:3]
>>> b.shape
(2,)
>>> b = a[2:3,1:3]
>>> b.shape
(1, 2)
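
The same rule, spelled out for a few more cases as a quick sketch on a 3 x 4 array like the one above:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

print(a[2, 1:3].shape)      # (2,)    integer on axis 0 drops that axis
print(a[2:3, 1:3].shape)    # (1, 2)  two slices keep both axes
print(a[:, 1].shape)        # (3,)    integer on axis 1 drops that axis
print(a[:, 1:2].shape)      # (3, 1)  a length-1 slice keeps it as a column
print(a[2, 1].ndim)         # 0       two integers leave a scalar
```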

Boolean indexing

In [83]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [84]: data = np.random.randn(7, 4)
In [87]: names == 'Bob'  # produces a boolean array
Out[87]: array([ True, False, False,  True, False, False, False])
In [88]: data[names == 'Bob']  # select the rows where the mask is True; the boolean array must be as long as
                               # the axis it indexes. To negate the mask use data[~(names == 'Bob')]
Out[88]:
array([[-0.048 , 0.5433, -0.2349, 1.2792],
       [ 2.1452, 0.8799, -0.0523, 0.0672]])
In [89]: data[names == 'Bob', 2:]  # combine the mask with a slice or an integer to pick part of each row
Out[89]:
array([[-0.2349, 1.2792],
       [-0.0523, 0.0672]])
In [90]: data[names == 'Bob', 3]   # integer
Out[90]: array([ 1.2792, 0.0672])
In [93]: mask = (names == 'Bob') | (names == 'Will')  # | (or)  & (and)
In [94]: mask
Out[94]: array([ True, False,  True,  True,  True, False, False])
In [95]: data[mask]
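
A deterministic variant of the session above, as a sketch. It swaps the random data for np.arange so the selected rows are easy to check; is_bob and mask are illustrative names.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape((7, 4))   # stand-in for the random data

is_bob = names == 'Bob'
print(data[is_bob])                    # rows 0 and 3
print(data[~is_bob].shape)             # (5, 4): every row that is not Bob

# Combine conditions with & and |, each comparison in parentheses;
# the Python keywords `and` / `or` do not work on boolean arrays.
mask = (names == 'Bob') | (names == 'Will')
print(mask.sum())                      # 4 rows selected

# Assigning through a boolean mask changes the array in place
data[names == 'Joe'] = 0
```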

Fancy indexing: indexing with integer arrays

In [100]: arr = np.empty((8, 4))  # create an 8 x 4 array
In [101]: for i in range(8):      # fill row i with the value i
     ...:     arr[i] = i
In [103]: arr[[4, 3, 0, 6]]       # select rows 4, 3, 0, 6 in that order
In [104]: arr[[-3, -5, -7]]       # negative indices work too; they count from the end, starting at -1
In [105]: arr = np.arange(32).reshape((8, 4))
In [107]: arr[[1, 5, 7, 2], [0, 3, 1, 2]]    # picks elements (1, 0), (5, 3), (7, 1), (2, 2), which may not be what you expect
Out[107]: array([ 4, 23, 29, 10])
In [108]: arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]  # select the rows, then reorder the columns: the result is a 2D block
Out[108]:
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
In [109]: arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]  # np.ix_() gives the same result as arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

Fancy indexing always produces a copy of the data, never a view.
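
The three forms of fancy indexing shown above can be compared side by side. This is only a sketch; pairs, block_a and block_b are illustrative names.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

pairs = arr[rows, cols]              # index lists are paired up: 1D result, shape (4,)
block_a = arr[rows][:, cols]         # rows first, then columns: shape (4, 4)
block_b = arr[np.ix_(rows, cols)]    # the same block in a single step

print(pairs)                                 # [ 4 23 29 10]
print(np.array_equal(block_a, block_b))      # True

# Fancy indexing copies: changing the result does not touch arr
block_b[0, 0] = -1
print(arr[1, 0])                             # 4
```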

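Math operations and common functions

[Figure: table of common math functions]

(None of the functions above is a matrix operation; they all work element by element.)
+  -->  np.add(a,b)
-  -->  np.subtract(a,b)
*  -->  np.multiply(a,b)
/  -->  np.divide(a,b)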
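
A short sketch of what "element-wise" means in practice, contrasted with the @ matrix-product operator. The two small matrices are made up for the example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b     # [[ 5, 12], [21, 32]]
matrix_prod = a @ b     # [[19, 22], [43, 50]]

# Each operator gives the same result as its named function
same_add = np.array_equal(a + b, np.add(a, b))
same_sub = np.array_equal(a - b, np.subtract(a, b))
same_mul = np.array_equal(a * b, np.multiply(a, b))
same_div = np.array_equal(a / b, np.divide(a, b))

# / is true division, so integer inputs give a float result
print((a / b).dtype)    # float64
```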

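Add 10 to the second column of a (here a has shape (3, 4)). The three statements below are equivalent ways to write it; each one adds 10, so running all three adds 30 in total.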
>>> a[np.arange(3),1] +=10
>>> a[np.arange(3),[1,1,1]] +=10
>>> a[[0,1,2],[1,1,1]] +=10
Select the values in a that are greater than 10
>>> re = a>10
>>> re
array([[False,  True, False, False],
       [False,  True, False, False],
       [False,  True, False,  True]])
>>> a[re]
array([31, 35, 39, 11])
>>> a[a>10]
array([31, 35, 39, 11])

Common functions

np.sum()
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.sum()
10
>>> np.sum(a)
10
>>> np.sum(a,axis=0) # sum of each column
array([4, 6])
>>> np.sum(a,axis=1) # sum of each row
array([3, 7])
>>> a.sum(axis=0)
array([4, 6])
>>> a.sum(axis=1)
array([3, 7])

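np.random.uniform()        # draw random floats from a uniform distribution, [0, 1) by default
np.tile(array, reps)       # repeat the array the number of times given by reps (an int or a tuple)
np.argsort()               # indirect sort: returns the indices that would sort the array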
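
A minimal sketch of these three functions on small made-up inputs:

```python
import numpy as np

row = np.array([1, 2, 3])
tiled = np.tile(row, (2, 2))     # 2 repetitions down, 2 across
print(tiled.shape)               # (2, 6)

scores = np.array([30, 10, 20])
order = np.argsort(scores)       # indices that would sort scores
print(order)                     # [1 2 0]
print(scores[order])             # [10 20 30]

samples = np.random.uniform(-1.0, 1.0, size=5)   # 5 floats in [-1, 1)
```

argsort is handy when one array has to be reordered by the values of another, for example labels[np.argsort(scores)].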

T, transpose, swapaxes

1) a.T                  # attribute
2) np.transpose(a)      # NumPy function
3) a.transpose()        # array method; without arguments it is the same as a.T
4) a.swapaxes(i, j)     # swap two axes
a = np.arange(16).reshape((2, 2, 4))   # a.shape is (2, 2, 4)
a.T                                    # result shape (4, 2, 2)
a.transpose()                          # result shape (4, 2, 2)
a.transpose((1,0,2))   # result shape (2, 2, 4)
                       # The axes of a are numbered (0, 1, 2); (1, 0, 2) swaps the first two axes,
                       # so the element a[i, j, k] ends up at position [j, i, k] of the result.
a.swapaxes(1,2)        # result shape (2, 4, 2); the arguments are again axis numbers
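
The index bookkeeping is easier to trust after checking a couple of elements. A small sketch, with t and s as illustrative names:

```python
import numpy as np

a = np.arange(16).reshape((2, 2, 4))
t = a.transpose((1, 0, 2))       # swap axes 0 and 1, keep axis 2
s = a.swapaxes(1, 2)             # swap axes 1 and 2

print(a.T.shape, s.shape)        # (4, 2, 2) (2, 4, 2)
print(a[0, 1, 3], t[1, 0, 3])    # 7 7    a[i, j, k] == t[j, i, k]
print(a[1, 0, 2], s[1, 2, 0])    # 10 10  a[i, j, k] == s[i, k, j]

# Transposing returns a view on the same data, not a copy
print(np.shares_memory(a, t))    # True
```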

The where conditional function in NumPy

In [147]: arr = np.random.randn(4, 4)
In [149]: np.where(arr > 0, 2, -2)  # 2 where arr is positive, -2 elsewhere
In [150]: np.where(arr > 0, 2, arr) # set only positive values to 2
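
With fixed inputs instead of random ones, the behaviour of np.where is easy to verify. A sketch, using made-up values:

```python
import numpy as np

arr = np.array([[1.5, -0.3],
                [-2.0, 0.7]])

signs = np.where(arr > 0, 2, -2)       # [[2, -2], [-2, 2]]
capped = np.where(arr > 0, 2, arr)     # [[2.0, -0.3], [-2.0, 2.0]]

# The same selection written as a plain Python expression per element
flat = [2 if x > 0 else -2 for x in arr.ravel()]

# Called with only a condition, where returns the indices where it holds
rows, cols = np.where(arr > 0)         # positions (0, 0) and (1, 1)
```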

The boolean functions any and all

In [162]: bools = np.array([False, False, True, False])
In [163]: bools.any()
Out[163]: True
In [164]: bools.all()
Out[164]: False

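The sort function

By default sort works along the last axis (within each row of a 2D array). You can also pass the axis explicitly: 1 sorts within each row, 0 sorts within each column.

In [80]: arr = np.random.randn(5, 3)
In [82]: arr.sort()  # sorts within each row by default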
In [83]: arr
Out[83]:
array([[-1.2629921 , -0.75419353,  0.24817741],
       [ 0.5467019 ,  1.46272747,  1.50331672],
       [-1.19504888,  0.61300717,  0.83061943],
       [-1.22133562,  0.49668954,  1.73834466],
       [-2.25860226, -0.90163896, -0.53758088]])
In [84]: arr.sort(0)   # sort within each column (axis 0)
In [85]: arr
Out[85]:
array([[-2.25860226, -0.90163896, -0.53758088],
       [-1.2629921 , -0.75419353,  0.24817741],
       [-1.22133562,  0.49668954,  0.83061943],
       [-1.19504888,  0.61300717,  1.50331672],
       [ 0.5467019 ,  1.46272747,  1.73834466]])
In [86]: arr.sort(axis=1)  # sort within each row (axis 1)
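
One detail worth a sketch: the arr.sort() method sorts in place, while the np.sort() function returns a sorted copy and leaves its input alone. The small integer array below is made up so the result can be checked by eye.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8],
                [6, 4, 5]])

by_row = np.sort(arr)            # copy, sorted within each row (last axis)
by_col = np.sort(arr, axis=0)    # copy, sorted within each column
print(arr[0])                    # [3 1 2], the original is untouched

arr.sort(axis=1)                 # in place: arr itself is now sorted within each row
print(arr[0])                    # [1 2 3]
```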

unique and other set functions

Method	Description

unique(x)	Compute the sorted, unique elements in x
intersect1d(x, y)	Compute the sorted, common elements in x and y
union1d(x, y)	Compute the sorted union of elements
in1d(x, y)	Compute a boolean array indicating whether each element of x is contained in y
setdiff1d(x, y)	Set difference, elements in x that are not in y
setxor1d(x, y)	Set symmetric differences; elements that are in either of the arrays, but not both

In [89]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [90]: np.unique(names)
Out[90]: array(['Bob', 'Joe', 'Will'], dtype='<U4')
In [91]: ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
In [92]: np.unique(ints)
Out[92]: array([1, 2, 3, 4])
In [93]: sorted(set(names))
Out[93]: ['Bob', 'Joe', 'Will']
In [94]: values = np.array([6, 0, 0, 3, 2, 5, 6])
In [95]: np.in1d(values, [2, 3, 6])
Out[95]: array([ True, False, False,  True,  True, False,  True])
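
The remaining set functions from the table can be tried on the same values. This is a sketch with a made-up second array y; it uses np.isin, which recent NumPy releases recommend over in1d.

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6, 7])

common = np.intersect1d(x, y)    # [2 3 6]
either = np.union1d(x, y)        # [0 2 3 5 6 7]
only_x = np.setdiff1d(x, y)      # [0 5]
one_side = np.setxor1d(x, y)     # [0 5 7]
member = np.isin(x, y)           # same element-wise test as in1d(x, y)
```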

File operations

In [96]: arr = np.arange(10)
In [98]: np.save('D:/save_arr', arr)   # the .npy extension is added automatically
In [101]: np.load('D:/save_arr.npy')
Out[101]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.loadtxt()  # load a text file
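
A self-contained round trip through both formats, as a sketch. It writes into a temporary directory rather than D:/ so it runs anywhere; the file names are illustrative.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    # Binary .npy format: np.save appends the extension if it is missing
    npy_path = os.path.join(tmp, 'save_arr')
    np.save(npy_path, arr)
    loaded = np.load(npy_path + '.npy')

    # Plain text: one row per line, values separated by the delimiter
    txt_path = os.path.join(tmp, 'table.txt')
    np.savetxt(txt_path, arr.reshape((2, 5)), delimiter=',')
    table = np.loadtxt(txt_path, delimiter=',')   # parsed as float64 by default

print(np.array_equal(loaded, arr))   # True
print(table.shape, table.dtype)      # (2, 5) float64
```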

Commonly used numpy.linalg functions

[Figure: table of commonly used numpy.linalg functions]

Matrix multiplication
1）a.dot(b)
2）np.dot(a,b)
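
A brief sketch tying dot to two numpy.linalg functions, solve and inv. The matrix and vector are made up for the example.

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Three spellings of the same matrix-vector product: [11., 18.]
p1, p2, p3 = a.dot(b), np.dot(a, b), a @ b

# Solve a x = b, then confirm that multiplying back gives b
x = np.linalg.solve(a, b)
print(np.allclose(a @ x, b))              # True

# The inverse times the matrix is the identity, up to rounding
inv = np.linalg.inv(a)
print(np.allclose(inv @ a, np.eye(2)))    # True
```

For solving linear systems, solve is generally preferable to forming the inverse first.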

Functions in numpy.random

Function	Description

seed	Seed the random number generator
permutation	Return a random permutation of a sequence, or return a permuted range
shuffle	Randomly permute a sequence in place
rand	Draw samples from a uniform distribution
randint	Draw random integers from a given low-to-high range
randn	Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)
binomial	Draw samples from a binomial distribution
normal	Draw samples from a normal (Gaussian) distribution
beta	Draw samples from a beta distribution
chisquare	Draw samples from a chi-square distribution
gamma	Draw samples from a gamma distribution
uniform	Draw samples from a uniform [0, 1) distribution

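If you come across a function you don't understand, a quick web search will clear it up. Here is one commonly used function: normal

Draw random samples from a normal (Gaussian) distribution.
numpy.random.normal(loc=0.0, scale=1.0, size=None)

np.random.randn(d0, d1, ...) samples the standard normal distribution, that is, loc = 0 and scale = 1. Note that randn takes the dimensions as separate arguments rather than a size tuple.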
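
The link between randn and normal can be sketched as follows: shifting and scaling standard normal samples gives samples with mean mu and standard deviation sigma. The seed, sample count and parameter values are arbitrary choices for the example.

```python
import numpy as np

np.random.seed(0)                  # fixed seed so the run is repeatable
mu, sigma = 2.0, 0.5

z = np.random.randn(100_000)       # standard normal: loc = 0, scale = 1
s = mu + sigma * z                 # distributed like np.random.normal(mu, sigma, 100_000)

mean_ok = abs(s.mean() - mu) < 0.01
std_ok = abs(s.std(ddof=1) - sigma) < 0.01
print(mean_ok, std_ok)

grid = np.random.randn(2, 3)       # dimensions are separate arguments: shape (2, 3)
```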

From the official documentation:
loc : float or array_like of floats
        Mean ("centre") of the distribution.
scale : float or array_like of floats
        Standard deviation (spread or "width") of the distribution.
size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  If size is ``None`` (default),
        a single value is returned if ``loc`` and ``scale`` are both scalars.
        Otherwise, ``np.broadcast(loc, scale).size`` samples are drawn.

         # Check the mean and standard deviation
In [112]: mu, sigma = 0, 0.1
In [113]: s = np.random.normal(mu, sigma, 1000)
In [114]: abs(mu - np.mean(s)) < 0.01
Out[114]: True
In [115]: abs(sigma - np.std(s, ddof=1)) < 0.01   # ddof: delta degrees of freedom; the divisor is N - ddof.
                                                  # ddof=1 is the usual choice: it is based on the unbiased variance estimate
Out[115]: True

          # Plot a histogram with matplotlib and overlay the density curve
In [116]: import matplotlib.pyplot as plt
In [118]: count, bins, ignored = plt.hist(s, 30, density=True)
In [119]: plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
     ...:     np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
     ...:     linewidth=2, color='r')
Out[119]: [<matplotlib.lines.Line2D at 0x212dc9f5a90>]
In [120]: plt.show()

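Probability density function of the Gaussian distribution:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Fit result:

[Figure: histogram of the samples with the Gaussian density curve drawn in red]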
