Python Data Visualization - Pandas + DataFrame (Plotting)

Common functions in the Pandas module

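  • pandas.read_csv("path")
    • When reading a file, pandas infers the dtype of each column automatically. If a column holds text or a mix of value kinds, .info() reports that column as object
    • Use a["column_name"].value_counts() to count how often each distinct value occurs in such an object column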


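The functions below are shown on this DataFrame (it assumes "import numpy as np" and "from pandas import DataFrame"):

data = DataFrame(np.arange(20).reshape(4,5),index = list("ABCD"),columns=list("abcde"))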

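  • data.head()
    • Shows the first five rows (the default; pass n for a different count)
  • data.info()
    • Shows each column's name, non-null count, and dtype
  • data.describe()
    • Returns summary statistics for each numeric column (count, mean, std, min, quartiles, max)
  • data.shape[0] / len(data)
    • Number of rows
  • data.shape[1] / data.columns.size
    • Number of columns
  • data.iloc[1:3,1:3]
    • Slice by position (half-open: start included, end excluded)
  • data.mean(axis=0)   /   data.mean(axis=1)
    • axis=0 returns the mean of each column (averaging down the rows); axis=1 returns the mean of each row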
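
The calls in the list above can be checked on the 4x5 example DataFrame. This is a small sketch; the variable names (`n_rows`, `block`, `col_means`, `row_means`) are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=list("ABCD"), columns=list("abcde"))

n_rows, n_cols = data.shape        # 4 rows, 5 columns
block = data.iloc[1:3, 1:3]        # rows B and C, columns b and c (end excluded)
col_means = data.mean(axis=0)      # one value per column: a=7.5 ... e=11.5
row_means = data.mean(axis=1)      # one value per row: A=2.0 ... D=17.0

print(block)
print(col_means)
print(row_means)
```

Note that `mean` is a method, so it is called with parentheses; `data.mean[0]` raises a TypeError.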


 

Plotting a DataFrame:

1> Plot: line chart


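import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# 3 rows x 5 columns; the rows keep the default integer index 0..2
a = pd.DataFrame(np.arange(15).reshape(3,5),columns=['Data-1','Data-2','Data-3','Data-4','Data-5'])
b = a.describe()   # summary statistics: count, mean, std, min, 25%, 50%, 75%, max
print(b)
b.plot()           # one line per column; the x-axis is the statistic name
plt.legend(['Data-1','Data-2','Data-3','Data-4','Data-5'],loc="upper left")
plt.show()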

2> Hist: histogram

https://blog.csdn.net/qq_42292831/article/details/89180775
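
The histogram section only links to another post, so here is a minimal sketch of one common approach, DataFrame.plot.hist(). It is not necessarily the method the linked post uses. The column names, the random data, and the bin count are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: two columns of normally distributed values
scores = pd.DataFrame({
    "Data-1": rng.normal(50, 10, 200),
    "Data-2": rng.normal(60, 5, 200),
})

# One semi-transparent histogram per column, drawn on shared bins
ax = scores.plot.hist(bins=20, alpha=0.5)
ax.set_xlabel("value")

# plt.show()  # in an interactive session, remove the Agg line above and call this
```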

3> Scatter plot (the demo also adds rows and a column to a DataFrame)

import pandas as pd
from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt

data = DataFrame([{"A":1,"B":2,"C":3}])
#print(data)

# DataFrame.append() was removed in pandas 2.0; pd.concat() replaces it
data = pd.concat([data, DataFrame([{"A":11,"B":22,"C":33},{"A":29.558,"B":55,"C":89}])])
#print(data)

# Add 20 rows of random values in [0, 100)
for i in range(20):
    b = DataFrame([{"A":np.random.rand()*100,"B":np.random.rand()*100,"C":np.random.rand()*100}])
    data = pd.concat([data, b], ignore_index=True)
#print(data)

# New column "D": its length must equal the number of rows (23 here)
data["D"]=np.random.rand(len(data))*100
#print(data)

data.plot.scatter(x="B",y="C",color="red",alpha=0.3)
plt.show()




 

Inserting a row and a column into a DataFrame:

1> Insert a row

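        Before pandas 2.0, rows were added with the append() function:

                1. data = data.append([{"A":1,"B":2,"C":3}, {"A":11,"B":22,"C":33}, {"A":111,"B":222,"C":333}])

                2. data = data.append(new_data, ignore_index=True)

        append() was deprecated in pandas 1.4 and removed in pandas 2.0. Use pd.concat() instead:

                1. data = pd.concat([data, DataFrame([{"A":1,"B":2,"C":3}, {"A":11,"B":22,"C":33}, {"A":111,"B":222,"C":333}])])

                2. data = pd.concat([data, new_data], ignore_index=True)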
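
A short sketch of row insertion with pd.concat(), showing what ignore_index changes. The variable names `new_rows`, `kept_index`, and `fresh_index` are illustrative.

```python
import pandas as pd

data = pd.DataFrame([{"A": 1, "B": 2, "C": 3}])
new_rows = pd.DataFrame([{"A": 11, "B": 22, "C": 33},
                         {"A": 111, "B": 222, "C": 333}])

# Without ignore_index each frame keeps its own labels: 0, 0, 1
kept_index = pd.concat([data, new_rows])

# With ignore_index=True the result is relabelled 0, 1, 2
fresh_index = pd.concat([data, new_rows], ignore_index=True)

print(kept_index)
print(fresh_index)
```

Duplicate index labels are legal but make label-based lookups such as `.loc[0]` return several rows, so ignore_index=True is usually the safer choice when the original labels carry no meaning.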

2> Insert a column (raises a ValueError if the list has fewer or more items than the DataFrame has rows)

        data["New_Name"] = [..., ..., ...]
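
A minimal sketch of the length rule for new columns. The column names "D" and "E" and the variable `error_message` are chosen for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"A": [1, 11, 111], "B": [2, 22, 222]})

# Works: exactly one value per existing row
data["D"] = np.random.rand(len(data)) * 100

# Fails: two values for a three-row DataFrame
try:
    data["E"] = [1, 2]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Sizing the new values with `len(data)` instead of a hard-coded count avoids the mismatch when the number of rows changes.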

import pandas as pd
from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt

data = DataFrame([{"A":1,"B":2,"C":3}])
print(data)
# DataFrame.append() was removed in pandas 2.0; pd.concat() replaces it
data = pd.concat([data, DataFrame([{"A":11,"B":22,"C":33},{"A":29.558,"B":55,"C":89}])])
print(data)

# Add 20 rows of random values in [0, 100)
for i in range(20):
    b = DataFrame([{"A":np.random.rand()*100,"B":np.random.rand()*100,"C":np.random.rand()*100}])
    data = pd.concat([data, b], ignore_index=True)
print(data)

# New column "D": its length must equal the number of rows (23 here)
data["D"]=np.random.rand(len(data))*100
#print(data)

   Result: (output screenshot omitted)



 

Converting a DataFrame to a list:

https://blog.csdn.net/qq_42292831/article/details/89182921
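
The conversion itself is covered in the linked post. As a hedged sketch, these are a few common ways to do it, not necessarily the ones that post uses; the sample data and variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({"A": [1, 11], "B": [2, 22]})

rows = data.values.tolist()         # list of rows: [[1, 2], [11, 22]]
column_a = data["A"].tolist()       # a single column as a flat list: [1, 11]
records = data.to_dict("records")   # list of dicts, one per row

print(rows)
print(column_a)
print(records)
```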