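Reading a CSV with sparsely labeled column headers using pandas
Question:
I have a .csv file that I am trying to read into a pandas DataFrame with multi-level column headers, but the labels in the first header row are sparse.
For example: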
Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,,
,RH=0.8,,,,RH=0.9,,,
,n_=_60,,,,n_=_29,,,
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o)
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02
1.83E-03,7.54E-02,0.124370072,0.124367091,7.76E-02,8.54E-02,0.121274897,0.121273799,8.46E-02
2.44E-03,7.76E-02,0.136590839,0.136586865,7.99E-02,5.45E-02,0.100995665,0.100994202,5.40E-02
3.05E-03,8.73E-02,0.141422799,0.141417832,8.98E-02,7.57E-02,0.170033442,0.170031614,7.50E-02
3.66E-03,7.29E-02,0.143599074,0.143593115,7.50E-02,0.10001777,0.165468366,0.165466173,9.91E-02
When I read the CSV,
Cosp2 = pd.read_csv(DPath,index_col=0, header=[1,3])
print(Cosp2)
I end up with Unnamed: #_level_0 labels for every first-level header that is not explicitly labeled.
RH=0.8 Unnamed: 2_level_0 Unnamed: 3_level_0 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o)
0.00061 0.0840 0.117551 0.117550
0.00122 0.0749 0.126468 0.126466
0.00183 0.0754 0.124370 0.124367
0.00244 0.0776 0.136591 0.136587
0.00305 0.0873 0.141423 0.141418
0.00366 0.0729 0.143599 0.143593
Unnamed: 4_level_0 RH=0.9 Unnamed: 6_level_0 \
nat_freq pred_sp(h2o) avrg_sp(T) avrg_sp(h2o)
0.00061 0.0864 0.128697 0.163304
0.00122 0.0770 0.090500 0.200350
0.00183 0.0776 0.085400 0.121275
0.00244 0.0799 0.054500 0.100996
0.00305 0.0898 0.075700 0.170033
0.00366 0.0750 0.100018 0.165468
Unnamed: 7_level_0 Unnamed: 8_level_0
nat_freq denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.163304 0.127553
0.00122 0.200350 0.089700
0.00183 0.121274 0.084600
0.00244 0.100994 0.054000
0.00305 0.170032 0.075000
0.00366 0.165466 0.099100
Is there a way to make pandas propagate the level-0 labels across the unlabeled columns? I would like something that looks like this:
RH=0.8 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.0840 0.117551 0.117550 0.0864
0.00122 0.0749 0.126468 0.126466 0.0770
0.00183 0.0754 0.124370 0.124367 0.0776
0.00244 0.0776 0.136591 0.136587 0.0799
0.00305 0.0873 0.141423 0.141418 0.0898
0.00366 0.0729 0.143599 0.143593 0.0750
RH=0.9
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.128697 0.163304 0.163304 0.127553
0.00122 0.090500 0.200350 0.200350 0.089700
0.00183 0.085400 0.121275 0.121274 0.084600
0.00244 0.054500 0.100996 0.100994 0.054000
0.00305 0.075700 0.170033 0.170032 0.075000
0.00366 0.100018 0.165468 0.165466 0.099100
Answer
You can first use get_level_values together with to_series to get the level-0 labels as a Series:
a = Cosp2.columns.get_level_values(0).to_series()
print (a)
RH=0.8 RH=0.8
Unnamed: 2_level_0 Unnamed: 2_level_0
Unnamed: 3_level_0 Unnamed: 3_level_0
Unnamed: 4_level_0 Unnamed: 4_level_0
RH=0.9 RH=0.9
Unnamed: 6_level_0 Unnamed: 6_level_0
Unnamed: 7_level_0 Unnamed: 7_level_0
Unnamed: 8_level_0 Unnamed: 8_level_0
dtype: object
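To see this step in isolation, here is a minimal sketch on a hand-built MultiIndex. The column labels are shortened stand-ins for the ones in the question, and the variable names (cols, level0, level0_series) are only illustrative.

```python
import pandas as pd

# Hand-built stand-in for the columns that read_csv produced in the question.
cols = pd.MultiIndex.from_arrays([
    ["RH=0.8", "Unnamed: 2_level_0", "RH=0.9", "Unnamed: 4_level_0"],
    ["avrg_sp(T)", "avrg_sp(h2o)", "avrg_sp(T)", "avrg_sp(h2o)"],
])

# get_level_values(0) returns an Index with one level-0 label per column.
level0 = cols.get_level_values(0)

# to_series() turns it into a Series whose index and values are both the labels.
level0_series = level0.to_series()
print(level0_series)
```

Working on a Series gives access to mask and ffill in the next step.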
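Then use mask to replace every label that startswith Unnamed with NaN, and ffill to replace those NaNs with the last valid label before them (equivalent to fillna with method='ffill', a spelling that recent pandas versions deprecate):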
b = a.mask(a.str.startswith('Unnamed')).ffill()
print (b)
RH=0.8 RH=0.8
Unnamed: 2_level_0 RH=0.8
Unnamed: 3_level_0 RH=0.8
Unnamed: 4_level_0 RH=0.8
RH=0.9 RH=0.9
Unnamed: 6_level_0 RH=0.9
Unnamed: 7_level_0 RH=0.9
Unnamed: 8_level_0 RH=0.9
dtype: object
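The two operations can be looked at separately. The following is a small sketch on a plain Series; the labels are made up for illustration and the intermediate names (is_placeholder, masked, filled) are not part of the original answer.

```python
import pandas as pd

# Illustrative level-0 labels, in column order.
labels = pd.Series([
    "RH=0.8", "Unnamed: 2_level_0", "Unnamed: 3_level_0",
    "RH=0.9", "Unnamed: 5_level_0",
])

# Boolean Series: True where pandas generated a placeholder label.
is_placeholder = labels.str.startswith("Unnamed")

# mask() replaces the entries where the condition is True with a missing value.
masked = labels.mask(is_placeholder)

# ffill() copies the last valid label forward into each missing entry.
filled = masked.ffill()
print(filled.tolist())
```

Forward filling relies on the real label appearing in the first column of each group, which is the layout of the file in the question.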
Finally, create a new MultiIndex with from_arrays:
Cosp2.columns = pd.MultiIndex.from_arrays([b, Cosp2.columns.get_level_values(1)])
print (Cosp2)
RH=0.8 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.0840 0.117551 0.117550 0.0864
0.00122 0.0749 0.126468 0.126466 0.0770
0.00183 0.0754 0.124370 0.124367 0.0776
0.00244 0.0776 0.136591 0.136587 0.0799
0.00305 0.0873 0.141423 0.141418 0.0898
0.00366 0.0729 0.143599 0.143593 0.0750
RH=0.9
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.128697 0.163304 0.163304 0.127553
0.00122 0.090500 0.200350 0.200350 0.089700
0.00183 0.085400 0.121275 0.121274 0.084600
0.00244 0.054500 0.100996 0.100994 0.054000
0.00305 0.075700 0.170033 0.170032 0.075000
0.00366 0.100018 0.165468 0.165466 0.099100
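For reference, the three steps can be combined into one self-contained sketch. It reads a shortened copy of the sample file (two data rows) from an in-memory string instead of DPath, and wraps the steps in a helper named fill_sparse_level0. The helper name and the csv_text variable are assumptions made for this example, not part of pandas.

```python
import io

import pandas as pd

# Shortened copy of the sample file from the question (two data rows only).
csv_text = """Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,,
,RH=0.8,,,,RH=0.9,,,
,n_=_60,,,,n_=_29,,,
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o)
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02
"""


def fill_sparse_level0(df):
    """Hypothetical helper: forward-fill 'Unnamed' labels in column level 0."""
    level0 = df.columns.get_level_values(0).to_series()
    filled = level0.mask(level0.str.startswith("Unnamed")).ffill()
    out = df.copy()  # leave the caller's DataFrame untouched
    out.columns = pd.MultiIndex.from_arrays(
        [filled, df.columns.get_level_values(1)]
    )
    return out


# Same read_csv arguments as in the question: rows 1 and 3 form the header.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, header=[1, 3])
fixed = fill_sparse_level0(df)

print(fixed.columns.get_level_values(0).unique().tolist())
print(fixed["RH=0.9"])
```

Once the level-0 labels are filled in, selecting fixed["RH=0.9"] returns all four columns of that group rather than only the first one.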