Replacing missing values and updating old values in a DataFrame using NumPy and pandas
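Problem description:
I am trying to replace the missing values in my DataFrame, which are represented by '...', with np.nan. I also want to update some old values, but my approach does not seem to work.
Here is my code: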
import numpy as np
import pandas as pd

def func():
    energy=pd.ExcelFile('Energy Indicators.xls').parse('Energy')
    energy=energy.iloc[16:][['Environmental Indicators: Energy','Unnamed: 3','Unnamed: 4','Unnamed: 5']].copy()
    energy.columns=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    o="..."
    n=np.NaN
    # Trying to replace missing values with np.nan values
    energy[energy['Energy Supply']==o]=n
    energy['Energy Supply']=energy['Energy Supply']*1000000
    # Here, I want to replace old values by new ones ==> Same problem
    old=["Republic of Korea","United States of America","United Kingdom of "
         +"Great Britain and Northern Ireland","China, Hong "
         +"Kong Special Administrative Region"]
    new=["South Korea","United States","United Kingdom","Hong Kong"]
    for i in range(0,4):
        energy[energy['Country']==old[i],'Country']=new[i]
    return energy
Here is the .xls file I am working with: https://drive.google.com/file/d/0B80lepon1RrYeDRNQVFWYVVENHM/view?usp=sharing
Answer
I would use a regex-based df.replace:
energy = energy.replace(r'\s*\.+\s*', np.nan, regex=True)
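To see what the regex version does, here is a minimal sketch on a toy frame. The country names and numbers are made up for illustration and are not taken from the spreadsheet. One thing worth knowing: when the replacement value is not a string, pandas swaps out the whole cell as soon as the pattern is found anywhere inside it, so the unanchored pattern would also blank a text cell that merely contains a dot. The anchored pattern `^\s*\.+\s*$` is my own variant, not part of the original suggestion, and only matches cells made up entirely of dots and optional surrounding whitespace.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values). dtype=object mimics how a mixed
# text/number column typically comes out of an Excel sheet.
df = pd.DataFrame(
    {
        "Country": ["Xland", "Yland", "St. Zed"],
        "Energy Supply": [321, "...", " ... "],
    },
    dtype=object,
)

# Pattern from the answer: any string cell containing one or more dots is replaced.
loose = df.replace(r"\s*\.+\s*", np.nan, regex=True)

# Anchored variant (an assumption of this sketch): only cells that consist
# of dots plus optional whitespace are replaced.
strict = df.replace(r"^\s*\.+\s*$", np.nan, regex=True)

print(loose)
print(strict)
```

With the anchored pattern, "St. Zed" survives while both the "..." and the " ... " cells become NaN. If none of your text columns contain dots, the two patterns behave the same.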
MaxU suggested an alternative, which will work if your cells contain no special or whitespace characters other than the dots:
energy = energy.replace('...', np.nan, regex=False)
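Neither call above touches the second part of the question, renaming the countries. Passing a boolean mask and a column label together inside plain square brackets, as in `energy[mask, 'Country']`, is not how pandas selects rows and columns; `.loc` is the accessor that takes both. Below is a small sketch of two ways to do the renaming, using the old and new names from the question on a toy frame. The "France" row and the `renames` dict are only there for illustration.

```python
import pandas as pd

# Toy frame standing in for the real data (hypothetical rows).
energy = pd.DataFrame(
    {"Country": ["Republic of Korea", "United States of America", "France"]}
)

# Old name -> new name, taken from the lists in the question.
renames = {
    "Republic of Korea": "South Korea",
    "United States of America": "United States",
    "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
    "China, Hong Kong Special Administrative Region": "Hong Kong",
}

# Option 1: keep the loop, but select rows and column through .loc.
with_loc = energy.copy()
for old_name, new_name in renames.items():
    with_loc.loc[with_loc["Country"] == old_name, "Country"] = new_name

# Option 2: let Series.replace apply the whole mapping at once.
with_replace = energy.copy()
with_replace["Country"] = with_replace["Country"].replace(renames)

print(with_loc)
print(with_replace)
```

Both options leave names that are not in the mapping untouched. The dict version is shorter and avoids keeping two parallel lists in sync.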
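I think it should be `energy = energy.replace('...', np.nan, regex=False)` – MaxU
@MaxU regex defaults to False, which means something is off with the column values (possibly stray whitespace), so I decided to go with the regex. I will add yours as well!
`energy = energy.replace('...', np.nan)` works great – sali333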