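Unable to retrieve data from a DataFrame
Problem description:
I am trying to retrieve specific rows from a DataFrame based on a condition, but the result is an empty DataFrame. I am new to data science and trying to learn it. Here is my code.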
file = open('/home/jeet/files1/files/ch03/adult.data', 'r')

def chr_int(a):
    if a.isdigit(): return int(a)
    else: return 0

data = []
for line in file:
    data1 = line.split(',')
    if len(data1) == 15:
        data.append([chr_int(data1[0]), data1[1],
                     chr_int(data1[2]), data1[3],
                     chr_int(data1[4]), data1[5],
                     data1[6], data1[7], data1[8],
                     data1[9], chr_int(data1[10]),
                     chr_int(data1[11]),
                     chr_int(data1[12]),
                     data1[13], data1[14]])

import pandas as pd
df = pd.DataFrame(data)
df.columns = ['age', 'type-employer', 'fnlwgt', 'education', 'education_num', 'marital', 'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss', 'hr_per_week', 'country', 'income']
ml = df[(df.sex == 'Male')]  # here I retrieve the rows where sex is male
ml1 = df[(df.sex == 'Male') & (df.income == '>50K\n')]
print(ml1.head())  # here I print that data
fm = df[(df.sex == 'Female')]
fm1 = df[(df.sex == 'Female') & (df.income == '>50K\n')]
Output:
Empty DataFrame
Columns: [age, type-employer, fnlwgt, education, education_num, marital, occupation, relationship, race, sex, capital_gain, capital_loss, hr_per_week, country, income]
Index: []
What is wrong with the code? Why is the DataFrame empty?
Answer
If you inspect the values closely, you will probably see the problem:
print(df.income.unique())
>>> [' <=50K\n' ' >50K\n']
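Every value has a leading space. So either the values should be processed to strip those spaces, or the code should be modified like this: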
ml1 = df[(df.sex == 'Male') & (df.income == ' >50K\n')]
fm1 = df [(df.sex == 'Female') & (df.income ==' <=50K\n')]
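The other option mentioned above, cleaning the values instead of changing the comparison strings, might look roughly like the sketch below. It uses a tiny made-up DataFrame as a stand-in for the real file, and it assumes the `sex` column carries the same leading space as `income`, which the output above does not confirm.

```python
import pandas as pd

# Made-up stand-in rows for illustration only; not taken from adult.data.
# Assumption: 'sex' has the same leading space that 'income' shows.
df = pd.DataFrame({
    'sex': [' Male', ' Female', ' Male'],
    'income': [' >50K\n', ' <=50K\n', ' <=50K\n'],
})

# Remove leading/trailing whitespace (the space and the trailing newline)
# from the string columns used in the filters.
for col in ['sex', 'income']:
    df[col] = df[col].str.strip()

# After stripping, the plain labels can be compared directly.
ml1 = df[(df.sex == 'Male') & (df.income == '>50K')]
print(ml1)
```

With the values cleaned once up front, later filters no longer need to remember the stray space or the `\n`.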
Are you sure the values in the 'income' column are strings and contain '\n'?
Yes, they are strings.
Then try this: print(df.income.unique()). Do the printed values contain '\n'?