Pandas: split a delimited string column into multiple columns
Question:
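I can't split a pandas Series that contains semicolons. Is it because I use the column name ('Social_Media') as the index, or because Python doesn't recognize the semicolon as a delimiter? Or is something wrong with my script?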
# Keep only the rows where Social_Media is not NaN
df2 = df[df['Social_Media'].notnull()]
# Split on semicolons
df2['Social_Media'].apply(lambda x: x.split(';')[0])
#This is my output after the split
Timestamp
2017-06-01 18:10:46 Twitter;Facebook;Instagram;WhatsApp;Google+
2017-06-01 19:24:04 Twitter;Facebook;Instagram;WhatsApp;Google+
2017-06-01 19:25:21 Twitter;Facebook;Instagram;WhatsApp;Google+
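The semicolon is recognized as a delimiter; the more likely cause is that apply returns a new Series instead of modifying df2 in place, and [0] keeps only the first item of each split. A minimal sketch with hypothetical sample data (the two rows below are illustrative, not the asker's data) shows both effects:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's Social_Media column
df2 = pd.DataFrame(
    {"Social_Media": ["Twitter;Facebook;Instagram", "Twitter;WhatsApp"]},
    index=pd.to_datetime(["2017-06-01 18:10:46", "2017-06-01 19:24:04"]),
)
df2.index.name = "Timestamp"

# apply() returns a NEW Series; [0] keeps only the first piece of each split
first_items = df2["Social_Media"].apply(lambda x: x.split(";")[0])
print(first_items.tolist())  # ['Twitter', 'Twitter']

# Without [0], each cell holds a Python list, but it is still ONE column
as_lists = df2["Social_Media"].apply(lambda x: x.split(";"))
print(as_lists.tolist())  # [['Twitter', 'Facebook', 'Instagram'], ['Twitter', 'WhatsApp']]

# Nothing was assigned back, so the original column is unchanged
print(df2["Social_Media"].tolist())
```

Getting one column per item needs an operation that expands the pieces into a DataFrame, which is what the answer below does.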
The output I need:
Timestamp name_a name_b name_c name_d name_e
2017-06-01 18:10:46 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:24:04 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:25:21 Twitter Facebook Instagram WhatsApp Google+
Answer:
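You can use str.split with expand=True, which puts each piece in its own column, and then add_prefix to name the columns: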
df = df['Social_Media'].str.split(';', expand=True).add_prefix('name_')
print (df)
name_0 name_1 name_2 name_3 name_4
Timestamp
2017-06-01 18:10:46 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:24:04 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:25:21 Twitter Facebook Instagram WhatsApp Google+
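A self-contained version of this approach, built on hypothetical sample data, may help for testing. The last row deliberately has fewer entries: with expand=True, pandas pads shorter rows with missing values, so the number of columns follows the longest row.

```python
import pandas as pd

# Hypothetical sample data; the last row has only two entries on purpose
df = pd.DataFrame(
    {"Social_Media": [
        "Twitter;Facebook;Instagram;WhatsApp;Google+",
        "Twitter;Facebook;Instagram;WhatsApp;Google+",
        "Twitter;Facebook",
    ]},
    index=pd.to_datetime([
        "2017-06-01 18:10:46",
        "2017-06-01 19:24:04",
        "2017-06-01 19:25:21",
    ]),
)
df.index.name = "Timestamp"

# expand=True spreads the pieces into integer-labelled columns 0..n-1;
# add_prefix turns those labels into 'name_0', 'name_1', ...
parts = df["Social_Media"].str.split(";", expand=True).add_prefix("name_")
print(parts)
```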
And for letter-based column names:
import string
L = list(string.ascii_lowercase)
names = dict(zip(range(len(L)), ['name_' + x for x in L]))
df = df['Social_Media'].str.split(';', expand=True).rename(columns=names)
print (df)
name_a name_b name_c name_d name_e
Timestamp
2017-06-01 18:10:46 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:24:04 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:25:21 Twitter Facebook Instagram WhatsApp Google+
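The same rename mapping can be written as a single dict comprehension over enumerate. The sketch below (hypothetical sample data) also shows one optional way to attach the new columns back to the original frame with join, since both answers above overwrite df with only the split columns. Note that the mapping covers at most 26 columns (one per lowercase letter).

```python
import string

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {"Social_Media": ["Twitter;Facebook;Instagram;WhatsApp;Google+",
                      "Twitter;Facebook;Instagram;WhatsApp;Google+"]},
    index=pd.to_datetime(["2017-06-01 18:10:46", "2017-06-01 19:24:04"]),
)
df.index.name = "Timestamp"

parts = df["Social_Media"].str.split(";", expand=True)

# Map column 0 -> 'name_a', 1 -> 'name_b', ... (only up to 26 columns)
names = {i: "name_" + letter for i, letter in enumerate(string.ascii_lowercase)}
parts = parts.rename(columns=names)

# Optional: replace the original column with the split columns, aligned on the index
result = df.drop(columns="Social_Media").join(parts)
print(result.columns.tolist())
```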
Thank you very much, this is brilliant. Thanks also for the additional tip on handling strings. – Gwiji