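How do I extract the tf-idf values mapped to each feature back into a dataframe?
I have a dataframe made up of a few columns containing string values. Computing TF-IDF on these columns returns a list of arrays that I can map back to the dataframe, but the values are now arrays (somewhat like multi-valued cells), which makes further computation very difficult. How do I extract the tf-idf values mapped to each feature back into the dataframe?
I want to get those values mapped to their feature names (somewhat like an expanded dataframe), so that I can put them directly into my original dataframe.
How can I do this?
Sample data: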
print(d1['Keywords'])
1     APS17P, auditing standards, attestation standa...
2     APS17P, auditing standards, attestation standa...
3     AAMAAM17P, SAS No. 131, SAS No. 132, CPE, Audi...
4     AAMAAM17P, SAS No. 131, SAS No. 132, CPE, Audi...
5     APT13PHI, AICPA Professional Standards, Techni...
6     005184wz, 005184, 005186HI, 005187HI, 005188HI...
7     PAOCBOA, Special purpose framework, SPF, finan...
8     PAOCBOA, Special purpose framework, SPF, finan...
9     PAOCBOA, Special purpose framework, SPF, finan...
10    ATTNPO, Not-for-profit financial statements, N...
11    ATTNPO, Not-for-profit financial statements, N...
Here is what you need to do:
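import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

v = TfidfVectorizer()
# 1. Apply tf-idf to your text column (use your own column name, e.g. 'Keywords' in the question)
x = v.fit_transform(df['keywords'])
# 2. Convert the tf-idf result to a dataframe, one column per term.
#    Reuse df.index so the rows line up even when the index does not start at 0.
#    get_feature_names_out() needs scikit-learn >= 1.0; older versions use get_feature_names().
df1 = pd.DataFrame(x.toarray(), columns=v.get_feature_names_out(), index=df.index)
# 3. Concatenate the tf-idf dataframe to the original one
res = pd.concat([df, df1], axis=1)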
For a detailed explanation of the steps, see my answer here: Append tfidf to pandas dataframe
@piepi have a look at my answer. I think it is what you need. – MedAli