词云--《红楼梦》--jieba库--wordcloud库

《红楼梦》
1.人物出场统计
词云--《红楼梦》--jieba库--wordcloud库

import jieba
f=open('F:/2级python/test/T10/sucai/红楼梦.txt','r',encoding='utf-8')
txt=f.read()
f.close()
words=jieba.lcut(txt)
counts={}
for word in words:
    if len(word)==1:
        continue
    else:
        counts[word]=counts.get(word,0)+1
items=list(counts.items())
items.sort(key=lambda x:x[1],reverse=True)
for i in range(15):
    word,count=items[i]
    print('{0:<10}{1:>5}'.format(word,count))

运行结果:

宝玉 3748
什么 1613
一个 1451
贾母 1228
我们 1220
那里 1174
凤姐 1100
王夫人 1011
你们 1009
如今 999
说道 973
知道 967
老太太 966
起来 949
姑娘 941

从结果可以看出并不是都是人物名称,对此,需对代码进行加工:
2. 加工:
引入排除词库excludes
代码:

import jieba 
f=open('F:/2级python/test/T10/sucai/红楼梦.txt','r',encoding='utf-8')
txt=f.read()
f.close()
words=jieba.lcut(txt)
counts={}
for word in words:
    if len(word)==1:
        continue
    else:
        counts[word]=counts.get(word,0)+1
        
excludes = {"什么","一个","我们","那里","你们","如今", \
            "说道","知道","老太太","起来","姑娘","这里", \
            "出来","他们","众人","自己","一面","太太", \
            "只见","怎么","奶奶","两个","没有","不是", \
            "不知","这个","听见"}
for word in excludes:
    del(counts[word])
items=list(counts.items())
items.sort(key=lambda x:x[1],reverse=True)
for i in range(5):
    word,count=items[i]
    print('{0:<10}{1:>5}'.format(word,count))

运行结果:
宝玉 3748
贾母 1228
凤姐 1100
王夫人 1011
贾琏 670

总结:
可以看出:宝玉出现次数最多,贾母,凤姐,王夫人等出现次数也不少,频率也差不多 从排除词库可看出:作者喜欢用“我们”,“你们”,“姑娘”,“奶奶”等。因此,如果只通过人物名称来判断出场次数似乎不太好。本文将不在此完善该问题。
3.
词云--《红楼梦》--jieba库--wordcloud库

#3人物出场词云
import jieba
from wordcloud import WordCloud
#读文本文件
f=open('F:/2级python/test/T10/sucai/红楼梦.txt','r',encoding='utf-8') 
txt=f.read()
f.close()
words=jieba.lcut(txt)
newtxt=' '.join(words)
excludes = {"什么","一个","我们","那里","你们","如今", \
            "说道","知道","老太太","起来","姑娘","这里", \
            "出来","他们","众人","自己","一面","太太", \
            "只见","怎么","奶奶","两个","没有","不是", \
            "不知","这个","听见"}
wc=WordCloud(background_color='white',font_path='msyh.ttc',height=600,width=800,\
                    max_words=200,max_font_size=80,stopwords=excludes)
wordcloud=wc.generate(newtxt)
wordcloud.to_file('F:/2级python/test/T10/tmp/红楼梦基本词云.png')

运行结果:

词云--《红楼梦》--jieba库--wordcloud库
词云--《红楼梦》--jieba库--wordcloud库

import jieba
from wordcloud import WordCloud
#读文本文件
f=open('F:/2级python/test/T10/sucai/红楼梦.txt','r',encoding='utf-8') 
txt=f.read()
f.close()
words=jieba.lcut(txt)
newtxt=' '.join(words)
excludes = {"什么","一个","我们","那里","你们","如今", \
            "说道","知道","老太太","起来","姑娘","这里", \
            "出来","他们","众人","自己","一面","太太", \
            "只见","怎么","奶奶","两个","没有","不是", \
            "不知","这个","听见"}
wc=WordCloud(background_color='white',font_path='msyh.ttc',height=400,width=200,\
                    max_words=5,max_font_size=80,stopwords=excludes)
wordcloud=wc.generate(newtxt)
wordcloud.to_file('F:/2级python/test/T10/tmp/红楼梦基本词云.png')

词云--《红楼梦》--jieba库--wordcloud库
词云--《红楼梦》--jieba库--wordcloud库