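Sorting and counting words in a text file
I'm new to programming and have been stuck on my current program. I have to read a story in from a file, sort the words, and count the number of occurrences of each word. It counts the words, but it does not sort them, remove punctuation, or remove duplicate words. I'm lost as to why it isn't working. Any advice would be helpful.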
ifile = open("Story.txt",'r')
fileout = open("WordsKAI.txt",'w')
lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    wordlist.append(line)
    line = line.split()
    # line.lower()

    for word in line:
        word = word.strip(". , ! ? : ")
        # word = list(word)
        wordlist.sort()
        sorted(wordlist)
        countlist.append(word)

        print(word,countlist.count(word))
You must supply a key function to the sort method. Try this: r = sorted(wordlist, key=str.lower)
You don't need to supply a key. It depends on what you want.
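To make the difference discussed in these two comments concrete, here is a small sketch. The sample words are made up for illustration and are not taken from Story.txt. Without a key, Python compares strings by code point, so uppercase letters sort before lowercase ones; with key=str.lower the comparison ignores case.

```python
# Hypothetical sample data, not taken from Story.txt
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Default sort: compares code points, so uppercase letters come first
default_order = sorted(words)

# Case-insensitive sort: each word is compared by its lowercase form
caseless_order = sorted(words, key=str.lower)

print(default_order)
print(caseless_order)

# sorted() returns a new list; the original list is left unchanged
print(words)
```

Note that sorted() returns a new list and leaves its argument untouched, so a bare sorted(wordlist) whose result is never used has no effect.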
punctuation = ".,!?: "
counts = {}
with open("Story.txt",'r') as infile:
    for line in infile:
        for word in line.split():
            for p in punctuation:
                word = word.strip(p)
            if word not in counts:
                counts[word] = 0
            counts[word] += 1

with open("WordsKAI.txt",'w') as outfile:
    for word in sorted(counts):  # if you want to sort by counts instead, use sorted(counts, key=counts.get)
        outfile.write("{}: {}\n".format(word, counts[word]))
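The comment in the code above mentions sorting by count instead of alphabetically. As a rough sketch of what that could look like, the example below uses a made-up counts dictionary (an assumption standing in for the one built from Story.txt) and orders words from most to least frequent, breaking ties alphabetically.

```python
# Hypothetical counts, standing in for the dictionary built from the file
counts = {"the": 5, "cat": 2, "sat": 2, "a": 3}

# Alphabetical order: sorting a dict sorts its keys
alphabetical = sorted(counts)

# Most frequent first; words with equal counts fall back to alphabetical order
by_count = sorted(counts, key=lambda w: (-counts[w], w))

for word in by_count:
    print("{}: {}".format(word, counts[word]))
```

Plain sorted(counts, key=counts.get) gives ascending counts; negating the count in the key, as here, is one way to get descending order while keeping ties alphabetical.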
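The main problem in your code is on this line (line 9):
    wordlist.append(line)
You are appending the whole line to wordlist, which I doubt is what you want. Because of this, what gets added has not been .strip()ped before it goes into wordlist.
What you have to do is add each word only after you have strip()ped it, and only after checking that the same word is not already in the list (no duplicates):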
ifile = open("Story.txt",'r')
lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()
        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)
        # Add the word to countlist so that it can be counted later
        countlist.append(word)

# Sort the wordlist
wordlist.sort()

# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))
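One detail of the code above is worth keeping in mind: str.strip(chars) removes any of the listed characters from both ends of the string only, so punctuation inside a word is kept. The sketch below illustrates this with made-up tokens; the names PUNCTUATION and clean are hypothetical helpers introduced only for this example.

```python
# Same punctuation characters as in the answer's strip() call
PUNCTUATION = ".,!?:;'\""

def clean(token):
    """Strip leading/trailing punctuation, then lowercase (illustrative helper)."""
    return token.strip(PUNCTUATION).lower()

# Made-up tokens showing typical cases
print(clean('"Hello!"'))   # quotes and "!" at the ends are removed
print(clean("don't"))      # the inner apostrophe is not at an end, so it stays
print(clean("end..."))     # all trailing dots are removed
print(repr(clean("...")))  # a token made only of punctuation becomes empty
```

The last case means a stray "..." in the story would be counted as an empty word, so it may be worth skipping empty strings before appending.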
Another approach is to use a dictionary, storing each word as a key and its number of occurrences as the value:
ifile = open("Story.txt", "r")
lines = ifile.readlines()

word_dict = {}

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()
        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1

# Create a wordlist to display the words sorted
word_list = list(word_dict.keys())
word_list.sort()

for word in word_list:
    print(word, word_dict[word])
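As a variation on the dictionary approach (not part of the original answer), the standard library's collections.Counter is a dict subclass that starts missing keys at zero. The sketch below is one possible way to use it; count_words is a hypothetical helper name, and the sample lines are made up so the example runs without Story.txt.

```python
from collections import Counter

def count_words(lines, punctuation=".,!?:;'\""):
    """Count cleaned, lowercased words across an iterable of lines (illustrative helper)."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = token.strip(punctuation).lower()
            if word:  # skip tokens that were only punctuation
                counts[word] += 1
    return counts

# Hypothetical sample text standing in for the contents of Story.txt
sample = ["The cat sat.", "The dog sat, too!"]
counts = count_words(sample)

# Print the words alphabetically with their counts
for word in sorted(counts):
    print(word, counts[word])
```

Because an open file is itself an iterable of lines, the same function could be called as count_words(infile) inside a with open("Story.txt") as infile: block.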
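Thank you very much. One last question: at what point would I convert the words to lowercase?
@KennyI. You just need to do any manipulation on the word before you '.append()' it to 'wordlist'. See the latest edit.
How do you want to sort the words? Alphabetically or by count? – inspectorG4dget
Possible duplicate of [Python - counting words in a text file](http://*.com/questions/25778341/python-counting-words-in-a-text-file) – TessellatingHeckler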