Sorting and counting words in a text file

Question:

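I'm new to programming and have been stuck on my current program. I have to read a story in from a file, sort the words, and count the number of occurrences of each word. It counts the words, but it does not sort them, remove the punctuation, or remove duplicate words. I'm lost as to why it isn't working. Any advice would be helpful.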

ifile = open("Story.txt",'r') 
fileout = open("WordsKAI.txt",'w') 
lines = ifile.readlines() 

wordlist = [] 
countlist = [] 

for line in lines: 
    wordlist.append(line) 
    line = line.split() 
    # line.lower() 

    for word in line: 
     word = word.strip(". , ! ? : ") 
     # word = list(word) 
     wordlist.sort() 
     sorted(wordlist) 
     countlist.append(word) 

     print(word,countlist.count(word))

How do you want the words sorted? Alphabetically, or by count? – inspectorG4dget

Possible duplicate of [Python - Counting words in a text file](http://*.com/questions/25778341/python-counting-words-in-a-text-file) – TessellatingHeckler

Answer

You have to supply a key function when sorting. Try this: r = sorted(wordlist, key=str.lower)

You don't need to supply a key. It depends on what you want. –
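
Whether a key is needed comes down to how uppercase words should be ordered. A minimal sketch of the difference, using a made-up word list rather than anything from the asker's file:

```python
# Made-up sample data, not taken from Story.txt
words = ["banana", "Cherry", "apple"]

# Without a key, strings compare by code point, so uppercase letters come first
default_order = sorted(words)

# With key=str.lower, each word is compared as if it were lowercase
folded_order = sorted(words, key=str.lower)

print(default_order)  # ['Cherry', 'apple', 'banana']
print(folded_order)   # ['apple', 'banana', 'Cherry']
```

If every word is lowercased before it is stored, the two orderings are the same and the key is unnecessary.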

Answer

punctuation = ".,!?:"
counts = {}
with open("Story.txt", 'r') as infile:
    for line in infile:
        for word in line.split():
            # Strip every punctuation character from both ends in one call;
            # stripping one character at a time can leave some behind (e.g. "end.!")
            word = word.strip(punctuation)
            if not word:
                continue  # the token was nothing but punctuation
            if word not in counts:
                counts[word] = 0
            counts[word] += 1

with open("WordsKAI.txt", 'w') as outfile:
    # If you want to sort by counts instead, use sorted(counts, key=counts.get)
    for word in sorted(counts):
        outfile.write("{}: {}\n".format(word, counts[word]))
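
The comment in the code above mentions sorting by count with key=counts.get, which orders from the lowest count to the highest. If the most frequent words should come first, with ties listed alphabetically, one possible sketch is the following. The counts dictionary here is made-up sample data standing in for the one built from the file:

```python
# Made-up sample data standing in for the counts built from the file
counts = {"the": 3, "dog": 2, "cat": 1, "ran": 1}

# Sort by descending count, then alphabetically for words with equal counts
by_count = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for word, n in by_count:
    print("{}: {}".format(word, n))
```

Negating the count in the key keeps the alphabetical tie-break ascending, which reverse=True alone would not.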

Answer

The main problem in your code is at this line (line 9):

wordlist.append(line)

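You are appending the whole line to wordlist, and I doubt that is what you want. Because of this, what gets added has not been .strip()ped before it goes into wordlist.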

What you have to do is add each word only after you have strip()ped it, and only after checking that the same word is not already in the list (no duplicates):

# The with-block closes the file once the lines have been read
with open("Story.txt", 'r') as ifile:
    lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Skip tokens that were nothing but punctuation
        if not word:
            continue

        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)

        # Add the word to countlist so that it can be counted later
        countlist.append(word)

# Sort the wordlist
wordlist.sort()

# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))

Another way is to use a dictionary, storing each word as a key and its number of occurrences as the value:

# The with-block closes the file once the lines have been read
with open("Story.txt", "r") as ifile:
    lines = ifile.readlines()

word_dict = {}

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Skip tokens that were nothing but punctuation
        if not word:
            continue

        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1

# Create a wordlist to display the words sorted
word_list = list(word_dict.keys())
word_list.sort()

for word in word_list:
    print(word, word_dict[word])
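
The dictionary approach above can also be written with collections.Counter from the standard library, which starts every missing key at zero. This is only a sketch of that variation: the count_words helper and the sample sentence are made up for illustration, and it works on a string rather than on Story.txt so it can be run as-is.

```python
from collections import Counter

def count_words(text):
    """Return a Counter of lowercase words with edge punctuation removed."""
    counts = Counter()
    for token in text.split():
        # Same cleanup as above: strip punctuation, then lowercase
        word = token.strip(".,!?:;'\"").lower()
        if word:  # skip tokens that were nothing but punctuation
            counts[word] += 1
    return counts

# Made-up sample text standing in for the contents of the file
sample = "The cat saw the dog. The dog, startled, ran!"
counts = count_words(sample)

# Print the words alphabetically with their counts
for word in sorted(counts):
    print(word, counts[word])
```

To use it on the file, the text could be read with open("Story.txt").read() and passed to count_words.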

Thank you very much. One last question: at what point would I convert the words to lowercase? –

@KennyI. You just have to do any manipulation on the word before you '.append()' it to 'wordlist'. See the recent edit. –
