Sorting and counting the words in a text file

Question:

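I'm new to programming and have been stuck on my current program. I have to read a story from a file, sort the words, and count the number of occurrences of each word. It counts the words, but it does not sort them, remove punctuation, or remove duplicate words. I'm lost as to why it isn't working. Any advice would be helpful.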

ifile = open("Story.txt",'r') 
fileout = open("WordsKAI.txt",'w') 
lines = ifile.readlines() 

wordlist = [] 
countlist = [] 

for line in lines: 
    wordlist.append(line) 
    line = line.split() 
    # line.lower() 

    for word in line: 
     word = word.strip(". , ! ? : ") 
     # word = list(word) 
     wordlist.sort() 
     sorted(wordlist) 
     countlist.append(word) 

     print(word,countlist.count(word)) 

How do you want to sort the words? Alphabetically or by count? – inspectorG4dget


Possible duplicate of [Python - Counting words in a text file](http://*.com/questions/25778341/python-counting-words-in-a-text-file) – TessellatingHeckler

You have to provide a key function to the sort method. Try this: r = sorted(wordlist, key=str.lower)


You don't need to provide a key. It depends on the ordering you want.
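
To make the difference between these two comments concrete, here is a small sketch. The sample words are made up for illustration and are not from the question. Without a key, Python compares strings by code point, so capitalized words sort ahead of lowercase ones; with key=str.lower the comparison ignores case.

```python
# Hypothetical sample words, chosen only to show the effect of the key.
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Default ordering compares code points, so uppercase letters come first.
default_order = sorted(words)

# key=str.lower compares lowercased copies; the original strings are kept.
case_insensitive = sorted(words, key=str.lower)

print(default_order)
print(case_insensitive)
```

If the words are already lowercased before they are stored, both calls give the same order, which is why the key is optional in that case.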

punctuation = ".,!?:"
counts = {}
with open("Story.txt", 'r') as infile:
    for line in infile:
        for word in line.split():
            # strip() takes a set of characters, so one call removes any mix of them
            word = word.strip(punctuation)
            if not word:
                continue  # the token was only punctuation
            if word not in counts:
                counts[word] = 0
            counts[word] += 1

with open("WordsKAI.txt", 'w') as outfile:
    for word in sorted(counts):  # if you want to sort by counts instead, use sorted(counts, key=counts.get)
        outfile.write("{}: {}\n".format(word, counts[word]))
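
The comment in the snippet above mentions sorting by count. As a rough sketch of what that could look like, the following uses a small hand-written dictionary (an assumption, standing in for the counts built from the file) and lists the most frequent words first, with ties broken alphabetically.

```python
# Hypothetical counts, standing in for the dictionary built from the file.
counts = {"the": 3, "cat": 1, "sat": 1, "on": 2}

# Highest count first; words with equal counts fall back to alphabetical order.
by_count = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for word, count in by_count:
    print("{}: {}".format(word, count))
```

Negating the count inside the key is one way to get descending counts while keeping the words themselves in ascending order.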

The main problem in your code is this line (line 9):

wordlist.append(line) 

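You are appending the entire line to wordlist, which I doubt is what you want. Because of this, the entries added to wordlist have not been .strip()ped of punctuation.

What you need to do is add each word only after you have strip()ped it, and only after checking that the same word is not already in the list (no duplicates):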

# Read all lines; the with block closes the file afterwards
with open("Story.txt", 'r') as ifile:
    lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Skip tokens that consisted only of punctuation
        if not word:
            continue

        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)

        # Add the word to countlist so that it can be counted later
        countlist.append(word)

# Sort the wordlist
wordlist.sort()

# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))
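
As a side note on why the sorting and lowercasing calls in the question had no visible effect, here is a small sketch with made-up values. sorted() returns a new list and leaves its argument alone, list.sort() sorts in place and returns None, and string methods such as lower() return a new string because strings are immutable.

```python
words = ["pear", "Fig", "apple"]

# sorted() returns a new list; without an assignment the result is discarded.
sorted(words)
unchanged = list(words)

# list.sort() reorders the list in place and returns None.
result = words.sort()

# Strings are immutable: lower() returns a new string, the original is untouched.
line = "Once Upon A Time"
line.lower()
lowered = line.lower()

print(unchanged, words, result)
print(line, "->", lowered)
```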

Another approach is to use a dictionary, storing the words as keys and the number of occurrences as values:

# Read all lines; the with block closes the file afterwards
with open("Story.txt", "r") as ifile:
    lines = ifile.readlines()

word_dict = {}

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Skip tokens that consisted only of punctuation
        if not word:
            continue

        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1

# Create a wordlist to display the words sorted
word_list = sorted(word_dict)

for word in word_list:
    print(word, word_dict[word])
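
A variation on the dictionary approach, not part of the answer above, is collections.Counter from the standard library, which handles the "start at zero" bookkeeping itself. The sketch below uses two made-up lines in place of the contents of Story.txt so that it runs on its own.

```python
from collections import Counter

# Hypothetical sample lines, standing in for the contents of Story.txt.
lines = ["The cat sat.", "The dog sat, too!"]

word_counts = Counter()
for line in lines:
    for word in line.split():
        # Same clean-up as above: strip punctuation, then lowercase.
        word = word.strip(".,!?:;'\"").lower()
        if word:  # skip tokens that were only punctuation
            word_counts[word] += 1

# Iterating over a sorted Counter yields its keys in alphabetical order.
for word in sorted(word_counts):
    print(word, word_counts[word])
```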

Thank you very much. One last question: at what point should I convert the words to lowercase?


@KennyI. You just need to do any manipulation of the word before you '.append()' it to 'wordlist'. See the latest edit.
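
To illustrate the ordering described in this last comment, here is a minimal sketch. The normalize helper and the sample tokens are hypothetical names introduced only for this example. Because every token is cleaned up before the membership check and the append, differently capitalized or punctuated forms of the same word collapse into a single entry.

```python
def normalize(word):
    """Strip surrounding punctuation, then lowercase (hypothetical helper)."""
    return word.strip(".,!?:;'\"").lower()

wordlist = []
for raw in ["The", "the.", "THE!", "end"]:
    # Normalize first, so the duplicate check sees the cleaned-up word.
    word = normalize(raw)
    if word not in wordlist:
        wordlist.append(word)

print(wordlist)
```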