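Count of characters in a string, excluding special characters

Problem description:

I need to count the characters in a given file. The problem is that I am not splitting the file correctly. If my input file contains "The! dog-ate #####the,cat", I don't want the special characters in the output. Output: t:4 h:2 e:3 !:1 d:1 o:1 g:1 -:1 #:5 ... Also, I need to remove the "-" symbol and make sure the words on either side of it are not joined together.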

from collections import Counter
import sys
filename = sys.argv[1]
reg = '[^a-zA-Z+]'
f = open(filename, 'r')
x = f.read().strip()
lines = []
for line in x:
    line = line.strip().upper()
    if line:
        lines.append(line)
print(Counter(lines))

Can someone help me?


Answer

Just delete the values you don't want:

c = Counter(lines) 
del c['#'] 
del c['-'] 
del c[','] 
print(c)
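Deleting keys one by one works when the unwanted characters are known in advance. As a rough sketch of a more general variant, the counter can be filtered so that only letters remain. The helper name `drop_non_letters` is hypothetical, and it assumes that "special character" means anything `str.isalpha()` rejects:

```python
from collections import Counter

def drop_non_letters(counts):
    """Return a new Counter that keeps only alphabetic keys (hypothetical helper)."""
    return Counter({ch: n for ch, n in counts.items() if ch.isalpha()})

# Sample input taken from the question, upper-cased as in the asker's code.
c = Counter("The! dog-ate #####the,cat".upper())
letters_only = drop_non_letters(c)
print(letters_only)
```

Note that `str.isalpha()` also accepts non-ASCII letters, so a stricter test may be needed if only a-z should be counted.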

Answer

Use re.sub to remove the special characters.

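import re
from collections import Counter

# filename is assumed to be defined as in the question (sys.argv[1])
with open(filename) as f:
    # Drop everything that is not an ASCII letter
    content = re.sub('[^a-zA-Z]', '', f.read())
counts = Counter(content)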

Demo:

In [1]: re.sub('[^a-zA-Z]', '', "The! dog-ate #####the,cat") 
Out[1]: 'Thedogatethecat' 

In [2]: Counter(_) 
Out[2]: 
Counter({'T': 1, 
     'a': 2, 
     'c': 1, 
     'd': 1, 
     'e': 3, 
     'g': 1, 
     'h': 2, 
     'o': 1, 
     't': 3})

Note that if you want uppercase and lowercase letters counted together, you can convert content to lowercase:

counts = Counter(content.lower())
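The question also asks that words separated by "-" should not be joined. One possible sketch, assuming that every run of non-letters should act as a word separator: replace each run with a single space instead of the empty string, then split. The function name `letter_counts` is made up for this illustration.

```python
import re
from collections import Counter

def letter_counts(text):
    """Return (words, per-letter counts), treating non-letters as separators."""
    # Replace each run of non-letters with one space so words stay apart
    cleaned = re.sub(r'[^a-zA-Z]+', ' ', text).lower()
    words = cleaned.split()
    # Count letters only; the separating spaces are not counted
    counts = Counter(''.join(words))
    return words, counts

words, counts = letter_counts("The! dog-ate #####the,cat")
print(words)
print(counts)
```

With this input, "dog-ate" ends up as the two words "dog" and "ate", while the letter counts stay the same as in the demo above once case is folded.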

Answer

foo.txt

asdas 

[email protected]#[email protected] 


asdljh 


12j3l1k23j

Source:

https://docs.python.org/3/library/string.html#string.ascii_letters

import string 
from collections import Counter 

with open('foo.txt') as f: 
    text = f.read() 

filtered_text = [char for char in text if char in string.ascii_letters] 
counted = Counter(filtered_text) 
print(counted.most_common())

Output

[('a', 3), ('j', 3), ('s', 3), ('d', 2), ('l', 2), ('h', 1), ('k', 1)]

Converting 'ascii_letters' to a set up front would improve efficiency. Membership lookup on a string is linear time.
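A minimal sketch of that suggestion, assuming the same goal as the answer above. The names `LETTERS` and `count_letters` are illustrative, and the sample text uses only the readable lines of foo.txt:

```python
import string
from collections import Counter

# Build the set once; membership tests on a set are O(1) on average
LETTERS = frozenset(string.ascii_letters)

def count_letters(text):
    """Count only ASCII letters in text."""
    return Counter(ch for ch in text if ch in LETTERS)

counted = count_letters("asdas\nasdljh\n12j3l1k23j")
print(counted.most_common())
```

For the 52 characters of `string.ascii_letters` the difference is small, but the set version scales better if the allowed alphabet grows.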
