Reformat a column to keep only the first 5 characters

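Question:

I'm new to Python and I'm struggling with this part. I have a text file with about 25 columns and more than 50,000 rows. One of the columns, #11 (ZIP), contains all the customers' ZIP code values in the format "07598-XXXX". I only want the first 5 characters, so "07598", and I need to do this for the entire column, but I'm confused about how to write it given my current logic. So far, my code removes rows that contain certain strings, and I also use the '|' delimiter to format the result nicely as a CSV.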

State | ZIP (#11) | Column 12 | ...
NY | 60169-8547 | 98
NY | 60169-8973 | 58
NY | 11219-4598 | 25
NY | 11219-8475 | 12
NY | 20036-4879 | 56

How do I iterate over the ZIP column and show only the first 5 characters? Thanks for your help!

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        # keep only the rows in which no field contains one of the unwanted strings
        if not any(remove_word in element for element in line for remove_word in remove_words):
            writer.writerow(line)

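Answer

Handle the header row separately, then read line by line as you already do, modifying only the second column of line by truncating it to 5 characters.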

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    # consume the header row and write it as-is
    writer.writerow(next(cr))
    for line in cr:
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:5]  # truncate to the first 5 characters
            writer.writerow(line)

Alternatively, you can use line[1] = line[1].split("-")[0], which keeps everything to the left of the hyphen.
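The two variants agree on well-formed values such as "60169-8547" but differ when the part before the hyphen is not exactly 5 characters long. A small sketch of the difference, where the helper names zip5_slice and zip5_split and the malformed sample "1121-45" are made up for illustration:

```python
def zip5_slice(value):
    """Keep at most the first five characters, whatever they are."""
    return value[:5]


def zip5_split(value):
    """Keep everything to the left of the first hyphen."""
    return value.split("-")[0]


# "1121-45" is a hypothetical malformed value, not taken from the question's data
for raw in ["60169-8547", "11219", "1121-45"]:
    print(repr(raw), repr(zip5_slice(raw)), repr(zip5_split(raw)))
```

For the malformed value the slice keeps the hyphen ('1121-') while the split drops it ('1121'), so which one is preferable depends on how clean the column is.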

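Note the special handling of the header row: cr is an iterator. I just consume it manually once, before the for loop, so that the header is passed through unchanged.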
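To see what next(cr) does, here is a minimal sketch that runs on an in-memory sample instead of NVG.txt (the sample text is invented for illustration). Calling next() pulls exactly one row out of the reader, and anything that iterates over cr afterwards starts at the following row:

```python
import csv
import io

# Hypothetical in-memory stand-in for the pipe-delimited input file
sample = io.StringIO("State|ZIP|Column 12\nNY|60169-8547|98\nNY|11219-4598|25\n")
cr = csv.reader(sample, delimiter="|")

header = next(cr)  # consumes the first row only
rows = list(cr)    # iteration resumes at the second row

print(header)
print(rows)
```

So writer.writerow(next(cr)) reads the header row and writes it straight to the output, before the loop starts truncating ZIP values.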

This works great! Thanks!! One question: how does "writer.writerow(next(cr))" work? I'm a bit confused by that part, especially the cr inside it. – Cesar

See the explanation in my new edit. –
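One detail worth spelling out: the code above truncates line[1] because ZIP is the second column in the sample rows, whereas the question says that ZIP is column #11 in the real file, which would be index 10. A rough sketch that makes the index a parameter, assuming a hypothetical helper clean_rows and an in-memory sample rather than the real NVG.txt:

```python
import csv
import io

REMOVE_WORDS = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']


def clean_rows(rows, zip_index, remove_words=REMOVE_WORDS):
    """Yield the header unchanged, then every kept row with its ZIP cut to 5 characters."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to yield
    yield header
    for row in rows:
        if any(word in field for field in row for word in remove_words):
            continue  # drop rows that contain one of the unwanted strings
        row[zip_index] = row[zip_index][:5]
        yield row


# Hypothetical sample; ZIP sits at index 1 here, it would be index 10 for column #11
sample = io.StringIO(
    "State|ZIP|Column 12\n"
    "NY|60169-8547|98\n"
    "NY|11219-4598|TO-INAC\n"
    "NY|20036-4879|56\n"
)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(clean_rows(csv.reader(sample, delimiter="|"), zip_index=1))
print(out.getvalue())
```

With real files, the two StringIO objects would be replaced by the open() calls from the answer.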

Answer

'{:.5}'.format(zip_)

where zip_ is the string containing the ZIP code. More about format here: https://docs.python.org/2/library/string.html#format-string-syntax
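For strings, the .5 precision in the format spec means "use at most 5 characters". A quick sketch comparing it with the equivalent f-string (Python 3.6+) and a plain slice, using one of the sample values from the question:

```python
zip_ = "11219-4598"

via_format = "{:.5}".format(zip_)  # precision truncates the string to 5 characters
via_fstring = f"{zip_:.5}"         # same format spec inside an f-string
via_slice = zip_[:5]               # plain slicing

print(via_format, via_fstring, via_slice)
```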

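Please don't use 'zip' as the name of a string. It is a built-in function. – dawg

Very good point –

It is probably better (more efficient and more idiomatic) to slice to get a substring: '11219-4598'[:5] – dawg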

Answer

To get the first 5 characters of a string, use str[:6]

In your case:

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:6]
            writer.writerow(line)

line[1] = line[1][:6] sets column 2 in your file to its own first 5 characters.

Uh, no, that gives 6 characters... –

@jean you're right – Dule
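As the comments point out, the end index of a slice is exclusive, so s[:n] returns at most n characters. A quick check with one of the sample values:

```python
zip_code = "11219-4598"

first_five = zip_code[:5]  # indices 0 through 4
first_six = zip_code[:6]   # indices 0 through 5, which includes the hyphen

print(repr(first_five), len(first_five))
print(repr(first_six), len(first_six))
```

So [:5] is the slice that yields the 5-digit ZIP code; [:6] also keeps the hyphen.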
