Open a CSV file, sort on a specific column, and overwrite the existing CSV

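Question:

I have been stuck on this for a while. I am trying to open a CSV, sort it by severity (Critical, High, Medium, Low), and then overwrite the existing file. I would also like to ignore the first row, or add the header row.

Original CSV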

IP Address Severity Score 
10.0.0.1 High  2302 
172.65.0.1 Low   310 
192.168.0.1 Critical 5402 
127.0.0.1 Medium  1672

Modified/sorted CSV

IP Address Severity Score 
192.168.0.1 Critical 5402 
10.0.0.1 High  2302 
127.0.0.1 Medium  1672 
172.65.0.1 Low   310

Code

import csv 
crit_sev = "Critical" 
high_sev = "High" 
med_sev = "Medium" 
low_sev = "Low" 
reader = csv.reader(open('sample.csv', 'r')) 
row=0 
my_list = [] 
for row in reader: 
    if row[1] == crit_sev: 
     my_list.append(row) 
    elif row[1] == high_sev: 
     my_list.append(row) 
    elif row[1] == med_sev: 
     my_list.append(row) 
    elif row[1] == low_sev: 
     my_list.append(row) 

writer = csv.writer(open("sample.csv", 'w')) 
header = ['IP Address', 'Severity', 'Score'] 
writer.writerow([header]) 
for word in my_list: 
    writer.writerow([word])

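Any help would be greatly appreciated.

"or add the header row" - what exactly do you mean by that? – DyZ

Why not open the CSV in Excel or something similar and sort it there? – TigerhawkT3

CSV == comma-separated values. I don't see any commas in your file, so that may be the first problem. Is it tab-delimited, or fixed format? Fixed format seems unlikely, because you would run out of room as soon as an IP address like 192.168.0.254 shows up. The general idea is to read each record, classify it by severity, and store it in a data structure. Once that is done, write the new data structure out in severity order. –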
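The approach described in the comment above (read each record, bucket it by severity, write the buckets out in order) could look roughly like the sketch below. It assumes the data really is comma-separated and uses a hypothetical in-memory sample instead of sample.csv; the names SEVERITY_ORDER and sort_by_severity are made up for illustration.

```python
import csv
import io

# Rank order taken from the question: Critical first, Low last
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]

def sort_by_severity(rows):
    """Bucket rows of the form [ip, severity, score], then emit the buckets in severity order."""
    buckets = {level: [] for level in SEVERITY_ORDER}
    for row in rows:
        # An unknown severity label raises KeyError here
        buckets[row[1]].append(row)
    return [row for level in SEVERITY_ORDER for row in buckets[level]]

# Hypothetical in-memory stand-in for sample.csv (assumed comma-separated)
sample = io.StringIO(
    "IP Address,Severity,Score\n"
    "10.0.0.1,High,2302\n"
    "172.65.0.1,Low,310\n"
    "192.168.0.1,Critical,5402\n"
    "127.0.0.1,Medium,1672\n"
)

reader = csv.reader(sample)
header = next(reader)            # set the header aside so it is not sorted
sorted_rows = sort_by_severity(reader)
```

Rows that share a severity keep their original relative order, since each bucket is filled in reading order.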

Answer

You can do this with Python's csv library, as follows:

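import socket
import csv

# Rank for each severity label; lower values sort first
severity = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

# On Python 3, open in text mode with newline='' (on Python 2, use 'rb' and 'wb' instead)
with open('sample.csv', newline='') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)  # keep the header row out of the sort
    # Sort by severity rank first, then by IP address
    data = sorted(csv_input, key=lambda x: (severity[x[1]], socket.inet_aton(x[0])))

with open('sample.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    csv_output.writerows(data)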

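This keeps the existing header and sorts the rows by the entries in the Severity column. It then also (optionally) sorts by IP address, which may or may not be useful to you, using socket.inet_aton() to convert each IP address into a packed 4-byte value that sorts in numeric order.

For example: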

IP Address,Severity,Score 
10.168.0.1,Critical,5402 
192.168.0.1,Critical,5402 
10.0.0.1,High,2302 
127.0.0.1,Medium,1672 
172.65.0.1,Low,310
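To see why the answer converts addresses with socket.inet_aton() instead of sorting them as text, here is a small sketch. The three addresses are made-up examples, not taken from the question's file.

```python
import socket

# Hypothetical addresses chosen to show the difference
ips = ["192.168.0.1", "10.168.0.1", "9.0.0.1"]

# Plain string comparison goes character by character, so "10..." sorts before "9..."
as_text = sorted(ips)

# inet_aton packs each address into 4 bytes, and bytes compare in numeric order
as_numbers = sorted(ips, key=socket.inet_aton)

print(as_text)     # ['10.168.0.1', '192.168.0.1', '9.0.0.1']
print(as_numbers)  # ['9.0.0.1', '10.168.0.1', '192.168.0.1']
```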

Thank you very much! –

Answer

Here is a pandas solution:

import pandas as pd 
# Read the CSV file 
data = pd.read_csv('sample.csv') 

# Configure the levels of severity 
levels = pd.Series({"Critical" : 0, "High" : 1, "Medium" : 2, "Low" : 3}) 
levels.name='Severity' 

# Add numeric severity data to the table 
augmented = data.join(levels,on='Severity',rsuffix='_') 

# Sort and select the original columns 
sorted_df = augmented.sort_values('Severity_')[['IP Address', 'Severity','Score']] 

# Overwrite the original file 
sorted_df.to_csv('sample.csv',index=False)

Do you need to define the levels yourself because they are words rather than numbers? – TigerhawkT3

@TigerhawkT3 In theory, yes. But in this case the order of the severities matches alphabetical order ('C' – DyZ

'M' – TigerhawkT3
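The exchange above is cut off, but the question it raises is easy to check: does sorting the labels alphabetically give the same order as an explicit severity ranking? A minimal sketch, reusing the ranks from the answers (the variable names are illustrative):

```python
labels = ["Critical", "High", "Medium", "Low"]
rank = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

# Alphabetical order puts "Low" ahead of "Medium"
alphabetical = sorted(labels)

# An explicit rank keeps the intended severity order
by_rank = sorted(labels, key=rank.get)

print(alphabetical)  # ['Critical', 'High', 'Low', 'Medium']
print(by_rank)       # ['Critical', 'High', 'Medium', 'Low']
```

So for these four labels an explicit mapping is needed; alphabetical order only agrees on the first two.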
