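How do you combine three lists in Python using a dictionary?

Problem description:

I need to read lines from multiple files; the first value in each line is the run time, the third is the job ID, and the fourth is the status. I created lists to store each of these values. Now I don't know how to connect all these lists and sort them to get the 20 rows with the fastest run times. Does anyone have a suggestion for how I can do this? Thanks!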

for filePath in glob.glob(os.path.join(path1, '*.gz')):
    with gzip.open(filePath, 'rt', newline="") as file:
        reader = csv.reader(file)
        for line in file:
            for row in reader:
                runTime = row[0]
                ID = row[2]
                eventType = row[3]
                jobList.append(ID)
                timeList.append(runTime)
                eventList.append(eventType)

    jobList = sorted(set(jobList))
    counter = len(jobList)
    print("There are %s unique jobs." % (counter))
    i = 1
    while i < 21:
        print("#%s\t%s\t%s\t%s" % (i, timeList[i], jobList[i], eventList[i]))
        i = i + 1

Just a style note: using names such as 'run_time' and 'event_type' rather than 'runTime' and 'eventType' is more Pythonic. – dmlicht

Answer

Instead of using three different lists, you can use a single list and append tuples to it, like so:

combinedList.append((runTime, ID, eventType))

You can then sort the list of tuples combinedList as shown here: How to sort (list/tuple) of lists/tuples?
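As a minimal sketch of that idea: the sample rows below are made up for illustration, and the `float` conversion is an assumption that the first column is numeric. Values read by `csv.reader` are strings, so without a conversion the sort would be alphabetical rather than numeric.

```python
# Hypothetical sample rows in the question's column order:
# runtime, (unused), job ID, event type.
rows = [
    ["12.5", "x", "job-a", "FINISH"],
    ["3.0", "x", "job-b", "FINISH"],
    ["101.2", "x", "job-c", "FAIL"],
]

combined_list = []
for row in rows:
    # csv.reader yields strings, so convert the runtime to a number
    # to avoid sorting "101.2" before "12.5".
    combined_list.append((float(row[0]), row[2], row[3]))

# Sort by the first tuple element (the runtime), smallest first,
# and keep at most the first 20 entries.
top_20 = sorted(combined_list, key=lambda entry: entry[0])[:20]

for rank, (run_time, job_id, event_type) in enumerate(top_20, start=1):
    print("#%s\t%s\t%s\t%s" % (rank, run_time, job_id, event_type))
```

Passing `reverse=True` to `sorted` would give the longest run times first instead.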

You can make further improvements, such as using namedtuples; search SO or Google for details.

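Note: there may be other, more "efficient" ways to do this. For example, you could use Python's heapq library and build a heap of size 20 to keep track of the top 20 run times. You can read about it on the Python website or on Stack Overflow, but you may need a bit more algorithms background.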
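One way the heapq suggestion might look, as a sketch rather than the answerer's own code: `heapq.nsmallest` and `heapq.nlargest` return the n smallest or largest items without sorting the whole list. The entries are invented sample data, and n is 2 here only to keep the example short; use 20 for the question's case.

```python
import heapq

# Hypothetical (runtime, job ID, event type) tuples.
entries = [
    (12.5, "job-a", "FINISH"),
    (3.0, "job-b", "FINISH"),
    (101.2, "job-c", "FAIL"),
    (47.9, "job-d", "FINISH"),
]

# Both functions return a list that is already ordered by the key.
shortest = heapq.nsmallest(2, entries, key=lambda entry: entry[0])
longest = heapq.nlargest(2, entries, key=lambda entry: entry[0])

print(shortest)
print(longest)
```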

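OK, I understand, but if I were to create a dictionary for this, how would I sort it so that only the 20 jobs with the longest run times are printed? I guess what I'm asking is how to sort a dictionary by its values. – Liz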

Basically, you would store the ID as the key and (runTime, eventType) as the value. Then sort by longest run time as shown here: http://stackoverflow.com/questions/7349646/sorting-a-dictionary-of-tuples-in-python – labheshr

Answer

Instead of maintaining three lists jobList, timeList and eventList, you can store (runTime, eventType) tuples in a dictionary, using ID as the key, by replacing

jobList = [] 
timeList = [] 
eventList = [] 
… 
jobList.append(ID) 
timeList.append(runTime) 
eventList.append(eventType)

with

jobs = {} # an empty dictionary 
… 
jobs[ID] = (runTime, eventType)

To iterate over that dictionary, sorted by increasing runTime values:

for ID, (runTime, eventType) in sorted(jobs.items(), key=lambda item: item[1][0]): 
    # do something with it
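To get only the 20 longest-running jobs, as asked in the comments, the same `sorted` call can be reversed and sliced. This is a sketch with invented sample data, and it assumes the run times were already converted to numbers when the dictionary was filled; strings would sort alphabetically.

```python
# Hypothetical job data: ID -> (runtime, event type).
jobs = {
    "job-a": (12.5, "FINISH"),
    "job-b": (3.0, "FINISH"),
    "job-c": (101.2, "FAIL"),
}

# Sort the (ID, value) pairs by runtime, longest first, and keep up to 20.
longest_20 = sorted(jobs.items(), key=lambda item: item[1][0], reverse=True)[:20]

for rank, (job_id, (run_time, event_type)) in enumerate(longest_20, start=1):
    print("#%s\t%s\t%s\t%s" % (rank, run_time, job_id, event_type))
```

Note that a dictionary keeps one value per key, so if the same ID appears on several lines, only the last (runTime, eventType) pair assigned for it survives.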

Answer

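If you keep runTime, ID and eventType together in one data structure, Python's built-in sorted will work better for you. I suggest using a namedtuple, because it makes clear what you are doing. You can do the following: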

from collections import namedtuple 
Job = namedtuple("Job", "runtime id event_type")

Then your code can be changed to:

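import csv
import glob
import gzip
import os

jobs = []  # collects one Job per row across all files
for filePath in glob.glob(os.path.join(path1, '*.gz')):  # path1 as in the question
    with gzip.open(filePath, 'rt', newline="") as file:
        reader = csv.reader(file)
        for row in reader:
            # Convert the runtime so the sort is numeric, not alphabetical
            # (assumes the first column holds a number).
            runTime = float(row[0])
            ID = row[2]
            eventType = row[3]
            job = Job(runTime, ID, eventType)
            jobs.append(job)

# Sort once, after all files are read; tuples compare by their first field.
jobs = sorted(jobs)
n_jobs = len(jobs)
print("There are %s jobs." % (n_jobs))
for i, job in enumerate(jobs[:20], start=1):
    print("#%s\t%s\t%s\t%s" % (i, job.runtime, job.id, job.event_type))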

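It's worth mentioning that this sort works because, by default, tuples are sorted by their first element. If there is a tie, the comparison moves on to the next element of the tuple.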
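The tie-breaking behavior can be seen in a small sketch. The jobs below are invented sample data; the second `sorted` call shows an alternative that compares run times only, relying on the fact that `sorted` is stable.

```python
from collections import namedtuple

Job = namedtuple("Job", "runtime id event_type")

# Hypothetical jobs; two of them share the same runtime.
jobs = [
    Job(5.0, "job-b", "FINISH"),
    Job(5.0, "job-a", "FAIL"),
    Job(2.0, "job-c", "FINISH"),
]

# Default ordering: by runtime, then id, then event_type.
by_default = sorted(jobs)

# An explicit key compares the runtime only; sorted() is stable, so
# tied jobs keep their original relative order.
by_runtime = sorted(jobs, key=lambda job: job.runtime)

print([job.id for job in by_default])
print([job.id for job in by_runtime])
```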
