Comparing two lists of coordinates in Python and using the coordinate values to assign values

Question:

I have two sets of data taken from two separate import files. Both files have been imported into Python and the data is currently held in lists, as shown below.

List 1 has the form:

(reference number, x coordinate, y coordinate)

Example list 1: [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]

List 2 has the form:

(x coordinate, y coordinate, temperature)

Example list 2: [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

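I need to compare the two lists using the x and y coordinates and, wherever a match is found, produce a new list containing the corresponding reference number and temperature.

For example, the output list for the two lists above would have the form:

(reference number, temperature)

Example output list: [[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]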

This has to be done for a large amount of data, and I am really struggling to find a solution. Any help would be much appreciated. Cheers.

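Define "a large amount of data". Do you need multiple machines, or are you just looking for a reasonably efficient solution that runs on a single machine? In other words, is this a big-data problem? – amit 2015-02-08 10:51:40

This can be done with nested for loops. Could you also share what you have tried so far? – 2015-02-08 10:55:07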

Answer

This runs in O(n^2), but it is easy to read and understand.

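result = []
for reference, x, y in list1:
    for a, b, temperature in list2:
        # a match means both coordinates agree
        if x == a and y == b:
            # (reference, temperature), the order asked for in the question
            result.append([reference, temperature])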

You can reduce the complexity to O(n) by iterating over each list once and storing the coordinates in dicts, as follows:

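# map each (x, y) coordinate to its reference number
dict1 = {}
for reference, x, y in list1:
    dict1[(x, y)] = reference

# map each (x, y) coordinate to its temperature
dict2 = {}
for x, y, temperature in list2:
    dict2[(x, y)] = temperature

result = []
for coordinate, reference in dict1.items():
    temperature = dict2.get(coordinate)
    # compare against None so that a temperature of 0 still counts as a match
    if temperature is not None:
        result.append([reference, temperature])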

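It wasn't me, but your old answer was very inefficient, and the OP explicitly mentioned a large data size, so efficiency is a concern; your edit looks like an implementation of an existing answer (mine). – amit 2015-02-08 11:18:10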

Answer

You can use map-reduce for this task.

Pseudocode:

map1(list): # runs on the first file
    for each (i, x, y) in list:
        emit((x, y), (1, i))
map2(list): # runs on the second file
    for each (x, y, temp) in list:
        emit((x, y), (2, temp))
reduce((x, y), list): # runs on the output of both mappers
    for each (aux, val) in list:
        if aux == 1:
            i = val
        else:
            temp = val
    if both i and temp initialized:
        emit(i, temp)

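Map-reduce is a framework that lets you implement big-data problems easily, provided you can model them as a series of map-reduce tasks. The pseudocode above shows what the map-reduce steps could look like.

This approach handles massive data easily (including peta-scale) and lets the framework do the dirty work for you.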

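The idea is to first map each file into some kind of hash table (this is done internally by the framework), so that you have two hash tables:

key = (x, y), value = id
key = (x, y), value = temperature

Once you have the two hash tables, it is easy to find which id is connected to which temperature in a single pass and, once the connection is made, output it.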

The complexity of this code is O(n) in the average case.
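To make the pseudocode concrete, here is a minimal single-machine sketch in plain Python. It is not a real map-reduce framework: `run_local` is a hypothetical stand-in for the grouping ("shuffle") step that a framework would do for you, and the names `reduce_pair` and `run_local` are my own.

```python
from collections import defaultdict


def map1(rows):
    """Mapper for the first file: (ref, x, y) -> ((x, y), (1, ref))."""
    for ref, x, y in rows:
        yield (x, y), (1, ref)


def map2(rows):
    """Mapper for the second file: (x, y, temp) -> ((x, y), (2, temp))."""
    for x, y, temp in rows:
        yield (x, y), (2, temp)


def reduce_pair(key, values):
    """Reducer: emit (ref, temp) only if both sides exist for this key."""
    ref = temp = None
    for tag, val in values:
        if tag == 1:
            ref = val
        else:
            temp = val
    if ref is not None and temp is not None:
        yield ref, temp


def run_local(list1, list2):
    """Hypothetical stand-in for the framework: group mapper output by key."""
    groups = defaultdict(list)
    for key, value in map1(list1):
        groups[key].append(value)
    for key, value in map2(list2):
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reduce_pair(key, values))
    return sorted(out)


list1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]
list2 = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]
joined = run_local(list1, list2)
print(joined)  # [(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)]
```

On one machine this is just the dict-based join with extra steps; the structure only pays off when a framework distributes the mappers and reducers.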

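Note that if your coordinates are not integers but floating-point values, you will need to use some tree-based map rather than a hash table, and you must be very careful when comparing keys, because of the nature of floating-point arithmetic.
This should not be a problem when dealing with integers.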
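To illustrate the floating-point pitfall, here is a small sketch. Note that it uses a different and simpler technique than the tree-based map suggested above: it rounds the coordinates before using them as dict keys. The helper `quantize` and the choice of 6 decimal places are assumptions for illustration only, and two values that fall on opposite sides of a rounding boundary can still end up with different keys.

```python
def quantize(x, y, ndigits=6):
    """Round both coordinates so nearly-equal floats map to the same key."""
    return (round(x, ndigits), round(y, ndigits))


refs = [[1, 0.1 + 0.2, 0.0], [2, 0.5, 1.0]]
temps = [[0.3, 0.0, 100], [0.5, 1.0, 110]]

# Exact float keys: 0.1 + 0.2 is not exactly 0.3, so the lookup misses.
exact = {(x, y): ref for ref, x, y in refs}
print((0.3, 0.0) in exact)  # False

# Rounded keys: both sides are quantized the same way before matching.
by_key = {quantize(x, y): ref for ref, x, y in refs}
matched = sorted(
    [by_key[quantize(x, y)], temp]
    for x, y, temp in temps
    if quantize(x, y) in by_key
)
print(matched)  # [[1, 100], [2, 110]]
```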

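Could you explain what map1() and map2() stand for? I'm interested. Also, why is there a ":" after reduce()? – user3699166 2015-02-08 11:07:48

@user3699166 map1, map2 and reduce are all functions that you supply to the map-reduce framework. 'map1' parses the first file and creates a hash table [dictionary] '((x, y) -> id)'; likewise, 'map2' creates a dictionary '((x, y) -> temperature)'. 'reduce' combines the two hash tables. Note that there is no explicit hash table here, because it is implemented by the map-reduce framework. – amit 2015-02-08 11:21:08

At the risk of sounding dumb, I have to say I still don't get it, sorry :( Which framework are you talking about? Do you mean a module? Or did you define these functions yourself beforehand? – user3699166 2015-02-08 11:25:10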

Answer

lst1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]] 
lst2 = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]] 
dict1 = {(x, y): ref for ref, x, y in lst1} 
dict2 = {(x, y): temp for x, y, temp in lst2} 
matchxy = set(dict1) & set(dict2) 
lstout = sorted([dict1[xy], dict2[xy]] for xy in matchxy) 
print(lstout)

This gives

[[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]

which is the desired output.

I used sets to find the common points.
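As a small variation (a sketch, assuming Python 3): dict key views already behave like sets, so the intersection can be taken without building separate `set` objects. The sample data below is made up so that the lists only partly overlap, which also shows that unmatched coordinates are simply left out.

```python
lst1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20]]
lst2 = [[0, 10, 110], [0, 20, 120], [0, 99, 999]]

dict1 = {(x, y): ref for ref, x, y in lst1}
dict2 = {(x, y): temp for x, y, temp in lst2}

# Key views support set operations directly.
common = dict1.keys() & dict2.keys()
only_in_refs = dict1.keys() - dict2.keys()

lstout = sorted([dict1[xy], dict2[xy]] for xy in common)
print(lstout)                # [[2, 110], [3, 120]]
print(sorted(only_in_refs))  # [(0, 0)]
```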

Answer

You can build sqlite database tables and query them to get the desired result.

import sqlite3, operator 

reference = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]] 
temperature = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

A couple of helpers - I like to use them because they make the code that follows more readable.

reference_coord = operator.itemgetter(1,2) 
ref = operator.itemgetter(0) 
temperature_coord = operator.itemgetter(0,1) 
temp = operator.itemgetter(2)

Create a database (in memory)

con = sqlite3.connect(":memory:")

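There are two ways to approach this: keep all the information in separate tables, or build a single table holding only the data you want.

One table per list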

con.execute("create table reference(coordinate TEXT PRIMARY KEY, reference INTEGER)") 
con.execute("create table temperature(coordinate TEXT PRIMARY KEY, temperature INTEGER)") 

# fill the tables 
parameters = [(str(reference_coord(item)), ref(item)) for item in reference] 
con.executemany("INSERT INTO reference(coordinate, reference) VALUES (?, ?)", parameters) 
parameters = [(str(temperature_coord(item)), temp(item)) for item in temperature] 
con.executemany("INSERT INTO temperature(coordinate, temperature) VALUES (?, ?)", parameters)

Query the two tables for the data needed

cursor = con.execute('SELECT reference.reference, temperature.temperature FROM reference, temperature WHERE reference.coordinate = temperature.coordinate') 
print(cursor.fetchall())

A single table that combines the data from the two lists

con.execute("create table data(coordinate TEXT PRIMARY KEY, reference INTEGER, temperature INTEGER)")

Build the table with only the data you care about

parameters = [(str(reference_coord(item)), ref(item)) for item in reference] 
con.executemany("INSERT INTO data(coordinate, reference) VALUES (?, ?)", parameters) 
parameters = [(temp(item), str(temperature_coord(item))) for item in temperature] 
con.executemany("UPDATE data SET temperature=? WHERE coordinate=?", parameters)

A simple query, because the table holds only what you want

cursor2 = con.execute('SELECT reference, temperature FROM data') 
print(cursor2.fetchall()) 

con.close()

Result:

>>> 
[(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)] 
[(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)] 
>>>

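Once your data is in a database, it is fairly easy to extract information from it. If you use a file-based database instead of an in-memory one, the data can also be persisted.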
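A minimal sketch of the file-based variant follows. The file name `coords.db` and the temporary directory are placeholders of mine. It also departs from the code above in one respect: it stores x and y as separate integer columns with a composite primary key rather than as a text coordinate. This is only one possible layout.

```python
import os
import sqlite3
import tempfile

reference = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]
temperature = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

# Hypothetical location; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "coords.db")

# First session: create the tables, load the data, commit to disk.
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE reference(x INTEGER, y INTEGER, reference INTEGER, PRIMARY KEY (x, y))")
con.execute("CREATE TABLE temperature(x INTEGER, y INTEGER, temperature INTEGER, PRIMARY KEY (x, y))")
con.executemany("INSERT INTO reference(reference, x, y) VALUES (?, ?, ?)", reference)
con.executemany("INSERT INTO temperature(x, y, temperature) VALUES (?, ?, ?)", temperature)
con.commit()
con.close()

# Second session: reopen the file; the data is still there.
con = sqlite3.connect(db_path)
rows = con.execute(
    "SELECT r.reference, t.temperature "
    "FROM reference AS r JOIN temperature AS t ON r.x = t.x AND r.y = t.y "
    "ORDER BY r.reference"
).fetchall()
con.close()
print(rows)  # [(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)]
```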

If external libraries are acceptable, pandas has similar functionality and is a great package.
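For reference, a rough sketch of how the same join might look with pandas, assuming pandas is installed. The column names are my own choice, since the original lists carry no headers.

```python
import pandas as pd

refs = pd.DataFrame(
    [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]],
    columns=["reference", "x", "y"],
)
temps = pd.DataFrame(
    [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]],
    columns=["x", "y", "temperature"],
)

# An inner merge keeps only the coordinates present in both frames.
merged = refs.merge(temps, on=["x", "y"], how="inner")
result = merged[["reference", "temperature"]].values.tolist()
print(result)  # [[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]
```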
