Grouping data into non-grid-based bins

Problem description:

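I have done Voronoi binning: I have the center coordinates of each bin, and I want to find the mean of all the pixels contained in each bin. I can't work out how to slice the numpy array to vectorize this.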

Here is my current code. X and Y are 1D arrays holding the x and y coordinates of each bin's center; f is the 2D image:

import numpy as np 
from scipy.spatial import KDTree 

def rebin(f, X, Y): 
    s = f.shape 
    x_grid = np.arange(s[0]) 
    y_grid = np.arange(s[1]) 
    x_grid, y_grid = np.meshgrid(x_grid,y_grid) 
    x_grid, y_grid = x_grid.flatten(), y_grid.flatten() 

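    tree = KDTree(np.column_stack((X, Y)))                 # one row per bin center
    _, b = tree.query(np.column_stack((x_grid, y_grid)))   # index of the nearest center for each pixel
    out = X*np.nan 

    for i in range(len(X)):   # one pass over all pixels per bin
        out[i] = np.nanmean(f[x_grid[b==i], y_grid[b==i]]) 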
    return out 

The for loop is currently a huge bottleneck. There is probably a really simple answer; I just can't see it right now!
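The key step in this code is `tree.query`, which labels every pixel with the index of its nearest bin center, i.e. the Voronoi cell it falls in. A minimal sketch on small, made-up data (5 random centers in a hypothetical 10x10 image) checks that labelling against a brute-force distance computation; it uses `cKDTree`, but `KDTree.query` behaves the same way here.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
centers = rng.random((5, 2)) * 10          # 5 made-up bin centers inside a 10x10 image
rows, cols = np.indices((10, 10))          # (row, col) coordinates of every pixel
pixels = np.column_stack((rows.ravel(), cols.ravel()))

# k-d tree: index of the nearest center for every pixel
_, labels_tree = cKDTree(centers).query(pixels)

# Brute force: full pixel-to-center squared-distance matrix, argmin per pixel
d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
labels_brute = d2.argmin(axis=1)

print(np.array_equal(labels_tree, labels_brute))
```

The brute-force version needs memory proportional to pixels times centers, which is why the tree is the better choice for real image sizes.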

out = X*np.nan 
for i in range(len(X)): 
    out[i] = np.nanmean(f[x_grid[b==i], y_grid[b==i]]) 

can be replaced with two calls to np.bincount:

total = np.bincount(b, weights=f[x_grid, y_grid], minlength=len(X)) 
count = np.bincount(b, minlength=len(X)) 
out = total/count 
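To see why two bincount calls give the per-bin mean, here is a minimal sketch on made-up labels and values: the weighted call sums the values for each label, the unweighted call counts the pixels for each label, and their ratio is the mean. The NaN mask is an addition of this sketch, meant to match np.nanmean in the original loop; without it, a single NaN pixel turns its whole bin into NaN.

```python
import numpy as np

# Made-up toy data: bin label of each pixel and that pixel's value
b = np.array([0, 2, 0, 1, 2, 2, 1])
values = np.array([1.0, 4.0, 3.0, 5.0, 6.0, 8.0, np.nan])
n_bins = 4  # bin 3 receives no pixels

# Drop NaN pixels first so the result matches np.nanmean in the loop
valid = ~np.isnan(values)
total = np.bincount(b[valid], weights=values[valid], minlength=n_bins)  # per-bin sums
count = np.bincount(b[valid], minlength=n_bins)                         # per-bin pixel counts

# Empty bins give 0/0; silence the warning and let them become NaN
with np.errstate(invalid='ignore'):
    out = total / count

print(out)
```

Bins 0, 1 and 2 come out as 2, 5 and 6, and the empty bin 3 becomes NaN, which is also what np.nanmean returns (with a warning) for an empty selection in the loop.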

or with a single call to stats.binned_statistic:

out, bin_edges, binnumber = stats.binned_statistic(
    x=b, values=f[x_grid, y_grid], statistic='mean', bins=np.arange(len(X)+1)) 
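The `bins=np.arange(len(X)+1)` argument is what makes binned_statistic work here: the edges 0, 1, ..., len(X) define one unit-wide bin per integer label, so each label lands in its own bin. A small sketch on made-up labels and values shows this, and also that the returned `binnumber` is 1-based (label i goes to bin i + 1):

```python
import numpy as np
import scipy.stats as stats

# Made-up toy data: bin label of each pixel and that pixel's value
b = np.array([0, 2, 0, 1, 2, 2])
values = np.array([1.0, 4.0, 3.0, 5.0, 6.0, 8.0])
n_bins = 4  # label 3 never occurs, so the last bin is empty

# Edges 0, 1, ..., n_bins: bin i is [i, i+1), the last one is closed on the right
out, bin_edges, binnumber = stats.binned_statistic(
    x=b, values=values, statistic='mean', bins=np.arange(n_bins + 1))

print(out)        # per-label means, NaN for the empty bin
print(binnumber)  # 1-based bin index of each value
```

With `statistic='mean'`, NaN values are not skipped, so if f may contain NaNs they would need to be filtered out of both `x` and `values` before the call.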

For example,

import numpy as np 
from scipy.spatial import KDTree 
import scipy.stats as stats 
np.random.seed(2017) 

def rebin(f, X, Y): 
    s = f.shape 
    x_grid = np.arange(s[0]) 
    y_grid = np.arange(s[1]) 
    x_grid, y_grid = np.meshgrid(x_grid,y_grid) 
    x_grid, y_grid = x_grid.flatten(), y_grid.flatten() 

    tree = KDTree(np.column_stack((X,Y))) 
    _, b = tree.query(np.column_stack((x_grid, y_grid))) 

    out, bin_edges, binnumber = stats.binned_statistic(
        x=b, values=f[x_grid, y_grid], statistic='mean', bins=np.arange(len(X)+1)) 
    # total = np.bincount(b, weights=f[x_grid, y_grid], minlength=len(X)) 
    # count = np.bincount(b, minlength=len(X)) 
    # out = total/count 
    return out 

def orig(f, X, Y): 
    s = f.shape 
    x_grid = np.arange(s[0]) 
    y_grid = np.arange(s[1]) 
    x_grid, y_grid = np.meshgrid(x_grid,y_grid) 
    x_grid, y_grid = x_grid.flatten(), y_grid.flatten() 

    tree = KDTree(np.column_stack((X,Y))) 
    _, b = tree.query(np.column_stack((x_grid, y_grid))) 

    out = X*np.nan 
    for i in range(len(X)): 
        out[i] = np.nanmean(f[x_grid[b==i], y_grid[b==i]]) 
    return out 

N = 100 
X, Y = np.random.random((2, N)) 
f = np.random.random((N, N)) 

expected = orig(f, X, Y) 
result = rebin(f, X, Y) 
print(np.allclose(expected, result, equal_nan=True)) 
# True 

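I've just found out that there is also cKDTree, a feature-complete Cython version of KDTree, which is faster. (In SciPy 1.6 and later, KDTree is itself built on cKDTree, so the difference largely disappears.)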
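Putting the pieces together, here is a minimal sketch of the whole function using cKDTree for the nearest-center lookup and np.bincount for the per-bin means. Assumptions: `rebin_ckdtree` is a hypothetical name; X holds row and Y column coordinates, as in `f[x_grid, y_grid]` above; NaN pixels are skipped to mimic np.nanmean; and `np.indices` replaces the meshgrid/flatten steps.

```python
import numpy as np
from scipy.spatial import cKDTree


def rebin_ckdtree(f, X, Y):
    """Mean of f over the pixels nearest to each center (X[i], Y[i]).

    X holds row coordinates and Y column coordinates, matching f[x, y].
    """
    # (row, col) coordinates of every pixel, in the same order as f.ravel()
    rows, cols = np.indices(f.shape)
    pixels = np.column_stack((rows.ravel(), cols.ravel()))

    # Label each pixel with the index of its nearest center (its Voronoi cell)
    _, b = cKDTree(np.column_stack((X, Y))).query(pixels)

    values = f.ravel()
    valid = ~np.isnan(values)  # ignore NaN pixels, as np.nanmean does
    total = np.bincount(b[valid], weights=values[valid], minlength=len(X))
    count = np.bincount(b[valid], minlength=len(X))
    with np.errstate(invalid='ignore'):  # empty cells: 0/0 -> NaN
        return total / count


rng = np.random.default_rng(2017)
f = rng.random((50, 40))
X, Y = rng.random((2, 8)) * [[50], [40]]  # 8 made-up centers spread over the image
out = rebin_ckdtree(f, X, Y)
print(out)
```

All the per-bin work is done by two passes of np.bincount over the pixels, instead of one boolean mask over all pixels per bin as in the loop.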