道格拉斯-普克算法(经纬度或坐标点抽稀)

起因

最近在做一个车联网项目,有一个场景是车辆定时上报当前所在经纬度等位置信息上报给平台,平台通过web页面在高德地图上展示车辆行驶路径。

由于车辆上报规则是每隔4s上报一次,一个小时也就是900个点,一天也就是21600个点,如果是10辆车就是216000个点,那如果是100辆车,甚至是10000辆车对于数据库存储来说会是一个灾难,对于渲染地图,过多的点,也减少页面的流畅度。

考虑到车辆直线行驶的时候,只需要记录第一个点和最后一个点即可,期间的点位都是没有必要的,如果拐弯的话,只需要记录一两个拐弯的点即可,这样就可以减少点数,进而减少存储或者绘图压力。

有没有一种合适的算法解决此问题呢,遂查阅资料,了解到一个算法----道格拉斯-普克算法

介绍

道格拉斯-普克算法 (Douglas–Peucker algorithm,亦称为拉默-道格拉斯-普克算法、迭代适应点算法、分裂与合并算法)是将曲线近似表示为一系列点,并减少点的数量的一种算法。它的优点是具有平移和旋转不变性,给定曲线与阈值后,抽样结果一定。—摘自百度百科

道格拉斯-普克算法(经纬度或坐标点抽稀)
如果有8个点,如上图(1),抽稀步骤如下:

  1. 在曲线首尾两点间虚连一条直线,求出其余各点到该直线的距离,如右图(1)。
  2. 选到点到直线距离的最大者与阈值相比较,若大于阈值,则记录该点,否则将直线两端点间各点全部舍去,如右图(2),记录第4个点,然后根据地4个点,将点分成两段1-4,4-8
  3. 然后分别对1-4,4-8重复第1、2步操作,迭代操作,即仍选距离最大者与阈值比较,依次取舍,直到无点可舍去,最后得到满足给定精度限差的曲线点坐标,如图(3)、(4)依次保留第6点、第7点,舍去其他点,即完成线的化简。

结合步骤,这里有两点数学知识,一是两点确定一条直线方程,二是求点到直线的距离。

点到直线的距离公式如下

道格拉斯-普克算法(经纬度或坐标点抽稀)

PHP代码

	/**
     * 点到直线的距离
     * @param $a float
     * @param $b float
     * @param $c float
     * @param $xy string 点坐标例如 "2,2"
     * @return number
     */
    public function getDistanceFromPointToLine($a, $b, $c, $xy)
    {
        $x = explode(",", $xy)[0];
        $y = explode(",", $xy)[1];
        return abs(($a * $x + $b * $y + $c) / sqrt($a * $a + $b * $b));
    }

    /**
     * 根据两个点求直线方程 ax+by+c=0
     * @param $xy1 string 点1,例如"1,1"
     * @param $xy2 string 点2,例如"2,2"
     * @return array
     */
    public function getLineByPoint($xy1, $xy2)
    {
        $x1 = explode(",", $xy1)[0];//第一个点的横坐标
        $y1 = explode(",", $xy1)[1];//第一个点的纵坐标
        $x2 = explode(",", $xy2)[0];//第二个点的横坐标
        $y2 = explode(",", $xy2)[1];//第二个点的横坐标

        $a = $y2 - $y1;
        $b = $x1 - $x2;
        $c = ($y1 - $y2) * $x1 - $y1 * ($x1 - $x2);

        return [$a, $b, $c];
    }

    /**
     * 稀疏点
     * @param $points array 参数为["1,2","2,3"]点集
     * @param $max float 阈值,越大稀疏效果越好但是细节越差
     * @return array
     */
    public function sparePoints($points, $max)
    {
        if (count($points) < 3) {
            return $points;
        }
        $xy1 = $points[0];//取第一个点
        $xy2 = end($points);//取最后一个点
        list($a, $b, $c) = $this->getLineByPoint($xy1, $xy2);//获取直线方程的a,b,c值

        $ret   = [];//最后稀疏以后的点集
        $dmax  = 0;//记录点到直线的最大距离
        $split = 0;//分割位置
        for ($i = 1; $i < count($points) - 1; $i++) {
            $d = $this->getDistanceFromPointToLine($a, $b, $c, $points[$i]);
            if ($d > $dmax) {
                $split = $i;
                $dmax  = $d;
            }
        }

        if ($tmp) {//如果存在点到首位点连成直线的距离大于max的,即需要再次划分
            $child1 = $this->sparePoints(array_slice($points, 0, $split + 1), $max);//按split分成左边一份,递归
            $child2 = $this->sparePoints(array_slice($points, $split), $max);//按split分成右边一份,递归
            //因为child1的最后一个点和child2的第一个点,肯定是同一个(即为分割点),合并的时候,需要注意一下
            $ret = array_merge($ret, $child1, array_slice($child2, 1));
            return $ret;
        } else {//如果不存在点到直线的距离大于阈值的,那么就直接是首尾点了
            return [$points[0], end($points)];
        }
    }

效果

随机找了一段路,根据高德地图的驾驶导航提示,选取了80个经纬度(因为这些点都是高德算出来的重要点位,经过了一定的稀疏),如下:

		$points = [
            "117.212448,39.133785",
            "117.212669,39.133667",
            "117.213165,39.133297",
            "117.213203,39.13327",
            "117.213554,39.133099",
            "117.213669,39.13295",
            "117.213921,39.132462",
            "117.214088,39.132126",
            "117.214142,39.131962",
            "117.214188,39.13176",
            "117.214233,39.131397",
            "117.21418,39.13055",
            "117.214279,39.130459",
            "117.214539,39.130375",
            "117.214874,39.130188",
            "117.216881,39.128716",
            "117.217598,39.127995",
            "117.217972,39.12759",
            "117.218338,39.127178",
            "117.218407,39.127071",
            "117.218567,39.126911",
            "117.219704,39.125702",
            "117.219795,39.12561",
            "117.220284,39.125114",
            "117.220619,39.124802",
            "117.221046,39.124348",
            "117.221138,39.124245",
            "117.221268,39.124092",
            "117.222321,39.122955",
            "117.222824,39.122406",
            "117.222916,39.122311",
            "117.223663,39.121544",
            "117.2239,39.121452",
            "117.224113,39.12159",
            "117.224251,39.121677",
            "117.225136,39.122208",
            "117.225281,39.122292",
            "117.225319,39.122311",
            "117.226273,39.122875",
            "117.226685,39.123127",
            "117.227371,39.12352",
            "117.227806,39.123779",
            "117.228477,39.124134",
            "117.228531,39.124161",
            "117.228531,39.124161",
            "117.228668,39.124187",
            "117.228897,39.124325",
            "117.229767,39.12479",
            "117.230927,39.12545",
            "117.231186,39.12561",
            "117.231659,39.125908",
            "117.231834,39.126026",
            "117.232018,39.126186",
            "117.232185,39.126362",
            "117.232353,39.126583",
            "117.232658,39.126972",
            "117.232658,39.126972",
            "117.233124,39.12748",
            "117.233253,39.127609",
            "117.233368,39.127689",
            "117.233513,39.127762",
            "117.233665,39.127823",
            "117.233734,39.127846",
            "117.233833,39.127865",
            "117.233994,39.127888",
            "117.234138,39.127892",
            "117.234329,39.127884",
            "117.234612,39.127838",
            "117.234955,39.127754",
            "117.235252,39.12767",
            "117.236282,39.12738",
            "117.237137,39.127129",
            "117.237671,39.126961",
            "117.237953,39.126949",
            "117.238213,39.126865",
            "117.238472,39.126793",
            "117.2397,39.126434",
            "117.242233,39.125698",
            "117.243538,39.12532",
            "117.243645,39.125298",
        ];

        $data = sparePoints($points, "0.00001");

运行结果

[
        "117.212448,39.133785", 
        "117.212669,39.133667", 
        "117.213203,39.13327", 
        "117.213554,39.133099", 
        "117.213669,39.13295", 
        "117.214088,39.132126", 
        "117.214188,39.13176", 
        "117.214233,39.131397", 
        "117.21418,39.13055", 
        "117.214279,39.130459", 
        "117.214539,39.130375", 
        "117.214874,39.130188", 
        "117.216881,39.128716", 
        "117.217598,39.127995", 
        "117.218338,39.127178", 
        "117.218407,39.127071", 
        "117.219704,39.125702", 
        "117.220284,39.125114", 
        "117.220619,39.124802", 
        "117.222824,39.122406", 
        "117.223663,39.121544", 
        "117.2239,39.121452", 
        "117.225136,39.122208", 
        "117.227806,39.123779", 
        "117.228531,39.124161", 
        "117.228668,39.124187", 
        "117.228897,39.124325", 
        "117.229767,39.12479", 
        "117.230927,39.12545", 
        "117.231659,39.125908", 
        "117.231834,39.126026", 
        "117.232018,39.126186", 
        "117.232185,39.126362", 
        "117.232658,39.126972", 
        "117.233253,39.127609", 
        "117.233368,39.127689", 
        "117.233513,39.127762", 
        "117.233734,39.127846", 
        "117.233994,39.127888", 
        "117.234329,39.127884", 
        "117.234612,39.127838", 
        "117.234955,39.127754", 
        "117.236282,39.12738", 
        "117.237671,39.126961", 
        "117.237953,39.126949", 
        "117.243645,39.125298"
    ]

前后图对比,图一为稀疏前,图二为稀疏后的

道格拉斯-普克算法(经纬度或坐标点抽稀)

道格拉斯-普克算法(经纬度或坐标点抽稀)

由于测试给的阈值为0.0001,可以看到地图点位由80个直接减少到46个点,并且保留了较多地图拐点细节。

因为通过高德拿到的路径经纬度本身就是稀疏过的,这算是进行了二次稀疏,能有这个效果,已经很不错了,在实际中,因为点数会更多,调整合适的阈值,稀疏效果会非常理想。