A Case Study of Spark Wide and Narrow Dependencies

1 The normal transformation relationship between Spark operators (a narrow dependency)

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.ListBuffer

object Day03 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("day03")
    conf.setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")

    // 12 elements sliced evenly across 3 partitions.
    val rdd1 = sc.parallelize(List[String](
      "love1", "love2", "love3",
      "love4", "love5", "love6",
      "love7", "love8", "love9",
      "love10", "love11", "love12"
    ), 3)

    // Tag every element with the index of the partition it sits in,
    // so we can watch how data moves between RDDs.
    val rdd2 = rdd1.mapPartitionsWithIndex((index, iter) => {
      val list = new ListBuffer[String]()
      while (iter.hasNext) {
        val one = iter.next()
        list += s"rdd1 partition = 【$index】, value = 【$one】"
      }
      list.iterator
    })

    rdd2.foreach(println)
  }
}

The printed output is as follows:

[screenshot of the console output]
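Because the master is local (a single thread) and parallelize slices the list into three contiguous chunks of four, the output is deterministic and should read:

    rdd1 partition = 【0】, value = 【love1】
    rdd1 partition = 【0】, value = 【love2】
    rdd1 partition = 【0】, value = 【love3】
    rdd1 partition = 【0】, value = 【love4】
    rdd1 partition = 【1】, value = 【love5】
    ...
    rdd1 partition = 【2】, value = 【love12】

Every value stays in the partition it started in: each child partition depends on exactly one parent partition, which is the definition of a narrow dependency.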

2 After adding a wide dependency:

/**
 * repartition can either increase or decrease the number of partitions.
 * It always shuffles: it is equivalent to coalesce(n, shuffle = true).
 */

    // Inside the same main method, after rdd2 has been defined:

    // Redistribute rdd2's data from 3 partitions into 4, forcing a shuffle.
    val rdd3 = rdd2.repartition(4)

    // Tag every element again, this time with rdd3's partition index.
    val rdd4 = rdd3.mapPartitionsWithIndex((index, iter) => {
      val list = new ListBuffer[String]()
      while (iter.hasNext) {
        val one = iter.next()
        list += s"rdd3 partition = [$index], value = [$one]"
      }
      list.iterator
    })

    rdd4.foreach(println)

The printed output is as follows:

[screenshot of the console output]
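This time the layout is less predictable: repartition spreads each parent partition's rows round-robin over the child partitions, starting from a seeded pseudo-random offset, so the exact placement depends on the Spark version. A representative run (a hedged sketch, not the original screenshot) could look like:

    rdd3 partition = [0], value = [rdd1 partition = 【0】, value = 【love1】]
    rdd3 partition = [0], value = [rdd1 partition = 【1】, value = 【love6】]
    rdd3 partition = [0], value = [rdd1 partition = 【2】, value = 【love11】]
    rdd3 partition = [1], value = [rdd1 partition = 【0】, value = 【love2】]
    ...

Each line now carries two partition indices: the rdd1 index baked into the string by the first mapPartitionsWithIndex, and the rdd3 index added by the second.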

You can see that a single parent partition's data now ends up in multiple child partitions (a one-to-many relationship), which is exactly what makes repartition a wide (shuffle) dependency.
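A quick way to confirm this in code is RDD.toDebugString, which prints an RDD's lineage. A minimal sketch, appended at the end of the same main method:

    // rdd2's lineage contains only MapPartitionsRDD and
    // ParallelCollectionRDD: a purely narrow chain.
    println(rdd2.toDebugString)

    // rdd4's lineage additionally contains a ShuffledRDD: the
    // shuffle (wide) dependency introduced by repartition.
    println(rdd4.toDebugString)

    sc.stop()

The same boundary is visible in the Spark UI: the second job runs in two stages, because Spark splits stages exactly at shuffle (wide) dependencies.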