Spark宽窄依赖之间的案例
1 正常spark 算子之间的转换关系
def main(args: Array[String]):Unit = {
val conf =new SparkConf()
conf.setAppName("day03")
conf.setMaster("local")
val sc =new SparkContext(conf)
sc.setLogLevel("error");
val rdd1 = sc.parallelize(List[String](
"love1","love2","love3",
"love4","love5","love6",
"love7","love8","love9",
"love10","love11","love12"
), 3)
val rdd2 = rdd1.mapPartitionsWithIndex((index, iter)=>{
val list =new ListBuffer[String]()
while (iter.hasNext) {
val one = iter.next()
list.+=(s"rdd1 partition = 【$index】, value = 【$one】")
}
list.iterator
})
rdd2.foreach(println)
打印结果如下:
添加宽依赖之后 :
/**
* repartition可以怎加分区也可以减少分区
*/
val rdd3 = rdd2.repartition(4)
val rdd4 = rdd3.mapPartitionsWithIndex((index, iter)=>{
val list =new ListBuffer[String]()
while (iter.hasNext) {
val one = iter.next()
list.+=(s"rdd1 partition = [$index], value = [$one]")
}
list.iterator
})
rdd4.foreach(println)
打印结果如下:
可以看见一个partiontion 对应多个数值 一对多 形成宽依赖