Spark:rightOuterJoin算子
Spark:rightOuterJoin算子是Tranformation,具有shuffle
它的功能是将右边的表,不管在左边有没有出现,都会显示
默认分区器由HashPartitioner取二个rdd的最大分区为最终分区数
调取cogroup,形成二个CompactBuffer形式的对偶元组,在用flatMapValues压平取出value
进入if判断,如果value1为空,namevalue2就返回,
else--循环遍历value1,和value.2的迭代器, yield将value1,value2都有的值取出,value1没有就为None
object RightJoinDemo { def main(args: Array[String]): Unit = { val conf = new SparkConf().setAppName("ReduceByKey").setMaster("local[*]") val sc = new SparkContext(conf) val rdd1 = sc.parallelize(List(("tom", 1), ("jerry", 2), ("kitty", 3)),2) val rdd2 = sc.parallelize(List(("jerry", 9), ("tom", 8), ("shuke", 7), ("tom", 2)),2) val rdd3: RDD[(String, (Iterable[Int], Iterable[Int]))] = rdd1.cogroup(rdd2) val rdd4: RDD[(String, (Option[Int], Int))] = rdd3.flatMapValues(it => if (it._1.isEmpty) { it._2.map(x => (None, x)) } else { for (q <- it._1.iterator; w <- it._1.iterator) yield (Some(q), w) } ) rdd4.saveAsTextFile("RightJoin-out2") sc.stop() } }