Spark:rightOuterJoin算子

Spark:rightOuterJoin算子是Tranformation,具有shuffle

它的功能是将右边的表,不管在左边有没有出现,都会显示

Spark:rightOuterJoin算子

默认分区器由HashPartitioner取二个rdd的最大分区为最终分区数

Spark:rightOuterJoin算子

调取cogroup,形成二个CompactBuffer形式的对偶元组,在用flatMapValues压平取出value

进入if判断,如果value1为空,namevalue2就返回,

else--循环遍历value1,和value.2的迭代器, yield将value1,value2都有的值取出,value1没有就为None

object RightJoinDemo {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("ReduceByKey").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val rdd1 = sc.parallelize(List(("tom", 1), ("jerry", 2), ("kitty", 3)),2)
    val rdd2 = sc.parallelize(List(("jerry", 9), ("tom", 8), ("shuke", 7), ("tom", 2)),2)

    val rdd3: RDD[(String, (Iterable[Int], Iterable[Int]))] = rdd1.cogroup(rdd2)

    val rdd4: RDD[(String, (Option[Int], Int))] = rdd3.flatMapValues(it =>
      if (it._1.isEmpty) {
        it._2.map(x => (None, x))

      } else {
        for (q <- it._1.iterator; w <- it._1.iterator) yield (Some(q), w)
      }
    )
    rdd4.saveAsTextFile("RightJoin-out2")

    sc.stop()
  }
}