Spark Streaming Window Operations: Usage and Explanation
Official docs: http://spark.apache.org/docs/latest/streaming-programming-guide.html
Example in IDEA
package g5.learning

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowApp {
  def main(args: Array[String]): Unit = {
    // Setup
    val conf = new SparkConf().setMaster("local[2]").setAppName("WindowApp")
    val ssc = new StreamingContext(conf, Seconds(10)) // batch interval: 10 seconds

    // Business logic: windowed word count over a socket stream
    val lines = ssc.socketTextStream("hadoop001", 9999)
    lines.flatMap(_.split(","))
      .map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(10))
      .print()

    // Start the streaming job
    ssc.start() // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
Notes:
1. Successive windows may or may not overlap; it depends on the parameters you configure:
window length - the duration of the window (3 batch intervals in the official guide's figure).
sliding interval - the interval at which the window operation is performed (2 batch intervals in the figure).
2. Three time parameters are involved here, and they are constrained by a rule from the official guide: "These two parameters must be multiples of the batch interval of the source DStream (1 in the figure)." In other words, both the window length and the sliding interval must be integer multiples of the batch interval, i.e. the Seconds(10) passed to StreamingContext(conf, Seconds(10)). In the code above, window length = sliding interval = batch interval = 10s, so each record belongs to exactly one window and windows never overlap.
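To make windows overlap, use a window length larger than the sliding interval. Below is a minimal sketch (my variant, not from the original post; it reuses the same hadoop001:9999 socket assumption and needs a running Spark environment) with a 30-second window sliding every 10 seconds, so each output covers the last three batches and consecutive windows share two batches:

```scala
package g5.learning

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SlidingWindowApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SlidingWindowApp")
    // Batch interval: 10s. Window (30s) and slide (10s) are both
    // integer multiples of it, as the rule in note 2 requires.
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("hadoop001", 9999)
    lines.flatMap(_.split(","))
      .map((_, 1))
      // Count words over the last 30 seconds, recomputed every 10 seconds:
      // consecutive windows overlap by two batches of data.
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

For long windows, the overload that also takes an inverse reduce function, reduceByKeyAndWindow(_ + _, _ - _, Seconds(30), Seconds(10)), computes each window incrementally by subtracting the batches that slide out instead of re-summing the whole window; it requires checkpointing to be enabled on the StreamingContext.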