Spark Streaming Window Operations: Usage and Explanation
Official docs: http://spark.apache.org/docs/latest/streaming-programming-guide.html
Example in IDEA
package g5.learning

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowApp {
  def main(args: Array[String]): Unit = {
    // Setup
    val conf = new SparkConf().setMaster("local[2]").setAppName("WindowApp")
    val ssc = new StreamingContext(conf, Seconds(10)) // batch interval: 10 seconds

    // Business logic: windowed word count over a socket stream
    val lines = ssc.socketTextStream("hadoop001", 9999)
    lines.flatMap(_.split(","))
      .map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(10))
      .print()

    // Start the streaming job
    ssc.start() // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
Notes:
1. Successive windows may or may not overlap; it depends on the parameters you configure:
window length - the duration of the window (3 batch intervals in the official guide's figure).
sliding interval - the interval at which the window operation is performed (2 batch intervals in the figure).
2. Three time parameters are involved here, and they are constrained by a rule from the official guide: "These two parameters must be multiples of the batch interval of the source DStream (1 in the figure)." In other words, both the window length and the sliding interval must be integer multiples of the batch interval, i.e. the Seconds(10) passed to StreamingContext(conf, Seconds(10)). In the code above, window length = sliding interval = batch interval = 10s, so each record belongs to exactly one window and windows never overlap.
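To make windows overlap, use a window length larger than the sliding interval. Below is a minimal sketch (my variant, not from the original post; it reuses the same hadoop001:9999 socket assumption and needs a running Spark environment) with a 30-second window sliding every 10 seconds, so each output covers the last three batches and consecutive windows share two batches:

```scala
package g5.learning

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SlidingWindowApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SlidingWindowApp")
    // Batch interval: 10s. Window (30s) and slide (10s) are both
    // integer multiples of it, as the rule in note 2 requires.
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("hadoop001", 9999)
    lines.flatMap(_.split(","))
      .map((_, 1))
      // Count words over the last 30 seconds, recomputed every 10 seconds:
      // consecutive windows overlap by two batches of data.
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

For long windows, the overload that also takes an inverse reduce function, reduceByKeyAndWindow(_ + _, _ - _, Seconds(30), Seconds(10)), computes each window incrementally by subtracting the batches that slide out instead of re-summing the whole window; it requires checkpointing to be enabled on the StreamingContext.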